Part V: CobraHelpParser -- Parsing Go CLI Help Output
One parser handles Docker, Docker Compose, Podman, and any cobra-based CLI -- 5 different binaries, same
IHelpParser.
The Realization
Docker is written in Go. Docker Compose is written in Go. Podman is written in Go. The GitLab CLI (glab) is written in Go. And they all use the same CLI framework: cobra.
That means their --help output follows a predictable structure -- predictable enough to parse with a single IHelpParser implementation.
I did not plan this. I started by writing a parser for Docker's help text. Then I pointed it at Docker Compose, and it worked. Then I pointed it at Podman -- same thing. Then glab. At that point I renamed the class from DockerHelpParser to CobraHelpParser and accepted the gift that the Go ecosystem had accidentally given me.
This post is about that parser: the state machine, the flag anatomy, the type mapping from Go to C#, and the edge cases that 97 scraped versions revealed. If you have not read the BinaryWrapper post yet, go there first -- it defines the IHelpParser interface and the three-phase pipeline that this parser plugs into.
The IHelpParser Interface
The contract is simple. I covered it briefly in the BinaryWrapper post, but here it is again because it is the only thing CobraHelpParser needs to satisfy:
public interface IHelpParser
{
    string Name { get; }
    CommandNode ParseHelp(string helpText, string commandPath);
    bool CanParse(string helpText);
}
Three members. Name is a human-readable identifier ("cobra" in our case). ParseHelp takes raw help text -- the full stdout of docker container run --help -- and the command path ("docker container run") and returns a CommandNode. CanParse is the auto-detection hook: given an unknown blob of help text, can this parser handle it?
The scraper calls CanParse first. If multiple parsers claim they can handle the text, the scraper uses the one registered first. In practice, CobraHelpParser is registered first for Docker and Compose because we know those are cobra binaries. The auto-detection matters more for the generic BinaryWrapper scenario where someone points the scraper at an unknown binary.
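The "first registered parser wins" rule can be sketched in a few lines. This is only an illustration: Candidate is a hypothetical stand-in for IHelpParser (reduced to the two members selection needs), and LooksLikeCobra is an assumed heuristic, not the real CanParse implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ParserSelection
{
    // Hypothetical stand-in for IHelpParser: only Name and CanParse matter for selection.
    public sealed record Candidate(string Name, Func<string, bool> CanParse);

    // Returns the first registered candidate that claims the text, or null if none does.
    public static Candidate? SelectParser(IReadOnlyList<Candidate> registered, string helpText) =>
        registered.FirstOrDefault(p => p.CanParse(helpText));

    // Assumed heuristic: a usage line plus at least one cobra-style section header.
    public static bool LooksLikeCobra(string helpText) =>
        helpText.Contains("Usage:") &&
        (helpText.Contains("\nOptions:")
         || helpText.Contains("\nFlags:")
         || helpText.Contains("\nAvailable Commands:"));
}
```

Because selection is order-dependent, registering a catch-all parser last gives a natural fallback for unknown binaries.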
The CommandNode Model
The parser's output is a tree of CommandNode objects:
public record CommandNode(
    string Name,
    string? Description,
    IReadOnlyList<CommandNode> SubCommands,
    IReadOnlyList<CommandOption> Options);

public record CommandOption(
    string LongName,
    string? ShortName,
    string? Description,
    string? DefaultValue,
    OptionValueKind ValueKind,
    string ClrType,
    bool IsRequired);

public enum OptionValueKind
{
    Flag,   // --detach (no value, boolean)
    Single, // --name string (one value)
    List,   // --env list (repeatable)
}
CommandNode is recursive: a node can have subcommands, and each subcommand is itself a CommandNode with its own options and possibly more subcommands. The scraper walks this tree by calling docker {subcommand} --help for each discovered subcommand, parsing each response, and stitching the results into a tree.
CommandOption is where the interesting work happens. Every flag in the help text gets parsed into one of these records. The parser must figure out the long name, optional short name, Go type (mapped to a CLR type), whether it takes a value or is a boolean flag, whether it has a default, and the description. That is a lot to extract from a line of plain text.
OptionValueKind drives code generation downstream. Flag options get bool properties and --flag/no-flag toggle behavior. Single options get a property of their CLR type. List options get List<T> properties and can appear multiple times on the command line.
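To make the three kinds concrete, here is a minimal sketch of how a value could be turned into command-line tokens depending on its kind. ArgumentRendering and Render are hypothetical names, the enum is repeated only so the block compiles on its own, and a Flag is emitted only when true (the negated form is left out).

```csharp
using System.Collections.Generic;

// Mirrors the enum above so the sketch is self-contained.
public enum OptionValueKind { Flag, Single, List }

public static class ArgumentRendering
{
    // Hypothetical helper: yields the tokens for one option according to its kind.
    public static IEnumerable<string> Render(string longName, OptionValueKind kind, object? value)
    {
        switch (kind)
        {
            case OptionValueKind.Flag:
                if (value is true)
                    yield return longName;              // --detach
                break;

            case OptionValueKind.Single:
                if (value is not null)
                {
                    yield return longName;              // --name web
                    yield return value.ToString()!;
                }
                break;

            case OptionValueKind.List:
                if (value is IEnumerable<string> items)
                {
                    foreach (var item in items)
                    {
                        yield return longName;          // --env A=1 --env B=2
                        yield return item;
                    }
                }
                break;
        }
    }
}
```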
Cobra Help Format Anatomy
Before I show the parser, you need to see what it is parsing. Here is a truncated docker container run --help from Docker 24.0.0:
Usage: docker container run [OPTIONS] IMAGE [COMMAND] [ARG...]

Aliases:
  docker container run, docker run

Create and run a new container from an image

Options:
      --add-host list                  Add a custom host-to-IP mapping
                                       (host:ip)
  -a, --attach list                    Attach to STDIN, STDOUT or STDERR
      --blkio-weight uint16            Block IO (relative weight),
                                       between 10 and 1000, or 0 to
                                       disable (default 0)
      --blkio-weight-device list       Block IO weight (relative device
                                       weight) (default [])
      --cap-add list                   Add Linux capabilities
      --cap-drop list                  Drop Linux capabilities
      --cgroupns string                Cgroup namespace to use
                                       (host|private)
                                       'host':    Run the container in
                                                  the Docker host's
                                                  cgroup namespace
                                       'private': Run the container in
                                                  its own private cgroup
                                                  namespace
                                       '':        Use the cgroup
                                                  namespace as
                                                  configured by the
                                                  default-cgroupns-mode
                                                  option on the daemon
                                                  (default)
  -d, --detach                         Run container in background and
                                       print container ID
  -e, --env list                       Set environment variables
      --env-file list                  Read in a file of environment
                                       variables
  -h, --hostname string                Container host name
  -i, --interactive                    Keep STDIN open even if not
                                       attached
  -m, --memory bytes                   Memory limit
      --memory-swappiness int          Tune container memory swappiness
                                       (0 to 100) (default -1)
      --name string                    Assign a name to the container
      --network network                Connect a container to a network
  -p, --publish list                   Publish a container's port(s) to
                                       the host
      --pull string                    Pull image before running
                                       ("always", "missing", "never")
                                       (default "missing")
      --restart string                 Restart policy to apply when a
                                       container exits (default "no")
      --rm                             Automatically remove the container
                                       when it exits
  -t, --tty                            Allocate a pseudo-TTY
  -v, --volume list                    Bind mount a volume
  -w, --workdir string                 Working directory inside the
                                       container
... (90+ flags total)
That is 60+ flags for a single command, and I have truncated it. The full output has over 90 flags. Every line follows the same structure, with five identifiable sections.
Section 1: Usage Line
Usage: docker container run [OPTIONS] IMAGE [COMMAND] [ARG...]
The Usage: line gives us the full command path (docker container run) and the argument syntax. The parser uses this to validate that the command path matches what the scraper expects. The [OPTIONS] placeholder tells us that flags will follow. IMAGE is a positional argument -- not a flag, not optional. [COMMAND] and [ARG...] are optional positional arguments.
I extract the command path from here, but I do not attempt to parse the argument syntax into a formal grammar. Positional arguments in Docker are too varied and too contextual to model generically. The generated API handles them as string[] trailing arguments.
Section 2: Aliases
Aliases:
  docker container run, docker run
This tells us that docker run is an alias for docker container run. The parser captures aliases because they affect how users invoke the command -- and because the scraper needs to avoid scraping the same command twice through different paths. If we already scraped docker container run, we skip docker run.
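The deduplication idea can be sketched with a plain HashSet. ScrapeDeduplicator and TryRegister are hypothetical names used for illustration; the real scraper may track visited commands differently.

```csharp
using System;
using System.Collections.Generic;

public sealed class ScrapeDeduplicator
{
    // Every command path or alias that has already been claimed by a scraped command.
    private readonly HashSet<string> _seen = new(StringComparer.Ordinal);

    // Returns true if the command still needs scraping; in that case the path
    // and all of its aliases are recorded so later lookups are skipped.
    public bool TryRegister(string commandPath, IEnumerable<string> aliases)
    {
        if (_seen.Contains(commandPath))
            return false;

        _seen.Add(commandPath);
        foreach (var alias in aliases)
            _seen.Add(alias);

        return true;
    }
}
```

With this shape, registering "docker container run" together with its alias list means a later attempt to register "docker run" returns false.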
Section 3: Description
Create and run a new container from an image
Free-form text between the aliases (or usage line, if no aliases) and the first section header. This becomes the Description property on the CommandNode and eventually a /// <summary> XML doc comment on the generated C# class.
Section 4: Options
The bulk of the output. Every flag, its type, its description, and its default value. This is where the parser earns its keep, and I will dedicate an entire section to flag parsing below.
Section 5: Commands (for non-leaf commands)
For non-leaf commands like docker container --help, there is an Available Commands: section instead of (or in addition to) Options::
Available Commands:
  attach      Attach local standard input, output, and error streams
  cp          Copy files/folders between a container and the local filesystem
  create      Create a new container
  exec        Execute a command in a running container
  inspect     Display detailed information on one or more containers
  kill        Kill one or more running containers
  logs        Fetch the logs of a container
  ls          List containers
  rm          Remove one or more containers
  run         Create and run a new container from an image
  start       Start one or more stopped containers
  stop        Stop one or more running containers
... (24 commands total)
The parser extracts these subcommand names. The scraper then recursively calls docker container {subcommand} --help for each one, parses the result, and attaches it as a child CommandNode.
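The recursion itself is small. The sketch below assumes a hypothetical getSubcommands delegate that stands in for "run the command with --help and parse its commands section", and a simplified ScrapedNode record instead of the full CommandNode.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified tree node for illustration: just the full command path and its children.
public sealed record ScrapedNode(string Path, IReadOnlyList<ScrapedNode> Children);

public static class RecursiveScrape
{
    // Walks the command tree depth-first, asking the delegate for each node's subcommand names.
    public static ScrapedNode Walk(string path, Func<string, IReadOnlyList<string>> getSubcommands)
    {
        var children = getSubcommands(path)
            .Select(name => Walk($"{path} {name}", getSubcommands))
            .ToList();

        return new ScrapedNode(path, children);
    }
}
```

A leaf command is simply one for which the delegate returns an empty list, which ends the recursion.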
The Parsing State Machine
CobraHelpParser is a line-by-line state machine. It reads the help text one line at a time, transitions between states based on section headers, and accumulates data into temporary buffers that ultimately become CommandNode fields.
State Enum
private enum ParserState
{
    Initial,
    Usage,
    Aliases,
    Description,
    Options,
    GlobalOptions,
    Commands,
    AdditionalHelp,
}
Eight states. The parser starts in Initial and transitions forward as it encounters section headers. It never goes backward -- cobra's help output is always ordered the same way: Usage, Aliases, Description, Options/Flags, Global Flags, Available Commands, Additional Help Topics.
The State Machine Diagram
Section Detection
State transitions are driven by lines that end with : (sometimes with leading whitespace stripped). Here is the detection logic:
private static ParserState? DetectSectionHeader(string trimmedLine)
{
    return trimmedLine switch
    {
        "Usage:" => ParserState.Usage,
        "Aliases:" => ParserState.Aliases,
        "Options:" or "Flags:" => ParserState.Options,
        "Global Flags:" or "Global Options:" => ParserState.GlobalOptions,
        "Available Commands:" or "Commands:" => ParserState.Commands,
        "Additional help topics:" => ParserState.AdditionalHelp,
        _ => null,
    };
}
Pattern matching. Clean. The "Flags:" variant is the header that cobra's default help template prints; Docker's customized template prints "Options:" instead. "Global Options:" appears in a few custom cobra templates. The parser handles both spellings of each.
Why not use string.EndsWith(":") and be more generic? Because that would also match lines inside multi-line descriptions that happen to end with a colon. A strict allowlist of known section headers is safer. If a future cobra version introduces a new section header, I add one line to this switch statement and reparse.
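A small comparison shows the failure mode. The wrapped description line used here is a made-up example, and both helper names are hypothetical; the strict check uses the same literals as DetectSectionHeader.

```csharp
using System;

public static class HeaderDetectionDemo
{
    // The generic check: anything ending in a colon is treated as a header.
    public static bool LooksLikeHeaderLoose(string trimmedLine) =>
        trimmedLine.EndsWith(":", StringComparison.Ordinal);

    // The allowlist check: only known section headers qualify.
    public static bool LooksLikeHeaderStrict(string trimmedLine) => trimmedLine is
        "Usage:" or "Aliases:" or "Options:" or "Flags:"
        or "Global Flags:" or "Global Options:"
        or "Available Commands:" or "Commands:"
        or "Additional help topics:";
}
```

For a continuation line such as "Restrict attaching to the following services:", the loose check reports a header and would end the options section early, while the strict check leaves it to be handled as description text.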
The Main Loop
public CommandNode ParseHelp(string helpText, string commandPath)
{
    var state = ParserState.Initial;
    var lines = helpText.Split('\n');
    string? usageLine = null;
    var aliases = new List<string>();
    var descriptionLines = new List<string>();
    var options = new List<CommandOption>();
    var globalOptions = new List<CommandOption>();
    var subcommands = new List<CommandNode>();
    CommandOption? pendingOption = null;

    for (var i = 0; i < lines.Length; i++)
    {
        var line = lines[i];
        var trimmed = line.TrimEnd();

        // Check for section header transition
        var nextState = DetectSectionHeader(trimmed.TrimStart());
        if (nextState.HasValue)
        {
            // Flush any pending multi-line option
            FlushPendingOption(ref pendingOption, options, globalOptions, state);
            state = nextState.Value;
            continue;
        }

        switch (state)
        {
            case ParserState.Initial:
                // Look for usage on same line as "Usage:"
                if (trimmed.StartsWith("Usage:"))
                {
                    usageLine = trimmed["Usage:".Length..].Trim();
                    state = ParserState.Usage;
                }
                break;

            case ParserState.Usage:
                if (string.IsNullOrWhiteSpace(trimmed))
                    state = ParserState.Description;
                else
                    usageLine = trimmed.Trim();
                break;

            case ParserState.Aliases:
                if (string.IsNullOrWhiteSpace(trimmed))
                    state = ParserState.Description;
                else
                    aliases.AddRange(
                        trimmed.Split(',')
                            .Select(a => a.Trim())
                            .Where(a => a.Length > 0));
                break;

            case ParserState.Description:
                if (!string.IsNullOrWhiteSpace(trimmed))
                    descriptionLines.Add(trimmed.Trim());
                break;

            case ParserState.Options:
            case ParserState.GlobalOptions:
                ParseOptionLine(
                    trimmed,
                    ref pendingOption,
                    state == ParserState.GlobalOptions
                        ? globalOptions
                        : options);
                break;

            case ParserState.Commands:
                ParseCommandLine(trimmed, subcommands);
                break;

            case ParserState.AdditionalHelp:
                // Ignored -- these are hints like
                // "Run 'docker COMMAND --help' for more"
                break;
        }
    }

    // Flush final pending option
    FlushPendingOption(ref pendingOption, options, globalOptions, state);

    var name = ExtractCommandName(commandPath);
    var description = string.Join(" ", descriptionLines);

    // Merge global options into options list
    options.AddRange(globalOptions);

    return new CommandNode(name, description, subcommands, options);
}
A few things to notice:
- pendingOption: Multi-line descriptions require buffering. When we parse a flag line, we do not immediately emit it -- we hold it in pendingOption. If the next line is a continuation (heavily indented, no -- prefix), we append to the pending option's description. If the next line starts a new flag, we flush the pending option and start a new one.
- Global options merge: Cobra separates Options: (command-specific) from Global Flags: (inherited from parent commands). I merge them into one list because the generated C# API needs all flags on the command, regardless of where cobra categorizes them. The code generator can separate them later if needed -- the JSON output preserves the isGlobal flag.
- Alias handling: Aliases are comma-separated on a single line. docker container run, docker run becomes two entries. The scraper uses this to build a deduplication set.
- Description accumulation: The description can span multiple lines between the aliases section and the options section. I join them with spaces.
Command Line Parsing
Commands are simpler than flags. Each line in the Available Commands: section follows this pattern:
  commandname    Description text here
Two or more leading spaces, the command name, a gap of two or more spaces, then the description. The parsing logic:
private static void ParseCommandLine(
    string line,
    List<CommandNode> subcommands)
{
    if (string.IsNullOrWhiteSpace(line))
        return;

    var trimmed = line.TrimStart();
    if (trimmed.Length == 0)
        return;

    // Find the gap between command name and description
    // Command names don't contain spaces; the gap is 2+ spaces
    var match = Regex.Match(trimmed, @"^(\S+)\s{2,}(.+)$");
    if (!match.Success)
        return;

    var name = match.Groups[1].Value;
    var description = match.Groups[2].Value.Trim();

    subcommands.Add(new CommandNode(
        name,
        description,
        SubCommands: Array.Empty<CommandNode>(),
        Options: Array.Empty<CommandOption>()));
}
The regex is simple: one or more non-whitespace characters (the command name), two or more whitespace characters (the gap), then the rest (the description). Command names never contain spaces in cobra -- they are single tokens like run, build, ls, network.
The resulting CommandNode has empty subcommands and options. Those get filled in when the scraper recursively invokes --help on each discovered subcommand.
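As a quick check of that pattern, the sketch below runs it against a real command line and against a trailing hint line. CommandLineDemo is a hypothetical wrapper that exists only to make the regex callable on its own.

```csharp
using System.Text.RegularExpressions;

public static class CommandLineDemo
{
    // Same pattern as in ParseCommandLine: name, a gap of 2+ whitespace characters, description.
    private static readonly Regex CommandLine = new(@"^(\S+)\s{2,}(.+)$");

    // Returns the name and description, or null when the line is not a command entry.
    public static (string Name, string Description)? Parse(string line)
    {
        var match = CommandLine.Match(line.Trim());
        if (!match.Success)
            return null;

        return (match.Groups[1].Value, match.Groups[2].Value.Trim());
    }
}
```

A line like "  attach      Attach local standard input, output, and error streams" yields the name attach. A hint such as "Run 'docker container COMMAND --help' for more information on a command." is rejected because only a single space follows its first token.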
Flag Line Parsing: The Hard Part
This is where CobraHelpParser earns its complexity budget. A flag line in cobra help output packs five pieces of information into a single line of plain text, using alignment and whitespace as the only delimiters.
Flag Line Anatomy
  -d, --detach                         Run container in background
  ^^  ^^^^^^^^                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |   |                                |
  |   Long name                        Description
  Short name (optional)

      --blkio-weight uint16            Block IO weight (default 0)
                     ^^^^^^            ^^^^^^^^^^^^^^^ ^^^^^^^^^^^
                     |                 |               |
                     Type hint         Description     Default value

A flag line has:
- Optional short name: -d, at the start. A single dash, a single letter, a comma, a space. Not all flags have short names.
- Long name: --detach or --blkio-weight. Double dash, then a name that can contain hyphens. Always present.
- Optional type hint: string, int, uint16, list, stringArray, duration, etc. If absent, the flag is a boolean toggle.
- Description: Free text. Can be very long and span multiple continuation lines.
- Optional default value: In parentheses at the end of the description: (default 0) or (default "missing") or (default []).
The Flag Parser
private static readonly Regex FlagLineRegex = new(
    @"^\s+" +                 // leading whitespace
    @"(?:(-\w),\s+)?" +       // optional short name: -d,
    @"(--[\w][\w-]*)" +       // long name: --detach
    @"(?:\s+(\S+))?" +        // optional type: string, int, list
    @"(?:\s{2,}(.+))?$",      // description (after 2+ space gap)
    RegexOptions.Compiled);

private static readonly Regex DefaultValueRegex = new(
    // The value is optional so that a bare "(default)" also matches
    // and yields an empty capture.
    @"\(default(?:\s+(.+?))?\)\s*$",
    RegexOptions.Compiled);
Two regexes. The first handles the structural components of a flag line. The second extracts the default value from the description.
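To see what the first pattern captures, the sketch below applies it to a few aligned lines. FlagLineDemo is a hypothetical wrapper; the pattern is copied from FlagLineRegex so the block compiles on its own.

```csharp
using System.Text.RegularExpressions;

public static class FlagLineDemo
{
    // Same pattern as FlagLineRegex, written as a single literal.
    private static readonly Regex FlagLine = new(
        @"^\s+(?:(-\w),\s+)?(--[\w][\w-]*)(?:\s+(\S+))?(?:\s{2,}(.+))?$");

    // Returns the four structural parts of a flag line; absent parts are null.
    // Returns null when the line is not a flag line at all.
    public static (string? Short, string Long, string? Type, string? Description)? Parse(string line)
    {
        var m = FlagLine.Match(line);
        if (!m.Success)
            return null;

        return (
            m.Groups[1].Success ? m.Groups[1].Value : null,
            m.Groups[2].Value,
            m.Groups[3].Success ? m.Groups[3].Value : null,
            m.Groups[4].Success ? m.Groups[4].Value : null);
    }
}
```

For a boolean flag, the regex first tries to read the initial word of the description as a type, fails to find a two-space gap after it, and backtracks so that the type group stays empty. That backtracking is what distinguishes "--detach   Run container ..." from "--name string   Assign a name ...".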
The parsing itself:
private void ParseOptionLine(
    string line,
    ref CommandOption? pendingOption,
    List<CommandOption> target)
{
    // Is this a continuation line?
    // Continuation lines are indented but don't start with -
    if (pendingOption != null && IsContinuationLine(line))
    {
        pendingOption = pendingOption with
        {
            Description = pendingOption.Description + " "
                + line.Trim(),
        };
        return;
    }

    // Flush previous option
    if (pendingOption != null)
    {
        target.Add(FinalizeOption(pendingOption));
        pendingOption = null;
    }

    // Try to parse as a new flag line
    var match = FlagLineRegex.Match(line);
    if (!match.Success)
        return;

    var shortName = match.Groups[1].Success
        ? match.Groups[1].Value
        : null;
    var longName = match.Groups[2].Value;
    var goType = match.Groups[3].Success
        ? match.Groups[3].Value
        : null;
    var description = match.Groups[4].Success
        ? match.Groups[4].Value.Trim()
        : null;

    pendingOption = new CommandOption(
        LongName: longName,
        ShortName: shortName,
        Description: description,
        DefaultValue: null, // extracted during finalize
        ValueKind: MapValueKind(goType),
        ClrType: MapClrType(goType),
        IsRequired: false); // cobra doesn't mark required in help
}

private static bool IsContinuationLine(string line)
{
    if (string.IsNullOrWhiteSpace(line))
        return false;

    // Continuation lines have heavy indentation (30+ spaces)
    // and do NOT start with -- or -X,
    var trimmed = line.TrimStart();
    var indent = line.Length - trimmed.Length;
    return indent >= 30
        && !trimmed.StartsWith("--")
        && !Regex.IsMatch(trimmed, @"^-\w,");
}

private static CommandOption FinalizeOption(CommandOption option)
{
    if (option.Description == null)
        return option;

    // Extract default value from description
    var defaultMatch = DefaultValueRegex.Match(option.Description);
    if (!defaultMatch.Success)
        return option;

    var defaultValue = defaultMatch.Groups[1].Value.Trim('"');
    var cleanDescription = option.Description[..defaultMatch.Index].TrimEnd();

    return option with
    {
        DefaultValue = defaultValue,
        Description = cleanDescription,
    };
}
The key insight is the continuation line detection. Cobra aligns descriptions to a column (usually around column 39). When a description is too long, it wraps to the next line at the same column. That means continuation lines have 30+ spaces of leading indentation and do not start with -- or -X,. The parser detects this and appends to the pending option instead of starting a new one.
This handles the --cgroupns case from earlier, where the description wraps across 14 lines while describing each possible value.
Multi-Line Description Example
Consider this flag from the real output:
      --cgroupns string                Cgroup namespace to use
                                       (host|private)
                                       'host':    Run the container in
                                                  the Docker host's
                                                  cgroup namespace
                                       'private': Run the container in
                                                  its own private cgroup
                                                  namespace
                                       '':        Use the cgroup
                                                  namespace as
                                                  configured by the
                                                  default-cgroupns-mode
                                                  option on the daemon
                                                  (default)
The first line matches FlagLineRegex: long name --cgroupns, type string, description starts with "Cgroup namespace to use". The next 13 lines are all continuation lines -- they have 30+ spaces of indentation and do not start with --. Each one gets appended to the description.
The final description is one long string: "Cgroup namespace to use (host|private) 'host': Run the container in the Docker host's cgroup namespace 'private': Run the container in its own private cgroup namespace '': Use the cgroup namespace as configured by the default-cgroupns-mode option on the daemon".
The last (default) on the final line does match DefaultValueRegex -- but the captured value is an empty string, which is correct: the default is the empty string, meaning "use the daemon's configured namespace mode."
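The default-extraction step can be tried in isolation. In this sketch the pattern treats the value after "default" as optional, which is an assumption about what the regex must do for a bare (default) to produce an empty string; DefaultValueDemo and Split are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class DefaultValueDemo
{
    // Assumed variant of the default pattern: the value after "default" is optional.
    private static readonly Regex DefaultValue = new(@"\(default(?:\s+(.+?))?\)\s*$");

    // Splits a description into its text and its default value (null when no default is present).
    public static (string Description, string? Default) Split(string description)
    {
        var m = DefaultValue.Match(description);
        if (!m.Success)
            return (description, null);

        // An unmatched optional group has an empty Value, so "(default)" yields "".
        return (description[..m.Index].TrimEnd(), m.Groups[1].Value.Trim('"'));
    }
}
```

Note that an earlier parenthesized note such as "(0 to 100)" is left in the description, because the pattern only anchors on a trailing group that starts with "(default".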
Go Type to C# Type Mapping
Cobra attaches Go type hints to flag definitions. These appear in the help output after the flag name. The parser maps them to CLR types:
private static string MapClrType(string? goType) => goType switch
{
    null or "" => "bool",
    "string" => "string",
    "int" => "int",
    "int64" => "long",
    "uint" or "uint64" => "ulong",
    "uint16" => "ushort",
    "uint32" => "uint",
    "float64" => "double",
    "duration" => "string",
    "list" => "List<string>",
    "stringArray" => "List<string>",
    "strings" => "List<string>",
    "stringToString" => "Dictionary<string,string>",
    "ulimit" => "string",
    "bytes" => "string",
    "mount" => "string",
    "network" => "string",
    "gpu-request" => "string",
    "decimal" => "double",
    "filter" => "string",
    "command" => "string",
    "ip" => "string",
    _ => "string", // unknown types default to string
};

private static OptionValueKind MapValueKind(string? goType) => goType switch
{
    null or "" => OptionValueKind.Flag,
    "list" => OptionValueKind.List,
    "stringArray" => OptionValueKind.List,
    "strings" => OptionValueKind.List,
    "stringToString" => OptionValueKind.List,
    _ => OptionValueKind.Single,
};
The full type mapping table:
| Cobra Type | Go Type | C# Type | OptionValueKind |
|---|---|---|---|
| (absent) | bool | bool | Flag |
| string | string | string | Single |
| int | int | int | Single |
| int64 | int64 | long | Single |
| uint16 | uint16 | ushort | Single |
| uint32 | uint32 | uint | Single |
| uint / uint64 | uint / uint64 | ulong | Single |
| float64 | float64 | double | Single |
| duration | time.Duration | string | Single |
| list | []string | List<string> | List |
| stringArray | []string | List<string> | List |
| strings | []string | List<string> | List |
| stringToString | map[string]string | Dictionary<string,string> | List |
| ulimit | custom | string | Single |
| bytes | custom | string | Single |
| mount | custom | string | Single |
| network | custom | string | Single |
| gpu-request | custom | string | Single |
| filter | custom | string | Single |
A few decisions worth explaining:
duration and bytes map to string, not TimeSpan or long. Go durations ("10s", "5m30s") and Docker memory values ("512m", "2g") have their own formats. I pass them through as strings -- the generated builders provide strongly typed overloads (TimeSpan, long) that format correctly, but the underlying option stays string to avoid lossy translation.
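As an illustration of what such typed overloads might do before handing a string to the underlying option, here is a minimal sketch. GoValueFormatting and both method names are hypothetical, and the formatting rules are deliberately simple: whole seconds or milliseconds for durations, and the largest of g, m, k that divides evenly for byte counts.

```csharp
using System;

public static class GoValueFormatting
{
    // Formats a TimeSpan as a Go duration string, e.g. "10s" or "1500ms".
    public static string ToGoDuration(TimeSpan value)
    {
        if (value < TimeSpan.Zero)
            throw new ArgumentOutOfRangeException(nameof(value));

        var ms = (long)value.TotalMilliseconds;
        return ms % 1000 == 0 ? $"{ms / 1000}s" : $"{ms}ms";
    }

    // Formats a byte count with a Docker-style unit suffix, e.g. "512m" or "2g".
    public static string ToDockerBytes(long bytes)
    {
        if (bytes < 0)
            throw new ArgumentOutOfRangeException(nameof(bytes));

        const long K = 1024, M = K * 1024, G = M * 1024;
        if (bytes != 0 && bytes % G == 0) return $"{bytes / G}g";
        if (bytes != 0 && bytes % M == 0) return $"{bytes / M}m";
        if (bytes != 0 && bytes % K == 0) return $"{bytes / K}k";
        return $"{bytes}b";
    }
}
```

Keeping the stored option as a string means values the helpers do not produce, such as "5m30s", can still be passed through unchanged.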
list and stringArray both map to List<string>. Cobra has two slice types that behave identically from the parser's perspective -- both produce repeatable flags like -e FOO=1 -e BAR=2.
stringToString maps to Dictionary<string,string>. This is cobra's map[string]string. On the command line: --label key=value --label key2=value2.
Unknown types default to string. If Docker introduces a new cobra type, the parser does not crash. It maps to string, logs a warning, and I add the type on the next scrape cycle.
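The strongly typed overloads for duration and bytes could format their values along these lines. This is a minimal sketch, not the generated code: `GoFormat`, `Duration`, and `Bytes` are hypothetical names, and it assumes non-negative values, millisecond precision for durations, and the `b`/`k`/`m`/`g` suffixes Docker documents for memory flags.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper (not from the real builders): CLR values -> Go-style strings.
public static class GoFormat
{
    // TimeSpan -> Go duration syntax such as "5m30s". Sub-millisecond precision is dropped.
    public static string Duration(TimeSpan value)
    {
        if (value < TimeSpan.Zero)
            throw new ArgumentOutOfRangeException(nameof(value), "Negative durations are not handled in this sketch.");

        var sb = new StringBuilder();
        Append(sb, value.Ticks / TimeSpan.TicksPerHour, "h"); // total hours, so 25h stays "25h"
        Append(sb, value.Minutes, "m");
        Append(sb, value.Seconds, "s");
        Append(sb, value.Milliseconds, "ms");
        return sb.Length == 0 ? "0s" : sb.ToString();
    }

    // Byte count -> largest suffix that divides evenly, e.g. 536870912 -> "512m".
    public static string Bytes(long value)
    {
        if (value < 0)
            throw new ArgumentOutOfRangeException(nameof(value));

        const long K = 1024, M = K * 1024, G = M * 1024;
        if (value >= G && value % G == 0) return Number(value / G) + "g";
        if (value >= M && value % M == 0) return Number(value / M) + "m";
        if (value >= K && value % K == 0) return Number(value / K) + "k";
        return Number(value) + "b";
    }

    private static void Append(StringBuilder sb, long amount, string unit)
    {
        if (amount > 0) sb.Append(Number(amount)).Append(unit);
    }

    private static string Number(long n) => n.ToString(CultureInfo.InvariantCulture);
}
```

With this shape, `GoFormat.Duration(TimeSpan.FromSeconds(330))` yields a string Go can parse back, while the underlying option stays a plain string as described above.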
Real Scraped Output: Docker vs Docker Compose
Docker Compose uses cobra too, but its help output has different characteristics. Here is docker compose up --help from Compose v2.24.0:
Usage: docker compose up [OPTIONS] [SERVICE...]
Create and start containers
Options:
--abort-on-container-exit Stops all containers if any container
was stopped. Incompatible with -d
--attach stringArray Restrict attaching to the specified
services. Incompatible with
--attach-dependencies.
--build Build images before starting
containers
-d, --detach Detached mode: Run containers in the
background
--dry-run Execute command in dry run mode
--force-recreate Recreate containers even if their
configuration and image haven't changed
--no-deps Don't start linked services
--pull string Pull image before running
("always"|"missing"|"never"|"build")
(default "policy")
--remove-orphans Remove containers for services not
defined in the Compose file
--scale scale Scale SERVICE to NUM instances.
Overrides the `scale` setting in the
Compose file if present.
-t, --timeout int Use this timeout in seconds for
container shutdown (default 0)
--wait Wait for services to be
running|healthy. Implies detached mode.
-w, --watch Watch source code and rebuild/refresh
containers when files are updated.
Same cobra format. Same section headers. Same flag line structure. CobraHelpParser handles it identically. But there are differences worth noting:
No aliases section: Compose commands do not have aliases the way Docker commands do (docker container run / docker run). The parser handles the absence of the Aliases: section by transitioning directly from Usage to Description.
stringArray instead of list: Compose prefers stringArray for its repeatable flags (--attach, --no-attach). Docker prefers list. Both map to List<string> in C#.
scale as a type: The --scale flag has type scale, which is a custom cobra type. The parser maps it to string via the fallback rule. The actual format is service=num pairs -- the generated builder provides a Dictionary<string,int> overload that serializes correctly.
Global flags: Compose inherits global flags from the docker compose parent command (--file, --project-name, --project-directory, --profile, --env-file, etc.). Those appear in the Global Flags: section when you run docker compose up --help. The parser captures them and merges them into the option list.
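To make the scale case concrete, here is a rough sketch of how a Dictionary<string,int> overload might serialize into service=num pairs. `ScaleArgs.Build` is a hypothetical name, and emitting one `--scale` flag per service (sorted by name for stable output) is my assumption about a reasonable shape, not a description of the generated builder.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: turns a service -> instance-count map into CLI arguments.
public static class ScaleArgs
{
    public static List<string> Build(IReadOnlyDictionary<string, int> scale)
    {
        var args = new List<string>();

        // Sort by service name so the generated command line is deterministic.
        foreach (var (service, count) in scale.OrderBy(p => p.Key, StringComparer.Ordinal))
        {
            if (count < 0)
                throw new ArgumentOutOfRangeException(nameof(scale), $"Negative instance count for '{service}'.");

            args.Add("--scale");
            args.Add($"{service}={count}");
        }

        return args;
    }
}
```

The parsed option itself stays a string; only the builder overload knows about the pair format.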
The point: one parser, zero changes, two different CLI tools. The cobra format is the format. Docker and Compose differ in content, not in structure.
The Recursive Scrape Flowchart
CobraHelpParser parses one command at a time; the scraper orchestrates the recursion.
Docker has three levels of nesting: docker -> docker container -> docker container run. Compose has two: docker compose -> docker compose up. The scraper does not hardcode depth -- it just follows subcommands until there are none.
For Docker 24.0.0, this produces 180+ CommandNode objects in a single tree. The scraper serializes the tree to JSON, one file per version: docker-24.0.0.json. That JSON file is what the source generator reads at build time.
The Other Parsers: Why One Parser Cannot Rule Them All
CobraHelpParser handles cobra-based CLIs. But BinaryWrapper supports other CLIs too -- Packer, Vagrant, PodmanCompose (Python), and others. Each has its own help format, and each requires its own parser.
Here is a brief comparison:
| Parser | Framework | Used By | Section Header | Flag Format |
|---|---|---|---|---|
| CobraHelpParser | Go/cobra | Docker, Compose, Podman, glab | Available Commands: | -s, --name type |
| StandardHelpParser | GNU-style | Generic CLIs | Commands: | --name=VALUE |
| ArgparseHelpParser | Python argparse | PodmanCompose | optional arguments: | -s, --name VALUE |
| PackerHelpParser | HashiCorp custom | Packer | Flat subcommand list | -name=value |
| VagrantHelpParser | Ruby custom | Vagrant | Indented tree | --name VALUE |
| GlabHelpParser | Custom cobra template | GitLab CLI | Modified cobra sections | Same as cobra |
Each format is different enough that a single parser would devolve into a mess of heuristics. Cobra uses -s, --name type. GNU-style uses --name=VALUE. Python argparse uses -s, --name VALUE with optional arguments: as the section header. Packer uses single-dash -name=value with Available commands are:. Vagrant uses an indented tree under Common commands:.
The decision tree for auto-detection works like this: each parser's CanParse method checks for its framework's signature patterns. CobraHelpParser looks for "Available Commands:" combined with flag lines that match the -s, --name type pattern. ArgparseHelpParser looks for "optional arguments:". PackerHelpParser looks for "Available commands are:" (note the lowercase c and the trailing "are"). VagrantHelpParser looks for "Common commands:".
The auto-detection is a best-effort heuristic. For known binaries -- Docker, Compose, Podman, Packer, Vagrant -- the scraper configuration explicitly specifies which parser to use. Auto-detection is the fallback for unknown binaries.
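The selection rule (explicit configuration first, then the first registered parser whose CanParse returns true) can be sketched as below. Everything here is illustrative: `IParserCandidate` is a trimmed-down stand-in for IHelpParser, and `MarkerParser` and `ParserSelector` are hypothetical types that reduce each parser's detection to a single marker string.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for IHelpParser with only the members that matter for selection.
public interface IParserCandidate
{
    string Name { get; }
    bool CanParse(string helpText);
}

// Hypothetical parser that recognizes its format by one case-sensitive marker.
public sealed class MarkerParser : IParserCandidate
{
    private readonly string _marker;

    public MarkerParser(string name, string marker)
    {
        Name = name;
        _marker = marker;
    }

    public string Name { get; }

    public bool CanParse(string helpText) =>
        helpText.Contains(_marker, StringComparison.Ordinal);
}

public static class ParserSelector
{
    // Returns null when no parser is configured by that name or none claims the text.
    public static IParserCandidate? Select(
        IReadOnlyList<IParserCandidate> registered,
        string helpText,
        string? configuredName = null)
    {
        // Known binaries: the configuration names the parser, no detection needed.
        if (configuredName is not null)
            return registered.FirstOrDefault(p => p.Name == configuredName);

        // Unknown binaries: registration order breaks ties.
        return registered.FirstOrDefault(p => p.CanParse(helpText));
    }
}
```

Because the match is ordinal, "Available commands are:" does not trip a detector looking for "Available Commands:", which is the distinction the Packer check relies on.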
The Reparse Workflow
Here is a scenario I hit regularly: I discover that CobraHelpParser mishandles a rare flag format. Maybe a new Docker version introduces a flag with a type I have not seen, or a description wraps in an unexpected way. I fix the parser. Now I need to verify the fix against all 97 scraped versions without re-running the containers.
This is why the scraper saves raw help text alongside the parsed JSON.
The scraper produces two outputs per version:
scrape/
  docker-24.0.0.json          # Parsed CommandNode tree
  docker-24.0.0.help/         # Raw help text, one file per command
    docker.txt
    docker-container.txt
    docker-container-run.txt
    docker-container-ls.txt
    docker-image.txt
    docker-image-build.txt
    ...
The .help/ directory contains the raw stdout of every --help invocation. One file per command path, named with hyphens replacing spaces. For Docker 24.0.0, that is 180+ text files.
The --reparse flag tells the scraper to skip container operations and re-run the parser against cached help text:
dotnet run --project src/Scraper -- \
  --binary docker \
  --reparse \
  --output scrape/
This reads every .txt file in every .help/ directory, re-parses it with the updated CobraHelpParser, and overwrites the .json files. It takes about 2 seconds for all 97 Docker versions. Compare that to the 30+ minutes it takes to rebuild containers and re-scrape.
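The core of a reparse pass is just a walk over the cached files. The sketch below shows that walk under stated assumptions: `ReparseSketch.ReparseAll` is a hypothetical name, the `parse` callback stands in for whatever the scraper does with each file, and mapping a file stem such as docker-container-run back to a command path is left out because hyphens in command names make that step ambiguous.

```csharp
using System;
using System.IO;

public static class ReparseSketch
{
    // Visits every "<binary>-<version>.help" directory under scrapeRoot and hands
    // each cached help file to `parse` as (version label, file stem, raw help text).
    // Returns the number of files visited. No containers or processes are started.
    public static int ReparseAll(string scrapeRoot, Action<string, string, string> parse)
    {
        var count = 0;

        foreach (var helpDir in Directory.EnumerateDirectories(scrapeRoot, "*.help"))
        {
            // "scrape/docker-24.0.0.help" -> "docker-24.0.0"
            var version = Path.GetFileNameWithoutExtension(helpDir);

            foreach (var file in Directory.EnumerateFiles(helpDir, "*.txt"))
            {
                var stem = Path.GetFileNameWithoutExtension(file);
                parse(version, stem, File.ReadAllText(file));
                count++;
            }
        }

        return count;
    }
}
```

Since the loop only reads text files, its cost is dominated by disk reads and parsing rather than by container startup.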
The reparse workflow serves two purposes:
- Development loop: Fix a parser bug, reparse, diff the JSON output, verify the fix.
- Regression detection: After any parser change, reparse ALL versions and diff. If any previously correct JSON changes in a way I did not expect, I have a regression.
I diff the JSON output with a simple script:
# Before: save current JSON as baseline
mkdir -p baseline
cp -r scrape/*.json baseline/
# After parser change: reparse
dotnet run --project src/Scraper -- --binary docker --reparse
# Diff (-x skips the raw *.help directories, which exist only under scrape/)
diff -r -x "*.help" baseline/ scrape/ | head -100
If the diff shows only the fixes I intended, I commit. If it shows unexpected changes, I investigate. This has caught at least a dozen regressions over the life of the project.
CanParse: Auto-Detecting Cobra Output
The CanParse method is CobraHelpParser's gate. Given a blob of help text from an unknown binary, it returns true if the text looks like cobra output:
public bool CanParse(string helpText)
{
    if (string.IsNullOrWhiteSpace(helpText))
        return false;
    var lines = helpText.Split('\n');
    var hasUsage = false;
    var hasCobraFlags = false;
    var hasCommands = false;
    foreach (var line in lines)
    {
        var trimmed = line.TrimStart();
        if (trimmed.StartsWith("Usage:"))
            hasUsage = true;
        if (trimmed is "Available Commands:" or "Commands:")
            hasCommands = true;
        if (trimmed is "Options:" or "Flags:" or "Global Flags:")
            hasCobraFlags = true;
        // Look for cobra-style flag format: -X, --name
        if (FlagLineRegex.IsMatch(line))
            hasCobraFlags = true;
    }
    // Cobra output always has Usage: and either commands or flags
    return hasUsage && (hasCobraFlags || hasCommands);
}
The heuristic is deliberately conservative: cobra output always has Usage: plus either flags or commands. I would rather return false and let another parser try than return true and produce garbage. False negatives mean specifying the parser explicitly. False positives mean silently corrupted command trees.
Malformed Help Handling
Not all help text is clean. Over 97 Docker versions and 57 Compose versions, I have seen experimental commands with truncated output, plugin commands with custom cobra templates, old Docker 17.x formatting, and commands that error instead of printing help. The parser's philosophy: extract what you can, skip what you cannot, never crash.
public CommandNode ParseHelp(string helpText, string commandPath)
{
    try
    {
        return ParseHelpCore(helpText, commandPath);
    }
    catch (Exception ex)
    {
        _logger.LogWarning(
            "Failed to parse help for {Command}: {Error}. " +
            "Returning empty CommandNode.",
            commandPath, ex.Message);
        return new CommandNode(
            Name: ExtractCommandName(commandPath),
            Description: null,
            SubCommands: Array.Empty<CommandNode>(),
            Options: Array.Empty<CommandOption>());
    }
}
Inside ParseHelpCore, unrecognized lines in the Options state are silently skipped -- no crash, no exception. Empty help text returns an empty CommandNode. Missing Usage lines fall back to the commandPath parameter. Truncated output returns whatever was parsed so far. Partial data is better than no data.
Post-parse validation catches parser bugs: duplicate long names (a strong signal that a continuation line was misidentified as a new flag), missing descriptions, and suspiciously empty nodes all generate warnings in the scrape log.
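The three checks can be sketched as a small pure function. This is an illustration only: `ParsedOption` is a hypothetical, cut-down option shape (the real CommandOption carries more fields), and the warning wording is made up.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal option shape used only for this sketch.
public sealed record ParsedOption(string LongName, string? Description);

public static class PostParseValidator
{
    // Returns human-readable warnings; an empty list means nothing looked suspicious.
    public static List<string> Validate(
        string commandPath,
        IReadOnlyList<ParsedOption> options,
        int subCommandCount)
    {
        var warnings = new List<string>();

        // A duplicate long name usually means a continuation line was read as a new flag.
        foreach (var group in options.GroupBy(o => o.LongName).Where(g => g.Count() > 1))
            warnings.Add($"{commandPath}: duplicate long name {group.Key}");

        foreach (var option in options.Where(o => string.IsNullOrWhiteSpace(o.Description)))
            warnings.Add($"{commandPath}: {option.LongName} has no description");

        // No flags and no subcommands at all is suspicious for a command that printed help.
        if (options.Count == 0 && subCommandCount == 0)
            warnings.Add($"{commandPath}: node has neither options nor subcommands");

        return warnings;
    }
}
```

Returning warnings instead of throwing keeps this consistent with the extract-what-you-can philosophy: the scrape continues and the log carries the evidence.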
Edge Cases from 97 Docker Versions
Scraping 97 versions of Docker (and 57 of Compose) is the best stress test a parser can get. Here are the edge cases that forced parser changes:
Edge Case 1: Flags with No Description
Some Docker commands have flags with no description at all:
--oom-score-adj int
Just the name and type, no gap, no description. The regex handles this because the description group is optional ((?:\s{2,}(.+))?$). But the continuation line detector must not treat the next real flag line as a continuation of this one.
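Here is a small sketch of how an optional description group behaves. The pattern is a simplified stand-in I wrote for illustration, not the parser's actual FlagLineRegex; only the trailing `(?:\s{2,}(.+))?$` fragment comes from the text above, and `FlagLine` and `FlagLineSketch` are hypothetical names.

```csharp
#nullable enable
using System.Text.RegularExpressions;

// Hypothetical result shape for one parsed flag line.
public sealed record FlagLine(string? ShortName, string LongName, string? GoType, string? Description);

public static class FlagLineSketch
{
    // optional "-s, " | "--long-name" | optional " type" | optional "  description"
    private static readonly Regex Pattern = new(
        @"^\s*(?:(-\w),\s+)?(--[\w][\w.-]*)(?:\s(\S+))?(?:\s{2,}(.+))?$",
        RegexOptions.Compiled);

    // Returns null for lines that are not flag lines (for example wrapped description text).
    public static FlagLine? TryParse(string line)
    {
        var m = Pattern.Match(line);
        if (!m.Success)
            return null;

        return new FlagLine(
            ShortName: m.Groups[1].Success ? m.Groups[1].Value : null,
            LongName: m.Groups[2].Value,
            GoType: m.Groups[3].Success ? m.Groups[3].Value : null,
            Description: m.Groups[4].Success ? m.Groups[4].Value.Trim() : null);
    }
}
```

One limitation to keep in mind: a wrapped description line that happens to begin with `--`, like the `--attach-dependencies.` continuation in the Compose output earlier, would match this pattern. Telling those apart needs an indentation check, which this sketch leaves out.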
Edge Case 2: Plugin Commands
Docker plugins like buildx and scout are cobra-based but register as plugins. Their help output is slightly different:
Usage: docker buildx build [OPTIONS] PATH | URL | -
Start a build
Build Flags:
--add-host strings Additional custom host-to-IP
mapping (format: "host:ip")
--allow strings Allow extra privileged
entitlement (e.g.,
"network.host",
"security.insecure")
Note "Build Flags:" instead of "Options:". I added this to the section header detection:
_ when trimmedLine.EndsWith("Flags:") => ParserState.Options,
Any line ending with "Flags:" transitions to the Options state. This is slightly more permissive than the strict allowlist approach, but it is safe because this pattern only appears as a section header in cobra output.
Edge Case 3: Deprecated Flags
Some flags have DEPRECATED in their description:
--link list Add link to another container
(DEPRECATED)
The parser does not strip this marker. It flows into the Description field and eventually into the generated C# code as an [Obsolete] attribute. The code generator detects (DEPRECATED) in the description and adds the attribute.
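On the generator side, the detection step might look roughly like this. It is a sketch under assumptions: `ObsoleteSketch.AttributeFor` is a hypothetical name, and reusing the rest of the description as the attribute message is my guess at a sensible default, not the generator's documented behavior.

```csharp
#nullable enable
using System;

public static class ObsoleteSketch
{
    private const string Marker = "(DEPRECATED)";

    // Returns the attribute source text to emit above a generated member,
    // or null when the description carries no deprecation marker.
    public static string? AttributeFor(string? description)
    {
        if (description is null || !description.Contains(Marker, StringComparison.Ordinal))
            return null;

        // Drop the marker and reuse the remaining text as the message.
        var message = description.Replace(Marker, string.Empty).Trim();

        // Escape backslashes first, then quotes, so the result is a valid C# string literal.
        message = message.Replace("\\", "\\\\").Replace("\"", "\\\"");

        return $"[Obsolete(\"{message}\")]";
    }
}
```

Keeping the marker in the parsed Description, rather than stripping it in the parser, is what makes this a generator-only concern.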
Edge Case 4: Hidden Commands
Some Docker versions have hidden commands that do not appear in Available Commands: but respond to --help if you know the name. The parser cannot discover these -- they are invisible in the help text. I handle this with a supplementary list of known hidden commands per binary, maintained manually.
Edge Case 5: Flag Names with Dots
The Docker daemon has flags like --log-opt max-size=10m where the option name in some help outputs contains a dot: --storage-opt dm.basesize. The regex [\w][\w-]* matches word characters and hyphens. Dots are not word characters. I extended the pattern:
@"(--[\w][\w.-]*)" // long name: --name, --storage-opt
The dot only appeared in a few daemon-level flags, but without this fix, those flags silently disappeared from the parsed output.
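The difference between the two character classes is easy to see in isolation. In this sketch, `LongNamePatterns` is a hypothetical holder for the two fragments quoted above, and `--dm.basesize` is an invented flag name used purely for illustration.

```csharp
using System.Text.RegularExpressions;

public static class LongNamePatterns
{
    // Original fragment: word characters and hyphens only.
    public static readonly Regex WithoutDot = new(@"(--[\w][\w-]*)");

    // Extended fragment: dots are allowed after the first character.
    public static readonly Regex WithDot = new(@"(--[\w][\w.-]*)");

    // Returns the captured long name, or an empty string when nothing matches.
    public static string Capture(Regex pattern, string line) =>
        pattern.Match(line).Groups[1].Value;
}
```

On its own, the original fragment stops at the dot and captures a truncated name. Inside a full flag-line pattern anchored at both ends, the leftover `.basesize` has nothing to match against, so presumably the whole line fails to match, which would explain why the flag vanished instead of showing up with a wrong name.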
Putting It All Together
Here is the complete flow from raw help text to serialized JSON, showing how CobraHelpParser fits into the larger scraping pipeline:
// In the scraper's recursive walk
async Task<CommandNode> ScrapeCommand(
    string binaryPath,
    string commandPath,
    IHelpParser parser,
    CancellationToken ct)
{
    // 1. Execute --help
    var helpText = await _processRunner.RunAsync(
        binaryPath,
        $"{commandPath} --help",
        ct);
    // 2. Parse the help text
    var node = parser.ParseHelp(helpText, commandPath);
    // 3. Save raw help text for reparse
    var helpFileName = commandPath.Replace(' ', '-') + ".txt";
    await File.WriteAllTextAsync(
        Path.Combine(_helpDir, helpFileName),
        helpText, ct);
    // 4. Recurse into subcommands
    var children = new List<CommandNode>();
    foreach (var sub in node.SubCommands)
    {
        var childPath = $"{commandPath} {sub.Name}";
        var childNode = await ScrapeCommand(
            binaryPath, childPath, parser, ct);
        children.Add(childNode);
    }
    // 5. Return the node with fully populated children
    return node with
    {
        SubCommands = children,
    };
}
The parser is stateless. It takes text in, returns a CommandNode out. The scraper owns the recursion, the file I/O, and the tree assembly. This separation means I can unit test the parser with raw text fixtures and integration test the scraper with a mock IProcessRunner.
Performance
Parser performance does not matter much -- we parse at design time, not runtime. But for reference: a single command parse takes ~0.1ms, a full Docker version (180+ commands) takes ~15ms, and reparsing all 97 versions takes ~1.5 seconds. The regex is compiled (RegexOptions.Compiled), but even without that, string splitting on a few hundred lines of text is not where bottlenecks live.
Testing Strategy
CobraHelpParser has 80+ unit tests in three categories: fixture tests (real help text from specific Docker/Compose/Podman versions saved as embedded resources), edge case tests (synthetic help text exercising multi-line descriptions, missing fields, unusual types), and cross-binary consistency tests (verifying the same parser produces equivalent structures for Docker, Compose, and Podman).
[Theory]
[InlineData("docker-24.0.0-container-run")]
[InlineData("docker-20.10.0-container-run")]
[InlineData("compose-2.24.0-up")]
[InlineData("podman-4.9.0-run")]
public void ParsesRealHelpText(string fixtureName)
{
    var helpText = LoadFixture(fixtureName);
    var node = _parser.ParseHelp(helpText, fixtureName);
    node.Name.Should().NotBeNullOrEmpty();
    node.Options.Should().NotBeEmpty();
    node.Options.Should().AllSatisfy(o =>
        o.LongName.Should().StartWith("--"));
}
[Fact]
public void DockerAndPodmanProduceSameStructure()
{
    var dockerNode = _parser.ParseHelp(
        LoadFixture("docker-24.0.0-container-run"), "docker container run");
    var podmanNode = _parser.ParseHelp(
        LoadFixture("podman-4.9.0-run"), "podman run");
    var commonFlags = new[] { "--detach", "--name", "--env", "--volume" };
    foreach (var flag in commonFlags)
    {
        dockerNode.Options.Should().Contain(o => o.LongName == flag);
        podmanNode.Options.Should().Contain(o => o.LongName == flag);
    }
}
The fixture tests are regression tests -- if a parser change breaks an existing fixture, the test fails. The cross-binary tests verify the abstraction holds: same interface, same behavior, different binaries.
What The Parser Does Not Do
It is worth being explicit about the boundaries:
The parser does not discover hidden flags. Cobra has a concept of hidden flags that do not appear in --help output. If Docker hides an experimental flag, the parser cannot see it. This is fine -- if it is hidden, I probably should not generate a typed API for it.
The parser does not validate flag semantics. It does not know that --memory expects a value like "512m" or that --restart only accepts "no", "always", "unless-stopped", "on-failure". Semantic validation is the code generator's job, informed by hardcoded allowlists and (default ...) patterns in descriptions.
The parser does not resolve flag conflicts. Some Docker flags are mutually exclusive (--network host and --publish). Cobra does not express this in help text, so the parser cannot extract it. Conflict detection is manual, maintained in a supplementary configuration file.
The parser does not handle non-cobra formats. That is what the other five parsers are for.
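As an example of what sits on the generator's side of that boundary, here is a sketch of pulling a (default ...) value out of a parsed description. `DefaultValueSketch` and its regex are hypothetical; the sample descriptions in the usage note are taken from the Compose help output shown earlier.

```csharp
#nullable enable
using System.Text.RegularExpressions;

public static class DefaultValueSketch
{
    // Matches (default 0) as well as (default "policy"); the backreference \1
    // requires the closing quote only when an opening quote was present.
    private static readonly Regex DefaultPattern = new(
        @"\(default\s+(""?)(.+?)\1\)",
        RegexOptions.Compiled);

    // Returns the default value without surrounding quotes, or null when
    // the description does not mention a default.
    public static string? Extract(string description)
    {
        var m = DefaultPattern.Match(description);
        return m.Success ? m.Groups[2].Value : null;
    }
}
```

For the --timeout description this returns "0", and for --pull it returns "policy". The parser hands over the description untouched, so a step like this can change without touching CobraHelpParser.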
Closing
One parser. Five binaries. Thousands of commands. CobraHelpParser is the most exercised parser in the BinaryWrapper suite -- and the one that demonstrates why scraping --help is a viable strategy for generating typed APIs.
The cobra framework gave Go CLI tools a consistent, machine-readable help format. CobraHelpParser exploits that consistency with an 8-state state machine, a single regex for flag lines, and a type mapping table that converts Go types to CLR types. The reparse workflow turns 97 cached versions into a regression test suite that validates every parser change against every Docker release from 17.x to 27.x.
The parsed CommandNode tree is the input to the next stage: the Roslyn source generator that turns it into C# code. That is Part VI: Build Time -- The Source Generator for CLI Commands.
Previous: Part IV: Design Time -- Scraping 57 Docker Compose Versions | Next: Part VI: Build Time -- The Source Generator for CLI Commands