serard@dev00:~/cv

Part IV: The Source Generator -- Parsing Ruby, Merging Versions

This is the heart of the project: a Roslyn Source Generator that reads ~80 Ruby template files, parses each one into a structured model, merges all versions into a single superset with version metadata, and emits ~400 C# classes. The input is commented-out Ruby. The output is strongly-typed C# with IntelliSense.

The Generator Entry Point

The generator follows the IIncrementalGenerator pattern. It discovers gitlab-*.rb files registered as AdditionalFiles, collects them all, and orchestrates the pipeline:

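using System;
using System.Collections.Immutable;
using Microsoft.CodeAnalysis;

[Generator]
public sealed class GitLabOmnibusGenerator : IIncrementalGenerator
{
    public void Initialize(IncrementalGeneratorInitializationContext context)
    {
        // Only the versioned templates (gitlab-*.rb) are inputs
        var rbFiles = context.AdditionalTextsProvider
            .Where(static f =>
            {
                var name = System.IO.Path.GetFileName(f.Path);
                return name.StartsWith("gitlab-", StringComparison.Ordinal)
                    && name.EndsWith(".rb", StringComparison.Ordinal);
            });

        // Collect() hands every version to one callback so they can be merged
        context.RegisterSourceOutput(rbFiles.Collect(), static (ctx, files) =>
        {
            // Nothing to generate without templates
            if (files.IsDefaultOrEmpty)
                return;

            var ns = "FrenchExDev.Net.GitLab.DockerCompose";
            Generate(ctx, ns, files);
        });
    }
}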

The Generate method runs a four-step pipeline:

  1. Parse each .rb file into a GitLabRbModel
  2. Merge all models into a UnifiedGitLabRbModel
  3. Emit version metadata, config classes, builder classes
  4. Emit rendering metadata for the runtime renderer

private static void Generate(SourceProductionContext ctx, string ns,
    ImmutableArray<AdditionalText> files)
{
    // 1. Parse each versioned .rb file
    var models = new List<GitLabRbModel>();
    foreach (var file in files)
    {
        ctx.CancellationToken.ThrowIfCancellationRequested();

        // GetText is nullable: skip a file whose content cannot be read
        var text = file.GetText(ctx.CancellationToken);
        if (text is null)
            continue;

        var version = GitLabRbParser.ExtractVersion(
            System.IO.Path.GetFileName(file.Path));
        models.Add(GitLabRbParser.Parse(text.ToString(), version));
    }

    if (models.Count == 0)
        return;

    // 2. Merge across versions
    var unified = GitLabRbVersionMerger.Merge(models);

    // 3a. Emit version metadata + attributes
    ctx.AddSource("GitLabOmnibusVersions.g.cs", ...);

    // 3b. Emit per-prefix config classes + builders
    foreach (var prefixGroup in unified.PrefixGroups)
    {
        var className = NamingHelper.PrefixToClassName(prefixGroup.Prefix);
        EmitClassAndBuilder(ctx, ns, className, prefixGroup.Root, ...);
    }

    // 3c. Emit root GitLabOmnibusConfig + builder
    // 4. Emit rendering metadata
}
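GitLabRbParser.ExtractVersion receives only the file name, and its body is not shown. A minimal sketch of what it has to do, assuming the templates are named gitlab-<version>.rb (for example gitlab-18.10.1.rb; that naming, and the VersionFromFileName class, are assumptions for illustration):

```csharp
#nullable enable
using System.Text.RegularExpressions;

public static class VersionFromFileName
{
    // Hypothetical stand-in for GitLabRbParser.ExtractVersion.
    // Assumes names like "gitlab-18.10.1.rb": dot-separated digits.
    private static readonly Regex FileNameRegex =
        new Regex(@"^gitlab-(\d+(?:\.\d+)*)\.rb$", RegexOptions.Compiled);

    // Returns the version part, or null when the name does not fit the pattern
    public static string? ExtractVersion(string fileName)
    {
        var match = FileNameRegex.Match(fileName);
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

Returning null for a name that does not fit leaves the caller free to skip the file or report a diagnostic; how the real method handles that case is not shown.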

Parsing Ruby Configuration

The GitLabRbParser is a regex-based parser purpose-built for the gitlab.rb.template format. It does not parse arbitrary Ruby -- it handles the specific patterns used in GitLab's configuration templates.

The Challenge

The input is not a schema. It is commented-out Ruby code with inline documentation:

################################################################################
## GitLab NGINX
##! Docs: https://docs.gitlab.com/omnibus/settings/nginx.html
################################################################################

##! Most root users won't need this setting.
# nginx['listen_port'] = nil

# nginx['proxy_set_headers'] = {
#   "Host" => "$http_host_with_default",
#   "X-Real-IP" => "$remote_addr",
# }

# external_url 'http://gitlab.example.com'

The parser must handle:

  • Settings: # prefix['key'] = value (the bulk of the file)
  • Nested keys: # prefix['key1']['key2'] = value
  • Standalone URLs: # external_url 'https://...' (no prefix, no brackets)
  • Multi-line hashes: # prefix['key'] = { ... } spanning multiple lines
  • YAML blocks: # prefix['key'] = YAML.load <<-'EOS' ... EOS
  • Doc comments: ##! description text (attached to the next setting)
  • Section headers: ### Section Name ### (ignored -- grouping is by prefix)

Core Regex Patterns

Four regex patterns handle 99% of the input:

// # prefix['key'] = value
// # prefix['k1']['k2'] = value
private static readonly Regex SettingRegex = new Regex(
    @"^#\s+([a-z_]+)(\['.+?'\](?:\['.+?'\])*)\s*=\s*(.+)$",
    RegexOptions.Compiled);

// Extract bracket keys: ['key1']['key2'] → ["key1", "key2"]
private static readonly Regex BracketKeyRegex = new Regex(
    @"\['([^']+)'\]",
    RegexOptions.Compiled);

// # external_url 'value' or # registry_external_url 'value'
private static readonly Regex StandaloneRegex = new Regex(
    @"^#?\s*(external_url|registry_external_url|pages_external_url|" +
    @"mattermost_external_url|gitlab_kas_external_url|runtime_dir)" +
    @"\s+['""](.+?)['""]",
    RegexOptions.Compiled);

// YAML.load <<-'EOS'
private static readonly Regex YamlBlockStartRegex = new Regex(
    @"^#\s+([a-z_]+)\['([^']+)'\]\s*=\s*YAML\.load\s+<<-'EOS'",
    RegexOptions.Compiled);
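To make the capture groups concrete, here is a self-contained sketch that copies SettingRegex, BracketKeyRegex and StandaloneRegex as written above and runs them on lines in the template's style. RegexDemo, TrySplitSetting and TrySplitStandalone are hypothetical names for illustration, not part of the generator:

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexDemo
{
    // Copied from the parser: # prefix['k1']['k2'] = value
    static readonly Regex SettingRegex = new Regex(
        @"^#\s+([a-z_]+)(\['.+?'\](?:\['.+?'\])*)\s*=\s*(.+)$");

    static readonly Regex BracketKeyRegex = new Regex(@"\['([^']+)'\]");

    static readonly Regex StandaloneRegex = new Regex(
        @"^#?\s*(external_url|registry_external_url|pages_external_url|" +
        @"mattermost_external_url|gitlab_kas_external_url|runtime_dir)" +
        @"\s+['""](.+?)['""]");

    // Hypothetical helper: split a setting line into prefix, key path, raw value
    public static (string Prefix, string[] Keys, string Raw)? TrySplitSetting(string line)
    {
        var m = SettingRegex.Match(line);
        if (!m.Success) return null;

        // Group 2 holds the whole bracket chain, e.g. "['k1']['k2']"
        var keys = BracketKeyRegex.Matches(m.Groups[2].Value)
            .Cast<Match>()
            .Select(k => k.Groups[1].Value)
            .ToArray();
        return (m.Groups[1].Value, keys, m.Groups[3].Value);
    }

    // Hypothetical helper: (ruby key, example value) for standalone URL lines
    public static (string Key, string Example)? TrySplitStandalone(string line)
    {
        var m = StandaloneRegex.Match(line);
        if (!m.Success) return null;
        return (m.Groups[1].Value, m.Groups[2].Value);
    }
}
```

For # nginx['listen_port'] = nil this yields prefix nginx, keys [listen_port] and raw value nil; for # prefix['key1']['key2'] = 'value' the bracket chain becomes [key1, key2], and the raw value keeps its Ruby quotes so type inference can still see them.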

The Parse Loop

The parser processes the file line by line, maintaining a pendingDoc variable for doc comments:

public static GitLabRbModel Parse(string content, string version)
{
    var model = new GitLabRbModel { Version = version };
    var prefixGroups = new Dictionary<string, GitLabRbPrefixGroup>();
    var lines = content.Split('\n');   // TrimEnd() below also drops '\r'
    string? pendingDoc = null;
    int i = 0;

    while (i < lines.Length)
    {
        var line = lines[i].TrimEnd();

        // Banner lines (##! doc comments, ### sections)
        if (line.StartsWith("##"))
        {
            if (line.StartsWith("##!") || line.StartsWith("###!"))
            {
                // Accumulate consecutive doc lines into one description
                var text = line.TrimStart('#', '!').Trim();
                pendingDoc = pendingDoc == null ? text : pendingDoc + " " + text;
            }
            i++; continue;
        }

        // Standalone URLs (external_url, registry_external_url, ...)
        var standaloneMatch = StandaloneRegex.Match(line);
        if (standaloneMatch.Success)
        {
            model.StandaloneUrls.Add(new GitLabRbStandaloneUrl
            {
                RubyKey = standaloneMatch.Groups[1].Value,
                ExampleValue = standaloneMatch.Groups[2].Value,
                DocComment = pendingDoc,
            });
            pendingDoc = null; i++; continue;
        }

        // YAML block: prefix['key'] = YAML.load <<-'EOS' ... EOS
        // → treated as opaque string (branch not shown)

        // Setting: prefix['key'] = value
        var settingMatch = SettingRegex.Match(line);
        if (settingMatch.Success)
        {
            var prefix = settingMatch.Groups[1].Value;
            var keys = ExtractBracketKeys(settingMatch.Groups[2].Value);
            var valueRaw = StripInlineComment(settingMatch.Groups[3].Value);

            // GetOrCreateNode is a hypothetical helper (not shown): it finds or
            // creates the prefix group, then walks/creates the path along `keys`
            var node = GetOrCreateNode(prefixGroups, prefix, keys);
            node.DocComment = pendingDoc;
            pendingDoc = null;

            // Multi-line hash? Collect until braces balance (advances i)
            if (IsHashStart(valueRaw))
            {
                var hashContent = CollectMultiLineHash(lines, ref i, valueRaw);
                ParseHashIntoNode(node, hashContent);
                continue;
            }

            // Simple scalar → infer type from value
            node.LeafType = InferValueType(valueRaw);
            node.ExampleValue = valueRaw.Trim();
        }

        // Always advance: scalars, prose and blank lines are one line each
        i++;
    }

    model.PrefixGroups.AddRange(prefixGroups.Values);
    return model;
}
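The parse loop keeps pendingDoc across every ## line it skips, including section banners. The sketch below isolates that doc-comment bookkeeping and tries one alternative, which is an assumption rather than the parser's behavior: clearing pendingDoc on non-doc banner lines, so that a section's ##! Docs: link is not attached to the section's first setting. DocAttachmentSketch is a hypothetical class:

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class DocAttachmentSketch
{
    static readonly Regex SettingRegex = new Regex(
        @"^#\s+([a-z_]+)(\['.+?'\](?:\['.+?'\])*)\s*=\s*(.+)$");

    // Maps "prefix['key']..." to the doc text attached to it (or null)
    public static Dictionary<string, string?> Attach(string content)
    {
        var docs = new Dictionary<string, string?>();
        string? pendingDoc = null;

        foreach (var rawLine in content.Split('\n'))
        {
            var line = rawLine.TrimEnd();
            if (line.StartsWith("##!", StringComparison.Ordinal) ||
                line.StartsWith("###!", StringComparison.Ordinal))
            {
                // Consecutive ##! lines are joined into one description
                var text = line.TrimStart('#', '!').Trim();
                pendingDoc = pendingDoc == null ? text : pendingDoc + " " + text;
            }
            else if (line.StartsWith("##", StringComparison.Ordinal))
            {
                // Assumption: any other banner line closes the doc block
                pendingDoc = null;
            }
            else
            {
                var m = SettingRegex.Match(line);
                if (m.Success)
                {
                    docs[m.Groups[1].Value + m.Groups[2].Value] = pendingDoc;
                    pendingDoc = null; // a doc comment belongs to one setting only
                }
            }
        }
        return docs;
    }
}
```

Fed the NGINX excerpt from earlier in this part, nginx['listen_port'] receives only "Most root users won't need this setting."; without the reset it would also carry the Docs: URL from the section banner.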

Type Inference

The parser infers the Ruby value type from the example value:

internal static GitLabRbValueType InferValueType(string value)
{
    var v = value.Trim();

    if (v == "true" || v == "false") return GitLabRbValueType.Boolean;
    if (v == "nil")                  return GitLabRbValueType.Nil;
    if (v == "{}")                   return GitLabRbValueType.StringDict;
    if (v == "[]")                   return GitLabRbValueType.StringList;

    // Quoted string
    if ((v.StartsWith("'") && v.EndsWith("'")) ||
        (v.StartsWith("\"") && v.EndsWith("\"")))
        return GitLabRbValueType.String;

    // Array
    if (v.StartsWith("[")) return GitLabRbValueType.StringList;

    // Hash
    if (v.StartsWith("{")) return GitLabRbValueType.StringDict;

    // Number
    if (long.TryParse(v, out var longVal))
        return longVal > int.MaxValue || longVal < int.MinValue
            ? GitLabRbValueType.Long
            : GitLabRbValueType.Integer;

    if (double.TryParse(v, ...)) return GitLabRbValueType.Float;

    return GitLabRbValueType.String; // fallback
}

These Ruby types map to C# types:

| Ruby value | GitLabRbValueType | C# type |
|---|---|---|
| true, false | Boolean | bool? |
| nil | Nil | string? |
| 123 | Integer | int? |
| 9999999999 | Long | long? |
| 0.75 | Float | double? |
| 'text', "text" | String | string? |
| ['a', 'b'] | StringList | List<string>? |
| [0.1, 0.5] | FloatList | List<double>? |
| { 'k' => 'v' } | StringDict | Dictionary<string, string?>? |
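The table lists a FloatList type, while the InferValueType excerpt returns StringList for every array literal, so numeric arrays must be recognized somewhere that is not shown. One possible approach, as a hypothetical sketch (ClassifyArray and ListKind are made-up names) limited to one-line, non-nested arrays:

```csharp
using System.Globalization;
using System.Linq;

public enum ListKind { StringList, FloatList }

public static class ArrayTypeSketch
{
    // FloatList when every element parses as a number, StringList otherwise.
    // Only handles one-line, non-nested literals such as [0.1, 0.5].
    public static ListKind ClassifyArray(string literal)
    {
        var inner = literal.Trim().TrimStart('[').TrimEnd(']');
        var items = inner.Split(',')
            .Select(s => s.Trim())
            .Where(s => s.Length > 0)
            .ToList();

        // [] carries no element type, so it stays a string list
        if (items.Count == 0)
            return ListKind.StringList;

        // InvariantCulture: "0.5" must parse the same under any machine culture
        bool allNumeric = items.All(s => double.TryParse(
            s, NumberStyles.Float, CultureInfo.InvariantCulture, out _));
        return allNumeric ? ListKind.FloatList : ListKind.StringList;
    }
}
```

Integer arrays such as [1, 2] also land in FloatList here, which matches List<double>? in the table. Parsing with CultureInfo.InvariantCulture keeps "0.5" from being rejected on a machine whose culture uses a decimal comma.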

Multi-Line Hash Parsing

Some settings span multiple lines:

# nginx['proxy_set_headers'] = {
#   "Host" => "$http_host_with_default",
#   "X-Real-IP" => "$remote_addr",
#   "X-Forwarded-For" => "$proxy_add_x_forwarded_for",
# }

The parser detects an opening { without a matching } on the same line, then collects subsequent lines until braces balance:

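private static string CollectMultiLineHash(string[] lines, ref int i,
    string firstLineValue)
{
    var content = firstLineValue;
    int depth = CountChar(content, '{') - CountChar(content, '}');
    i++;

    // Keep consuming lines until every '{' has its matching '}'
    while (i < lines.Length && depth > 0)
    {
        var raw = lines[i].TrimEnd();
        if (raw.StartsWith("#"))
            raw = raw.Substring(1).TrimStart();   // drop the comment marker

        content += "\n" + raw;
        depth += CountChar(raw, '{') - CountChar(raw, '}');
        i++;
    }

    // On return, i points at the first line after the closing brace
    return content;
}

// Counts occurrences of one character (braces inside strings count too)
private static int CountChar(string s, char c)
{
    int count = 0;
    foreach (var ch in s)
        if (ch == c) count++;
    return count;
}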

The collected hash content is then parsed into child nodes, supporting both Ruby string-key hashes ('key' => value) and symbol-key hashes (key: value):

private static void ParseHashIntoNode(GitLabRbObjectNode node, string hashContent)
{
    var trimmed = hashContent.Trim();

    // Symbol keys use "key: value"; string-key hashes always contain "=>"
    var isSymbolKey = trimmed.Contains(":") && !trimmed.Contains("=>");

    if (isSymbolKey)
        ParseSymbolKeyHash(node, trimmed);   // { listen_addr: 'value' }
    else
        ParseStringKeyHash(node, trimmed);   // { 'key' => 'value' }
}
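ParseStringKeyHash and ParseSymbolKeyHash are not shown. A hypothetical sketch of the pair extraction both would need, limited to flat hashes with one key/value pair per line (nested hashes would need recursion) and reusing the same symbol-key heuristic:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HashPairSketch
{
    // "key" => value  or  'key' => value   (string keys)
    private static readonly Regex StringKeyPair = new Regex(
        @"^\s*['""]([^'""]+)['""]\s*=>\s*(.+?),?\s*$");

    // key: value   (symbol keys)
    private static readonly Regex SymbolKeyPair = new Regex(
        @"^\s*([a-z_][a-z0-9_]*):\s+(.+?),?\s*$");

    // Flat hashes only, one pair per line. Values keep their Ruby quoting;
    // the "{" and "}" lines simply do not match either pattern.
    public static Dictionary<string, string> ParsePairs(string hashContent)
    {
        var trimmed = hashContent.Trim();
        // Same heuristic as ParseHashIntoNode
        bool isSymbolKey = trimmed.Contains(":") && !trimmed.Contains("=>");
        var pairRegex = isSymbolKey ? SymbolKeyPair : StringKeyPair;

        var result = new Dictionary<string, string>();
        foreach (var line in trimmed.Split('\n'))
        {
            var m = pairRegex.Match(line);
            if (m.Success)
                result[m.Groups[1].Value] = m.Groups[2].Value;
        }
        return result;
    }
}
```

Because the quotes are kept ("$remote_addr" including its double quotes), each value could be passed through InferValueType exactly like a scalar setting.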

The Parsed Model

After parsing, each version produces a GitLabRbModel:

using System.Collections.Generic;

internal sealed class GitLabRbModel
{
    public string Version { get; set; } = "";
    public List<GitLabRbPrefixGroup> PrefixGroups { get; set; } = new();
    public List<GitLabRbStandaloneUrl> StandaloneUrls { get; set; } = new();
}

internal sealed class GitLabRbPrefixGroup
{
    public string Prefix { get; set; } = "";                  // "nginx"
    public GitLabRbObjectNode Root { get; set; } = new();     // tree of settings
}

internal sealed class GitLabRbObjectNode
{
    public string Name { get; set; } = "";
    public string? DocComment { get; set; }
    public Dictionary<string, GitLabRbObjectNode> Children { get; set; } = new();
    public GitLabRbValueType? LeafType { get; set; }   // null = branch node
    public string? ExampleValue { get; set; }
    public bool IsArrayOfObjects { get; set; }
}

Leaf nodes have a LeafType (the inferred Ruby value type). Branch nodes have Children (a dictionary of sub-nodes). This mirrors JSON Schema's object/property distinction, but is derived from Ruby syntax patterns rather than from an explicit schema.
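The parse loop attaches each value to the node at the end of its bracket-key path, creating branch nodes along the way; the lookup itself is not shown. A minimal sketch with a trimmed-down, hypothetical Node type (the real model marks leaves with LeafType rather than with an empty Children dictionary):

```csharp
#nullable enable
using System.Collections.Generic;

// Hypothetical, trimmed-down stand-in for GitLabRbObjectNode
public sealed class Node
{
    public Node(string name) => Name = name;

    public string Name { get; }
    public Dictionary<string, Node> Children { get; } = new Dictionary<string, Node>();
    public string? ExampleValue { get; set; }

    // Simplification: a node without children counts as a leaf
    public bool IsLeaf => Children.Count == 0;
}

public static class TreeSketch
{
    // Walks root → key1 → key2 ..., creating branch nodes on demand,
    // and returns the node for the last key (where the value lands)
    public static Node GetOrCreatePath(Node root, IEnumerable<string> keys)
    {
        var current = root;
        foreach (var key in keys)
        {
            // Reuse the branch if an earlier line created it, else add it
            if (!current.Children.TryGetValue(key, out var child))
            {
                child = new Node(key);
                current.Children.Add(key, child);
            }
            current = child;
        }
        return current;
    }
}
```

Two settings sharing a first key, such as prefix['key1']['key2'] and prefix['key1']['key3'], end up as two leaves under a single key1 branch, which is what later becomes one sub-class with two properties.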

The 55 Ruby Prefixes

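Each prefix in the gitlab.rb template maps to a separate C# config class. The naming conversion handles special cases; for example, plain PascalCase conversion would turn gitlab_rails into GitlabRails rather than GitLabRails: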

public static string PrefixToClassName(string prefix)
{
    switch (prefix)
    {
        case "gitlab_rails":      return "GitLabRailsConfig";
        case "gitlab_workhorse":  return "GitLabWorkhorseConfig";
        case "gitlab_shell":      return "GitLabShellConfig";
        case "gitlab_kas":        return "GitLabKasConfig";
        case "gitlab_pages":      return "GitLabPagesConfig";
        case "geo_secondary":     return "GeoSecondaryConfig";
        case "node_exporter":     return "NodeExporterConfig";
        case "pages_nginx":       return "PagesNginxConfig";
        case "registry_nginx":    return "RegistryNginxConfig";
        // ... 30+ more special cases
        default: return ToPascalCase(prefix) + "Config";
    }
}
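The gitlab_* entries in the switch differ from the default branch only in casing: ToPascalCase produces Gitlab, not GitLab. A hypothetical rule-based alternative, sketched with a copy of the ToPascalCase algorithm shown under Naming Conversion, makes the trade-off visible (ClassNameSketch is a made-up name):

```csharp
using System;
using System.Text;

public static class ClassNameSketch
{
    // Same algorithm as NamingHelper.ToPascalCase (copied for self-containment)
    public static string ToPascalCase(string snakeCase)
    {
        var sb = new StringBuilder();
        bool capitalizeNext = true;
        foreach (var c in snakeCase)
        {
            if (c == '_' || c == '-' || c == '.')
            {
                capitalizeNext = true;
                continue;
            }
            sb.Append(capitalizeNext ? char.ToUpperInvariant(c) : c);
            capitalizeNext = false;
        }
        return sb.ToString();
    }

    // Hypothetical rule replacing the gitlab_* switch cases:
    // restore the "GitLab" casing, then append the Config suffix
    public static string PrefixToClassName(string prefix)
    {
        var pascal = ToPascalCase(prefix);
        if (pascal.StartsWith("Gitlab", StringComparison.Ordinal))
            pascal = "GitLab" + pascal.Substring("Gitlab".Length);
        return pascal + "Config";
    }
}
```

The rule keeps the GitLab cases in one place, but it would also rewrite any prefix that merely starts with "gitlab" without an underscore (gitlabfoo would become GitLabfooConfig); an explicit switch keeps every exception visible.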

Representative prefixes and their generated classes:

| Ruby prefix | C# class | Setting count |
|---|---|---|
| gitlab_rails | GitLabRailsConfig | ~200 |
| nginx | NginxConfig | ~30 |
| postgresql | PostgresqlConfig | ~25 |
| redis | RedisConfig | ~15 |
| registry | RegistryConfig | ~20 |
| prometheus | PrometheusConfig | ~15 |
| gitaly | GitalyConfig | ~20 |
| puma | PumaConfig | ~10 |
| sidekiq | SidekiqConfig | ~10 |
| mattermost | MattermostConfig | ~50 |
| letsencrypt | LetsencryptConfig | ~5 |
| gitlab_kas | GitLabKasConfig | ~10 |
| ... | ... | ... |
| Total | 55 classes | ~500 settings |

Version Merging

The merger takes ~80 parsed models and produces a single unified superset. The algorithm is the same one used by SchemaVersionMerger in the Docker Compose Bundle:

using System.Collections.Generic;
using System.Linq;

internal static class GitLabRbVersionMerger
{
    public static UnifiedGitLabRbModel Merge(List<GitLabRbModel> models)
    {
        // Oldest first, so "first" and "last" below follow release order
        models.Sort((a, b) => NamingHelper.CompareVersions(a.Version, b.Version));

        var firstVersion = models[0].Version;
        var lastVersion = models[models.Count - 1].Version;
        var unified = new UnifiedGitLabRbModel();

        // Every prefix that appears in at least one version
        var allPrefixes = models
            .SelectMany(m => m.PrefixGroups.Select(g => g.Prefix))
            .Distinct()
            .ToList();

        foreach (var prefix in allPrefixes)
        {
            // (version, root) for each version that defines this prefix,
            // still in sorted order
            var versionedRoots = models
                .SelectMany(m => m.PrefixGroups
                    .Where(g => g.Prefix == prefix)
                    .Select(g => (Version: m.Version, Node: g.Root)))
                .ToList();

            var firstSeen = versionedRoots[0].Version;
            var lastSeen = versionedRoots[versionedRoots.Count - 1].Version;

            // Recursive merge of property trees
            var mergedRoot = MergeNodes(versionedRoots, firstVersion, lastVersion);

            unified.PrefixGroups.Add(new UnifiedPrefixGroup
            {
                Prefix = prefix,
                Root = mergedRoot,
                SinceVersion = firstSeen == firstVersion ? null : firstSeen,
                UntilVersion = lastSeen == lastVersion ? null : lastSeen,
            });
        }

        return unified;
    }
}
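NamingHelper.CompareVersions is what makes models[0] the oldest version, and its body is not shown. A minimal sketch, assuming purely numeric dot-separated versions (VersionCompareSketch is hypothetical, and a suffix such as "-rc1" is deliberately not handled), shows why an ordinal string sort would not be enough:

```csharp
using System;
using System.Globalization;

public static class VersionCompareSketch
{
    // Hypothetical stand-in for NamingHelper.CompareVersions: compares
    // dot-separated numeric segments left to right. Assumes every segment
    // is an integer; a suffix such as "-rc1" would make long.Parse throw.
    public static int CompareVersions(string a, string b)
    {
        var pa = a.Split('.');
        var pb = b.Split('.');
        int n = Math.Max(pa.Length, pb.Length);

        for (int i = 0; i < n; i++)
        {
            // A missing segment counts as 0, so "16.0" equals "16.0.0"
            long x = i < pa.Length ? long.Parse(pa[i], CultureInfo.InvariantCulture) : 0;
            long y = i < pb.Length ? long.Parse(pb[i], CultureInfo.InvariantCulture) : 0;
            if (x != y)
                return x.CompareTo(y);
        }
        return 0;
    }
}
```

An ordinal comparison of the same strings would place "9.5.0" after "18.10.1", because the character '9' sorts after '1'; the numeric comparison orders them as releases.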

The Merge Algorithm

For each property (leaf or branch) across all versions:

  1. Find the first version where it appears → SinceVersion
  2. Find the last version where it appears → UntilVersion
  3. Store a bound as null when it coincides with the first (or last) tracked version; a property present in both the first and last tracked versions therefore has both bounds null (meaning "all tracked versions")

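private static UnifiedObjectNode MergeNodes(
    List<(string Version, GitLabRbObjectNode Node)> versionedNodes,
    string firstVersion, string lastVersion)
{
    // Carrying over leaf metadata (LeafType, DocComment, ...) is not shown
    var merged = new UnifiedObjectNode();

    // Union of child keys across every version of this node
    var allChildKeys = versionedNodes
        .SelectMany(vn => vn.Node.Children.Keys)
        .Distinct().ToList();

    foreach (var childKey in allChildKeys)
    {
        // (version, child) pairs, still sorted oldest → newest
        var childVersions = versionedNodes
            .Where(vn => vn.Node.Children.ContainsKey(childKey))
            .Select(vn => (vn.Version, Node: vn.Node.Children[childKey]))
            .ToList();

        // Count - 1 instead of [^1]: System.Index is absent from netstandard2.0
        var firstChildVersion = childVersions[0].Version;
        var lastChildVersion = childVersions[childVersions.Count - 1].Version;

        // Recurse with the same global bounds
        var mergedChild = MergeNodes(childVersions, firstVersion, lastVersion);
        mergedChild.SinceVersion =
            firstChildVersion == firstVersion ? null : firstChildVersion;
        mergedChild.UntilVersion =
            lastChildVersion == lastVersion ? null : lastChildVersion;

        merged.Children[childKey] = mergedChild;
    }

    return merged;
}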

Concrete Example

Consider a setting tracked across three versions:

| Setting | v15.0.5 | v16.0.8 | v18.10.1 |
|---|---|---|---|
| nginx['listen_port'] | present | present | present |
| gitlab_kas['enable'] | absent | absent | present |
| high_availability['mountpoint'] | present | present | absent |

After merging:

| Setting | SinceVersion | UntilVersion | Meaning |
|---|---|---|---|
| nginx['listen_port'] | null | null | Exists in all tracked versions |
| gitlab_kas['enable'] | "18.10.1" | null | New in v18.10.1, still present |
| high_availability['mountpoint'] | null | "16.0.8" | Removed after v16.0.8 |
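The merged table can be reproduced with a small, self-contained sketch of the bound computation that MergeNodes applies to each child. Bounds is a hypothetical helper, and the tracked versions are assumed to be sorted already:

```csharp
#nullable enable
using System.Collections.Generic;
using System.Linq;

public static class BoundsSketch
{
    // tracked: all tracked versions, sorted oldest → newest.
    // presentIn: versions in which the setting appears (at least one).
    public static (string? Since, string? Until) Bounds(
        IReadOnlyList<string> tracked, ISet<string> presentIn)
    {
        var seen = tracked.Where(v => presentIn.Contains(v)).ToList();
        string first = seen[0];
        string last = seen[seen.Count - 1];

        // A bound equal to the edge of the tracked range collapses to null
        string? since = first == tracked[0] ? null : first;
        string? until = last == tracked[tracked.Count - 1] ? null : last;
        return (since, until);
    }
}
```

Because only the first and last appearance are recorded, a setting that is present, then absent, then present again gets both bounds null: the annotations describe a range, not the gaps inside it.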

These version annotations flow through to the emitted C# properties as [SinceVersion] and [UntilVersion] attributes.

Class Emission

The emitter walks the unified model and produces C# classes:

public static string EmitModelClass(
    string ns, string className, UnifiedObjectNode node,
    List<PropertyInfo> properties, string? sinceVersion, string? untilVersion)
{
    var sb = new StringBuilder();
    sb.AppendLine("// <auto-generated/>");
    sb.AppendLine("#nullable enable");
    // Generated files should not rely on the consumer's usings:
    // List<>/Dictionary<> need System.Collections.Generic, and the
    // coverage attribute is written fully qualified below
    sb.AppendLine("using System.Collections.Generic;");
    sb.AppendLine();
    sb.AppendLine($"namespace {ns};");
    sb.AppendLine();

    // Class-level version bounds (null = all tracked versions)
    if (sinceVersion != null)
        sb.AppendLine($"[SinceVersion(\"{sinceVersion}\")]");
    if (untilVersion != null)
        sb.AppendLine($"[UntilVersion(\"{untilVersion}\")]");

    sb.AppendLine("[global::System.Diagnostics.CodeAnalysis.ExcludeFromCodeCoverage]");
    sb.AppendLine($"public partial class {className}");
    sb.AppendLine("{");

    foreach (var prop in properties)
    {
        if (prop.DocComment != null)
            sb.AppendLine($"    /// <summary>{EscapeXml(prop.DocComment)}</summary>");
        if (prop.SinceVersion != null)
            sb.AppendLine($"    [SinceVersion(\"{prop.SinceVersion}\")]");
        if (prop.UntilVersion != null)
            sb.AppendLine($"    [UntilVersion(\"{prop.UntilVersion}\")]");

        sb.AppendLine($"    public {prop.CSharpType} {prop.Name} {{ get; set; }}");
    }

    sb.AppendLine("}");
    return sb.ToString();
}
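EmitModelClass writes prop.CSharpType verbatim; the Ruby-to-C# type table earlier in this part suggests a mapping along these lines. This is a hypothetical sketch: it declares a local copy of the GitLabRbValueType members listed in that table, and ToCSharpType and PropertyLine are made-up names:

```csharp
using System;

// Local copy of the value types listed in the Ruby → C# table
public enum GitLabRbValueType
{
    Boolean, Nil, Integer, Long, Float, String, StringList, FloatList, StringDict
}

public static class CSharpTypeSketch
{
    // Follows the table row by row
    public static string ToCSharpType(GitLabRbValueType type) => type switch
    {
        GitLabRbValueType.Boolean    => "bool?",
        GitLabRbValueType.Nil        => "string?",
        GitLabRbValueType.Integer    => "int?",
        GitLabRbValueType.Long       => "long?",
        GitLabRbValueType.Float      => "double?",
        GitLabRbValueType.String     => "string?",
        GitLabRbValueType.StringList => "List<string>?",
        GitLabRbValueType.FloatList  => "List<double>?",
        GitLabRbValueType.StringDict => "Dictionary<string, string?>?",
        _ => throw new ArgumentOutOfRangeException(nameof(type)),
    };

    // One property line in the shape EmitModelClass appends
    public static string PropertyLine(string name, GitLabRbValueType type) =>
        $"    public {ToCSharpType(type)} {name} {{ get; set; }}";
}
```

Every mapped type is nullable, which presumably lets the renderer tell a setting left unset (null) apart from one explicitly assigned.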

The generator recursively walks the property tree:

  • Leaf node → simple property on the parent class
  • Branch node → sub-class + property of that sub-class type
  • Array of objects → List<SubClass> property with a dedicated item class

Each class gets a companion Builder class emitted by the shared BuilderEmitter from FrenchExDev.Net.Builder.SourceGenerator.Lib -- the same builder infrastructure used across the monorepo (see Builder Pattern).

Naming Conversion

Ruby uses snake_case. C# uses PascalCase. The NamingHelper handles both directions:

// Ruby → C# (for property names)
public static string ToPascalCase(string snakeCase)
{
    // "listen_port" → "ListenPort"
    // "smtp_enable_starttls_auto" → "SmtpEnableStarttlsAuto"
    var sb = new StringBuilder();
    bool capitalizeNext = true;

    for (int i = 0; i < snakeCase.Length; i++)
    {
        var c = snakeCase[i];
        if (c == '_' || c == '-' || c == '.')
        {
            capitalizeNext = true;
            continue;
        }
        sb.Append(capitalizeNext ? char.ToUpperInvariant(c) : c);
        capitalizeNext = false;
    }

    return sb.ToString();
}

The reverse direction (PascalCase → snake_case) is used by the renderer at runtime and handles edge cases like consecutive uppercase letters (DB → db, not d_b).
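The renderer's implementation is not shown. A hypothetical ToSnakeCase that satisfies the DB example keeps a run of capitals together and only splits where a lowercase letter or digit meets a capital, or where an acronym ends and a new word begins:

```csharp
using System.Text;

public static class SnakeCaseSketch
{
    // Hypothetical PascalCase → snake_case conversion that keeps runs of
    // capitals together: "ListenPort" → "listen_port", "DBHost" → "db_host"
    public static string ToSnakeCase(string pascal)
    {
        var sb = new StringBuilder(pascal.Length + 8);
        for (int i = 0; i < pascal.Length; i++)
        {
            char c = pascal[i];
            if (i > 0 && char.IsUpper(c))
            {
                char prev = pascal[i - 1];
                bool nextIsLower = i + 1 < pascal.Length && char.IsLower(pascal[i + 1]);

                // Split after a lowercase letter or digit (t|P in ListenPort),
                // or where an acronym ends and a word starts (B|H in DBHost)
                if (char.IsLower(prev) || char.IsDigit(prev) ||
                    (char.IsUpper(prev) && nextIsLower))
                    sb.Append('_');
            }
            sb.Append(char.ToLowerInvariant(c));
        }
        return sb.ToString();
    }
}
```

The two directions are not exact inverses: ToPascalCase("db_host") gives DbHost, which maps back to db_host, but a Ruby key such as d_b would come back as db.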

What Gets Emitted

For ~80 input files, the generator produces approximately:

| Output | Count | Description |
|---|---|---|
| Config classes | ~55 | One per Ruby prefix (e.g., NginxConfig.g.cs) |
| Sub-config classes | ~150 | Nested classes for hash/object settings |
| Builder classes | ~200 | One per config class + sub-class |
| Version metadata | 1 | GitLabOmnibusVersions.g.cs |
| Version attributes | 2 | SinceVersionAttribute, UntilVersionAttribute |
| Rendering metadata | 1 | GitLabRbMetadata.g.cs (prefix/key/type mappings) |
| Root config | 1 | GitLabOmnibusConfig.g.cs |
| Root builder | 1 | GitLabOmnibusConfigBuilder.g.cs |
| Total | ~400 | All in obj/Generated/ |

The next part examines what these generated files look like -- the typed models, the builder API, and the IntelliSense experience.
