The Problem in Detail

In a typical YAML format like Docker Compose, keys are namespaced:

# Docker Compose: structured
services:
  web:
    image: nginx
  db:
    image: postgres

In GitLab CI, there's no such namespacing:

# GitLab CI: flat
stages: [build, test]
variables:
  CI: "true"
build:
  script: echo building
test:
  script: echo testing

stages and variables are reserved keywords. build and test are job names. They coexist at the same YAML level. The only way to distinguish them is by knowing the set of reserved keywords.

How the Schema Encodes This

The JSON Schema uses properties for reserved keys and additionalProperties for everything else:

{
  "type": "object",
  "properties": {
    "stages": { ... },
    "variables": { "$ref": "#/definitions/globalVariables" },
    "include": { ... },
    "default": { ... },
    "workflow": { ... }
    // ... other reserved keys
  },
  "additionalProperties": {
    "$ref": "#/definitions/job"
  }
}

The additionalProperties declaration says: "any key not in properties is a job definition." This is how a YAML validator knows that build: should conform to the job schema.

How the C# Model Represents This

The generated GitLabCiFile class uses explicit properties for reserved keys and a dictionary for jobs:

public partial class GitLabCiFile
{
    // Reserved keys as typed properties
    public List<object>? Stages { get; set; }
    public Dictionary<string, object?>? Variables { get; set; }
    public object? Include { get; set; }
    public GitLabCiDefault? Default { get; set; }
    public GitLabCiWorkflow? Workflow { get; set; }
    // ... other reserved properties

    // Arbitrary job names as dictionary
    public Dictionary<string, GitLabCiJob>? Jobs { get; set; }

    // Forward-compatible catch-all
    public Dictionary<string, object?>? Extensions { get; set; }
}

How the Writer Flattens This Back

The writer must reverse this split, producing a flat YAML document:

private static Dictionary<string, object?> BuildRootDictionary(
    GitLabCiFile ciFile)
{
    var dict = new Dictionary<string, object?>();

    // 1. Reserved keys first (ordered)
    if (ciFile.Stages is not null)     dict["stages"] = ciFile.Stages;
    if (ciFile.Variables is not null)  dict["variables"] = ciFile.Variables;
    if (ciFile.Include is not null)    dict["include"] = ciFile.Include;
    if (ciFile.Default is not null)    dict["default"] = ciFile.Default;
    if (ciFile.Workflow is not null)   dict["workflow"] = ciFile.Workflow;

    // 2. Extensions (unknown root properties)
    if (ciFile.Extensions is not null)
        foreach (var kvp in ciFile.Extensions)
            dict[kvp.Key] = kvp.Value;

    // 3. Jobs (merged at root level)
    if (ciFile.Jobs is not null)
        foreach (var kvp in ciFile.Jobs)
            dict[kvp.Key] = kvp.Value;

    return dict;
}

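The ordering is deliberate. It relies on Dictionary<string, object?> enumerating entries in insertion order, which holds in practice when nothing is removed but is not guaranteed by the type's contract: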

  1. Reserved keys come first → conventional YAML structure
  2. Extensions next → for custom root-level properties
  3. Jobs last → all job definitions follow configuration

How the Reader Separates This

The reader performs the inverse operation:

// Known reserved keys
private static readonly HashSet<string> ReservedKeys =
    new(StringComparer.OrdinalIgnoreCase)
{
    "stages", "variables", "include", "default", "workflow",
    "image", "services", "before_script", "after_script", "cache",
    "spec", "pages",
};

// For each key in the raw YAML:
foreach (var kvp in raw)
{
    if (ReservedKeys.Contains(kvp.Key))
        continue;              // Reserved → already handled
    if (kvp.Key.StartsWith(".", StringComparison.Ordinal))
        continue;              // Template → skip
    // Everything else → it's a job
    jobs[kvp.Key] = DeserializeJob(kvp.Value);
}
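
The separation loop can be tried in isolation. The following is a minimal sketch, assuming the raw parse has already produced a string-keyed dictionary. RootKeySplitter, RootSplit, and the shortened reserved-key list are hypothetical names for illustration, not the library's API. Unlike the reader above, it keeps templates in their own bucket so the three categories are visible.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical result type: not part of the library.
public sealed record RootSplit(
    Dictionary<string, object?> Reserved,
    Dictionary<string, object?> Templates,
    Dictionary<string, object?> Jobs);

public static class RootKeySplitter
{
    // Shortened list for illustration; the reader's real set is longer.
    private static readonly HashSet<string> ReservedKeys =
        new(StringComparer.OrdinalIgnoreCase)
        { "stages", "variables", "include", "default", "workflow" };

    public static RootSplit Split(IReadOnlyDictionary<string, object?> raw)
    {
        var reserved = new Dictionary<string, object?>();
        var templates = new Dictionary<string, object?>();
        var jobs = new Dictionary<string, object?>();

        foreach (var kvp in raw)
        {
            if (ReservedKeys.Contains(kvp.Key))
                reserved[kvp.Key] = kvp.Value;   // typed properties
            else if (kvp.Key.StartsWith(".", StringComparison.Ordinal))
                templates[kvp.Key] = kvp.Value;  // the reader above skips these
            else
                jobs[kvp.Key] = kvp.Value;       // everything else is a job
        }

        return new RootSplit(reserved, templates, jobs);
    }
}
```

Feeding it the flat example from the start of this section would place stages and variables in Reserved and build and test in Jobs.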

Why This Design?

The alternative would be to deserialize directly to GitLabCiFile using YamlDotNet's object mapping. But this fails because:

  1. YamlDotNet can't distinguish reserved keys from jobs at the same level
  2. The Jobs property isn't in the YAML — jobs are at root level
  3. Union types (script: string | list) need custom converters
  4. Unknown properties (future GitLab keys) would cause errors

The two-phase approach (raw parse → typed extraction) handles all of these gracefully.

Diagram: reserved keys land on typed properties, dot-prefixed templates are dropped, and everything else is routed into the Jobs dictionary. The two-phase reader encodes exactly this mapping.

Deep Dive: Union Type Handling

Union types are the most challenging aspect of mapping JSON Schema to C#. GitLab CI uses them extensively, and each union requires a different strategy.

Strategy 1: List<string> for String-or-Array

The most common union: script: "echo hello" (string) vs script: ["echo hello", "echo world"] (array).

Schema:

{
  "script": {
    "oneOf": [
      { "type": "string" },
      { "type": "array", "items": { "type": "string" } }
    ]
  }
}

C# Type: List<string>?

How it works:

  • The StringOrListConverter handles deserialization: a scalar becomes a single-element list
  • Serialization always writes as a YAML sequence (list)
  • The C# API uses List<string> consistently — no ambiguity

Trade-off: When serializing, a single-line script like echo hello always becomes:

script:
- echo hello

The shorter form script: echo hello is never emitted. The output is valid YAML, just less compact.
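
The scalar-to-list normalization can be illustrated without YamlDotNet. This is a minimal sketch, assuming the raw value is either a string or a sequence of objects (the shapes an untyped YAML parse typically produces). StringOrList.Normalize is a hypothetical helper, not the library's StringOrListConverter.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

public static class StringOrList
{
    // Maps both accepted YAML shapes onto a single C# type.
    public static List<string>? Normalize(object? raw) => raw switch
    {
        null => null,
        // Scalar form: wrap in a single-element list.
        string s => new List<string> { s },
        // Sequence form: copy each entry as a string.
        IEnumerable<object> items =>
            items.Select(i => i?.ToString() ?? string.Empty).ToList(),
        _ => throw new ArgumentException(
            $"Expected a string or a sequence, got {raw.GetType().Name}.",
            nameof(raw)),
    };
}
```

Callers then only ever see List<string>, which is the consistency the trade-off above buys.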

Strategy 2: Config Class for String-or-Object

Some properties accept either a string shorthand or a full object:

# String form
environment: production

# Object form
environment:
  name: production
  url: https://app.example.com
  on_stop: stop-production

Schema:

{
  "environment": {
    "oneOf": [
      { "type": "string" },
      {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "url": { "type": "string" },
          "on_stop": { "type": "string" }
        }
      }
    ]
  }
}

C# Type: GitLabCiJobTemplateEnvironmentConfig?

How it works:

  • The source generator detects oneOf[string, object with properties]
  • It generates an inline class (GitLabCiJobTemplateEnvironmentConfig) from the object's properties
  • The C# API always uses the object form
  • When reading YAML, the string form is handled by the catch-all fallback in the reader

Trade-off: The string shorthand environment: production doesn't round-trip perfectly — it would be serialized as the object form with just the name property.
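
The promotion of the string shorthand to the object form can be sketched as follows. This assumes the raw value is either a string or an object-keyed dictionary. EnvironmentConfig and EnvironmentReader are hypothetical stand-ins for the generated class and the reader's fallback, which this post does not show.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the generated config class.
public sealed class EnvironmentConfig
{
    public string? Name { get; set; }
    public string? Url { get; set; }
    public string? OnStop { get; set; }
}

public static class EnvironmentReader
{
    public static EnvironmentConfig? FromRaw(object? raw)
    {
        switch (raw)
        {
            case null:
                return null;
            case string name:
                // String shorthand: only the name is known.
                return new EnvironmentConfig { Name = name };
            case IDictionary<object, object> map:
                return new EnvironmentConfig
                {
                    Name = Get(map, "name"),
                    Url = Get(map, "url"),
                    OnStop = Get(map, "on_stop"),
                };
            default:
                throw new ArgumentException(
                    $"Unsupported environment shape: {raw.GetType().Name}.",
                    nameof(raw));
        }
    }

    private static string? Get(IDictionary<object, object> map, string key)
        => map.TryGetValue(key, out var value) ? value?.ToString() : null;
}
```

Because both inputs end up as EnvironmentConfig, a later write cannot tell which form was read, which is the round-trip loss described above.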

Strategy 3: Primitive for String-or-Primitive

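Unions like string | boolean or string | integer keep only the non-string primitive, so the string variant is lost. allow_failure, which pairs a boolean with an object form, is reduced to its boolean in the same way: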

# Boolean form
allow_failure: true

# Extended form (supported by GitLab but not modeled)
allow_failure:
  exit_codes: [137, 255]

C# Type: bool?

How it works:

  • The generator picks the non-string primitive type
  • The string variant is dropped
  • The extended object form is lost entirely

Trade-off: Significant type information is lost. Properties like allow_failure with exit_codes can't be expressed through the typed API. The Extensions dictionary can be used as a fallback.

Strategy 4: Object? for Complex Unions

When none of the above strategies apply, the fallback is object?:

public object? Include { get; set; }

This handles include: which can be a string, a list of strings, a list of objects, or a single object — too many variants to model cleanly.
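
Consumers of an object? property have to inspect the runtime shape themselves. The sketch below shows one way to do that with pattern matching, assuming the raw parse yields strings, object-keyed dictionaries, and object lists. IncludeInspector is a hypothetical helper, not part of the library.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Linq;

public static class IncludeInspector
{
    // Classifies the raw value by its runtime shape.
    public static string Describe(object? include) => include switch
    {
        null => "none",
        string path => $"single path: {path}",
        IDictionary<object, object> => "single object",
        IEnumerable<object> items => $"list of {items.Count()} entries",
        _ => $"unexpected {include.GetType().Name}",
    };
}
```

The dictionary case is listed before the sequence case so that a mapping is never mistaken for a list.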

Union Type Decision Tree

Diagram: the union-type resolver walks the property's oneOf shape through seven ordered checks, stopping at the first match and falling back to string only when no richer representation applies.

Deep Dive: The Builder Pattern in Detail

The generated builders are more sophisticated than simple property setters. They leverage the FrenchExDev.Net.Builder framework for async building, validation, reference tracking, and dictionary composition.

AbstractBuilder Base Class

Every generated builder inherits from AbstractBuilder<T>, which provides:

public abstract class AbstractBuilder<T> where T : class
{
    // Core build pipeline (signatures only; bodies omitted)
    public Task<Result<Reference<T>>> BuildAsync(
        CancellationToken cancellationToken = default);
    public Task<Result<Reference<T>>> BuildAsync(
        VisitedObjects visitedObjects,
        CancellationToken cancellationToken = default);
    public T Build(); // Synchronous shorthand

    // Template methods for subclasses
    protected abstract Task<Result<ValidationResult>> ValidateAsync(
        CancellationToken cancellationToken = default);
    protected abstract Task<Result<Reference<T>>> Instantiate(
        Reference<T> reference,
        VisitedObjects visitedObjects,
        CancellationToken cancellationToken = default);
    protected abstract Exception BuildException(
        Result<ValidationResult> validationResult);
}

Reference<T> for Circular Dependency Resolution

The Reference<T> wrapper enables building object graphs with circular references. For GitLab CI this isn't typically needed, but the framework supports it for other use cases:

var result = await builder.BuildAsync();

// Access the built object through the reference
var ciFile = result.ValueOrThrow().Resolved();

DictionaryBuilder<K, V, B> for Typed Collections

When a dictionary's value type has a corresponding builder, the generated code uses DictionaryBuilder<K, V, B>:

// Three ways to build the Jobs dictionary:

// 1. Direct dictionary assignment
builder.WithJobs(new Dictionary<string, GitLabCiJob>
{
    ["build"] = new GitLabCiJob { Script = new List<string> { "echo hi" } }
});

// 2. Dictionary builder callback
builder.WithJobs(jobs => jobs
    .With("build", buildJob => buildJob
        .WithScript(new List<string> { "npm ci" }))
    .With("test", testJob => testJob
        .WithScript(new List<string> { "npm test" })));

// 3. Individual job builder (accumulates across calls)
builder
    .WithJob("build", job => job.WithScript(new List<string> { "npm ci" }))
    .WithJob("test", job => job.WithScript(new List<string> { "npm test" }))
    .WithJob("deploy", job => job.WithScript(new List<string> { "deploy.sh" }));

The third form is particularly elegant — each WithJob call adds to the same internal DictionaryBuilder, allowing incremental construction across multiple fluent calls.

Validation Pipeline

Every property gets a virtual validation method:

// Generated in GitLabCiFileBuilder:
protected virtual IEnumerable<Exception>? ValidateSpec(GitLabCiSpec? value)
    => null;
protected virtual IEnumerable<Exception>? ValidateStages(List<object>? value)
    => null;
protected virtual IEnumerable<Exception>? ValidateStagesItem(
    object item, int index)
    => null;

The ValidateAsync override calls each validator and collects errors:

protected override Task<Result<ValidationResult>> ValidateAsync(
    CancellationToken cancellationToken = default)
{
    var result = new ValidationResult();
    var type = typeof(GitLabCiFileBuilder);

    foreach (var err in ValidateStages(Stages)
        ?? Array.Empty<Exception>())
        result.AddError(new MemberName(nameof(Stages), type), err);

    // Collection items validated individually
    if (Stages is not null)
    {
        var idx = 0;
        foreach (var item in Stages)
        {
            foreach (var err in ValidateStagesItem(item, idx)
                ?? Array.Empty<Exception>())
                result.AddError(
                    new MemberName($"Stages[{idx}]", type), err);
            idx++;
        }
    }

    // ... repeat for every property ...

    return Task.FromResult(Result<ValidationResult>.Success(result));
}

This design means:

  • Default: no validation — all validators return null
  • Custom validation via subclass — override any validator
  • Collection-level AND item-level — validate both the list and individual items
  • Non-throwing — errors are collected, not thrown
  • Extensible — add new validators without modifying generated code
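
A subclass override might look like the following. This is a self-contained sketch: StagesBuilderBase is a hypothetical stand-in that mimics only the ValidateStagesItem hook and the error-collecting loop, so the example compiles without the FrenchExDev.Net.Builder framework. The rule itself (stages must be non-empty strings) is an assumption chosen for illustration.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the generated builder: item-level hook only.
public class StagesBuilderBase
{
    public List<object>? Stages { get; set; }

    // Default: no validation.
    protected virtual IEnumerable<Exception>? ValidateStagesItem(
        object item, int index)
        => null;

    // Collects errors instead of throwing.
    public List<(string Member, Exception Error)> Validate()
    {
        var errors = new List<(string Member, Exception Error)>();
        if (Stages is null) return errors;

        for (var idx = 0; idx < Stages.Count; idx++)
            foreach (var err in ValidateStagesItem(Stages[idx], idx)
                ?? Array.Empty<Exception>())
                errors.Add(($"Stages[{idx}]", err));

        return errors;
    }
}

// Custom rule: every stage must be a non-empty string.
public sealed class StrictStagesBuilder : StagesBuilderBase
{
    protected override IEnumerable<Exception>? ValidateStagesItem(
        object item, int index)
    {
        if (item is not string s || string.IsNullOrWhiteSpace(s))
            yield return new ArgumentException(
                $"Stage at index {index} must be a non-empty string.");
    }
}
```

The base class stays untouched, and each invalid item is reported under its own indexed member name.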

Instantiation

The CreateInstance method is a simple object initializer:

protected virtual GitLabCiFile CreateInstance()
{
    return new GitLabCiFile
    {
        Spec = Spec,
        Image = Image,
        Services = Services,
        BeforeScript = BeforeScript,
        AfterScript = AfterScript,
        Variables = Variables,
        Cache = Cache,
        Default = Default,
        Stages = Stages,
        Include = Include,
        Pages = Pages,
        Workflow = Workflow,
        Jobs = Jobs,
        Extensions = Extensions,
    };
}

The virtual modifier allows subclasses to customize instantiation — for example, applying defaults or post-processing.


Deep Dive: Incremental Source Generator Mechanics

The GitLabCiBundleGenerator is an incremental source generator, not a traditional one. This distinction matters for IDE performance.

Traditional vs Incremental Generators

Traditional generators (ISourceGenerator) run on every keystroke in the IDE. For a project with 11 JSON schema files and 61 generated files, this would cause noticeable lag.

Incremental generators (IIncrementalGenerator) use a pipeline model. They declare what inputs they depend on, and Roslyn only re-runs the generator when those inputs change.

public void Initialize(IncrementalGeneratorInitializationContext context)
{
    // Declare dependency: only JSON files matching our pattern
    var schemaFiles = context.AdditionalTextsProvider
        .Where(static f =>
            Path.GetFileName(f.Path).StartsWith("gitlab-ci-", StringComparison.Ordinal) &&
            f.Path.EndsWith(".json", StringComparison.Ordinal));

    // Register output: only runs when schema files change
    context.RegisterSourceOutput(schemaFiles.Collect(),
        static (ctx, files) => { /* generation logic */ });
}

Input Tracking

The generator depends on:

  • AdditionalTextsProvider — the JSON schema files
  • .Collect() — gathers all matching files into an array

Changes to C# source files, project references, or other non-schema files do not trigger regeneration. Only adding, removing, or modifying a gitlab-ci-*.json file causes the generator to re-run.
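
Since the file-name predicate decides what triggers regeneration, it can be worth extracting into a named method and testing on its own. A minimal sketch, with SchemaFileFilter as a hypothetical name that does not appear in the generator:

```csharp
using System;
using System.IO;

public static class SchemaFileFilter
{
    // True for files named gitlab-ci-*.json, regardless of directory.
    public static bool IsSchemaFile(string path) =>
        Path.GetFileName(path).StartsWith("gitlab-ci-", StringComparison.Ordinal) &&
        path.EndsWith(".json", StringComparison.Ordinal);
}
```

The Where clause in Initialize could then call SchemaFileFilter.IsSchemaFile(f.Path), keeping the lambda static.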

Cancellation

The generator checks ctx.CancellationToken in each loop iteration:

foreach (var file in files)
{
    ctx.CancellationToken.ThrowIfCancellationRequested();
    // ... parse schema
}

This is important for IDE responsiveness. If the user starts typing while the generator is running, Roslyn cancels the current run and schedules a new one.

Error Recovery

Rather than crashing silently, the generator catches all exceptions and emits a diagnostic comment file:

catch (System.Exception ex)
{
    ctx.AddSource("GenerateError.g.cs",
        SourceText.From(
            $"// Generator error: {ex.GetType().Name}: {ex.Message}\n" +
            $"// {ex.StackTrace?.Replace("\n", "\n// ")}\n",
            Encoding.UTF8));
}

This means:

  • The build doesn't fail mysteriously
  • The error is visible in the generated output
  • Developers can inspect obj/Generated/ to see exactly what went wrong

Inspecting Generated Code

The main library's .csproj includes:

<EmitCompilerGeneratedFiles>true</EmitCompilerGeneratedFiles>
<CompilerGeneratedFilesOutputPath>$(BaseIntermediateOutputPath)/Generated</CompilerGeneratedFilesOutputPath>

This writes all generated .g.cs files to obj/Generated/. You can open these files in your IDE to see exactly what the generator produced — with full syntax highlighting, navigation, and debugging support.


Deep Dive: YamlDotNet Integration

GitLab.Ci.Yaml uses YamlDotNet for YAML serialization and deserialization. Understanding the integration points helps explain some design decisions.

Serializer Configuration

private static readonly ISerializer Serializer = new SerializerBuilder()
    .WithNamingConvention(UnderscoredNamingConvention.Instance)
    .ConfigureDefaultValuesHandling(
        DefaultValuesHandling.OmitNull |
        DefaultValuesHandling.OmitEmptyCollections)
    .DisableAliases()
    .Build();

| Setting | Effect | Why |
|---|---|---|
| UnderscoredNamingConvention | BeforeScript → before_script | GitLab CI uses snake_case |
| OmitNull | null properties not emitted | Clean YAML output |
| OmitEmptyCollections | Empty lists/dicts not emitted | Clean YAML output |
| DisableAliases | No YAML anchors/aliases | Explicit, portable output |

Deserializer Configuration

The reader uses two deserializers:

Raw deserializer (untyped):

var rawDeserializer = new DeserializerBuilder().Build();

No special configuration — just parse YAML into dictionaries and lists.

Job deserializer (typed):

private static readonly IDeserializer JobDeserializer =
    new DeserializerBuilder()
        .WithNamingConvention(UnderscoredNamingConvention.Instance)
        .WithTypeConverter(new StringOrListConverter())
        .IgnoreUnmatchedProperties()
        .Build();

| Setting | Effect | Why |
|---|---|---|
| UnderscoredNamingConvention | before_script → BeforeScript | Map YAML keys to C# properties |
| WithTypeConverter | StringOrListConverter registered | Handle script: "string" → List<string> |
| IgnoreUnmatchedProperties | Unknown YAML keys silently skipped | Forward compatibility |

Why Two Deserializers?

The flat root structure makes it impossible to deserialize directly to GitLabCiFile:

  1. Raw deserializer handles the flat structure — all keys become dictionary entries
  2. Job deserializer handles the typed structure — each job entry gets its own deserialization pass

The re-serialize/re-deserialize pattern for jobs:

// Convert untyped mapping back to YAML
var jobYaml = serializer.Serialize(kvp.Value);
// Then deserialize as typed GitLabCiJob
var job = JobDeserializer.Deserialize<GitLabCiJob>(jobYaml);

This works because YamlDotNet's serializer can handle Dictionary<object, object?> (the raw type) just fine, producing clean YAML that the typed deserializer can then parse.

StringOrListConverter: Edge Cases

The converter handles several edge cases:

  1. Null scalar → returns null
  2. Single string → wraps in List<string> { value }
  3. YAML sequence → reads all scalars into a list
  4. Nested sequences (multi-line commands) → skips via SkipThisAndNestedEvents()
  5. Writing → always emits as block sequence (never scalar)

The "always emit as list" decision means the round-trip isn't perfectly lossless — a single-line script: echo hello becomes:

script:
- echo hello

This is semantically identical but syntactically different. The trade-off favors consistency (always list) over minimal output.


Deep Dive: Cross-References with Other FrenchExDev.Net Libraries

GitLab.Ci.Yaml doesn't exist in isolation. It shares patterns and infrastructure with several other libraries in the FrenchExDev.Net ecosystem.

Shared Pattern: Four-Project Architecture

| Library | Attributes | Design | SourceGenerator | Bundle/Library |
|---|---|---|---|---|
| BinaryWrapper | [BinaryWrapper] | CLI scraper | Command tree → C# | Typed CLI client |
| DockerCompose.Bundle | [ComposeBundle] | Schema downloader | JSON Schema → C# | Compose models |
| GitLab.Ci.Yaml | [GitLabCiBundle] | Schema downloader | JSON Schema → C# | CI models |

All three follow the same decomposition. The source generators are structurally different (BinaryWrapper processes command trees, DockerCompose and GitLab process JSON Schemas), but the architectural pattern is identical.

Shared Infrastructure: Builder Framework

All three libraries use FrenchExDev.Net.Builder:

  • AbstractBuilder<T> — base class for fluent builders
  • DictionaryBuilder<K, V> — for building dictionaries inline
  • DictionaryBuilder<K, V, B> — for typed value builders
  • ValidationResult — for collecting validation errors
  • Reference<T> — for circular reference resolution
  • BuilderEmitter — for generating builder source code

The Builder.SourceGenerator.Lib project is referenced by all three source generators as an analyzer dependency. It provides the BuilderEmitModel and BuilderPropertyModel types that the generators populate, and the BuilderEmitter.Emit() method that produces the actual builder .g.cs code.

Shared Infrastructure: Wrapper.Versioning

The Design projects for all three libraries use FrenchExDev.Net.Wrapper.Versioning:

  • DesignPipelineRunner<T> — orchestrates version-based processing
  • GitHubReleasesVersionCollector / GitLabReleasesVersionCollector — discover versions from APIs
  • VersionFilters — filter strategies (latest patch per minor, etc.)
  • DesignPipeline<T> — composable middleware for download/save

Shared Infrastructure: Result Pattern

All libraries use FrenchExDev.Net.Result for error handling:

  • Result<T> — success or failure with error collection
  • No exceptions for expected failures
  • Composable with LINQ-like operations
  • Used in builders: BuildAsync() returns Result<Reference<T>>

See the Result Pattern post for details.

Shared Pattern: Version-Aware Code Generation

Both BinaryWrapper and GitLab.Ci.Yaml use multi-version merging with [SinceVersion] / [UntilVersion] annotations:

| Aspect | BinaryWrapper | GitLab.Ci.Yaml |
|---|---|---|
| Version source | CLI --help scraping | JSON Schema download |
| Merge tool | VersionDiffer.Merge() | SchemaVersionMerger.Merge() |
| Annotations | [SinceVersion], [UntilVersion] | [SinceVersion], [UntilVersion] |
| Runtime check | VersionGuard throws on mismatch | Reflection-based check |
| Scope | Commands + options | Definitions + properties |
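
To make the reflection-based check concrete, here is a minimal sketch of how a [SinceVersion] annotation could be read at runtime. Everything in it is an assumption for illustration: the attribute definition, SampleJob, and VersionCheck are hypothetical stand-ins, and the library's actual attribute and check may be shaped differently.

```csharp
#nullable enable
using System;
using System.Reflection;

// Hypothetical stand-in for the library's attribute.
[AttributeUsage(AttributeTargets.Property)]
public sealed class SinceVersionAttribute : Attribute
{
    public SinceVersionAttribute(string version)
        => Version = System.Version.Parse(version);

    public Version Version { get; }
}

// Hypothetical model with one versioned property.
public sealed class SampleJob
{
    public string? Script { get; set; }

    [SinceVersion("18.11.0")]
    public string? NewOption { get; set; }
}

public static class VersionCheck
{
    // True when the property exists in the targeted GitLab version.
    public static bool IsAvailable(
        Type type, string propertyName, string targetVersion)
    {
        var property = type.GetProperty(propertyName)
            ?? throw new ArgumentException(
                $"Unknown property '{propertyName}'.", nameof(propertyName));

        var since = property.GetCustomAttribute<SinceVersionAttribute>();

        // Unannotated properties are treated as always available.
        return since is null
            || System.Version.Parse(targetVersion) >= since.Version;
    }
}
```

An [UntilVersion] check would mirror this with the comparison reversed.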

Shared Pattern: Contributor/Composition

Both DockerCompose.Bundle and GitLab.Ci.Yaml use the contributor pattern:

// DockerCompose.Bundle
public interface IComposeFileContributor
{
    void Contribute(ComposeFile composeFile);
}

// GitLab.Ci.Yaml
public interface IGitLabCiContributor
{
    void Contribute(GitLabCiFile ciFile);
}

Same interface shape, same extension method pattern (.Apply()), same composition model. If you've used one, you already know how to use the other.
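
The composition model can be sketched in a few lines. The interface below matches the one shown above. The GitLabCiFile stand-in, the signature of Apply(), and StageContributor are assumptions made so the example is self-contained, since this post does not show the real extension method.

```csharp
#nullable enable
using System.Collections.Generic;

// Hypothetical stand-in for the generated model (only what the sketch needs).
public sealed class GitLabCiFile
{
    public List<object>? Stages { get; set; }
}

public interface IGitLabCiContributor
{
    void Contribute(GitLabCiFile ciFile);
}

public static class GitLabCiContributorExtensions
{
    // Assumed shape: run each contributor in order, return the file for chaining.
    public static GitLabCiFile Apply(
        this GitLabCiFile ciFile, params IGitLabCiContributor[] contributors)
    {
        foreach (var contributor in contributors)
            contributor.Contribute(ciFile);
        return ciFile;
    }
}

// Example contributor: ensures a stage exists exactly once.
public sealed class StageContributor : IGitLabCiContributor
{
    private readonly string _stage;

    public StageContributor(string stage) => _stage = stage;

    public void Contribute(GitLabCiFile ciFile)
    {
        ciFile.Stages ??= new List<object>();
        if (!ciFile.Stages.Contains(_stage))
            ciFile.Stages.Add(_stage);
    }
}
```

Each contributor owns one concern, so a pipeline can be assembled from small reusable pieces, for example new GitLabCiFile().Apply(new StageContributor("build"), new StageContributor("test")).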

Diagram: three code-generation libraries share the same four base components (builder, result, versioning, and emit helpers), which is why their surface APIs feel identical once you have learned one.

Updating to New GitLab Versions

When GitLab releases a new version (e.g., 18.11), here's the complete update workflow:

Step 1: Run the Design CLI

dotnet run --project src/FrenchExDev.Net.GitLab.Ci.Yaml.Design

The CLI queries the GitLab API, discovers the new v18.11.0-ee tag, downloads the schema, and saves it to schemas/gitlab-ci-v18.11.0.json.

Step 2: Rebuild

dotnet build src/FrenchExDev.Net.GitLab.Ci.Yaml

The source generator:

  1. Detects the new JSON file via AdditionalFiles
  2. Parses 12 schemas (was 11)
  3. Merges them into a unified schema
  4. Regenerates all 61+ .g.cs files
  5. New properties get [SinceVersion("18.11.0")]

Step 3: Run Tests

dotnet test test/FrenchExDev.Net.GitLab.Ci.Yaml.Tests

All 20 tests should pass — the merge algorithm only adds properties, never removes them.

Step 4: Inspect Changes

Check obj/Generated/ for new or modified files:

  • GitLabCiSchemaVersions.g.cs will have "18.11.0" added to _versions
  • Latest will be "18.11.0"
  • Any new properties will appear with [SinceVersion("18.11.0")]

Step 5: Commit

git add src/FrenchExDev.Net.GitLab.Ci.Yaml/schemas/gitlab-ci-v18.11.0.json
git commit -m "Add GitLab CI schema v18.11.0"

That's it. Five commands, zero manual code changes. The entire API surface updates automatically.


Build Time

The source generator processes 11 JSON schemas (each ~50-100KB) and emits 61 C# files. Typical generation time: < 500ms. This is fast because:

  • JSON parsing uses System.Text.Json (fast, low-allocation)
  • String building uses StringBuilder (no string concatenation)
  • Incremental generation — only re-runs when schemas change
  • No Roslyn syntax tree construction — output is raw strings

Runtime Performance

At runtime, the library has:

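  • No runtime type generation: all model types are emitted at compile time (YamlDotNet's object mapping and the reflection-based version check still use reflection)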
  • Zero schema parsing — no JSON processing at runtime
  • Minimal allocations — models are simple POCOs
  • Fast serialization — YamlDotNet with pre-built serializer/deserializer instances

Memory Footprint

A typical GitLabCiFile with 5 jobs uses:

  • ~2KB for the model object graph
  • ~5KB for the serialized YAML string
  • No caching or pooling needed

1. Incomplete Union Type Support

Some union types lose information:

  • allow_failure: { exit_codes: [137] } → can't be expressed (only bool? available)
  • include: "file.yml" → can't be distinguished from include: [{ local: "file.yml" }]
  • retry: { max: 2, when: ["runner_system_failure"] } → mapped to Dictionary<string, object?>

Workaround: Use the Extensions dictionary for unsupported forms.

2. Template Keys Not Preserved

YAML keys starting with . (templates) are skipped during deserialization:

.base_job:
  image: node:20
  tags: [docker]

build:
  extends: .base_job
  script: [npm ci]

The .base_job template won't appear in the parsed GitLabCiFile. Job extends references will be preserved as strings.

3. No YAML Comment Preservation

Reading and writing YAML discards comments, because YamlDotNet's object-mapping serializer and deserializer do not carry comments through the object model.

4. Root-Level Reserved Key Ordering

The writer always emits reserved keys in a fixed order (stages, variables, include, default, workflow). The original ordering from a parsed YAML file is not preserved.

5. Object-Typed Properties

Properties mapped to object? (like Include) or List<object> (like Stages) lose type safety. These are the schema's most complex union types where no clean C# representation exists.

