Multi-Version Schema Merging

A team running multiple K8s minors needs C# code that compiles against all of them. A team that uses Argo Rollouts needs the same code to compile against v1.5 in legacy and v1.7.2 in production. Both problems are the same problem: a property may exist in some versions and not others, and may change shape across versions. The solution is also the same: per-property [SinceVersion] and [UntilVersion] annotations driven by a merger that walks every input schema and tracks property lineage.

This chapter explains how SchemaVersionMerger (reused from GitLab.Ci.Yaml) is adapted to operate on Kubernetes inputs across two layers — core K8s minors and CRD bundle tags — with one consistent versioning identifier scheme.

The thesis: CRDs are not special

CRDs are typed the same way core K8s types are typed. Same merger. Same annotations. Same analyzer. The only difference is the format of the version identifier: core K8s uses bare major.minor versions (1.27, 1.28, 1.31), while CRDs use bundle-prefixed tags (argo-rollouts/v1.7.0, keda/v2.14.0, cert-manager/v1.15.0).

Why prefix? Because CRD bundles version independently of K8s itself. Argo Rollouts v1.7.2 may target K8s 1.27 → 1.31, KEDA v2.14.0 may target 1.28 → 1.31, and they're not comparable on the same semver axis. The bundle prefix gives each ecosystem its own version space.
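To make the two identifier families concrete, here is a minimal sketch of how an identifier string could be split into a scheme, a bundle, and a version. `VersionIdText`, `Split`, and `SameVersionSpace` are hypothetical names introduced for illustration; the only assumption taken from the text is that a slash separates the bundle from its tag.

```csharp
using System;

// Hypothetical helper, not part of the chapter's code.
public static class VersionIdText
{
    public static (string Scheme, string Bundle, string Version) Split(string id)
    {
        if (string.IsNullOrWhiteSpace(id))
            throw new ArgumentException("Empty version identifier.", nameof(id));

        int slash = id.IndexOf('/');

        // No slash: a bare core K8s version such as "1.27".
        if (slash < 0)
            return ("k8s", "", id);

        // With a slash: "<bundle>/<tag>", e.g. "argo-rollouts/v1.7.2".
        return ("crd", id.Substring(0, slash), id.Substring(slash + 1));
    }

    // Two identifiers are only comparable inside the same version space.
    public static bool SameVersionSpace(string a, string b)
    {
        var x = Split(a);
        var y = Split(b);
        return x.Scheme == y.Scheme && x.Bundle == y.Bundle;
    }
}
```

With this split, "argo-rollouts/v1.7.2" and "keda/v2.14.0" land in different version spaces, so no ordering between them is ever attempted.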

Diagram: one merger, two version-identifier families (core K8s minors and CRD bundle tags), collapsed into a single unified schema with per-property lineage.

What SchemaVersionMerger does, in one diagram

For a given type (say V1Pod), the merger walks every input version of that type, tracks each property's first appearance, last appearance, and any shape changes between, and emits one merged property record with the lineage. Then OpenApiV3SchemaEmitter (Part 6) reads the merged record and emits one C# property with the appropriate attributes.

Diagram: for a single property, the merger walks every input minor, records when each shape variant appeared, and emits one discriminated union with per-variant lineage.
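The walk described above can be sketched in a few lines. This is an illustration only, not the real merger: `LineageSketch` and its tuple shapes are hypothetical, the input is assumed to be sorted oldest to newest, and only presence is tracked (no shape changes).

```csharp
using System.Collections.Generic;

// Hypothetical illustration; the real merger also tracks shape changes.
public static class LineageSketch
{
    // versions: one entry per input version of a single type, oldest first.
    public static Dictionary<string, (string Since, string? Until)> Walk(
        IReadOnlyList<(string Version, string[] Properties)> versions)
    {
        // Record the first and last version in which each property name appears.
        var seen = new Dictionary<string, (string First, string Last)>();
        foreach (var (version, properties) in versions)
            foreach (var name in properties)
                seen[name] = seen.TryGetValue(name, out var range)
                    ? (range.First, version)
                    : (version, version);

        string? newest = versions.Count > 0 ? versions[versions.Count - 1].Version : null;

        var result = new Dictionary<string, (string Since, string? Until)>();
        foreach (var pair in seen)
        {
            var (first, last) = pair.Value;
            // Until stays null while the property is still present in the newest input.
            result[pair.Key] = (first, last == newest ? null : last);
        }
        return result;
    }
}
```

Feeding it three versions where a property disappears in the last one yields a non-null Until for that property and null for everything still present.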

The merger does not care that the inputs came from .json files. Same code works for .yaml CRD inputs after the dispatcher has converted them to JsonNode.
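As a small illustration of why the input format stops mattering once everything is a JsonNode, the sketch below reads the property names of one schema node using System.Text.Json.Nodes. `SchemaProbe` is a hypothetical helper; the YAML-to-JsonNode conversion done by the dispatcher is not shown and is assumed to have already happened.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.Json.Nodes;

// Hypothetical helper, not part of the chapter's code.
public static class SchemaProbe
{
    // Returns the property names declared under "properties" of one schema node.
    public static IReadOnlyList<string> PropertyNames(JsonNode? schema)
    {
        if (schema is JsonObject obj && obj["properties"] is JsonObject props)
            return props.Select(kv => kv.Key).ToList();

        // Not an object schema, or no "properties" member.
        return new List<string>();
    }
}
```

The same call works whether the node was parsed from a .json file or produced from a .yaml CRD, since both arrive as the same tree type.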

The data model

// Kubernetes.Dsl.SourceGenerator/Schema/UnifiedSchema.cs
public sealed record UnifiedSchema(
    IReadOnlyList<UnifiedType> Types);

public sealed record UnifiedType(
    string Namespace,           // "Kubernetes.Dsl.Api.Core.V1"
    string Name,                // "V1Pod"
    string ApiVersion,          // "v1" or "argoproj.io/v1alpha1"
    string Kind,                // "Pod"
    VersionRange Lineage,       // when this type itself existed
    bool IsStorageVersion,      // CRDs only
    IReadOnlyList<UnifiedProperty> Properties);

public sealed record UnifiedProperty(
    string Name,
    UnifiedTypeRef Type,
    bool IsRequired,
    object? Default,
    VersionRange Lineage,       // when this property existed in this type
    DeprecationInfo? Deprecation);

public sealed record VersionRange(
    VersionId Since,
    VersionId? Until);          // null = still present

public sealed record VersionId(
    string Scheme,              // "k8s" or "crd"
    string Bundle,              // "" for k8s, "argo-rollouts" for CRDs
    string Version)             // "1.27" or "v1.7.2"
{
    public override string ToString() => Scheme switch
    {
        "k8s" => Version,                  // "1.27"
        "crd" => $"{Bundle}/{Version}",    // "argo-rollouts/v1.7.2"
        _ => $"{Scheme}:{Bundle}/{Version}"
    };
}

public sealed record DeprecationInfo(VersionId Since, string Reason);

VersionRange carries everything [SinceVersion] and [UntilVersion] need. VersionId.ToString() is the format the attributes use, and the format the [KubernetesBundle] attribute's TargetClusterCompatibility accepts.

The merger algorithm (sketch)

// Kubernetes.Dsl.SourceGenerator/Schema/SchemaVersionMerger.cs
public static class SchemaVersionMerger
{
    public static UnifiedSchema MergeCore(
        IReadOnlyList<ParsedSchema> coreSchemas,
        KubernetesBundleConfig config)
    {
        // Group by (apiVersion, kind), sort each group by k8s minor.
        var byKind = coreSchemas
            .SelectMany(s => EnumerateTypes(s, scheme: "k8s"))
            .GroupBy(t => (t.ApiVersion, t.Kind));

        var merged = new List<UnifiedType>();

        foreach (var group in byKind)
        {
            // VersionId defines no ordering of its own, so sort on the version string.
            var ordered = group.OrderBy(t => t.SourceVersion.Version, NaturalSemver).ToList();
            var first = ordered[0];
            var last = ordered[^1];

            var mergedProps = MergeProperties(ordered);

            merged.Add(new UnifiedType(
                Namespace: NamespaceFor(group.Key.ApiVersion),
                Name: NameFor(group.Key.ApiVersion, group.Key.Kind),
                ApiVersion: group.Key.ApiVersion,
                Kind: group.Key.Kind,
                Lineage: new VersionRange(first.SourceVersion, last.SourceVersion),
                IsStorageVersion: false,
                Properties: mergedProps));
        }

        return new UnifiedSchema(merged);
    }

    public static UnifiedSchema MergeCrds(
        IEnumerable<(string Path, CrdSchemaSlice Slice)> crdSlices,
        KubernetesBundleConfig config)
    {
        // Group by (group, kind, version) to merge across bundle tags.
        // CrdSchemaSlice.Group + Kind + Version come from the envelope walker.
        // The bundle name and tag come from the file path.
        var typed = crdSlices.Select(pair =>
        {
            var (bundleName, tag) = ParseBundlePath(pair.Path);
            var slice = pair.Slice;
            return new
            {
                Slice = slice,
                Source = new VersionId("crd", bundleName, tag),
            };
        }).ToList();

        var byKind = typed.GroupBy(x => (
            ApiGroup: x.Slice.Group,
            Kind: x.Slice.Kind,
            x.Slice.Version));

        var merged = new List<UnifiedType>();

        foreach (var group in byKind)
        {
            var ordered = group.OrderBy(x => x.Source.Version, NaturalSemver).ToList();
            var first = ordered[0];
            var last = ordered[^1];

            var mergedProps = MergeCrdProperties(ordered);

            merged.Add(new UnifiedType(
                Namespace: CrdNamespaceFor(group.Key.ApiGroup),
                Name: NameFor(group.Key.Version, group.Key.Kind),
                ApiVersion: $"{group.Key.ApiGroup}/{group.Key.Version}",
                Kind: group.Key.Kind,
                Lineage: new VersionRange(first.Source, last.Source),
                IsStorageVersion: ordered.Any(x => x.Slice.IsStorageVersion),
                Properties: mergedProps));
        }

        return new UnifiedSchema(merged);
    }

    private static IReadOnlyList<UnifiedProperty> MergeProperties(
        IReadOnlyList<TypedSchema> ordered)
    {
        var byName = new Dictionary<string, List<TypedSchema>>();
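        foreach (var v in ordered)
            foreach (var prop in v.Properties)
            {
                // The indexer throws on a missing key, so look the list up first.
                if (!byName.TryGetValue(prop.Name, out var versions))
                    byName[prop.Name] = versions = new List<TypedSchema>();
                versions.Add(v);
            }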

        var result = new List<UnifiedProperty>();

        foreach (var (name, presence) in byName)
        {
            var firstSeen = presence[0].SourceVersion;
            var lastSeen = presence[^1].SourceVersion;
            var stillPresent = lastSeen.Equals(ordered[^1].SourceVersion);
            var until = stillPresent ? null : (VersionId?)lastSeen;

            // Use the latest schema definition for the type/required/default,
            // since a later version is the truth.
            var latest = presence[^1].PropertyByName(name);

            result.Add(new UnifiedProperty(
                Name: name,
                Type: latest.TypeRef,
                IsRequired: latest.IsRequired,
                Default: latest.Default,
                Lineage: new VersionRange(firstSeen, until),
                Deprecation: DetectDeprecation(presence, latest)));
        }

        return result;
    }
}

The actual code is longer (it handles oneOf/allOf lineage, type changes between versions, default-value changes, and the nullable: true flip), but the shape is the same as the GitLab.Ci.Yaml.SourceGenerator.SchemaVersionMerger it's adapted from. The reuse here is genuine: the merger doesn't care whether the lineage units are GitLab versions, K8s minors, or CRD bundle tags. It only cares that they sort.
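Since sorting is the one thing the merger relies on, here is a minimal sketch of what a comparer like the NaturalSemver used in the listing might look like. `NaturalVersionComparer` is a hypothetical stand-in, not the project's implementation: it compares numeric parts left to right, strips a leading "v", and ignores pre-release and build suffixes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the NaturalSemver comparer named in the sketch.
public sealed class NaturalVersionComparer : IComparer<string>
{
    public static readonly NaturalVersionComparer Instance = new NaturalVersionComparer();

    public int Compare(string? x, string? y)
    {
        int[] a = Parts(x), b = Parts(y);
        for (int i = 0; i < Math.Max(a.Length, b.Length); i++)
        {
            int left = i < a.Length ? a[i] : 0;   // missing parts count as 0
            int right = i < b.Length ? b[i] : 0;
            if (left != right) return left.CompareTo(right);
        }
        return 0;
    }

    // "v1.7.2" -> [1, 7, 2]; "1.27" -> [1, 27]. Suffixes after '-' or '+' are dropped.
    private static int[] Parts(string? version) =>
        (version ?? "").TrimStart('v', 'V')
            .Split('-', '+')[0]
            .Split(new[] { '.' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(p => int.TryParse(p, out var n) ? n : 0)
            .ToArray();
}
```

A plain string sort would place "1.10" before "1.9"; comparing the numeric parts avoids that, which matters as soon as a bundle reaches a two-digit minor or patch.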

Generated annotations

For the V1Pod.Spec.Containers[].Lifecycle.PreStop example above, the merger produces:

// V1Container.g.cs (excerpt, generated)
public sealed partial class V1Container
{
    [SinceVersion("1.27")]
    public V1LifecycleHandler? Lifecycle { get; set; }
    // ...
}

public sealed partial class V1LifecycleHandler
{
    [SinceVersion("1.27")]
    public V1ExecAction? Exec { get; set; }

    [SinceVersion("1.28")]
    public V1HttpGetAction? HttpGet { get; set; }

    [SinceVersion("1.29")]
    public V1TcpSocketAction? TcpSocket { get; set; }

    [SinceVersion("1.29")]
    public V1SleepAction? Sleep { get; set; }
}

For a CRD example — say a property that was renamed in argo-rollouts/v1.7.0:

// V1Alpha1RolloutSpec.g.cs (excerpt, generated)
public sealed partial class V1Alpha1RolloutSpec
{
    [SinceVersion("argo-rollouts/v1.5.0")]
    [UntilVersion("argo-rollouts/v1.6.999")]
    [Deprecated("Renamed to ProgressDeadlineSeconds in argo-rollouts/v1.7.0")]
    public int? ProgressDeadlineAbortSeconds { get; set; }

    [SinceVersion("argo-rollouts/v1.7.0")]
    public int? ProgressDeadlineSeconds { get; set; }
}

The [Deprecated] carries human-readable context lifted from the upstream CRD's schema description (when present) or auto-generated from the lineage.
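The mapping from lineage to attributes can be sketched as a small function. `AttributeSketch` is hypothetical, the fallback message wording is invented for illustration, and the sketch assumes that every property with an Until also gets a [Deprecated], which the text does not state explicitly.

```csharp
using System.Collections.Generic;

// Hypothetical illustration of lineage-to-attribute emission.
public static class AttributeSketch
{
    // since/until are the VersionId strings; reason is the upstream description, if any.
    public static IReadOnlyList<string> For(string since, string? until, string? reason = null)
    {
        var lines = new List<string> { $"[SinceVersion(\"{since}\")]" };
        if (until is not null)
        {
            // Fall back to a message derived from the lineage when no reason is supplied.
            string message = reason ?? $"Not present after {until}";
            lines.Add($"[UntilVersion(\"{until}\")]");
            lines.Add($"[Deprecated(\"{message}\")]");
        }
        return lines;
    }
}
```

A property that is still present produces a single [SinceVersion] line; a removed one produces all three.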

The [StorageVersion] flag

CRDs declare a "storage version" — the version the API server persists to etcd. Multiple served: true versions can exist simultaneously (the API server handles conversion via webhooks), but only one is the storage version. The merger lifts this flag from the envelope:

// Kubernetes.Dsl.Crds.ArgoProj.V1Alpha1.V1Alpha1Rollout (generated)
[KubernetesResource(ApiVersion = "argoproj.io/v1alpha1", Kind = "Rollout")]
[CustomResourceDefinition("rollouts.argoproj.io")]
[StorageVersion]
public sealed partial class V1Alpha1Rollout : IKubernetesObject<V1ObjectMeta>
{
    public V1ObjectMeta Metadata { get; set; } = new();
    public V1Alpha1RolloutSpec Spec { get; set; } = new();
}

KUB082 (a CRD analyzer in Part 11) flags any code that constructs a non-storage-version variant for a write operation that needs to persist — which is most operations. Storage version selection becomes a compile-time concern instead of a footgun.
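For reference, the flag lives in the CRD manifest under spec.versions[], where each entry carries name, served, and storage. The sketch below finds the storage version in an already-parsed manifest; `StorageVersionProbe` is a hypothetical helper and is not the envelope walker from the earlier part.

```csharp
using System.Linq;
using System.Text.Json.Nodes;

// Hypothetical helper, not the chapter's envelope walker.
public static class StorageVersionProbe
{
    // Returns the name of the version marked "storage": true, or null if none is.
    public static string? Find(JsonNode? crd)
    {
        if (crd is not JsonObject root) return null;
        if (root["spec"] is not JsonObject spec) return null;
        if (spec["versions"] is not JsonArray versions) return null;

        return versions
            .OfType<JsonObject>()
            .Where(v => v["storage"] is JsonValue flag
                        && flag.TryGetValue<bool>(out var isStorage)
                        && isStorage)
            .Select(v => v["name"]?.GetValue<string>())
            .FirstOrDefault();
    }
}
```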

The [KubernetesBundle].TargetClusterCompatibility story

Every [SinceVersion] and [UntilVersion] annotation feeds KUB020, the deprecated-apiVersion analyzer. The user declares which clusters they target on their assembly:

[assembly: KubernetesBundle(
    Groups = "core/v1, apps/v1, networking.k8s.io/v1",
    KubernetesVersion = "1.31",
    Crds = new[] { "argo-rollouts" },
    TargetClusterCompatibility = new[] { "1.27", "1.30", "1.31", "argo-rollouts/v1.6.0", "argo-rollouts/v1.7.2" })]

The analyzer (Part 11) cross-references every property access against this compatibility set:

| User code | Analyzer behavior |
| --- | --- |
| pod.Spec.Containers[0].Lifecycle.Sleep = ... (since 1.29) | KUB020 warning: "property requires k8s 1.29+, but target compatibility includes 1.27" |
| rollout.Spec.ProgressDeadlineAbortSeconds = ... (until argo-rollouts/v1.6.999) | KUB020 warning: "property removed in argo-rollouts/v1.7.0, but target compatibility includes argo-rollouts/v1.7.2" |
| pod.Spec.Containers[0].Image = ... (always present) | OK |

The user can suppress on a per-call basis or fix the code. Either way, the drift becomes visible at compile time instead of breaking on apply.
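The cross-reference itself is a range check restricted to one version space. The sketch below is a simplified illustration, not the KUB020 implementation: `CompatibilitySketch` is hypothetical, Until is treated as inclusive (the last version in which the property exists), and versions are compared by their numeric parts.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration of the compatibility check.
public static class CompatibilitySketch
{
    // Returns the targets from the same version space that fall outside [since, until].
    public static IReadOnlyList<string> Violations(
        string since, string? until, IEnumerable<string> targets)
    {
        var (space, sinceParts) = Parse(since);
        int[]? untilParts = until is null ? null : Parse(until).Parts;

        return targets
            .Where(t => Parse(t).Space == space)   // ignore other bundles and core K8s
            .Where(t => Compare(Parse(t).Parts, sinceParts) < 0
                     || (untilParts is not null && Compare(Parse(t).Parts, untilParts) > 0))
            .ToList();
    }

    // "1.27" -> ("", [1, 27]); "argo-rollouts/v1.7.2" -> ("argo-rollouts", [1, 7, 2]).
    private static (string Space, int[] Parts) Parse(string id)
    {
        int slash = id.IndexOf('/');
        string space = slash < 0 ? "" : id.Substring(0, slash);
        string version = (slash < 0 ? id : id.Substring(slash + 1)).TrimStart('v');
        int[] parts = version.Split('.').Select(p => int.TryParse(p, out var n) ? n : 0).ToArray();
        return (space, parts);
    }

    private static int Compare(int[] a, int[] b)
    {
        for (int i = 0; i < Math.Max(a.Length, b.Length); i++)
        {
            int x = i < a.Length ? a[i] : 0;
            int y = i < b.Length ? b[i] : 0;
            if (x != y) return x.CompareTo(y);
        }
        return 0;
    }
}
```

Run against the compatibility set declared above, a property with Since 1.29 reports 1.27 as the offending target, and a property with Until argo-rollouts/v1.6.999 reports argo-rollouts/v1.7.2, matching the two warnings in the table.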

Performance

The merger is the most expensive single phase of the source generator (SG). Mitigations:

  1. Group-by once. All input schemas are grouped by (apiVersion, kind) (or (group, kind, version) for CRDs) in one pass.
  2. Per-type caching. Types whose underlying schemas haven't changed across builds are reused from the SG's incremental cache.
  3. Lazy property merging. MergeProperties only walks the properties of types actually requested by the user's [KubernetesBundle].Groups filter.
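The per-type caching idea can be illustrated with an explicit content hash. This is not how the SG's incremental cache is implemented (the text does not describe its internals); `MergeCache` is a hypothetical class that only shows the principle of skipping the merge when a type's input schemas are unchanged.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Hypothetical illustration of content-keyed reuse of merge results.
public sealed class MergeCache<TResult>
{
    private readonly Dictionary<string, TResult> _byHash = new Dictionary<string, TResult>();

    // Number of times the merge delegate actually ran.
    public int Misses { get; private set; }

    // Reuses the previous result when the input schema texts are unchanged.
    public TResult GetOrMerge(IEnumerable<string> schemaTexts, Func<TResult> merge)
    {
        string key = Hash(string.Join("\n", schemaTexts));
        if (_byHash.TryGetValue(key, out var cached))
            return cached;

        Misses++;
        return _byHash[key] = merge();
    }

    private static string Hash(string text)
    {
        using var sha = SHA256.Create();
        return Convert.ToBase64String(sha.ComputeHash(Encoding.UTF8.GetBytes(text)));
    }
}
```

Calling GetOrMerge twice with identical schema texts runs the merge delegate once; any change to any input text produces a new key and a fresh merge.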

For the v0.1 slice (~80 types × 5 K8s minors = ~400 type-versions), the merger runs in well under 100 ms in cold-start scenarios. For v0.5 (full ~600 types × 5 minors + ~150 CRD types × ~3 tags = ~3450 type-versions), it's a few hundred milliseconds. Part 8 has the actual benchmark numbers.


