
Schema Acquisition

Schemas are not downloaded at build time. They are downloaded once, checked into git, then read locally on every subsequent build. This chapter walks through where each schema comes from upstream, the refresh CLI that fetches them, and the _sources.json audit trail.

The thesis: schemas are content, not dependencies

A NuGet package versions code. A schemas/ directory in the repo versions data. The data, K8s OpenAPI dumps and CRD YAML, is small (~5 MB for the v0.1 slice, ~40 MB at v0.5 with full multi-version coverage), readable, diffable, and pinnable by SHA-256. There is no value in fetching it on every CI run. There is significant value in seeing the diff in a PR when somebody bumps a CRD bundle from v1.7.0 to v1.7.2.

Diagram: Schemas are repo content, not dependencies. The network is touched on demand by the downloader, never by the build.

Refreshing schemas is rare (a K8s minor roughly every four months, a CRD bundle bump on demand). Building from existing schemas is what nearly every dotnet build invocation does.

Source URLs

| Asset | Upstream URL pattern | Format |
| --- | --- | --- |
| Core OpenAPI v3, per API group | https://raw.githubusercontent.com/kubernetes/kubernetes/release-{minor}/api/openapi-spec/v3/{group-path}__{version}_openapi.json | JSON |
| Argo Rollouts CRDs | https://raw.githubusercontent.com/argoproj/argo-rollouts/{tag}/manifests/crds/rollout-crd.yaml | YAML |
| Prometheus Operator CRDs | https://github.com/prometheus-operator/prometheus-operator/raw/{tag}/example/prometheus-operator-crd/* | YAML |
| KEDA CRDs | https://github.com/kedacore/keda/raw/{tag}/config/crd/bases/* | YAML |
| cert-manager CRDs | https://github.com/cert-manager/cert-manager/releases/download/{tag}/cert-manager.crds.yaml | YAML |
| Gatekeeper CRDs | https://github.com/open-policy-agent/gatekeeper/raw/{tag}/config/crd/bases/* | YAML |
| Istio security CRDs | https://raw.githubusercontent.com/istio/istio/{tag}/manifests/charts/base/files/crd-all.gen.yaml | YAML |
| Litmus Chaos CRDs | https://github.com/litmuschaos/chaos-operator/raw/{tag}/deploy/crds/* | YAML |

In the core pattern, slashes in the group path become double underscores, so the core group's v1 file is api__v1_openapi.json.

All upstreams are Apache-2.0 licensed. The downloader writes a LICENSE-NOTICES file alongside the schemas with attribution.
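
As an illustration of how the core URL pattern expands, here is a minimal sketch. CoreSchemaUrls, Build, and LocalFileName are hypothetical names, not part of the downloader's documented surface. The sketch assumes upstream file names join the group path and version with double underscores (as in api__v1_openapi.json) and that local file names join them with dots (as in apis.apps.v1.json).

```csharp
using System;

public static class CoreSchemaUrls
{
    // Hypothetical helper: expands the core OpenAPI URL pattern.
    // groupPath is "api" for the core group, or e.g. "apis/apps" for a named group.
    public static Uri Build(string minor, string groupPath, string version)
    {
        // Assumption: upstream replaces '/' with "__" in the file name.
        var stem = (groupPath + "/" + version).Replace("/", "__");
        return new Uri(
            "https://raw.githubusercontent.com/kubernetes/kubernetes/" +
            $"release-{minor}/api/openapi-spec/v3/{stem}_openapi.json");
    }

    // Hypothetical helper: the checked-in file name, with '/' mapped to '.'.
    public static string LocalFileName(string groupPath, string version) =>
        (groupPath + "/" + version).Replace('/', '.') + ".json";
}
```

With these assumptions, Build("1.31", "api", "v1") yields the release-1.31 URL ending in api__v1_openapi.json, and LocalFileName("apis/apps", "v1") yields apis.apps.v1.json.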

Each schema is stored on disk in its native upstream format. Core K8s OpenAPI dumps are JSON because kubernetes/kubernetes ships them as JSON. CRD bundles are YAML because every CRD project ships them as YAML. There is no normalization step; the format dispatcher in the source generator (SG, Part 4) handles both.
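
Part 4 covers the real dispatcher. To make the idea concrete, the following is only a sketch of extension-based dispatch; SchemaFormat and SchemaFormats.FromPath are hypothetical names chosen for this example.

```csharp
using System;
using System.IO;

public enum SchemaFormat { Json, Yaml }

public static class SchemaFormats
{
    // Hypothetical dispatcher: chooses a reader from the file extension only,
    // never by sniffing the file content.
    public static SchemaFormat FromPath(string path)
    {
        var ext = Path.GetExtension(path).ToLowerInvariant();
        switch (ext)
        {
            case ".json":
                return SchemaFormat.Json;
            case ".yaml":
            case ".yml":
                return SchemaFormat.Yaml;
            default:
                throw new NotSupportedException(
                    $"Unsupported schema extension '{ext}' for {path}");
        }
    }
}
```

Failing loudly on an unknown extension is a design choice of this sketch: a stray file under schemas/ surfaces as an error instead of being silently skipped.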

The downloader CLI + library

Kubernetes.Dsl.Schemas.Downloader is a small netstandard2.0 library. Kubernetes.Dsl.Design is the console front-end that calls it. Neither is distributed as a NuGet package — they're run from source by maintainers refreshing schemas.

// Kubernetes.Dsl.Schemas.Downloader public surface
public interface ISchemaDownloader
{
    Task<DownloadResult> DownloadCoreAsync(KubernetesMinor minor, string outputDir, CancellationToken ct);
    Task<DownloadResult> DownloadCrdBundleAsync(CrdBundleSpec bundle, string outputDir, CancellationToken ct);
}

public sealed record CrdBundleSpec(string Name, string Tag, IReadOnlyList<Uri> CrdUrls);

public sealed record DownloadResult(
    IReadOnlyList<string> WrittenFiles,
    string Sha256,
    Uri SourceUri,
    string LicenseSpdx);

The CRD download path writes the YAML as-is. No parsing. No transformation. SHA-256 is computed over the raw bytes:

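public async Task<DownloadResult> DownloadCrdBundleAsync(
    CrdBundleSpec bundle, string outputDir, CancellationToken ct)
{
    if (bundle.CrdUrls.Count == 0)
        throw new ArgumentException("Bundle lists no CRD URLs.", nameof(bundle));

    var written = new List<string>();

    foreach (var url in bundle.CrdUrls)
    {
        // netstandard2.0 has no GetByteArrayAsync overload that takes a
        // CancellationToken, so go through GetAsync to stay cancellable.
        using var response = await _http.GetAsync(url, ct);
        response.EnsureSuccessStatusCode();
        var bytes = await response.Content.ReadAsByteArrayAsync();

        var fileName = Path.GetFileName(url.LocalPath); // e.g., rollout-crd.yaml
        var targetPath = Path.Combine(outputDir, bundle.Name, bundle.Tag, fileName);
        Directory.CreateDirectory(Path.GetDirectoryName(targetPath)!);

        // Raw bytes, written as-is (File.WriteAllBytesAsync is not in netstandard2.0).
        File.WriteAllBytes(targetPath, bytes);
        written.Add(targetPath);
    }

    var sha256 = ComputeBundleSha256(written);
    UpdateSourcesJson(bundle, written, sha256);
    return new DownloadResult(written, sha256, bundle.CrdUrls[0], "Apache-2.0");
}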
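
The ComputeBundleSha256 helper is called above but not shown. One plausible shape, offered as a sketch and not as the project's actual implementation, feeds the raw bytes of each file into a single SHA-256 in a stable order. The ordinal sort and the plain concatenation of file contents are assumptions of this example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public static class BundleHash
{
    // Hypothetical implementation: one digest over the raw bytes of every file.
    public static string ComputeBundleSha256(IEnumerable<string> files)
    {
        using (var sha = IncrementalHash.CreateHash(HashAlgorithmName.SHA256))
        {
            // Sort by ordinal path so the digest does not depend on download order.
            foreach (var file in files.OrderBy(f => f, StringComparer.Ordinal))
                sha.AppendData(File.ReadAllBytes(file));

            var digest = sha.GetHashAndReset();
            var hex = new StringBuilder(digest.Length * 2);
            foreach (var b in digest)
                hex.Append(b.ToString("x2")); // lowercase hex
            return hex.ToString();
        }
    }
}
```

Because the bytes are hashed exactly as downloaded, a whitespace-only edit to a checked-in CRD changes the digest, which is what lets a later verification step flag hand edits.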

CLI usage

# Add a new K8s minor
dotnet run --project Kubernetes.Dsl.Design -- fetch --k8s 1.32

# Add a CRD bundle pinned to a tag
dotnet run --project Kubernetes.Dsl.Design -- fetch --crd argo-rollouts@v1.7.2

# Add multiple tags of the same CRD bundle (for multi-version Since/Until annotations)
dotnet run --project Kubernetes.Dsl.Design -- fetch --crd argo-rollouts@v1.5.0,v1.6.0,v1.7.0,v1.7.2

# Refresh everything in _sources.json
dotnet run --project Kubernetes.Dsl.Design -- fetch --all

# Verify checksums of checked-in schemas (CI-friendly, no network)
dotnet run --project Kubernetes.Dsl.Design -- verify

# Verify + check that upstream URLs still resolve (HEAD requests)
dotnet run --project Kubernetes.Dsl.Design -- verify --check-upstream

# Add an in-house CRD (no URL, no checksum enforcement)
dotnet run --project Kubernetes.Dsl.Design -- fetch --crd-file ./acme-widget-crd.yaml
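
The --crd argument packs a bundle name and one or more tags into a single token. How the CLI actually parses it is not shown here; the following sketch, with the hypothetical type CrdFetchRequest, illustrates one way to split name@tag[,tag...] under the assumption that duplicate or empty tags should be dropped.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class CrdFetchRequest
{
    public CrdFetchRequest(string name, IReadOnlyList<string> tags)
    {
        Name = name;
        Tags = tags;
    }

    public string Name { get; }
    public IReadOnlyList<string> Tags { get; }

    // Hypothetical parser for "argo-rollouts@v1.5.0,v1.6.0".
    public static CrdFetchRequest Parse(string arg)
    {
        var at = arg.IndexOf('@');
        if (at <= 0 || at == arg.Length - 1)
            throw new FormatException($"Expected name@tag[,tag...], got '{arg}'.");

        var tags = arg.Substring(at + 1)
            .Split(',')
            .Select(t => t.Trim())
            .Where(t => t.Length > 0)
            .Distinct(StringComparer.Ordinal)
            .ToList();

        if (tags.Count == 0)
            throw new FormatException($"No tags found in '{arg}'.");

        return new CrdFetchRequest(arg.Substring(0, at), tags);
    }
}
```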

Refresh sequence

Diagram: Refreshing schemas is an explicit, developer-driven ritual: one HTTPS round-trip per API group, then git commit, then every future build is offline.

The _sources.json audit trail

Every fetched file gets a row in schemas/_sources.json with its URL, tag, SHA-256, license, fetch date, kind, and format. CRD rows also record the served and storage versions:

{
  "schemas/k8s/1.31/api.v1.json": {
    "url":     "https://raw.githubusercontent.com/kubernetes/kubernetes/release-1.31/api/openapi-spec/v3/api__v1_openapi.json",
    "tag":     "release-1.31",
    "sha256":  "9f1c3a8b...c2",
    "license": "Apache-2.0",
    "fetched": "2026-04-07",
    "kind":    "openapi-v3",
    "format":  "json"
  },
  "schemas/crds/argo-rollouts/v1.7.2/rollout-crd.yaml": {
    "url":     "https://raw.githubusercontent.com/argoproj/argo-rollouts/v1.7.2/manifests/crds/rollout-crd.yaml",
    "tag":     "v1.7.2",
    "sha256":  "1de8a9f4...44",
    "license": "Apache-2.0",
    "fetched": "2026-04-07",
    "kind":    "crd-yaml",
    "format":  "yaml",
    "served":  ["v1alpha1"],
    "storage": "v1alpha1"
  },
  "schemas/crds/local/acme-widget-crd.yaml": {
    "local":   true,
    "kind":    "crd-yaml",
    "format":  "yaml",
    "served":  ["v1"],
    "storage": "v1"
  }
}

The format field is informational only — the SG dispatches on file extension (Part 4). The served and storage fields exist for CRDs and are populated at fetch time by reading the YAML envelope once.

The local: true shortcut bypasses URL and SHA-256 enforcement. In-house CRDs that you maintain in your own repo don't need upstream tracking.
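
For readers who want to consume _sources.json from code, the rows map naturally onto a small model. This is a sketch only: SourceEntry and SourcesFile are hypothetical names, and the use of System.Text.Json is an assumption (on netstandard2.0 it would have to come from the NuGet package).

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

public sealed class SourceEntry
{
    [JsonPropertyName("url")] public string Url { get; set; }
    [JsonPropertyName("tag")] public string Tag { get; set; }
    [JsonPropertyName("sha256")] public string Sha256 { get; set; }
    [JsonPropertyName("license")] public string License { get; set; }
    [JsonPropertyName("fetched")] public string Fetched { get; set; }
    [JsonPropertyName("kind")] public string Kind { get; set; }
    [JsonPropertyName("format")] public string Format { get; set; }
    [JsonPropertyName("local")] public bool Local { get; set; }
    [JsonPropertyName("served")] public List<string> Served { get; set; }
    [JsonPropertyName("storage")] public string Storage { get; set; }

    // Local rows bypass URL and SHA-256 enforcement; fetched rows do not.
    public bool RequiresChecksum => !Local;
}

public static class SourcesFile
{
    // Keys are repo-relative schema paths, values are the per-file rows.
    public static Dictionary<string, SourceEntry> Parse(string json) =>
        JsonSerializer.Deserialize<Dictionary<string, SourceEntry>>(json)
        ?? new Dictionary<string, SourceEntry>();
}
```

Fields that a row omits (for example url on a local entry) simply stay null, and a missing local field deserializes to false.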

CI verification

By default, the verify command needs no network access, which makes it safe to run in CI. It checks:

| Check | Failure mode | Exit code |
| --- | --- | --- |
| Every file in schemas/**/*.json and schemas/**/*.yaml has an entry in _sources.json | "Untracked schema file: schemas/k8s/1.31/foo.json" | 2 |
| Every entry in _sources.json exists on disk | "Missing schema: schemas/k8s/1.31/api.v1.json (declared in _sources.json)" | 2 |
| SHA-256 of each file matches _sources.json | "Checksum mismatch: schemas/k8s/1.31/api.v1.json (someone hand-edited?)" | 3 |
| _sources.json itself is valid JSON with required fields | "Malformed _sources.json: missing 'license' field" | 4 |
| Every .yaml/.yml CRD file is parseable by YamlDotNet | "Invalid YAML: schemas/crds/argo-rollouts/v1.7.2/rollout-crd.yaml at line 47 col 12" | 5 |
| Every CRD YAML has at least one served: true version with schema.openAPIV3Schema set | "No served versions with schema in schemas/crds/argo-rollouts/v1.7.2/rollout-crd.yaml" | 6 |
| (opt-in via --check-upstream) URLs still resolve (HEAD 200) | "Upstream gone: https://...rollout-crd.yaml (404)" | 7 |

The optional --check-upstream flag is for a scheduled job that catches link rot. The default verify is a hermetic check that runs in seconds.
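
The first three checks in the table can be sketched without any network or YAML dependency. SchemaVerifier below is a hypothetical illustration, not the tool's source. It assumes that _sources.json itself is excluded from the tracked-file scan, that local rows are passed with a null checksum, and that the first failure decides the exit code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

public static class SchemaVerifier
{
    // expected: repo-relative path -> hex SHA-256, or null for local rows.
    // Returns 0 on success, otherwise the exit code from the table above.
    public static int Verify(
        string repoRoot,
        IReadOnlyDictionary<string, string> expected,
        Action<string> report)
    {
        var root = Path.GetFullPath(repoRoot);
        var schemasDir = Path.Combine(root, "schemas");

        var onDisk = Directory.Exists(schemasDir)
            ? Directory.EnumerateFiles(schemasDir, "*", SearchOption.AllDirectories)
                .Where(IsSchemaFile)
                .Select(f => ToRepoPath(root, f))
                .ToList()
            : new List<string>();

        foreach (var path in onDisk)
        {
            if (!expected.ContainsKey(path))
            {
                report($"Untracked schema file: {path}");
                return 2;
            }
        }

        foreach (var entry in expected)
        {
            var full = Path.Combine(root, entry.Key.Replace('/', Path.DirectorySeparatorChar));
            if (!File.Exists(full))
            {
                report($"Missing schema: {entry.Key} (declared in _sources.json)");
                return 2;
            }
            if (entry.Value != null &&
                !string.Equals(Sha256Hex(full), entry.Value, StringComparison.OrdinalIgnoreCase))
            {
                report($"Checksum mismatch: {entry.Key} (someone hand-edited?)");
                return 3;
            }
        }
        return 0;
    }

    private static bool IsSchemaFile(string path)
    {
        // Assumption: the audit file does not track itself.
        if (Path.GetFileName(path) == "_sources.json") return false;
        var ext = Path.GetExtension(path).ToLowerInvariant();
        return ext == ".json" || ext == ".yaml" || ext == ".yml";
    }

    private static string ToRepoPath(string root, string fullPath) =>
        fullPath.Substring(root.Length)
            .TrimStart(Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar)
            .Replace('\\', '/');

    private static string Sha256Hex(string path)
    {
        using (var sha = SHA256.Create())
        using (var stream = File.OpenRead(path))
            return BitConverter.ToString(sha.ComputeHash(stream)).Replace("-", "").ToLowerInvariant();
    }
}
```

Everything here reads only the working tree, which is what keeps the default verify hermetic. A real tool might prefer to report every failure before exiting; stopping at the first one is a simplification of this sketch.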

Checked-in repo layout

schemas/                                            (committed to git, NOT in obj/)
├── _sources.json
├── LICENSE-NOTICES
├── k8s/
│   ├── 1.27/
│   │   ├── api.v1.json
│   │   ├── apis.apps.v1.json
│   │   ├── apis.networking.k8s.io.v1.json
│   │   ├── apis.rbac.authorization.k8s.io.v1.json
│   │   ├── apis.batch.v1.json
│   │   ├── apis.autoscaling.v2.json
│   │   └── ...
│   ├── 1.28/ ...
│   ├── 1.29/ ...
│   ├── 1.30/ ...
│   └── 1.31/ ...
└── crds/
    ├── argo-rollouts/
    │   ├── v1.5.0/rollout-crd.yaml
    │   ├── v1.6.0/rollout-crd.yaml
    │   ├── v1.7.0/rollout-crd.yaml
    │   └── v1.7.2/rollout-crd.yaml
    ├── prometheus-operator/v0.75.0/servicemonitor.yaml
    ├── prometheus-operator/v0.75.0/prometheusrule.yaml
    ├── keda/v2.14.0/scaledobject.yaml
    ├── cert-manager/v1.15.0/certificate.yaml
    ├── gatekeeper/v3.16.0/constrainttemplate.yaml
    ├── istio/v1.22.0/peerauthentication.yaml
    └── litmus/v3.10.0/chaosengine.yaml

Refresh workflow summary

| Trigger | Command | Touches network? | Touches cluster? |
| --- | --- | --- | --- |
| Bump K8s minor | dotnet run --project Kubernetes.Dsl.Design -- fetch --k8s 1.32 | yes (github.com) | no |
| Add CRD bundle | dotnet run --project Kubernetes.Dsl.Design -- fetch --crd argo-rollouts@v1.7.2 | yes (github.com) | no |
| Refresh everything | dotnet run --project Kubernetes.Dsl.Design -- fetch --all | yes (github.com) | no |
| Verify checksums (CI) | dotnet run --project Kubernetes.Dsl.Design -- verify | no | no |
| Normal dotnet build | (nothing) | no | no |

In-house CRDs

Real teams have private CRDs. The plan: drop the CRD YAML in schemas/crds/local/ (or any directory), add a local: true row to _sources.json, and the SG picks it up like any other CRD. The downloader has a helper:

dotnet run --project Kubernetes.Dsl.Design -- fetch --crd-file ./acme-widget-crd.yaml

This copies the file into schemas/crds/local/, computes its SHA-256, adds the row, and lets the SG ingest it on the next build. Same [SinceVersion]/[UntilVersion] story as upstream CRDs (Part 5) — drop multiple versions in schemas/crds/local/{tag}/ and the merger annotates per-property changes.

