Part 09: requirements-behavioral-check

What it reifies

@frenchexdev/requirements-behavioral-check is the only Tier-1 sibling whose claim is behavioural, not structural. The other analyzers walk the AST; this one runs the code. Specifically, it mutates the code, runs the tests, and reports whether the mutation was killed (the tests caught the bug) or survived (the tests did not catch the bug). The package is the engine behind a single but consequential question: given a passing test, would it actually fail if the code it claims to verify were broken?

The maintainer's position on why this matters is recorded in project_behavioral_grounding_requirements.md: AST analysis can confirm that a @Verifies method exists and contains an expect call, but only mutation testing can confirm that the expect is checking the right invariant against the right input. Static analysis moves the declarative lie one level deeper; mutation breaks the chain.

The package's package.json description: "Behavioral-check via mutation testing — orchestrates Stryker scoped per Feature.AC. Three pure layers (plan / build-config / classify-verdict)."

The public surface

The src/index.ts is one line, re-exporting the single capability module:

export * from './behavioral-check';

The module exposes three pure layers that compose into a behavioural run. The plan layer:

import type { BindingsManifest, SymbolTarget } from '@frenchexdev/requirements-scanner';

export interface MutationPlan {
  featureId: string;
  acName: string;
  mutate: SymbolTarget[];
  testFiles: string[];
  testFilter: string;
}

export declare function planBehavioralCheck(
  manifest: BindingsManifest,
  selection?: { feature?: string; ac?: string },
): MutationPlan[];

planBehavioralCheck is the pure planner. It takes the scanner's manifest and produces one MutationPlan per Feature.AC pair: which symbols to mutate (the ones the test method actually calls), which test files to run against the mutated code, and which test filter to pass to Stryker so only the relevant tests run. No I/O, no Stryker, no surprise.
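The grouping the planner performs can be sketched as follows. This is a minimal illustration, not the package's implementation: the shapes of SymbolTarget and BindingsManifest are not shown in this part, so the Binding record, the file and symbol fields, and the rule that joins escaped test names into a single filter are all assumptions made for the example.

```typescript
// Hypothetical shapes: the real SymbolTarget and BindingsManifest come from
// @frenchexdev/requirements-scanner and may look different.
interface SymbolTarget { file: string; symbol: string }
interface Binding {
  featureId: string;
  acName: string;
  testFile: string;
  testName: string;
  calls: SymbolTarget[]; // symbols the test method calls
}
interface BindingsManifest { bindings: Binding[] }

interface MutationPlan {
  featureId: string;
  acName: string;
  mutate: SymbolTarget[];
  testFiles: string[];
  testFilter: string;
}

// Escape regex metacharacters so test names can be joined into one pattern.
const escapeRegex = (s: string): string => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

function planSketch(
  manifest: BindingsManifest,
  selection: { feature?: string; ac?: string } = {},
): MutationPlan[] {
  const groups: Record<string, { plan: MutationPlan; names: string[] }> = {};
  for (const b of manifest.bindings) {
    // Apply the optional scoping before doing any grouping work.
    if (selection.feature && b.featureId !== selection.feature) continue;
    if (selection.ac && b.acName !== selection.ac) continue;
    const key = `${b.featureId}.${b.acName}`;
    let g = groups[key];
    if (!g) {
      g = {
        plan: { featureId: b.featureId, acName: b.acName, mutate: [], testFiles: [], testFilter: '' },
        names: [],
      };
      groups[key] = g;
    }
    // Deduplicate targets, files and test names within one Feature.AC pair.
    for (const t of b.calls) {
      if (!g.plan.mutate.some((m) => m.file === t.file && m.symbol === t.symbol)) g.plan.mutate.push(t);
    }
    if (g.plan.testFiles.indexOf(b.testFile) === -1) g.plan.testFiles.push(b.testFile);
    if (g.names.indexOf(b.testName) === -1) g.names.push(b.testName);
  }
  return Object.keys(groups).map((key) => ({
    ...groups[key].plan,
    testFilter: groups[key].names.map(escapeRegex).join('|'),
  }));
}

// Usage with made-up file and test names.
const demoPlans = planSketch(
  {
    bindings: [
      { featureId: 'COMPLIANCE-CORE', acName: 'orphanDetectionFiresOnUnboundFeature', testFile: 'test/compliance.test.ts', testName: 'fires on unbound feature', calls: [{ file: 'src/compliance.ts', symbol: 'detectOrphans' }] },
      { featureId: 'COMPLIANCE-CORE', acName: 'orphanDetectionFiresOnUnboundFeature', testFile: 'test/compliance.test.ts', testName: 'ignores bound feature (edge)', calls: [{ file: 'src/compliance.ts', symbol: 'detectOrphans' }] },
      { featureId: 'OTHER', acName: 'somethingElse', testFile: 'test/other.test.ts', testName: 'works', calls: [] },
    ],
  },
  { feature: 'COMPLIANCE-CORE' },
);
```

Because the sketch only reads its input and returns fresh objects, it can be unit-tested with a literal manifest and no Stryker installation, which is the property the plan layer is described as having.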

The build-config layer:

import type { StrykerOptions } from '@stryker-mutator/api/core';

export declare function buildStrykerConfig(
  plan: MutationPlan,
  baseConfig: Partial<StrykerOptions>,
): StrykerOptions;

buildStrykerConfig takes a plan and a base Stryker config (usually stryker.conf.mjs for the package) and produces the per-Feature.AC config object the runner consumes. Still pure — config in, config out.
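A minimal sketch of the "config in, config out" idea, under stated assumptions: SymbolTarget is given a hypothetical file and line-range shape, and only Stryker's mutate option is enriched, using the file:startLine-endLine range form that option accepts. How the real package routes testFiles and testFilter to the runner is not shown in this part, so the sketch leaves them out rather than guess at an option name.

```typescript
// Hypothetical shapes for illustration; the real types come from
// @frenchexdev/requirements-scanner and @stryker-mutator/api/core.
interface SymbolTarget { file: string; startLine: number; endLine: number }
interface MutationPlan {
  featureId: string;
  acName: string;
  mutate: SymbolTarget[];
  testFiles: string[];
  testFilter: string;
}
// Only the one option this sketch touches is typed explicitly.
interface StrykerLikeOptions { mutate?: string[]; [key: string]: unknown }

function buildConfigSketch(plan: MutationPlan, base: StrykerLikeOptions): StrykerLikeOptions {
  // Narrow the mutation scope to the line ranges of the planned symbols.
  const mutate = plan.mutate.map((t) => `${t.file}:${t.startLine}-${t.endLine}`);
  // Spread into a new object so the caller's base config is never modified.
  return { ...base, mutate };
}
```

Returning a fresh object per plan means one base config can be reused across every Feature.AC pair without one plan's scope leaking into the next.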

The classify-verdict layer:

import type { MutantResult } from '@stryker-mutator/api/core';

export type Verdict = 'killed' | 'survived' | 'timed-out' | 'no-coverage' | 'inconclusive';

export interface AcVerdict {
  featureId: string;
  acName: string;
  verdict: Verdict;
  mutants: MutantResult[];
  killRate: number;
}

export declare function classifyVerdict(
  plan: MutationPlan,
  mutants: MutantResult[],
): AcVerdict;

export declare function aggregateByAc(
  verdicts: AcVerdict[],
): Record<string, Record<string, AcVerdict>>;

classifyVerdict takes the Stryker output for one plan and decides what to call it. aggregateByAc collapses the resulting per-AC verdicts into a Feature × AC matrix the compliance reporter can render.
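The rule that turns a list of mutants into one verdict is not spelled out here, so the following is only one plausible reading, sketched under assumptions: a surviving mutant outranks everything else, uncovered mutants come next, timeouts count as detected for the kill rate (as they do in Stryker's own mutation score), and a plan with no scorable mutants is inconclusive. The status names are Stryker's; the trimmed MutantLike shape and the precedence order are assumptions.

```typescript
// Status names follow Stryker's MutantStatus; the result shape is trimmed.
type MutantStatus =
  | 'Killed' | 'Survived' | 'NoCoverage' | 'Timeout'
  | 'CompileError' | 'RuntimeError' | 'Ignored' | 'Pending';
interface MutantLike { id: string; status: MutantStatus }

type Verdict = 'killed' | 'survived' | 'timed-out' | 'no-coverage' | 'inconclusive';
interface AcVerdictSketch {
  featureId: string;
  acName: string;
  verdict: Verdict;
  mutants: MutantLike[];
  killRate: number;
}

function classifySketch(
  plan: { featureId: string; acName: string },
  mutants: MutantLike[],
): AcVerdictSketch {
  const count = (s: MutantStatus): number => mutants.filter((m) => m.status === s).length;
  const killed = count('Killed');
  const timedOut = count('Timeout');
  const survived = count('Survived');
  const noCoverage = count('NoCoverage');
  // Compile errors, runtime errors, ignored and pending mutants are not scored.
  const scored = killed + timedOut + survived + noCoverage;

  let verdict: Verdict;
  if (scored === 0) verdict = 'inconclusive';
  else if (survived > 0) verdict = 'survived';      // assumed precedence: worst case wins
  else if (noCoverage > 0) verdict = 'no-coverage';
  else if (killed === 0) verdict = 'timed-out';     // every scored mutant timed out
  else verdict = 'killed';

  const killRate = scored === 0 ? 0 : (killed + timedOut) / scored;
  return { featureId: plan.featureId, acName: plan.acName, verdict, mutants, killRate };
}

// Index verdicts as matrix[featureId][acName], matching the declared return type.
function aggregateSketch(
  verdicts: AcVerdictSketch[],
): Record<string, Record<string, AcVerdictSketch>> {
  const matrix: Record<string, Record<string, AcVerdictSketch>> = {};
  for (const v of verdicts) {
    if (!matrix[v.featureId]) matrix[v.featureId] = {};
    matrix[v.featureId][v.acName] = v;
  }
  return matrix;
}
```

Letting a single survivor decide the verdict is a deliberately strict choice for the sketch; a real classifier could just as well apply a threshold to killRate.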

The three layers compose: a caller plans, builds a config per plan, hands the config to Stryker, takes the output, and classifies. Each layer is pure; only the consumer that wires them invokes the Stryker runner.

Where it sits

Tier 1, sibling of scanner and test-smells. Depends on requirements-core, on requirements-scanner (for the manifest the planner walks), on requirements-requirements (for the vocabulary), and, uniquely in the family, on three @stryker-mutator/* packages as direct runtime dependencies:

"dependencies": {
  "@frenchexdev/requirements-core": "workspace:*",
  "@frenchexdev/requirements-scanner": "workspace:*",
  "@stryker-mutator/core": "^9.6.1",
  "@stryker-mutator/vitest-runner": "^9.6.1",
  "@stryker-mutator/typescript-checker": "^9.6.1",
  "@frenchexdev/requirements-requirements": "workspace:*"
}

That classification — Stryker as dependencies, not devDependencies — is a deliberate correction the roadmap calls out. Pre-split, requirements-lib listed Stryker as devDeps because it conflated "the lib runs Stryker in its own tests" with "the lib orchestrates Stryker for consumers". The first is dev-only; the second is the lib's runtime job. Phase 1c of the roadmap corrects the misclassification as part of the extraction: orchestrating Stryker IS the package's job, so its plugins are runtime deps.

Two things the package must not do:

Run mutations itself. The three layers plan, configure, and classify; the actual Stryker invocation happens in the consumer (the CLI's behavioral-check subcommand, or a CI plugin). That separation keeps the package testable without a Stryker runtime.
Embed a Stryker config. The base config is supplied by the caller; this package only enriches it per-plan. A project with custom test runners (jest instead of vitest, jasmine instead of either) supplies its own base and the package adapts.

A concrete call-site

The CLI's behavioral-check subcommand is the canonical caller:

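import {
  planBehavioralCheck,
  buildStrykerConfig,
  classifyVerdict,
  aggregateByAc,
  type AcVerdict,
} from '@frenchexdev/requirements-behavioral-check';
import { scanTestBindings } from '@frenchexdev/requirements-scanner';
import { Stryker } from '@stryker-mutator/core';
import { fs } from '@frenchexdev/requirements-core/ports';

const manifest = await scanTestBindings(fs, { testDir: 'test', srcDir: 'src' });
// loadStrykerConfig is not imported above; it is assumed to be the CLI's own helper for loading the base config.
const baseConfig = await loadStrykerConfig('stryker.conf.mjs');

// Pure: one plan per Feature.AC pair of the selected feature.
const plans = planBehavioralCheck(manifest, { feature: 'COMPLIANCE-CORE' });

const verdicts: AcVerdict[] = [];
for (const plan of plans) {
  // Pure: derive the per-Feature.AC config from the plan and the base config.
  const config = buildStrykerConfig(plan, baseConfig);
  // The only impure step in the loop: Stryker mutates the code and runs the tests.
  const stryker = new Stryker(config);
  const mutants = await stryker.runMutationTest();
  // Pure: interpret the mutant results for this plan.
  verdicts.push(classifyVerdict(plan, mutants));
}

// Pure: index the verdicts as a Feature × AC matrix.
const matrix = aggregateByAc(verdicts);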

Every call into this package is pure; inside the loop, the only impure step is stryker.runMutationTest(). The caller chooses to loop over plans serially (as shown) or to drive them in parallel; the package does not mandate a strategy. The verdict matrix feeds back into the compliance reporter for the next run.

Why it is its own package

Three arguments, the first being the strongest.

First, its dependency surface is incompatible with the rest of Tier 1. The @stryker-mutator/* plugins drag in their own peer-dependency cone — a different version of mocha, the Stryker plugin interface, the mutation operator catalogue, the source-map tooling. None of that is needed by the scanner, by test-smells, by compliance, by trace, or by spec-io. Bundling Stryker into requirements-lib made every consumer of any analyzer pay for the mutation infrastructure. Extracting it isolates the cost: only consumers that want to run mutations install the plugins.

Second, the three-layer separation is its own teaching tool. The pre-split behavioral-check.ts mixed planning, config-building, and Stryker invocation in a single function. After the split, the three layers are physical and visible in the exports: planning is pure data transformation, config-building is pure enrichment, classification is pure interpretation. Anyone reading the package learns the pattern: separate what to do (plan) from how to do it (config) from what it meant (verdict). That pattern recurs in requirements-sync (Part 14) and in requirements-wizards (Part 15).

Third, mutation testing is an opt-in workflow. A solo developer iterating on a feature does not run mutation tests on every save; they run them before publishing or before merging a refactor. The package's npm scripts do not run mutations by default — there is no behavioral-check step in npm test. The CLI exposes a dedicated requirements behavioral-check subcommand that the user invokes explicitly, with explicit scoping (--feature COMPLIANCE-CORE, --ac orphanDetectionFiresOnUnboundFeature). Keeping the package out of the default install path means it does not slow down the inner loop.

The package is also the natural future home for the "behavioural-grounding requirement" described in project_behavioral_grounding_requirements.md: a future --strict mode of requirements behavioral-check that walks the report and demands that every AC has a mutation kill-rate above a threshold. That work is planned, scoped to a single AC of the requirements-lib equivalent, and will land here.
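Since that strict check is still planned, the following is only a guess at its core: a pure function that walks a Feature × AC matrix and lists the pairs whose kill rate falls below a threshold. The function name, the trimmed AcScore shape, and the choice to let a rate exactly at the threshold pass are all hypothetical.

```typescript
// Hypothetical trimmed view of an AcVerdict: only the fields the gate reads.
interface AcScore { featureId: string; acName: string; killRate: number }

// Returns the "Feature.AC" keys whose kill rate is below the threshold.
function belowThreshold(
  matrix: Record<string, Record<string, AcScore>>,
  threshold: number,
): string[] {
  const failing: string[] = [];
  for (const featureId of Object.keys(matrix)) {
    for (const acName of Object.keys(matrix[featureId])) {
      if (matrix[featureId][acName].killRate < threshold) {
        failing.push(`${featureId}.${acName}`);
      }
    }
  }
  return failing;
}
```

A CLI wrapper could exit non-zero when the returned list is non-empty, keeping the threshold logic itself free of I/O like the three existing layers.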

The next page covers the package that turns the manifest into a navigable picture: requirements-trace.
