Part II: BDD Frameworks — Cucumber, Gherkin, SpecFlow

Behavior-Driven Development is the closest thing to typed specifications that most teams have encountered. Both put requirements in a structured format. Both link them to tests. The difference is where the structure lives: in a plain-text file parsed at runtime, or in the type system checked at compile time.

What BDD Is

BDD is a development methodology — not just a testing tool — that emerged from Dan North's work in 2003. Its core insight: specifications should be written in a language that developers, testers, and business stakeholders all understand. The tooling (Cucumber, SpecFlow, behave, Behat) implements this through .feature files written in Gherkin syntax and step definitions that execute the scenarios.

The promise: living documentation that is both human-readable and machine-executable.

How It Links Requirements to Tests

A .feature file describes a feature and its scenarios using structured natural language:

# navigation.feature
Feature: SPA Navigation and Deep Links
  Users navigate between pages using the table of contents,
  browser history, and direct URLs.

  Scenario: TOC click loads the corresponding page
    Given I am on the home page
    When I click the TOC item "Binary Wrapper"
    Then the page content should contain "BinaryWrapper"
    And the URL should contain "binary-wrapper"

  Scenario: Back button restores previous page
    Given I am on the home page
    And I navigate to "Binary Wrapper"
    And I navigate to "Skills"
    When I press the browser back button
    Then the page content should contain "BinaryWrapper"

  Scenario: Direct URL loads correct page
    Given I navigate directly to "/content/blog/binary-wrapper.html"
    Then the page content should contain "BinaryWrapper"
    And the URL should be bookmarkable

Step definitions implement each Given/When/Then line:

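// steps/navigation.steps.ts
import { Given, When, Then } from '@cucumber/cucumber';
// Assumes Playwright's expect; swap in your own assertion library if needed.
import { expect } from '@playwright/test';
// this.page is assumed to be a Playwright Page set up on a custom World object.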

Given('I am on the home page', async function () {
  await this.page.goto('/');
  await this.page.waitForLoadState('networkidle');
});

When('I click the TOC item {string}', async function (item: string) {
  await this.page.click(`.toc-item:has-text("${item}")`);
  await this.page.waitForTimeout(500);
});

Then('the page content should contain {string}', async function (text: string) {
  const content = await this.page.locator('#markdown-output h1').textContent();
  expect(content).toContain(text);
});

Then('the URL should contain {string}', async function (fragment: string) {
  expect(this.page.url()).toContain(fragment);
});

When('I press the browser back button', async function () {
  await this.page.goBack();
  await this.page.waitForTimeout(1000);
});

// ... more step definitions

The Cucumber runner parses the .feature file, matches each step line to a step definition via a regular expression or a Cucumber Expression, and executes the matched steps in sequence.
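The matching stage can be sketched without Cucumber itself. This is a minimal illustration, not Cucumber's actual implementation: `defineStep`, `runStep`, and `registry` are hypothetical names, and only the `{string}` parameter type is handled.

```typescript
// Minimal sketch of step matching (hypothetical, not Cucumber's real code).
type StepFn = (...args: string[]) => string;

interface StepDefinition {
  pattern: RegExp;
  run: StepFn;
}

const registry: StepDefinition[] = [];

// Turns an expression with {string} placeholders into an anchored regex
// and stores it next to its handler.
function defineStep(expression: string, run: StepFn): void {
  const escaped = expression.replace(/[.*+?^$()|[\]\\]/g, '\\$&');
  const source = escaped.replace(/\{string\}/g, '"([^"]*)"');
  registry.push({ pattern: new RegExp(`^${source}$`), run });
}

// Finds the first definition whose pattern matches the step text and runs it
// with the captured arguments. An unmatched step is only discovered here,
// at run time.
function runStep(text: string): string {
  for (const { pattern, run } of registry) {
    const match = pattern.exec(text);
    if (match) return run(...match.slice(1));
  }
  return `Undefined step: "${text}"`;
}

defineStep('I am on the home page', () => 'goto /');
defineStep('I click the TOC item {string}', (item) => `click ${item}`);

const matched = runStep('I click the TOC item "Binary Wrapper"');
const misspelled = runStep('I click the TOC itme "Binary Wrapper"');
```

The link between the two files is the string itself: `matched` resolves to a handler, while `misspelled` falls through to the undefined-step branch only once `runStep` is called.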

What BDD Gets Right

BDD's strengths are real and important:

Cross-functional communication. A PM, a QA engineer, and a developer can all read Given I am on the home page / When I click the TOC item "Binary Wrapper" / Then the page content should contain "BinaryWrapper". Try reading @Implements<NavigationFeature>('tocClickLoadsPage') as a non-developer. BDD wins here, decisively.

Structured thinking. The Given/When/Then template forces teams to think in terms of preconditions, actions, and outcomes. This is a powerful discipline even if you throw away the tooling.

Living documentation. A well-maintained .feature file IS the specification. It's both the document and the test. When it passes, the spec is satisfied. When it fails, the spec is violated.

Ecosystem maturity. Cucumber has plugins for every major IDE, integrations with every major CI system, and reporting tools that generate HTML documentation from .feature files. The ecosystem is 15+ years old.

Scenario Outlines. Data-driven testing is built in:

  Scenario Outline: Navigation to <page> loads correct content
    Given I am on the home page
    When I click the TOC item "<page>"
    Then the page content should contain "<heading>"

    Examples:
      | page           | heading        |
      | Binary Wrapper | BinaryWrapper  |
      | Skills         | Skills         |
      | DDD            | Domain-Driven  |
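Conceptually, an outline is a template that the runner expands into one concrete scenario per Examples row. A rough TypeScript sketch of that expansion follows; `expandOutline` and `ExampleRow` are hypothetical names, and real Gherkin parsers do considerably more than this.

```typescript
// One Examples row: column header -> cell value.
type ExampleRow = Record<string, string>;

// Replaces each <placeholder> in every step with the value from the row.
// Placeholders without a matching column are left untouched.
function expandOutline(steps: string[], rows: ExampleRow[]): string[][] {
  return rows.map((row) =>
    steps.map((step) =>
      step.replace(/<([^>]+)>/g, (whole: string, column: string) => row[column] ?? whole),
    ),
  );
}

const outlineSteps = [
  'Given I am on the home page',
  'When I click the TOC item "<page>"',
  'Then the page content should contain "<heading>"',
];

const exampleRows: ExampleRow[] = [
  { page: 'Binary Wrapper', heading: 'BinaryWrapper' },
  { page: 'Skills', heading: 'Skills' },
  { page: 'DDD', heading: 'Domain-Driven' },
];

// Three rows produce three concrete scenarios of three steps each.
const expandedScenarios = expandOutline(outlineSteps, exampleRows);
```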

Where BDD Breaks

Typos Are Caught at Runtime, Not Compile Time

A typo in a step's wording, whether in the .feature file or in the step definition, is caught when the scenario runs, not when you write it:

  Scenario: TOC click loads page
    Given I am on the home page
    When I click the TOC itme "Binary Wrapper"    # typo: "itme"
    Then the page content should contain "BinaryWrapper"

Cucumber will report Undefined step: "I click the TOC itme {string}" — but only after you run the test suite. In a large project with hundreds of scenarios, this feedback loop can be minutes, not seconds.

Compare:

@Implements<NavigationFeature>('tocClickLoadPage')
// TypeScript error: 'tocClickLoadPage' is not assignable to
// keyof NavigationFeature. Did you mean 'tocClickLoadsPage'?

The compiler catches it before you save the file. The IDE underlines it in red. Zero-latency feedback.
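The mechanism behind this is a `keyof` constraint on the argument. The sketch below shows the idea with a plain higher-order function rather than decorator syntax, so it does not depend on any decorator compiler settings. `implementsAC` and `recordedLinks` are hypothetical stand-ins, not the actual `@Implements` implementation, and the trimmed `NavigationFeature` omits the `Feature` base class.

```typescript
type ACResult = void | Promise<void>;

// Trimmed-down copy of the feature spec, for illustration only.
abstract class NavigationFeature {
  abstract tocClickLoadsPage(): ACResult;
  abstract backButtonRestores(): ACResult;
  abstract directUrlLoads(): ACResult;
}

const recordedLinks: string[] = [];

// Accepts only strings that are keys of F. The returned wrapper records the
// link and hands the test back unchanged.
function implementsAC<F>(ac: keyof F & string) {
  return <T>(test: T): T => {
    recordedLinks.push(ac);
    return test;
  };
}

// Compiles: 'tocClickLoadsPage' is a member of keyof NavigationFeature.
const tocTest = implementsAC<NavigationFeature>('tocClickLoadsPage')(() => 'clicked');

// Rejected by the type checker: the misspelled key is not in keyof NavigationFeature.
// @ts-expect-error 'tocClickLoadPage' (missing "s") is not a valid AC name
implementsAC<NavigationFeature>('tocClickLoadPage');
```

The `@ts-expect-error` comment makes the example self-checking: if the misspelled key ever became valid, the type checker would flag the now-unused directive.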

Rename Safety Is Partial

Rename a step definition? Every .feature file that references it via the old phrasing breaks — silently, until you run the tests. IDEs with Cucumber plugins can sometimes propagate renames, but the support is inconsistent:

IntelliJ with the Cucumber plugin: decent rename support for Java step definitions
VS Code with Cucumber extension: partial, often misses edge cases
SpecFlow in Visual Studio: reasonable for C# projects
Python behave: no IDE rename support

Compare: renaming an abstract method in a typed spec feature class triggers TypeScript's rename refactoring across every @Implements reference automatically, in every file, with zero manual intervention.

No Compile-Time Completeness Check

A feature file with 8 scenarios, only 5 of which have matching step definitions, will report the other 3 as "undefined", but only at runtime. There's no build-time check that says "your Navigation feature has 8 ACs and only 5 have step definitions."

More critically: BDD has no concept of a canonical AC list. The .feature file IS the list. If you forget to write a scenario for "deep links resolve correctly," nothing tells you it's missing. The feature file defines what's tested, not what should be tested.

In typed specs, the abstract class defines what SHOULD be tested. The scanner compares what IS tested against what SHOULD be. The gap is visible and quantified.
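The comparison the scanner performs amounts to a set difference per feature. The scanner itself is not shown here, so the following is only an illustration under assumed data shapes: `FeatureSpec`, `CoverageGap`, and `findGaps` are hypothetical names.

```typescript
// What SHOULD be tested: the abstract method names declared on a feature class.
interface FeatureSpec {
  id: string;
  acs: string[];
}

// The result for one feature that still has untested ACs.
interface CoverageGap {
  featureId: string;
  missing: string[];
  covered: number;
  total: number;
}

// `implemented` maps a feature id to the AC names that some test claims to cover.
function findGaps(specs: FeatureSpec[], implemented: Record<string, string[]>): CoverageGap[] {
  return specs
    .map((spec) => {
      const done = implemented[spec.id] ?? [];
      const missing = spec.acs.filter((ac) => done.indexOf(ac) === -1);
      return {
        featureId: spec.id,
        missing,
        covered: spec.acs.length - missing.length,
        total: spec.acs.length,
      };
    })
    .filter((gap) => gap.missing.length > 0);
}

const gaps = findGaps(
  [{ id: 'NAV', acs: ['tocClickLoadsPage', 'backButtonRestores', 'directUrlLoads'] }],
  { NAV: ['tocClickLoadsPage'] },
);
// gaps[0] reports 1 of 3 ACs covered, with the two untested names listed in `missing`.
```

The Gherkin workflow cannot produce this report because it has no equivalent of the `acs` list: there is no declared set to subtract from.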

The Indirection Layer

BDD introduces a three-layer chain: .feature file → step definition matcher → test code. Each layer is connected by strings:

Feature file          Step definition        Test implementation
"I click {string}" → regex match           → page.click(selector)

This indirection has costs:

Maintenance: changing a step's wording requires updating both the .feature file and the step definition
Debugging: when a scenario fails, you navigate from the .feature file to the step definition to the actual assertion — three files, not one
Reuse complexity: shared steps across features require careful wording to avoid regex collisions
Expression ambiguity: "I click the button" and "I click the button {string}" can collide in subtle ways
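The collision in the last item is easiest to reproduce with step patterns written as regular expressions without `^` and `$` anchors, which is the assumption this sketch makes. The `matchingPatterns` helper is hypothetical. It simply counts how many patterns accept a given step text, and a result above one is the situation a runner would have to treat as ambiguous.

```typescript
// Two step patterns written as unanchored regexes (assumption for this sketch).
const stepPatterns: RegExp[] = [
  /I click the button/,
  /I click the button "([^"]*)"/,
];

// Returns every pattern that accepts the step text.
function matchingPatterns(stepText: string, candidates: RegExp[]): RegExp[] {
  return candidates.filter((candidate) => candidate.test(stepText));
}

const stepText = 'I click the button "Save"';

// Both patterns match, because the first one also matches as a substring.
const looseMatches = matchingPatterns(stepText, stepPatterns);

// Anchoring each pattern to the whole line removes the overlap.
const anchoredPatterns = stepPatterns.map((p) => new RegExp(`^${p.source}$`));
const anchoredMatches = matchingPatterns(stepText, anchoredPatterns);
```

Here `looseMatches` contains both patterns and `anchoredMatches` only the second, which is why careful wording and anchoring become a maintenance concern as shared steps accumulate.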

Typed specs have a two-layer chain: feature class → test method. The link is a type reference, not a string match.

The Conversation Gap

Here's the uncomfortable truth about BDD in practice: most teams skip the conversation.

BDD's value proposition is that business stakeholders, QA, and developers collaboratively write .feature files — the "Three Amigos" meeting. In practice, developers write Gherkin as an alternative test syntax. The PM never reads the .feature files. QA reads them occasionally. The collaborative benefit — which is BDD's strongest argument — evaporates.

When this happens, you're left with a testing framework that adds an indirection layer (step definitions) without adding type safety. The ceremony of Given/When/Then becomes overhead without payoff.

This is not a criticism of BDD as a methodology. It's an observation about how BDD is used in most teams. If your team actually does the Three Amigos conversation, BDD's natural-language format is a genuine advantage over typed specs.

Scale Challenges

At scale, BDD .feature files accumulate problems:

Step definition explosion: 200 scenarios might need 500+ step definitions with careful naming to avoid collisions
Slow feedback: running all scenarios to check for undefined steps takes minutes in large suites
Feature file organization: which .feature file owns which scenarios? How do you handle cross-cutting features?
Step reuse vs. readability: generic steps are reusable but obscure; specific steps are readable but proliferate

Side-by-Side: The Same Feature, Two Approaches

BDD (Gherkin + Cucumber)

# navigation.feature (specification)
Feature: SPA Navigation and Deep Links

  Scenario: TOC click loads page
    Given I am on the home page
    When I click the TOC item "Binary Wrapper"
    Then the page content should contain "BinaryWrapper"

  Scenario: Back button restores previous page
    ...

  Scenario: Direct URL loads correct page
    ...

// steps/navigation.steps.ts (implementation)
Given('I am on the home page', async function () { ... });
When('I click the TOC item {string}', async function (item) { ... });
Then('the page content should contain {string}', async function (text) { ... });

Specification: separate .feature file (human-readable)
Link: regex/expression matching (runtime-checked)
Completeness: scenarios define what's tested (no canonical AC list)

Typed Specs (TypeScript)

// requirements/features/navigation.ts (specification)
export abstract class NavigationFeature extends Feature {
  readonly id = 'NAV';
  readonly title = 'SPA Navigation + Deep Links';
  readonly priority = Priority.Critical;

  /** Clicking a TOC entry loads the corresponding page. */
  abstract tocClickLoadsPage(): ACResult;

  /** The browser back button restores the previous page state. */
  abstract backButtonRestores(): ACResult;

  /** Navigating directly to a URL loads the correct page. */
  abstract directUrlLoads(): ACResult;
}

// test/e2e/navigation.spec.ts (implementation)
@FeatureTest(NavigationFeature)
class NavigationTests {
  @Implements<NavigationFeature>('tocClickLoadsPage')
  async 'clicking TOC loads page'({ page }) {
    await page.goto('/');
    await page.click('.toc-item[data-path="content/blog/binary-wrapper.md"]');
    await expect(page.locator('#markdown-output h1')).toContainText('BinaryWrapper');
  }
}

Specification: abstract class in the same codebase (developer-readable)
Link: keyof T type constraint (compile-time-checked)
Completeness: abstract methods define what SHOULD be tested; scanner reports gaps

How Typed Specs Differ

| Dimension | BDD (Gherkin) | Typed Specifications |
| --- | --- | --- |
| Specification format | Plain text (`.feature` file) | Abstract class (`.ts` file) |
| Readable by PMs/QA | Yes | No (developer-only) |
| Link mechanism | Regex/expression match | `keyof T` type constraint |
| Typo detection | Runtime (undefined step) | Compile-time (type error) |
| Rename propagation | Partial (IDE plugin) | Full (IDE refactor) |
| Canonical AC list | No (scenarios ARE the list) | Yes (abstract methods) |
| Completeness check | No | Scanner reports gaps |
| Indirection layers | 3 (feature → step def → code) | 2 (feature → test) |
| Scale story | Step definition explosion | Abstract class explosion (mitigated by code generation) |

Honest Credit

BDD is the better choice when:

Non-developers need to read and co-author specifications. Gherkin's natural-language format is genuinely more accessible than abstract classes.
The Three Amigos conversation actually happens. If PMs, QA, and developers collaboratively define scenarios, BDD's collaborative value is real and significant.
Regulatory environments require human-readable test documentation. .feature files double as living documentation that auditors can review.
The team is already invested in BDD tooling. Migrating away from a working Cucumber setup has a real cost.

Typed specs are the better choice when:

Developers own the specifications. If the PM writes user stories in Jira and the developer translates them to code, the .feature file is an unnecessary intermediate step.
Compile-time safety matters. If you want zero-latency feedback on broken links, the type system beats regex matching.
Completeness checking matters. If you need to answer "which ACs are NOT tested?", typed specs provide this and BDD doesn't.
The team values simplicity. 75 lines of decorator infrastructure versus a full Cucumber setup with step definitions, hooks, world objects, and reporters.

Previous: Part I: Jira, Azure DevOps, and Linear
Next: Part III: Test Management Platforms — Allure, TestRail, Zephyr, and the two-sources-of-truth problem.
