Part II: BDD Frameworks — Cucumber, Gherkin, SpecFlow
Behavior-Driven Development is the closest thing to typed specifications that most teams have encountered. Both put requirements in a structured format. Both link them to tests. The difference is where the structure lives: in a plain-text file parsed at runtime, or in the type system checked at compile time.
What BDD Is
BDD is a development methodology — not just a testing tool — that emerged from Dan North's work in 2003. Its core insight: specifications should be written in a language that developers, testers, and business stakeholders all understand. The tooling (Cucumber, SpecFlow, behave, Behat) implements this through .feature files written in Gherkin syntax and step definitions that execute the scenarios.
The promise: living documentation that is both human-readable and machine-executable.
How It Links Requirements to Tests
A .feature file describes a feature and its scenarios using structured natural language:
# navigation.feature
Feature: SPA Navigation and Deep Links
Users navigate between pages using the table of contents,
browser history, and direct URLs.
Scenario: TOC click loads the corresponding page
Given I am on the home page
When I click the TOC item "Binary Wrapper"
Then the page content should contain "BinaryWrapper"
And the URL should contain "binary-wrapper"
Scenario: Back button restores previous page
Given I am on the home page
And I navigate to "Binary Wrapper"
And I navigate to "Skills"
When I press the browser back button
Then the page content should contain "BinaryWrapper"
Scenario: Direct URL loads correct page
Given I navigate directly to "/content/blog/binary-wrapper.html"
Then the page content should contain "BinaryWrapper"
And the URL should be bookmarkable
Step definitions implement each Given/When/Then line:
// steps/navigation.steps.ts
import { Given, When, Then } from '@cucumber/cucumber';
import { expect } from '@playwright/test';
// Assumes a custom World (set up in a hook) that exposes a Playwright page as
// `this.page`, with a baseURL configured so that goto('/') resolves.
Given('I am on the home page', async function () {
await this.page.goto('/');
await this.page.waitForLoadState('networkidle');
});
When('I click the TOC item {string}', async function (item: string) {
await this.page.click(`.toc-item:has-text("${item}")`);
await this.page.waitForTimeout(500);
});
Then('the page content should contain {string}', async function (text: string) {
const content = await this.page.locator('#markdown-output h1').textContent();
expect(content).toContain(text);
});
Then('the URL should contain {string}', async function (fragment: string) {
expect(this.page.url()).toContain(fragment);
});
When('I press the browser back button', async function () {
await this.page.goBack();
await this.page.waitForTimeout(1000);
});
// ... more step definitions
The Cucumber runner parses the .feature file, matches each line to a step definition via regex or expression matching, and executes the steps in sequence.
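To make the matching step concrete, here is a toy matcher. It is a sketch, not Cucumber's implementation: the names defineStep, runStep and toRegex are hypothetical, only the {string} parameter type is handled, and each pattern is anchored with ^...$ the way a Cucumber Expression is. It reproduces the three outcomes a runner reports per step: exactly one match runs, no match is undefined, and more than one match is ambiguous.

```typescript
// Toy step matcher: a sketch of how a runner binds step text to definitions.
// Hypothetical names; only the {string} parameter type is supported.
type StepFn = (...args: string[]) => void;

interface StepDef {
  expression: string;
  regex: RegExp;
  fn: StepFn;
}

type StepResult = "passed" | "undefined" | "ambiguous";

const registry: StepDef[] = [];

// Escape regex metacharacters, turn each {string} into a capture group for a
// double-quoted argument, and anchor the whole pattern with ^...$.
function toRegex(expression: string): RegExp {
  const escaped = expression.replace(/[.*+?^$()|[\]\\]/g, "\\$&");
  const pattern = escaped.replace(/\{string\}/g, '"([^"]*)"');
  return new RegExp(`^${pattern}$`);
}

function defineStep(expression: string, fn: StepFn): void {
  registry.push({ expression, regex: toRegex(expression), fn });
}

// Match the step text (Given/When/Then keyword already stripped) against
// every registered definition, then run the single match if there is one.
function runStep(text: string): StepResult {
  const matches: { def: StepDef; args: string[] }[] = [];
  for (const def of registry) {
    const m = def.regex.exec(text);
    if (m) matches.push({ def, args: m.slice(1) });
  }
  if (matches.length === 0) return "undefined";
  if (matches.length > 1) return "ambiguous";
  matches[0].def.fn(...matches[0].args);
  return "passed";
}

// Usage: two definitions, then three step texts from a scenario.
const clicked: string[] = [];
defineStep("I am on the home page", () => {});
defineStep("I click the TOC item {string}", (item) => {
  clicked.push(item);
});

console.log(runStep("I am on the home page")); // passed
console.log(runStep('I click the TOC item "Binary Wrapper"')); // passed
console.log(runStep('I click the TOC itme "Binary Wrapper"')); // undefined
```

Real Cucumber Expressions also support {int}, {float}, {word}, custom parameter types and optional text; the sketch keeps only enough to show that the binding is a string match resolved when the step runs.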
What BDD Gets Right
BDD's strengths are real and important:
Cross-functional communication. A PM, a QA engineer, and a developer can all read Given I am on the home page / When I click the TOC item "Binary Wrapper" / Then the page content should contain "BinaryWrapper". Try reading @Implements<NavigationFeature>('tocClickLoadsPage') as a non-developer. BDD wins here, decisively.
Structured thinking. The Given/When/Then template forces teams to think in terms of preconditions, actions, and outcomes. This is a powerful discipline even if you throw away the tooling.
Living documentation. A well-maintained .feature file IS the specification. It's both the document and the test. When it passes, the spec is satisfied. When it fails, the spec is violated.
Ecosystem maturity. Cucumber has plugins for every major IDE, integrations with every major CI system, and reporting tools that generate HTML documentation from .feature files. The ecosystem is 15+ years old.
Scenario Outlines. Data-driven testing is built in:
Scenario Outline: Navigation to <page> loads correct content
Given I am on the home page
When I click the TOC item "<page>"
Then the page content should contain "<heading>"
Examples:
| page | heading |
| Binary Wrapper | BinaryWrapper |
| Skills | Skills |
| DDD | Domain-Driven |
Where BDD Breaks
Typos Are Caught at Runtime, Not Compile Time
A typo in a scenario's step text is caught when the scenario runs, not when you write it:
Scenario: TOC click loads page
Given I am on the home page
# typo on the next line: "itme" instead of "item"
When I click the TOC itme "Binary Wrapper"
Then the page content should contain "BinaryWrapper"
Cucumber will report the step I click the TOC itme "Binary Wrapper" as undefined, but only after you run the test suite. In a large project with hundreds of scenarios, this feedback loop can be minutes, not seconds.
Compare:
@Implements<NavigationFeature>('tocClickLoadPage')
// TypeScript error: 'tocClickLoadPage' is not assignable to
// keyof NavigationFeature.
The compiler catches it before you save the file. The IDE underlines it in red. Zero-latency feedback.
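The mechanism behind that error is an ordinary keyof constraint. The sketch below strips it down to a plain function so it runs without any decorator configuration; linkAC is a hypothetical stand-in for @Implements, ACResult is simplified to boolean, and the Feature base class is omitted.

```typescript
// Simplified spec class: the abstract methods are the acceptance criteria.
abstract class NavigationFeature {
  readonly id = "NAV";
  abstract tocClickLoadsPage(): boolean;
  abstract backButtonRestores(): boolean;
  abstract directUrlLoads(): boolean;
}

// Hypothetical stand-in for @Implements: the argument must be a key of T,
// so the string is checked against the class at compile time.
function linkAC<T>(ac: keyof T & string): string {
  return ac;
}

const linked = linkAC<NavigationFeature>("tocClickLoadsPage"); // compiles
console.log(linked);

// Uncommenting the next line fails type-checking (TS2345), because
// "tocClickLoadPage" is not a key of NavigationFeature:
// linkAC<NavigationFeature>("tocClickLoadPage");
```

One consequence of plain keyof T is that non-AC members such as id also pass the check; a mapped type that keeps only method keys would close that gap, at the cost of a slightly more complex signature.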
Rename Safety Is Partial
Rename a step definition? Every .feature file that references it via the old phrasing breaks — silently, until you run the tests. IDEs with Cucumber plugins can sometimes propagate renames, but the support is inconsistent:
- IntelliJ with the Cucumber plugin: decent rename support for Java step definitions
- VS Code with Cucumber extension: partial, often misses edge cases
- SpecFlow in Visual Studio: reasonable for C# projects
- Python behave: no IDE rename support
Compare: renaming an abstract method in a typed spec feature class triggers TypeScript's rename refactoring across every @Implements reference automatically, in every file, with zero manual intervention.
No Compile-Time Completeness Check
A feature file with 8 scenarios, only 5 of which have matching step definitions, will report the other 3 as undefined, and only at runtime. There's no build-time check that says "your Navigation feature has 8 ACs and only 5 have step definitions."
More critically: BDD has no concept of a canonical AC list. The .feature file IS the list. If you forget to write a scenario for "deep links resolve correctly," nothing tells you it's missing. The feature file defines what's tested, not what should be tested.
In typed specs, the abstract class defines what SHOULD be tested. The scanner compares what IS tested against what SHOULD be. The gap is visible and quantified.
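At its core, such a scanner is a set difference between the canonical AC list and the ACs that tests link to. The sketch below assumes both lists are already available as data: FeatureSpec, CoverageReport and coverage are hypothetical names, and a real scanner would have to read the AC names from source (for example by parsing the abstract class), because TypeScript erases abstract methods and leaves nothing to enumerate at runtime.

```typescript
// Hypothetical input shape for a coverage scanner.
interface FeatureSpec {
  id: string;
  acs: string[]; // canonical AC list: what SHOULD be tested
}

interface CoverageReport {
  featureId: string;
  covered: string[]; // ACs with at least one linked test
  missing: string[]; // ACs with no linked test: the gap
}

// Compare what IS tested against what SHOULD be tested.
function coverage(spec: FeatureSpec, implemented: string[]): CoverageReport {
  const linked = new Set(implemented);
  return {
    featureId: spec.id,
    covered: spec.acs.filter((ac) => linked.has(ac)),
    missing: spec.acs.filter((ac) => !linked.has(ac)),
  };
}

const nav: FeatureSpec = {
  id: "NAV",
  acs: ["tocClickLoadsPage", "backButtonRestores", "directUrlLoads"],
};

// Only one AC has a linked test so far.
const report = coverage(nav, ["tocClickLoadsPage"]);
console.log(`${report.featureId}: ${report.covered.length}/${nav.acs.length} ACs covered`); // NAV: 1/3 ACs covered
console.log(`missing: ${report.missing.join(", ")}`); // missing: backButtonRestores, directUrlLoads
```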
The Indirection Layer
BDD introduces a three-layer chain: .feature file → step definition matcher → test code. Each layer is connected by strings:
Feature file Step definition Test implementation
"I click {string}" → regex match → page.click(selector)
This indirection has costs:
- Maintenance: changing a step's wording requires updating both the .feature file and the step definition
- Debugging: when a scenario fails, you navigate from the .feature file to the step definition to the actual assertion: three files, not one
- Reuse complexity: shared steps across features require careful wording to avoid regex collisions
- Expression ambiguity: "I click the button" and "I click the button {string}" can collide in subtle ways, for example when step patterns are unanchored regular expressions and both match the same line
Typed specs have a two-layer chain: feature class → test method. The link is a type reference, not a string match.
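The expression-ambiguity cost is easiest to see with step patterns written as unanchored regular expressions. In this sketch (plain RegExp objects, no Cucumber dependency; matchingPatterns is a hypothetical helper), one step text matches both patterns, which cucumber-js would report as an ambiguous step; anchoring with ^...$, as Cucumber Expressions do, removes the overlap.

```typescript
// Hypothetical helper: return every pattern that matches the step text.
function matchingPatterns(step: string, patterns: RegExp[]): RegExp[] {
  return patterns.filter((p) => p.test(step));
}

const step = 'I click the button "Save"';

// Unanchored regular expressions: the generic pattern also matches the
// longer step, so two definitions claim the same line.
const unanchored: RegExp[] = [/I click the button/, /I click the button "([^"]*)"/];
console.log(matchingPatterns(step, unanchored).length); // 2

// Anchored with ^...$: each pattern must match the whole line, so only
// the parameterised definition applies.
const anchored: RegExp[] = [/^I click the button$/, /^I click the button "([^"]*)"$/];
console.log(matchingPatterns(step, anchored).length); // 1
```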
The Conversation Gap
Here's the uncomfortable truth about BDD in practice: most teams skip the conversation.
BDD's value proposition is that business stakeholders, QA, and developers collaboratively write .feature files — the "Three Amigos" meeting. In practice, developers write Gherkin as an alternative test syntax. The PM never reads the .feature files. QA reads them occasionally. The collaborative benefit — which is BDD's strongest argument — evaporates.
When this happens, you're left with a testing framework that adds an indirection layer (step definitions) without adding type safety. The ceremony of Given/When/Then becomes overhead without payoff.
This is not a criticism of BDD as a methodology. It's an observation about how BDD is used in most teams. If your team actually does the Three Amigos conversation, BDD's natural-language format is a genuine advantage over typed specs.
Scale Challenges
At scale, BDD .feature files accumulate problems:
- Step definition explosion: 200 scenarios might need 500+ step definitions with careful naming to avoid collisions
- Slow feedback: running all scenarios to check for undefined steps takes minutes in large suites
- Feature file organization: which .feature file owns which scenarios? How do you handle cross-cutting features?
- Step reuse vs. readability: generic steps are reusable but obscure; specific steps are readable but proliferate
Side-by-Side: The Same Feature, Two Approaches
BDD (Gherkin + Cucumber)
# navigation.feature (specification)
Feature: SPA Navigation and Deep Links
Scenario: TOC click loads page
Given I am on the home page
When I click the TOC item "Binary Wrapper"
Then the page content should contain "BinaryWrapper"
Scenario: Back button restores previous page
...
Scenario: Direct URL loads correct page
...
// steps/navigation.steps.ts (implementation)
Given('I am on the home page', async function () { ... });
When('I click the TOC item {string}', async function (item) { ... });
Then('the page content should contain {string}', async function (text) { ... });
- Specification: separate .feature file (human-readable)
- Link: regex/expression matching (runtime-checked)
- Completeness: scenarios define what's tested (no canonical AC list)
Typed Specs (TypeScript)
// requirements/features/navigation.ts (specification)
export abstract class NavigationFeature extends Feature {
readonly id = 'NAV';
readonly title = 'SPA Navigation + Deep Links';
readonly priority = Priority.Critical;
/** Clicking a TOC entry loads the corresponding page. */
abstract tocClickLoadsPage(): ACResult;
/** The browser back button restores the previous page state. */
abstract backButtonRestores(): ACResult;
/** Navigating directly to a URL loads the correct page. */
abstract directUrlLoads(): ACResult;
}
// test/e2e/navigation.spec.ts (implementation)
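import { expect, type Page } from '@playwright/test';
// FeatureTest and Implements come from the project's own decorator
// infrastructure; NavigationFeature from requirements/features/navigation.ts.
@FeatureTest(NavigationFeature)
class NavigationTests {
@Implements<NavigationFeature>('tocClickLoadsPage')
async 'clicking TOC loads page'({ page }: { page: Page }) {
await page.goto('/');
await page.click('.toc-item[data-path="content/blog/binary-wrapper.md"]');
await expect(page.locator('#markdown-output h1')).toContainText('BinaryWrapper');
}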
}
- Specification: abstract class in the same codebase (developer-readable)
- Link: keyof T type constraint (compile-time-checked)
- Completeness: abstract methods define what SHOULD be tested; scanner reports gaps
How Typed Specs Differ
| Dimension | BDD (Gherkin) | Typed Specifications |
|---|---|---|
| Specification format | Plain text (.feature file) | Abstract class (.ts file) |
| Readable by PMs/QA | Yes | No (developer-only) |
| Link mechanism | Regex/expression match | keyof T type constraint |
| Typo detection | Runtime (undefined step) | Compile-time (type error) |
| Rename propagation | Partial (IDE plugin) | Full (IDE refactor) |
| Canonical AC list | No (scenarios ARE the list) | Yes (abstract methods) |
| Completeness check | No | Scanner reports gaps |
| Indirection layers | 3 (feature → step def → code) | 2 (feature → test) |
| Scale story | Step definition explosion | Abstract class explosion (mitigated by code generation) |
Honest Credit
BDD is the better choice when:
- Non-developers need to read and co-author specifications. Gherkin's natural-language format is genuinely more accessible than abstract classes.
- The Three Amigos conversation actually happens. If PMs, QA, and developers collaboratively define scenarios, BDD's collaborative value is real and significant.
- Regulatory environments require human-readable test documentation: .feature files double as living documentation that auditors can review.
- The team is already invested in BDD tooling. Migrating away from a working Cucumber setup has a real cost.
Typed specs are the better choice when:
- Developers own the specifications. If the PM writes user stories in Jira and the developer translates them to code, the .feature file is an unnecessary intermediate step.
- Compile-time safety matters. If you want zero-latency feedback on broken links, the type system beats regex matching.
- Completeness checking matters. If you need to answer "which ACs are NOT tested?", typed specs provide this and BDD doesn't.
- The team values simplicity. 75 lines of decorator infrastructure versus a full Cucumber setup with step definitions, hooks, world objects, and reporters.
Previous: Part I: Jira, Azure DevOps, and Linear
Next: Part III: Test Management Platforms — Allure, TestRail, Zephyr, and the two-sources-of-truth problem.