2 Concept TDD Through Structure
jbr870 edited this page 2026-04-30 10:11:16 +00:00
Status: complete
Last updated: 2026-04-30
Parent: none
Read before this: Design - Skill Pipeline

TDD Through Structure

Current state

TDD discipline in devwork-skills is enforced by the structure of the skills, not by asking agents to "follow TDD."

Inside /develop, each work unit runs as a Test Writer + Implementer pair. The Test Writer produces a failing test against the work unit's acceptance criteria; the Implementer cannot proceed until the failure is verified, and cannot finish until the test passes. There is no path through the skill that lets an agent skip to implementation.
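The gating described above can be pictured as a small state machine. What follows is a minimal sketch, not the actual /develop implementation: the WorkUnit class, the Phase names, and the run_test callable are all hypothetical, chosen only to show how "cannot proceed until the failure is verified" becomes something the structure enforces.

```python
from enum import Enum, auto


class Phase(Enum):
    TEST_WRITTEN = auto()
    FAILURE_VERIFIED = auto()
    IMPLEMENTED = auto()
    PASS_VERIFIED = auto()


class WorkUnit:
    """Hypothetical work unit that can only advance one phase at a time."""

    def __init__(self, run_test):
        # run_test is a hypothetical callable that returns True when the test passes.
        self._run_test = run_test
        self.completed = []  # phases finished so far, in order

    def _require(self, phase):
        # Each step demands that the immediately preceding step was the last one done.
        if not self.completed or self.completed[-1] is not phase:
            raise RuntimeError(f"cannot proceed: {phase.name} has not been completed")

    def write_test(self):
        if self.completed:
            raise RuntimeError("test already written")
        self.completed.append(Phase.TEST_WRITTEN)

    def verify_failure(self):
        self._require(Phase.TEST_WRITTEN)
        if self._run_test():
            raise RuntimeError("test passes before implementation; it proves nothing")
        self.completed.append(Phase.FAILURE_VERIFIED)

    def implement(self, apply_change):
        self._require(Phase.FAILURE_VERIFIED)
        apply_change()
        self.completed.append(Phase.IMPLEMENTED)

    def verify_pass(self):
        self._require(Phase.IMPLEMENTED)
        if not self._run_test():
            raise RuntimeError("test still fails; the unit is not done")
        self.completed.append(Phase.PASS_VERIFIED)


# Usage: a toy "codebase" where the feature is a single flag.
state = {"done": False}
unit = WorkUnit(run_test=lambda: state["done"])

try:
    unit.implement(lambda: state.update(done=True))  # trying to skip ahead
except RuntimeError as err:
    print(err)  # cannot proceed: FAILURE_VERIFIED has not been completed

unit.write_test()
unit.verify_failure()
unit.implement(lambda: state.update(done=True))
unit.verify_pass()
```

The point of the sketch is that the Implementer has no method that works out of order; skipping ahead is an error raised by the structure, not a judgment call left to the agent.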

In the QA pipelines, the validate/tests/fix triplet does the same thing for regressions. validate produces a report with findings. tests generates a regression test for each finding, marked test.fixme() (Playwright), @pytest.mark.xfail (pytest), or t.Skip() (Go), so that each test is recorded as known-failing (expected to fail under pytest, skipped under Playwright and Go) without breaking the suite. fix applies the actual code fix, removes the marker, and promotes the test to a passing one. The order is fixed, the markers make it possible for every test to exist before its fix, and the suite stays green throughout.

Rationale

Asking agents to "follow TDD" doesn't work reliably. Agents that genuinely understand TDD will usually follow it; agents that don't will skip to implementation and write tests afterward, if at all. Even agents that normally follow TDD will sometimes cut corners under pressure (Ralph loops in particular create pressure to declare a task done).

Structural enforcement removes the ambiguity. The skill defines the order — write failing test, verify failure, implement, verify pass, commit — and each step has a checkable artifact. There's no "did the agent follow TDD?" question because the only path through the skill is the TDD path.
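One way to read "each step has a checkable artifact" is that a finished run can be audited mechanically. The sketch below assumes a hypothetical record that maps each step name to some piece of evidence (a test file path, a log excerpt, a commit hash); the step names and the audit function are illustrative and are not taken from the skills themselves.

```python
# The order of steps as described in the text; the identifiers are hypothetical.
STEPS = ["write_failing_test", "verify_failure", "implement", "verify_pass", "commit"]


def audit(artifacts):
    """Check a run record and return (ok, message).

    artifacts maps a step name to its evidence; a missing or empty value
    means the step left no artifact behind.
    """
    first_gap = None
    for step in STEPS:
        if artifacts.get(step):
            if first_gap is not None:
                # A later step has evidence although an earlier one does not.
                return False, f"{step} recorded but {first_gap} has no artifact"
        elif first_gap is None:
            first_gap = step
    if first_gap is not None:
        return False, f"stopped before {first_gap}"
    return True, "all steps have artifacts"


# An agent that jumped straight from writing the test to implementing:
print(audit({"write_failing_test": "tests/test_login.py", "implement": "diff"}))
# (False, 'implement recorded but verify_failure has no artifact')
```

With a record like this, whether TDD was followed is answered by looking at the artifacts rather than by trusting the agent's own account.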

The fixme/xfail/Skip pattern in the QA pipelines is a specific application of the same idea. Without it, a validate skill that finds 12 issues would either (a) leave the suite broken until all 12 are fixed (blocks shipping), or (b) skip writing tests for unfixed findings (loses the regression coverage). Markers let the tests exist immediately, document the known-failing state, and convert to real coverage as fixes land.

The cost is a more rigid skill structure — there's no "agent decides what to do next." The benefit is that TDD discipline becomes a property of the system rather than a property of the agent on duty.

Open questions

None.