Harness Engineering
Stop Writing Code. Start Writing Rules.
We've been thinking about AI coding assistants wrong.
For the past year, most developers have been treating AI agents like a faster pair of hands. You prompt it, it writes code, you review and fix the mess, repeat. Useful, sure. But not transformative.
Then I read about OpenAI's Codex team building an entire production app -- over a million lines of code -- where no human wrote a single line. Five months, a million lines, real users. The engineers didn't write code. They wrote rules.
That distinction broke my brain a little.
The Thoroughbred Problem
An AI model is like a thoroughbred horse. Incredibly powerful, incredibly fast. Also completely useless if you set it loose in a field.
What makes a horse productive isn't more horsepower. It's the harness -- reins, saddle, bit -- equipment that channels raw power into useful direction. The rider provides intent. The harness translates intent into controlled motion.
The industry calls this harness engineering: building the systems around AI agents that make them reliable. Not the model. Not the prompt. The entire environment -- constraints, documentation, feedback loops, CI pipelines -- that turns a chaotic code generator into something you'd trust with production.
Same Horse, Different Harness
What convinced me this isn't hype is LangChain's coding agent. They were at 52.8% on Terminal Bench 2.0, around 30th place. Then they jumped to 66.5%, landing in the top 5.
They didn't switch models. They changed the harness.
They added a self-check step before submission. They mapped the codebase at startup instead of letting the agent discover it through trial and error. They detected edit loops -- where the agent rewrites the same file endlessly. They allocated reasoning strategically: high for planning, medium for implementation.
Same model. Better environment. Dramatically better results. The ceiling on AI coding quality isn't the model -- it's what you put around it.
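Some of those harness pieces are surprisingly simple to sketch. Edit-loop detection, for instance, can be as little as counting repeated (file, content-hash) pairs. The class and threshold below are my own illustration of the idea, not LangChain's actual implementation:

```python
import hashlib
from collections import defaultdict

class EditLoopDetector:
    """Flags when an agent keeps writing the same content to the same file.

    A repeated (path, content-hash) pair suggests the agent is cycling
    between states instead of making progress.
    """

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.seen = defaultdict(int)  # (path, digest) -> times written

    def record_edit(self, path: str, content: str) -> bool:
        """Record one write; return True once a loop is suspected."""
        digest = hashlib.sha256(content.encode()).hexdigest()
        self.seen[(path, digest)] += 1
        return self.seen[(path, digest)] >= self.threshold

detector = EditLoopDetector(threshold=3)
detector.record_edit("app.py", "version 1")            # False: first write
detector.record_edit("app.py", "version 2")            # False: new content
detector.record_edit("app.py", "version 1")            # False: second repeat
loop = detector.record_edit("app.py", "version 1")     # True: third time, loop
```

When the detector fires, the harness can interrupt the agent and force a replan instead of burning tokens on another identical rewrite.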
What a Harness Actually Looks Like
After months of building with Claude Code, Codex, and Cursor, I think about it in three layers.
Give it the right context. From the agent's perspective, if it's not in the context window, it doesn't exist. That Slack thread about naming conventions? Gone. The design doc in Google Docs? Might as well be on the moon. The fix is simple: put everything in the repo. I maintain an AGENTS.md file encoding every decision I'd normally explain to a new team member -- structure, patterns, conventions, API contracts. It's just documentation, but written for a literal-minded non-human reader.
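To make that concrete, here's the shape of a minimal AGENTS.md. The specific directories, conventions, and commands are invented for illustration; yours will differ:

```markdown
# AGENTS.md

## Structure
- `src/api/` -- HTTP handlers only; no business logic here.
- `src/core/` -- domain logic; must never import from `src/api/`.

## Conventions
- snake_case for module names; never abbreviate identifiers.
- Every public function gets a docstring with types and one example.

## Commands
- Run tests: `pytest -q`
- Lint before committing: `ruff check src/`
```

The point isn't the format. It's that every rule an agent needs lives in a file the agent will actually see.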
Constrain what it can do. Counterintuitively, less freedom produces better code. When the solution space is bounded, agents don't waste tokens exploring dead ends. Dependency rules enforced by CI. Structural tests that verify architecture, not just behavior. Pre-commit hooks that normalize formatting. The paradox is real: more constraints, more productivity.
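Structural tests sound exotic but are just ordinary unit tests that run over the codebase itself. A sketch of one dependency rule, assuming a hypothetical layout where `src/core` must never import from `api`:

```python
import ast
from pathlib import Path

def forbidden_imports(source: str, banned_prefix: str) -> list[str]:
    """Return names of imports in `source` that start with `banned_prefix`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            hits += [a.name for a in node.names if a.name.startswith(banned_prefix)]
        elif isinstance(node, ast.ImportFrom) and node.module:
            if node.module.startswith(banned_prefix):
                hits.append(node.module)
    return hits

def test_core_never_imports_api():
    # Hypothetical rule: everything under src/core stays API-free.
    for path in Path("src/core").rglob("*.py"):
        assert forbidden_imports(path.read_text(), "api") == [], path
```

Because it's just a test, it runs in CI like any other, and the agent gets the violation as immediate feedback rather than a human catching it in review.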
Clean up after it. AI-generated codebases accumulate entropy fast. Documentation drifts. Naming conventions diverge. Dead code piles up. With agents writing at 10x speed, what takes human teams months happens in weeks. The answer is maintenance agents -- separate AI processes that run on schedules, checking that docs match code, finding constraint violations, normalizing pattern deviations.
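A maintenance check doesn't have to be clever to be useful. One of the simplest is verifying that file paths mentioned in the docs still exist on disk. The function below sketches that idea; the backtick-quoting convention is an assumption about how your docs reference paths:

```python
import re
from pathlib import Path

def stale_doc_paths(doc_text: str, repo_root: str = ".") -> list[str]:
    """Return backtick-quoted file paths in `doc_text` that no longer exist.

    Assumes docs quote paths like `src/core/db.py` (one convention of many).
    Run on a schedule; a non-empty result means the docs have drifted.
    """
    root = Path(repo_root)
    quoted = re.findall(r"`([\w./-]+\.\w+)`", doc_text)
    return [p for p in quoted if not (root / p).exists()]
```

Wire something like this into a nightly job and drift becomes a ticket instead of a surprise.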
What I've Actually Changed
After building this way, I can't go back.
I write specifications, not implementations. Instead of writing a function, I describe what it should do, accept, return, and maintain. The agent writes code. I review against the spec.
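In practice a spec can simply be an executable test the agent must satisfy. Here's what I mean with a made-up `slugify` task; the implementation at the bottom stands in for what the agent would produce:

```python
import re

# The spec: what slugify accepts, returns, and must maintain.
def test_slugify_spec():
    assert slugify("Hello, World!") == "hello-world"    # lowercase, punctuation gone
    assert slugify("  spaced   out  ") == "spaced-out"  # whitespace collapsed
    assert slugify("") == ""                            # empty input stays empty
    assert "--" not in slugify("a -- b")                # no doubled separators

# Stand-in implementation -- the part the agent writes, not me.
def slugify(title: str) -> str:
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)
```

Reviewing against a spec like this is mechanical: run the tests, then read the code only for structure and style.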
I invest in repo-level documentation. Every decision lives in the repo. If a new developer couldn't find and follow it without asking me, the documentation isn't good enough.
I build constraints incrementally. I don't design the perfect harness upfront. I start with basic linting, watch what the agent gets wrong, and add constraints for those specific mistakes. The harness grows from observed failures, not theory.
I review differently. AI-generated code over-abstracts, adds unnecessary error handling, and lets documentation drift. My review process is tuned for these patterns.
The Career Question
If agents write all the code, what are engineers doing?
Something harder. Designing environments where AI writes correct code reliably requires deeper architectural thinking. You need to understand systems well enough to express constraints formally. You need to write specs precise enough for a literal-minded agent to execute. You need to anticipate failure modes and build feedback loops around them.
Stripe's agents already produce over 1,000 merged PRs per week. Developer posts a task, agent writes code, passes CI, opens a PR, human reviews and merges. That's not the future. That's now.
Start Simple
Add an AGENTS.md to your next project. Write down three important conventions. Set up pre-commit hooks. Make sure you have a test suite the agent can run. That's a harness.
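For the pre-commit piece, a starter config might look like the sketch below. The hooks shown are real, but the pinned versions are illustrative; check the pre-commit docs for current ones:

```yaml
# .pre-commit-config.yaml -- normalize formatting before agent commits land
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.6.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.6.9
    hooks:
      - id: ruff          # lint
      - id: ruff-format   # format
```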
Then watch where the agent stumbles. Add a constraint for each recurring failure. Evolve from experience.
One more thing: keep it rippable. Models improve fast. The harness that saved you last quarter might be unnecessary overhead next quarter. Build it to be adaptable, build it to be removable.
The horse gets stronger every quarter. Make sure the harness keeps up.
Thanks for reading!