Every Style LSP: Harness Engineering for AI Writing Agents
On building infrastructure that encodes signals for AI agents, instead of burying rules in prompts—illustrated through an LSP I built for Every's style guide.
During Every's think week in January, I built Dan Shipper and the team a Language Server Protocol (LSP) server – one that encodes their style guide as structured diagnostic rules and delivers those signals to the AI agents responsible for making edits.
The difference in practice is instructive. An AI operating without the LSP identified 22 issues. An AI operating with it considered 116 signals and produced 27 targeted edits. Dan's first question was the right one: what is the advantage of an LSP over a subagent equipped with a prompt, some tools, and regular expressions?
The answer to that question is what I have started calling harness engineering – one of the more underexplored approaches in applied AI infrastructure today.
The Core Principle
Prompts carry a cost that extends beyond token expenditure. When a model is asked to hold 38 distinct style rules in attention while simultaneously editing prose, it is competing against the full weight of its prior training for the allocation of that attention. Some rules are applied reliably. Others are not. The rules that surface most consistently tend to be those that coincide with patterns the model has encountered at high frequency – not necessarily the rules that matter most for a given style guide.
Harness engineering offers a structural alternative: encode the rules into the environment the model operates within, rather than into the model's attention. The environment generates signals; the model responds to them. This is a meaningful architectural distinction.
This approach is already well established in software development. Linters, type checkers, and formatters do not rely on a model to remember a style guide; they instrument the environment to produce structured findings, and the model acts on those findings. English writing is governed by a different set of rules, but there is no reason the same principle cannot apply.
Why LSP Rather Than a Subagent
A subagent architecture – discrete agents scanning for number formatting violations, checking punctuation, flagging adverbs – is a valid approach and would function. But the LSP model offers several structural advantages that such an approach does not.
Existing infrastructure. Claude Code, OpenCode, and editors broadly already speak LSP. The models operating within these environments have been trained to act on diagnostic signals. Deploying a single server makes it available everywhere the protocol is understood, without requiring the construction of a new interface. The LSP approach leverages infrastructure that already exists rather than requiring a parallel one to be built.
Parity. A human author working in an LSP-enabled editor receives red squiggles, hover text, and a running problem count. An agent receives the same findings as structured data. The source of truth is identical; the interfaces are optimized for their respective consumers. When a rule is updated in the LSP, the update propagates to both simultaneously.
Event-driven delivery. An LSP is not a batch process. It pushes diagnostics – line, column, rule identifier, message – as the document changes, without requiring the model to go looking for them. The signals arrive scoped to precisely what has changed, at the moment it changes.
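As a concrete sketch of the shape those signals take, here is roughly what a `textDocument/publishDiagnostics` notification carries, modeled as a Python dict. The field names follow the LSP specification; the URI, rule code, and message are illustrative placeholders, not this server's actual output.

```python
# Approximate shape of an LSP diagnostics push, modeled as a plain dict.
# Field names mirror the LSP spec; the values here are illustrative.
notification = {
    "method": "textDocument/publishDiagnostics",
    "params": {
        "uri": "file:///drafts/essay.md",
        "diagnostics": [
            {
                # Positions are zero-based line/character offsets.
                "range": {
                    "start": {"line": 41, "character": 18},
                    "end": {"line": 41, "character": 19},
                },
                "severity": 2,                # 1=Error, 2=Warning, 3=Info, 4=Hint
                "code": "numbers/spell-out",  # stable rule identifier (hypothetical)
                "source": "every-style",
                "message": "Spell out numbers one through nine.",
            }
        ],
    },
}
```

Because each diagnostic carries its own range and rule code, a consumer can act on exactly the span that changed without rescanning the document.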
Separation of signal from action. This is the distinction that matters most. The LSP's responsibility is to find violations. The model's responsibility is to fix them. These are distinct capabilities, and they benefit from distinct execution contexts. Bundling them – asking a model to simultaneously discover violations and determine how to address them – produces tradeoffs that are difficult to manage and unnecessary to accept.
What the Harness Encompasses
The Every Style LSP encodes the 38 rules of the Every style guide as precise diagnostic checks: numbers spelled out one through nine, em dashes with no surrounding spaces, the Oxford comma, Oxford-clause capitalization, company names treated as singular. Each rule is a discrete function that scans the document, identifies violations, and emits a diagnostic carrying a code, a message, a severity level, and exact line and column positions.
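A minimal sketch of what one such rule function might look like – the `numbers/spell-out` code, the `Diagnostic` dataclass, and the deliberately naive regex are illustrative assumptions, not the repository's actual implementation:

```python
import re
from dataclasses import dataclass

@dataclass
class Diagnostic:
    code: str       # stable rule identifier
    message: str
    severity: str
    line: int       # zero-based
    column: int     # zero-based

SPELLED = {"1": "one", "2": "two", "3": "three", "4": "four", "5": "five",
           "6": "six", "7": "seven", "8": "eight", "9": "nine"}

def check_small_numbers(text: str) -> list[Diagnostic]:
    """Flag standalone digits 1-9 that the style guide wants spelled out.
    Simplified on purpose: a real rule would exempt dates, prices, etc."""
    found = []
    for lineno, line in enumerate(text.splitlines()):
        for match in re.finditer(r"\b[1-9]\b", line):
            found.append(Diagnostic(
                code="numbers/spell-out",
                message=f'Spell out "{match.group()}" as "{SPELLED[match.group()]}".',
                severity="warning",
                line=lineno,
                column=match.start(),
            ))
    return found
```

The function is pure and position-precise, which is what lets the server re-run it cheaply on every document change.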
The agents do not carry the rules internally. They receive diagnostics and act on them.
The agent layer is organized as parallel subagents with narrow, well-defined scopes – one handling numbers and currency, another handling punctuation mechanics, another handling names and attribution. Each subagent operates with a focused system prompt that encodes the rules for its domain, not the full 38-rule guide. Scoped prompts reduce competition for attention; reduced competition produces more reliable enforcement.
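One way to sketch that division of labor is to partition incoming diagnostics by domain before any subagent sees them – assuming rule codes carry a domain prefix; the prefixes and domain names below are hypothetical:

```python
from collections import defaultdict

# Hypothetical mapping from rule-code prefix to subagent domain.
DOMAINS = {
    "numbers": "numbers-and-currency",
    "punct": "punctuation-mechanics",
    "names": "names-and-attribution",
}

def route(diagnostics: list[dict]) -> dict[str, list[dict]]:
    """Partition diagnostics so each subagent sees only its own domain."""
    buckets = defaultdict(list)
    for diag in diagnostics:
        prefix = diag["code"].split("/")[0]
        buckets[DOMAINS.get(prefix, "general")].append(diag)
    return dict(buckets)
```

Each bucket then becomes the entire working set for one scoped subagent, so no prompt has to carry rules outside its domain.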
Each proposed edit is structured to include the rule it addresses, the subagent that proposed it, the exact text it would replace, and context anchors that prevent the edit from landing at the wrong position. Everything is traceable to a source.
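A proposal of that shape, and the anchor check that keeps it from landing in the wrong place, might be sketched as follows – the field names and anchoring strategy are assumptions for illustration, not the project's actual schema:

```python
from dataclasses import dataclass

@dataclass
class ProposedEdit:
    rule: str        # diagnostic code the edit addresses
    subagent: str    # which subagent proposed it
    old_text: str    # exact text to replace
    new_text: str
    before: str      # context anchor immediately preceding old_text
    after: str       # context anchor immediately following old_text

def apply_edit(document: str, edit: ProposedEdit) -> str:
    """Apply the edit only if its anchored context is unique in the document."""
    anchored = edit.before + edit.old_text + edit.after
    if document.count(anchored) != 1:
        raise ValueError("anchor missing or ambiguous; edit refused")
    return document.replace(anchored, edit.before + edit.new_text + edit.after)
```

Refusing ambiguous anchors is the point: a bare string replacement of `old_text` could silently fire at the wrong occurrence.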
A verification pass reviews all proposals before any edit is applied. Conflicts are resolved; duplicates are consolidated. The final count – 27 edits in the test document – reflects targeted, verified corrections, not the model's unguided estimation of what might be wrong.
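A verification pass of that kind might look like the sketch below, assuming each proposal carries character offsets; a real pass would presumably rank conflicting proposals rather than keep the first one seen:

```python
def verify(proposals: list[dict]) -> list[dict]:
    """Drop exact duplicates, then reject proposals whose spans overlap
    an already-accepted edit. Offsets are half-open [start, end)."""
    seen = set()
    accepted = []
    for p in proposals:
        key = (p["start"], p["end"], p["new_text"])
        if key in seen:
            continue          # duplicate: two subagents proposed the same fix
        seen.add(key)
        if any(p["start"] < a["end"] and a["start"] < p["end"] for a in accepted):
            continue          # conflicts with an already-accepted edit
        accepted.append(p)
    return accepted
```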
Granularity and Division of Labor
One effect of granularity at this level is that it makes a more sophisticated division of labor possible.
When the LSP handles mechanical rule enforcement – numbers, punctuation, em dashes – the model is free to operate on problems that genuinely require it: voice, structure, whether the introduction achieves what it sets out to achieve. These are categorically different problems, and conflating them produces mediocre results on both.
Atomic signals enable specialized pipelines. The agent responsible for evaluating whether a sentence reads well does not need to carry the overhead of evaluating whether a number is formatted correctly; the LSP has already answered that question. The agent receives the answer and proceeds accordingly.
This is the same compositional logic that governs well-designed APIs: small, well-defined responsibilities that can be combined without creating dependencies between them.
Rules as Data
There is a practical consequence of encoding rules in an LSP that is easy to overlook in the initial design but becomes significant over time: the rules can be changed without touching the agents at all.
Update the rules file. The LSP regenerates its diagnostics. Every editor and every agent workflow that consumes those diagnostics picks up the change automatically – no prompt editing, no redeployment, no ambiguity about whether the model is applying the new rule or the old one.
The style guide, in this model, becomes configurable data rather than embedded instruction. Individual documents can suppress specific rules inline. The guide can evolve without requiring downstream systems to be rearchitected. This is what well-functioning infrastructure looks like: the behavior that matters lives in one place, and everything else composes on top of it.
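Both mechanisms can be sketched briefly – rules loaded as configurable data, plus a hypothetical inline suppression comment. The JSON schema and the comment syntax below are invented for illustration, not the repository's actual format:

```python
import json
import re

# Hypothetical rules file: the style guide as configurable data.
RULES_JSON = """
{"rules": [
    {"code": "numbers/spell-out", "enabled": true},
    {"code": "punct/em-dash-spacing", "enabled": false}
]}
"""

def load_rules(raw: str) -> dict[str, dict]:
    """Return only the enabled rules, keyed by code."""
    return {r["code"]: r for r in json.loads(raw)["rules"] if r["enabled"]}

def suppressed(line: str) -> set[str]:
    """Parse a hypothetical inline suppression comment, e.g.
    <!-- style-disable: numbers/spell-out -->"""
    m = re.search(r"<!--\s*style-disable:\s*([\w/,\- ]+)-->", line)
    return {code.strip() for code in m.group(1).split(",")} if m else set()
```

Editing `RULES_JSON` (or the file it stands in for) changes what every downstream consumer enforces, with no prompt or agent touched.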
The Broader Pattern
Harness engineering is, in essence, the answer to a question that arises constantly in AI development: how do agents become reliable?
The instinct, frequently, is to improve the prompt – to write clearer instructions, to add more examples, to be more explicit about edge cases. This approach yields results up to a point, after which it encounters a structural ceiling, because the model is still being asked to hold more context than attention can support with consistency.
The more productive instinct is to make the environment smarter. What signals can be encoded upstream, so the model does not have to derive them at runtime? What tools can be provided so agents are not required to estimate things that could be measured? What structure can be imposed on outputs so that downstream systems can reason about them with confidence?
Code tooling has had decades to develop in this direction. Linters, formatters, type systems, and test runners each represent an instance of encoding a class of judgment into infrastructure, removing it from the execution path, and making the result available as a structured signal. English writing is substantially behind this curve. The tools that exist are, in most cases, still humans reading printouts with red pens.
There is considerable room to close that gap – and the infrastructure required to do so is, in large part, already built.
The repository is at SchneiderBD/Every-LSP. It runs on Claude Code out of the box. If you are working on something similar – encoding domain rules into agent-consumable signals – I would be glad to hear about it.