What 10 years of code taught me about AI agent specs

May 26, 2025 · 7 min

After 10 years of implementing specs — good and bad — I can recognize a bad AI spec in a few lines. Here are the patterns I keep seeing, and what to do instead.

The fundamental problem

A classic spec starts from expected behavior and expresses it as a rule: “The system must do X when Y.” That works for deterministic code.

An LLM is probabilistic. It doesn’t do X; it generates a response that resembles X in most cases. That shift changes how specs have to be written.

Mistake #1: specifying behavior without specifying edge cases

I’ve seen dozens of specs like:

“The agent must analyze incoming emails and create a task in the CRM if the email contains a customer request.”

On the surface, that’s clear. In practice, it ignores dozens of questions:

What is a “customer request”? Is an email saying “it’s not working” a request?
A spam email mentioning a product — is that a request?
An email from your own team discussing a customer — is that a request?

What I do instead: I systematically ask for 10 input examples with the expected behavior for each. This forces clarification of edge cases before coding starts.
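A minimal sketch of what such an example table could look like once it is written down as data. The emails, the expected actions, and the names `SpecExample`, `EXAMPLES`, and `find_mismatches` are all hypothetical. In a real project the expected actions are decided by whoever owns the spec, and the list would hold the full 10 examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecExample:
    email: str
    expected_action: str  # "create_task" or "ignore"
    note: str             # why this case is in the spec


# Hypothetical examples; the expected actions are assumptions, not rules
# taken from a real spec.
EXAMPLES = [
    SpecExample("It's not working.", "create_task",
                "vague complaint, assumed to count as a request"),
    SpecExample("Limited offer! Buy accessories for ProductX now.", "ignore",
                "spam that mentions a product"),
    SpecExample("Team: Acme keeps asking about their invoice.", "ignore",
                "internal discussion about a customer"),
    SpecExample("Could you send me a quote for 20 licenses?", "create_task",
                "explicit customer request"),
]


def find_mismatches(classify, examples):
    """Return the examples where classify(email) disagrees with the spec."""
    return [ex for ex in examples if classify(ex.email) != ex.expected_action]
```

Passing the agent's classification function as `classify` turns the spec examples into a regression check: an empty result means the agent agrees with every example.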

Mistake #2: confusing “the AI understands” and “the AI does”

This is the spec written as if the LLM were an intelligent human who understands the intent behind the words:

“The agent must respond in a professional and empathetic manner.”

Professional how? Empathetic in what context? For which type of customers?

An LLM without precise constraints will invent its own definition of “professional” — which may not match yours at all.

What I do instead: I translate every vague adjective into observable criteria. “Professional” becomes: “Use formal address. Don’t start with ‘Hey!’. Avoid phrases like ‘No worries’. End with a concrete action proposal.”
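Observable criteria have one more advantage: some of them can be checked automatically. Here is a rough sketch, assuming simple string matching is good enough for a first pass. The function name `professional_violations` and the `ACTION_MARKERS` list are illustrative assumptions, and the "formal address" rule is left out because it is hard to check mechanically in English.

```python
import re

BANNED_OPENINGS = ("hey",)
BANNED_PHRASES = ("no worries",)
# Hypothetical markers for "a concrete action proposal"; tune per project.
ACTION_MARKERS = ("i will", "we will", "please", "next step")


def professional_violations(reply: str) -> list[str]:
    """Return the list of criteria that the reply violates."""
    violations = []
    text = reply.strip()
    lowered = text.lower()

    # Crude prefix check: it would also flag a word such as "Heyday".
    if lowered.startswith(BANNED_OPENINGS):
        violations.append("starts with an informal greeting")

    for phrase in BANNED_PHRASES:
        if phrase in lowered:
            violations.append(f"contains banned phrase: {phrase!r}")

    # Split on sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
    last = sentences[-1].lower() if sentences else ""
    if not any(marker in last for marker in ACTION_MARKERS):
        violations.append("does not end with a concrete action proposal")

    return violations
```

A check like this will not prove that a reply is professional, but it catches the violations the spec names explicitly.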

Mistake #3: ignoring output format

Half the integration bugs I’ve seen in agents come from malformed JSON or an unexpected response structure.

If your agent must return structured data, the spec must include:

{
  "action": "create_task" | "ignore" | "escalate",
  "priority": "low" | "medium" | "high",
  "summary": "string (max 100 chars)",
  "confidence": 0.0 to 1.0
}

And the system prompt must explicitly request this format — with an example, not just a description.
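Even with an example in the prompt, the calling code should not trust the response. The following is one possible way to validate the agent's output against the schema above, using only the standard library. The function name `parse_agent_output` and the choice to reject extra or missing fields are assumptions.

```python
import json

ALLOWED_ACTIONS = ("create_task", "ignore", "escalate")
ALLOWED_PRIORITIES = ("low", "medium", "high")
EXPECTED_FIELDS = {"action", "priority", "summary", "confidence"}


def parse_agent_output(raw: str) -> dict:
    """Parse the agent's raw response; raise ValueError if it breaks the schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc

    if not isinstance(data, dict):
        raise ValueError("top-level value must be a JSON object")
    if set(data) != EXPECTED_FIELDS:
        raise ValueError(f"unexpected or missing fields: {sorted(data)}")
    if data["action"] not in ALLOWED_ACTIONS:
        raise ValueError(f"invalid action: {data['action']!r}")
    if data["priority"] not in ALLOWED_PRIORITIES:
        raise ValueError(f"invalid priority: {data['priority']!r}")

    summary = data["summary"]
    if not isinstance(summary, str) or len(summary) > 100:
        raise ValueError("summary must be a string of at most 100 chars")

    confidence = data["confidence"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("confidence must be a number")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")

    return data
```

Rejecting a malformed response at this boundary keeps the failure visible, instead of letting a half-valid object reach the CRM.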

Mistake #4: not specifying behavior under uncertainty

What does the agent do when it doesn’t know? This is often the most important question, and the most overlooked.

Without explicit instructions, an LLM will often produce a confident-sounding answer rather than admit uncertainty. You need to tell it explicitly when and how to declare uncertainty.
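One way to make this rule concrete, sketched here with the "action" and "confidence" fields from the schema above: state the rule in the prompt, then enforce it in code as well. The 0.7 threshold, the wording of `UNCERTAINTY_RULE`, and the name `resolve_action` are assumptions. A model's self-reported confidence is also not guaranteed to be well calibrated, so the threshold should be tuned against real examples.

```python
CONFIDENCE_THRESHOLD = 0.7  # hypothetical value, to be tuned on real data

# Hypothetical prompt fragment telling the model when and how to declare
# uncertainty.
UNCERTAINTY_RULE = (
    "If you cannot tell whether the email is a customer request, "
    'return "action": "escalate" and a confidence below 0.7. '
    "Do not guess."
)


def resolve_action(output: dict, threshold: float = CONFIDENCE_THRESHOLD) -> str:
    """Escalate to a human whenever the agent reports low confidence."""
    if output["confidence"] < threshold:
        return "escalate"
    return output["action"]
```

The check in code is a safety net: it still applies when the model ignores the instruction in the prompt.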

What this changes in practice

A good spec for an AI agent looks like:

Concrete examples (inputs + expected outputs), not just descriptions
A precise output schema with all fields and their types
Explicit rules for handling edge cases
Defined behavior under uncertainty
Evaluation metrics: how do you know if the agent is working well?
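For the last point, a small sketch of what an evaluation metric could look like, assuming you have a list of expected actions for a set of test emails and the agent's predicted action for each. The function name `evaluate` and the choice of accuracy plus per-action recall are illustrative, not a recommendation for every project.

```python
from collections import Counter


def evaluate(predictions: list[str], expected: list[str]) -> dict:
    """Compute overall accuracy and per-action recall."""
    if not expected or len(predictions) != len(expected):
        raise ValueError("predictions and expected must be non-empty and aligned")

    total = Counter(expected)   # how often each action is expected
    correct = Counter()         # how often the agent got each action right
    for predicted, wanted in zip(predictions, expected):
        if predicted == wanted:
            correct[wanted] += 1

    return {
        "accuracy": sum(correct.values()) / len(expected),
        "recall": {action: correct[action] / n for action, n in total.items()},
    }
```

Per-action recall matters because overall accuracy can hide the case you care about most, for example an agent that almost never escalates.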

It takes longer to write than a classic spec. But it’s far quicker than debugging a hallucinating agent in production.

Stéphanie Caumont

AI Product Owner
