Why I Don't Use LangChain in Production

Jul 2, 2026 · 5 min

LangChain is the most-mentioned AI framework in tutorials, conferences, and job postings. I spent several weeks building with it, then rewrote everything directly against the SDKs. Here’s what I learned.

What LangChain Does Well

To be fair: LangChain solves real problems.

Fast start: chain a retriever + LLM in 10 lines
Useful abstractions: Document loaders, text splitters, vector store interfaces
Ecosystem: hundreds of pre-built integrations

For a prototype or POC, it’s hard to beat. You lay the foundations in a day.

Why It Breaks Down in Production

1. Debugging becomes opaque.

When a LangChain chain produces an unexpected result, tracing it back to the cause takes time, because the abstractions hide exactly what’s being sent to the model. With a direct SDK call, the full request sits in your own code, so the problem is usually visible right away.
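A minimal sketch of what that visibility can look like: build the request as a plain dictionary and log it before sending. The helper name build_request and the logger name are hypothetical; the keys simply mirror the arguments passed to messages.create in the client shown later in this article.

```python
import json
import logging

logger = logging.getLogger("llm.requests")  # hypothetical logger name


def build_request(prompt: str, model: str, max_tokens: int = 1024) -> dict:
    """Assemble the exact payload for one call and log it verbatim."""
    request = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    # Everything the model will receive is in this one log line.
    logger.debug("LLM request: %s", json.dumps(request, ensure_ascii=False))
    return request


# Usage (assumes an anthropic.Anthropic() instance named `client`):
#   response = client.messages.create(**build_request("Hello", "claude-sonnet-4-6"))
```

When an output looks wrong, the log line is the prompt, with nothing templated or injected in between.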

2. Versions break things.

LangChain has gone through v0.1, v0.2, and v0.3, each with breaking changes. I’ve had projects where updating a transitive dependency silently broke a production pipeline. With a thin custom wrapper on the official SDK, the dependency surface is small enough that you control it.

3. Performance overhead.

On batch pipelines, the LangChain abstraction layer adds measurable per-call latency. It’s not dramatic on a single call, but at 100k calls/day it adds up.
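To put a number on this for your own pipeline, a small timing harness is enough. The sketch below is an assumption about how one might measure it, not a benchmark result: direct and wrapped are placeholder callables for the SDK path and the framework path, ideally with the model response stubbed so that network noise does not dominate. As a scale reference, at 100k calls/day every extra millisecond per call is 100 seconds of cumulative time per day.

```python
import statistics
import time
from typing import Callable


def median_latency_ms(fn: Callable[[], object], runs: int = 20) -> float:
    """Call fn `runs` times and return the median wall-clock time in ms."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def overhead_ms(
    direct: Callable[[], object],
    wrapped: Callable[[], object],
    runs: int = 20,
) -> float:
    """Median latency of the wrapped path minus that of the direct path."""
    return median_latency_ms(wrapped, runs) - median_latency_ms(direct, runs)
```

The median is used rather than the mean so that a single slow outlier does not skew the comparison.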

4. You don’t learn what the model actually does.

If you use LLMChain without understanding what prompt it sends, you can’t optimize that prompt. Abstractions make it easy to never look under the hood.

What I Use Instead

Official Anthropic SDK (the anthropic package for Python, or the TypeScript SDK). Stable, documented, direct.

Pydantic for structured output validation. In my experience, calling it directly is simpler than going through LangChain’s PydanticOutputParser.

My own minimalist abstractions, 100–200 lines of code I fully understand:

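from typing import TypeVar

import anthropic
from pydantic import BaseModel

# T is any Pydantic model class; complete_json returns an instance of it.
T = TypeVar("T", bound=BaseModel)


class LLMClient:
    def __init__(self, model: str = "claude-sonnet-4-6"):
        # Reads ANTHROPIC_API_KEY from the environment by default.
        self.client = anthropic.Anthropic()
        self.model = model

    def complete(self, prompt: str, max_tokens: int = 1024) -> str:
        """Send a single user message and return the text of the reply."""
        response = self.client.messages.create(
            model=self.model,
            max_tokens=max_tokens,
            messages=[{"role": "user", "content": prompt}],
        )
        # The first content block holds the text for a plain completion.
        return response.content[0].text

    def complete_json(self, prompt: str, schema: type[T]) -> T:
        """Ask for JSON only, then validate the reply against schema."""
        text = self.complete(prompt + "\nRespond only with valid JSON.")
        # Raises pydantic.ValidationError if the reply does not match.
        return schema.model_validate_json(text)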

200 lines I master > 200,000 lines I endure.
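The validation step in complete_json is where Pydantic earns its place: a reply that is not valid JSON, or that is missing a field, raises instead of flowing silently downstream. A minimal sketch of handling that failure, assuming Pydantic v2; the Invoice schema and the parse_invoice helper are hypothetical names used only for illustration.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class Invoice(BaseModel):  # hypothetical schema for illustration
    vendor: str
    total: float


def parse_invoice(raw: str) -> Optional[Invoice]:
    """Validate raw model output; return None if it does not match."""
    try:
        return Invoice.model_validate_json(raw)
    except ValidationError:
        # Covers both malformed JSON and missing or mistyped fields.
        return None
```

Returning None is one design choice among several; re-prompting the model with the validation error text is another reasonable option.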

When LangChain Makes Sense

POC or demo: yes, time-to-demo is faster
Integrations with exotic systems (obscure vector databases, proprietary formats)
Projects maintained by a team that already knows LangChain well

Outside of these cases, I prefer building on official SDKs. The code is more readable, debugging is faster, and production behavior is more predictable.

The Rule I Apply

Use a framework when it saves you time on your specific needs, not on the common parts you could write in an hour.

The common parts (API call, retry, JSON parsing) take an hour to write. Invest that time.
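As an illustration of how small the retry part can be, here is a sketch using only the standard library. The function name with_retry and its defaults are assumptions made for this example; the official SDK client also exposes its own max_retries setting, so a helper like this is mainly useful when you want the policy to live in your own code.

```python
import random
import time
from typing import Callable, TypeVar

R = TypeVar("R")


def with_retry(
    fn: Callable[[], R],
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: tuple[type[BaseException], ...] = (Exception,),
) -> R:
    """Call fn, retrying on the given exceptions with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt, scaled by random jitter (0.5x to 1.5x).
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))
    raise ValueError("attempts must be at least 1")


# Usage (assumes an LLMClient instance named `llm`):
#   text = with_retry(lambda: llm.complete("Summarize this ticket."))
```

In practice retry_on should be narrowed to transient errors (timeouts, rate limits) so that a bad request fails fast instead of being retried.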

Stéphanie Caumont

AI Product Owner