← Back to blog AI Techniques

Fine-tuning vs prompt engineering: how to choose?

May 15, 20256 min

“We should fine-tune the model” is a phrase I often hear when results aren’t satisfactory. Most of the time, it’s not the right solution. Here’s how to decide.

The fine-tuning reflex, and why it’s often premature

Fine-tuning is presented as the solution to “customize” an LLM for your domain. That’s true — but it’s also expensive, time-consuming, and introduces technical debt.

Before fine-tuning, the real question is: have you truly exhausted prompt engineering possibilities?

In my practice, 80% of cases where fine-tuning is discussed are resolved with a better system prompt, well-chosen few-shot examples, or better input structuring.

When prompt engineering is enough

Output format. You want the model to always respond in JSON with a precise structure → prompt engineering with explicit schema and examples.

Tone and style. You want an agent that speaks like your brand → prompt engineering with examples of desired phrasings.

Business rules. You want the agent to apply domain-specific rules → prompt engineering with rules explicitly listed.

Edge case behavior. You want the agent to say “I don’t know” rather than hallucinate → prompt engineering with explicit uncertainty handling instructions.

When fine-tuning makes sense

Very high call volume. A smaller fine-tuned model can replace a large generic model for repetitive tasks, at 10x lower inference cost. At 10M requests/month, the savings can be massive.

Highly specialized task with lots of data. If you have 10,000+ high-quality examples in a very specific domain, fine-tuning can outperform generic models.

Critical latency. A smaller fine-tuned model responds faster. For real-time applications, this can make the difference.

Privacy. If you fine-tune and host your own model, your production data doesn’t leave your infrastructure.

The decision process

Before talking fine-tuning, answer these questions:

  1. Do you have at least 1,000 quality (input/output) examples for training?
  2. Do you have a team capable of maintaining the fine-tuning pipeline over time?
  3. Have you first tried optimizing the prompt with few-shot examples?
  4. Have you measured that the generic model is insufficient on your real cases?

If you answer no to any of these, fine-tuning is premature.

SC

Stéphanie Caumont

AI Product Owner · Learn more