
Fine-tuning, RAG or Prompt Engineering: When to Use Which?

Jul 2, 2026 · 6 min read

When a client asks “could we fine-tune the model on our data?”, my first answer is always: “probably not necessary.” Here’s the decision framework I use.

The Three Approaches

Prompt engineering: everything goes in the context (instructions, examples, expected format) and the general-purpose model does the rest.

RAG (retrieval-augmented generation): the system retrieves relevant passages from a document base and adds them to the prompt before the model responds.

Fine-tuning: you continue training the model on annotated examples to change its default behavior.

Decision Matrix

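| Criterion | Prompt eng. | RAG | Fine-tuning |
|---|---|---|---|
| Upfront cost | Very low | Medium | High |
| Maintenance | Low | Medium | High |
| Data needed | 0 examples | Documents | 100–10k examples |
| Answers on fresh data | Only what is in the prompt | Yes | No (needs retraining) |
| Added latency | None | +100–500 ms | None |
| Style consistency | Good | Good | Excellent |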

When Prompt Engineering Is Enough

A good prompt covers the vast majority of cases: structured extraction, classification, format-constrained generation.

# 90% of cases are solved with a good prompt
prompt = """Extract the following entities from the text as JSON:
- company_name (string)
- amount (number, in euros)
- date (YYYY-MM-DD)

If an entity is missing, return null.

Text: {text}"""
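A prompt like this only pays off if you check what comes back. Here is a minimal sketch of filling the template and validating the reply, assuming the model returns bare JSON. The names `PROMPT_TEMPLATE`, `build_prompt` and `parse_reply` are mine, and no API call is shown: `reply` stands for whatever string your client returns.

```python
import json
import re
from datetime import date

PROMPT_TEMPLATE = """Extract the following entities from the text as JSON:
- company_name (string)
- amount (number, in euros)
- date (YYYY-MM-DD)

If an entity is missing, return null.

Text: {text}"""

EXPECTED_KEYS = {"company_name", "amount", "date"}


def build_prompt(text: str) -> str:
    # str.format works here because the template contains no other braces.
    return PROMPT_TEMPLATE.format(text=text)


def parse_reply(reply: str) -> dict:
    """Parse the model's reply and check it against the requested fields."""
    data = json.loads(reply)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict) or set(data) != EXPECTED_KEYS:
        raise ValueError("reply must be an object with exactly the expected keys")
    name, amount, day = data["company_name"], data["amount"], data["date"]
    if name is not None and not isinstance(name, str):
        raise ValueError("company_name must be a string or null")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if amount is not None and (
        isinstance(amount, bool) or not isinstance(amount, (int, float))
    ):
        raise ValueError("amount must be a number or null")
    if day is not None:
        if not isinstance(day, str) or not re.fullmatch(r"\d{4}-\d{2}-\d{2}", day):
            raise ValueError("date must be YYYY-MM-DD or null")
        date.fromisoformat(day)  # rejects impossible dates such as 2026-02-30
    return data
```

Counting how often `parse_reply` raises on real inputs gives you a concrete failure rate to judge the prompt by.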

If you need 10 examples for the model to understand the format, use few-shot — not fine-tuning.
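To make that concrete, here is a rough sketch of a few-shot prompt builder. The `Text:` / `JSON:` layout and the function name are illustrative choices of mine, not a required format. Each example is an (input text, expected output) pair.

```python
import json


def build_few_shot_prompt(instructions: str, examples: list, text: str) -> str:
    """Assemble instructions, worked examples, then the new input."""
    parts = [instructions.strip(), ""]
    for example_text, example_output in examples:
        parts.append(f"Text: {example_text}")
        # Show the exact output format the model should imitate.
        parts.append(f"JSON: {json.dumps(example_output, ensure_ascii=False)}")
        parts.append("")
    parts.append(f"Text: {text}")
    parts.append("JSON:")  # leave the last answer open for the model
    return "\n".join(parts)
```

Adding or swapping an example is a one-line change you can test immediately, which is the main reason to exhaust few-shot before reaching for training.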

When to Use RAG

Whenever the answer depends on documents that change frequently or exceed the context window:

  • Product knowledge base (updated monthly)
  • Internal technical documentation
  • Email archives / support tickets

RAG is cheaper to maintain than fine-tuning and stays current without retraining.
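For intuition, here is a toy sketch of the retrieve-then-prompt loop using only the standard library. It ranks passages by bag-of-words cosine similarity, which is a stand-in I chose for readability. A production setup would more likely use embeddings and a vector index, and every name below is hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase word counts; a crude substitute for real embeddings."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, passages: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the query."""
    q = tokenize(query)
    ranked = sorted(passages, key=lambda p: cosine(q, tokenize(p)), reverse=True)
    return ranked[:k]


def build_rag_prompt(query: str, passages: list[str], k: int = 2) -> str:
    """Put the retrieved passages in front of the question."""
    context = "\n".join(f"- {p}" for p in retrieve(query, passages, k))
    return (
        "Answer using only the passages below. "
        "If they do not contain the answer, say so.\n\n"
        f"Passages:\n{context}\n\nQuestion: {query}"
    )
```

Updating the knowledge base means editing the `passages` list (or re-indexing, in a real system). The model itself never changes.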

When Fine-tuning Is Justified

Three real cases:

  1. Very strict style: the model must write exactly like your brand, with phrasings that prompting can’t capture reliably.
  2. High-volume repetitive task: if you’re making 10M calls/month to Sonnet for simple classification, fine-tuning Haiku can cut the bill by 5×.
  3. Confidential proprietary data: examples can’t be sent with every call for legal reasons.
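The arithmetic behind case 2 is worth running with your own numbers. Below is a back-of-the-envelope sketch. Every price, token count and training cost in it is a made-up placeholder, not a real rate for any model, and the function names are mine.

```python
import math


def monthly_cost(
    calls: int,
    input_tokens: int,
    output_tokens: int,
    price_in_per_mtok: float,
    price_out_per_mtok: float,
) -> float:
    """Monthly bill, given per-call token counts and prices per million tokens."""
    per_call = (
        input_tokens * price_in_per_mtok + output_tokens * price_out_per_mtok
    ) / 1_000_000
    return calls * per_call


def break_even_months(one_off_cost: float, current: float, new: float) -> float:
    """Months until the one-off fine-tuning cost is repaid by monthly savings."""
    savings = current - new
    return one_off_cost / savings if savings > 0 else math.inf


# Hypothetical inputs: 10M calls/month, 500 input and 20 output tokens per call.
larger_model = monthly_cost(10_000_000, 500, 20, 3.0, 15.0)
smaller_model = monthly_cost(10_000_000, 500, 20, 0.6, 3.0)
payback = break_even_months(20_000.0, larger_model, smaller_model)
```

With these placeholder prices the smaller model is five times cheaper per token, so the bill drops by the same factor. A fine-tuned model can often run with a shorter prompt too, which would lower `input_tokens` further.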

What I Recommend

  1. Start with prompt engineering. Test on 50 real examples.
  2. If results fall short due to knowledge gaps → RAG.
  3. If RAG is too slow / too expensive at scale, or style is critical → fine-tuning.

You’ll rarely reach step 3.
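The three steps can be sketched as a small helper: measure the prompt on your test set, then apply the rules in order. The accuracy threshold, the parameter names and the "iterate on the prompt" fallback are assumptions I added for illustration, since the steps above do not cover that last case.

```python
from typing import Callable, Sequence, Tuple


def accuracy(
    predict: Callable[[str], object], examples: Sequence[Tuple[str, object]]
) -> float:
    """Share of (input, expected) pairs the prompt-based predictor gets right."""
    correct = sum(1 for text, expected in examples if predict(text) == expected)
    return correct / len(examples)


def recommend(
    prompt_accuracy: float,
    target: float,
    knowledge_gaps: bool,
    rag_too_slow_or_costly: bool = False,
    style_critical: bool = False,
) -> str:
    # Step 1: a prompt that meets the target is the end of the road.
    if prompt_accuracy >= target:
        return "prompt engineering"
    # Step 3 (style branch): prompting fell short and style is critical.
    if style_critical:
        return "fine-tuning"
    # Step 2, with step 3 as the escape hatch when RAG does not scale.
    if knowledge_gaps:
        return "fine-tuning" if rag_too_slow_or_costly else "RAG"
    # Assumed fallback, not part of the three steps.
    return "iterate on the prompt"
```

Here `predict` wraps your prompt plus model call, and `examples` would be the 50 real cases from step 1.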


Stéphanie Caumont

AI Product Owner