Fine-tuning, RAG or Prompt Engineering: When to Use Which?

Jul 2, 2026 · 6 min

When a client asks “could we fine-tune the model on our data?”, my first answer is always: “probably not necessary.” Here’s the decision framework I use.

The Three Approaches

Prompt engineering: everything in context. Examples, instructions, expected format — the general-purpose model does the rest.

RAG: the model retrieves relevant passages from a document base before responding.

Fine-tuning: you retrain the model on annotated examples to change its base behavior.

Decision Matrix

Criterion	Prompt eng.	RAG	Fine-tuning
Upfront cost	Very low	Medium	High
Maintenance	Low	Medium	High
Data needed	0 examples	Documents	100–10k examples
Answers on fresh data	✅	✅	❌
Added latency	None	+100–500ms	None
Style consistency	Good	Good	Excellent

When Prompt Engineering Is Enough

Prompt engineering is enough for the vast majority of cases: structured extraction, classification, format-constrained generation.

# 90% of cases are solved with a good prompt
prompt = """Extract the following entities from the text as JSON:
- company_name (string)
- amount (number, in euros)
- date (YYYY-MM-DD)

If an entity is missing, return null.

Text: {text}"""
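A prompt like this only pays off if you check what comes back before it goes downstream. Here is a minimal sketch of that check, assuming the model's reply arrives as a raw string. `parse_extraction` and `EXPECTED_KEYS` are names I made up for illustration; they are not part of any SDK.

```python
import json
from datetime import datetime

# Hypothetical schema mirroring the three entities requested in the prompt.
EXPECTED_KEYS = {"company_name", "amount", "date"}


def parse_extraction(raw: str) -> dict:
    """Parse a model reply and check it against the schema from the prompt."""
    data = json.loads(raw)  # malformed JSON raises json.JSONDecodeError, a ValueError
    if not isinstance(data, dict) or set(data) != EXPECTED_KEYS:
        raise ValueError(f"unexpected structure: {data!r}")

    name, amount, day = data["company_name"], data["amount"], data["date"]
    if name is not None and not isinstance(name, str):
        raise ValueError("company_name must be a string or null")
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if amount is not None and (isinstance(amount, bool) or not isinstance(amount, (int, float))):
        raise ValueError("amount must be a number or null")
    if day is not None:
        if not isinstance(day, str):
            raise ValueError("date must be a string or null")
        datetime.strptime(day, "%Y-%m-%d")  # raises ValueError if not a valid date
    return data
```

Rejected replies can be retried or routed to manual review. Either way, the failure rate on real inputs tells you whether the prompt is good enough.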

If the model needs 10 examples to understand the format, put them in the prompt (few-shot), not in a fine-tuning run.
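Few-shot simply means placing worked input/output pairs in the prompt ahead of the real input. A minimal sketch, assuming each example is a (text, expected JSON) pair; `build_few_shot_prompt` is a hypothetical helper, not a library function.

```python
import json


def build_few_shot_prompt(instructions: str,
                          examples: list[tuple[str, dict]],
                          text: str) -> str:
    """Assemble instructions, worked examples, and the new input into one prompt."""
    parts = [instructions.strip(), ""]
    for source, expected in examples:
        parts.append(f"Text: {source}")
        parts.append(f"JSON: {json.dumps(expected, ensure_ascii=False)}")
        parts.append("")  # blank line between examples
    # The prompt ends on an open "JSON:" so the model completes the pattern.
    parts.append(f"Text: {text}")
    parts.append("JSON:")
    return "\n".join(parts)


# Illustrative data only; the invoice texts are invented for this sketch.
demo_prompt = build_few_shot_prompt(
    "Extract company_name, amount and date as JSON. Use null when missing.",
    [
        ("Invoice from Acme, 1200 euros, 1 March 2024.",
         {"company_name": "Acme", "amount": 1200, "date": "2024-03-01"}),
        ("Payment received, no further details.",
         {"company_name": None, "amount": None, "date": None}),
    ],
    "Globex billed 450 euros on 12 May 2024.",
)
```

Adding or swapping an example is a one-line change here, whereas a fine-tuned model would need a new training run.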

When to Use RAG

Whenever the answer depends on documents that change frequently or exceed the context window:

Product knowledge base (updated monthly)
Internal technical documentation
Email archives / support tickets

RAG is cheaper to maintain than fine-tuning and stays current without retraining.
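The retrieve-then-answer loop fits in a few lines. The sketch below is a toy: it ranks passages by word overlap with the question, whereas a production system would typically use embeddings and a vector index. The function names and the sample documents are assumptions made up for illustration.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase a text and split it into a set of words."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents sharing the most words with the question.

    Word overlap is a stand-in for a real similarity search.
    """
    q = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]


def build_rag_prompt(question: str, documents: list[str], k: int = 2) -> str:
    """Put the retrieved passages in front of the question."""
    context = "\n".join(f"- {p}" for p in retrieve(question, documents, k))
    return (
        "Answer using only the passages below.\n\n"
        f"Passages:\n{context}\n\n"
        f"Question: {question}"
    )


# Invented knowledge-base entries, for illustration only.
docs = [
    "The Pro plan costs 49 euros per month.",
    "Support is open Monday to Friday.",
    "Refunds are processed within 14 days.",
]
rag_prompt = build_rag_prompt("How much does the Pro plan cost?", docs, k=1)
```

Updating the knowledge base means editing `docs`. The model itself is never touched, which is why RAG stays current without retraining.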

When Fine-tuning Is Justified

Three real cases:

Very strict style: the model must write exactly like your brand, with phrasings that prompting can’t capture reliably.
High-volume repetitive task: if you’re making 10M calls/month to Sonnet for simple classification, fine-tuning Haiku can cut the bill by 5×.
Confidential proprietary data: examples can’t be sent with every call for legal reasons.
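For the high-volume case, do the arithmetic before committing. The sketch below uses placeholder prices and token counts that I chose to illustrate a 5× gap. They are not real price-list figures, so plug in your provider's current rates. The training cost is likewise a hypothetical number.

```python
def monthly_cost(calls: int, in_tokens: int, out_tokens: int,
                 in_price: float, out_price: float) -> float:
    """Monthly bill, with prices expressed per million tokens."""
    per_call = (in_tokens * in_price + out_tokens * out_price) / 1_000_000
    return calls * per_call


CALLS = 10_000_000  # volume taken from the example above

# Placeholder prices per million tokens (assumptions, not a real price list).
large = monthly_cost(CALLS, in_tokens=300, out_tokens=10, in_price=3.0, out_price=15.0)
small = monthly_cost(CALLS, in_tokens=300, out_tokens=10, in_price=0.6, out_price=3.0)

# Hypothetical one-off cost of preparing data and running the fine-tune.
TRAINING_COST = 20_000.0
payback_months = TRAINING_COST / (large - small)

print(f"large model: {large:,.0f} per month")
print(f"small model: {small:,.0f} per month")
print(f"payback: {payback_months:.1f} months")
```

With these placeholder inputs the monthly bill drops from about 10,500 to about 2,100. The saving only counts if the smaller model holds its accuracy on your real examples.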

1. Start with prompt engineering. Test on 50 real examples.
2. If results fall short due to knowledge gaps → RAG.
3. If RAG is too slow / too expensive at scale, or style is critical → fine-tuning.

You’ll rarely reach step 3.
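The "test on 50 real examples" step and the escalation rule can be sketched as below. The 0.9 threshold, the function names, and the "keep iterating" fallback are my assumptions for illustration, not fixed rules.

```python
from typing import Callable


def accuracy(predict: Callable[[str], str],
             examples: list[tuple[str, str]]) -> float:
    """Share of examples where the prediction equals the expected label."""
    if not examples:
        raise ValueError("need at least one example")
    hits = sum(1 for text, expected in examples if predict(text) == expected)
    return hits / len(examples)


def next_step(score: float,
              knowledge_gap: bool,
              rag_too_slow_or_costly: bool = False,
              style_is_critical: bool = False,
              threshold: float = 0.9) -> str:
    """Walk the escalation order: prompt first, then RAG, then fine-tuning."""
    if score >= threshold and not style_is_critical:
        return "prompt engineering"
    if style_is_critical or rag_too_slow_or_costly:
        return "fine-tuning"
    if knowledge_gap:
        return "RAG"
    # Assumption: failures without a knowledge gap mean the prompt needs more work.
    return "keep iterating on the prompt"


# Stub standing in for a real model call, so the sketch runs offline.
def stub_classifier(text: str) -> str:
    return "invoice" if "invoice" in text.lower() else "other"


sample = [("Invoice #12 attached", "invoice"),
          ("Lunch on Friday?", "other"),
          ("Please find the bill enclosed", "invoice")]
score = accuracy(stub_classifier, sample)
decision = next_step(score, knowledge_gap=False)
```

In practice `stub_classifier` would wrap the actual model call and `sample` would hold your 50 real examples. The important part is measuring before escalating.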

Stéphanie Caumont

AI Product Owner