📅 April 14, 2026 · ⏱ 9 min read · ✍️ MoltBot Engineering
Fine-Tuning · LLMOps · Training

Fine-Tuning LLMs: When to Fine-Tune vs Prompt vs RAG

Fine-tuning is expensive, slow to iterate, and usually the wrong first choice. Here's the decision framework most teams should use, and the specific cases where fine-tuning genuinely wins.

The most common mistake teams make with LLMs is reaching for fine-tuning too early. Fine-tuning is powerful, but it's a long feedback loop: data curation, training runs, evaluation, deployment. Most problems that seem to require fine-tuning can be solved faster with better prompting or RAG.

The three-way decision: Prompt → RAG → Fine-tune

| Approach | Setup time | Iteration speed | Knowledge updates | Style/format control |
| --- | --- | --- | --- | --- |
| Prompt engineering | Hours | Minutes | Easy | Good |
| RAG | Days | Hours | Easy | Good |
| Fine-tuning | Weeks | Days | Requires retraining | Excellent |
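The table above can be collapsed into a rough first-pass heuristic. The function below is an illustrative sketch of that decision order (the function name and arguments are invented here for the example, not a MoltBot API):

```python
def choose_approach(needs_fresh_knowledge: bool,
                    needs_private_docs: bool,
                    needs_strict_style: bool,
                    prompt_already_tried: bool) -> str:
    """Rough first-pass routing between prompting, RAG, and fine-tuning.

    Argument names and ordering are illustrative, not a MoltBot API.
    """
    if not prompt_already_tried:
        return "prompt engineering"  # cheapest loop: iterate in minutes
    if needs_fresh_knowledge or needs_private_docs:
        return "RAG"                 # knowledge updates without retraining
    if needs_strict_style:
        return "fine-tuning"         # excellent style/format control
    return "prompt engineering"

# A team that needs grounded answers over internal docs:
print(choose_approach(needs_fresh_knowledge=False,
                      needs_private_docs=True,
                      needs_strict_style=False,
                      prompt_already_tried=True))  # -> RAG
```

The ordering encodes the article's thesis: always exhaust the fast loops (prompting, then RAG) before committing to a training pipeline.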

When fine-tuning genuinely wins

- Strict, consistent output style or format that even long, carefully engineered prompts fail to enforce reliably.
- Shrinking prompts: baking instructions and few-shot examples into the weights to cut per-request token cost and latency.
- Distilling a large model's behavior into a smaller, cheaper model for a narrow, well-defined task.

Note what fine-tuning does not win: adding fresh or frequently changing knowledge. Every update means another training run, which is exactly the case where RAG is the better tool.

Fine-tuning with LoRA in 2026

Low-Rank Adaptation (LoRA) is now the standard approach: instead of updating all weights, it trains small low-rank adapter matrices (typically 0.1–1% of the model's parameter count), dramatically reducing compute cost while achieving near-full fine-tuning quality.
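To see where that 0.1–1% figure comes from, here is a back-of-the-envelope count for a rank-16 LoRA on the attention projections of an 8B model. The shapes below are approximations for a Llama-3.1-8B-class architecture (4096 hidden size, 32 layers, grouped-query attention), used purely for illustration:

```python
# Back-of-the-envelope trainable-parameter count for rank-16 LoRA
# on q_proj and v_proj. Shapes are approximate, for illustration.
hidden = 4096
kv_dim = 1024        # grouped-query attention: 8 KV heads x 128 head dim
layers = 32
rank = 16

# A LoRA adapter on a (d_in x d_out) weight adds rank * (d_in + d_out) params.
q_proj = rank * (hidden + hidden)   # 4096 -> 4096
v_proj = rank * (hidden + kv_dim)   # 4096 -> 1024
trainable = layers * (q_proj + v_proj)

fraction = trainable / 8e9
print(f"{trainable / 1e6:.1f}M trainable params, {fraction:.3%} of 8B")
```

Targeting only q_proj and v_proj lands at the low end of the range; adding more target modules (k_proj, o_proj, the MLP projections) pushes the fraction up toward 1% while still leaving the base weights frozen.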

```python
# Approximate costs for LoRA fine-tuning on common models (2026)
# Llama 3.1 8B:  ~$15–50 per run on H100, ~1000 examples minimum
# Llama 3.1 70B: ~$150–500 per run on H100, ~5000 examples recommended
# Claude: instruction tuning via Anthropic fine-tune API (request access)
from moltbot.fine_tuning import LoRATrainer

trainer = LoRATrainer(
    base_model="meta-llama/Llama-3.1-8B-Instruct",
    lora_rank=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
)

trainer.train(
    dataset="./training_data.jsonl",  # {"prompt": ..., "completion": ...}
    epochs=3,
    batch_size=8,
)

model = trainer.export()  # Deploy to MoltBot inference
```

How to prepare training data
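The training snippet above expects a JSONL file where each line is a JSON object with "prompt" and "completion" fields. A minimal sketch of producing and sanity-checking such a file (the example records are invented for illustration):

```python
import json

# Hypothetical examples -- in practice, curate real prompt/completion pairs.
examples = [
    {"prompt": "Summarize this ticket: Login fails after password reset.",
     "completion": "User cannot log in after a password reset."},
    {"prompt": "Summarize this ticket: CSV export times out on large reports.",
     "completion": "CSV export exceeds the request timeout on large reports."},
]

# One JSON object per line: the JSONL format the trainer reads.
with open("training_data.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex, ensure_ascii=False) + "\n")

# Sanity-check: every line parses and carries both required keys.
with open("training_data.jsonl", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        assert {"prompt", "completion"} <= record.keys()
```

Quality beats quantity here: a thousand clean, consistent pairs will generally outperform ten thousand noisy ones, and a validation script like the check above catches malformed records before you pay for a training run.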

Fine-tuning + RAG + prompting, all on MoltBot

LoRA fine-tuning, managed RAG pipelines, and prompt management. One platform. 14-day free trial.

Start Free Trial →