CRMA: Stable Fine-Tuning + Continual Learning for Small LLMs
We’ve been building CRMA (Constrained Residual Mixing Adapter), a small module that attaches to every layer of a
language model during fine-tuning. It applies a mathematical constraint that keeps training stable: the model can
learn new information but cannot overwrite what it already knows.
CRMA is inspired by, but not equivalent to, “mHC: Manifold-Constrained Hyper-Connections” (arXiv:2512.24880) by
Zhenda Xie, Yixuan Wei, et al.
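The exact constraint is not spelled out in this post, but the core idea (a residual adapter whose mixing coefficient starts at zero, giving near-identity initialization, and is projected back into a small range every step so the backbone’s function can never be fully overwritten) can be illustrated with a minimal NumPy sketch. All names here (`ConstrainedResidualAdapter`, `alpha`, `max_alpha`) are illustrative, not CRMA’s actual API:

```python
import numpy as np

class ConstrainedResidualAdapter:
    """Illustrative sketch: y = x + alpha * (x @ A @ B), where alpha is
    initialized to 0 (near-identity, so no cold-start collapse) and clipped
    to [0, max_alpha] so the residual mix stays bounded during training."""

    def __init__(self, dim, rank=4, max_alpha=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.A = rng.normal(0.0, 0.02, size=(dim, rank))  # down-projection
        self.B = np.zeros((rank, dim))                    # up-projection, zero-init
        self.alpha = 0.0                                  # learnable mixing coefficient
        self.max_alpha = max_alpha

    def constrain(self):
        # Project the mixing coefficient back onto the allowed interval
        # after each optimizer step.
        self.alpha = float(np.clip(self.alpha, 0.0, self.max_alpha))

    def __call__(self, x):
        self.constrain()
        return x + self.alpha * (x @ self.A @ self.B)

adapter = ConstrainedResidualAdapter(dim=8)
x = np.ones((2, 8))
print(np.allclose(adapter(x), x))  # exact identity at initialization
```

Because the output equals the input at step zero, attaching the adapter cannot disturb the pretrained model before training starts; the clipped coefficient is what bounds how far training can drift afterwards.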
What it does — two capabilities:
- Fine-tuning stability
  - Peak gradient norm reduced 39–84% vs standard LoRA
  - Near-identity initialization — no cold-start collapse
  - Works with QLoRA (4-bit) on TinyLlama-1.1B, Mistral-7B, Gemma-2B
  - All stability claims are empirically measured per run, not theoretical
- Continual learning
  - Train sequentially on multiple domains — medical, legal, code, finance
  - -0.1% backbone drift across 4 domains (vs +351% catastrophic forgetting with naive sequential training)
  - Each domain gets its own adapter; the shared backbone stays stable
  - No replay buffers, no growing memory — swap adapters at inference
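The “swap adapters at inference” workflow amounts to keeping one frozen backbone plus a dictionary of per-domain adapter deltas, with only the active domain’s delta added to the forward pass. A minimal sketch with hypothetical names (the real adapters are low-rank and per-layer; this collapses everything to one matrix per domain for clarity):

```python
import numpy as np

def backbone(x, W):
    """Shared backbone layer; its weights W are frozen during continual learning."""
    return x @ W

class DomainAdapters:
    """One small adapter per domain; swap by name at inference time.
    Memory grows only by one adapter per new domain, never the backbone."""

    def __init__(self):
        self.deltas = {}  # domain name -> residual delta matrix

    def add_domain(self, name, delta):
        # In practice, delta would be trained on that domain's data.
        self.deltas[name] = delta

    def forward(self, x, W, domain):
        y = backbone(x, W)
        if domain in self.deltas:
            y = y + x @ self.deltas[domain]  # only the active adapter contributes
        return y

dim = 4
W = np.eye(dim)
adapters = DomainAdapters()
adapters.add_domain("medical", 0.1 * np.eye(dim))
adapters.add_domain("legal", 0.2 * np.eye(dim))

x = np.ones((1, dim))
# Same frozen backbone, different behavior depending on which adapter is active.
print(adapters.forward(x, W, "medical").sum(), adapters.forward(x, W, "legal").sum())
```

Because the backbone weights are never updated after the base run, training a fifth domain cannot degrade the first four; switching domains is a dictionary lookup, not a reload.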
Measured on:
- TinyLlama-1.1B-Chat (1.1B params, Apache 2.0)
- Mistral-7B-v0.3 (7B params, Apache 2.0)
- Modal A10G GPU
Try it:
- API: https://fourwheels2512--crma-finetune-fastapi-app.modal.run
- Free tier: 3 runs/day on TinyLlama, no credit card needed
- Pro: pay-as-you-go credits, starting at $5
The fine-tuning API is live. Continual learning is available via the /start_cl_run endpoint — bring a base fine-tuned
run and add new domains without losing previous ones.
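A minimal client sketch for the continual-learning endpoint, using only the standard library. The payload fields (`base_run_id`, `domain`, `dataset_url`) are hypothetical, since the request schema isn’t documented in this post — check the live API for the real one:

```python
import json
import urllib.request

API = "https://fourwheels2512--crma-finetune-fastapi-app.modal.run"

def build_cl_payload(base_run_id, domain, dataset_url):
    # Hypothetical fields: the actual /start_cl_run schema may differ.
    return {"base_run_id": base_run_id, "domain": domain, "dataset_url": dataset_url}

def start_cl_run(payload):
    """POST the payload to /start_cl_run. Requires a live base run id."""
    req = urllib.request.Request(
        API + "/start_cl_run",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:  # network call, not run here
        return json.load(resp)

payload = build_cl_payload("run_123", "legal", "https://example.com/legal.jsonl")
print(json.dumps(payload))
```

The intended flow: fine-tune a base run first, then call `/start_cl_run` once per new domain, reusing the same `base_run_id` so each domain’s adapter is added on top of the stable backbone.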
Built by Kiran Nayudu. Feedback welcome.