Update README.md
README.md (changed):

@@ -24,11 +24,13 @@ The training process went like this:
 
 Stage 1:
 - Data: A bunch of books (chosen for their prose/writing style). About 28M tokens per epoch.
-- r32/a32 QLoRA at 32k context, applied only to the QKV tensors. LR: 1e-5, 2 epochs.
+- r32/a32 QLoRA at 32k context, applied only to the QKV tensors. LR: 1e-5, 2 epochs.
+
 Stage 2:
 - Trained on top of Stage 1.
 - Data: RP data, about 4M tokens.
 - r32/a32 QLoRA at 16k context, applied to `o_proj` and `down_proj` only. LR: 5e-6, 1 epoch.
+
 Stage 3:
 - Trained on top of Stage 2.
 - Data: Instruct data (1000 random samples from koto-instruct-sft), about 1.2M tokens.
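
The per-stage adapter setup above maps directly onto a `peft` `LoraConfig`. The sketch below only restates the hyperparameters listed in the diff and is not the author's actual training script: the module names (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `down_proj`) assume a Llama-style architecture, and the context length, learning rate, and epoch count belong to the trainer settings rather than to these configs.

```python
from peft import LoraConfig

# Stage 1: rank 32 / alpha 32, adapters on the attention QKV projections only
# (trained at 32k context, LR 1e-5, 2 epochs; those settings live in the trainer).
stage1_config = LoraConfig(
    r=32,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

# Stage 2: same rank/alpha, but only the attention output and MLP down
# projections are adapted (16k context, LR 5e-6, 1 epoch).
stage2_config = LoraConfig(
    r=32,
    lora_alpha=32,
    target_modules=["o_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```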
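
Since each stage trains "on top of" the previous one while targeting different modules, the stages presumably use separate adapters rather than one resumed run. One common way to chain them, sketched here under that assumption with placeholder model and adapter names, is to merge the finished adapter into the base weights and start the next stage from the merged checkpoint:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, PeftModel, get_peft_model

# Merge the finished Stage 1 adapter into a full-precision copy of the base
# model so the LoRA deltas are folded in exactly. Paths are placeholders.
base = AutoModelForCausalLM.from_pretrained("base-model", torch_dtype=torch.bfloat16)
merged = PeftModel.from_pretrained(base, "stage1-adapter").merge_and_unload()
merged.save_pretrained("stage1-merged")

# Reload the merged checkpoint in 4-bit (the "Q" in QLoRA) and attach a fresh
# Stage 2 adapter on o_proj/down_proj, matching the config listed in the diff.
stage2_config = LoraConfig(r=32, lora_alpha=32,
                           target_modules=["o_proj", "down_proj"],
                           task_type="CAUSAL_LM")
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
stage2_base = AutoModelForCausalLM.from_pretrained("stage1-merged",
                                                   quantization_config=bnb)
stage2_model = get_peft_model(stage2_base, stage2_config)
```

The same merge-then-train step would repeat between Stage 2 and Stage 3.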