Update README.md
README.md (changed):

@@ -24,11 +24,13 @@ The training process went like this:
 
 Stage 1:
 - Data: A bunch of books (chosen for their prose/writing style). About 28M tokens per epoch.
-- r32/a32 QLoRA at 32k context, applied only to the QKV tensors. LR: 1e-5, 2 epochs.
+- r32/a32 QLoRA at 32k context, applied only to the QKV tensors. LR: 1e-5, 2 epochs.
+
 Stage 2:
 - Trained on top of Stage 1.
 - Data: RP data, about 4M tokens.
 - r32/a32 QLoRA at 16k context, applied to `o_proj` and `down_proj` only. LR: 5e-6, 1 epoch.
+
 Stage 3:
 - Trained on top of Stage 2.
 - Data: Instruct data (1000 random samples from koto-instruct-sft), about 1.2M tokens.
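
The per-stage adapter setup above maps directly onto a `peft` `LoraConfig`. The sketch below only restates the hyperparameters listed in the diff and is not the author's actual training script: the module names (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `down_proj`) assume a Llama-style architecture, and the context length, learning rate, and epoch count belong to the trainer settings rather than to these configs.

```python
from peft import LoraConfig

# Stage 1: rank 32 / alpha 32, adapters on the attention QKV projections only
# (trained at 32k context, LR 1e-5, 2 epochs; those settings live in the trainer).
stage1_config = LoraConfig(
    r=32,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

# Stage 2: same rank/alpha, but only the attention output and MLP down
# projections are adapted (16k context, LR 5e-6, 1 epoch).
stage2_config = LoraConfig(
    r=32,
    lora_alpha=32,
    target_modules=["o_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```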
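
Since each stage trains "on top of" the previous one while targeting different modules, the stages presumably use separate adapters rather than one resumed run. One common way to chain them, sketched here under that assumption with placeholder model and adapter names, is to merge the finished adapter into the base weights and start the next stage from the merged checkpoint:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, PeftModel, get_peft_model

# Merge the finished Stage 1 adapter into a full-precision copy of the base
# model so the LoRA deltas are folded in exactly. Paths are placeholders.
base = AutoModelForCausalLM.from_pretrained("base-model", torch_dtype=torch.bfloat16)
merged = PeftModel.from_pretrained(base, "stage1-adapter").merge_and_unload()
merged.save_pretrained("stage1-merged")

# Reload the merged checkpoint in 4-bit (the "Q" in QLoRA) and attach a fresh
# Stage 2 adapter on o_proj/down_proj, matching the config listed in the diff.
stage2_config = LoraConfig(r=32, lora_alpha=32,
                           target_modules=["o_proj", "down_proj"],
                           task_type="CAUSAL_LM")
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
stage2_base = AutoModelForCausalLM.from_pretrained("stage1-merged",
                                                   quantization_config=bnb)
stage2_model = get_peft_model(stage2_base, stage2_config)
```

The same merge-then-train step would repeat between Stage 2 and Stage 3.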