ToastyPigeon committed commit 6d3fb0d (verified) · 1 parent: f1615de

Update README.md

Files changed (1): README.md (+3 −1)
README.md CHANGED
@@ -24,11 +24,13 @@ The training process went like this:
 
 Stage 1:
 - Data: A bunch of books (chosen for their prose/writing style). About 28M tokens per epoch.
-- r32/a32 QLoRA at 32k context, applied only to the QKV tensors. LR: 1e-5, 2 epochs.
+- r32/a32 QLoRA at 32k context, applied only to the QKV tensors. LR: 1e-5, 2 epochs.
+
 Stage 2:
 - Trained on top of Stage 1.
 - Data: RP data, about 4M tokens.
 - r32/a32 QLoRA at 16k context, applied to `o_proj` and `down_proj` only. LR: 5e-6, 1 epoch.
+
 Stage 3:
 - Trained on top of Stage 2.
 - Data: Instruct data (1000 random samples from koto-instruct-sft), about 1.2M tokens.
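The staged hyperparameters in the README can be collected into plain config records. A minimal sketch, not from the repo itself: the `q_proj`/`k_proj`/`v_proj` module names are an assumption of how "the QKV tensors" maps onto Mistral's attention projections, and Stage 3's adapter settings are omitted because they fall outside this hunk.

```python
# Stage configs restating the README's hyperparameters (Stages 1-2 only;
# Stage 3's adapter settings are not shown in this diff hunk).
stages = [
    {"name": "Stage 1", "r": 32, "alpha": 32, "context": 32768,
     # Assumed module names for "the QKV tensors" on a Mistral-style model.
     "target_modules": ["q_proj", "k_proj", "v_proj"],
     "lr": 1e-5, "epochs": 2, "tokens_per_epoch": 28_000_000},
    {"name": "Stage 2", "r": 32, "alpha": 32, "context": 16384,
     "target_modules": ["o_proj", "down_proj"],
     "lr": 5e-6, "epochs": 1, "tokens_per_epoch": 4_000_000},
]

def lora_scaling(stage):
    """Standard LoRA scaling factor alpha / r applied to the adapter output."""
    return stage["alpha"] / stage["r"]

for s in stages:
    print(s["name"], "scaling =", lora_scaling(s))
```

Note that r32/a32 gives a scaling factor of 1.0 in both stages, so the adapter updates are applied at unit scale.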