Update README.md
To load a specific model revision with HuggingFace, simply add the argument `revision`:

```python
from transformers import AutoModelForCausalLM

olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-2-0425-1B", revision="stage1-step140000-tokens294B")
```

Or, you can access all the revisions for the models via the following code snippet:

```python
from huggingface_hub import list_repo_refs

out = list_repo_refs("allenai/OLMo-2-0425-1B")
branches = [b.name for b in out.branches]
```
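Pretraining checkpoint branches follow the `stage{S}-step{N}-tokens{M}B` naming convention, so a branch list can be parsed and sorted by training step. A minimal sketch (the `parse_revision` helper and regex below are illustrative, not part of any OLMo tooling):

```python
import re

def parse_revision(name):
    """Parse a checkpoint branch name like 'stage1-step140000-tokens294B'.

    Returns (stage, step, tokens_in_billions), or None if the name does
    not follow the pretraining naming convention (e.g. 'main').
    """
    m = re.fullmatch(r"stage(\d+)-step(\d+)-tokens(\d+)B", name)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2)), int(m.group(3))

# Example branch names; in practice these would come from list_repo_refs.
branches = ["main", "stage1-step140000-tokens294B", "stage1-step10000-tokens21B"]
checkpoints = sorted(
    (p for p in map(parse_revision, branches) if p is not None),
    key=lambda t: t[1],  # sort by training step
)
```

Sorting by step makes it easy to pick, say, an early- and a late-pretraining checkpoint for comparison.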
### Fine-tuning
Model fine-tuning can be done from the final checkpoint (the `main` revision of this model) or many intermediate checkpoints. Two recipes for tuning are available.

1. Fine-tune with the OLMo repository:

   ```bash
   torchrun --nproc_per_node=8 scripts/train.py {path_to_train_config} \
     --data.paths=[{path_to_data}/input_ids.npy] \
     --data.label_mask_paths=[{path_to_data}/label_mask.npy] \
     --load_path={path_to_checkpoint} \
     --reset_trainer_state
   ```

   For more documentation, see the [GitHub README](https://github.com/allenai/OLMo/).

2. Further fine-tuning support is being developed in AI2's Open Instruct repository. Details are [here](https://github.com/allenai/open-instruct).
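The `--data.paths` and `--data.label_mask_paths` flags in recipe 1 point at tokenized data stored as NumPy `.npy` arrays. A hedged sketch of producing such files — the shapes, dtypes, and token values here are assumptions for illustration, not the OLMo data specification:

```python
import numpy as np

# Hypothetical example: token IDs for one training sequence, plus a mask
# marking which positions contribute to the loss (True) vs. are ignored (False),
# e.g. masking out prompt tokens during instruction tuning.
input_ids = np.array([[1, 15, 42, 7, 2]], dtype=np.uint16)
label_mask = np.array([[False, True, True, True, True]])

np.save("input_ids.npy", input_ids)
np.save("label_mask.npy", label_mask)
```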
### Model Description
| Stage | OLMo 2 1B | OLMo 2 7B | OLMo 2 13B | OLMo 2 32B |
|-------------------|------------|------------|------------|------------|
| Pretraining Stage 1 | 4 trillion tokens<br>(1 epoch) | 4 trillion tokens<br>(1 epoch) | 5 trillion tokens<br>(1.2 epochs) | 6 trillion tokens<br>(1.5 epochs) |
| Pretraining Stage 2 | 50B tokens (3 runs)<br>*merged* | 50B tokens (3 runs)<br>*merged* | 100B tokens (3 runs)<br>300B tokens (1 run)<br>*merged* | 100B tokens (3 runs)<br>300B tokens (1 run)<br>*merged* |
| Post-training | SFT + DPO + GRPO<br>([preference mix](https://huggingface.co/datasets/allenai/olmo-2-0425-1b-preference-mix)) | SFT + DPO + PPO<br>([preference mix](https://huggingface.co/datasets/allenai/olmo-2-1124-7b-preference-mix)) | SFT + DPO + PPO<br>([preference mix](https://huggingface.co/datasets/allenai/olmo-2-1124-13b-preference-mix)) | SFT + DPO + GRPO<br>([preference mix](https://huggingface.co/datasets/allenai/olmo-2-32b-pref-mix-v1)) |
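The *merged* entries in Stage 2 indicate that several independent runs were combined into a single checkpoint, which is commonly done by element-wise weight averaging ("model souping"). A minimal NumPy sketch of that idea — illustrative only, not the actual OLMo merging code:

```python
import numpy as np

def average_state_dicts(state_dicts):
    """Element-wise mean of model state dicts sharing identical keys and shapes."""
    return {
        key: np.mean([sd[key] for sd in state_dicts], axis=0)
        for key in state_dicts[0]
    }

# Toy "runs": two checkpoints, each with a single 1x2 weight matrix.
run_a = {"w": np.array([[2.0, 4.0]])}
run_b = {"w": np.array([[4.0, 8.0]])}
merged = average_state_dicts([run_a, run_b])
```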

#### Stage 1: Initial Pretraining
- Dataset: [OLMo-mix-1124](https://huggingface.co/datasets/allenai/olmo-mix-1124) (3.9T tokens)