amanrangapur committed
Commit 58dc3e9 · verified · 1 Parent(s): 183c6d1

Update README.md

Files changed (1): README.md +15 -3

README.md CHANGED
@@ -78,7 +78,7 @@ We have released checkpoints for these models. For pretraining, the naming conve
 
 To load a specific model revision with HuggingFace, simply add the argument `revision`:
 ```python
-olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-2-0425-1B", revision="step250000-tokens2098B")
+olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-2-0425-1B", revision="stage1-step140000-tokens294B")
 ```
 
 Or, you can access all the revisions for the models via the following code snippet:
@@ -89,7 +89,19 @@ branches = [b.name for b in out.branches]
 ```
 
 ### Fine-tuning
-TODO
+Model fine-tuning can be done from the final checkpoint (the `main` revision of this model) or from many intermediate checkpoints. Two recipes for tuning are available.
+1. Fine-tune with the OLMo repository:
+```bash
+torchrun --nproc_per_node=8 scripts/train.py {path_to_train_config} \
+    --data.paths=[{path_to_data}/input_ids.npy] \
+    --data.label_mask_paths=[{path_to_data}/label_mask.npy] \
+    --load_path={path_to_checkpoint} \
+    --reset_trainer_state
+```
+For more documentation, see the [GitHub README](https://github.com/allenai/OLMo/).
+
+2. Further fine-tuning support is being developed in AI2's Open Instruct repository. Details are [here](https://github.com/allenai/open-instruct).
+
 
 
 ### Model Description
@@ -123,7 +135,7 @@ TODO
 |-------------------|------------|------------|------------|------------|
 | Pretraining Stage 1 | 4 trillion tokens<br>(1 epoch) | 4 trillion tokens<br>(1 epoch) | 5 trillion tokens<br>(1.2 epochs) | 6 trillion tokens<br>(1.5 epochs) |
 | Pretraining Stage 2 | 50B tokens (3 runs)<br>*merged* | 50B tokens (3 runs)<br>*merged* | 100B tokens (3 runs)<br>300B tokens (1 run)<br>*merged* | 100B tokens (3 runs)<br>300B tokens (1 run)<br>*merged* |
-| Post-training | SFT + DPO + GRPO<br>([preference mix](#)) | SFT + DPO + PPO<br>([preference mix](https://huggingface.co/datasets/allenai/olmo-2-1124-7b-preference-mix)) | SFT + DPO + PPO<br>([preference mix](https://huggingface.co/datasets/allenai/olmo-2-1124-13b-preference-mix)) | SFT + DPO + GRPO<br>([preference mix](https://huggingface.co/datasets/allenai/olmo-2-32b-pref-mix-v1)) |
+| Post-training | SFT + DPO + GRPO<br>([preference mix](https://huggingface.co/datasets/allenai/olmo-2-0425-1b-preference-mix)) | SFT + DPO + PPO<br>([preference mix](https://huggingface.co/datasets/allenai/olmo-2-1124-7b-preference-mix)) | SFT + DPO + PPO<br>([preference mix](https://huggingface.co/datasets/allenai/olmo-2-1124-13b-preference-mix)) | SFT + DPO + GRPO<br>([preference mix](https://huggingface.co/datasets/allenai/olmo-2-32b-pref-mix-v1)) |
 
 #### Stage 1: Initial Pretraining
 - Dataset: [OLMo-mix-1124](https://huggingface.co/datasets/allenai/olmo-mix-1124) (3.9T tokens)
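The revision-listing snippet appears in the diff only as a single context fragment (`branches = [b.name for b in out.branches]`), which presumably comes from `huggingface_hub.list_repo_refs`. A self-contained sketch of that listing, with an illustrative `step_of` sorting helper that is not part of the README itself:

```python
import re


def step_of(revision: str):
    """Return the training step encoded in a revision name like
    'stage1-step140000-tokens294B', or None if no step is present."""
    m = re.search(r"step(\d+)", revision)
    return int(m.group(1)) if m else None


def checkpoint_revisions(repo_id: str = "allenai/OLMo-2-0425-1B"):
    """Fetch all branch revisions for the repo from the Hugging Face Hub
    and return the checkpoint-style ones sorted by training step.
    Requires network access and `pip install huggingface_hub`."""
    from huggingface_hub import list_repo_refs

    out = list_repo_refs(repo_id)
    branches = [b.name for b in out.branches]
    return sorted((b for b in branches if step_of(b) is not None), key=step_of)
```

Any name returned by `checkpoint_revisions` can be passed as the `revision` argument shown in the first hunk.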
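The `torchrun` recipe in the fine-tuning hunk points `--data.paths` at `input_ids.npy` and `--data.label_mask_paths` at `label_mask.npy`. As a hypothetical sketch only (this README does not specify the dtype or layout the trainer expects; uint16 token IDs with a same-shape boolean mask are assumptions), such files could be written with NumPy:

```python
import numpy as np


def write_training_arrays(token_ids, loss_mask, out_dir="."):
    """Write a flat token-ID array and a parallel label mask as .npy files.
    Assumption (not confirmed by this README): one flat array of token IDs
    plus a same-shape boolean mask marking positions that count in the loss."""
    ids = np.asarray(token_ids, dtype=np.uint16)
    mask = np.asarray(loss_mask, dtype=bool)
    assert ids.shape == mask.shape, "token IDs and label mask must align"
    np.save(f"{out_dir}/input_ids.npy", ids)
    np.save(f"{out_dir}/label_mask.npy", mask)
    return ids.shape
```

Check the OLMo GitHub README linked in the diff for the authoritative data format before relying on this layout.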