YikangS committed
Commit d1866e2 · 1 Parent(s): bddcb61

update readme
Files changed (1): README.md (+2 -2)
README.md CHANGED
@@ -5,7 +5,7 @@ license: apache-2.0
 MoLM is a collection of MoE-based language models ranging in scale from 4 billion to 8 billion parameters. This is the repository for the 4B pretrained model, converted for the Hugging Face Transformers format. Links to other models can be found in the index at the bottom.
 
 **Model Usage**
-To load the model, you need install the [ModuleFormer package](github.com/IBM/ModuleFormer). Then you can load the model with the following code:
+To load the model, you need install the [ModuleFormer package](https://github.com/IBM/ModuleFormer). Then you can load the model with the following code:
 ```
 from transformers import AutoTokenizer, AutoModelForCausalLM, AutoConfig, AutoModelForSequenceClassification
 from moduleformer import ModuleFormerForCausalLM, ModuleFormerConfig, ModuleFormerForSequenceClassification
@@ -34,7 +34,7 @@ Both models are trained on 300 billion tokens from publicly available sources, w
 
 **Status** This is a static model trained on an offline dataset. Future versions of the tuned models will be released as we improve model safety with community feedback.
 
-**Research Paper** ["ModuleFormer: Modularity Emerges from Mixture-of-Experts"](arxiv.org/abs/2306.04640)
+**Research Paper** ["ModuleFormer: Modularity Emerges from Mixture-of-Experts"](https://arxiv.org/abs/2306.04640)
 
 ## Training Data
 MoLM was pretrained on 300 billion tokens of data from publicly available sources.
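The hunk context cuts the usage snippet off right after the two import lines. As a minimal sketch of how loading typically continues when a custom architecture is registered with the Transformers Auto* classes: the registration calls below follow the standard `AutoConfig.register` / `AutoModel*.register` pattern, and the checkpoint id `ibm/MoLM-350M-4B` is an assumption (the repository id is not shown in this diff), so substitute the id of this repository.

```
from transformers import AutoTokenizer, AutoModelForCausalLM, AutoConfig, AutoModelForSequenceClassification
from moduleformer import ModuleFormerForCausalLM, ModuleFormerConfig, ModuleFormerForSequenceClassification

# Register the ModuleFormer classes so the Auto* factories can resolve
# the "moduleformer" model type from the checkpoint's config.json.
AutoConfig.register("moduleformer", ModuleFormerConfig)
AutoModelForCausalLM.register(ModuleFormerConfig, ModuleFormerForCausalLM)
AutoModelForSequenceClassification.register(ModuleFormerConfig, ModuleFormerForSequenceClassification)

# Assumed checkpoint id for the 4B pretrained model; replace with this repo's id.
tokenizer = AutoTokenizer.from_pretrained("ibm/MoLM-350M-4B")
model = AutoModelForCausalLM.from_pretrained("ibm/MoLM-350M-4B")

# Quick smoke test: generate a short continuation.
inputs = tokenizer("ModuleFormer is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```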