  ---
license: apache-2.0
language:
- en
- es
- it
- de
- fr
tags:
- moe
---

# Model Card for Mixtral-8x22B-Instruct-v0.1

The Mixtral-8x22B-Instruct-v0.1 Large Language Model (LLM) is an instruct fine-tuned version of [Mixtral-8x22B-v0.1](https://huggingface.co/mistralai/Mixtral-8x22B-v0.1).

Mixtral-8x22B-v0.1 has the following characteristics (the sketch after this list shows how the total and active parameter counts relate):
- 140.6B parameters
- 39.1B active parameters
- 64k context window
- 32768 vocab size
- Supports function calling
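
The gap between total and active parameters comes from the sparse mixture-of-experts design: each MoE layer holds 8 expert feed-forward blocks, but the router sends every token through only 2 of them. A rough back-of-the-envelope sketch of that split, derived purely from the two headline numbers above (illustrative arithmetic, not an official breakdown):

```py
# Rough sparse-MoE arithmetic for Mixtral-8x22B (illustrative, not an
# official parameter breakdown from Mistral).
total_params = 140.6e9   # all weights stored in the checkpoint
active_params = 39.1e9   # weights used for any single token

n_experts, n_active = 8, 2  # experts per MoE layer / experts routed per token

# total  = shared + n_experts * expert_ffn
# active = shared + n_active  * expert_ffn
# Solving for the per-expert FFN weights (summed over all layers) and the
# always-active shared weights (attention, embeddings, router):
expert_ffn = (total_params - active_params) / (n_experts - n_active)
shared = total_params - n_experts * expert_ffn

print(f"per-expert FFN weights: {expert_ffn / 1e9:.1f}B")      # ~16.9B
print(f"always-active shared weights: {shared / 1e9:.1f}B")    # ~5.3B
```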

## How to use

We recommend using `mistralai/Mixtral-8x22B-Instruct-v0.1` with [mistral_inference](https://github.com/mistralai/mistral-inference) and [mistral_common](https://github.com/mistralai/mistral-common). Code snippets for HF `transformers` follow further below.

## Generate with `mistral_inference` and `mistral_common`

### Install dependencies

```
pip install mistral_inference mistral_common
```

### Download model

```py
from huggingface_hub import snapshot_download
from pathlib import Path

mistral_models_path = Path.home().joinpath('mistral_models', '8x22B-Instruct-v0.1')
mistral_models_path.mkdir(parents=True, exist_ok=True)

snapshot_download(repo_id="mistralai/Mixtral-8x22B-Instruct-v0.1", allow_patterns=["params.json", "consolidated.safetensors", "tokenizer.model.v3"], local_dir=mistral_models_path)
```
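
Note that `consolidated.safetensors` stores all ~141B parameters in bf16 (2 bytes each), so the download is on the order of 280 GB; check your disk space before fetching it.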

### Chat

After installing `mistral_inference`, a `mistral-chat` CLI command should be available in your environment. You can chat with the model using:

```
mistral-chat $HOME/mistral_models/8x22B-Instruct-v0.1 --instruct --max_tokens 256
```

### Instruct following

```py
from mistral_inference.model import Transformer
from mistral_inference.generate import generate

from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest


# `mistral_models_path` is defined in the "Download model" snippet above.
tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tokenizer.model.v3")
# tokenizer = MistralTokenizer.v3()

model = Transformer.from_folder(mistral_models_path)

completion_request = ChatCompletionRequest(messages=[UserMessage(content="Explain Machine Learning to me in a nutshell.")])

tokens = tokenizer.encode_chat_completion(completion_request).tokens

out_tokens, _ = generate([tokens], model, max_tokens=64, temperature=0.0, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])

print(result)
```
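
Here `temperature=0.0` makes decoding greedy, so the same prompt always produces the same completion; raise the temperature for more varied outputs, and `max_tokens` for longer answers.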

### Function calling

```py
from mistral_common.protocol.instruct.tool_calls import Function, Tool
from mistral_inference.model import Transformer
from mistral_inference.generate import generate

from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest


tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tokenizer.model.v3")
# tokenizer = MistralTokenizer.v3()

model = Transformer.from_folder(mistral_models_path)

completion_request = ChatCompletionRequest(
    tools=[
        Tool(
            function=Function(
                name="get_current_weather",
                description="Get the current weather",
                parameters={
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA",
                        },
                        "format": {
                            "type": "string",
                            "enum": ["celsius", "fahrenheit"],
                            "description": "The temperature unit to use. Infer this from the users location.",
                        },
                    },
                    "required": ["location", "format"],
                },
            )
        )
    ],
    messages=[
        UserMessage(content="What's the weather like today in Paris?"),
    ],
)

tokens = tokenizer.encode_chat_completion(completion_request).tokens

out_tokens, _ = generate([tokens], model, max_tokens=64, temperature=0.0, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])

print(result)
```
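
When the model decides to use a tool, the decoded `result` is not prose but a serialized tool call, typically a JSON list such as `[{"name": "get_current_weather", "arguments": {"location": "Paris, France", "format": "celsius"}}]`. Assuming that shape (the exact text the model emits can vary), the call can be recovered with the standard library:

```py
import json

# `result` comes from the function-calling snippet above. Assuming the model
# answered with a well-formed tool call, parse the JSON list of calls:
tool_calls = json.loads(result)
for call in tool_calls:
    print(call["name"], call["arguments"])
```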

## Generate with `transformers`

### Install dependencies

```
pip install transformers
```

### Instruct following

```py
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("mistralai/Mixtral-8x22B-Instruct-v0.1")
model.to("cuda")

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x22B-Instruct-v0.1")

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

# Render the chat template to a prompt string, then tokenize it.
messages_prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

inputs = tokenizer(messages_prompt, return_tensors="pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens=1000)
result = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(result)
```

Alternatively, you can use a chat `pipeline`:

```py
from transformers import pipeline

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

chatbot = pipeline("text-generation", model="mistralai/Mixtral-8x22B-Instruct-v0.1")
result = chatbot(messages)

print(result)
```
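
Both `transformers` snippets above assume the model fits on a single device, but in bf16 the ~141B parameters need roughly 280 GB of GPU memory. On a multi-GPU node you can let `accelerate` shard the weights instead of calling `model.to("cuda")`; a minimal sketch using standard `transformers` loading options (not from the original card):

```py
import torch
from transformers import AutoModelForCausalLM

# device_map="auto" (requires the accelerate package) spreads the checkpoint
# shards across all visible GPUs, spilling to CPU RAM if necessary;
# torch_dtype keeps the weights in half precision instead of fp32.
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x22B-Instruct-v0.1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```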

## Limitations

The Mixtral-8x22B-Instruct-v0.1 model is a quick demonstration that the base model can be easily fine-tuned to achieve compelling performance. It does not have any moderation mechanisms. We're looking forward to engaging with the community on ways to make the model finely respect guardrails, allowing for deployment in environments requiring moderated outputs.

## The Mistral AI Team

Albert Jiang, Alexandre Sablayrolles, Alexis Tacnet, Antoine Roux, Arthur Mensch, Audrey Herblin-Stoop, Baptiste Bout, Baudouin de Monicault, Blanche Savary, Bam4d, Caroline Feldman, Devendra Singh Chaplot, Diego de las Casas, Eleonore Arcelin, Emma Bou Hanna, Etienne Metzger, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Harizo Rajaona, Jean-Malo Delignon, Jia Li, Justus Murke, Louis Martin, Louis Ternon, Lucile Saulnier, Lélio Renard Lavaud, Margaret Jennings, Marie Pellat, Marie Torelli, Marie-Anne Lachaux, Nicolas Schuhl, Patrick von Platen, Pierre Stock, Sandeep Subramanian, Sophia Yang, Szymon Antoniak, Teven Le Scao, Thibaut Lavril, Timothée Lacroix, Théophile Gervet, Thomas Wang, Valera Nemychnikova, William El Sayed, William Marshall