- **System Prompt:** Maintains strong adherence and support for system prompts.
- **Tokenizer:** Utilizes a Tekken tokenizer with a 131k vocabulary size (see the short loading sketch below).
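
As a quick check of the tokenizer claim, the tokenizer can be loaded and its vocabulary size inspected directly. A minimal sketch, assuming the Hugging Face `transformers` package and the checkpoint name `mistralai/Mistral-Small-24B-Instruct-2501` (an assumption; neither is named in this section):

```python
# Sketch: load the model's Tekken tokenizer and check its vocabulary size.
# The checkpoint name below is an assumption, not stated in this section.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-Small-24B-Instruct-2501")

# Tekken is a tiktoken-style tokenizer; the vocabulary should be ~131k entries.
print(tokenizer.vocab_size)
```
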
## Benchmark results

### Human evaluated benchmarks

TODO:

### Publicly accessible benchmarks

**Reasoning & Knowledge**

| Evaluation | mistral-small-24B-instruct-2501 | gemma-2-27b | llama-3.3-70b | qwen2.5-32b | gpt-4o-mini-2024-07-18 |
|------------|---------------------------------|-------------|---------------|-------------|------------------------|
| mmlu_pro_5shot_cot_instruct | 0.663 | 0.536 | 0.666 | 0.683 | 0.617 |
| gpqa_main_cot_5shot_instruct | 0.453 | 0.344 | 0.531 | 0.404 | 0.377 |

**Math & Coding**

| Evaluation | mistral-small-24B-instruct-2501 | gemma-2-27b | llama-3.3-70b | qwen2.5-32b | gpt-4o-mini-2024-07-18 |
|------------|---------------------------------|-------------|---------------|-------------|------------------------|
| humaneval_instruct_pass@1 | 0.848 | 0.732 | 0.854 | 0.909 | 0.890 |
| math_instruct | 0.706 | 0.535 | 0.743 | 0.819 | 0.761 |
| aime_instruct_maj@16 | 0.133 | 0.067 | 0.2333 | 0.100 | 0.100 |
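
For readers unfamiliar with the metric names: `pass@1` scores a single sampled completion, while `maj@16` samples 16 completions per problem and scores the answer selected by majority vote. A minimal sketch of that aggregation step, using hypothetical answers rather than output from any model above:

```python
# Sketch: the majority-vote aggregation behind maj@16-style metrics.
# The sampled answers below are hypothetical, not real model output.
from collections import Counter

def majority_vote(answers):
    """Return the most frequent final answer among sampled completions."""
    return Counter(answers).most_common(1)[0][0]

# 16 hypothetical sampled answers to one competition-math problem:
answers = ["204"] * 9 + ["196"] * 5 + ["12"] * 2
print(majority_vote(answers))  # "204"; this is the answer that gets scored
```
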
**Instruction following**

| Evaluation | mistral-small-24B-instruct-2501 | gemma-2-27b | llama-3.3-70b | qwen2.5-32b | gpt-4o-mini-2024-07-18 |
|------------|---------------------------------|-------------|---------------|-------------|------------------------|
| mtbench_dev | 8.35 | 7.86 | 7.96 | 8.26 | 8.33 |
| wildbench | 52.27 | 48.21 | 50.04 | 52.73 | 56.13 |
| arena_hard | 0.873 | 0.788 | 0.840 | 0.860 | 0.897 |
| ifeval | 0.829 | 0.8065 | 0.8835 | 0.8401 | 0.8499 |

### Basic Instruct Template (V7-Tekken)

```
<s>[SYSTEM_PROMPT]<system prompt>[/SYSTEM_PROMPT][INST]<user message>[/INST]<assistant response></s>[INST]<new user message>[/INST]
```
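
To build this prompt programmatically rather than by concatenating the special tokens by hand, one option is the `transformers` chat-template API, which renders the template shipped with the checkpoint. A minimal sketch, again assuming the checkpoint name `mistralai/Mistral-Small-24B-Instruct-2501` (an assumption; it is not named in this section):

```python
# Sketch: render the V7-Tekken instruct template via transformers'
# chat-template API. The checkpoint name is an assumption.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-Small-24B-Instruct-2501")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the Tekken tokenizer in one line."},
]

# tokenize=False returns the formatted prompt string;
# add_generation_prompt=True ends it right after [/INST], ready for generation.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```
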