patrickvonplaten committed on
Commit 8dc72ef · verified · 1 parent: 3727ce4

Update README.md

Files changed (1)
  1. README.md +33 -0
README.md CHANGED
@@ -45,6 +45,39 @@ Learn more about Mistral Small in our [blog post](https://mistral.ai/news/mistra
  - **System Prompt:** Maintains strong adherence and support for system prompts.
  - **Tokenizer:** Utilizes a Tekken tokenizer with a 131k vocabulary size.

+ ## Benchmark results
+
+
+ ### Human-evaluated benchmarks
+
+ TODO:
+
+ ### Publicly accessible benchmarks
+
+ **Reasoning & Knowledge**
+
+ | Evaluation | mistral-small-24B-instruct-2501 | gemma-2-27b | llama-3.3-70b | qwen2.5-32b | gpt-4o-mini-2024-07-18 |
+ |------------|---------------|--------------|---------------|---------------|-------------|
+ | mmlu_pro_5shot_cot_instruct | 0.663 | 0.536 | 0.666 | 0.683 | 0.617 |
+ | gpqa_main_cot_5shot_instruct | 0.453 | 0.344 | 0.531 | 0.404 | 0.377 |
+
+ **Math & Coding**
+
+ | Evaluation | mistral-small-24B-instruct-2501 | gemma-2-27b | llama-3.3-70b | qwen2.5-32b | gpt-4o-mini-2024-07-18 |
+ |------------|---------------|--------------|---------------|---------------|-------------|
+ | humaneval_instruct_pass@1 | 0.848 | 0.732 | 0.854 | 0.909 | 0.890 |
+ | math_instruct | 0.706 | 0.535 | 0.743 | 0.819 | 0.761 |
+ | aime_instruct_maj@16 | 0.133 | 0.067 | 0.2333 | 0.100 | 0.100 |
+
+ **Instruction following**
+
+ | Evaluation | mistral-small-24B-instruct-2501 | gemma-2-27b | llama-3.3-70b | qwen2.5-32b | gpt-4o-mini-2024-07-18 |
+ |------------|---------------|--------------|---------------|---------------|-------------|
+ | mtbench_dev | 8.35 | 7.86 | 7.96 | 8.26 | 8.33 |
+ | wildbench | 52.27 | 48.21 | 50.04 | 52.73 | 56.13 |
+ | arena_hard | 0.873 | 0.788 | 0.840 | 0.860 | 0.897 |
+ | ifeval | 0.829 | 0.8065 | 0.8835 | 0.8401 | 0.8499 |
+
  ### Basic Instruct Template (V7-Tekken)

  ```