Update README.md
README.md
@@ -70,7 +70,7 @@ It is finetuned from [Mistral-Small-3.1](https://huggingface.co/mistralai/Mistra
 
 For enterprises requiring specialized capabilities (increased context, domain-specific knowledge, etc.), we will release commercial models beyond what Mistral AI contributes to the community.
 
-Learn more about Devstral in our [blog post](https://mistral.ai/news/devstral).
+Learn more about Devstral in our [blog post](https://mistral.ai/news/devstral-2507).
 
 **Updates compared to [`Devstral Small 1.0`](https://huggingface.co/mistralai/Devstral-Small-2505):**
 - Improved performance, please refer to the [benchmark results](#benchmark-results).
@@ -90,11 +90,11 @@ Learn more about Devstral in our [blog post](https://mistral.ai/news/devstral).
 
 ### SWE-Bench
 
-Devstral Small 1.1 achieves a score of **
+Devstral Small 1.1 achieves a score of **53.6%** on SWE-Bench Verified, outperforming Devstral Small 1.0 by +6.8% and the second best state of the art model by +11.4%.
 
 | Model | Agentic Scaffold | SWE-Bench Verified (%) |
 |--------------------|--------------------|------------------------|
-| Devstral Small 1.1 | OpenHands Scaffold | **
+| Devstral Small 1.1 | OpenHands Scaffold | **53.6** |
 | Devstral Small 1.0 | OpenHands Scaffold | *46.8* |
 | GPT-4.1-mini | OpenAI Scaffold | 23.6 |
 | Claude 3.5 Haiku | Anthropic Scaffold | 40.6 |
@@ -539,4 +539,4 @@ Finally, the game is ready to be played:
 
 
 
-Don't hesitate to iterate or give more information to Devstral to improve the game!
+Don't hesitate to iterate or give more information to Devstral to improve the game!