RedHatAI/Devstral-Small-2507-FP8-Dynamic


ekurtic committed on Aug 28 · Commit 4cd653b (verified) · Parent: ab359a8

Upload folder using huggingface_hub

Files changed (1)

README.md +68 -3

README.md CHANGED

@@ -1,3 +1,68 @@
----
-license: apache-2.0
----

+---
+language:
+- en
+base_model:
+- mistralai/Devstral-Small-2507
+pipeline_tag: text-generation
+tags:
+- mistral
+- neuralmagic
+- redhat
+- llmcompressor
+- quantized
+- FP8
+- compressed-tensors
+license: mit
+license_name: mit
+name: RedHatAI/Devstral-Small-2507-FP8-Dynamic
+description: This model was obtained by quantizing the weights and activations of Devstral-Small-2507 to the FP8 data type.
+readme: https://huggingface.co/RedHatAI/Devstral-Small-2507-FP8-Dynamic/blob/main/README.md
+tasks:
+- text-to-text
+provider: mistralai
+---
+# Devstral-Small-2507-FP8-Dynamic
+## Model Overview
+- **Model Architecture:** MistralForCausalLM
+  - **Input:** Text
+  - **Output:** Text
+- **Model Optimizations:**
+  - **Activation quantization:** FP8
+  - **Weight quantization:** FP8
+- **Release Date:** 08/28/2025
+- **Version:** 1.0
+- **Model Developers:** Red Hat (Neural Magic)
+### Model Optimizations
+This model was obtained by quantizing the weights and activations of [Devstral-Small-2507](https://huggingface.co/mistralai/Devstral-Small-2507) to the FP8 data type.
+This optimization reduces the number of bits used to represent weights and activations from 16 to 8, which reduces GPU memory requirements by approximately 50%.
+Weight quantization also reduces the model's disk footprint by approximately 50%.
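+As a rough worked example of the 50% figure, assume about 24B parameters (approximately the size of the base model; treat this count as an assumption, since this card does not state it) and count weight storage only, ignoring the FP8 scale factors, activations, and the KV cache (1 GB = 10^9 bytes):
+$$
+\underbrace{24 \times 10^{9} \times \tfrac{16}{8}\ \text{bytes}}_{\text{16-bit}} = 48\ \text{GB}
+\quad\longrightarrow\quad
+\underbrace{24 \times 10^{9} \times \tfrac{8}{8}\ \text{bytes}}_{\text{FP8}} = 24\ \text{GB},
+\qquad \frac{24\ \text{GB}}{48\ \text{GB}} = 50\%.
+$$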
+## Deployment
+This model can be deployed efficiently using the [vLLM](https://docs.vllm.ai/en/latest/) backend, as shown in the example below.
+```bash
+vllm serve RedHatAI/Devstral-Small-2507-FP8-Dynamic --tensor-parallel-size 1 --tokenizer_mode mistral
+```
+## Evaluation
+The model was evaluated on popular coding benchmarks (HumanEval, HumanEval+, MBPP, and MBPP+) using [EvalPlus](https://github.com/evalplus/evalplus) with the vLLM backend (v0.10.1.1).
+All evaluations use greedy decoding, and we report pass@1.
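+With greedy decoding each problem receives a single deterministic completion, so, assuming the standard definition, pass@1 reduces to the percentage of problems $P$ whose completion passes all of that problem's tests (on the 0 to 100 scale used in the Accuracy table):
+$$
+\text{pass@1} = \frac{100}{|P|} \sum_{p \in P} \mathbb{1}\left[\text{the greedy completion for } p \text{ passes all tests}\right]
+$$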
+### Accuracy
+| Benchmark                   | Recovery (%) | mistralai/Devstral-Small-2507 | RedHatAI/Devstral-Small-2507-FP8-Dynamic<br>(this model) |
+| --------------------------- | :----------: | :---------------------------: | :------------------------------------------------------: |
+| HumanEval                   | 100.67       | 89.0                          | 89.6                                                     |
+| HumanEval+                  | 102.22       | 81.1                          | 82.9                                                     |
+| MBPP                        | 97.29        | 77.5                          | 75.4                                                     |
+| MBPP+                       | 98.03        | 66.1                          | 64.8                                                     |
+| **Average Score**           | **99.68**    | **78.43**                     | **78.18**                                                |
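+Recovery expresses the quantized model's score as a percentage of the baseline score. Assuming this usual definition (it matches the Average Score row), the HumanEval and average values work out as:
+$$
+\text{Recovery} = \frac{\text{score}_{\text{FP8}}}{\text{score}_{\text{baseline}}} \times 100,
+\qquad
+\frac{89.6}{89.0} \times 100 \approx 100.67,
+\qquad
+\frac{78.18}{78.43} \times 100 \approx 99.68.
+$$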