Intel/deepmath-v1

danf committed 7 days ago

Commit 7a7885d · verified · 1 Parent(s): bdd9de5

Figures

Files changed (1)

README.md +4 -4

@@ -20,7 +20,7 @@ pipeline_tag: text-generation
 # DeepMath-v1: A Lightweight Math Reasoning Agent
-<img src="assets/deepmath-figure.jpg" style="width:600px" alt="An LLM is using a calculator to answer questions." />
+<img src="https://cdn-uploads.huggingface.co/production/uploads/62d93cd728f9c86a4031562e/ndb_WmPavW1MONAjsGpYT.jpeg" style="width:600px" alt="An LLM is using a calculator to answer questions." />
 ## Model Description
@@ -51,7 +51,7 @@ DeepMath-v1 uses a LoRA adapter fine-tuned on top of Qwen3-4B Thinking with the
 - **Training Method:** GRPO with accuracy and code generation rewards
 <figure>
-<img src="assets/trl-grpo-vllm-deepmath.png" style="width:400px" alt="Changes to vLLM client and server in TRL library." />
+<img src="https://cdn-uploads.huggingface.co/production/uploads/62d93cd728f9c86a4031562e/zOcvJ2DY61QZyozarsKbT.png" style="width:400px" alt="Changes to vLLM client and server in TRL library." />
 <figcaption><p><em>Figure 1: The vLLM client and server were modified to use the DeepMath agent in generating the candidates, while using the vLLM backend.</em></p></figcaption>
 </figure>
@@ -85,7 +85,7 @@ DeepMath-v1 uses a LoRA adapter fine-tuned on top of Qwen3-4B Thinking with the
 We evaluated DeepMath on four mathematical reasoning datasets using **majority@16** and mean output length metrics:
-<img src="assets/main-results.png" style="width:800px" alt="Main results table showing performance across MATH500, AIME, HMMT, and HLE datasets."/>
+<img src="https://cdn-uploads.huggingface.co/production/uploads/62d93cd728f9c86a4031562e/mBuINzNvjDKdZEuIqzJeO.png" style="width:800px" alt="Main results table showing performance across MATH500, AIME, HMMT, and HLE datasets."/>
 **Key Findings:**
@@ -101,7 +101,7 @@ We evaluated DeepMath on four mathematical reasoning datasets using **majority@1
 - **HLE:** High-level exam problems
 <figure>
-<img src="assets/output-example.png" style="width:700px" alt="Output example showing Python code generation and execution." />
+<img src="https://cdn-uploads.huggingface.co/production/uploads/62d93cd728f9c86a4031562e/a-kn3oHdlxTP_L-63N9LX.png" style="width:700px" alt="Output example showing Python code generation and execution." />
 <figcaption><p><em>Figure 2: Example output where Python code is generated, evaluated, and the result is inserted into the reasoning trace.</em></p></figcaption>
 </figure>