danf committed on
Commit 7a7885d · verified · 1 parent: bdd9de5
Files changed (1)
  1. README.md +4 -4
README.md CHANGED
@@ -20,7 +20,7 @@ pipeline_tag: text-generation

  # DeepMath-v1: A Lightweight Math Reasoning Agent

- <img src="assets/deepmath-figure.jpg" style="width:600px" alt="An LLM is using a calculator to answer questions." />

  ## Model Description

@@ -51,7 +51,7 @@ DeepMath-v1 uses a LoRA adapter fine-tuned on top of Qwen3-4B Thinking with the
  - **Training Method:** GRPO with accuracy and code generation rewards

  <figure>
- <img src="assets/trl-grpo-vllm-deepmath.png" style="width:400px" alt="Changes to vLLM client and server in TRL library." />
  <figcaption><p><em>Figure 1: The vLLM client and server were modified to use the DeepMath agent in generating the candidates, while using the vLLM backend.</em></p></figcaption>
  </figure>

@@ -85,7 +85,7 @@ DeepMath-v1 uses a LoRA adapter fine-tuned on top of Qwen3-4B Thinking with the

  We evaluated DeepMath on four mathematical reasoning datasets using **majority@16** and mean output length metrics:

- <img src="assets/main-results.png" style="width:800px" alt="Main results table showing performance across MATH500, AIME, HMMT, and HLE datasets."/>

  **Key Findings:**

@@ -101,7 +101,7 @@ We evaluated DeepMath on four mathematical reasoning datasets using **majority@1
  - **HLE:** High-level exam problems

  <figure>
- <img src="assets/output-example.png" style="width:700px" alt="Output example showing Python code generation and execution." />
  <figcaption><p><em>Figure 2: Example output where Python code is generated, evaluated, and the result is inserted into the reasoning trace.</em></p></figcaption>
  </figure>

 

  # DeepMath-v1: A Lightweight Math Reasoning Agent

+ <img src="https://cdn-uploads.huggingface.co/production/uploads/62d93cd728f9c86a4031562e/ndb_WmPavW1MONAjsGpYT.jpeg" style="width:600px" alt="An LLM is using a calculator to answer questions." />

  ## Model Description

 
  - **Training Method:** GRPO with accuracy and code generation rewards

  <figure>
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/62d93cd728f9c86a4031562e/zOcvJ2DY61QZyozarsKbT.png" style="width:400px" alt="Changes to vLLM client and server in TRL library." />
  <figcaption><p><em>Figure 1: The vLLM client and server were modified to use the DeepMath agent in generating the candidates, while using the vLLM backend.</em></p></figcaption>
  </figure>

 

  We evaluated DeepMath on four mathematical reasoning datasets using **majority@16** and mean output length metrics:

+ <img src="https://cdn-uploads.huggingface.co/production/uploads/62d93cd728f9c86a4031562e/mBuINzNvjDKdZEuIqzJeO.png" style="width:800px" alt="Main results table showing performance across MATH500, AIME, HMMT, and HLE datasets."/>

  **Key Findings:**

 
  - **HLE:** High-level exam problems

  <figure>
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/62d93cd728f9c86a4031562e/a-kn3oHdlxTP_L-63N9LX.png" style="width:700px" alt="Output example showing Python code generation and execution." />
  <figcaption><p><em>Figure 2: Example output where Python code is generated, evaluated, and the result is inserted into the reasoning trace.</em></p></figcaption>
  </figure>