The main purpose of this model is to validate the usability of thomas-yanxin/MT-SFT-ShareGPT, i.e., that the quality of the data is all you need. We found that when data is meticulously curated through a better data-governance approach, the resulting model improves substantially, even through SFT alone.
Here are the results from our OpenCompass evaluation:
| Classification | Benchmark | XinYuan-Qwen2-7B |
|---|---|---|
| English | MMLU | 73.72 |
| | MMLU-Pro | / |
| | Theorem QA | / |
| | GPQA | 33.04 |
| | BBH | 67.55 |
| | IFEval (Prompt Strict-Acc.) | 40.48 |
| | ARC-C | 91.19 |
| Math | GSM8K | 82.94 |
| | MATH | 41.06 |
| Chinese | C-EVAL | 81.02 |
| | CMMLU | 80.06 |
| Code | MBPP | 50.6 |
| | HumanEval | 83.99 |