SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning [arXiv] [Project]

Jiaqi Chen, Bang Zhang, Ruotian Ma, Peisong Wang, Xiaodan Liang, Zhaopeng Tu, Xiaolong Li, Kwan-Yee K. Wong.

Format: Safetensors
Model size: 8B params
Tensor type: BF16

Model tree for judge/SPC-Critic-2

Base model: Qwen/Qwen2.5-7B
Finetuned: this model