Multimodal Reasoning for Science: Technical Report and 1st Place Solution to the ICML 2025 SeePhys Challenge
Abstract
A caption-assisted reasoning framework bridges visual and textual modalities, achieving top performance on multimodal reasoning benchmarks such as SeePhys and MathVerse.
Multimodal reasoning remains a fundamental challenge in artificial intelligence. Despite substantial advances in text-based reasoning, even state-of-the-art models such as GPT-o3 struggle to maintain strong performance in multimodal scenarios. To address this gap, we introduce a caption-assisted reasoning framework that effectively bridges visual and textual modalities. Our approach achieved 1st place in the ICML 2025 AI for Math Workshop & Challenge 2: SeePhys, highlighting its effectiveness and robustness. Furthermore, we validate its generalization on the MathVerse benchmark for geometric reasoning, demonstrating the versatility of our method. Our code is publicly available at https://github.com/OpenDCAI/SciReasoner.
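To make the caption-assisted idea concrete, the sketch below first asks a vision-language model to caption the physics diagram and then hands the caption, together with the original question, to a text reasoner. This is a minimal sketch, assuming the OpenAI chat completions API and a generic `gpt-4o` model for both stages; the `chat` helper, model choice, and prompts are illustrative placeholders rather than the authors' implementation (see the repository above for the released code).

```python
# Minimal sketch of a caption-assisted reasoning pipeline (illustrative only;
# model names, prompts, and the `chat` helper are placeholders, not the paper's code).
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def chat(messages, model="gpt-4o"):
    """Send a chat request and return the text of the first reply."""
    response = client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content


def caption_assisted_answer(question: str, image_url: str) -> str:
    # Stage 1: turn the figure into a detailed textual caption.
    caption = chat([{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Describe this physics diagram in detail, including all "
                     "labels, symbols, quantities, and geometric relations."},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }])
    # Stage 2: reason in text space over the caption plus the question,
    # so a strong text reasoner can solve the problem without raw pixels.
    return chat([{
        "role": "user",
        "content": f"Figure description:\n{caption}\n\nQuestion:\n{question}\n"
                   "Solve the problem step by step.",
    }])
```

Each stage can be swapped for a different captioner or reasoner without changing the overall two-stage structure, which is what makes the caption a bridge between the visual and textual modalities.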
Community
Related papers recommended by the Semantic Scholar API:
- C2-Evo: Co-Evolving Multimodal Data and Model for Self-Improving Reasoning (2025)
- MathReal: We Keep It Real! A Real Scene Benchmark for Evaluating Math Reasoning in Multimodal Large Language Models (2025)
- Simple o3: Towards Interleaved Vision-Language Reasoning (2025)
- Hierarchical Vision-Language Reasoning for Multimodal Multiple-Choice Question Answering (2025)
- BLUEX Revisited: Enhancing Benchmark Coverage with Automatic Captioning (2025)
- MAC: A Live Benchmark for Multimodal Large Language Models in Scientific Understanding (2025)
- MMAT-1M: A Large Reasoning Dataset for Multimodal Agent Tuning (2025)