MathBode: Frequency-Domain Fingerprints of LLM Mathematical Reasoning
Abstract
MathBode provides a diagnostic for mathematical reasoning in LLMs by analyzing frequency-resolved metrics of model outputs compared to exact solutions, revealing systematic low-pass behavior and phase lag.
This paper presents MathBode, a dynamic diagnostic for mathematical reasoning in large language models (LLMs). Instead of one-shot accuracy, MathBode treats each parametric problem as a system: we drive a single parameter sinusoidally and fit first-harmonic responses of model outputs and exact solutions. This yields interpretable, frequency-resolved metrics -- gain (amplitude tracking) and phase (lag) -- that form Bode-style fingerprints. Across five closed-form families (linear solve, ratio/saturation, compound interest, 2×2 linear systems, similar triangles), the diagnostic surfaces systematic low-pass behavior and growing phase lag that accuracy alone obscures. We compare several models against a symbolic baseline that calibrates the instrument (G ≈ 1, φ ≈ 0). Results separate frontier from mid-tier models on dynamics, providing a compact, reproducible protocol that complements standard benchmarks with actionable measurements of reasoning fidelity and consistency. We open-source the dataset and code to enable further research and adoption.
Community
MathBode benchmarks dynamic reasoning in LLMs by turning parametric math problems into time-varying systems. We sinusoidally sweep a problem parameter and read out gain (amplitude tracking) and phase (reasoning lag), à la Bode plots in control theory. Dataset: 47,040 test points across 5 problem families, enabling fine-grained frequency-response analyses.
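The gain/phase readout described above can be sketched with a first-harmonic least-squares fit. This is a hypothetical minimal example, not the paper's released code: the linear-solve family, the drive frequency, and the stand-in "model" signal (an attenuated, lagged copy of the exact solution) are all assumptions for illustration.

```python
import numpy as np

def first_harmonic(y, t, f):
    """Least-squares fit of y(t) ~ c + A*cos(2*pi*f*t) + B*sin(2*pi*f*t).

    Returns the complex first-harmonic amplitude A - 1j*B, so that
    abs() gives the amplitude and np.angle() gives the phase.
    """
    X = np.column_stack([np.ones_like(t),
                         np.cos(2 * np.pi * f * t),
                         np.sin(2 * np.pi * f * t)])
    c, A, B = np.linalg.lstsq(X, y, rcond=None)[0]
    return A - 1j * B

# Sinusoidally drive one parameter of a linear-solve problem a(t) * x = b,
# whose exact answer is x*(t) = b / a(t).  (Illustrative setup.)
f = 0.05                                    # drive frequency, cycles/step
t = np.arange(200, dtype=float)             # 10 full periods
a = 3.0 + 0.5 * np.sin(2 * np.pi * f * t)
exact = 10.0 / a

# Stand-in for the model's answers: smaller swing, 3-step lag.
model = 10.0 / (3.0 + 0.4 * np.sin(2 * np.pi * f * (t - 3.0)))

H_exact = first_harmonic(exact, t, f)
H_model = first_harmonic(model, t, f)

gain = abs(H_model) / abs(H_exact)     # G: amplitude tracking (1 = perfect)
phase = np.angle(H_model / H_exact)    # phi: negative values = reasoning lag
```

Sweeping `f` and plotting `gain` and `phase` per frequency yields the Bode-style fingerprint: a model with low-pass behavior shows gain falling below 1 and phase lag growing as the drive frequency increases.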
Librarian Bot (automated): the following similar papers were recommended by the Semantic Scholar API.
- An Investigation of Robustness of LLMs in Mathematical Reasoning: Benchmarking with Mathematically-Equivalent Transformation of Advanced Mathematical Problems (2025)
- Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks (2025)
- Look Before you Leap: Estimating LLM Benchmark Scores from Descriptions (2025)
- Aryabhata: An exam-focused language model for JEE Math (2025)
- Variation in Verification: Understanding Verification Dynamics in Large Language Models (2025)
- Uncertainty Under the Curve: A Sequence-Level Entropy Area Metric for Reasoning LLM (2025)
- Can Structured Templates Facilitate LLMs in Tackling Harder Tasks? : An Exploration of Scaling Laws by Difficulty (2025)