Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models
Abstract
Think-at-Hard (TaH) dynamically refines only hard tokens in LLMs using a neural decider and LoRA, improving reasoning performance with minimal additional parameters or iterations.
Improving reasoning capabilities of Large Language Models (LLMs), especially under parameter constraints, is crucial for real-world applications. Prior work proposes recurrent transformers, which allocate a fixed number of extra iterations per token to improve generation quality. After the first, standard forward pass, the last-layer hidden states are fed back as inputs for additional iterations that refine token predictions, rather than being verbalized. Yet we identify a latent overthinking phenomenon: easy token predictions that are already correct after the first pass are sometimes revised into errors in additional iterations. To address this, we propose Think-at-Hard (TaH), a dynamic latent thinking method that iterates deeper only at hard tokens. It employs a lightweight neural decider to trigger latent iterations only at tokens that are likely incorrect after the standard forward pass. During latent iterations, Low-Rank Adaptation (LoRA) modules shift the LLM objective from general next-token prediction to focused hard-token refinement. We further introduce a duo-causal attention mechanism that extends attention from the token sequence dimension to an additional iteration depth dimension. This enables cross-iteration information flow while maintaining full sequential parallelism. Experiments show that TaH boosts LLM reasoning performance across five challenging benchmarks while maintaining the same parameter count. Compared with baselines that iterate twice for all output tokens, TaH delivers 8.1-11.3% accuracy gains while exempting 94% of tokens from the second iteration. Against strong single-iteration Qwen3 models finetuned with the same data, it also delivers 4.0-5.0% accuracy gains. When allowing less than 3% additional parameters from LoRA and the iteration decider, the gains increase to 8.5-12.6% and 5.3-5.4%, respectively. Our code is available at https://github.com/thu-nics/TaH.
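To make the selective-iteration idea concrete, here is a minimal PyTorch-style sketch. The `HardTokenDecider` module, the hypothetical `model(inputs_embeds=..., use_lora=...)` interface returning last-layer hidden states, and the threshold are illustrative assumptions, not the paper's actual implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn

class HardTokenDecider(nn.Module):
    """Tiny MLP that scores how likely a token's first-pass prediction is wrong."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(hidden_size, hidden_size // 4),
            nn.GELU(),
            nn.Linear(hidden_size // 4, 1),
        )

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: [batch, seq, hidden] -> per-token "hardness" probability in [0, 1]
        return torch.sigmoid(self.score(hidden)).squeeze(-1)


def selective_latent_iteration(model, decider, input_embeds, threshold=0.5):
    """One standard pass, then a second latent pass used only at hard tokens.

    `model(inputs_embeds=..., use_lora=...)` is a hypothetical interface that
    returns last-layer hidden states; TaH's real implementation differs in detail.
    """
    # Iteration 1: standard forward pass over token embeddings (LoRA inactive).
    hidden_1 = model(inputs_embeds=input_embeds, use_lora=False)

    # Decide which tokens are "hard", i.e. likely incorrect after the first pass.
    hard_mask = decider(hidden_1) > threshold          # [batch, seq], bool

    # Iteration 2: feed last-layer hidden states back as inputs with LoRA active,
    # then keep the refined states only at hard positions; easy tokens keep
    # their first-pass output untouched.
    hidden_2 = model(inputs_embeds=hidden_1, use_lora=True)
    refined = torch.where(hard_mask.unsqueeze(-1), hidden_2, hidden_1)
    return refined, hard_mask
```

For clarity the sketch runs the second pass over the whole sequence and masks afterwards; an efficiency-oriented implementation would gather only the hard positions before the second pass, which is how TaH can exempt roughly 94% of tokens from the extra iteration.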
Community
Think-at-Hard (TaH) improves LLM reasoning by running extra latent iterations only on hard tokens instead of all tokens. A lightweight decider and duo-causal attention enable targeted refinement while keeping full parallelism. TaH outperforms fixed two-iteration baselines by 8-11% while skipping 94% of second iterations, and also beats strong single-iteration Qwen3 models by 4-5%.
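The duo-causal attention mentioned above extends causal masking from the sequence axis to the iteration-depth axis. Below is a small sketch of one plausible mask construction under that reading (causal along both axes); the function name and flattening order are assumptions, and the paper's actual mask may differ.

```python
import torch

def duo_causal_mask(seq_len: int, num_iters: int) -> torch.Tensor:
    """Boolean attention mask that is causal along BOTH the sequence axis and
    the iteration-depth axis (True = attention allowed).

    One plausible reading of "duo-causal": a query at (iteration d_q, position t_q)
    may attend to a key at (d_k, t_k) only if t_k <= t_q and d_k <= d_q.
    """
    positions = torch.arange(seq_len)
    iters = torch.arange(num_iters)

    seq_causal = positions[None, :] <= positions[:, None]    # [T, T]
    iter_causal = iters[None, :] <= iters[:, None]           # [D, D]

    # Combine into a [D, T, D, T] grid, then flatten to [(D*T), (D*T)].
    mask = iter_causal[:, None, :, None] & seq_causal[None, :, None, :]
    return mask.reshape(num_iters * seq_len, num_iters * seq_len)

# Example: 2 iterations over a 4-token sequence gives an 8x8 mask in which
# second-iteration queries also see first-iteration keys at earlier-or-equal
# positions, so information flows across iterations while the whole grid can
# still be computed in one parallel attention call.
print(duo_causal_mask(seq_len=4, num_iters=2).int())
```

The design intent this sketch tries to capture is that cross-iteration attention adds depth without serializing the sequence dimension, so the extra iteration keeps the usual parallelism of a transformer forward pass.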
Nice work
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- PonderLM-2: Pretraining LLM with Latent Thoughts in Continuous Space (2025)
- Encode, Think, Decode: Scaling test-time reasoning with recursive latent thoughts (2025)
- Latent Reasoning in LLMs as a Vocabulary-Space Superposition (2025)
- KaVa: Latent Reasoning via Compressed KV-Cache Distillation (2025)
- LaDiR: Latent Diffusion Enhances LLMs for Text Reasoning (2025)
- Enhancing Large Language Model Reasoning via Selective Critical Token Fine-Tuning (2025)
- Mixture of Thoughts: Learning to Aggregate What Experts Think, Not Just What They Say (2025)
arXiv explained breakdown of this paper: https://arxivexplained.com/papers/think-at-hard-selective-latent-iterations-to-improve-reasoning-language-models