When Do Transformers Learn Heuristics for Graph Connectivity?
Abstract
Transformers struggle with generalizable algorithms, preferring heuristics; a disentangled Transformer can learn graph algorithms within its capacity but resorts to heuristics otherwise.
Transformers often fail to learn generalizable algorithms, instead relying on brittle heuristics. Using graph connectivity as a testbed, we explain this phenomenon both theoretically and empirically. We study a simplified architecture, the disentangled Transformer, and prove that an L-layer model has the capacity to solve connectivity for graphs with diameter up to exactly 3^L, implementing an algorithm equivalent to computing powers of the adjacency matrix. Analyzing the training dynamics, we show that the learned strategy hinges on whether most training instances fall within this capacity: within-capacity graphs (diameter ≤ 3^L) drive learning of the correct algorithmic solution, while beyond-capacity graphs drive learning of a simple heuristic based on node degrees. Finally, we empirically demonstrate that restricting training data to within a model's capacity leads both standard and disentangled Transformers to learn the exact algorithm rather than the degree-based heuristic.
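The diameter-3^L bound has a simple matrix interpretation: if each layer can compose three hops of reachability (i.e., cube the current boolean reachability matrix), then L layers cover path lengths up to 3^L. The sketch below illustrates that arithmetic with NumPy; it is an illustrative analogue of the claimed algorithm, not the paper's actual Transformer construction, and the function and graph names are our own.

```python
import numpy as np

def reachability_by_cubing(adj: np.ndarray, num_layers: int) -> np.ndarray:
    """Boolean reachability for paths of length <= 3**num_layers.

    Illustrative sketch: each "layer" cubes the current reachability
    matrix, mirroring the claim that an L-layer model handles graph
    diameters up to 3**L. Not the paper's actual construction.
    """
    n = adj.shape[0]
    # Start from A OR I: reachability via paths of length <= 1 (= 3**0).
    reach = adj.astype(bool) | np.eye(n, dtype=bool)
    for _ in range(num_layers):
        # Cubing a length-<=k reachability matrix yields length <= 3k,
        # so after L cubings the bound is 3**L. Threshold back to bool
        # each step to keep entries at 0/1.
        r = reach.astype(np.int64)
        reach = (r @ r @ r) > 0
    return reach

def path_graph(n: int) -> np.ndarray:
    """Adjacency matrix of a path on n nodes (hypothetical test graph)."""
    adj = np.zeros((n, n), dtype=bool)
    for i in range(n - 1):
        adj[i, i + 1] = adj[i + 1, i] = True
    return adj
```

On a 10-node path graph, nodes 0 and 9 are at distance 9: one layer (bound 3^1 = 3) cannot connect them, while two layers (bound 3^2 = 9) can, matching the within- vs beyond-capacity distinction in the abstract.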
Community
When Do Transformers Learn Heuristics for Graph Connectivity? Hint: when the training data surpasses the model's capacity.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- The Impossibility of Inverse Permutation Learning in Transformer Models (2025)
- On the Capacity of Self-Attention (2025)
- Transformers Can Learn Connectivity in Some Graphs but Not Others (2025)
- GraphUniverse: Enabling Systematic Evaluation of Inductive Generalization (2025)
- Prior Makes It Possible: From Sublinear Graph Algorithms to LLM Test-Time Methods (2025)
- Flock: A Knowledge Graph Foundation Model via Learning on Random Walks (2025)
- Why does your graph neural network fail on some graphs? Insights from exact generalisation error (2025)