Accelerating Vision Transformers with Adaptive Patch Sizes
Abstract
Adaptive Patch Transformers (APT) improve Vision Transformer (ViT) efficiency with content-adaptive, variable patch sizes, increasing throughput without compromising performance.
Vision Transformers (ViTs) partition input images into uniformly sized patches regardless of their content, resulting in long input sequence lengths for high-resolution images. We present Adaptive Patch Transformers (APT), which addresses this by using multiple patch sizes within the same image. APT reduces the total number of input tokens by allocating larger patches to more homogeneous areas and smaller patches to more complex ones. APT achieves substantial speedups in ViT inference and training, increasing throughput by 40% on ViT-L and 50% on ViT-H while maintaining downstream performance, and can be applied to a previously fine-tuned ViT, converging in as little as one epoch. It also significantly reduces training and inference time without loss of performance on high-resolution dense visual tasks, achieving up to 30% faster training and inference in visual QA, object detection, and semantic segmentation.
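The abstract does not spell out how patch sizes are allocated. As one illustrative reading, the sketch below shows a quadtree-style scheme: start from large patches and recursively split any region that fails a homogeneity test. The function name, the variance-based test, and the threshold are assumptions for illustration, not APT's actual scoring rule.

```python
import numpy as np

def adaptive_patches(image, min_size=8, max_size=32, std_thresh=0.1):
    """Illustrative quadtree-style patch allocation (assumed scheme):
    homogeneous regions keep large patches, detailed regions split.

    image: (H, W) grayscale array in [0, 1]; H and W divisible by max_size.
    Returns a list of (row, col, size) patch descriptors.
    """
    patches = []

    def split(r, c, size):
        block = image[r:r + size, c:c + size]
        # Homogeneity test: pixel std as a hypothetical stand-in for
        # whatever scoring rule APT actually uses.
        if size == min_size or block.std() < std_thresh:
            patches.append((r, c, size))
        else:
            half = size // 2
            for dr in (0, half):
                for dc in (0, half):
                    split(r + dr, c + dc, half)

    for r in range(0, image.shape[0], max_size):
        for c in range(0, image.shape[1], max_size):
            split(r, c, max_size)
    return patches

if __name__ == "__main__":
    # A smooth gradient collapses into few, large patches; a noisy corner
    # is split down toward the minimum patch size.
    img = np.tile(np.linspace(0.0, 1.0, 64), (64, 1))
    img[:16, :16] = np.random.rand(16, 16)
    tokens = adaptive_patches(img)
    print(f"{len(tokens)} adaptive tokens vs {(64 // 8) ** 2} uniform 8x8 tokens")
```

Under this reading, the token savings come directly from the flat regions: each 32x32 patch kept whole replaces sixteen 8x8 tokens, which is consistent with the abstract's claim that homogeneous areas get larger patches.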
Community
The following similar papers were recommended by the Semantic Scholar API (via Librarian Bot):
- Where Do Tokens Go? Understanding Pruning Behaviors in STEP at High Resolutions (2025)
- TinyDrop: Tiny Model Guided Token Dropping for Vision Transformers (2025)
- SkipSR: Faster Super Resolution with Token Skipping (2025)
- ViCO: A Training Strategy towards Semantic Aware Dynamic High-Resolution (2025)
- UniFlow: A Unified Pixel Flow Tokenizer for Visual Understanding and Generation (2025)
- Index-Preserving Lightweight Token Pruning for Efficient Document Understanding in Vision-Language Models (2025)
- Decorrelation Speeds Up Vision Transformers (2025)