Fractal-Guided Token Pruning for Efficient Vision Transformers

Abstract

Vision Transformers achieve strong performance across computer vision tasks but suffer from quadratic computational complexity with respect to token count, limiting deployment in resource-constrained environments. Existing token pruning methods rely on attention scores to identify important tokens, but attention mechanisms capture query-specific relevance rather than intrinsic information content, potentially discarding tokens that carry information for subsequent layers or different downstream tasks. We propose fractal-guided token pruning, a method that leverages the correlation dimension Dcorr of token embeddings as a task-agnostic measure of geometric complexity. Our key insight is that tokens with high Dcorr span higher-dimensional manifolds in representation space, indicating complex patterns, while tokens with low Dcorr collapse to simpler structures representing redundant information. By computing a local Dcorr for each token and pruning those with the lowest values, our method retains geometrically complex tokens independent of attention-based relevance. The correlation dimension quantifies how token embeddings fill the representation space: embeddings from uniform background regions cluster tightly in low-dimensional subspaces (low Dcorr), while embeddings from complex textures or object boundaries spread across higher-dimensional manifolds (high Dcorr), reflecting their richer information content. Experiments on CIFAR-10 and CIFAR-100 with fine-tuned ViT-B/16 models show that fractal-guided pruning consistently outperforms random and norm-based pruning across all tested ratios. At forty percent pruning, fractal pruning maintains 92.26% accuracy on CIFAR-10 with only a 0.99 percentage point drop from the 93.25% baseline while achieving 1.17x speedup. Our approach provides a geometry-based criterion for token importance that complements attention-based methods and shows promising generalization between CIFAR-10 and CIFAR-100 datasets.
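The pruning rule described in the abstract (score each token by a local Dcorr, then drop the lowest-scoring tokens) can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions: the abstract does not say how the local Dcorr is estimated, so the code uses a pointwise Grassberger-Procaccia-style estimate, namely the slope of log C_i(r) against log r, where C_i(r) is the fraction of other tokens within distance r of token i. The function names `local_corr_dim` and `prune_lowest_dcorr`, the percentile-based radius range, and the number of radii are hypothetical choices, not the authors' implementation.

```python
import numpy as np


def local_corr_dim(tokens, n_radii=8, eps=1e-12):
    """Per-token slope of log C_i(r) versus log r (hypothetical helper).

    C_i(r) is the fraction of the other tokens lying within Euclidean
    distance r of token i. The radius range and the least-squares fit
    are illustrative assumptions, not choices taken from the paper.
    """
    n = tokens.shape[0]
    # Pairwise Euclidean distances via the Gram matrix.
    sq = np.sum(tokens ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (tokens @ tokens.T)
    dist = np.sqrt(np.maximum(d2, 0.0))
    # Drop self-distances so each row holds the n - 1 other tokens.
    off = dist[~np.eye(n, dtype=bool)].reshape(n, n - 1)
    # Log-spaced radii between the 10th and 90th distance percentiles.
    r_lo, r_hi = np.percentile(off, [10, 90])
    radii = np.geomspace(max(r_lo, eps), max(r_hi, 2 * eps), n_radii)
    # Local correlation integral, floored at one neighbour to avoid log(0).
    c = (off[:, :, None] <= radii).mean(axis=1)
    log_c = np.log(np.clip(c, 1.0 / (n - 1), None))
    # Least-squares slope of log C_i(r) against log r for every token.
    lr = np.log(radii) - np.log(radii).mean()
    centered = log_c - log_c.mean(axis=1, keepdims=True)
    return centered @ lr / (lr @ lr + eps)


def prune_lowest_dcorr(tokens, prune_ratio=0.4):
    """Drop the prune_ratio fraction of tokens with the lowest score.

    Returns the kept tokens and their original indices in sequence order.
    Any class token is assumed to be handled separately by the caller.
    """
    n = tokens.shape[0]
    n_keep = n - int(prune_ratio * n)
    scores = local_corr_dim(tokens)
    keep = np.sort(np.argsort(scores)[::-1][:n_keep])
    return tokens[keep], keep


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in for the 196 patch embeddings of one ViT-B/16 image.
    patch_tokens = rng.normal(size=(196, 768))
    kept, idx = prune_lowest_dcorr(patch_tokens, prune_ratio=0.4)
    print(kept.shape)  # (118, 768)
```

With 196 patch tokens and a 40% ratio, 78 tokens are removed and 118 are kept. Sorting the kept indices preserves the original token order, which matters if positional information has already been added to the embeddings.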

Keywords

Vision Transformers; token pruning; fractal dimension; correlation dimension; computational efficiency; geometric complexity; model compression
Title
Fractal-Guided Token Pruning for Efficient Vision Transformers
Authors
Kim, Seong Rok; Lee, Minhyeok
DOI
10.3390/fractalfract9120767
Publication Date
2025-12
Type
Article
Journal
Fractal and Fractional, vol. 9, no. 12
