Fractal-Guided Token Pruning for Efficient Vision Transformers

Kim, Seong Rok; Lee, Minhyeok

doi:10.3390/fractalfract9120767

Abstract

Vision Transformers achieve strong performance across computer vision tasks, but their computational cost grows quadratically with token count, which limits deployment in resource-constrained environments. Existing token pruning methods rely on attention scores to identify important tokens. Attention, however, captures query-specific relevance rather than intrinsic information content, so these methods may discard tokens that carry information needed by subsequent layers or by other downstream tasks. We propose fractal-guided token pruning, a method that uses the correlation dimension D_corr of token embeddings as a task-agnostic measure of geometric complexity. Our key insight is that tokens with high D_corr span higher-dimensional manifolds in representation space, indicating complex patterns, while tokens with low D_corr collapse to simpler structures that represent redundant information. By computing a local D_corr for each token and pruning the tokens with the lowest values, our method retains geometrically complex tokens independently of attention-based relevance. The correlation dimension quantifies how token embeddings fill the representation space: embeddings from uniform background regions cluster tightly in low-dimensional subspaces (low D_corr), while embeddings from complex textures or object boundaries spread across higher-dimensional manifolds (high D_corr), reflecting their richer information content. Experiments on CIFAR-10 and CIFAR-100 with fine-tuned ViT-B/16 models show that fractal-guided pruning consistently outperforms random and norm-based pruning across all tested pruning ratios. At a 40% pruning ratio, fractal-guided pruning maintains 92.26% accuracy on CIFAR-10, a drop of only 0.99 percentage points from the 93.25% baseline, while achieving a 1.17x speedup. Our approach provides a geometry-based criterion for token importance that complements attention-based methods and shows promising generalization between the CIFAR-10 and CIFAR-100 datasets.
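
The pruning rule described in the abstract (score each token by a local correlation dimension, then drop the lowest-scoring tokens) can be illustrated with a minimal NumPy sketch. The abstract does not say how the local D_corr is estimated, so everything below is an assumption made for illustration: the function names `local_corr_dim` and `prune_tokens`, the neighbourhood size `k`, the choice of radii, and the Grassberger-Procaccia-style slope fit over each token's nearest neighbours. The sketch also assumes the input holds patch tokens only, with any class token set aside beforehand.

```python
import numpy as np


def local_corr_dim(tokens, k=16, n_radii=8):
    """Illustrative local correlation-dimension score per token.

    Assumption (not from the paper): for each token, take its k nearest
    neighbours, build the correlation integral C(r) from the pairwise
    distances inside that neighbourhood, and use the slope of
    log C(r) versus log r as the local estimate.
    """
    n = tokens.shape[0]
    k = min(k, n - 1)
    # Pairwise Euclidean distances between all token embeddings.
    sq = np.sum(tokens ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * tokens @ tokens.T
    dist = np.sqrt(np.maximum(d2, 0.0))
    np.fill_diagonal(dist, 0.0)

    dims = np.zeros(n)
    upper = np.triu_indices(k + 1, k=1)
    for i in range(n):
        nbrs = np.argsort(dist[i])[: k + 1]      # token i plus k neighbours
        pair = dist[np.ix_(nbrs, nbrs)][upper]   # distances within the patch
        pair = pair[pair > 0]
        if pair.size < 2:
            continue                             # degenerate: score stays 0
        lo, hi = np.percentile(pair, [10, 90])
        if hi <= lo:
            continue
        radii = np.geomspace(lo, hi, n_radii)
        # Correlation integral: fraction of pairs closer than each radius.
        c = np.array([(pair <= r).mean() for r in radii])
        ok = c > 0
        if ok.sum() < 2:
            continue
        dims[i] = np.polyfit(np.log(radii[ok]), np.log(c[ok]), 1)[0]
    return dims


def prune_tokens(tokens, ratio=0.4, **kwargs):
    """Drop the `ratio` fraction of tokens with the lowest local score."""
    n = tokens.shape[0]
    n_keep = n - int(round(ratio * n))
    dims = local_corr_dim(tokens, **kwargs)
    # Keep the highest-scoring tokens, restoring their original order.
    keep = np.sort(np.argsort(-dims, kind="stable")[:n_keep])
    return tokens[keep], keep


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(196, 64))   # 196 patch tokens, 64-dim (toy sizes)
    kept, idx = prune_tokens(x, ratio=0.4)
    print(kept.shape)
```

On synthetic data, embeddings lying along a single line should score near one while an isotropic cloud should score much higher, so the cloud tokens survive pruning. This is the behaviour the abstract attributes to uniform background regions versus complex textures, although the paper's actual estimator and hyperparameters may differ from this sketch.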

Keywords

Vision Transformers; token pruning; fractal dimension; correlation dimension; computational efficiency; geometric complexity; model compression

Published: 2025-12

Type: Article

Journal: FRACTAL AND FRACTIONAL

Volume: 9

Issue: 12
