Title: Frequency-Domain Vision Transformers: Architectures, Applications, and Open Challenges
Authors: Aslan, Muhammet Fatih; Sabanci, Kadir; Aslan, Busra
Date: 2026-04-10
Type: Article
ISSN: 2076-3417
Handle: https://hdl.handle.net/20.500.13091/13159
DOI: https://doi.org/10.3390/app16042024
Access: open access (info:eu-repo/semantics/openAccess)
Keywords: Vision Transformers; Discrete Cosine Transform; Frequency-Domain Vision Transformers; Fourier Transform; Spectral Representation; Wavelet Transform
Scopus ID: 2-s2.0-105031628508

Abstract: Vision Transformers (ViTs) have achieved strong performance in computer vision but suffer from limited inductive bias, high data requirements, and reduced sensitivity to high-frequency visual details. To address these limitations, Frequency-Domain ViTs (FD-ViTs) incorporate spectral representations (such as Fourier, wavelet, and discrete cosine transforms) into the Transformer pipeline to improve feature expressiveness and robustness. This survey provides a systematic review of FD-ViT architectures and introduces a unified taxonomy based on spectral transformation type, integration level, and computational characteristics. We summarize empirical findings across image classification, image restoration, and domain-specific applications, including medical imaging and remote sensing, highlighting consistent performance patterns and task-dependent trade-offs. Our analysis shows that frequency-domain integration yields modest, context-dependent gains in large-scale classification, while offering more consistent advantages in frequency-sensitive tasks such as image restoration and noise-robust visual analysis. We further discuss key open challenges, including spectral aliasing, phase information loss, evaluation inconsistency, and deployment efficiency, and outline emerging directions toward dynamic spectral operators, multimodal integration, and hardware-aware designs.

To the best of our knowledge, this work constitutes the first systematic survey that consolidates the growing body of research on FD-ViTs, providing a structured conceptual and methodological reference for future studies on spectral representations in Transformer-based visual learning.
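To illustrate the kind of spectral integration the abstract refers to, the following is a minimal, hypothetical sketch of a parameter-free Fourier token mixer (in the spirit of FFT-based mixing layers): patch embeddings are transformed with a 2-D FFT over the sequence and hidden axes, and the real part replaces the attention-based mixing. The function name, shapes, and data are illustrative assumptions, not an architecture from the survey.

```python
import numpy as np

def fourier_token_mixer(tokens: np.ndarray) -> np.ndarray:
    """Hypothetical FFT-based token mixing: apply a 2-D discrete Fourier
    transform over the (sequence, hidden) axes and keep the real part,
    yielding a parameter-free global mixing of all tokens."""
    return np.fft.fft2(tokens).real

# Illustrative patch embeddings: 16 tokens, hidden size 32.
rng = np.random.default_rng(0)
x = rng.standard_normal((16, 32))

mixed = fourier_token_mixer(x)
print(mixed.shape)  # shape is preserved: (16, 32)
```

Because the FFT is linear and needs no learned weights, such a mixer trades the quadratic cost of self-attention for an O(N log N) transform, which is one reason frequency-domain blocks are attractive for efficiency-oriented designs.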