Eker, Bengisu U.; Solak, Fatma Z.
2025-11-10
ISSN: 1863-1703; eISSN: 1863-1711
https://doi.org/10.1007/s11760-025-04841-x
https://hdl.handle.net/20.500.13091/10978

Abstract: Deep learning models achieve high accuracy in medical image classification, yet their clinical adoption faces two major obstacles: insufficient robustness evaluation under realistic degradation conditions and a lack of proper interpretability mechanisms. These shortcomings are particularly pronounced in gastrointestinal endoscopic diagnosis, where image quality varies substantially and clinical decision-making demands transparent, explainable artificial intelligence systems. To address these gaps, this study presents a comprehensive framework that systematically evaluates the Base and Tiny variants of three modern architectures (Vision Transformer, Swin Transformer, and ConvNeXt) for gastrointestinal abnormality classification. Unlike previous studies that focused exclusively on clean data, this evaluation incorporates realistic perturbations, including Gaussian noise, blur, rotation, color jitter, CutMix, and MixUp, combined with McNemar's statistical significance testing and Local Interpretable Model-agnostic Explanations (LIME) for clinical interpretability. The experimental results reveal distinct performance patterns across conditions. Among all tested variants, ConvNeXt_Tiny achieved the highest performance under clean conditions (98.50% accuracy, 98.46% F1-score, and 99.11% AUC). Under degraded conditions, ConvNeXt_Tiny showed strong resilience to low-level degradations, while Swin_Tiny demonstrated superior robustness to geometric and photometric distortions. Statistical validation confirmed significant performance differences under distortions, and the explainability analysis revealed that high-performing models consistently focused on clinically meaningful regions.
These findings provide evidence-based guidance for clinical deployment, where reliability under varied conditions and transparent explainability are paramount considerations for successful healthcare integration.

Language: en
Access: info:eu-repo/semantics/closedAccess
Keywords: Gastrointestinal Endoscopy; Explainable AI; Medical Image Classification; Robustness Evaluation; Statistical Significance Test; Transformer-Based Deep Learning
Title: Robust, Explainable, and Statistically Validated Gastrointestinal Image Analysis Using Modern Deep Learning Architectures
Type: Article
DOI: 10.1007/s11760-025-04841-x
Scopus ID: 2-s2.0-105018577092
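The abstract above pairs a perturbation-based robustness protocol with McNemar's test for comparing classifiers on the same test set. The article itself does not provide code, so the following is a minimal illustrative sketch, not the authors' implementation: the perturbation function, the noise level (`std=0.05`), and the toy disagreement counts are all assumptions chosen for illustration. It shows one of the listed perturbations (additive Gaussian noise) and a continuity-corrected McNemar's test computed from the paired per-sample correctness of two hypothetical models.

```python
import math
import numpy as np


def perturb_gaussian_noise(images, std=0.05, seed=0):
    """Add zero-mean Gaussian noise to images in [0, 1] and clip back.

    `std` is a hypothetical noise level; the paper's exact settings
    are not given here.
    """
    rng = np.random.default_rng(seed)
    noisy = images + rng.normal(0.0, std, size=images.shape)
    return np.clip(noisy, 0.0, 1.0)


def mcnemar_test(correct_a, correct_b):
    """Continuity-corrected McNemar's test on paired correctness flags.

    correct_a / correct_b: boolean arrays, True where model A / model B
    classified that sample correctly. Only the discordant pairs matter:
    b = A right & B wrong, c = A wrong & B right.
    Returns (chi-square statistic with df=1, two-sided p-value).
    """
    correct_a = np.asarray(correct_a, dtype=bool)
    correct_b = np.asarray(correct_b, dtype=bool)
    b = int(np.sum(correct_a & ~correct_b))  # A right, B wrong
    c = int(np.sum(~correct_a & correct_b))  # A wrong, B right
    if b + c == 0:
        return 0.0, 1.0  # no disagreements: no evidence of a difference
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    # For df=1, P(X > x) = erfc(sqrt(x / 2)); avoids a scipy dependency.
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


# Toy paired outcomes for 100 samples (hypothetical counts):
# 70 both correct, 15 only A correct, 5 only B correct, 10 both wrong.
correct_a = np.array([True] * 70 + [True] * 15 + [False] * 5 + [False] * 10)
correct_b = np.array([True] * 70 + [False] * 15 + [True] * 5 + [False] * 10)
chi2, p = mcnemar_test(correct_a, correct_b)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")  # significant at the 0.05 level
```

In practice the correctness arrays would come from evaluating each trained model on the same clean or perturbed test images (e.g. `perturb_gaussian_noise(test_images)`); a small p-value indicates the two models' error patterns differ beyond chance, which is the kind of statistical validation the abstract refers to.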