Repository logoGCRIS
  • English
  • Türkçe
  • Русский
Log In
New user? Click here to register. Have you forgotten your password?
Home
Communities
Browse GCRIS
Entities
Overview
GCRIS Guide
  1. Home
  2. Browse by Author

Browsing by Author "Taskiran, Salimkan Fatma"

Filter results by typing the first few letters
Now showing 1 - 1 of 1
  • Results Per Page
  • Sort Options
  • Loading...
    Thumbnail Image
    Article
    Citation - WoS: 3
    Citation - Scopus: 4
    A Comprehensive Evaluation of Oversampling Techniques for Enhancing Text Classification Performance
    (Nature Portfolio, 2025) Taskiran, Salimkan Fatma; Turkoglu, Bahaeddin; Kaya, Ersin; Asuroglu, Tunc
    Class imbalance is a common and critical challenge in text classification tasks, where the underrepresentation of certain classes often impairs the ability of classifiers to learn minority class patterns effectively. According to the "garbage in, garbage out" principle, even high-performing models may fail when trained on skewed distributions. To address this issue, this study investigates the impact of oversampling techniques, specifically the Synthetic Minority Over-sampling Technique (SMOTE) and thirty of its variants, on two benchmark text classification datasets: TREC and Emotions. Each dataset was vectorized using the MiniLMv2 transformer model to obtain semantically rich representations, and classification was performed using six machine learning algorithms. The balanced and imbalanced scenarios were compared in terms of F1-Score and Balanced Accuracy. This work constitutes, to the best of our knowledge, the first large-scale, systematic benchmarking of SMOTE-based oversampling methods in the context of transformer-embedded text classification. Furthermore, statistical significance of the observed performance differences was validated using the Friedman test. The results provide practical insights into the selection of oversampling techniques tailored to dataset characteristics and classifier sensitivity, supporting more robust and fair learning in imbalanced natural language processing tasks.
Repository logo
Collections
  • Scopus Collection
  • WoS Collection
  • TrDizin Collection
  • PubMed Collection
Entities
  • Research Outputs
  • Organizations
  • Researchers
  • Projects
  • Awards
  • Equipments
  • Events
About
  • Contact
  • GCRIS
  • Research Ecosystems
  • Feedback
  • OAI-PMH

Log in to GCRIS Dashboard

Powered by Research Ecosystems

  • Privacy policy
  • End User Agreement
  • Feedback