Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.13091/5408
Title: Comparison of Textual Data Augmentation Methods on SST-2 Dataset
Authors: Çataltaş, M.
Baykan, N.A.
Cicekli, I.
Keywords: Data augmentation
Natural language processing
Text generation
Deep learning
Learning algorithms
Learning systems
Natural language processing systems
Data sample
Language processing
Large datasets
Learning models
Natural languages
Performance
Text generations
Textual data
Publisher: Springer Science and Business Media Deutschland GmbH
Abstract: Since the arrival of advanced deep learning models, increasingly successful techniques have been proposed, significantly enhancing performance on nearly all natural language processing tasks. While these deep learning models achieve state-of-the-art results, they require large datasets to do so. However, collecting data in large amounts is challenging and cannot be done successfully for every task. Data augmentation can therefore help satisfy the need for large datasets by generating synthetic data samples from the original ones. This study aims to guide future work in this field by comparing the performance obtained from using a large dataset as a whole with that obtained from augmenting smaller subsets at different rates. To this end, it compares three textual data augmentation techniques and examines their efficacy with respect to the augmentation mechanism. Through empirical evaluations on the Stanford Sentiment Treebank (SST-2) dataset, the sampling-based method LAMBADA showed superior performance in low-data regimes and outperformed the other methods as the augmentation ratio increased, offering significant improvements in model robustness and accuracy. These findings offer researchers insights into augmentation strategies, thereby enhancing generalization in future work. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.
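
The abstract refers to LAMBADA, a sampling-based augmentation method that generates class-conditioned synthetic examples with a language model and filters them with a classifier trained on the original data. The sketch below is not the authors' implementation; the model choices (plain GPT-2 standing in for LAMBADA's fine-tuned generator, an off-the-shelf SST-2 sentiment model standing in for the filter), the prompt format, and the confidence threshold are all illustrative assumptions.

```python
# Minimal sketch of a LAMBADA-style augmentation loop for SST-2-like data.
# Assumptions: plain GPT-2 replaces the label-conditioned fine-tuned generator,
# and a public SST-2 sentiment model replaces the filter classifier that
# LAMBADA trains on the original (small) labeled set.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

def augment(label: str, n_candidates: int = 8, threshold: float = 0.9):
    """Generate candidate sentences for `label` and keep only those the
    filter classifier assigns to the same label with high confidence."""
    prompt = f"{label} review:"  # assumed class-conditioning prompt
    outputs = generator(
        prompt,
        max_new_tokens=30,
        num_return_sequences=n_candidates,
        do_sample=True,
        top_p=0.95,
    )
    kept = []
    for out in outputs:
        text = out["generated_text"][len(prompt):].strip()
        if not text:
            continue
        pred = classifier(text)[0]  # {"label": "POSITIVE"/"NEGATIVE", "score": ...}
        if pred["label"].lower() == label.lower() and pred["score"] >= threshold:
            kept.append(text)
    return kept

if __name__ == "__main__":
    for sample in augment("positive"):
        print(sample)
```

In the paper's setting, the kept synthetic samples would be added to a reduced SST-2 training split at different augmentation ratios before retraining the downstream classifier.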
Description: 2nd International Congress of Electrical and Computer Engineering (ICECENG 2023), 22–25 November 2023
URI: https://doi.org/10.1007/978-3-031-52760-9_14
https://hdl.handle.net/20.500.13091/5408
ISBN: 9783031527593
ISSN: 2522-8595
Appears in Collections: Scopus İndeksli Yayınlar Koleksiyonu / Scopus Indexed Publications Collections

Items in GCRIS Repository are protected by copyright, with all rights reserved, unless otherwise indicated.