Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.13091/5408
Full metadata record
DC Field | Value | Language
dc.contributor.author | Çataltaş, M. | -
dc.contributor.author | Baykan, N.A. | -
dc.contributor.author | Cicekli, I. | -
dc.date.accessioned | 2024-04-20T13:05:50Z | -
dc.date.available | 2024-04-20T13:05:50Z | -
dc.date.issued | 2024 | -
dc.identifier.isbn | 9783031527593 | -
dc.identifier.issn | 2522-8595 | -
dc.identifier.uri | https://doi.org/10.1007/978-3-031-52760-9_14 | -
dc.identifier.uri | https://hdl.handle.net/20.500.13091/5408 | -
dc.description | 2nd International Congress of Electrical and Computer Engineering (ICECENG 2023), 22–25 November 2023 | en_US
dc.description.abstract | Since the advent of advanced deep learning models, increasingly successful techniques have been proposed, significantly improving performance on nearly all natural language processing tasks. Although these deep learning models achieve state-of-the-art results, they require large datasets to do so. However, collecting data at scale is challenging and is not feasible for every task. Data augmentation can therefore satisfy the need for large datasets by generating synthetic samples from the original data. This study guides future work in the field by comparing the performance obtained from a large dataset used as a whole against that obtained by augmenting smaller subsets of it at different rates. To this end, it compares three textual data augmentation techniques and examines their efficacy according to the underlying augmentation mechanism. In empirical evaluations on the Stanford Sentiment Treebank dataset, the sampling-based method LAMBADA performed best in low-data regimes and also outperformed the other methods as the augmentation ratio increased, offering significant improvements in model robustness and accuracy. These findings give researchers insight into augmentation strategies, thereby enhancing generalization in future work. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024. | en_US
dc.language.iso | en | en_US
dc.publisher | Springer Science and Business Media Deutschland GmbH | en_US
dc.relation.ispartof | EAI/Springer Innovations in Communication and Computing | en_US
dc.rights | info:eu-repo/semantics/closedAccess | en_US
dc.subject | Data augmentation | en_US
dc.subject | Natural language processing | en_US
dc.subject | Text generation | en_US
dc.subject | Deep learning | en_US
dc.subject | Learning algorithms | en_US
dc.subject | Learning systems | en_US
dc.subject | Natural language processing systems | en_US
dc.subject | Data sample | en_US
dc.subject | Language processing | en_US
dc.subject | Large datasets | en_US
dc.subject | Learning models | en_US
dc.subject | Natural languages | en_US
dc.subject | Performance | en_US
dc.subject | Text generations | en_US
dc.subject | Textual data | en_US
dc.title | Comparison of Textual Data Augmentation Methods on SST-2 Dataset | en_US
dc.type | Conference Object | en_US
dc.identifier.doi | 10.1007/978-3-031-52760-9_14 | -
dc.identifier.scopus | 2-s2.0-85189544660 | en_US
dc.department | KTÜN | en_US
dc.identifier.startpage | 189 | en_US
dc.identifier.endpage | 201 | en_US
dc.institutionauthor | | -
dc.relation.publicationcategory | Conference Item - International - Institutional Faculty Member | en_US
dc.authorscopusid | 57222720532 | -
dc.authorscopusid | 35091134000 | -
dc.authorscopusid | 6603079400 | -
item.fulltext | No Fulltext | -
item.openairetype | Conference Object | -
item.openairecristype | http://purl.org/coar/resource_type/c_18cf | -
item.grantfulltext | none | -
item.cerifentitytype | Publications | -
item.languageiso639-1 | en | -
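
The abstract above credits the sampling-based LAMBADA method (Anaby-Tavor et al., 2020) with the strongest results. For orientation, the sketch below illustrates the core LAMBADA loop as commonly described: fine-tune a language model on labeled examples, generate label-conditioned candidate sentences, then keep only the candidates that a classifier trained on the original data assigns to the intended label with high confidence. This is a minimal sketch assuming the Hugging Face transformers library; the model names, prompt format, and keep count are illustrative assumptions, not the paper's exact setup.

    # A minimal LAMBADA-style augmentation sketch (after Anaby-Tavor et al., 2020),
    # not this paper's exact pipeline. Model names, the prompt format, and the
    # keep count below are illustrative assumptions.
    from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

    gen_tok = AutoTokenizer.from_pretrained("gpt2")
    # LAMBADA first fine-tunes this LM on "<label> SEP <text>" pairs; omitted here.
    gen_lm = AutoModelForCausalLM.from_pretrained("gpt2")
    # Filtering classifier; LAMBADA trains it on the original labeled data, while
    # an off-the-shelf SST-2 model stands in for it in this sketch.
    clf = pipeline("text-classification",
                   model="distilbert-base-uncased-finetuned-sst-2-english")

    def augment(label: str, n: int = 20, keep: int = 5) -> list[str]:
        """Generate n label-conditioned candidates, keep the `keep` most confident."""
        prompt = f"{label}:"  # condition generation on the class label
        ids = gen_tok(prompt, return_tensors="pt").input_ids
        out = gen_lm.generate(ids, do_sample=True, top_p=0.9, max_length=40,
                              num_return_sequences=n,
                              pad_token_id=gen_tok.eos_token_id)
        texts = [gen_tok.decode(o, skip_special_tokens=True)[len(prompt):].strip()
                 for o in out]
        # Keep only candidates the classifier assigns to the intended label,
        # ranked by its confidence; this is LAMBADA's filtering step.
        scored = [(t, s["score"]) for t, s in zip(texts, clf(texts))
                  if s["label"] == label.upper()]
        scored.sort(key=lambda p: p[1], reverse=True)
        return [t for t, _ in scored[:keep]]

    print(augment("positive"))

The filtering step is what distinguishes this sampling-based approach from naive generation: low-confidence or mislabeled candidates are discarded before they can pollute the training set.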
Appears in Collections:Scopus İndeksli Yayınlar Koleksiyonu / Scopus Indexed Publications Collections
Items in GCRIS Repository are protected by copyright, with all rights reserved, unless otherwise indicated.