Comparison of Textual Data Augmentation Methods on SST-2 Dataset

dc.contributor.author Çataltaş, M.
dc.contributor.author Baykan, N.A.
dc.contributor.author Cicekli, I.
dc.date.accessioned 2024-04-20T13:05:50Z
dc.date.available 2024-04-20T13:05:50Z
dc.date.issued 2024
dc.description 2nd International Congress of Electrical and Computer Engineering, ICECENG 2023 -- 22 November 2023 through 25 November 2023 -- 309799 en_US
dc.description.abstract Since the arrival of advanced deep learning models, more successful techniques have been proposed that significantly improve performance on nearly all natural language processing tasks. Although these models achieve the best results, they need large datasets to do so. Collecting data in large amounts is challenging and is not feasible for every task. Data augmentation can therefore help meet the need for large datasets by generating synthetic samples from the original ones. This study aims to guide those working in this field by comparing the use of a large dataset as a whole with the use of smaller subsets augmented at different ratios. To this end, it compares three textual data augmentation techniques and examines their efficacy with respect to the augmentation mechanism. In empirical evaluations on the Stanford Sentiment Treebank dataset, the sampling-based method LAMBADA performed best in low-data regimes and also outperformed the other methods as the augmentation ratio increased, offering significant improvements in model robustness and accuracy. These findings give researchers insight into augmentation strategies that may improve generalization in future work. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024. en_US
dc.identifier.doi 10.1007/978-3-031-52760-9_14
dc.identifier.isbn 9783031527593
dc.identifier.issn 2522-8595
dc.identifier.scopus 2-s2.0-85189544660
dc.identifier.uri https://doi.org/10.1007/978-3-031-52760-9_14
dc.identifier.uri https://hdl.handle.net/20.500.13091/5408
dc.language.iso en en_US
dc.publisher Springer Science and Business Media Deutschland GmbH en_US
dc.relation.ispartof EAI/Springer Innovations in Communication and Computing en_US
dc.rights info:eu-repo/semantics/closedAccess en_US
dc.subject Data augmentation en_US
dc.subject Natural language processing en_US
dc.subject Text generation en_US
dc.subject Deep learning en_US
dc.subject Learning algorithms en_US
dc.subject Learning systems en_US
dc.subject Natural language processing systems en_US
dc.subject Data sample en_US
dc.subject Language processing en_US
dc.subject Large datasets en_US
dc.subject Learning models en_US
dc.subject Natural languages en_US
dc.subject Performance en_US
dc.subject Text generations en_US
dc.subject Textual data en_US
dc.title Comparison of Textual Data Augmentation Methods on SST-2 Dataset en_US
dc.type Conference Object en_US
dspace.entity.type Publication
gdc.author.institutional
gdc.author.scopusid 57222720532
gdc.author.scopusid 35091134000
gdc.author.scopusid 6603079400
gdc.bip.impulseclass C5
gdc.bip.influenceclass C5
gdc.bip.popularityclass C5
gdc.coar.access metadata only access
gdc.coar.type text::conference output
gdc.description.department KTÜN en_US
gdc.description.departmenttemp Çataltaş, M., Hacettepe University, Ankara, Turkey; Baykan, N.A., Konya Technical University, Konya, Turkey; Cicekli, I., Hacettepe University, Ankara, Turkey en_US
gdc.description.endpage 201 en_US
gdc.description.publicationcategory Conference Item - International - Institutional Faculty Member en_US
gdc.description.scopusquality Q3
gdc.description.startpage 189 en_US
gdc.description.wosquality N/A
gdc.identifier.openalex W4392912054
gdc.index.type Scopus
gdc.oaire.diamondjournal false
gdc.oaire.impulse 0.0
gdc.oaire.influence 2.4895952E-9
gdc.oaire.isgreen false
gdc.oaire.popularity 2.3737945E-9
gdc.oaire.publicfunded false
gdc.openalex.collaboration National
gdc.openalex.fwci 0.0
gdc.openalex.normalizedpercentile 0.09
gdc.opencitations.count 0
gdc.plumx.mendeley 2
gdc.plumx.scopuscites 0
gdc.scopus.citedcount 0
gdc.virtual.author Çataltaş, Mustafa
gdc.virtual.author Baykan, Nurdan
relation.isAuthorOfPublication d196a562-7eb5-4293-b812-8ed4b8b004dd
relation.isAuthorOfPublication 81dff1ca-db16-4103-b9cb-612ae1600b38
relation.isAuthorOfPublication.latestForDiscovery d196a562-7eb5-4293-b812-8ed4b8b004dd
