Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.13091/5408
Full metadata record
DC Field | Value | Language
dc.contributor.author | Çataltaş, M. | -
dc.contributor.author | Baykan, N.A. | -
dc.contributor.author | Cicekli, I. | -
dc.date.accessioned | 2024-04-20T13:05:50Z | -
dc.date.available | 2024-04-20T13:05:50Z | -
dc.date.issued | 2024 | -
dc.identifier.isbn | 9783031527593 | -
dc.identifier.issn | 2522-8595 | -
dc.identifier.uri | https://doi.org/10.1007/978-3-031-52760-9_14 | -
dc.identifier.uri | https://hdl.handle.net/20.500.13091/5408 | -
dc.description | 2nd International Congress of Electrical and Computer Engineering (ICECENG 2023), 22–25 November 2023 | en_US
dc.description.abstract | Since the advent of advanced deep learning models, increasingly successful techniques have been proposed, significantly improving performance on nearly all natural language processing tasks. Although these deep learning models achieve state-of-the-art results, they require large datasets to do so. However, collecting data at scale is challenging and is not feasible for every task. Data augmentation can therefore satisfy the need for large datasets by generating synthetic samples from the original data. This study guides future work in the field by comparing the performance obtained from a large dataset used as a whole against that obtained by augmenting smaller subsets of it at different rates. To this end, it compares three textual data augmentation techniques and examines their efficacy according to the underlying augmentation mechanism. In empirical evaluations on the Stanford Sentiment Treebank dataset, the sampling-based method LAMBADA performed best in low-data regimes and also outperformed the other methods as the augmentation ratio increased, offering significant improvements in model robustness and accuracy. These findings give researchers insight into augmentation strategies, thereby enhancing generalization in future work. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024. | en_US
dc.language.iso | en | en_US
dc.publisher | Springer Science and Business Media Deutschland GmbH | en_US
dc.relation.ispartof | EAI/Springer Innovations in Communication and Computing | en_US
dc.rights | info:eu-repo/semantics/closedAccess | en_US
dc.subject | Data augmentation | en_US
dc.subject | Natural language processing | en_US
dc.subject | Text generation | en_US
dc.subject | Deep learning | en_US
dc.subject | Learning algorithms | en_US
dc.subject | Learning systems | en_US
dc.subject | Natural language processing systems | en_US
dc.subject | Data sample | en_US
dc.subject | Language processing | en_US
dc.subject | Large datasets | en_US
dc.subject | Learning models | en_US
dc.subject | Natural languages | en_US
dc.subject | Performance | en_US
dc.subject | Text generations | en_US
dc.subject | Textual data | en_US
dc.title | Comparison of Textual Data Augmentation Methods on SST-2 Dataset | en_US
dc.type | Conference Object | en_US
dc.identifier.doi | 10.1007/978-3-031-52760-9_14 | -
dc.identifier.scopus | 2-s2.0-85189544660 | en_US
dc.department | KTÜN | en_US
dc.identifier.startpage | 189 | en_US
dc.identifier.endpage | 201 | en_US
dc.institutionauthor | | -
dc.relation.publicationcategory | Conference Item - International - Institutional Faculty Member | en_US
dc.authorscopusid | 57222720532 | -
dc.authorscopusid | 35091134000 | -
dc.authorscopusid | 6603079400 | -
item.fulltext | No Fulltext | -
item.openairetype | Conference Object | -
item.openairecristype | http://purl.org/coar/resource_type/c_18cf | -
item.grantfulltext | none | -
item.cerifentitytype | Publications | -
item.languageiso639-1 | en | -
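
The abstract above credits the sampling-based LAMBADA method (Anaby-Tavor et al., 2020) with the strongest results. For orientation, the sketch below illustrates the core LAMBADA loop as commonly described: fine-tune a language model on labeled examples, generate label-conditioned candidate sentences, then keep only the candidates that a classifier trained on the original data assigns to the intended label with high confidence. This is a minimal sketch assuming the Hugging Face transformers library; the model names, prompt format, and keep count are illustrative assumptions, not the paper's exact setup.

    # A minimal LAMBADA-style augmentation sketch (after Anaby-Tavor et al., 2020),
    # not this paper's exact pipeline. Model names, the prompt format, and the
    # keep count below are illustrative assumptions.
    from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

    gen_tok = AutoTokenizer.from_pretrained("gpt2")
    # LAMBADA first fine-tunes this LM on "<label> SEP <text>" pairs; omitted here.
    gen_lm = AutoModelForCausalLM.from_pretrained("gpt2")
    # Filtering classifier; LAMBADA trains it on the original labeled data, while
    # an off-the-shelf SST-2 model stands in for it in this sketch.
    clf = pipeline("text-classification",
                   model="distilbert-base-uncased-finetuned-sst-2-english")

    def augment(label: str, n: int = 20, keep: int = 5) -> list[str]:
        """Generate n label-conditioned candidates, keep the `keep` most confident."""
        prompt = f"{label}:"  # condition generation on the class label
        ids = gen_tok(prompt, return_tensors="pt").input_ids
        out = gen_lm.generate(ids, do_sample=True, top_p=0.9, max_length=40,
                              num_return_sequences=n,
                              pad_token_id=gen_tok.eos_token_id)
        texts = [gen_tok.decode(o, skip_special_tokens=True)[len(prompt):].strip()
                 for o in out]
        # Keep only candidates the classifier assigns to the intended label,
        # ranked by its confidence; this is LAMBADA's filtering step.
        scored = [(t, s["score"]) for t, s in zip(texts, clf(texts))
                  if s["label"] == label.upper()]
        scored.sort(key=lambda p: p[1], reverse=True)
        return [t for t, _ in scored[:keep]]

    print(augment("positive"))

The filtering step is what distinguishes this sampling-based approach from naive generation: low-confidence or mislabeled candidates are discarded before they can pollute the training set.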
Appears in Collections:Scopus İndeksli Yayınlar Koleksiyonu / Scopus Indexed Publications Collections
Items in GCRIS Repository are protected by copyright, with all rights reserved, unless otherwise indicated.