Analyzing the Effect of Data Preprocessing Techniques Using Machine Learning Algorithms on the Diagnosis of Covid-19

dc.contributor.author Erol, Gizemnur
dc.contributor.author Uzbaş, Betül
dc.contributor.author Yücelbaş, Cüneyt
dc.contributor.author Yücelbaş, Sule
dc.date.accessioned 2022-11-28T16:54:40Z
dc.date.available 2022-11-28T16:54:40Z
dc.date.issued 2022
dc.description.abstract Real-time polymerase chain reaction (RT-PCR), known as the swab test, is a laboratory diagnostic test that detects COVID-19 from respiratory samples. Because of the rapid worldwide spread of the coronavirus, RT-PCR testing could no longer deliver results quickly enough. This created a need for complementary diagnostic methods, and machine learning studies in this area began. Medical data, however, are challenging to work with because they are inconsistent, incomplete, difficult to scale, and very large. In addition, poor clinical decisions, irrelevant parameters, and limited medical data adversely affect the accuracy of such studies. Datasets containing COVID-19 blood parameters are currently fewer in number than other medical datasets, so this study aims to improve the existing ones. To obtain more consistent results in COVID-19 machine learning studies, the effect of data preprocessing techniques on the classification of COVID-19 data was investigated. First, categorical feature encoding and feature scaling were applied to a dataset with 15 features containing blood data of 279 patients, including gender and age information. Then, missing values were imputed using both the K-nearest neighbor (KNN) algorithm and multivariate imputation by chained equations (MICE). The classes were balanced with the synthetic minority oversampling technique (SMOTE). The effect of these preprocessing techniques was analyzed on the ensemble learning algorithms bagging, AdaBoost, and random forest, and on the popular classifiers KNN, support vector machine, logistic regression, artificial neural network, and decision tree. With SMOTE applied, the highest accuracies obtained with the bagging classifier were 83.42% and 83.74% for KNN and MICE imputation, respectively. Without SMOTE, the highest accuracy reached with the same classifier was 83.91%, for KNN imputation. In conclusion, selected data preprocessing techniques are compared, their effect on classification success is presented, and experiments demonstrate the importance of choosing the right combination of preprocessing steps. en_US
dc.identifier.doi 10.1002/cpe.7393
dc.identifier.issn 1532-0626
dc.identifier.issn 1532-0634
dc.identifier.scopus 2-s2.0-85140036970
dc.identifier.uri https://doi.org/10.1002/cpe.7393
dc.identifier.uri https://hdl.handle.net/20.500.13091/3135
dc.language.iso en en_US
dc.publisher Wiley en_US
dc.relation.ispartof Concurrency and Computation-Practice & Experience en_US
dc.rights info:eu-repo/semantics/closedAccess en_US
dc.subject COVID-19 en_US
dc.subject KNN imputation en_US
dc.subject machine learning en_US
dc.subject multivariate imputation by chained equation en_US
dc.subject synthetic minority oversampling technique en_US
dc.title Analyzing the Effect of Data Preprocessing Techniques Using Machine Learning Algorithms on the Diagnosis of Covid-19 en_US
dc.type Article en_US
dspace.entity.type Publication
gdc.author.institutional Erol, Gizemnur
gdc.author.institutional Uzbaş, Betül
gdc.author.scopusid 57931549700
gdc.author.scopusid 57201915831
gdc.author.scopusid 55913650300
gdc.author.scopusid 55913641100
gdc.bip.impulseclass C4
gdc.bip.influenceclass C4
gdc.bip.popularityclass C4
gdc.coar.access metadata only access
gdc.coar.type text::journal::journal article
gdc.description.department Faculties, Faculty of Engineering and Natural Sciences, Department of Software Engineering en_US
gdc.description.department Faculties, Faculty of Engineering and Natural Sciences, Department of Computer Engineering en_US
gdc.description.publicationcategory Article - International Refereed Journal - Institutional Faculty Member en_US
gdc.description.scopusquality Q2
gdc.description.volume 34
gdc.description.wosquality Q3
gdc.identifier.openalex W4306737044
gdc.identifier.pmid 36714180
gdc.identifier.wos WOS:000869547800001
gdc.index.type WoS
gdc.index.type Scopus
gdc.oaire.diamondjournal false
gdc.oaire.impulse 15.0
gdc.oaire.influence 3.503221E-9
gdc.oaire.isgreen false
gdc.oaire.popularity 1.3357297E-8
gdc.oaire.publicfunded false
gdc.oaire.sciencefields 0202 electrical engineering, electronic engineering, information engineering
gdc.oaire.sciencefields 02 engineering and technology
gdc.openalex.collaboration National
gdc.openalex.fwci 3.1239922
gdc.openalex.normalizedpercentile 0.89
gdc.openalex.toppercent TOP 10%
gdc.opencitations.count 8
gdc.plumx.crossrefcites 3
gdc.plumx.mendeley 37
gdc.plumx.newscount 1
gdc.plumx.pubmedcites 4
gdc.plumx.scopuscites 12
gdc.scopus.citedcount 12
gdc.virtual.author Erol Doğan, Gizemnur
gdc.virtual.author Uzbaş, Betül
gdc.wos.citedcount 7
relation.isAuthorOfPublication ffc15503-99f3-41bf-8b33-3eaff0ac0355
relation.isAuthorOfPublication b37a91b2-acda-4cb4-9cb2-12392200749f
relation.isAuthorOfPublication.latestForDiscovery ffc15503-99f3-41bf-8b33-3eaff0ac0355
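
The two preprocessing steps named in the abstract, KNN imputation and SMOTE, can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the record does not say which software they used, and the function names (`knn_impute`, `smote`), the toy data, and the choice of `k` are all assumptions made for illustration. Missing values are represented as `None`, and the distance over partially observed rows is rescaled by the share of features that both rows observe.

```python
import math
import random


def _distance(a, b):
    """Euclidean distance over features observed in both rows,
    rescaled to compensate for the features that are missing."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))


def knn_impute(rows, k=2):
    """Replace each None with the mean of that feature over the k nearest
    rows that observe it. Hypothetical helper, not a library API."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = sorted(
                (_distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )[:k]
            if donors:  # leave the gap if no row observes this feature
                filled[i][j] = sum(v for _, v in donors) / len(donors)
    return filled


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    random minority point and one of its k nearest minority neighbours.
    Needs at least two complete (already imputed) minority rows."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data (invented for illustration, not taken from the study).
rows = [[1.0, 2.0], [2.0, None], [3.0, 6.0], [10.0, 20.0]]
complete = knn_impute(rows, k=2)

minority = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
new_points = smote(minority, n_new=4, k=2)
```

In this toy case the missing value is filled from the two nearest rows (values 2.0 and 6.0), and every synthetic point lies on a segment between two existing minority samples. In practice, oversampling is usually applied to the training split only, so that synthetic points do not leak into evaluation.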

Files

Original bundle

Name: Concurrency and Computation - 2022 - Erol - Analyzing the effect of data preprocessing techniques using machine learning.pdf
Size: 1.17 MB
Format: Adobe Portable Document Format