The Effect of Balancing Process on Classifying Unbalancing Data Set
Date
2018
Authors
Kaya, Ersin
Abstract
Unbalanced data refers to a situation where the number of observations is not the same for all categories in a labeled data set. In some fields, unbalanced data problems are very common. Some machine learning classifiers fail to deal with unbalanced training data sets because they are sensitive to the proportions of the different classes. As a result, these algorithms tend to favor the class with the largest proportion of observations, known as the majority class, which may lead to misleading accuracy. Many medical data sets are unbalanced because most of the records collected on a disease usually belong to non-disease cases. When such data are used in a classification algorithm, the results are poor, so the data sets used in the training process must be balanced to improve classification success. In this article, the synthetic minority over-sampling technique (SMOTE) is applied to the data sets. The K-Nearest Neighbors (K-NN) and Naïve Bayes (NB) classification algorithms are then applied to the balanced data sets, and according to the obtained classification results, the balanced data sets achieved better classification success.
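The interpolation idea behind SMOTE can be illustrated with a minimal sketch. This is not the implementation used in the article: the function name `smote`, its parameters, and the toy data are assumptions made for illustration only. Each synthetic sample is placed at a random position on the line segment between a minority sample and one of its k nearest minority-class neighbors.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbours (Euclidean distance).
    Hypothetical helper for illustration, not the article's code."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance to the base sample.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy data (assumed): 8 majority samples and 3 minority samples.
majority = [(float(x), float(y)) for x in range(4) for y in range(2)]
minority = [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]

# Oversample the minority class until both classes have the same size.
new_points = smote(minority, len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
```

In practice only the training split would be oversampled, so that a K-NN or NB classifier is still evaluated on data with the original class proportions.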
Keywords
Unbalanced Data, SMOTE, K-NN, NB, Classification
WoS Q
N/A
Scopus Q
N/A
Start Page
121
End Page
124
Sustainable Development Goals
4: Quality Education
9: Industry, Innovation and Infrastructure
17: Partnerships for the Goals