The Effect of Balancing Process on Classifying Unbalancing Data Set

Date

2018

Authors

Kaya, Ersin

Abstract

Unbalanced data refers to a situation where the number of observations is not the same for all classes in a labeled data set. In some fields, unbalanced data problems are very common. Some machine learning classifiers fail to deal with unbalanced training data sets because they are sensitive to the proportions of the different classes. As a result, these algorithms tend to favor the class with the largest proportion of observations, known as the majority class, which may lead to misleading accuracy. Most disease-related data sets are unbalanced because most of the records collected about a disease usually come from cases without the disease. When such data are used in a classification algorithm, the results are poor, so the data sets used in the training process must be balanced to improve classification success. In this article, the synthetic minority over-sampling technique (SMOTE) is applied to the data sets. The K-Nearest Neighbors (K-NN) and Naïve Bayes (NB) classification algorithms are then applied to the balanced data sets, and according to the obtained classification results, the balanced data sets achieved better classification success.
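
The balancing step described in the abstract can be illustrated with a minimal sketch of the SMOTE idea: each synthetic minority sample is placed at a random position on the line segment between a minority sample and one of its k nearest minority neighbors. This is not the author's implementation. The function names `smote` and `balance`, the parameters `k` and `seed`, the two-class assumption, and the toy data are all assumptions made for illustration, and the base sample is drawn at random here rather than by iterating over every minority sample.

```python
import math
import random
from collections import Counter


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority samples.

    Simplified sketch: the base sample is picked at random for each
    synthetic point (an assumption, not taken from the article).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point, excluding itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def balance(X, y, minority_label, k=5, seed=0):
    """Oversample the minority class until both classes have equal size.

    Assumes a two-class problem (hypothetical helper for this sketch).
    """
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_needed = (len(X) - len(minority)) - len(minority)
    new_points = smote(minority, max(n_needed, 0), k=k, seed=seed)
    return list(X) + new_points, list(y) + [minority_label] * len(new_points)


# Made-up toy data: 8 majority samples (label 0), 3 minority samples (label 1)
X = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2),
     (0.4, 0.1), (0.2, 0.4), (0.0, 0.3), (0.3, 0.0),
     (2.0, 2.1), (2.2, 1.9), (1.8, 2.0)]
y = [0] * 8 + [1] * 3

X_bal, y_bal = balance(X, y, minority_label=1, k=2)
print(Counter(y), "->", Counter(y_bal))
```

After balancing, both classes contain the same number of samples, and a classifier such as K-NN or NB would then be trained on `X_bal` and `y_bal` instead of the original data. In practice only the training split should be oversampled, so that the evaluation data stays free of synthetic points.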

Keywords

Unbalanced Data, SMOTE, K-NN, NB, Classification

Start Page

121

End Page

124

Sustainable Development Goals

4

QUALITY EDUCATION

9

INDUSTRY, INNOVATION AND INFRASTRUCTURE

17

PARTNERSHIPS FOR THE GOALS