Learning from class-imbalanced data: overlap-driven resampling for imbalanced data classification.

Vuttipittayamongkol, Pattaramon

oai:rgu-repository.worktribe.com:1239009

Publication date: 31 October 2020

Abstract

Classification of imbalanced datasets has attracted substantial research interest in recent years. Imbalanced datasets are common in domains such as health, finance and security, yet learning algorithms are generally not designed to handle them. Many existing solutions focus mainly on the skewed class distribution. However, several studies have reported that class overlap has a greater negative impact on the learning process than class imbalance. This thesis thoroughly explores the impact of class overlap on learning algorithms and demonstrates how eliminating class overlap can effectively improve the classification of imbalanced datasets. Novel undersampling approaches were developed with the main objective of enhancing the presence of minority class instances in the overlapping region. This is achieved by identifying and removing majority class instances that potentially reside in that region. Seven methods, spanning two different approaches, were designed for the task. Extensive experiments were carried out to evaluate the methods on simulated and well-known real-world datasets. The results showed substantial improvements in minority class classification accuracy, with favourable trade-offs in majority class accuracy. Moreover, the thesis presents a successful application of the methods to the predictive diagnosis of diseases from imbalanced records.

These novel overlap-based approaches have several advantages over other common resampling methods. First, the amount of undersampling is independent of class imbalance and proportional to the degree of overlap, which could effectively address class overlap while also reducing the effect of class imbalance. Second, information loss is minimised because instance elimination is confined to the problematic region. Third, adaptive parameters allow the methods to generalise across different problems. It is also worth pointing out that the methods provide different trade-offs, offering real-world users more alternatives when selecting the best-fit solution to their problem.
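As a rough illustration of the general idea described in the abstract (identifying and removing majority class instances that lie in the overlapping region), the following is a minimal sketch, assuming a simple k-nearest-neighbour rule for detecting overlap: a majority instance is discarded if any of its k nearest neighbours is a minority instance. The function names `knn_indices` and `overlap_undersample`, the parameter `k`, and the removal rule itself are hypothetical; they are not taken from the thesis, whose seven methods are not described on this page.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def knn_indices(X: Sequence[Point], i: int, k: int) -> List[int]:
    """Return indices of the k nearest neighbours of X[i] (Euclidean), excluding i."""
    # Sorting (distance, index) pairs breaks distance ties by index, so results are deterministic.
    pairs = sorted((math.dist(X[i], X[j]), j) for j in range(len(X)) if j != i)
    return [j for _, j in pairs[:k]]


def overlap_undersample(
    X: Sequence[Point], y: Sequence[int], minority_label: int, k: int = 5
) -> Tuple[List[Point], List[int]]:
    """Hypothetical overlap-driven undersampling rule (illustrative only).

    A majority instance is treated as lying in the overlapping region, and is
    removed, if at least one of its k nearest neighbours belongs to the
    minority class. All minority instances are kept.
    """
    keep: List[int] = []
    for i, label in enumerate(y):
        if label == minority_label:
            keep.append(i)  # never discard minority instances
            continue
        neighbours = knn_indices(X, i, k)
        if not any(y[j] == minority_label for j in neighbours):
            keep.append(i)  # no minority neighbour: outside the overlap, so keep it
    return [X[i] for i in keep], [y[i] for i in keep]


if __name__ == "__main__":
    # Toy data: label 0 = majority, label 1 = minority.
    # The majority point (5.0, 5.0) sits inside the minority cluster.
    X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5), (5.0, 5.0),
         (5.2, 5.0), (5.0, 5.2), (4.8, 5.0)]
    y = [0, 0, 0, 0, 0, 0, 1, 1, 1]
    X_res, y_res = overlap_undersample(X, y, minority_label=1, k=3)
    print(len(X), "->", len(X_res))  # 9 -> 8: only (5.0, 5.0) is removed
```

Because only majority instances close to minority instances are removed, the number of discarded points under this rule depends on how much the classes overlap rather than on the imbalance ratio: adding more majority points far from the minority cluster leaves the removal count unchanged. Real overlap-based methods would typically use more careful criteria than this single-neighbour test.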


This thesis was published in the Open Access Institutional Repository at Robert Gordon University.