On Identifying Critical Nuggets Of Information During Classification Task

Sathiaraj, David

oai:repository.lsu.edu:gradschool_dissertations-1505

Authors: David Sathiaraj
Publication date: 1 January 2013
Publisher: LSU Scholarly Repository

Abstract

In large databases, there may exist critical nuggets: small collections of records or instances that contain important domain-specific information. This information can be used for future decision making, such as labeling critical, unlabeled data records and improving classification results by reducing false positive and false negative errors. In recent years, data mining efforts have focused on pattern and outlier detection methods; however, little effort has been dedicated to finding critical nuggets within a data set. This work introduces the idea of critical nuggets, proposes an innovative domain-independent method to measure criticality, suggests a heuristic to reduce the search space for finding critical nuggets, and isolates and validates critical nuggets in several real-world data sets. Only a few subsets appear to qualify as critical nuggets, underlining the importance of finding them, and the proposed methodology can detect them. This work also identifies certain properties of critical nuggets and provides experimental validation of these properties. Critical nuggets were then applied to two important performance metrics for classification tasks: classification accuracy and misclassification cost. Experimental results show that critical nuggets can help improve classification accuracy on real-world data sets compared with standalone classification algorithms, and these improvements in accuracy were statistically significant. Extensive studies were also undertaken on real-world data sets that used critical nuggets to help minimize misclassification costs. Here too, the critical-nugget-based approach yielded lower misclassification costs than standalone classification methods, and the reductions were statistically significant.
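
The abstract evaluates critical nuggets against two metrics, classification accuracy and misclassification cost, without spelling out how they are computed. The sketch below shows the standard definitions only, assuming a cost matrix where cost[i][j] is the penalty for predicting class j when the true class is i; it is not the dissertation's criticality measure or evaluation code, and the function names and example cost values are hypothetical.

```python
from typing import Hashable, List, Sequence


def confusion_matrix(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                     labels: Sequence[Hashable]) -> List[List[int]]:
    """Count (true class, predicted class) pairs; rows are true, columns are predicted."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of records whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def misclassification_cost(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                           labels: Sequence[Hashable],
                           cost: Sequence[Sequence[float]]) -> float:
    """Total cost: each confusion-matrix cell count times its (assumed) per-cell cost."""
    matrix = confusion_matrix(y_true, y_pred, labels)
    n = len(labels)
    return sum(matrix[i][j] * cost[i][j] for i in range(n) for j in range(n))


if __name__ == "__main__":
    labels = [0, 1]
    y_true = [0, 0, 1, 1, 1]
    y_pred = [0, 1, 1, 0, 0]
    # Hypothetical costs: a false positive costs 1, a false negative costs 5.
    cost = [[0, 1], [5, 0]]
    print(accuracy(y_true, y_pred))                             # 0.4
    print(misclassification_cost(y_true, y_pred, labels, cost)) # 11
```

With asymmetric costs like these, a classifier can lower total cost without raising accuracy by trading false negatives for false positives, which is why the two metrics are reported separately.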

This dissertation was published by Louisiana State University.