An approach to validity indices for clustering techniques in Big Data

Luna Romera, José María; García Gutiérrez, Jorge; Martínez Ballesteros, María del Mar; Riquelme Santos, José Cristóbal

oai:idus.us.es:11441/132065

Authors: José María Luna Romera
Jorge García Gutiérrez
María del Mar Martínez Ballesteros
José Cristóbal Riquelme Santos
Publication date: 1 January 2018
Publisher: Springer

Abstract

Clustering analysis is one of the most widely used Machine Learning techniques for discovering groups among data objects. Some clustering methods require the number of clusters into which the data will be partitioned to be specified in advance. Several cluster validity indices exist that help to approximate the optimal number of clusters for a dataset. However, such indices are not suitable for Big Data because of their size limitations and runtime costs. This paper presents two clustering validity indices that handle large amounts of data in low computational time. Our indices are based on redefinitions of traditional indices that simplify the intra-cluster distance calculation. Two types of tests have been carried out over 28 synthetic datasets to analyze the performance of the proposed indices. First, we test the indices on small and medium-sized datasets to verify that their effectiveness is similar to that of the traditional ones. Subsequently, tests on datasets of up to 11 million records and 20 features have been executed to check their efficiency. The results show that, using the Apache Spark framework, both indices can handle Big Data in very low computational time with an effectiveness similar to that of the traditional indices.

Funding: Ministerio de Economía y Competitividad TIN2014-55894-C2-1-
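The abstract does not define the two proposed indices, so the following is not the paper's method. It is only a minimal sketch of the general idea the abstract mentions: measuring intra-cluster spread through point-to-centroid distances (linear in the number of points) instead of all pairwise distances (quadratic), and using the resulting score to compare candidate numbers of clusters. The function names (`farthest_first`, `kmeans`, `centroid_index`), the Dunn-like ratio, the toy data and the plain-Python k-means are all assumptions made for illustration; nothing here uses Apache Spark.

```python
import math
import random


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def farthest_first(points, k):
    """Deterministic initialisation: start at points[0], then repeatedly
    add the point that is farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    return centers


def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means; returns the final list of clusters."""
    centers = farthest_first(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        new_centers = [
            centroid(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if new_centers == centers:  # assignments are stable
            break
        centers = new_centers
    return clusters


def centroid_index(clusters):
    """Hypothetical Dunn-like score (higher is better):
    smallest distance between two centroids divided by the largest
    mean point-to-centroid distance. The intra-cluster term costs O(n)."""
    clusters = [c for c in clusters if c]
    if len(clusters) < 2:
        raise ValueError("need at least two non-empty clusters")
    cents = [centroid(c) for c in clusters]
    intra = max(
        sum(math.dist(p, m) for p in c) / len(c)
        for c, m in zip(clusters, cents)
    )
    inter = min(
        math.dist(cents[i], cents[j])
        for i in range(len(cents))
        for j in range(i + 1, len(cents))
    )
    return math.inf if intra == 0 else inter / intra


# Toy data (an assumption, not from the paper): three well-separated blobs.
rng = random.Random(42)
blob_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [
    (rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
    for cx, cy in blob_centers
    for _ in range(50)
]

# Score each candidate k and keep the one with the highest index value.
scores = {k: centroid_index(kmeans(points, k)) for k in range(2, 7)}
best_k = max(scores, key=scores.get)
print(best_k, scores)
```

The design choice to note is that every term of the score depends only on centroids and per-cluster sums, which is the kind of quantity that can be accumulated in a single pass over partitioned data. Whether the paper's indices are built this way cannot be confirmed from the abstract alone.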

idUS. Depósito de Investigación Universidad de Sevilla

Last updated on 19/05/2022

This paper was published in idUS. Depósito de Investigación Universidad de Sevilla.
