Empirical Comparative Analysis of 1-of-K Coding and K-Prototypes in Categorical Clustering

Wang, Fei; Franco, Hector; Pugh, John; Ross, Robert J.

oai:arrow.tudublin.ie:scschcomcon-1203

Authors: Fei Wang, Hector Franco, John Pugh, Robert J. Ross
Publication date: 20 September 2016
Publisher: Technological University Dublin
Abstract

Clustering is a fundamental machine learning task that partitions data into homogeneous groups. K-means and its variants are the most widely used class of clustering algorithms today. However, the original k-means algorithm can only be applied to numeric data. Categorical data must first be converted into numeric form through 1-of-K coding, which itself causes many problems. K-prototypes, another clustering algorithm derived from k-means, handles categorical data by adopting a different notion of distance. In this paper, we systematically compare these two methods through an experimental analysis. Our analysis shows that K-prototypes is better suited to large-scale datasets, while the performance of k-means with 1-of-K coding is more stable. We believe these are useful heuristics for clustering methods working with highly categorical data.
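
The two notions of distance contrasted in the abstract can be illustrated with a minimal Python sketch. It is not the authors' experimental code. The attribute values, the helper names (`one_hot`, `squared_euclidean`, `matching_dissimilarity`), and the use of simple matching dissimilarity (the count of mismatched attributes, as commonly used for the categorical part of K-prototypes) are assumptions made for illustration only.

```python
from typing import List, Sequence


def one_hot(record: Sequence[str], categories: Sequence[Sequence[str]]) -> List[float]:
    """1-of-K coding: one 0/1 indicator per (attribute, category value) pair."""
    vec: List[float] = []
    for value, cats in zip(record, categories):
        vec.extend(1.0 if value == c else 0.0 for c in cats)
    return vec


def squared_euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Distance that k-means minimises on the encoded numeric vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def matching_dissimilarity(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of attributes on which two categorical records disagree."""
    return sum(1 for x, y in zip(a, b) if x != y)


# Hypothetical toy data: two categorical attributes (colour, size).
categories = [["red", "green", "blue"], ["small", "large"]]
r1 = ("red", "small")
r2 = ("blue", "small")

v1 = one_hot(r1, categories)
v2 = one_hot(r2, categories)

print(v1, v2)
print(squared_euclidean(v1, v2))       # distance on 1-of-K vectors
print(matching_dissimilarity(r1, r2))  # distance on the raw categories
```

For two individual records, the squared Euclidean distance on 1-of-K vectors is simply twice the number of mismatched attributes. The approaches diverge at the cluster centre: k-means averages the indicator columns into fractional values, whereas a matching-based method keeps a single category per attribute. Note also that 1-of-K coding widens each record from one column per attribute to one column per category value.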

Last time updated on 17/04/2020

This paper was published in Arrow@TUDublin.
