A cosine based validation measure for Document Clustering

Abstract

Document Clustering is the specific application of cluster analysis methods to very large document databases. It aims to organize a large quantity of unlabelled documents into a smaller number of meaningful and coherent clusters whose members are similar in content. One of the main unsolved problems in the clustering literature is the lack of a reliable methodology for evaluating results, although a wide variety of validation measures has been proposed. These measures are often unsatisfactory even on numerical databases, and they perform worse still in Document Clustering. This paper proposes a new validation measure. After introducing the most common approaches to Document Clustering, we focus on Spherical K-means, due to its strict connection with the Vector Space Model typical of Information Retrieval. Since Spherical K-means adopts a cosine-based similarity measure, we propose a validation measure based on the same criterion. The effectiveness of the new measure is shown in a comparative study involving 13 different corpora (commonly used in the literature for comparing different proposals) and 15 validation measures.
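
As an illustration of the cosine criterion mentioned in the abstract, the following is a minimal sketch of one Spherical K-means iteration on toy term-frequency vectors. It is not the validation measure proposed in the paper, which the abstract does not define. The function names (`normalize`, `cosine`, `assign`, `update`, `mean_cohesion`), the toy data, and the choice of initial centroids are all assumptions made for this example. `mean_cohesion` is the usual Spherical K-means objective (the sum of document-to-centroid cosines) divided by the number of documents.

```python
import math

def normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def cosine(u, v):
    """For unit-length vectors, cosine similarity reduces to the dot product."""
    return sum(a * b for a, b in zip(u, v))

def assign(docs, centroids):
    """Assign each document to the centroid with the highest cosine similarity."""
    return [
        max(range(len(centroids)), key=lambda k: cosine(d, centroids[k]))
        for d in docs
    ]

def update(docs, labels, centroids):
    """Recompute each centroid as the normalized sum of its member documents."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [d for d, label in zip(docs, labels) if label == k]
        if not members:
            new_centroids.append(old)  # keep the old centroid of an empty cluster
            continue
        summed = [sum(column) for column in zip(*members)]
        new_centroids.append(normalize(summed))
    return new_centroids

def mean_cohesion(docs, labels, centroids):
    """Average cosine similarity between each document and its own centroid."""
    return sum(cosine(d, centroids[l]) for d, l in zip(docs, labels)) / len(docs)

# Hypothetical toy corpus: rows are documents, columns are term counts.
raw_docs = [[2, 1, 0, 0], [3, 1, 0, 0], [0, 0, 1, 2], [0, 0, 2, 3]]
docs = [normalize(d) for d in raw_docs]

# Assumed initialization: use the first and third documents as centroids.
centroids = [docs[0], docs[2]]
labels = assign(docs, centroids)
centroids = update(docs, labels, centroids)
cohesion = mean_cohesion(docs, labels, centroids)
```

Because every vector is scaled to unit length, the score depends only on the direction of the term vectors and not on document length, which is the property that ties Spherical K-means to the Vector Space Model.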

Archivio della ricerca - Università degli studi di Napoli Federico II

Last time updated on 08/02/2017
