Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews

Turney, Peter

oai:cisti-icist.nrc-cnrc.ca:cistinparc:8914166

Authors: Peter Turney
Publication date: 2002

Abstract

This paper presents a simple unsupervised learning algorithm for classifying reviews as recommended (thumbs up) or not recommended (thumbs down). The classification of a review is predicted by the average semantic orientation of the phrases in the review that contain adjectives or adverbs. A phrase has a positive semantic orientation when it has good associations (e.g., "subtle nuances") and a negative semantic orientation when it has bad associations (e.g., "very cavalier"). In this paper, the semantic orientation of a phrase is calculated as the mutual information between the given phrase and the word "excellent" minus the mutual information between the given phrase and the word "poor". A review is classified as recommended if the average semantic orientation of its phrases is positive. The algorithm achieves an average accuracy of 74% when evaluated on 410 reviews from Epinions, sampled from four different domains (reviews of automobiles, banks, movies, and travel destinations). The accuracy ranges from 84% for automobile reviews to 66% for movie reviews.

NRC publication: Ye
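The scoring rule described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the use of raw co-occurrence counts as inputs, and the `smoothing` constant that guards against a zero count are all assumptions made for illustration. The abstract does not specify how the counts are obtained.

```python
import math


def pmi(joint, count_a, count_b, total):
    """Pointwise mutual information, log2(p(a, b) / (p(a) * p(b))), from raw counts."""
    return math.log2((joint / total) / ((count_a / total) * (count_b / total)))


def semantic_orientation(near_excellent, near_poor, phrase_count,
                         excellent_count, poor_count, total, smoothing=0.01):
    """PMI(phrase, "excellent") minus PMI(phrase, "poor").

    near_excellent / near_poor: co-occurrence counts of the phrase with each word.
    smoothing: hypothetical constant added to the joint counts so that a zero
    co-occurrence count does not produce log2(0).
    """
    positive = pmi(near_excellent + smoothing, phrase_count, excellent_count, total)
    negative = pmi(near_poor + smoothing, phrase_count, poor_count, total)
    return positive - negative


def classify_review(orientations):
    """Label a review from the semantic orientations of its phrases.

    Expects at least one phrase score; the review is recommended when the
    average orientation is positive.
    """
    average = sum(orientations) / len(orientations)
    return "recommended" if average > 0 else "not recommended"
```

Because the phrase count and the corpus size appear in both PMI terms, they cancel in the difference, so the score depends only on the ratio of the two co-occurrence counts and the ratio of the two reference-word counts.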

article

Last time updated on 08/06/2016

This paper was published in NRC Publications Archive.
