Active learning in annotating micro-blogs dealing with e-reputation

Abstract

Elections unleash strong political views on Twitter, but what do people really think about politics? Opinion and trend mining on micro-blogs dealing with politics has recently attracted researchers in several fields, including Information Retrieval and Machine Learning (ML). Since the performance of ML and Natural Language Processing (NLP) approaches is limited by the amount and quality of data available, one promising alternative for some tasks is the automatic propagation of expert annotations. This paper develops a so-called active learning process for automatically annotating French-language tweets that deal with the image (i.e., representation, web reputation) of politicians. Our main focus is on the methodology followed to build an original annotated dataset expressing opinions about two French politicians over time. We therefore review state-of-the-art NLP-based ML algorithms to automatically annotate tweets, using a manual initiation step as bootstrap. This paper focuses on key issues in active learning when building a large annotated dataset from noisy data. The noise is introduced by human annotators, the abundance of data, and the label distribution across data and entities. In turn, we show that Twitter characteristics such as the author's name or hashtags can serve as a reference point not only for improving automatic systems for Opinion Mining (OM) and Topic Classification but also for reducing noise in human annotations. However, a later thorough analysis shows that reducing noise might induce the loss of crucial information.

Comment: Journal of Interdisciplinary Methodologies and Issues in Science - Vol 3 - Contextualisation digitale - 201

Last time updated on 02/12/2023

This paper was published in Episciences.org.
