Twitter sentiment for 15 European languages

Mozetič, Igor; Grčar, Miha; Smailović, Jasmina

oai:www.clarin.si:11356/1054

Authors: Igor Mozetič, Miha Grčar, Jasmina Smailović
Publication date: 23 February 2016
Publisher: Jožef Stefan Institute
Abstract

The dataset contains over 1.6 million tweets (provided as tweet IDs), labeled with sentiment by human annotators. There are 15 Twitter corpora, one for each of 15 European languages. The data can be used to train and evaluate Twitter sentiment classifiers, to compute annotator agreement, or to study differences in language usage on Twitter. The data analysis is described in the following papers:

I. Mozetič, M. Grčar, J. Smailović. Multilingual Twitter sentiment classification: The role of human annotators, PLoS ONE 11(5): e0155036, doi: 10.1371/journal.pone.0155036, 2016. (http://dx.doi.org/10.1371/journal.pone.0155036)

I. Mozetič, L. Torgo, V. Cerqueira, J. Smailović. How to evaluate sentiment classifiers for Twitter time-ordered data?, PLoS ONE 13(3): e0194317, doi: 10.1371/journal.pone.0194317, 2018. (https://dx.doi.org/10.1371/journal.pone.0194317)
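As an illustration of the annotator-agreement use mentioned in the abstract, the following is a minimal Python sketch, not the procedure used in the papers. It assumes a hypothetical input of (tweet ID, label) rows in which some tweet IDs occur twice because they were annotated twice; the actual file layout and label names of the corpora are not described on this page and may differ. Cohen's kappa is used here only as a familiar agreement measure, and treating the two annotations of a tweet as "first" and "second" rater is a simplification.

```python
from collections import Counter, defaultdict


def pairwise_labels(rows):
    """Return label pairs for tweets that were annotated exactly twice.

    `rows` is an iterable of (tweet_id, label) tuples (hypothetical layout).
    """
    by_tweet = defaultdict(list)
    for tweet_id, label in rows:
        by_tweet[tweet_id].append(label)
    return [tuple(labels) for labels in by_tweet.values() if len(labels) == 2]


def cohen_kappa(pairs):
    """Cohen's kappa for a list of (first_label, second_label) pairs."""
    n = len(pairs)
    if n == 0:
        raise ValueError("no doubly annotated tweets")
    # Observed agreement: share of pairs with identical labels.
    observed = sum(a == b for a, b in pairs) / n
    # Chance agreement from the marginal label distributions.
    first = Counter(a for a, _ in pairs)
    second = Counter(b for _, b in pairs)
    expected = sum(first[c] * second[c] for c in first) / (n * n)
    if expected == 1.0:
        return 1.0  # both slots always carry the same single label
    return (observed - expected) / (1 - expected)


# Toy rows with made-up IDs and labels, for illustration only.
rows = [
    ("1", "Positive"), ("1", "Positive"),
    ("2", "Negative"), ("2", "Neutral"),
    ("3", "Neutral"), ("3", "Neutral"),
    ("4", "Negative"), ("4", "Negative"),
    ("5", "Positive"),  # annotated once, so ignored
]
pairs = pairwise_labels(rows)
kappa = cohen_kappa(pairs)
```

Tweets annotated once are skipped, so the result reflects only the doubly annotated subset.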

Common Language Resources and Technology Infrastructure - Slovenia

Last updated on 07/05/2019

This paper was published in Common Language Resources and Technology Infrastructure - Slovenia.
