A Twitter corpus and benchmark resources for German sentiment analysis

Cieliebak, Mark; Deriu, Jan Milan; Egger, Dominic; Uzdilli, Fatih

oai:digitalcollection.zhaw.ch:11475/1856

Authors: Mark Cieliebak, Jan Milan Deriu, Dominic Egger, Fatih Uzdilli
Publication date: 1 January 2017
Publisher: Association for Computational Linguistics
Abstract

In this paper we present SB10k, a new corpus for sentiment analysis with approximately 10,000 German tweets. We use this new corpus and two existing corpora to provide state-of-the-art benchmarks for sentiment analysis in German: we implemented a CNN (based on the winning system of SemEval-2016) and a feature-based SVM, and compare their performance on all three corpora. For the CNN, we also created German word embeddings trained on 300M tweets. These word embeddings were then optimized for sentiment analysis using distant-supervised learning. The new corpus, the German word embeddings (plain and optimized), and source code to re-run the benchmarks are publicly available.

Last time updated on 07/01/2018

This paper was published in ZHAW digitalcollection.
