Reporting Score Distributions Makes a Difference: Performance Study of LSTM-networks for Sequence Tagging

Reimers, Nils; Gurevych, Iryna

oai:tubiblio.ulb.tu-darmstadt.de:104568

Authors: Nils Reimers, Iryna Gurevych
Publication date: 1 September 2017

Abstract

In this paper we show that reporting a single performance score is insufficient to compare non-deterministic approaches. We demonstrate for common sequence tagging tasks that the seed value for the random number generator can result in statistically significant (p < 10^{-4}) differences for state-of-the-art systems. For two recent systems for NER, we observe an absolute difference of one percentage point in F1-score depending on the selected seed value, so that these systems are perceived either as state-of-the-art or as mediocre. Instead of publishing and reporting single performance scores, we propose to compare score distributions based on multiple executions. Based on the evaluation of 50,000 LSTM-networks for five sequence tagging tasks, we present network architectures that achieve superior performance and produce results with higher stability on unseen data.
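
The idea of comparing score distributions instead of single scores can be illustrated with a minimal sketch. It is not the authors' evaluation code, and it does not reproduce the significance test used in the paper. The function names `summarize` and `prob_superiority` are hypothetical, and the F1 values in the usage section are placeholders chosen for illustration, not results from the paper. The sketch assumes each system has been trained several times with different seeds and that one test score per run is available.

```python
import statistics
from itertools import product


def summarize(scores):
    """Descriptive statistics for one system's scores across seeds."""
    return {
        "n": len(scores),
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        # The sample standard deviation needs at least two runs.
        "stdev": statistics.stdev(scores) if len(scores) > 1 else 0.0,
        "min": min(scores),
        "max": max(scores),
    }


def prob_superiority(scores_a, scores_b):
    """Fraction of (a, b) run pairs in which system A beats system B.

    Ties count as half a win. A value near 0.5 suggests the two score
    distributions are hard to tell apart.
    """
    wins = 0.0
    for a, b in product(scores_a, scores_b):
        if a > b:
            wins += 1.0
        elif a == b:
            wins += 0.5
    return wins / (len(scores_a) * len(scores_b))


if __name__ == "__main__":
    # Placeholder F1 scores (one per seed); NOT taken from the paper.
    system_a = [90.1, 90.6, 89.8, 90.9, 90.3]
    system_b = [90.4, 90.0, 90.7, 89.9, 90.5]

    print("A:", summarize(system_a))
    print("B:", summarize(system_b))
    print("P(A > B):", prob_superiority(system_a, system_b))
```

With only a single run per system, a comparison reduces to one pair of numbers, and which system looks better can depend on the seed. Reporting the spread (minimum, maximum, standard deviation) alongside the mean makes that dependence visible. A formal significance test over the two samples would be a natural next step, but the choice of test is left open here.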

Last time updated on 05/04/2020

This paper was published in TUbiblio.
