[Abstract] Recent analyses suggest that encoders pretrained for language modeling capture certain morpho-syntactic structure. However, probing frameworks for word vectors still do not report results on standard setups such as constituent and dependency parsing. This paper addresses this problem and performs full parsing (on English) relying only on pretraining architectures, with no decoding. We first cast constituent and dependency parsing as sequence tagging. We then use a single feed-forward layer to directly map word vectors to labels that encode a linearized tree. This is used to: (i) see how far we can get on syntax modelling with just pretrained encoders, and (ii) shed some light on the syntax-sensitivity of different word vectors (by freezing the weights of the pretraining network during training). For evaluation, we use bracketing F1-score and LAS, and analyze in-depth differences across representations for span lengths and dependency displacements. The overall results surpass existing sequence tagging parsers on the PTB (93.5%) and end-to-end EN-EWT UD (78.8%).

[Acknowledgments] We thank Mark Anderson and Daniel Hershcovich for their comments. DV, MS and CGR are funded by the ERC under the European Union's Horizon 2020 research and innovation programme (FASTPARSE, grant No 714150), by the ANSWER-ASAP project (TIN2017-85160-C2-1-R) from MINECO, and by Xunta de Galicia (ED431B 2017/01). AS is funded by a Google Focused Research Award.