Empirical Risk Minimization for Probabilistic Grammars: Sample Complexity and Hardness of Learning

Cohen, S. B.; Smith, N. A.

oai:pure.ed.ac.uk:publications/3041a08d-e3b1-47ca-9bb0-ee5a10dff891

Authors: S. B. Cohen, N. A. Smith
Publication date: 1 January 2012

Abstract

Probabilistic grammars are generative statistical models that are useful for compositional and sequential structures. They are used ubiquitously in computational linguistics. We present a framework, reminiscent of structural risk minimization, for empirical risk minimization of probabilistic grammars using the log-loss. We derive sample complexity bounds in this framework that apply both to the supervised setting and the unsupervised setting. By making assumptions about the underlying distribution that are appropriate for natural language scenarios, we are able to derive distribution-dependent sample complexity bounds for probabilistic grammars. We also give simple algorithms for carrying out empirical risk minimization using this framework in both the supervised and unsupervised settings. In the unsupervised case, we show that the problem of minimizing empirical risk is NP-hard. We therefore suggest an approximate algorithm, similar to expectation-maximization, to minimize the empirical risk.
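As a rough illustration of what empirical risk minimization with the log-loss means in the supervised setting, the sketch below fits a toy probabilistic context-free grammar to fully observed derivations. It assumes the plain unconstrained case, where the log-loss minimizer is the relative-frequency estimate of each rule given its left-hand side. It does not reproduce the authors' framework or algorithms, which the abstract does not detail; the function names `erm_supervised` and `empirical_log_loss` and the toy treebank are hypothetical.

```python
import math
from collections import Counter, defaultdict


def erm_supervised(derivations):
    """Relative-frequency estimate of rule probabilities.

    Assumption: parameters are unconstrained multinomials per left-hand
    side, so normalized rule counts minimize the empirical log-loss.
    """
    # Count every rule occurrence across all observed derivations.
    counts = Counter(rule for d in derivations for rule in d)
    # Total count of rules sharing the same left-hand side.
    lhs_totals = defaultdict(int)
    for (lhs, _rhs), c in counts.items():
        lhs_totals[lhs] += c
    return {rule: c / lhs_totals[rule[0]] for rule, c in counts.items()}


def empirical_log_loss(theta, derivations):
    """Average negative log-probability of the observed derivations."""
    total = 0.0
    for d in derivations:
        total -= sum(math.log(theta[rule]) for rule in d)
    return total / len(derivations)


# Hypothetical toy treebank: each derivation is a list of (lhs, rhs) rules.
toy = [
    [("S", ("NP", "VP")), ("NP", ("she",)), ("VP", ("runs",))],
    [("S", ("NP", "VP")), ("NP", ("he",)), ("VP", ("runs",))],
    [("S", ("NP", "VP")), ("NP", ("she",)), ("VP", ("sleeps",))],
]

theta = erm_supervised(toy)
loss = empirical_log_loss(theta, toy)

# A competing parameter setting (uniform over NP and VP rules) for comparison.
uniform = {rule: (1.0 if rule[0] == "S" else 0.5) for rule in theta}
uniform_loss = empirical_log_loss(uniform, toy)
```

On this toy data the relative-frequency parameters achieve a lower empirical log-loss than the uniform alternative, which is the sense in which they minimize the empirical risk. The unsupervised setting, where derivations are hidden, is the case the abstract states is NP-hard and is not covered by this sketch.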

Last time updated on 08/02/2015

This paper was published in Edinburgh Research Explorer.
