HuSpaCy : an industrial-strength Hungarian natural language processing toolkit

Orosz György; Szántó Zsolt; Berkecz Péter; Szabó Gergő; Farkas Richárd


oai:acta.bibl.u-szeged.hu:75865

Publication date: 1 January 2022

Abstract

Although a few open-source language processing pipelines are available for Hungarian, none of them satisfies the requirements of today’s NLP applications. A language processing pipeline should provide close to state-of-the-art lemmatization, morphosyntactic analysis, entity recognition and word embeddings. Industrial text processing applications must also satisfy non-functional software quality requirements, and frameworks that support multiple languages are increasingly favored. This paper introduces HuSpaCy, an industry-ready Hungarian language processing toolkit. The tool provides components for the most important basic linguistic analysis tasks. It is open-source and available under a permissive license. The system is built upon spaCy’s NLP components, which makes it easy to use, fast and accurate. Experiments confirm that HuSpaCy achieves high accuracy while keeping prediction resource-efficient.
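Because the toolkit is built on spaCy, its annotations can presumably be read through spaCy's standard `Doc` and `Token` attributes. The following is a minimal sketch under that assumption, not the authors' documented usage. The helper names `load_pipeline` and `summarize` are hypothetical, and the model name `hu_core_news_lg` is an assumption that does not appear in the abstract.

```python
from typing import Any, Dict, List, Tuple


def load_pipeline(model_name: str = "hu_core_news_lg") -> Any:
    """Load a spaCy pipeline by name.

    Assumption: spaCy and a Hungarian model with this name are installed.
    The default model name is illustrative, not taken from the abstract.
    """
    import spacy  # imported lazily so the rest of the sketch has no hard dependency

    return spacy.load(model_name)


def summarize(doc: Any) -> Dict[str, List[Tuple[str, ...]]]:
    """Collect lemma, part-of-speech, morphology and entity annotations.

    `doc` is expected to behave like a spaCy Doc: iterable over tokens that
    expose `text`, `lemma_`, `pos_` and `morph`, and carrying an `ents` sequence
    whose items expose `text` and `label_`.
    """
    tokens = [
        (tok.text, tok.lemma_, tok.pos_, str(tok.morph))
        for tok in doc
    ]
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    return {"tokens": tokens, "entities": entities}
```

With a model installed, a call such as `summarize(load_pipeline()("Budapest Magyarország fővárosa."))` would return the per-token analyses and any recognized entities. Word embeddings are not covered by this sketch.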

Last time updated on 16/06/2022

This paper was published in University of Szeged.
