In morphologically complex languages, many high-level tasks in natural language
processing rely on accurate morphosyntactic analyses of the input. However, in
light of the risk of error propagation in present-day pipeline architectures for basic
linguistic pre-processing, the state of the art for morphosyntactic tagging is still
not satisfactory. The main obstacle here is data sparsity inherent to natural
language in general and highly inflected languages in particular.
In this work, we investigate whether semi-supervised systems may alleviate the
data sparsity problem. Our approach uses word clusters obtained from large
amounts of unlabelled text in an unsupervised manner in order to provide a
supervised probabilistic tagger with morphologically informed features. Our
evaluations on a number of datasets for the Polish language suggest that this simple
technique improves tagging accuracy, especially with regard to out-of-vocabulary
words. This may prove useful for increasing the cross-domain performance of taggers,
and to alleviate the dependency on large amounts of supervised training data,
which is especially important from the perspective of less-resourced languages.
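
To illustrate the general idea of feeding word clusters to a supervised tagger, the following is a minimal sketch and not the implementation evaluated in this work. The cluster map `CLUSTERS`, the cluster labels, the function `token_features`, and the choice of accompanying features (word form, three-character suffix) are all assumptions made for illustration; the clustering method and the tagger's actual feature set are not specified here.

```python
from typing import Dict, List

# Hypothetical word-to-cluster map. In a real setup it would be induced
# from large amounts of unlabelled text by an unsupervised clustering
# method; the entries and labels below are invented for illustration.
CLUSTERS: Dict[str, str] = {
    "kot": "c17",
    "kota": "c17",
    "pies": "c17",
    "biega": "c42",
}


def token_features(tokens: List[str], i: int,
                   clusters: Dict[str, str]) -> Dict[str, str]:
    """Build a feature dictionary for the token at position i."""
    word = tokens[i].lower()
    feats = {
        "word": word,
        "suffix3": word[-3:],
        # Words never seen in the labelled data can still receive an
        # informative feature if they occur in the unlabelled text.
        "cluster": clusters.get(word, "UNK"),
    }
    if i > 0:
        feats["prev_cluster"] = clusters.get(tokens[i - 1].lower(), "UNK")
    return feats


features = token_features(["Pies", "biega"], 1, CLUSTERS)
```

A feature dictionary of this kind could be passed to any probabilistic sequence tagger that accepts sparse symbolic features. Because inflected forms such as "kot" and "kota" share a cluster in this toy map, a form missing from the labelled training data can still be tied to forms the tagger has seen.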