Comparison of Different Orthographies for Machine Translation of Under-Resourced Dravidian Languages

Bharathi Raja Chakravarthi; Mihael Arcan; John P. McCrae

Publication date: 20 May 2019

Abstract

Under-resourced languages are a significant challenge for statistical approaches to machine translation, and it has recently been shown that using training data from closely related languages can improve the machine translation quality of these languages. While languages within the same language family share many properties, many under-resourced languages are written in their own native script, which makes it difficult to take advantage of these similarities. In this paper, we propose to alleviate the problem of different scripts by transcribing the native script into a common representation, i.e., the Latin script or the International Phonetic Alphabet (IPA). In particular, we compare coarse-grained transliteration into the Latin script with fine-grained IPA transliteration. We performed experiments on the English-Tamil, English-Telugu, and English-Kannada translation tasks. Our results show that transliteration improves BLEU, METEOR, and chrF scores, and that transliteration into the Latin script outperforms the fine-grained IPA transcription.
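To make the idea of transcribing a native script into a common Latin representation concrete, here is a minimal Python sketch of coarse-grained Tamil-to-Latin transliteration. The character tables, the `transliterate` function, and the romanization choices are hypothetical illustrations assuming a simple letter-by-letter scheme; they are not the tool or scheme used in the paper. The central rule for an abugida such as Tamil script is that a consonant carries an inherent "a" unless it is followed by a dependent vowel sign or by the virama (pulli), which suppresses the vowel.

```python
import unicodedata

# Hypothetical, deliberately small Tamil-to-Latin table (NOT the paper's scheme).
# It is coarse-grained: e.g. the three Tamil n-letters all map to "n".
CONSONANTS = {
    "க": "k", "ங": "ng", "ச": "c", "ஞ": "ny", "ட": "t", "ண": "n",
    "த": "t", "ந": "n", "ப": "p", "ம": "m", "ய": "y", "ர": "r",
    "ல": "l", "வ": "v", "ழ": "zh", "ள": "l", "ற": "r", "ன": "n",
}
VOWELS = {  # independent vowel letters
    "அ": "a", "ஆ": "aa", "இ": "i", "ஈ": "ii", "உ": "u", "ஊ": "uu",
    "எ": "e", "ஏ": "ee", "ஐ": "ai", "ஒ": "o", "ஓ": "oo", "ஔ": "au",
}
VOWEL_SIGNS = {  # dependent vowel signs that attach to a preceding consonant
    "\u0BBE": "aa", "\u0BBF": "i", "\u0BC0": "ii", "\u0BC1": "u",
    "\u0BC2": "uu", "\u0BC6": "e", "\u0BC7": "ee", "\u0BC8": "ai",
    "\u0BCA": "o", "\u0BCB": "oo", "\u0BCC": "au",
}
VIRAMA = "\u0BCD"  # pulli: marks a bare consonant with no inherent vowel


def transliterate(text: str) -> str:
    """Romanize Tamil text; characters outside the tables pass through unchanged."""
    # NFC composes two-part vowel signs (e.g. U+0BC6 + U+0BBE -> U+0BCA).
    text = unicodedata.normalize("NFC", text)
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch in CONSONANTS:
            if nxt == VIRAMA:  # bare consonant
                out.append(CONSONANTS[ch])
                i += 2
                continue
            if nxt in VOWEL_SIGNS:  # consonant + explicit vowel sign
                out.append(CONSONANTS[ch] + VOWEL_SIGNS[nxt])
                i += 2
                continue
            out.append(CONSONANTS[ch] + "a")  # consonant + inherent vowel
        else:
            out.append(VOWELS.get(ch, ch))  # independent vowel or passthrough
        i += 1
    return "".join(out)


print(transliterate("தமிழ்"))    # tamizh
print(transliterate("வணக்கம்"))  # vanakkam
```

Because this table maps the three distinct Tamil n-letters (ந, ன, ண) to the same "n", it illustrates the kind of distinction that coarse-grained romanization discards and that a fine-grained IPA transcription would keep.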

Full text: NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY (oai:zenodo.org:3266918)

Last updated on 02/12/2022

This paper was published in NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY.
