Using Resources from a Closely-related Language to Develop ASR for a Very Under-resourced Language: A Case Study for Iban

Samson, Sarah; Besacier, Laurent; Lecouteux, Benjamin; Dyab, Mohamed

oai:HAL:hal-02015501v1

Authors: Sarah Samson, Laurent Besacier, Benjamin Lecouteux, Mohamed Dyab
Publication date: 6 September 2015
Publisher: HAL CCSD

Abstract

This paper presents our strategies for developing an automatic speech recognition (ASR) system for Iban, an under-resourced language. We faced several challenges, including the absence of a pronunciation dictionary and a lack of training material for building acoustic models. To overcome these problems, we proposed approaches that exploit resources from a closely related language (Malay). We developed a semi-supervised method for building the pronunciation dictionary and applied cross-lingual strategies to improve acoustic models trained with very limited data. Both approaches yielded very encouraging results, which show that data from a closely related language, if available, can be exploited to build ASR for a new language. In the final part of the paper, we present a zero-shot ASR system using Malay resources that can serve as an alternative method for transcribing Iban speech.


Last updated on 19 March 2019

This paper was published in Hal - Université Grenoble Alpes.
