What’s in a domain? Towards fine-grained adaptation for machine translation

Abstract

Machine translation (MT) uses software to translate texts in one language to another language. Modern-day MT systems are built using large amounts of example translations between these two languages, so-called parallel corpora. For many translation tasks, or domains, there are no sizable high-quality parallel corpora, and the resulting mismatch between the training data and the translation task can cause large drops in translation quality. In recent years, this problem has been addressed by adapting an MT system to the domain of interest to improve translation quality.

Unfortunately, the concept domain is poorly defined. Typically, domain is a hard-labeled concept that is directly used to optimize MT systems. To shed light on domains and their impact on MT, the core question in this thesis is: "What's in a domain?"

Guided by this question, we distinguish various aspects that together make up a domain, i.e., topic, genre, register, dialogue acts, speakers, and speaker gender. We study to what extent MT output differs among these aspects, and how we can use them to perform fine-grained adaptation for MT. We are particularly interested in informal and conversational genres, which lack standardization and are notorious for poor MT output. In addition, we aim to develop methods that do not, or at most partially, rely on manual domain information.

By studying what's in a domain and showing how we can use different aspects of language to improve MT, we take a step forward towards fine-grained adaptation for machine translation.

International Migration, Integration and Social Cohesion online publications

Last time updated on 08/03/2023
