End-to-End Simultaneous Speech Translation

Ma, Xutai

oai:jscholarship.library.jhu.edu:1774.2/67856

Authors: Xutai Ma
Publication date: 30 January 2023
Publisher: 'The Busan Gyeongnam Mathematical Society'

Abstract

Speech translation is the task of translating speech in one language into text or speech in another language, while simultaneous translation aims to reduce translation latency by starting to translate before the speaker finishes a sentence. The combination of the two, simultaneous speech translation, can be applied in low-latency scenarios such as live video caption translation and real-time interpretation. This thesis focuses on an end-to-end, or direct, approach to simultaneous speech translation. We first define the task of simultaneous speech translation, including its challenges and evaluation metrics. We then progressively introduce our contributions to tackling these challenges. First, we propose a novel simultaneous translation policy, monotonic multihead attention, for transformer models on text-to-text translation. Second, we investigate the issues and potential solutions when adapting text-to-text simultaneous policies to end-to-end speech-to-text translation models. Third, we introduce the augmented memory transformer encoder for simultaneous speech-to-text translation models to improve computational efficiency. Fourth, we explore direct simultaneous speech translation with a variational monotonic multihead attention policy, based on recent speech-to-unit models. Finally, we provide some directions for potential future research.

Johns Hopkins University

Last time updated on 06/12/2023

This paper was published in Johns Hopkins University.
