Word-Region Alignment-Guided Multimodal Neural Machine Translation

Zhao, Yuting; Komachi, Mamoru; Kajiwara, Tomoyuki; Chu, Chenhui

oai:repository.kulib.kyoto-u.ac.jp:2433/267448

Publication date: 1 January 2022
Publisher: IEEE
Abstract

We propose word-region alignment-guided multimodal neural machine translation (MNMT), a novel MNMT model that links the textual and visual modalities semantically through word-region alignment (WRA). Existing studies on MNMT have mainly focused on the effect of integrating the visual and textual modalities, but they do not exploit the semantic relevance between the two. We strengthen the semantic correlation between the textual and visual modalities in MNMT by incorporating WRA as a bridge. The proposal is implemented on two mainstream neural machine translation (NMT) architectures: the recurrent neural network (RNN) and the transformer. Experiments on two public benchmarks, English-German and English-French translation tasks using the Multi30k dataset and English-Japanese translation tasks using the Flickr30kEnt-JP dataset, show that our model significantly improves over competitive baselines across different evaluation metrics and outperforms most existing MNMT models. For example, BLEU improves by 1.0 points on the English-German task and by 1.1 points on the English-French task on the Multi30k test2016 set, and by 0.7 points on the English-Japanese task on the Flickr30kEnt-JP test set. Further analysis demonstrates that integrating WRA yields better translation performance by making better use of visual information.
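
The abstract does not say how the word-region alignment is computed. As a purely illustrative sketch, and not the authors' method, one common way to relate words to image regions is to score every word vector against every region vector with cosine similarity, then either normalize the scores per word (a soft alignment) or keep only the best-matching region (a hard alignment). The function names, the toy two-dimensional vectors, and the use of softmax below are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_alignment(word_vecs, region_vecs):
    """Hypothetical soft alignment: one probability distribution over regions per word."""
    return [softmax([cosine(w, r) for r in region_vecs]) for w in word_vecs]


def hard_alignment(word_vecs, region_vecs):
    """Hypothetical hard alignment: index of the most similar region for each word."""
    aligned = []
    for w in word_vecs:
        sims = [cosine(w, r) for r in region_vecs]
        aligned.append(max(range(len(sims)), key=sims.__getitem__))
    return aligned


# Toy data (assumed): two word vectors and three region vectors in a shared 2-D space.
words = [[1.0, 0.0], [0.0, 1.0]]
regions = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]

soft = soft_alignment(words, regions)
hard = hard_alignment(words, regions)
```

Here `soft` holds one row per word with weights over the three regions, and `hard` holds one region index per word. In a real system the vectors would come from learned text and image encoders projected into a shared space, which this sketch does not model.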

Last time updated on 13/01/2022

This paper was published in Kyoto University Research Information Repository.
