Published on Sun Oct 07 2018

Assessing Crosslingual Discourse Relations in Machine Translation

Karin Sim Smith, Lucia Specia

Abstract

In an attempt to improve overall translation quality, research has increasingly focused on integrating more linguistic information into Machine Translation (MT). While significant progress has been achieved, especially recently with neural models, automatically evaluating the output of such systems is still an open problem. Current practice in MT evaluation relies on a single reference translation, even though there are many ways of translating a particular text, and it tends to disregard higher-level information such as discourse. We propose a novel approach that assesses the translated output based on the source text rather than the reference translation, and measures the extent to which the semantics of the discourse elements (discourse relations, in particular) in the source text are preserved in the MT output. The challenge is to detect the discourse relations in the source text and determine whether these relations are correctly transferred crosslingually to the target language -- without a reference translation. This methodology could be used independently for discourse-level evaluation, or as a component in other metrics, at a time when substantial amounts of MT output are online and would benefit from evaluation in which the source text serves as the benchmark.
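As a rough illustration of the source-based idea, here is a minimal sketch in Python. Everything in it is hypothetical: the toy connective-to-sense lexicons and the scoring function are illustrations only, not the authors' method, which relies on proper discourse relation detection rather than string matching. The sketch finds explicit discourse connectives in the English source and in the French MT output, maps them to PDTB-style senses, and scores the fraction of source relation senses that survive in the output.

from collections import Counter

# Toy lexicons mapping explicit discourse connectives to PDTB-style senses
# (hypothetical entries, for illustration only).
EN_CONNECTIVES = {
    "but": "Comparison.Contrast",
    "however": "Comparison.Contrast",
    "because": "Contingency.Cause",
    "therefore": "Contingency.Cause.Result",
}
FR_CONNECTIVES = {
    "mais": "Comparison.Contrast",
    "cependant": "Comparison.Contrast",
    "parce que": "Contingency.Cause",
    "donc": "Contingency.Cause.Result",
}

def detect_senses(text, lexicon):
    """Count the senses of every explicit connective found in the text."""
    padded = f" {text.lower()} "
    return Counter(
        sense for connective, sense in lexicon.items()
        if f" {connective} " in padded
    )

def relation_preservation(source, mt_output):
    """Fraction of source discourse relations whose sense reappears in
    the MT output -- no reference translation involved."""
    src = detect_senses(source, EN_CONNECTIVES)
    tgt = detect_senses(mt_output, FR_CONNECTIVES)
    total = sum(src.values())
    if total == 0:
        return 1.0  # no explicit relations to preserve
    preserved = sum(min(n, tgt[sense]) for sense, n in src.items())
    return preserved / total

print(relation_preservation(
    "He stayed because it was raining.",
    "Il est resté parce que la pluie tombait.",
))  # -> 1.0: the causal relation is preserved

A real system would also have to handle implicit relations and connective ambiguity (for example, "since" as temporal versus causal), which is precisely what makes the crosslingual detection problem hard.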

Thu Nov 28 2019
Artificial Intelligence
DiscoTK: Using Discourse Structure for Machine Translation Evaluation
We present novel automatic metrics for machine translation evaluation. We use discourse structure and convolution kernels to compare the discourse tree of an automatic translation with that of the human reference. Experiments on the WMT12 and WMT13 shared task datasets show correlation with human judgments.
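For readers unfamiliar with tree kernels, the following is a minimal sketch (toy code, not the DiscoTK implementation) of an all-subtree convolution kernel in the style of Collins and Duffy (2002), which this family of metrics applies to discourse parse trees: the kernel counts the subtree fragments two trees share, so a translation whose discourse tree diverges from the reference's scores lower. The node labels are hypothetical RST-style relations.

from dataclasses import dataclass, field

@dataclass
class Node:
    label: str                                  # e.g. an RST relation label
    children: list = field(default_factory=list)

def production(n):
    """A node's label together with its ordered child labels."""
    return (n.label, tuple(c.label for c in n.children))

def delta(a, b):
    """Number of shared subtree fragments rooted at a and b."""
    if production(a) != production(b):
        return 0
    if not a.children:                          # identical leaves
        return 1
    result = 1
    for ca, cb in zip(a.children, b.children):
        result *= 1 + delta(ca, cb)
    return result

def nodes(t):
    """All nodes of a tree, collected iteratively."""
    out, stack = [], [t]
    while stack:
        n = stack.pop()
        out.append(n)
        stack.extend(n.children)
    return out

def tree_kernel(t1, t2):
    """Total count of subtree fragments the two trees share."""
    return sum(delta(a, b) for a in nodes(t1) for b in nodes(t2))

# Two toy discourse trees that differ only in the root relation label.
ref = Node("Elaboration", [Node("Nucleus"), Node("Satellite")])
hyp = Node("Contrast",    [Node("Nucleus"), Node("Satellite")])
print(tree_kernel(ref, ref), tree_kernel(ref, hyp))  # self-similarity vs. mismatch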
Thu Apr 30 2020
NLP
Can Your Context-Aware MT System Pass the DiP Benchmark Tests?: Evaluation Benchmarks for Discourse Phenomena in Machine Translation
Despite increasing instances of machine translation (MT) systems incorporating contextual information, the evidence for translation quality improvement is sparse. Popular metrics like BLEU are not expressive or sensitive enough to capture quality improvements or drops that are minor in size but significant in perception.
Wed Oct 04 2017
NLP
Discourse Structure in Machine Translation Evaluation
This article explores the potential of using sentence-level discourse structure for machine translation evaluation. We first design discourse-aware similarity measures, which use all-subtree kernels to compare discourse parse trees in accordance with the Rhetorical Structure Theory. Then, we show that a simple linear combination …
Wed Nov 01 2017
NLP
Evaluating Discourse Phenomena in Neural Machine Translation
For machine translation to tackle discourse phenomena, models must have access to extra-sentential linguistic context. We investigate the performance of recently proposed multi-encoder NMT models trained on subtitles for English to French.
Mon Mar 22 2021
Artificial Intelligence
BlonD: An Automatic Evaluation Metric for Document-level Machine Translation
Thu Aug 08 2019
NLP
A Test Suite and Manual Evaluation of Document-Level NMT at WMT19
A test suite for WMT19 aimed at assessing discourse phenomena in machine translation systems. We manually checked the outputs and identified the types of translation errors.