Published on Thu Apr 23 2020

Correct Me If You Can: Learning from Error Corrections and Markings

Julia Kreutzer, Nathaniel Berger, Stefan Riezler


Abstract

Sequence-to-sequence learning involves a trade-off between signal strength and annotation cost of training data. For example, machine translation data range from costly expert-generated translations that enable supervised learning, to weak quality-judgment feedback that facilitate reinforcement learning. We present the first user study on annotation cost and machine learnability for the less popular annotation mode of error markings. We show that error markings for translations of TED talks from English to German allow precise credit assignment while requiring significantly less human effort than correcting/post-editing, and that error-marked data can be used successfully to fine-tune neural machine translation models.

Sun May 27 2018
Machine Learning
Reliability and Learnability of Human Bandit Feedback for Sequence-to-Sequence Reinforcement Learning
We present a study on reinforcement learning (RL) from human bandit feedback for sequence-to-sequence learning. Best reliability is obtained for standardized cardinal feedback, and cardinal feedback is also easiest to learn and generalize from. This shows that RL is possible even from fairly reliable human feedback.
Sat Feb 10 2018
NLP
Online Learning for Effort Reduction in Interactive Neural Machine Translation
Neural machine translation systems require large amounts of training data and resources. Even then, the quality of the translations may be insufficient for some users or domains. In such cases, the output of the system must be revised by a human agent.
Sat Jun 10 2017
Machine Learning
Online Learning for Neural Machine Translation Post-editing
Neural machine translation has revolutionized the field. Post-editing the outputs of the system is mandatory for tasks requiring high quality. We review classical learning methods and propose a new optimization algorithm.
Fri Oct 09 2020
NLP
MLQE-PE: A Multilingual Quality Estimation and Post-Editing Dataset
MLQE-PE is a new dataset for Machine Translation (MT) Quality Estimation (QE) and Automatic Post-Editing (APE). The dataset covers seven language pairs, with human labels for 9,000 translations per language pair.
Fri Sep 15 2017
NLP
Transcribing Against Time
We investigate the problem of manually correcting errors from an automatic speech transcript in a cost-sensitive fashion. This is done by specifying a fixed time budget, and then automatically choosing the location and size of segments for correction such that the number of corrected errors is maximized. We propose a dynamic updating framework.
Mon Nov 13 2017
NLP
QuickEdit: Editing Text & Translations by Crossing Words Out
We propose a framework for computer-assisted text editing. It applies to translation post-editing and to paraphrasing. Our proposal relies on very simple interactions: a human editor modifies a sentence by marking the tokens they would like the system to change, and our model then generates a new sentence.