Published on Mon Apr 08 2019

Jointly Measuring Diversity and Quality in Text Generation Models

Ehsan Montahaei, Danial Alihosseini, Mahdieh Soleymani Baghshah

Abstract

Text generation is an important Natural Language Processing task with various applications. Although several metrics have already been introduced to evaluate text generation methods, each of them has its own shortcomings. The most widely used metrics, such as BLEU, only consider the quality of generated sentences and neglect their diversity; for example, repeatedly generating a single high-quality sentence would yield a high BLEU score. On the other hand, Self-BLEU, a more recent metric introduced to evaluate the diversity of generated texts, ignores their quality. In this paper, we propose metrics that evaluate both quality and diversity simultaneously by approximating the distance between the learned generative model and the real data distribution. To this end, we first introduce a metric that approximates this distance using n-gram based measures. We then introduce a feature-based measure built on BERT, a recent deep model trained on a large text corpus. Finally, for the oracle training mode, in which the generator's density can also be computed, we propose using distance measures between the corresponding explicit distributions. Eventually, the most popular and recent text generation models are evaluated using both the existing and the proposed metrics, and the advantages of the proposed metrics are demonstrated.
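The BLEU/Self-BLEU trade-off the abstract describes can be illustrated with a toy computation. The sketch below is not the authors' implementation: it uses clipped n-gram precision with a geometric mean, omitting the brevity penalty and smoothing of full BLEU, and all function names are illustrative.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, references, n):
    """Clipped n-gram precision: candidate counts are capped by the
    maximum count of each n-gram across the references."""
    cand = Counter(ngrams(candidate, n))
    if not cand:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
    return clipped / sum(cand.values())

def simple_bleu(candidate, references, max_n=2):
    """Geometric mean of 1..max_n clipped precisions (no brevity penalty)."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    return math.exp(sum(math.log(p) for p in precisions) / max_n)

def self_bleu(samples, max_n=2):
    """Average BLEU of each generated sample against all the others.
    High Self-BLEU means the samples resemble each other, i.e. low diversity."""
    scores = [simple_bleu(sample, samples[:i] + samples[i + 1:], max_n)
              for i, sample in enumerate(samples)]
    return sum(scores) / len(scores)

# A degenerate generator that repeats one fluent sentence scores a
# Self-BLEU of 1.0 (no diversity), even though each sentence alone
# could earn a high BLEU against human references.
repeated = [["the", "cat", "sat", "on", "the", "mat"]] * 3
print(self_bleu(repeated))  # 1.0
```

This makes the abstract's point concrete: BLEU alone rewards the degenerate generator, while Self-BLEU alone flags it but says nothing about quality, which motivates a single measure that accounts for both.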

Fri Jul 03 2020
Machine Learning
On the Relation between Quality-Diversity Evaluation and Distribution-Fitting Goal in Text Generation
The goal of text generation models is to fit the underlying real probability distribution of text. For performance evaluation, quality and diversity metrics are usually applied. We propose CR/NRR as a substitute for the BLEU/Self-BLEU metric pair.
Tue Apr 27 2021
Machine Learning
Text Generation with Deep Variational GAN
Generating realistic sequences is a central task in many machine learning applications. The issue of mode-collapsing remains a main issue for the current models. In this paper we propose a GAN-based generic framework to address the problem.
Mon Apr 06 2020
NLP
Sparse Text Generation
State-of-the-art text generators build on powerful language models such as GPT-2. They require sampling from a modified softmax, via temperature parameters or ad-hoc truncation techniques, which creates a mismatch between training and testing conditions.
Fri Jul 31 2020
Artificial Intelligence
Neural Language Generation: Formulation, Methods, and Evaluation
Recent advances in neural network-based generative modeling have reignited the hopes in having computer systems capable of seamlessly conversing with humans. While the field of natural language generation is evolving rapidly, there are still many open challenges to address.
Thu Apr 09 2020
NLP
BLEURT: Learning Robust Metrics for Text Generation
BLEURT is a metric based on BERT that can model human judgments with a few thousand possibly biased training examples. It provides state-of-the-art results on the WMT Metrics shared task and the WebNLG Competition dataset.
Fri Jun 26 2020
NLP
Evaluation of Text Generation: A Survey
The paper surveys evaluation methods of natural language generation (NLG) that have been developed in the last few years. We group NLG evaluation methods into three categories: human-centric, automatic and machine-learned.