Published on Wed Apr 29 2020

ToTTo: A Controlled Table-To-Text Generation Dataset

Ankur P. Parikh, Xuezhi Wang, Sebastian Gehrmann, Manaal Faruqui, Bhuwan Dhingra, Diyi Yang, Dipanjan Das

ToTTo is an open-domain English table-to-text dataset with over 120,000 training examples. It proposes a controlled generation task: given a set of highlighted table cells, produce a one-sentence description.

Abstract

We present ToTTo, an open-domain English table-to-text dataset with over 120,000 training examples that proposes a controlled generation task: given a Wikipedia table and a set of highlighted table cells, produce a one-sentence description. To obtain generated targets that are natural but also faithful to the source table, we introduce a dataset construction process where annotators directly revise existing candidate sentences from Wikipedia. We present systematic analyses of our dataset and annotation process as well as results achieved by several state-of-the-art baselines. While usually fluent, existing methods often hallucinate phrases that are not supported by the table, suggesting that this dataset can serve as a useful research benchmark for high-precision conditional text generation.
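The controlled generation setup can be made concrete with a toy instance. This is a minimal sketch, not data from the dataset: the field names (`table`, `highlighted_cells`, `sentence`) loosely follow the released ToTTo JSON files, simplified, and the table contents are invented for illustration.

```python
def linearize(example):
    """Flatten the highlighted cells into a simple source string,
    the kind of input a seq2seq baseline might consume."""
    cells = [example["table"][r][c]["value"]
             for r, c in example["highlighted_cells"]]
    return " | ".join(cells)

# Toy instance (invented for illustration; field names loosely follow
# the released ToTTo JSON format, simplified).
example = {
    "table": [
        [{"value": "Year"}, {"value": "Title"}, {"value": "Role"}],
        [{"value": "2008"}, {"value": "Iron Man"}, {"value": "Tony Stark"}],
    ],
    "highlighted_cells": [[1, 0], [1, 1]],  # (row, column) coordinates
    "sentence": "Iron Man was released in 2008.",
}

print(linearize(example))  # -> 2008 | Iron Man
```

The highlighted cells act as the control signal: the model should describe only the selected cells, and any content beyond them (or beyond the table) counts as a hallucination.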

Mon Jun 03 2019
NLP
Handling Divergent Reference Texts when Evaluating Table-to-Text Generation
We show that metrics which rely solely on the reference texts, such as BLEU and ROUGE, show poor correlation with human judgments. We propose a new metric, PARENT, which aligns n-grams from the reference and generated texts before computing their precision and recall.
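The n-gram precision/recall idea behind PARENT can be illustrated with a heavily simplified sketch. This toy version only computes clipped n-gram overlap between the generated and reference texts; the actual PARENT metric additionally aligns n-grams against the source table so that correct phrases absent from the reference are not penalized.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as a multiset."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_precision_recall(generated, reference, n=2):
    """Clipped n-gram precision and recall of `generated` against `reference`.

    Simplified sketch only: real PARENT also credits n-grams entailed by
    the source table, which this version omits.
    """
    gen, ref = ngrams(generated, n), ngrams(reference, n)
    overlap = sum((gen & ref).values())  # multiset intersection = clipping
    precision = overlap / max(sum(gen.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    return precision, recall


p, r = ngram_precision_recall("the cat sat on the mat".split(),
                              "the cat sat on a mat".split())
print(p, r)  # -> 0.6 0.6
```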
Tue Dec 29 2020
NLP
WikiTableT: A Large-Scale Data-to-Text Dataset for Generating Wikipedia Article Sections
We cast generating Wikipedia sections as a data-to-text generation task. We create a large-scale dataset, WikiTableT, that pairs Wikipedia sections with their corresponding tabular data and various metadata. Our analysis shows that the best approaches can generate fluent and high quality texts.
Tue May 29 2018
Artificial Intelligence
Table-to-Text: Describing Table Region with Natural Language
The model maps a row from a table to a continuous vector and then generates a natural language sentence by leveraging the semantics of a table. To deal with rare words appearing in a table, we develop a flexible copying mechanism. Extensive experiments demonstrate the accuracy of the model.
Sat Nov 09 2019
NLP
Table-to-Text Natural Language Generation with Unseen Schemas
Table-to-text natural language generation (NLG) tasks focus on generating text from schemas that are already seen in the training set. We propose a new task of NLG with unseen schemas, which specifically aims to test the generalization of NLG.
Mon May 31 2021
NLP
Sketch and Refine: Towards Faithful and Informative Table-to-Text Generation
Table-to-text generation refers to generating a descriptive text from a table. Traditional autoregressive methods, though able to generate text with high fluency, suffer from low coverage and poor faithfulness. To mitigate these problems, we propose a novel skeleton-based two-stage method.
Mon Jul 06 2020
NLP
DART: Open-Domain Structured Data Record to Text Generation
We present DART, an open-domain structured DAta Record to Text generation dataset with over 82k instances (DARTs). We present systematic evaluation on DART as well as new state-of-the-art results on WebNLG 2017.