Published on Tue May 09 2017

TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension

Mandar Joshi, Eunsol Choi, Daniel S. Weld, Luke Zettlemoyer

Abstract

We present TriviaQA, a challenging reading comprehension dataset containing over 650K question-answer-evidence triples. TriviaQA includes 95K question-answer pairs authored by trivia enthusiasts and independently gathered evidence documents, six per question on average, that provide high quality distant supervision for answering the questions. We show that, in comparison to other recently introduced large-scale datasets, TriviaQA (1) has relatively complex, compositional questions, (2) has considerable syntactic and lexical variability between questions and corresponding answer-evidence sentences, and (3) requires more cross-sentence reasoning to find answers. We also present two baseline algorithms: a feature-based classifier and a state-of-the-art neural network that performs well on SQuAD reading comprehension. Neither approach comes close to human performance (23% and 40% vs. 80%), suggesting that TriviaQA is a challenging testbed that is worth significant future study. Data and code available at http://nlp.cs.washington.edu/triviaqa/
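The distant supervision described in the abstract pairs each question-answer pair with evidence documents that are assumed relevant because they contain the answer string, without human span annotation. A minimal sketch of that labeling idea (the function name and data shapes are illustrative, not from the paper's released code):

```python
import re


def distant_supervision_triples(qa_pairs, documents):
    """Build question-answer-evidence triples by treating any document
    that contains the answer string as (noisy) positive evidence.

    qa_pairs:   list of (question, answer) string tuples
    documents:  list of evidence document strings
    Returns a list of (question, answer, document) triples.
    """
    triples = []
    for question, answer in qa_pairs:
        # Case-insensitive literal match of the answer string.
        pattern = re.compile(re.escape(answer), re.IGNORECASE)
        for doc in documents:
            if pattern.search(doc):
                triples.append((question, answer, doc))
    return triples
```

Such matching is noisy by construction: a document can mention the answer string without actually supporting the question, which is part of what makes distantly supervised datasets like TriviaQA harder than span-annotated ones.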

Tue Nov 14 2017
NLP
DuReader: a Chinese Machine Reading Comprehension Dataset from Real-world Applications
This paper introduces DuReader, a new large-scale, open-domain Chinese machine reading comprehension (MRC) dataset. DuReader has three advantages over previous MRC datasets: (1) data sources, (2) question types, and (3) scale.
Thu Jun 16 2016
NLP
SQuAD: 100,000+ Questions for Machine Comprehension of Text
The Stanford Question Answering Dataset (SQuAD) consists of 100,000+ questions posed by crowdworkers on Wikipedia. The answer to each question is a segment of text from the corresponding reading passage. We analyze the dataset to understand the types of reasoning required to answer the questions.
Tue Aug 14 2018
Artificial Intelligence
How Much Reading Does Reading Comprehension Require? A Critical Investigation of Popular Benchmarks
Many recent papers address reading comprehension. Presumably, a model must combine information from both questions and passages to predict corresponding answers. We establish sensible baselines for the bAbI, SQuAD, CBT, CNN, and Who-did-What datasets.
Tue Nov 29 2016
Artificial Intelligence
NewsQA: A Machine Comprehension Dataset
Crowdworkers supply questions and answers based on a set of over 10,000 news articles from CNN. The performance gap between humans and machines (0.198 in F1) demonstrates that significant progress can be made on NewsQA through future research.
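The human-machine gap quoted for NewsQA (0.198 in F1) uses the token-level F1 metric standard in extractive reading comprehension. A minimal sketch of how that metric is typically computed for a single predicted span (this simplified version skips the punctuation and article normalization that official evaluation scripts apply):

```python
from collections import Counter


def token_f1(prediction, reference):
    """Token-level F1 between a predicted and a reference answer span:
    the harmonic mean of token precision and recall."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both spans.
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, `token_f1("the eiffel tower", "eiffel tower")` scores 0.8: precision is 2/3, recall is 1, so F1 is 2·(2/3)/(5/3).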
Sat Sep 28 2019
NLP
Integrated Triaging for Fast Reading Comprehension
Integrated Triaging is a framework that prunes almost all context in early layers of a network. This pruning increases the efficiency of MRC models and prevents the later layers from overfitting to prevalent short paragraphs in the training set.
Sun Dec 29 2019
NLP
ORB: An Open Reading Benchmark for Comprehensive Evaluation of Machine Reading Comprehension
Reading comprehension is one of the crucial tasks for furthering research in natural language understanding. Many diverse reading comprehension datasets have recently been introduced to study various phenomena in natural language. The evaluation server places no restrictions on how models are trained, so it is a suitable test bed.