Published on Mon May 01 2017

The Promise of Premise: Harnessing Question Premises in Visual Question Answering

Aroma Mahendru, Viraj Prabhu, Akrit Mohapatra, Dhruv Batra, Stefan Lee

Questions about images often contain premises - objects and relationships implied by the question. Reasoning about premises can help Visual Question Answering (VQA) models respond more intelligently to irrelevant questions.

Abstract

In this paper, we make a simple observation that questions about images often contain premises - objects and relationships implied by the question - and that reasoning about premises can help Visual Question Answering (VQA) models respond more intelligently to irrelevant or previously unseen questions. When presented with a question that is irrelevant to an image, state-of-the-art VQA models will still answer purely based on learned language biases, resulting in nonsensical or even misleading answers. We note that a visual question is irrelevant to an image if at least one of its premises is false (i.e. not depicted in the image). We leverage this observation to construct a dataset for Question Relevance Prediction and Explanation (QRPE) by searching for false premises. We train novel question relevance detection models and show that models that reason about premises consistently outperform models that do not. We also find that forcing standard VQA models to reason about premises during training can lead to improvements on tasks requiring compositional reasoning.
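To make the relevance criterion concrete, here is a minimal sketch of premise-based relevance checking: extract object premises from the question and flag the question as irrelevant when any premise is missing from the image's annotated objects. The toy OBJECT_VOCAB and the naive word-match extractor are illustrative assumptions only; the paper's own premise extraction and relevance models are more sophisticated.

```python
# Minimal sketch of the premise-based relevance check described above.
# Assumptions (not from the paper): a toy object vocabulary and a naive
# word-match premise extractor.
from typing import Iterable, Set

OBJECT_VOCAB: Set[str] = {"dog", "frisbee", "man", "umbrella", "car"}  # hypothetical

def extract_premises(question: str) -> Set[str]:
    """Return object premises implied by the question (naive word match)."""
    tokens = question.lower().rstrip("?!.").split()
    return {tok for tok in tokens if tok in OBJECT_VOCAB}

def is_relevant(question: str, image_objects: Iterable[str]) -> bool:
    """A question is irrelevant to an image if at least one premise is false,
    i.e. not depicted among the image's annotated objects."""
    return extract_premises(question) <= set(image_objects)

# Usage: the second question carries the false premise 'umbrella', so it is irrelevant.
annotations = {"dog", "man"}
print(is_relevant("What color is the dog?", annotations))           # True
print(is_relevant("Is the man holding an umbrella?", annotations))  # False
```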

Mon Jun 27 2016
Computer Vision
Revisiting Visual Question Answering Baselines
VQA is an interesting learning setting for evaluating the abilities and shortcomings of current systems for image understanding. Instead of treating answers as competing choices, our model receives the answer as input and predicts whether or not an image-question-answer triplet is correct.
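The triplet-scoring formulation described in this entry is easy to picture with a small sketch: the answer is fed in alongside image and question features, and the model emits a single correctness score. The feature dimensions and the concatenate-and-MLP fusion below are assumptions for illustration, not the paper's actual architecture.

```python
# Illustrative triplet scorer: P(correct | image, question, answer).
# Dimensions and the simple concat+MLP fusion are assumed for the sketch.
import torch
import torch.nn as nn

class TripletScorer(nn.Module):
    def __init__(self, img_dim: int = 2048, q_dim: int = 300, a_dim: int = 300, hidden: int = 512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(img_dim + q_dim + a_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # single logit: is this (image, question, answer) triplet correct?
        )

    def forward(self, img_feat, q_emb, a_emb):
        fused = torch.cat([img_feat, q_emb, a_emb], dim=-1)
        return self.mlp(fused).squeeze(-1)

# Usage with random features: the sigmoid of the logit is the correctness probability.
scorer = TripletScorer()
logit = scorer(torch.randn(1, 2048), torch.randn(1, 300), torch.randn(1, 300))
print(torch.sigmoid(logit).item())
```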
Fri Dec 01 2017
Artificial Intelligence
Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering
A number of studies have found that today's Visual Question Answering (VQA) models are heavily driven by superficial correlations in the training data rather than genuine visual grounding. To encourage development of visually grounded models, we propose a new setting for VQA where, for every question type, train and test sets have different prior distributions of answers.
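As a rough illustration of the changed-priors setting, the sketch below assigns each (question type, answer) group wholly to either train or test, so that within a question type the two splits end up with different answer distributions. The alternating assignment rule is an assumed toy procedure, not the paper's actual split construction.

```python
# Rough sketch of a changed-priors split: each (question type, answer) group
# goes wholly to train or test, alternating per question type, so answer
# priors differ between the splits. Assumed toy rule for illustration only.
from collections import defaultdict

def split_by_changed_priors(examples):
    group_to_split = {}              # (qtype, answer) -> 0 (train) or 1 (test)
    seen_per_qtype = defaultdict(int)
    train, test = [], []
    for ex in examples:
        key = (ex["qtype"], ex["answer"])
        if key not in group_to_split:
            group_to_split[key] = seen_per_qtype[ex["qtype"]] % 2
            seen_per_qtype[ex["qtype"]] += 1
        (train if group_to_split[key] == 0 else test).append(ex)
    return train, test

# Usage: 'red' answers land in train while 'blue' answers land in test, so a
# model relying on the train-time answer prior fails at test time.
data = [
    {"qtype": "what color", "answer": "red"},
    {"qtype": "what color", "answer": "red"},
    {"qtype": "what color", "answer": "blue"},
    {"qtype": "how many", "answer": "2"},
    {"qtype": "how many", "answer": "4"},
]
train, test = split_by_changed_priors(data)
print([ex["answer"] for ex in train])  # ['red', 'red', '2']
print([ex["answer"] for ex in test])   # ['blue', '4']
```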
Sun Jun 03 2018
Computer Vision
On the Flip Side: Identifying Counterexamples in Visual Question Answering
Visual question answering (VQA) models respond to open-ended natural language questions about images. We introduce two methods for evaluating existing VQA models against a supervised counterexample prediction task. Our models surpass existing benchmarks on VQA-CX.
Mon Sep 23 2019
Computer Vision
Explainable High-order Visual Question Reasoning: A New Benchmark and Knowledge-routed Network
Explanation and high-order reasoning capabilities are crucial for real-world visual question answering with diverse levels of inference complexity. Current VQA benchmarks on natural images with only an accuracy metric end up pushing models to exploit dataset biases. We propose a new HVQR benchmark for evaluating explainable, high-order visual question reasoning.
Sun Mar 28 2021
NLP
'Just because you are right, doesn't mean I am wrong': Overcoming a Bottleneck in the Development and Evaluation of Open-Ended Visual Question Answering (VQA) Tasks
Fri Mar 01 2019
Computer Vision
Answer Them All! Toward Universal Visual Question Answering Models
Visual Question Answering (VQA) research is split into two camps: the first focuses on VQA datasets that require natural image understanding and the second focuses on synthetic datasets that test reasoning. We find that methods do not generalize across the two domains.