Published on Sun May 02 2021

A survey on VQA: Datasets and Approaches

Yeyun Zou, Qiyu Xie
Abstract

Visual question answering (VQA) is a task that combines techniques from computer vision and natural language processing. It requires a model to answer a text-based question using the information contained in a visual input. In recent years, the research field of VQA has expanded. Research on VQA that examines reasoning ability, and on VQA over scientific diagrams, has also received more attention. Meanwhile, more multimodal feature fusion mechanisms have been proposed. This paper reviews and analyzes existing datasets, metrics, and models proposed for the VQA task.

Sat Oct 24 2020
Computer Vision
Beyond VQA: Generating Multi-word Answer and Rationale to Visual Questions
Visual Question Answering is a multi-modal task that aims to measure high-level visual understanding. Contemporary VQA models are restrictive in the sense that answers are obtained via classification over a limited vocabulary. To take a step forward, we introduce a new task: ViQAR.
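To illustrate what "classification over a limited vocabulary" means in practice, here is a minimal sketch in Python. It is not taken from the ViQAR paper: the vocabulary, the scores, and the helper names `softmax` and `predict_answer` are all hypothetical. The point is only that such a model can never return an answer outside its fixed list.

```python
import math

# Hypothetical fixed answer vocabulary (an assumption for illustration).
ANSWER_VOCAB = ["yes", "no", "2", "red", "dog"]


def softmax(scores):
    """Convert raw scores into probabilities in a numerically stable way."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_answer(scores, vocab=ANSWER_VOCAB):
    """Return the highest-probability vocabulary entry and its probability."""
    if len(scores) != len(vocab):
        raise ValueError("one score per vocabulary entry is required")
    probs = softmax(scores)
    best = max(range(len(vocab)), key=lambda i: probs[i])
    return vocab[best], probs[best]


# Made-up scores standing in for the output of a fused image-question encoder.
answer, confidence = predict_answer([0.1, 0.3, 2.5, 0.2, -1.0])
print(answer, round(confidence, 3))
```

Whatever the question, the prediction is one of the five listed strings, so a multi-word answer or a rationale cannot be expressed in this setup.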
Thu Nov 29 2018
Computer Vision
Visual Question Answering as Reading Comprehension
Visual question answering (VQA) demands simultaneous comprehension of both visual content and natural language questions. Current methods jointly embed both the visual information and the textual feature into the same space. We propose to unify all the input information by natural language.
Fri Feb 15 2019
Computer Vision
Cycle-Consistency for Robust Visual Question Answering
VQA-Rephrasings contains 3 human-provided rephrasings for 40k questions spanning 40k images from the VQA v2.0 validation dataset. We show that our approach is significantly more robust to linguistic variations than state-of-the-art models.
Sun Sep 24 2017
Computer Vision
Survey of Recent Advances in Visual Question Answering
Visual Question Answering (VQA) presents a unique challenge as it requires the ability to understand and encode the multi-modal inputs. This paper presents a survey of different approaches proposed to solve the problem.
Sun Mar 28 2021
NLP
'Just because you are right, doesn't mean I am wrong': Overcoming a Bottleneck in the Development and Evaluation of Open-Ended Visual Question Answering (VQA) Tasks
Mon Jan 29 2018
Computer Vision
Object-based reasoning in VQA
Visual Question Answering (VQA) is a novel problem domain where multi-modal inputs must be processed in order to solve the task. Because solutions inherently require combining visual and natural language processing with abstract reasoning, the problem is considered AI-complete.