Published on Tue Jul 31 2018

What am I Searching for: Zero-shot Target Identity Inference in Visual Search

Mengmi Zhang, Gabriel Kreiman

Abstract

Can we infer intentions from a person's actions? As an example problem, here we consider how to decipher what a person is searching for by decoding their eye movement behavior. We conducted two psychophysics experiments where we monitored eye movements while subjects searched for a target object. We defined the fixations falling on non-target objects as "error fixations". Using those error fixations, we developed a model (InferNet) to infer what the target was. InferNet uses a pre-trained convolutional neural network to extract features from the error fixations and computes a similarity map between the error fixations and all locations across the search image. The model consolidates the similarity maps across layers and integrates these maps across all error fixations. InferNet successfully identifies the subject's goal and outperforms competitive null models, even without any object-specific training on the inference task.
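
The pipeline described above (per-fixation feature extraction with a pre-trained CNN, per-layer similarity maps, consolidation across layers, and integration across fixations) can be sketched in a few lines. The code below is a minimal illustration, not the authors' implementation: it assumes torchvision's pre-trained VGG16 as the feature extractor, takes fixations in image-pixel coordinates, and the names feature_maps and infer_target_map are my own.

import torch
import torch.nn.functional as F
from torchvision import models

# Pre-trained VGG16 as a generic feature extractor (an assumption; the
# paper specifies only "a pre-trained convolutional neural network").
vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features.eval()

def feature_maps(img, layer_ids=(4, 9, 16, 23, 30)):
    # Collect feature maps at the five pooling stages of VGG16.
    maps, x = [], img
    with torch.no_grad():
        for i, layer in enumerate(vgg):
            x = layer(x)
            if i in layer_ids:
                maps.append(x)
    return maps

def infer_target_map(search_img, error_fixations):
    # search_img: 1x3xHxW float tensor; error_fixations: (row, col) pixels.
    _, _, H, W = search_img.shape
    maps = feature_maps(search_img)
    total = torch.zeros(H, W)
    for fy, fx in error_fixations:
        for fmap in maps:
            fh, fw = fmap.shape[-2:]
            # Feature vector at the fixated location, rescaled per layer.
            y = min(int(fy * fh / H), fh - 1)
            x = min(int(fx * fw / W), fw - 1)
            query = F.normalize(fmap[0, :, y, x], dim=0)   # C
            feats = F.normalize(fmap[0], dim=0)            # C x fh x fw
            # Cosine similarity between the fixated feature and every location.
            sim = torch.einsum("c,chw->hw", query, feats)
            # Consolidate across layers at a common resolution, then
            # integrate across fixations by summation.
            sim = F.interpolate(sim[None, None], size=(H, W),
                                mode="bilinear", align_corners=False)[0, 0]
            total += sim
    return total  # peaks mark the most target-consistent locations

For example, calling infer_target_map(img, [(60, 100), (150, 30)]) on a preprocessed 1x3xHxW image returns an HxW map whose peaks mark the locations most consistent with the sought target; the final identity-inference step (matching this evidence against candidate targets) is omitted from the sketch.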

Mon Jun 19 2017
Computer Vision
Visual Decoding of Targets During Visual Search From Human Eye Fixations
Gaze has been proposed as an implicit source of information for predicting the target of visual search. We propose to first encode fixations into a semantic representation and then decode this representation into an image. Users were able to correctly identify the category of the decoded image 62% of the time.
Tue Jan 05 2021
Computer Vision
Look Twice: A Computational Model of Return Fixations across Tasks and Species
Saccadic eye movements allow animals to bring different parts of an image into high resolution. During free viewing, inhibition of return incentivizes exploration by discouraging revisits to previously fixated locations. We studied 44,328 return fixations out of 217,440 fixations across different tasks.
Thu Jul 12 2018
Computer Vision
Visual Attention driven by Convolutional Features
Understanding where humans look in a scene is a problem of great interest in visual perception and computer vision. When eye-tracking devices are not a viable option, models of human attention can be used to predict fixations instead.
Wed Oct 05 2016
Computer Vision
DeepGaze II: Reading fixations from deep features trained on object recognition
DeepGaze II is a model that predicts where people look in images. The model uses the features from the VGG-19 deep neural network. After conservative cross-validation, it explains 87% of the explainable information gain in the patterns of fixations.
Tue Aug 22 2017
Computer Vision
CNN Fixations: An unraveling approach to visualize the discriminative image regions
Deep convolutional neural networks (CNNs) have revolutionized various fields of vision research. Our approach can analyze a variety of CNN-based models trained for object recognition and caption generation. Unlike existing methods, we achieve this by unraveling the forward-pass operation.