Published on Tue Jun 15 2021

Physion: Evaluating Physical Prediction from Vision in Humans and Machines

Daniel M. Bear, Elias Wang, Damian Mrowca, Felix J. Binder, Hsiau-Yu Fish Tung, R. T. Pramod, Cameron Holdaway, Sirui Tao, Kevin Smith, Li Fei-Fei, Nancy Kanwisher, Joshua B. Tenenbaum, Daniel L. K. Yamins, Judith E. Fan

Abstract

While machine learning algorithms excel at many challenging visual tasks, it is unclear whether they can make predictions about commonplace real-world physical events. Here, we present a visual and physical prediction benchmark that precisely measures this capability. In realistically simulating a wide variety of physical phenomena -- rigid and soft-body collisions, stable multi-object configurations, rolling and sliding, projectile motion -- our dataset presents a more comprehensive challenge than existing benchmarks. Moreover, we have collected human responses for our stimuli so that model predictions can be directly compared to human judgments. We compare an array of algorithms -- varying in their architecture, learning objective, input-output structure, and training data -- on their ability to make diverse physical predictions. We find that graph neural networks with access to the physical state best capture human behavior, whereas among models that receive only visual input, those with object-centric representations or pretraining do best but fall far short of human accuracy. This suggests that extracting physically meaningful representations of scenes is the main bottleneck to achieving human-like visual prediction. We thus demonstrate how our benchmark can identify areas for improvement and measure progress on this key aspect of physical understanding.

Mon Jun 05 2017
Computer Vision
Visual Interaction Networks
From just a glance, humans can make rich predictions about the future state of a wide range of physical systems. Modern approaches from engineering, robotics, and graphics are often restricted to narrow domains. We introduce a general-purpose model for learning the dynamics of a physical system from raw visual observations.
Mon Jun 22 2020
Machine Learning
Learning Physical Graph Representations from Visual Scenes
Physical Scene Graphs (PSGs) represent scenes as hierarchical graphs. Bound to each node is a vector of attributes that intuitively represent object properties. PSGNet is a network architecture that learns PSGs by reconstructing scenes through a PSG-structured bottleneck.
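As a toy illustration of the PSG idea, a hierarchical graph whose nodes carry attribute vectors, one might sketch a two-level scene as follows. The node names and attribute values here are hypothetical and only illustrate the data structure, not the paper's implementation:

```python
from dataclasses import dataclass, field

@dataclass
class PSGNode:
    """A node in a physical scene graph: an attribute vector plus
    links to finer-grained child nodes one level down the hierarchy."""
    attributes: list                               # e.g. position, color, shape descriptors
    children: list = field(default_factory=list)   # finer-grained nodes bound to this one

# A two-level scene: a cup node bound as a child of a table node
# (names and attribute values are made up for illustration).
cup = PSGNode(attributes=[0.2, 0.5, 0.1])
table = PSGNode(attributes=[0.0, 0.0, 1.0], children=[cup])
```

In PSGNet the attribute vectors and the hierarchy itself are learned by reconstructing the scene through a PSG-structured bottleneck; here they are simply written down by hand.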
Thu Jul 16 2020
Computer Vision
SAILenv: Learning in Virtual Visual Environments Made Simple
Machine learning researchers, computer vision scientists, engineers, and others have shown growing interest in 3D simulators. Most existing platforms for interfacing with 3D environments are designed for navigation-related experiments. SAILenv is specifically designed to be simple and customizable.
Tue Mar 20 2018
Artificial Intelligence
IntPhys: A Framework and Benchmark for Visual Intuitive Physics Reasoning
Deep Neural Networks are trained with a future semantic mask prediction objective. They are then tested on a possible-versus-impossible discrimination task, which requires systems to compute a physical plausibility score.
Thu Jun 21 2018
Neural Networks
Flexible Neural Representation for Physics Prediction
The Hierarchical Relation Network (HRN) is an end-to-end differentiable neural network based on graph convolution. The HRN handles complex collisions and nonrigid deformations, generating plausible dynamics predictions.
Thu Mar 07 2019
Artificial Intelligence
RAVEN: A Dataset for Relational and Analogical Visual rEasoNing
Dramatic progress has been witnessed in basic vision tasks involving low-level perception, such as object recognition, detection, and tracking. However, an enormous performance gap remains between artificial vision systems and human intelligence on higher-level vision problems.
Fri Feb 21 2020
Machine Learning
Learning to Simulate Complex Physics with Graph Networks
We present a machine learning framework and model implementation that can learn to simulate a wide variety of challenging physical domains. Our framework represents the state of a physical system with particles, expressed as nodes in a graph, and computes dynamics via learned message-passing.
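The framework's core idea, particles as graph nodes and dynamics computed by message passing along edges, can be sketched in a few lines. Here a hand-written repulsive message stands in for the learned MLPs of the actual model (an illustrative assumption, not the paper's implementation):

```python
import numpy as np

def message_passing_step(positions, velocities, radius, dt=0.01):
    """One message-passing update over a particle graph.

    Edges connect particles within `radius`; each edge carries a
    repulsive "message" that is aggregated into the receiving node,
    after which the node state (velocity, position) is updated.
    """
    accel = np.zeros_like(positions)
    n = len(positions)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            diff = positions[j] - positions[i]
            dist = np.linalg.norm(diff)
            if dist < radius:                       # edge exists in the particle graph
                accel[i] -= diff / (dist + 1e-8)    # sum incoming edge messages
    velocities = velocities + dt * accel            # node update from aggregated messages
    positions = positions + dt * velocities         # semi-implicit Euler integration
    return positions, velocities
```

In the learned version, the message and update functions are neural networks trained on ground-truth simulator rollouts, and the same step is applied repeatedly to roll out long trajectories.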
Thu Apr 16 2020
Computer Vision
Shortcut Learning in Deep Neural Networks
Deep learning has triggered the current rise of artificial intelligence and is the workhorse of today's machine intelligence. Shortcuts are decision rules that perform well on standard benchmarks but fail to transfer to more challenging conditions. Related issues are known in comparative psychology, education, and linguistics.
Fri Mar 23 2018
Artificial Intelligence
Datasheets for Datasets
The machine learning community currently has no standardized process for documenting datasets. To address this gap, we propose datasheets for datasets. Datasheets will facilitate better communication between dataset creators and consumers.
Wed Dec 23 2020
Computer Vision
Training data-efficient image transformers & distillation through attention
We produce a competitive convolution-free transformer by training on ImageNet only, on a single computer in less than 3 days. Our reference vision transformer (86M parameters) achieves top-1 accuracy of 83.1% (single-crop evaluation) on ImageNet.
Thu Sep 04 2014
Computer Vision
Very Deep Convolutional Networks for Large-Scale Image Recognition
Convolutional networks of increasing depth can achieve state-of-the-art results. This work formed the basis of the team's ImageNet Challenge 2014 submission.
Wed Nov 27 2019
Machine Learning
Contrastive Learning of Structured World Models
C-SWMs utilize a contrastive approach for representation learning in environments with compositional structure. We structure each state embedding as a set of object representations and their relations, modeled by a graph neural network. This allows objects to be discovered from raw pixel observations without direct supervision.
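The hinge-based contrastive objective behind this approach can be sketched as follows. The additive transition function is a stand-in for the learned graph neural network, and all names and the margin value are illustrative assumptions:

```python
import numpy as np

def transition(slots, action_effect):
    # Stand-in for the learned GNN transition model: a simple
    # additive update applied per object slot (illustrative only).
    return slots + action_effect

def squared_distance(a, b):
    # Energy between two sets of object-slot embeddings.
    return float(np.sum((a - b) ** 2))

def cswm_loss(slots_t, action_effect, slots_t1, negative_slots, margin=1.0):
    """Hinge-based contrastive objective in the spirit of C-SWMs:
    pull the predicted next state toward the true next state, and push
    a randomly drawn negative state at least `margin` away from it."""
    positive = squared_distance(transition(slots_t, action_effect), slots_t1)
    negative = max(0.0, margin - squared_distance(negative_slots, slots_t1))
    return positive + negative
```

Minimizing this loss shapes the latent space without any pixel reconstruction: a perfect transition prediction drives the positive term to zero, while negatives sampled from other time steps keep the embeddings from collapsing.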