Published on Tue Nov 17 2015

Deep multi-scale video prediction beyond mean square error

Michael Mathieu, Camille Couprie, Yann LeCun

pixel-space video prediction may be viewed as a promising avenue for unsupervised feature learning. Many vision applications could benefit from the knowledge of the next frames of videos, that does not require the complexity of tracking every pixel trajectories.

0
0
0
Abstract

Learning to predict future images from a video sequence involves the construction of an internal representation that models the image evolution accurately, and therefore, to some degree, its content and dynamics. This is why pixel-space video prediction may be viewed as a promising avenue for unsupervised feature learning. In addition, while optical flow has been a very studied problem in computer vision for a long time, future frame prediction is rarely approached. Still, many vision applications could benefit from the knowledge of the next frames of videos, that does not require the complexity of tracking every pixel trajectories. In this work, we train a convolutional network to generate future frames given an input sequence. To deal with the inherently blurry predictions obtained from the standard Mean Squared Error (MSE) loss function, we propose three different and complementary feature learning strategies: a multi-scale architecture, an adversarial training method, and an image gradient difference loss function. We compare our predictions to different published results based on recurrent neural networks on the UCF101 dataset

Mon May 10 2021
Machine Learning
Local Frequency Domain Transformer Networks for Video Prediction
0
0
0
Tue Nov 05 2019
Computer Vision
High Fidelity Video Prediction with Large Stochastic Recurrent Neural Networks
Predicting future video frames is extremely challenging, as there are many factors of variation that make up the dynamics of how frames change through time. Previously proposed solutions require complex inductive biases inside network architectures. We question if such handcrafted architectures are necessary and instead propose a different approach.
0
0
0
Fri Dec 01 2017
Machine Learning
Folded Recurrent Neural Networks for Future Video Prediction
Future video prediction is an ill-posed Computer Vision problem. This work introduces a double mapping between the input and output of a GRU layer. This allows for recurrent auto-encoders with state sharing between encoder and decoder.
0
0
0
Mon Oct 23 2017
Computer Vision
ContextVP: Fully Context-Aware Video Prediction
Video prediction models based on convolutional networks, recurrent networks, and their combinations often result in blurry predictions. We identify an important contributing factor for imprecise predictions that has not been adequately studied in the literature. To address this issue, we introduce a fully context-aware architecture.
0
0
0
Wed Nov 23 2016
Computer Vision
Deep Feature Flow for Video Recognition
Deep convolutional neutral networks have achieved great success on image recognition tasks. Yet, it is non-trivial to transfer the state-of-the-art image recognition networks to videos. We present deep feature flow, a fast and accurate framework for video recognition.
0
0
0
Mon Mar 26 2018
Neural Networks
Predicting the Future with Transformational States
An intelligent observer looks at the world and sees not only what is, but what is moving and what can be moved. We propose a model that predicts future images by learning to represent the present state and its transformation given only a sequence of images. We describe how this model can be integrated into
0
0
0