Published on Sat Oct 17 2020

A Grid-based Representation for Human Action Recognition

Soufiane Lamghari, Guillaume-Alexandre Bilodeau, Nicolas Saunier

Abstract

Human action recognition (HAR) in videos is a fundamental research topic in computer vision. It consists mainly in understanding actions performed by humans based on a sequence of visual observations. In recent years, HAR has witnessed significant progress, especially with the emergence of deep learning models. However, most existing approaches for action recognition rely on information that is not always relevant for this task, and are limited in the way they fuse temporal information. In this paper, we propose a novel method for human action recognition that efficiently encodes the most discriminative appearance information of an action, with explicit attention on representative pose features, into a new compact grid representation. Our GRAR (Grid-based Representation for Action Recognition) method is tested on several benchmark datasets, demonstrating that our model can accurately recognize human actions despite intra-class appearance variations and occlusion challenges.
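To illustrate the general idea of a grid representation, the sketch below tiles a fixed number of uniformly sampled frames from a video into a single grid image that a standard 2D CNN could consume. This is a minimal, hypothetical illustration: the frame-sampling scheme, grid size, and the absence of pose-based attention are all assumptions for demonstration, not the authors' exact GRAR method.

```python
import numpy as np

def make_grid(frames, rows=3, cols=3):
    """Tile rows*cols uniformly sampled frames into one grid image.

    Hypothetical sketch of a grid-based action representation;
    frame selection and layout choices are illustrative assumptions.
    """
    n = rows * cols
    # Uniformly sample n frame indices across the whole sequence
    idx = np.linspace(0, len(frames) - 1, n).astype(int)
    sampled = [frames[i] for i in idx]
    h, w = sampled[0].shape[:2]
    grid = np.zeros((rows * h, cols * w, 3), dtype=sampled[0].dtype)
    for k, f in enumerate(sampled):
        r, c = divmod(k, cols)
        # Place frame k at row r, column c of the grid
        grid[r * h:(r + 1) * h, c * w:(c + 1) * w] = f
    return grid
```

The appeal of such a representation is that temporal structure is folded into a single compact image, so a conventional image classifier can be applied without a separate temporal-fusion stage.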

Related Papers

Sun May 12 2019
Computer Vision
NTU RGB+D 120: A Large-Scale Benchmark for 3D Human Activity Understanding
Researchers have created a large-scale dataset for RGB+D human action recognition. The dataset contains 114 thousand video samples and 8 million frames. It contains 120 different action classes including daily, mutual, and health-related activities.
Thu Oct 22 2020
Computer Vision
Deep Analysis of CNN-based Spatio-temporal Representations for Action Recognition
In recent years, a number of approaches based on 2D or 3D convolutional neural networks have emerged for video action recognition. In this paper, we carry out in-depth comparative analysis to better understand the differences between these approaches and the progress made by them.
Sun Apr 02 2017
Machine Learning
Hidden Two-Stream Convolutional Networks for Action Recognition
State-of-the-art action recognition methods rely on traditional optical flow estimation methods. Such a two-stage approach is computationally expensive and storage demanding. We name our approach hidden two-stream CNNs because it only takes raw video frames as input and directly predicts action classes.
Wed Nov 29 2017
Computer Vision
Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition
Motion representation plays a vital role in human action recognition in videos. In this study, we introduce a novel compact motion representation for video action recognition, named Optical Flow guided Feature (OFF). The OFF is derived from the definition of optical flow and is orthogonal to the optical flow.
Wed Mar 22 2017
Computer Vision
PKU-MMD: A Large Scale Benchmark for Continuous Multi-Modal Human Action Understanding
There is a lack of standard large-scale benchmarks, especially for current popular data-hungry deep learning based methods. In this paper, we introduce a new large scale benchmark (PKU-MMD) for continuous multi-modality 3D human action understanding.
Fri Nov 24 2017
Computer Vision
Appearance-and-Relation Networks for Video Classification
Spatiotemporal feature learning in videos is a fundamental problem in computer vision. This paper presents a new architecture, termed Appearance-and-Relation Network (ARTNet), to learn video representations in an end-to-end manner.