Published on Fri Oct 16 2020

Towards Accurate Human Pose Estimation in Videos of Crowded Scenes

Li Yuan, Shuning Chang, Xuecheng Nie, Ziyuan Huang, Yichen Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan
Abstract

Video-based human pose estimation in crowded scenes is a challenging problem due to occlusion, motion blur, scale variation, viewpoint change, and similar factors. Prior approaches often fail in this setting because they (1) make no use of temporal information and (2) lack training data for crowded scenes. In this paper, we focus on improving human pose estimation in videos of crowded scenes from two perspectives: exploiting temporal context and collecting new data. In particular, we first follow the top-down strategy, detecting persons and performing single-person pose estimation on each frame. We then refine the frame-based estimates with temporal context derived from optical flow: for each frame, we forward historical poses from previous frames and backward future poses from subsequent frames to the current frame, leading to stable and accurate human pose estimation in videos. In addition, we mine new data from the Internet for scenes similar to those in the HIE dataset to improve the diversity of the training set. With this approach, our model achieves the best performance on 7 of the 13 videos and an average w\_AP of 56.33 on the test set of the HIE challenge.
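The temporal refinement step described in the abstract (warping poses from neighboring frames to the current frame via optical flow, then combining them) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the flow field, the fusion weights, and the function names are all assumptions for the example.

```python
import numpy as np

def propagate_keypoints(keypoints, flow):
    """Warp (x, y) keypoints from one frame into the next using a dense
    optical-flow field of shape (H, W, 2), where flow[y, x] = (dx, dy)."""
    h, w = flow.shape[:2]
    warped = []
    for x, y in keypoints:
        # Look up the flow vector at the nearest pixel to the keypoint.
        xi = int(np.clip(round(x), 0, w - 1))
        yi = int(np.clip(round(y), 0, h - 1))
        dx, dy = flow[yi, xi]
        warped.append((x + dx, y + dy))
    return warped

def fuse_poses(current, forwarded, backwarded, weights=(0.5, 0.25, 0.25)):
    """Weighted average of the current frame's estimate with poses
    propagated forward from past frames and backward from future frames.
    The weights here are arbitrary placeholders."""
    poses = np.array([current, forwarded, backwarded], dtype=float)
    w = np.array(weights, dtype=float).reshape(-1, 1, 1)
    return (poses * w).sum(axis=0) / w.sum()
```

In a real pipeline the flow would come from an optical-flow estimator between consecutive frames, and the propagated poses would typically be fused at the heatmap level rather than as raw coordinates; the coordinate-level average above is only meant to show the forward/backward propagation idea.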

Fri Oct 16 2020
Computer Vision
A Simple Baseline for Pose Tracking in Videos of Crowded Scenes
This paper presents our solution to the ACM MM challenge on Large-scale Human-centric Video Analysis in Complex Events. We use a multi-object tracking method to assign a human ID to each bounding box generated by the detection model. After that, a pose is generated and assigned to each
Thu Jun 06 2019
Computer Vision
Learning Temporal Pose Estimation from Sparsely-Labeled Videos
Modern approaches for multi-person pose estimation in video require large amounts of dense annotations. To reduce the need for dense annotations, we propose a PoseWarper network that leverages training videos with sparse annotations to learn to perform dense temporal pose propagation and estimation.
Tue Dec 04 2018
Computer Vision
Learning 3D Human Dynamics from Video
From an image of a person in action, we can easily guess the 3D motion of the person in the immediate past and future. We present a framework that can similarly learn a representation of the 3D dynamics of humans from video via a simple but effective temporal encoding of image features.
Mon Apr 27 2020
Computer Vision
Self-supervised Keypoint Correspondences for Multi-Person Pose Estimation and Tracking in Videos
Video annotation is expensive and time-consuming, so datasets for multi-person pose estimation and tracking are less diverse. Instead of training the network to estimate keypoint correspondences on video data, it is trained on a large-scale image dataset for human pose estimation.
Mon Mar 30 2020
Computer Vision
Combining detection and tracking for human pose estimation in videos
We propose a novel top-down approach that tackles the problem of multi-person human pose estimation and tracking in videos. Our approach consists of three components: a Clip Tracking Network, a Video Tracking Pipeline and a Spatial-Temporal Merging procedure.
Fri Oct 27 2017
Computer Vision
PoseTrack: A Benchmark for Human Pose Estimation and Tracking
"PoseTrack" is a new large-scale benchmark for video-based human pose estimation and articulated tracking. The benchmark is freely accessible at https://posetrack.net.