Published on Tue Aug 27 2019

Global-Local Temporal Representations For Video Person Re-Identification

Jianing Li, Jingdong Wang, Qi Tian, Wen Gao, Shiliang Zhang

This paper proposes the Global-Local Temporal Representation (GLTR) to exploit the multi-scale temporal cues in video sequences. GLTR is constructed by first modeling the short-term cues among adjacent frames, then capturing the long-term relations among inconsecutive frames.

Abstract

This paper proposes the Global-Local Temporal Representation (GLTR) to exploit the multi-scale temporal cues in video sequences for video person Re-Identification (ReID). GLTR is constructed by first modeling the short-term temporal cues among adjacent frames, then capturing the long-term relations among inconsecutive frames. Specifically, the short-term temporal cues are modeled by parallel dilated convolutions with different temporal dilation rates to represent the motion and appearance of pedestrians. The long-term relations are captured by a temporal self-attention model to alleviate the occlusions and noise in video sequences. The short- and long-term temporal cues are aggregated into the final GLTR by a simple single-stream CNN. GLTR shows substantial superiority to existing features learned with body-part cues or metric learning on four widely used video ReID datasets. For instance, it achieves a Rank-1 accuracy of 87.02% on the MARS dataset without re-ranking, better than the current state of the art.
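The two components described in the abstract can be sketched in plain NumPy: parallel dilated 1D convolutions over the frame-feature sequence capture short-term cues at several temporal scales, and a frame-level self-attention pass captures long-term relations before pooling into one clip descriptor. This is a minimal illustration, not the authors' implementation; the function names, kernel shapes, and the final average pooling are assumptions made for the sketch.

```python
import numpy as np

def dilated_temporal_conv(x, w, dilation):
    """Depthwise 1D temporal convolution with a given dilation rate.
    x: (T, D) frame features; w: (K, D) per-channel kernel.
    Zero-padding keeps the output length at T (hypothetical choice)."""
    T, D = x.shape
    K = w.shape[0]
    pad = dilation * (K - 1) // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros((T, D))
    for t in range(T):
        for k in range(K):
            out[t] += w[k] * xp[t + k * dilation]
    return out

def temporal_self_attention(x):
    """Self-attention across frames: softmax(x x^T / sqrt(D)) x.
    Lets every frame attend to all inconsecutive frames, so occluded
    or noisy frames can be down-weighted."""
    D = x.shape[1]
    scores = x @ x.T / np.sqrt(D)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ x

def gltr_sketch(x, kernels, dilations):
    """GLTR-style pipeline (simplified): parallel dilated convolutions
    (short-term cues) summed, then temporal self-attention (long-term
    relations), then temporal average pooling to one clip descriptor."""
    short_term = sum(dilated_temporal_conv(x, w, d)
                     for w, d in zip(kernels, dilations))
    long_term = temporal_self_attention(short_term)
    return long_term.mean(axis=0)

# Usage on random frame features: 8 frames, 16-dim features,
# three parallel branches with dilation rates 1, 2, 3.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16))
kernels = [rng.standard_normal((3, 16)) * 0.1 for _ in range(3)]
clip_feature = gltr_sketch(x, kernels, [1, 2, 3])
```

In the paper the frame features come from a CNN backbone and the aggregation is itself a small single-stream CNN; here both are replaced by random features and mean pooling to keep the sketch self-contained.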

Wed Dec 26 2018
Computer Vision
3D PersonVLAD: Learning Deep Global Representations for Video-based Person Re-identification
The paper introduces a global video representation for video-based person re-identification. It aggregates local 3D features across the entire video extent. The proposed network is further augmented with a 3D part-alignment module.
Mon Jul 16 2018
Computer Vision
SCAN: Self-and-Collaborative Attention Network for Video Person Re-identification
Video person re-identification has attracted much attention in recent years. It aims to match image sequences of pedestrians from different camera views. Previous approaches usually improve this task from three aspects. We present a novel and practical deep architecture for video person re-identification.
Thu Jul 12 2018
Computer Vision
Video-based Person Re-identification via 3D Convolutional Networks and Non-local Attention
Video-based person re-identification (ReID) is a challenging problem. Feature aggregation from a video track is a key step for video-based ReID. Many existing methods tackle this problem by average/maximum temporal pooling or RNNs with attention.
Mon Aug 05 2019
Computer Vision
Spatially and Temporally Efficient Non-local Attention Network for Video-based Person Re-Identification
Video-based person re-identification (Re-ID) aims at matching video sequences across non-overlapping cameras. We propose a Non-local Video Attention Network (NVAN) to incorporate video characteristics into the representation at multiple feature levels.
Fri Apr 30 2021
Computer Vision
BiCnet-TKS: Learning Efficient Spatial-Temporal Representation for Video Person Re-Identification
Fri Nov 09 2018
Computer Vision
STA: Spatial-Temporal Attention for Large-Scale Video-based Person Re-Identification
We propose a Spatial-Temporal Attention (STA) approach to tackle the large-scale person re-identification task in videos. STA fully exploits the discriminative parts of one target person in both spatial and temporal dimensions, which results in a 2-D attention score matrix.