One of the central tasks of multi-object tracking involves learning a
distance metric that is consistent with the semantic similarities of objects.
The design of an appropriate loss function that encourages discriminative
feature learning is among the most crucial challenges in deep neural
network-based metric learning. Despite significant progress, the slow convergence
and poor local optima of existing contrastive and triplet loss based
deep metric learning methods necessitate a better solution. In this paper, we
propose cosine-margin-contrastive (CMC) and cosine-margin-triplet (CMT) losses by
reformulating both contrastive and triplet loss functions from the perspective
of cosine distance. The proposed reformulation as a cosine loss is achieved by
feature normalization which distributes the learned features on a hypersphere.
We then propose the MOTS R-CNN framework for joint multi-object tracking and
segmentation, particularly targeted at improving the tracking performance.
Specifically, the tracking problem is addressed through deep metric learning
based on the proposed loss functions. We propose a scale-invariant tracking
approach that uses a multi-layer feature aggregation scheme to make the model
robust against object scale variations and occlusions. The MOTS R-CNN achieves
state-of-the-art tracking performance on the KITTI MOTS dataset. We show that
the MOTS R-CNN reduces the identity switching by and on cars and
pedestrians, respectively, in comparison to Track R-CNN.
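
To make the cosine reformulation concrete, the following is a minimal sketch of contrastive and triplet losses computed on L2-normalized features. The exact CMC and CMT formulations are not given in this abstract, so the function names, the loss forms, and the default margin of 0.5 are illustrative assumptions and not the definitions used in the paper.

```python
import math


def l2_normalize(v):
    """Project a feature vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine_similarity(a, b):
    """Cosine similarity, computed as the dot product of normalized features."""
    a_hat, b_hat = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_hat, b_hat))


def cosine_margin_contrastive(a, b, same_identity, margin=0.5):
    """Hypothetical cosine-based contrastive loss (assumed form).

    Pairs of the same identity are pulled toward cosine similarity 1;
    pairs of different identities are penalized only while their
    similarity exceeds the margin.
    """
    cos = cosine_similarity(a, b)
    if same_identity:
        return 1.0 - cos
    return max(0.0, cos - margin)


def cosine_margin_triplet(anchor, positive, negative, margin=0.5):
    """Hypothetical cosine-based triplet loss (assumed form).

    The anchor-positive similarity should exceed the anchor-negative
    similarity by at least the margin.
    """
    cos_ap = cosine_similarity(anchor, positive)
    cos_an = cosine_similarity(anchor, negative)
    return max(0.0, cos_an - cos_ap + margin)
```

Because every feature is normalized first, the losses in this sketch depend only on the angle between embeddings and not on their magnitudes, which is the sense in which the learned features are distributed on a hypersphere.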