We propose the use of dilated filters to construct an aggregation module in a
multicolumn convolutional neural network for perspective-free counting.
Counting is a common problem in computer vision (e.g., traffic on the street or
pedestrians in a crowd). Modern approaches to the counting problem regress a
density map whose integral equals the number of objects in the image. However,
objects can appear at different scales (e.g., due to perspective effects), which
makes the proper density map difficult to learn. While the use of multiple
columns to extract multiscale information from images has been demonstrated before,
our approach aggregates the multiscale information gathered by the multicolumn
convolutional neural network to improve performance. Our experiments show that
our proposed network outperforms the state of the art on many benchmark
datasets, and that using our aggregation module in combination with a
larger number of columns is beneficial for multiscale counting.
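
As a rough illustration of two ideas mentioned above, the following sketch builds a density map whose sum equals the object count and computes how dilation enlarges the receptive field of stacked filters. It is a minimal sketch, not the network described here: the helper names `density_map` and `stacked_receptive_field`, the Gaussian width `sigma`, the image size, and the annotation points are all assumptions chosen for illustration.

```python
import numpy as np


def density_map(points, shape, sigma=2.0):
    """Sum of unit-mass Gaussians, one per annotated object (hypothetical helper)."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    density = np.zeros(shape, dtype=np.float64)
    for py, px in points:
        g = np.exp(-((ys - py) ** 2 + (xs - px) ** 2) / (2.0 * sigma ** 2))
        # Normalize on the image grid so each object contributes exactly 1.
        density += g / g.sum()
    return density


def stacked_receptive_field(kernel_size, dilations):
    """Receptive field (in pixels, per axis) of stacked stride-1 dilated filters."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Assumed annotations: (row, col) centers of three objects in a 64x64 image.
points = [(10, 12), (30, 40), (31, 42)]
dmap = density_map(points, (64, 64))

# Integrating (summing) the density map recovers the object count.
estimated_count = dmap.sum()

# Three 3x3 layers with dilations 1, 2, 4 versus three undilated 3x3 layers.
rf_dilated = stacked_receptive_field(3, [1, 2, 4])
rf_plain = stacked_receptive_field(3, [1, 1, 1])
```

Under these assumptions, `estimated_count` equals the number of annotated points up to floating-point error, and the dilated stack covers a wider context than the undilated one with the same number of parameters, which is one reason dilated filters are a natural choice for aggregating multiscale features.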