Published on Fri Feb 15 2019

Parameter Efficient Training of Deep Convolutional Neural Networks by Dynamic Sparse Reparameterization

Hesham Mostafa, Xin Wang

Abstract

Modern deep neural networks are typically highly overparameterized. Pruning techniques are able to remove a significant fraction of network parameters with little loss in accuracy. Recently, techniques based on dynamic reallocation of non-zero parameters have emerged, allowing direct training of sparse networks without having to pre-train a large dense model. Here we present a novel dynamic sparse reparameterization method that addresses the limitations of previous techniques such as high computational cost and the need for manual configuration of the number of free parameters allocated to each layer. We evaluate the performance of dynamic reallocation methods in training deep convolutional networks and show that our method outperforms previous static and dynamic reparameterization methods, yielding the best accuracy for a fixed parameter budget, on par with accuracies obtained by iteratively pruning a pre-trained dense model. We further investigated the mechanisms underlying the superior generalization performance of the resultant sparse networks. We found that neither the structure nor the initialization of the non-zero parameters was sufficient to explain the superior performance. Rather, effective learning crucially depended on the continuous exploration of the sparse network structure space during training. Our work suggests that exploring structural degrees of freedom during training is more effective than adding extra parameters to the network.
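
To make the reallocation idea concrete, here is a minimal PyTorch-style sketch of the generic prune-and-regrow step that dynamic reparameterization methods build on: periodically remove the weakest active connections and regrow an equal number at currently inactive positions, redistributed across layers. The function name `reallocate_step`, the fixed magnitude threshold, and the regrowth heuristic (proportional to each layer's surviving weights) are illustrative assumptions, not the paper's exact adaptive-threshold procedure.

```python
import torch

def reallocate_step(layers, prune_threshold=1e-3):
    """One prune-and-regrow step over a list of (weight, mask) tensor pairs.

    Illustrative sketch: active weights whose magnitude falls below
    `prune_threshold` are zeroed, and the same total number of connections is
    regrown at random zero positions, allocated to layers in proportion to the
    number of non-zero weights each layer retained.
    """
    freed, survivors = 0, []
    with torch.no_grad():
        # Prune: drop currently-active weights with small magnitude.
        for weight, mask in layers:
            small = (weight.abs() < prune_threshold) & mask.bool()
            freed += int(small.sum())
            mask[small] = 0.0
            weight[small] = 0.0
            survivors.append(int(mask.sum()))

        # Regrow: hand the freed parameter budget back to the layers.
        total = max(sum(survivors), 1)
        for (weight, mask), kept in zip(layers, survivors):
            n_grow = round(freed * kept / total)
            flat_w, flat_m = weight.view(-1), mask.view(-1)
            zeros = (flat_m == 0).nonzero(as_tuple=True)[0]
            if n_grow == 0 or len(zeros) == 0:
                continue
            pick = zeros[torch.randperm(len(zeros))[:n_grow]]
            flat_m[pick] = 1.0
            flat_w[pick] = 0.0  # regrown connections start at zero
```

Between reallocation steps, each layer's weights are multiplied by its mask after every optimizer update (e.g. `weight.mul_(mask)`), so pruned connections stay at zero and the total parameter count stays within budget.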

Related Papers

Emerging Paradigms of Neural Network Pruning (Neural Networks, Thu Mar 11 2021)

Dynamic Sparse Training: Find Efficient Sparse Network From Scratch With Trainable Masked Layers (Machine Learning, Thu May 14 2020)
Dynamic Sparse Training jointly finds the optimal network parameters and the sparse network structure, making fine-grained layer-wise adjustments dynamically via backpropagation. The algorithm can train very sparse neural network models with little performance loss.
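
As a rough sketch of the trainable-mask idea summarized above, the layer below derives its binary mask from a learnable per-row threshold through a hard step function whose gradient is approximated by a straight-through estimator. The class names and the plain identity straight-through gradient are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BinaryStep(torch.autograd.Function):
    """Hard 0/1 threshold forward, straight-through gradient backward."""

    @staticmethod
    def forward(ctx, x):
        return (x > 0).float()

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: pass the gradient through unchanged so
        # the thresholds (and hence the sparsity pattern) can be trained
        # end to end together with the weights.
        return grad_output

class MaskedLinear(nn.Module):
    """Linear layer whose sparsity is controlled by a trainable per-row threshold."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))
        # One learnable threshold per output unit; raising it prunes more weights.
        self.threshold = nn.Parameter(torch.zeros(out_features, 1))

    def forward(self, x):
        mask = BinaryStep.apply(self.weight.abs() - self.threshold)
        return F.linear(x, self.weight * mask, self.bias)
```

In practice such a layer is paired with a sparsity-encouraging regularizer (for example, one that rewards larger thresholds); otherwise nothing pushes the thresholds up and the mask stays dense.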

Data-Driven Sparse Structure Selection for Deep Neural Networks (Neural Networks, Wed Jul 05 2017)
Deep convolutional neural networks have demonstrated extraordinary power on various tasks. However, it is still very challenging to deploy state-of-the-art models in real-world applications due to their high computational complexity. In this paper, we propose a simple and effective framework to …

Winning the Lottery with Continuous Sparsification (Machine Learning, Tue Dec 10 2019)
The recent Lottery Ticket Hypothesis conjectures that, for a typically-sized neural network, it is possible to find small sub-networks that match the performance of the original dense counterpart. We revisit fundamental aspects of pruning and point out missing ingredients in previous approaches. We then …

The Difficulty of Training Sparse Neural Networks (Machine Learning, Tue Jun 25 2019)
Recent work has shown that sparse ResNet-50 architectures converge to solutions that are significantly worse than those found by pruning. The findings suggest that traversing extra dimensions may be needed to escape the stationary points encountered in the sparse subspace.

Do We Actually Need Dense Over-Parameterization? In-Time Over-Parameterization in Sparse Training (Machine Learning, Thu Feb 04 2021)
In this paper, we introduce a new perspective on training deep neural networks capable of state-of-the-art performance without the need for expensive over-parameterization. We propose the concept of In-Time Over-Parameterization (ITOP) in sparse training.