Published on Mon Nov 07 2016

Trusting SVM for Piecewise Linear CNNs

Leonard Berrada, Andrew Zisserman, M. Pawan Kumar

Abstract

We present a novel layerwise optimization algorithm for the learning objective of Piecewise-Linear Convolutional Neural Networks (PL-CNNs), a large class of convolutional neural networks. Specifically, PL-CNNs employ piecewise linear non-linearities such as the commonly used ReLU and max-pool, and an SVM classifier as the final layer. The key observation of our approach is that the problem corresponding to the parameter estimation of a layer can be formulated as a difference-of-convex (DC) program, which happens to be a latent structured SVM. We optimize the DC program using the concave-convex procedure, which requires us to iteratively solve a structured SVM problem. This allows us to design an optimization algorithm with an optimal learning rate that does not require any tuning. Using the MNIST, CIFAR and ImageNet data sets, we show that our approach always improves over state-of-the-art variants of backpropagation and scales to large data and large network settings.
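
The recipe sketched in the abstract (linearize the concave part of a difference-of-convex objective, then solve the resulting convex problem without tuning a learning rate) is the concave-convex procedure (CCCP). The toy sketch below illustrates that loop on a small DC program, the ramp loss of a linear binary classifier; the inner convex problem is handed to a generic solver (cvxpy) as a stand-in for the paper's structured SVM solver, so this is an illustration of the principle only, not the authors' layerwise PL-CNN algorithm.

```python
# Toy CCCP loop on a difference-of-convex (DC) objective g(w) - h(w):
#   g(w) = lam/2 ||w||^2 + mean_i max(0, 1 - y_i w.x_i)   (convex, SVM-style)
#   h(w) = mean_i max(0, -y_i w.x_i)                       (convex)
# so g - h is the non-convex ramp loss plus regularization.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)

# Synthetic data for a linear binary classifier.
n, d, lam = 200, 5, 0.1
X = rng.normal(size=(n, d))
y = np.sign(X @ rng.normal(size=d) + 0.1 * rng.normal(size=n))

def h_subgradient(w):
    """A subgradient of h at w: the linearization point used by CCCP."""
    active = (y * (X @ w) < 0).astype(float)   # which linear piece of h is active
    return -(active * y) @ X / n

w = np.zeros(d)
for it in range(10):
    s = h_subgradient(w)                       # step 1: linearize the concave part -h
    # Step 2: solve the convex subproblem min_w g(w) - s.w exactly
    # (an SVM with a linear perturbation); no learning rate is involved.
    w_var = cp.Variable(d)
    hinge = cp.sum(cp.pos(1 - cp.multiply(y, X @ w_var))) / n
    objective = lam / 2 * cp.sum_squares(w_var) + hinge - s @ w_var
    cp.Problem(cp.Minimize(objective)).solve()
    w = w_var.value
    # The true DC objective is guaranteed to be monotonically non-increasing.
    g_val = lam / 2 * w @ w + np.maximum(0, 1 - y * (X @ w)).mean()
    h_val = np.maximum(0, -y * (X @ w)).mean()
    print(f"iter {it}: DC objective = {g_val - h_val:.4f}")
```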

Sun Sep 04 2016
Machine Learning
Convexified Convolutional Neural Networks
We describe the class of convexified convolutional neural networks (CCNNs). CCNNs capture the parameter sharing of convolutional neural networks in a convex manner. For learning two-layer CNNs, we prove that the generalization error converges to that of…
Sat Dec 29 2018
Machine Learning
Greedy Layerwise Learning Can Scale to ImageNet
Shallow supervised 1-hidden layer neural networks have a number of favorable properties that make them easier to interpret, analyze, and optimize. Here we use 1-hidden layer learning problems to sequentially build deep networks layer by layer.
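
As a concrete reading of that recipe, the sketch below builds a small CNN stage by stage in PyTorch: each stage trains one new block together with an auxiliary linear classifier while all previously trained blocks stay frozen. The architecture, optimizer settings, and the synthetic data are placeholders of this sketch, not details taken from the paper.

```python
# Minimal greedy layerwise training sketch (PyTorch).
import torch
import torch.nn as nn

def train_stage(features, block, head, train_loader, epochs=1, lr=0.1):
    """Train `block` + auxiliary `head` with the earlier stages (`features`) frozen."""
    for p in features.parameters():
        p.requires_grad_(False)
    opt = torch.optim.SGD(list(block.parameters()) + list(head.parameters()), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in train_loader:
            with torch.no_grad():
                z = features(x)            # frozen representation built so far
            logits = head(block(z))        # the 1-hidden-layer auxiliary problem
            loss = loss_fn(logits, y)
            opt.zero_grad()
            loss.backward()
            opt.step()

# Synthetic stand-in data (replace with a real loader, e.g. CIFAR-10).
num_classes = 10
data = torch.randn(256, 3, 32, 32)
labels = torch.randint(0, num_classes, (256,))
train_loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(data, labels), batch_size=64)

# Build the deep network stage by stage (toy 3-stage CNN for 32x32 RGB inputs).
features = nn.Identity()
channels = [3, 64, 128, 256]
for k in range(3):
    block = nn.Sequential(nn.Conv2d(channels[k], channels[k + 1], 3, padding=1),
                          nn.ReLU(), nn.MaxPool2d(2))
    head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                         nn.Linear(channels[k + 1], num_classes))
    train_stage(features, block, head, train_loader)
    features = nn.Sequential(features, block)   # freeze and extend the stack
```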
Mon Nov 19 2018
Machine Learning
Deep Frank-Wolfe For Neural Network Optimization
The current practice in neural network optimization is to rely on the stochastic gradient descent (SGD) algorithm. We present an optimization method that empirically offers the best of both worlds. Our algorithm yields good generalization performance while requiring only one hyper-parameter.
Mon Feb 24 2020
Machine Learning
Neural Networks are Convex Regularizers: Exact Polynomial-time Convex Optimization Formulations for Two-layer Networks
We develop exact representations of training two-layer neural networks with rectified linear units (ReLUs) in terms of a single convex program. We show that ReLU networks trained with standard weight decay are equivalent to penalized convex models.
Tue Aug 15 2017
Computer Vision
Improved Regularization of Convolutional Neural Networks with Cutout
Convolutional neural networks are capable of learning powerful representational spaces. However, due to the model capacity required to capture such representations, they are often susceptible to overfitting. A simple regularization technique called cutout can be used to improve performance.
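
A short sketch of the cutout idea described above: during training, zero out a randomly placed square patch of each input image. The patch size, the clipping at image borders, and the NumPy interface are choices of this sketch, not taken from the paper.

```python
import numpy as np

def cutout(image: np.ndarray, size: int = 8, rng=np.random.default_rng()) -> np.ndarray:
    """Return a copy of `image` (H, W, C) with one size x size square zeroed out."""
    h, w = image.shape[:2]
    # Sample the patch centre; the patch may partially fall outside the image,
    # in which case it is clipped (as in common implementations).
    cy, cx = rng.integers(h), rng.integers(w)
    y0, y1 = max(0, cy - size // 2), min(h, cy + size // 2)
    x0, x1 = max(0, cx - size // 2), min(w, cx + size // 2)
    out = image.copy()
    out[y0:y1, x0:x1] = 0
    return out

# Example: apply cutout to a random 32x32 RGB image during data augmentation.
img = np.random.rand(32, 32, 3).astype(np.float32)
aug = cutout(img, size=16)
```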
Mon Dec 21 2020
Machine Learning
LQF: Linear Quadratic Fine-Tuning
Classifiers that are linear in their parameters, and trained by optimizing a convex loss function, have predictable behavior with respect to changes in the training data, initial conditions, and optimization. Such desirable properties are absent in deep neural networks (DNNs) typically trained by non-linear…