Published on Sun Jan 06 2019

Solving large-scale L1-regularized SVMs and cousins: the surprising effectiveness of column and constraint generation

Antoine Dedieu, Rahul Mazumder

The linear Support Vector Machine (SVM) is one of the most popular binary classification techniques in machine learning. The state of algorithms for solving sparsity-penalized SVM problems, which can be expressed as Linear Programs (LPs), is rather nascent when compared to the usual L2-regularized linear SVM. To this end, we propose new computational algorithms for these LPs.

Abstract

The linear Support Vector Machine (SVM) is one of the most popular binary classification techniques in machine learning. Motivated by applications in modern high-dimensional statistics, we consider penalized SVM problems involving the minimization of a hinge-loss function with a convex sparsity-inducing regularizer such as: the L1-norm on the coefficients, its grouped generalization, and the sorted L1-penalty (aka Slope). Each problem can be expressed as a Linear Program (LP) and is computationally challenging when the number of features and/or samples is large; the current state of algorithms for these problems is rather nascent when compared to the usual L2-regularized linear SVM. To this end, we propose new computational algorithms for these LPs by bringing together techniques from (a) classical column (and constraint) generation methods and (b) first-order methods for non-smooth convex optimization. Although each family of techniques is useful on its own, they are rarely used together in the context of solving large-scale LPs such as the ones studied here. Our approach complements the strengths of (a) and (b), leading to a scheme that appears to outperform commercial solvers as well as specialized implementations for these problems by orders of magnitude. We present numerical results on a series of real and synthetic datasets demonstrating the surprising effectiveness of classic column/constraint generation methods in the context of challenging LP-based machine learning tasks.
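For concreteness, the L1-regularized SVM minimizes sum_i max(0, 1 - y_i(x_i'beta + beta0)) + lambda*||beta||_1, which becomes an LP after splitting beta into nonnegative parts and adding one slack variable per hinge term. The Python sketch below illustrates the column-and-constraint generation idea described in the abstract: solve a small restricted LP, then add any feature whose dual reduced cost is violated and any sample whose hinge constraint is violated, and repeat. This is a minimal sketch, not the authors' implementation; it uses SciPy's HiGHS solver, and the working-set initialization, tolerances, and iteration cap are assumptions made for the example.

import numpy as np
from scipy.optimize import linprog

def l1_svm_cg(X, y, lam, tol=1e-6, max_rounds=50):
    """Column-and-constraint generation sketch for the L1-SVM LP.

    Variable block of the restricted LP: [beta_plus, beta_minus, beta0, xi].
    Only a working set of features (columns) and samples (hinge constraints)
    is included; violated ones are added and the LP is re-solved.
    """
    n, p = X.shape
    feats = set(range(min(10, p)))      # initial column working set (assumption)
    samps = set(range(min(100, n)))     # initial constraint working set (assumption)
    beta, beta0 = np.zeros(p), 0.0
    for _ in range(max_rounds):
        F, S = sorted(feats), sorted(samps)
        Xr, yr = X[np.ix_(S, F)], y[S]
        nf, ns = len(F), len(S)
        # min lam*1'(b+ + b-) + 1'xi  s.t.  y_i(x_i'b + b0) + xi_i >= 1, b+-, xi >= 0
        c = np.concatenate([lam * np.ones(2 * nf), [0.0], np.ones(ns)])
        A_ub = np.hstack([-yr[:, None] * Xr, yr[:, None] * Xr,
                          -yr[:, None], -np.eye(ns)])
        b_ub = -np.ones(ns)
        bounds = [(0, None)] * (2 * nf) + [(None, None)] + [(0, None)] * ns
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
        bp, bm = res.x[:nf], res.x[nf:2 * nf]
        beta = np.zeros(p)
        beta[F] = bp - bm
        beta0 = res.x[2 * nf]
        # Constraint generation: add samples whose hinge constraint is violated.
        margins = y * (X @ beta + beta0)
        viol_s = set(np.where(margins < 1 - tol)[0]) - samps
        # Column generation: add features with violated reduced costs, i.e.
        # |sum_i alpha_i y_i x_ij| > lam, where alpha are the hinge-constraint duals.
        alpha = np.zeros(n)
        alpha[S] = -res.ineqlin.marginals
        viol_f = set(np.where(np.abs(X.T @ (alpha * y)) > lam + tol)[0]) - feats
        if not viol_s and not viol_f:
            break
        samps |= viol_s
        feats |= viol_f
    return beta, beta0

On data where the L1 penalty keeps only a few coefficients active, the restricted LPs stay much smaller than the full problem, which is the effect the abstract attributes to column/constraint generation.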

Wed Apr 03 2013
Artificial Intelligence
A Novel Frank-Wolfe Algorithm. Analysis and Applications to Large-Scale SVM Training
The Frank-Wolfe (FW) method has been successfully applied to train large-scale instances of non-linear Support Vector Machines (SVMs). This paper presents a novel variant of the FW method based on a new way to perform away steps.
Fri Jan 17 2020
Machine Learning
Learning Sparse Classifiers: Continuous and Mixed Integer Optimization Perspectives
Recent work has shown that mixed integer programming (MIP) can be used to solve (to optimality) L0-regularized regression problems much larger than what was conventionally considered possible. MIP-based global optimization approaches are significantly slower than the relatively mature algorithms for nonconvex regularized problems.
Tue Sep 21 2010
Machine Learning
Safe Feature Elimination for the LASSO and Sparse Supervised Learning Problems
We describe a fast method to eliminate features (variables) in l1-penalized least-squares regression (or LASSO) problems. The elimination of features leads to a potentially substantial reduction in running time, especially for large values of the penalty parameter.
Thu Oct 03 2019
Machine Learning
A sparse semismooth Newton based augmented Lagrangian method for large-scale support vector machines
Support vector machines (SVMs) are successful modeling and prediction tools with a variety of applications. Previous work has demonstrated the superiority of SVMs in dealing with high-dimensional, low-sample-size problems.
Sun Nov 16 2014
Machine Learning
HIPAD - A Hybrid Interior-Point Alternating Direction algorithm for knowledge-based SVM and feature selection
We propose a new hybrid optimization algorithm that solves the elastic-net support vector machine (SVM). We demonstrate the effectiveness and efficiency of our algorithm and compare it with existing methods on a collection of synthetic and real-world data.
Tue Jul 26 2011
Artificial Intelligence
Submodular Optimization for Efficient Semi-supervised Support Vector Machines
In this work we present a quadratic programming approximation of the Semi-Supervised Support Vector Machine (S3VM) problem. We prove that this approximate formulation establishes a relation between low-density separation and the graph-based models of semi-supervised learning.