Published on Mon Feb 27 2017

Communication-efficient Algorithms for Distributed Stochastic Principal Component Analysis

Dan Garber, Ohad Shamir, Nathan Srebro

Principal Component Analysis (PCA) is a technique for estimating the leading principal component of the population covariance matrix. PCA can be done in a distributed setting in which each machine out of m stores a sample of n points.

Abstract

We study the fundamental problem of Principal Component Analysis in a statistical distributed setting in which each machine out of m stores a sample of n points sampled i.i.d. from a single unknown distribution. We study algorithms for estimating the leading principal component of the population covariance matrix that are both communication-efficient and achieve estimation error of the order of the centralized ERM solution that uses all mn samples. On the negative side, we show that, in contrast to results obtained for distributed estimation under convexity assumptions, for the PCA objective simply averaging the local ERM solutions cannot guarantee error that is consistent with the centralized ERM. We show that this unfortunate phenomenon can be remedied by performing a simple correction step which correlates the individual solutions, and provides an estimator that is consistent with the centralized ERM for sufficiently large n. We also introduce an iterative distributed algorithm that is applicable in any regime of n and is based on distributed matrix-vector products. The algorithm gives significant acceleration in terms of communication rounds over previous distributed algorithms, in a wide regime of parameters.
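
To make the two ideas in the abstract concrete, here is a minimal, hypothetical NumPy sketch (not the authors' code): it simulates m machines on toy data, (1) averages the local leading eigenvectors after a simple sign-alignment correction, and (2) runs an iterative scheme built only from distributed covariance-vector products (plain distributed power iteration, one communication round per iteration). All variable names, the toy data, and the particular correction used are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, d = 8, 500, 20  # machines, points per machine, dimension

# Toy i.i.d. data: every machine draws n points from the same distribution,
# whose covariance has a well-separated leading eigenvector (the first axis).
scales = np.linspace(2.0, 0.5, d)
machines = [rng.standard_normal((n, d)) * scales for _ in range(m)]

def local_top_eigvec(X):
    """Leading eigenvector of the local empirical covariance (local ERM)."""
    cov = X.T @ X / X.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    return eigvecs[:, -1]

# (1) Averaging with a correction: eigenvectors are only defined up to sign,
# so naive averaging can cancel; align each local solution against a reference
# before averaging (an illustrative stand-in for the paper's correction step).
local_solutions = [local_top_eigvec(X) for X in machines]
reference = local_solutions[0]
aligned = [v if v @ reference >= 0 else -v for v in local_solutions]
v_avg = np.mean(aligned, axis=0)
v_avg /= np.linalg.norm(v_avg)

# (2) Iterative scheme from distributed matrix-vector products: each round,
# every machine returns (X_i^T X_i / n) v; the center averages and normalizes.
v = rng.standard_normal(d)
v /= np.linalg.norm(v)
for _ in range(50):  # one communication round per iteration
    v = np.mean([X.T @ (X @ v) / n for X in machines], axis=0)
    v /= np.linalg.norm(v)

print("alignment of the two estimates:", abs(v_avg @ v))
```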

Mon May 05 2014
Machine Learning
Optimality guarantees for distributed statistical estimation
Large data sets often require performing distributed statistical estimation. We define and study some refinements of the classical minimax risk that apply to distributed settings. We establish lower bounds for a variety of problems, including location estimation in several families.
Sat Sep 05 2020
Machine Learning
Communication-efficient distributed eigenspace estimation
Distributed computing is a standard way to scale up machine learning and data science algorithms. In such settings, avoiding communication amongst machines is paramount for achieving high performance. Here, we develop a communication-efficient distributed algorithm for computing the leading invariant subspace of a data matrix.
Mon Dec 30 2013
Machine Learning
Communication Efficient Distributed Optimization using an Approximate Newton-type Method
We present a novel Newton-type method for distributed optimization. The method is particularly well suited for stochastic optimization and learning problems. For quadratic objectives, the method enjoys a linear rate of convergence.
Fri Aug 27 2021
Machine Learning
FAST-PCA: A Fast and Exact Algorithm for Distributed Principal Component Analysis
Principal Component Analysis (PCA) is a fundamental data preprocessing tool in machine learning. The enormous dimensionality and sample sizes of modern data sets have rendered centralized PCA solutions unusable. This paper proposes a distributed PCA algorithm called FAST-PCA.
Wed Sep 19 2012
Machine Learning
Communication-Efficient Algorithms for Statistical Optimization
We analyze two communication-efficient algorithms for distributed statistical optimization on large-scale data sets. The first algorithm is a standard averaging method that distributes the data samples evenly. The second algorithm is based on an appropriate form of bootstrap subsampling. We show that our methods can be used to efficiently solve an advertisement prediction problem from the Chinese SoSo Search Engine.
Fri Jan 17 2020
Machine Learning
Communication-Efficient Distributed Estimator for Generalized Linear Models with a Diverging Number of Covariates
A novel method is proposed to obtain an asymptotically efficient estimator for large-scale distributed data sets. In this method, the assumption on the number of servers is relaxed, making it practical for real-world applications.