Published on Wed Dec 04 2013

Faster and Sample Near-Optimal Algorithms for Proper Learning Mixtures of Gaussians

Constantinos Daskalakis, Gautam Kamath

We provide an algorithm for properly learning mixtures of two single-dimensional Gaussians without any separability assumptions. Our sample complexity is optimal up to logarithmic factors. Our algorithm significantly improves upon Kalai et al., whose algorithm has a prohibitive dependence on $1/\varepsilon$.

Abstract

We provide an algorithm for properly learning mixtures of two single-dimensional Gaussians without any separability assumptions. Given $\tilde{O}(1/\varepsilon^2)$ samples from an unknown mixture, our algorithm outputs a mixture that is $\varepsilon$-close in total variation distance, in time $\tilde{O}(1/\varepsilon^5)$. Our sample complexity is optimal up to logarithmic factors, and significantly improves upon both Kalai et al., whose algorithm has a prohibitive dependence on $1/\varepsilon$, and Feldman et al., whose algorithm requires bounds on the mixture parameters and depends pseudo-polynomially on these parameters. One of our main contributions is an improved and generalized algorithm for selecting a good candidate distribution from among competing hypotheses. Namely, given a collection of $N$ hypotheses containing at least one candidate that is $\varepsilon$-close to an unknown distribution, our algorithm outputs a candidate which is $O(\varepsilon)$-close to the distribution. The algorithm requires $O(\log N/\varepsilon^2)$ samples from the unknown distribution and $O(N \log N/\varepsilon^2)$ time, which improves previous such results (such as the Scheffé estimator) from a quadratic dependence of the running time on $N$ to quasilinear. Given the wide use of such results for the purpose of hypothesis selection, our improved algorithm implies immediate improvements to any such use.
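To make the hypothesis-selection step concrete, below is a minimal Python sketch of the classical Scheffé-style tournament that the paper improves upon: it runs all pairwise tests, so its running time is quadratic in the number of hypotheses $N$, whereas the paper's procedure is quasilinear. The hypotheses here are single Gaussians purely for illustration, and the function names (scheffe_winner, select_hypothesis) are made up for this sketch, not taken from the paper.

import numpy as np
from scipy.stats import norm

def scheffe_winner(p, q, samples, grid):
    """One Scheffe test between candidate densities p and q.
    The Scheffe set is A = {x : p(x) > q(x)}; the candidate whose predicted
    mass of A is closer to its empirical mass under the samples wins."""
    on_A = p.pdf(grid) > q.pdf(grid)                # indicator of A on a grid
    dx = grid[1] - grid[0]
    p_mass = np.sum(p.pdf(grid)[on_A]) * dx         # p's predicted probability of A
    q_mass = np.sum(q.pdf(grid)[on_A]) * dx         # q's predicted probability of A
    emp = np.mean(p.pdf(samples) > q.pdf(samples))  # empirical mass of A
    return p if abs(p_mass - emp) <= abs(q_mass - emp) else q

def select_hypothesis(hypotheses, samples, grid):
    """Quadratic-time tournament: return the hypothesis with the most pairwise
    Scheffe wins.  The paper's contribution is achieving the same O(eps)
    guarantee with a quasilinear, rather than quadratic, dependence on N."""
    wins = np.zeros(len(hypotheses))
    for i in range(len(hypotheses)):
        for j in range(i + 1, len(hypotheses)):
            w = scheffe_winner(hypotheses[i], hypotheses[j], samples, grid)
            wins[i if w is hypotheses[i] else j] += 1
    return hypotheses[int(np.argmax(wins))]

# Toy usage: candidates N(mu, 1); the unknown distribution is N(0.1, 1).
rng = np.random.default_rng(0)
samples = rng.normal(0.1, 1.0, size=5000)
grid = np.linspace(-10, 10, 4001)
candidates = [norm(loc=m, scale=1.0) for m in (-1.0, 0.0, 0.5, 2.0)]
print(select_hypothesis(candidates, samples, grid).mean())  # expected: 0.0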

Sat Apr 19 2014
Machine Learning
Tight bounds for learning a mixture of two Gaussians
We consider the problem of identifying the parameters of an unknown mixture of two arbitrary $d$-dimensional Gaussians from a sequence of independent random samples. Our main results are upper and lower bounds giving a computationally efficient moment-based estimator with an optimal convergence rate.
Fri Apr 23 2010
Machine Learning
Settling the Polynomial Learnability of Mixtures of Gaussians
The algorithm builds on the work of Kalai et al. [STOC 2010], which gives an efficient algorithm for learning mixtures of two Gaussians. It employs hierarchical clustering and rescaling, with delicate methods for backtracking.
Thu Sep 01 2016
Machine Learning
Ten Steps of EM Suffice for Mixtures of Two Gaussians
The Expectation-Maximization (EM) algorithm is a widely used method for maximum likelihood estimation in models with latent variables. We show that the population version of EM, where the algorithm is given access to infinitely many samples, converges to the correct mean vectors.
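For reference, here is a minimal sketch of the EM update in the setting this line of work studies, assuming a balanced mixture of two unit-variance Gaussians with unknown means. This is finite-sample EM, whereas the paper analyzes the population update; all names are illustrative.

import numpy as np

def em_two_gaussians(x, mu1, mu2, steps=10):
    """Run `steps` EM iterations for a 50/50 mixture of N(mu1, 1) and N(mu2, 1)."""
    for _ in range(steps):
        # E-step: posterior probability that each point came from component 1
        logit = -0.5 * ((x - mu1) ** 2 - (x - mu2) ** 2)
        w1 = 1.0 / (1.0 + np.exp(-logit))
        w2 = 1.0 - w1
        # M-step: update the means as responsibility-weighted averages
        mu1 = np.sum(w1 * x) / np.sum(w1)
        mu2 = np.sum(w2 * x) / np.sum(w2)
    return mu1, mu2

rng = np.random.default_rng(1)
n = 20000
z = rng.integers(0, 2, size=n)                                    # balanced labels
x = np.where(z == 0, rng.normal(-2.0, 1.0, n), rng.normal(2.0, 1.0, n))
print(em_two_gaussians(x, mu1=-0.5, mu2=0.5, steps=10))           # near (-2, 2)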
Sat Oct 14 2017
Machine Learning
Near-optimal Sample Complexity Bounds for Robust Learning of Gaussian Mixtures via Compression Schemes
We prove that mixtures of Gaussians can be learned with a near-optimal number of samples via a novel compression scheme, improving both the known upper and lower bounds for this problem. Any class of distributions that admits such a compression scheme can also be learned with few samples.
Sun Jul 12 2020
Machine Learning
Robust Learning of Mixtures of Gaussians
We resolve one of the major outstanding problems in robust statistics. In particular, if $X$ is an evenly weighted mixture of two arbitrary $d$-dimensional Gaussians, we devise a polynomial-time algorithm that robustly learns $X$.
Wed Feb 19 2014
Machine Learning
Near-optimal-sample estimators for spherical Gaussian mixtures
We provide the first sample-efficient polynomial-time estimator for high-dimensional spherical Gaussian mixtures. The dependence on the number of mixture components is polynomial in the sample complexity and exponential (as a constant factor) in the time complexity.