Published on Wed Oct 29 2014

High-Performance Distributed ML at Scale through Parameter Server Consistency Models

Wei Dai, Abhimanu Kumar, Jinliang Wei, Qirong Ho, Garth Gibson, Eric P. Xing

As Machine Learning (ML) applications grow in data size and model complexity, effective use of clusters requires considerable expertise in writing distributed code, while highly-abstracted frameworks sacrifice performance. The Parameter Server paradigm is a middle ground between these extremes.

Abstract

As Machine Learning (ML) applications increase in data size and model complexity, practitioners turn to distributed clusters to satisfy the increased computational and memory demands. Unfortunately, effective use of clusters for ML requires considerable expertise in writing distributed code, while highly-abstracted frameworks like Hadoop have not, in practice, approached the performance seen in specialized ML implementations. The recent Parameter Server (PS) paradigm is a middle ground between these extremes, allowing easy conversion of single-machine parallel ML applications into distributed ones, while maintaining high throughput through relaxed "consistency models" that allow inconsistent parameter reads. However, due to insufficient theoretical study, it is not clear which of these consistency models can really ensure correct ML algorithm output; at the same time, there remain many theoretically-motivated but undiscovered opportunities to maximize computational throughput. Motivated by this challenge, we study both the theoretical guarantees and empirical behavior of iterative-convergent ML algorithms in existing PS consistency models. We then use the gleaned insights to improve a consistency model using an "eager" PS communication mechanism, and implement it as a new PS system that enables ML algorithms to reach their solution more quickly.
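To make the relaxed-consistency idea concrete, below is a minimal, hypothetical Python sketch of a bounded-staleness read path of the kind a Parameter Server might expose. The class and method names (ParamServer, inc, clock, get), the single-process threading setup, and the specific staleness rule are illustrative assumptions, not the authors' system or API.

import threading
from collections import defaultdict


# Minimal sketch (illustrative only, not the authors' system): a parameter
# server table with a bounded-staleness ("stale synchronous parallel"-style)
# read path. Workers apply additive updates, advance a per-worker clock once
# per iteration, and a read blocks only when the caller has run more than
# `staleness` iterations ahead of the slowest worker.
class ParamServer:
    def __init__(self, num_workers, staleness):
        self.params = defaultdict(float)   # global parameter table: key -> value
        self.clocks = [0] * num_workers    # per-worker iteration counters
        self.staleness = staleness         # maximum allowed clock gap
        self.cond = threading.Condition()

    def inc(self, key, delta):
        # Additive updates commute, so they can be applied in any order.
        with self.cond:
            self.params[key] += delta

    def clock(self, worker_id):
        # Worker signals that it finished one iteration.
        with self.cond:
            self.clocks[worker_id] += 1
            self.cond.notify_all()

    def get(self, worker_id, key):
        # Serve a (possibly stale) read; block only if this worker is more
        # than `staleness` iterations ahead of the slowest worker.
        with self.cond:
            while self.clocks[worker_id] - min(self.clocks) > self.staleness:
                self.cond.wait()
            return self.params[key]

With staleness set to 0 this degenerates to bulk synchronous execution; a larger bound lets fast workers read slightly stale parameters instead of waiting, trading parameter freshness for throughput, which is the kind of consistency/throughput trade-off the abstract studies.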

Mon Dec 30 2013
Machine Learning
Petuum: A New Platform for Distributed Machine Learning on Big Data
Modern ML strategies employ fine-grained operations and scheduling. The variety of approaches tends to pull systems and algorithms in different directions. It remains difficult to find a universal platform applicable to a wide range of ML programs at scale.
Mon Feb 03 2020
Machine Learning
Dynamic Parameter Allocation in Parameter Servers
Distributed machine learning algorithms use techniques that increase parameter access locality (PAL), achieving up to linear speed-ups. We found that existing parameter servers provide only limited support for PAL techniques and therefore prevent efficient training. We propose to integrate dynamic parameter allocation into parameter servers.
Thu Dec 31 2015
Machine Learning
Strategies and Principles of Distributed Machine Learning on Big Data
The rise of Big Data has led to new demands for Machine Learning (ML) systems. Running ML algorithms on distributed clusters of tens to thousands of machines requires significant engineering effort. We discuss a series of principles and strategies from our recent efforts on industrial-scale ML solutions.
Mon Oct 08 2018
Machine Learning
Toward Understanding the Impact of Staleness in Distributed Machine Learning
Many distributed machine learning (ML) systems adopt non-synchronous execution in order to alleviate the network communication bottleneck. This results in stale parameters that do not reflect the latest updates. Despite much development in large-scale ML, the effects of staleness on learning remain inconclusive.
Wed Dec 09 2015
Machine Learning
Efficient Distributed SGD with Variance Reduction
Stochastic Gradient Descent (SGD) has become one of the most popular optimization methods for training machine learning models on massive datasets. However, SGD suffers from two main drawbacks, the first of which is that its noisy gradient updates have high variance, which slows down convergence as the iterates approach the optimum.
Tue Aug 04 2015
Machine Learning
Parameter Database: Data-centric Synchronization for Scalable Machine Learning
We propose a new data-centric synchronization framework for carrying out ML tasks in a distributed environment. Our framework exploits the iterative nature of ML algorithms and relaxes the application-agnostic bulk synchronous parallel (BSP) paradigm.