Published on Mon Feb 17 2020

STANNIS: Low-Power Acceleration of Deep Neural Network Training Using Computational Storage

Ali HeydariGorji, Mahdi Torabzadehkashi, Siavash Rezaei, Hossein Bobarshad, Vladimir Alves, Pai H. Chou

This paper proposes a framework for distributed, in-storage training of neural networks on clusters of computational storage devices. Such devices not only contain hardware accelerators but also eliminate data movement between the host and storage. Experimental results show up to 2.7x speedup and 69% reduction in energy consumption.

Abstract

This paper proposes a framework for distributed, in-storage training of neural networks on clusters of computational storage devices. Such devices not only contain hardware accelerators but also eliminate data movement between the host and storage, resulting in both improved performance and power savings. More importantly, this in-storage processing style of training ensures that private data never leaves the storage while fully controlling the sharing of public data. Experimental results show up to 2.7x speedup and a 69% reduction in energy consumption, with no significant loss in accuracy.
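The paper does not include code here, but the data-parallel pattern the abstract describes can be sketched with off-the-shelf tools. The snippet below is an illustrative approximation, not the authors' implementation: it uses Horovod with Keras (both appear among the related work below) and assumes one worker per storage device, synthetic stand-in data, and a placeholder model. In the STANNIS setting each worker would instead read the shard that physically resides on its own computational storage device, so only gradients ever cross the network.

    # Illustrative sketch only (assumptions: Horovod/Keras, one worker per
    # computational storage device, synthetic data, placeholder model).
    import numpy as np
    import tensorflow as tf
    import horovod.tensorflow.keras as hvd

    hvd.init()  # one Horovod worker per computational storage device

    # Stand-in for the data shard that physically resides on this worker's
    # drive; random tensors keep the sketch self-contained and runnable.
    x = np.random.rand(2048, 32, 32, 3).astype("float32")
    y = np.random.randint(0, 10, size=(2048,))
    dataset = (tf.data.Dataset.from_tensor_slices((x, y))
               .shuffle(2048).batch(64).repeat())

    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(16, 3, activation="relu",
                               input_shape=(32, 32, 3)),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])

    # The wrapped optimizer averages gradients across workers each step, so
    # only gradients leave the device, never the raw (private) samples.
    opt = hvd.DistributedOptimizer(tf.keras.optimizers.SGD(0.01 * hvd.size()))
    model.compile(optimizer=opt,
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    model.fit(dataset,
              steps_per_epoch=32,
              epochs=3,
              callbacks=[hvd.callbacks.BroadcastGlobalVariablesCallback(0)],
              verbose=1 if hvd.rank() == 0 else 0)

Launched with, for example, "horovodrun -np 4 python train.py", each of the four workers would train on its own local shard and exchange only averaged gradients, which is the behavior the abstract's privacy claim relies on.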

Thu Jul 16 2020
Machine Learning
HyperTune: Dynamic Hyperparameter Tuning For Efficient Distribution of DNN Training Over Heterogeneous Systems
Distributed training is a novel approach to accelerating Deep Neural Network (DNN) training. Stannis is a DNN training framework that improves on the shortcomings of existing distributed training frameworks. Experimental results show up to 3.1x improvement in performance and 2.45x reduction in energy consumption.
Fri Nov 01 2019
Machine Learning
Progressive Compressed Records: Taking a Byte out of Deep Learning Data
Deep learning accelerators efficiently train over vast and growing amounts of data. A common approach to conserve bandwidth involves resizing or compressing data before training. We introduce Progressive Compressed Records (PCRs), a data format that uses compression to reduce the overhead of fetching and transporting data.
Wed Feb 10 2021
Artificial Intelligence
Hybrid In-memory Computing Architecture for the Training of Deep Neural Networks
The cost involved in training deep neural networks (DNNs) has motivated the development of novel solutions for efficient DNN training accelerators. We propose a hybrid in-memory computing (HIC) architecture for the training of DNNs on hardware accelerators that results in memory-efficient inference and outperforms software accuracy.
Mon Feb 18 2019
Neural Networks
Beyond the Memory Wall: A Case for Memory-centric HPC System for Deep Learning
As deep learning (DL) models and the datasets used to train them scale, the limited physical memory inside the accelerator device constrains the algorithms that can be studied. Our proposal aggregates a pool of memory modules locally within the device-side interconnect.
Wed Oct 10 2018
Machine Learning
LIRS: Enabling efficient machine learning on NVM-based storage via a lightweight implementation of random shuffling
Machine learning algorithms, such as Support Vector Machine (SVM) and Deep Neural Network (DNN), have gained a lot of interest recently. When training a machine learning algorithm, randomly shuffling all the training data can improve testing accuracy and boost the convergence rate. However, realizing training
Wed Oct 12 2016
Neural Networks
Optimizing Memory Efficiency for Deep Convolutional Neural Networks on GPUs
Deep Convolutional Neural Networks (CNNs) achieve state-of-the-art recognition accuracy. Due to the substantial compute and memory operations, they require significant execution time. The massive parallel computing capability of GPUs makes them one of the ideal platforms to accelerate CNNs.
Fri May 27 2016
Artificial Intelligence
TensorFlow: A system for large-scale machine learning
TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. It maps nodes of a dataflow graph across many machines in a cluster. TensorFlow supports a variety of applications, with particularly strong support for training and inference on deep neural networks.
Thu Dec 10 2015
Computer Vision
Deep Residual Learning for Image Recognition
Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. An ensemble of these residual nets achieves a 3.57% error on the ImageNet test set.
Wed Oct 19 2011
Machine Learning
A Reliable Effective Terascale Linear Learning System
We present a system and a set of techniques for learning linear predictors with convex losses on terascale datasets. The result is, up to our knowledge, the most scalable and efficient linear learning system reported in the literature.
Thu Feb 15 2018
Machine Learning
Horovod: fast and easy distributed deep learning in TensorFlow
Training modern deep learning models requires large amounts of computation. Scaling computation from one GPU to many can enable much faster training and research progress. Existing methods for enabling multi-GPU training under the TensorFlow library may entail non-negligible communication overhead.
Thu Jun 08 2017
Machine Learning
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
Deep learning thrives with large neural networks and large datasets. Larger networks and larger datasets result in longer training times. Distributed synchronous SGD offers a potential solution to this problem.
Wed Jul 24 2019
Machine Learning
Benchmarking TPU, GPU, and CPU Platforms for Deep Learning
Training deep learning models is compute-intensive and there is an industry-wide trend towards hardware specialization to improve performance. We introduce ParaDnn, a parameterized benchmark suite for deep learning that generates end-to-end models for fully connected (FC), convolutional (CNN),