Published on Mon May 24 2021

An In-Memory Analog Computing Co-Processor for Energy-Efficient CNN Inference on Mobile Devices

Mohammed Elbtity, Abhishek Singh, Brendan Reidy, Xiaochen Guo, Ramtin Zand

In this paper, we develop an in-memory analog computing (IMAC) architecture that can be used to realize a multilayer perceptron (MLP) classifier, and we propose a heterogeneous CPU-IMAC architecture for convolutional neural network (CNN) inference.

Abstract

In this paper, we develop an in-memory analog computing (IMAC) architecture realizing both synaptic behavior and activation functions within non-volatile memory arrays. Spin-orbit torque magnetoresistive random-access memory (SOT-MRAM) devices are leveraged to realize sigmoidal neurons as well as binarized synapses. First, it is shown that the proposed IMAC architecture can be utilized to realize a multilayer perceptron (MLP) classifier, achieving orders-of-magnitude performance improvement compared to previous mixed-signal and digital implementations. Next, a heterogeneous mixed-signal and mixed-precision CPU-IMAC architecture is proposed for convolutional neural network (CNN) inference on mobile processors, in which IMAC is designed as a co-processor that realizes fully-connected (FC) layers, whereas convolution layers are executed on the CPU. Architecture-level analytical models are developed to evaluate the performance and energy consumption of the CPU-IMAC architecture. Simulation results exhibit 6.5% and 10% energy savings for CPU-IMAC-based realizations of the LeNet and VGG CNN models on the MNIST and CIFAR-10 pattern recognition tasks, respectively.
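As a rough behavioral illustration of the mixed-precision split described above, the NumPy sketch below runs a full-precision convolution on the "CPU" side and a binarized, sigmoid-activated fully-connected layer standing in for the IMAC co-processor, followed by a first-order energy estimate. All function names, shapes, and per-MAC energy constants are illustrative assumptions, not values or code from the paper.

```python
# Minimal sketch of the CPU-IMAC split: convolution in full precision on the
# CPU, fully-connected layer with binarized (+1/-1) synapses and a sigmoidal
# neuron as a stand-in for the SOT-MRAM based IMAC co-processor. Energy
# constants and names are placeholders, not values from the paper.
import numpy as np

def conv_layer_cpu(x, w):
    """Full-precision 'valid' convolution of a 2-D input with one 2-D filter."""
    H, W = x.shape
    kh, kw = w.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return np.maximum(out, 0.0)          # ReLU on the CPU side

def fc_layer_imac(x, w_real):
    """Behavioral model of an IMAC FC layer: binarized synapses, sigmoidal neuron."""
    w_bin = np.sign(w_real)               # binarized synaptic weights
    z = w_bin @ x                         # idealized analog dot product
    return 1.0 / (1.0 + np.exp(-z))       # sigmoidal neuron response

def energy_estimate(n_conv_macs, n_fc_macs, e_mac_cpu=1.0, e_mac_imac=0.1):
    """First-order analytical model: fixed energy per MAC, depending on where it runs."""
    return n_conv_macs * e_mac_cpu + n_fc_macs * e_mac_imac

# Toy end-to-end pass on a random 8x8 input.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
feat = conv_layer_cpu(x, rng.standard_normal((3, 3))).ravel()
out = fc_layer_imac(feat, rng.standard_normal((10, feat.size)))
print(out.round(3), energy_estimate(feat.size * 9, 10 * feat.size))
```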

Tue Jul 06 2021
Machine Learning
CAP-RAM: A Charge-Domain In-Memory Computing 6T-SRAM for Accurate and Precision-Programmable CNN Inference
A compact, accurate, and bitwidth-programmable in-memory computing (IMC) macro, named CAP-RAM, is presented for energy-efficient convolutional neural network (CNN) inference. The adopted semi-parallel architecture efficiently stores filters from multiple CNN layers.
Tue Apr 16 2019
Machine Learning
Processing-In-Memory Acceleration of Convolutional Neural Networks for Energy-Efficiency, and Power-Intermittency Resilience
A bit-wise Convolutional Neural Network (CNN) in-memory accelerator is implemented using Spin-Orbit Torque Magnetic Random Access Memory. It utilizes a novel AND-Accumulation method that significantly reduces energy consumption within convolutional layers.
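As a loose software analogy for AND-based accumulation (not necessarily that paper's exact circuit-level scheme), the snippet below shows how a multiply-accumulate over {0,1} operands reduces to a bit-wise AND followed by a popcount; the packing helper and names are hypothetical.

```python
# Binary {0,1} dot product via AND + popcount: since 1*1 = 1 and anything
# times 0 is 0, multiply-accumulate reduces to bit-wise AND and bit counting.
def pack_bits(values):
    """Pack a list of 0/1 values into a single integer bit vector."""
    bits = 0
    for i, v in enumerate(values):
        if v:
            bits |= 1 << i
    return bits

def and_accumulate(a_bits, w_bits):
    """Binary dot product: AND the packed vectors, then count the set bits."""
    return bin(a_bits & w_bits).count("1")

a = [1, 0, 1, 1, 0]
w = [1, 1, 1, 0, 0]
assert and_accumulate(pack_bits(a), pack_bits(w)) == sum(x * y for x, y in zip(a, w))
```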
Mon Jun 22 2020
Neural Networks
Fully-parallel Convolutional Neural Network Hardware
Edge Artificial Intelligence, or Edge Intelligence, is beginning to receive a tremendous amount of interest from the machine learning community. The proposed architecture can address the difficult implementation challenges. Compared with traditional binary logic implementations, the SC-CNN architecture showed improvements of 19.6x in speed and 6.3x in energy efficiency.
Wed Oct 21 2020
Machine Learning
Ultra-low power on-chip learning of speech commands with phase-change memories
Embedding artificial intelligence at the edge (edge-AI) is an elegant solution to tackle the power and latency issues in the rapidly expanding Internet of Things. Edge devices typically spend most of their time in sleep mode and only wake up infrequently to process sensor data.
Wed Sep 08 2021
Neural Networks
Resistive Neural Hardware Accelerators
Deep Neural Networks (DNNs) make it possible to learn from real-world data and to make decisions in real time. However, their wide adoption is hindered by a number of software and hardware limitations. The shift towards ReRAM-based in-memory computing has great potential.
Fri Mar 13 2020
Machine Learning
DNN+NeuroSim V2.0: An End-to-End Benchmarking Framework for Compute-in-Memory Accelerators for On-chip Training
DNN+NeuroSim is an integrated framework to benchmark compute-in-memory (CIM) accelerators for deep neural networks. A Python wrapper is developed to interface NeuroSim with a popular machine learning platform: PyTorch.
Tue Feb 09 2016
Machine Learning
Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1
We introduce a method to train Binarized Neural Networks (BNNs) - neural networks with binary weights and activations at run-time. BNNs drastically reduce memory size and accesses, and replace most arithmetic operations with bit-wise operations.
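As a small illustration of how +1/-1 arithmetic maps to bit-wise operations (the general idea behind BNN inference, sketched here with made-up helper names rather than the authors' code), the snippet below packs sign vectors into integers and computes their dot product with XNOR and popcount.

```python
# +/-1 dot product via XNOR + popcount: encode -1 as bit 0 and +1 as bit 1,
# then dot(a, b) = 2 * (number of agreeing positions) - n.
def pack(signs):
    """Pack a list of +1/-1 values into an integer bit vector (+1 -> 1, -1 -> 0)."""
    bits = 0
    for i, s in enumerate(signs):
        if s > 0:
            bits |= 1 << i
    return bits

def xnor_dot(a_bits, b_bits, n):
    """Bit-wise dot product of two packed +/-1 vectors of length n."""
    xnor = ~(a_bits ^ b_bits) & ((1 << n) - 1)   # 1 where the signs agree
    matches = bin(xnor).count("1")               # popcount
    return 2 * matches - n                       # agreements minus disagreements

a = [+1, -1, +1, +1]
b = [+1, +1, -1, +1]
assert xnor_dot(pack(a), pack(b), len(a)) == sum(x * y for x, y in zip(a, b))
```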
Fri May 27 2016
Artificial Intelligence
TensorFlow: A system for large-scale machine learning
TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. It maps nodes of a dataflow graph across many machines in a cluster. TensorFlow supports a variety of applications, with particularly strong support for training and inference on deep neural networks.
Wed Mar 16 2016
Computer Vision
XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks
In Binary-Weight-Networks, the filters are approximated with binary values. This results in 32x memory saving and 58x faster convolutional operations. XNOR-Nets offer the possibility of running state-of-the-art networks in real-time.
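The binary-weight approximation behind that result replaces each real-valued filter W with alpha * B, where B = sign(W) and alpha is the mean absolute value of W. The short NumPy sketch below (illustrative names and test data, not the authors' code) shows the approximation and its reconstruction error.

```python
import numpy as np

# Binary-Weight-Network style approximation: W ~ alpha * B with B = sign(W)
# and alpha = mean(|W|). The random test filter is illustrative only.
def binarize_filter(w):
    """Return (B, alpha) such that alpha * B approximates the real-valued filter w."""
    alpha = float(np.mean(np.abs(w)))
    b = np.where(w >= 0, 1.0, -1.0)   # strictly +/-1, mapping zeros to +1
    return b, alpha

w = np.random.default_rng(1).standard_normal((3, 3))
b, alpha = binarize_filter(w)
print("reconstruction error:", np.linalg.norm(w - alpha * b))
```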
Mon Sep 16 2019
Neural Networks
High-Throughput In-Memory Computing for Binary Deep Neural Networks with Monolithically Integrated RRAM and 90nm CMOS
Deep learning hardware designs have been bottlenecked by conventional SRAM due to density, leakage and parallel computing challenges. We integrated a 128x64 RRAM array with CMOS peripheral circuits including row/column decoders and flash analog-to-digital converters. Prototype chip measurements show that the proposed design achieves high binary DNN accuracy of 98.5%.
Unknown
other
Spin-Torque Switching with the Giant Spin Hall Effect of Tantalum
Unknown
other
Mixed-precision in-memory computing