Published on Wed Mar 15 2017

Deep Embedding Forest: Forest-based Serving with Deep Embedding Features

Jie Zhu, Ying Shan, JC Mao, Dong Yu, Holakou Rahmanian, Yi Zhang

Abstract

Deep Neural Networks (DNN) have demonstrated a superior ability to extract high-level embedding vectors from low-level features. Despite this success, serving time remains a bottleneck due to the expensive run-time computation of multiple layers of dense matrices. GPGPU-, FPGA-, or ASIC-based serving systems require additional hardware that is not in the mainstream design of most commercial applications. In contrast, tree- or forest-based models are widely adopted because of their low serving cost, but they depend heavily on carefully engineered features. This work proposes a Deep Embedding Forest model that benefits from the best of both worlds. The model consists of a number of embedding layers and a forest/tree layer. The former maps high-dimensional (hundreds of thousands to millions) and heterogeneous low-level features to lower-dimensional (thousands) vectors, and the latter ensures fast serving. Built on top of a representative DNN model called Deep Crossing and two forest/tree-based models, XGBoost and LightGBM, a two-step Deep Embedding Forest algorithm is demonstrated to achieve on-par or slightly better performance than its DNN counterpart, with only a fraction of the serving time on conventional hardware. A comparison with a joint optimization algorithm called partial fuzzification, also proposed in this paper, shows that the two-step Deep Embedding Forest achieves near-optimal performance. Experiments on large-scale data sets (up to 1 billion samples) from a major sponsored search engine prove the efficacy of the proposed model.
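The two-step recipe in the abstract can be summarized in code. Below is a minimal sketch, assuming PyTorch for the embedding network and XGBoost for the forest layer; the names (`EmbeddingNet`, `fit_forest`) and all hyperparameters are illustrative, not from the paper's implementation.

```python
import torch
import torch.nn as nn
import xgboost as xgb

class EmbeddingNet(nn.Module):
    """Step 1: a Deep-Crossing-style network whose first layer embeds
    high-dimensional sparse features into low-dimensional dense vectors."""
    def __init__(self, vocab_size: int, embed_dim: int, n_classes: int = 2):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, embed_dim)   # embedding layer (kept for serving)
        self.mlp = nn.Sequential(                             # deep layers (discarded at serving)
            nn.Linear(embed_dim, 64), nn.ReLU(), nn.Linear(64, n_classes))

    def forward(self, feature_ids: torch.Tensor) -> torch.Tensor:
        return self.mlp(self.embed(feature_ids))

    @torch.no_grad()
    def embeddings(self, feature_ids: torch.Tensor) -> torch.Tensor:
        return self.embed(feature_ids)

# ... train EmbeddingNet end-to-end on the target task, then:

def fit_forest(net: EmbeddingNet, feature_ids, labels):
    """Step 2: freeze the embeddings, re-encode the data, and fit a
    gradient-boosted forest on the dense vectors for cheap serving."""
    X = net.embeddings(feature_ids).numpy()
    forest = xgb.XGBClassifier(n_estimators=200, max_depth=6)
    forest.fit(X, labels)
    return forest   # serving = embedding lookup + tree traversal
```

At serving time only the embedding lookup and the tree traversals run, which is what removes the dense matrix multiplications from the critical path.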

Wed Aug 04 2021
Artificial Intelligence
Random Offset Block Embedding Array (ROBE) for CriteoTB Benchmark MLPerf DLRM Model: 1000× Compression and 2.7× Faster Inference
State-of-the-art recommendation models are among the largest models in use, rivalling the likes of GPT-3 and Switch Transformer. Challenges in deep learning recommendation models (DLRM) stem from the dense embeddings stored for each of the categorical values.
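As a rough illustration of the idea in the title: instead of storing a full embedding table, rows are materialized from hashed offsets into one small shared array. The sketch below is a simplified assumption of that scheme; the hash choice and block handling are not the paper's exact design.

```python
import numpy as np

ARRAY_SIZE = 10_000          # shared memory array, far smaller than a full table
EMBED_DIM = 16
memory = np.random.randn(ARRAY_SIZE).astype(np.float32)  # learned parameters

def robe_lookup(token_id: int) -> np.ndarray:
    # Hash the token to a pseudo-random offset, then read a contiguous
    # block of EMBED_DIM values (with wraparound) as its embedding row.
    offset = hash((token_id, 0x9E3779B9)) % ARRAY_SIZE
    idx = (offset + np.arange(EMBED_DIM)) % ARRAY_SIZE
    return memory[idx]
```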
Mon Jun 10 2019
Machine Learning
BlockSwap: Fisher-guided Block Substitution for Network Compression on a Budget
BlockSwap is a fast algorithm for choosing networks with interleaved block types. It works by passing a single minibatch of training data through randomly initialised networks. These networks can then be used as students and distilled with the original network as a teacher.
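A minimal sketch of the single-minibatch scoring idea, assuming an empirical-Fisher-style score (sum of squared parameter gradients after one backward pass); BlockSwap's actual Fisher estimate differs in detail.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fisher_score(candidate: nn.Module, x: torch.Tensor, y: torch.Tensor) -> float:
    """Score a randomly initialised candidate network with one minibatch:
    backprop once and sum squared gradients as a cheap Fisher proxy."""
    candidate.zero_grad()
    loss = F.cross_entropy(candidate(x), y)
    loss.backward()
    return sum((p.grad ** 2).sum().item()
               for p in candidate.parameters() if p.grad is not None)

# Rank candidates under the parameter budget by fisher_score, then
# distill the winner with the original network as the teacher.
```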
Thu Mar 07 2019
Machine Learning
SLIDE : In Defense of Smart Algorithms over Hardware Acceleration for Large-Scale Deep Learning Systems
Deep Learning (DL) algorithms are the central focus of modern machine learning systems. SLIDE (Sub-LInear Deep learning Engine) blends smart randomized algorithms with multi-core parallelism and workload optimization. On the same CPU hardware, SLIDE is over 10x faster than TensorFlow.
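The "smart randomized algorithms" are locality-sensitive hash tables that retrieve only the neurons likely to be active for an input, avoiding the dense matmul. A minimal sketch using SimHash with illustrative sizes (the real engine maintains several tables and updates them during training):

```python
import numpy as np
from collections import defaultdict

D, N, BITS = 128, 10_000, 12          # input dim, neurons in layer, hash bits
W = np.random.randn(N, D)             # layer weights, one row per neuron
planes = np.random.randn(BITS, D)     # SimHash projection planes

def simhash(v: np.ndarray) -> tuple:
    return tuple((planes @ v) > 0)

table = defaultdict(list)             # bucket -> neuron ids, built once
for j in range(N):
    table[simhash(W[j])].append(j)

def sparse_forward(x: np.ndarray) -> dict:
    active = table[simhash(x)]        # only neurons whose hash collides with x
    return {j: W[j] @ x for j in active}   # a handful of dot products, not N
```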
Mon Oct 28 2019
Machine Learning
Extreme Classification in Log Memory using Count-Min Sketch: A Case Study of Amazon Search with 50M Products
Merged-Average Classifiers via Hashing (MACH) is a generic K-classification algorithm whose memory scales as O(log K). MACH is subtly a count-min sketch structure in disguise, which uses universal hashing to reduce classification with a large number of classes to a set of small, independent classification problems. Our largest model has 6.4 billion parameters and trains in less than 35 hours on a single p3.16xlarge instance.
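Concretely, the count-min connection looks like this: hash the K classes into B buckets with R independent 2-universal hashes, train R small B-way classifiers, and score a class by averaging the probabilities of its buckets. The sizes and hash family below are illustrative.

```python
import numpy as np

K, B, R = 50_000_000, 10_000, 16   # classes, buckets, hash repetitions

rng = np.random.default_rng(0)
p = 2_147_483_647                  # prime for the 2-universal family
a = rng.integers(1, p, size=R)     # h_r(c) = ((a_r * c + b_r) mod p) mod B
b = rng.integers(0, p, size=R)

def bucket(class_id: int, r: int) -> int:
    return int((a[r] * class_id + b[r]) % p % B)

def mach_score(bucket_probs: np.ndarray, class_id: int) -> float:
    # bucket_probs: (R, B) outputs of the R small classifiers for one query.
    return float(np.mean([bucket_probs[r, bucket(class_id, r)]
                          for r in range(R)]))
```

Memory is O(R·B) per layer instead of O(K); for fixed B, R only needs to grow roughly logarithmically in K to keep classes distinguishable, which is where the O(log K) scaling comes from.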
Mon Jan 25 2021
Machine Learning
TT-Rec: Tensor Train Compression for Deep Learning Recommendation Models
The memory capacity of embedding tables in deep learning recommendation models (DLRMs) is increasing dramatically from tens of GBs to TBs across the industry. Novel solutions are urgently needed to enable fast and efficient DLRM innovations. At the same time, this must be done without having to exponentially increase infrastructure capacity.
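The compression in the title comes from replacing each embedding table with a tensor-train factorization whose cores are contracted on demand per lookup. A minimal sketch with illustrative shapes and rank (TT-Rec itself factors indices into three dimensions and trains the cores directly):

```python
import numpy as np

N1, N2, D1, D2, RANK = 1000, 1000, 8, 8, 4   # table: (N1*N2) rows x (D1*D2) dims
G1 = np.random.randn(N1, D1, RANK)           # first TT core
G2 = np.random.randn(RANK, N2, D2)           # second TT core

def tt_embedding(row: int) -> np.ndarray:
    i1, i2 = divmod(row, N2)                 # factor the row index: row = i1*N2 + i2
    # Contract the cores along the TT rank to materialize one row on demand.
    return np.einsum('dr,re->de', G1[i1], G2[:, i2, :]).reshape(D1 * D2)
```

With these toy numbers the two cores hold 64,000 floats in place of the 64,000,000 entries of the full table, a 1000x reduction.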
Fri Nov 16 2018
Computer Vision
GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism
GPipe is a pipeline parallelism library that allows scaling any network that can be expressed as a sequence of layers. GPipe utilizes a novel batch-splitting pipelining algorithm, resulting in almost linear speedup when a model is partitioned across multiple accelerators.
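The batch-splitting idea is simple to state in code: cut each mini-batch into micro-batches and stream them through the stage partitions so the stages can overlap. The sketch below shows just the splitting on one device; placement across accelerators and re-materialization are omitted, and the stage modules are stand-ins.

```python
import torch
import torch.nn as nn

stage1 = nn.Sequential(nn.Linear(32, 64), nn.ReLU())   # would live on accelerator 0
stage2 = nn.Sequential(nn.Linear(64, 10))              # would live on accelerator 1

def gpipe_forward(x: torch.Tensor, n_micro: int = 4) -> torch.Tensor:
    outs = []
    for mb in x.chunk(n_micro):          # split the mini-batch into micro-batches
        outs.append(stage2(stage1(mb)))  # stages pipeline across micro-batches
    return torch.cat(outs)               # reassemble the full batch output
```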