Published on Tue Oct 15 2019

Breadth-first, Depth-next Training of Random Forests

Andreea Anghel, Nikolas Ioannou, Thomas Parnell, Nikolaos Papandreou, Celestine Mendler-Dünner, Haris Pozidis

The proposed hybrid tree building algorithm for RF is implemented in the Snap Machine Learning framework. It speeds up the training of RFs by 7.8x on average when compared to state-of-the-art RF solvers.

Abstract

In this paper we analyze, evaluate, and improve the performance of training Random Forest (RF) models on modern CPU architectures. An exact, state-of-the-art binary decision tree building algorithm is used as the basis of this study. Firstly, we investigate the trade-offs between using different tree building algorithms, namely breadth-first-search (BFS) and depth-first-search (DFS). We design a novel, dynamic, hybrid BFS-DFS algorithm and demonstrate that it performs better than both BFS and DFS, and is more robust in the presence of workloads with different characteristics. Secondly, we identify CPU performance bottlenecks when generating trees using this approach, and propose optimizations to alleviate them. The proposed hybrid tree building algorithm for RF is implemented in the Snap Machine Learning framework, and speeds up the training of RFs by 7.8x on average when compared to state-of-the-art RF solvers (sklearn, H2O, and xgboost) on a range of datasets, RF configurations, and multi-core CPU architectures.
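To make the BFS-DFS trade-off concrete, the following is a minimal, self-contained Python sketch of a hybrid builder: it grows the tree level by level while the frontier of open nodes is small, then finishes each remaining subtree depth-first. This is an illustrative reconstruction, not the Snap ML implementation; the Node class, the exhaustive Gini split search, and the max_frontier switching threshold are all hypothetical choices made for readability.

from collections import deque
import numpy as np

class Node:
    def __init__(self, idx, depth):
        self.idx, self.depth = idx, depth           # example indices, node depth
        self.feature = self.threshold = None        # split (internal nodes)
        self.left = self.right = self.value = None  # children / leaf prediction

def gini(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - (p * p).sum()

def best_split(X, y, idx):
    # Exact, exhaustive split search: score every threshold of every feature.
    best, best_gain, parent = None, 1e-12, gini(y[idx])
    for f in range(X.shape[1]):
        for t in np.unique(X[idx, f])[:-1]:  # skip the max: right side would be empty
            mask = X[idx, f] <= t
            left, right = idx[mask], idx[~mask]
            w = len(left) / len(idx)
            gain = parent - w * gini(y[left]) - (1 - w) * gini(y[right])
            if gain > best_gain:
                best_gain, best = gain, (f, t, left, right)
    return best

def grow(node, X, y, max_depth):
    # Split the node if possible; return its children, or [] if it became a leaf.
    split = best_split(X, y, node.idx) if node.depth < max_depth else None
    if split is None:
        # Assumes integer class labels 0..K-1; the leaf stores the majority class.
        node.value = np.bincount(y[node.idx]).argmax()
        return []
    node.feature, node.threshold, left_idx, right_idx = split
    node.left = Node(left_idx, node.depth + 1)
    node.right = Node(right_idx, node.depth + 1)
    return [node.left, node.right]

def build_hybrid(X, y, max_depth=10, max_frontier=256):
    root = Node(np.arange(len(y)), 0)
    frontier = deque([root])
    # BFS phase: process nodes level by level while the frontier is small.
    while frontier and len(frontier) < max_frontier:
        frontier.extend(grow(frontier.popleft(), X, y, max_depth))
    # DFS phase: finish each remaining subtree with an explicit stack.
    stack = list(frontier)
    while stack:
        stack.extend(grow(stack.pop(), X, y, max_depth))
    return root

For example, build_hybrid(X, y) on X = np.random.rand(1000, 10) with integer labels y returns the root of a single tree; a full RF would grow many such trees on bootstrap samples with per-node feature subsampling. A common intuition for the switch: near the root each node holds many examples, so level-wise (BFS) processing streams the data efficiently, while deep in the tree nodes hold few examples and depth-first completion keeps the working set small.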

Related Papers

Tue May 14 2019
Machine Learning
Resource-aware Elastic Swap Random Forest for Evolving Data Streams
Adaptive Random Forest (ARF) is a popular ensemble method used for continual learning. It combines adaptive leveraging bagging with fast random Hoeffding trees. The default ARF size provides competitive accuracy, but it is usually over-provisioned. This paper presents Elastic Swap Random Forest.
Wed Apr 26 2017
Machine Learning
Exploiting random projections and sparsity with random forests and gradient boosting methods -- Application to multi-label and multi-output learning, random forest model compression and leveraging input sparsity
Decision trees characterize the input-output relationship through a series of questions. The emergence of new applications requires scalable supervised learning algorithms.
Thu May 09 2019
Machine Learning
Two-stage Best-scored Random Forest for Large-scale Regression
The two-stage best-scored random forest (TBRF) is a novel method designed for large-scale regression problems. The pure randomness in TBRF leads to almost optimal learning rates, and also makes ensemble learning possible.
Fri Feb 20 2015
Machine Learning
Feature-Budgeted Random Forest
We propose a novel random forest algorithm to minimize prediction error for a user-specified feature acquisition budget. We seek decision rules for prediction-time cost reduction, where complete data is available for training, but each feature can only be acquired for an additional cost.
Wed Apr 18 2018
Machine Learning
Exact Distributed Training: Random Forest with Billions of Examples
We introduce an exact distributed algorithm to train Random Forest models. We report its performance on artificial and real-world datasets of up to 18 billion examples. Given a dataset with 82 features, our implementation trains a tree in 22h.
Mon Sep 02 2019
Machine Learning
Guided Random Forest and its application to data approximation
Guided Random Forest (GRAF) extends the idea of building oblique decision trees with localized partitioning to obtain a global partitioning. GRAF yields comparable or better results on a majority of datasets.