Published on Tue Aug 18 2015

ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R

Marvin N. Wright, Andreas Ziegler

Ranger is a C++ application and R package. The software is a fast implementation of random forests for high dimensional data. Ensembles of classification, regression and survival trees are supported.

Abstract

We introduce the C++ application and R package ranger. The software is a fast implementation of random forests for high dimensional data. Ensembles of classification, regression and survival trees are supported. We describe the implementation, provide examples, validate the package with a reference implementation, and compare runtime and memory usage with other implementations. The new software proves to scale best with the number of features, samples, trees, and features tried for splitting. Finally, we show that ranger is the fastest and most memory efficient implementation of random forests to analyze data on the scale of a genome-wide association study.
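As an illustration of the workflow the package supports, here is a minimal random-forest classification sketch. The paper's own examples use the ranger R interface; scikit-learn's `RandomForestClassifier` stands in here, with `n_estimators` and `max_features` playing the roles of ranger's `num.trees` and `mtry` parameters.

```python
# Illustrative sketch only: scikit-learn stands in for the ranger R package.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# n_estimators / max_features correspond to ranger's num.trees / mtry.
forest = RandomForestClassifier(n_estimators=500, max_features="sqrt",
                                random_state=42)
forest.fit(X_train, y_train)
accuracy = forest.score(X_test, y_test)
print(accuracy)
```

The same call shape scales to the high-dimensional settings the paper benchmarks; only the data and the number of candidate split features change.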

Fri Nov 21 2008
Machine Learning
Random Forests: some methodological insights
Random forests is an increasingly used statistical method for classification and regression problems. It was introduced by Leo Breiman in 2001. The strategy involves a ranking of explanatory variables using the random forests score of importance.
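The variable ranking described above can be sketched with permutation importance, one common random-forest importance score. This is a minimal illustration using scikit-learn, not the paper's exact procedure; the simulated data has two informative covariates and three noise covariates.

```python
# Sketch: rank explanatory variables by random-forest permutation importance.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 5))
# Only the first two covariates carry signal; the remaining three are noise.
y = 3.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(scale=0.1, size=n)

forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
result = permutation_importance(forest, X, y, n_repeats=10, random_state=0)

# Rank variables from most to least important.
ranking = np.argsort(result.importances_mean)[::-1]
print(ranking)
```

The two informative covariates should appear at the top of the ranking, which is exactly the screening use of the importance score the abstract refers to.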
Wed Oct 05 2016
Machine Learning
Generalized Random Forests
Generalized random forests can be used to fit any quantity of interest identified as the solution to a set of local moment equations. Instead of using classical kernel weighting functions that are prone to a strong curse of dimensionality, we use an adaptive weighting function from a forest designed to express heterogeneity.
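The adaptive weighting idea can be sketched as leaf co-membership: training points that land in the same leaf as a target point x, averaged over trees, define weights alpha_i(x). This is a simplified illustration with scikit-learn standing in for the forest; the actual generalized-random-forest construction (honesty, subsampling) is omitted.

```python
# Sketch of forest-based adaptive weights via leaf co-membership.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(300, 2))
y = np.sin(3 * X[:, 0]) + rng.normal(scale=0.1, size=300)

forest = RandomForestRegressor(n_estimators=100, min_samples_leaf=10,
                               random_state=1).fit(X, y)

def forest_weights(forest, X_train, x):
    """alpha_i(x): average over trees of 1{i in leaf(x)} / leaf size."""
    train_leaves = forest.apply(X_train)          # (n, B) leaf ids
    x_leaves = forest.apply(x.reshape(1, -1))[0]  # (B,) leaf ids of x
    same = (train_leaves == x_leaves)             # co-membership per tree
    # Normalize within each tree, then average over trees.
    return (same / same.sum(axis=0)).mean(axis=1)

alpha = forest_weights(forest, X, np.array([0.0, 0.0]))
print(alpha.sum())  # the weights form a proper weighting: they sum to 1
```

The weighted average `alpha @ y` is then a forest-localized estimate at x; in the general case these weights enter the local moment equations instead of kernel weights.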
Mon Apr 16 2018
Machine Learning
RFCDE: Random Forests for Conditional Density Estimation
Random forests is a common non-parametric regression technique which performs well for mixed-type data and in the presence of irrelevant covariates. RFCDE is released under the MIT open-source license.
Thu Nov 15 2007
Machine Learning
Variable importance in binary regression trees and forests
We characterize and study variable importance (VIMP) and pairwise variable associations in binary regression trees. A key component involves the node mean squared error for a quantity we refer to as a maximal subtree.
Mon Jun 12 2017
Machine Learning
Random Forests, Decision Trees, and Categorical Predictors: The "Absent Levels" Problem
The "absent levels" problem occurs when there is an indeterminacy over how to handle a categorical split. This problem has never been thoroughly discussed, and its consequences have never been carefully explored. We examine how overlooking the absent levels problem can systematically bias a model.
Tue Feb 07 2012
Machine Learning
Information Forests
We describe Information Forests, an approach to classification that generalizes Random Forests. The basic idea consists of deferring classification until a measure of "classification confidence" is sufficiently high. We instead break down the data so as to maximize this measure.
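The defer-until-confident idea can be sketched as a forest that only commits to a label when its class-probability estimate clears a threshold and abstains otherwise. This uses scikit-learn as a stand-in and is not the paper's exact Information Forests construction.

```python
# Sketch: random-forest prediction with confidence-based deferral.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_tr, y_tr)

proba = forest.predict_proba(X_te)
confidence = proba.max(axis=1)        # "classification confidence" per case
pred = proba.argmax(axis=1)

threshold = 0.9                       # defer any case below this confidence
confident = confidence >= threshold
acc_confident = (pred[confident] == y_te[confident]).mean()
print(f"decided on {confident.mean():.0%} of cases, "
      f"accuracy on those: {acc_confident:.3f}")
```

Deferred cases would then be passed further down for additional processing, which is the trade-off the abstract describes: fewer immediate decisions, but higher confidence in the ones made.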