Published on Tue Jun 22 2021

Revisiting Deep Learning Models for Tabular Data

Yury Gorishniy, Ivan Rubachev, Valentin Khrulkov, Artem Babenko

The choice between GBDT and DL models highly depends on data and there is still no universally superior solution. We demonstrate that a simple ResNet-like architecture is a surprisingly effective baseline, which outperforms most of the sophisticated models.


The necessity of deep learning for tabular data is still an open question addressed by a large number of research efforts. The recent literature on tabular DL proposes several deep architectures reported to be superior to traditional "shallow" models like Gradient Boosted Decision Trees (GBDT). However, since existing works often use different benchmarks and tuning protocols, it is unclear whether the proposed models universally outperform GBDT. Moreover, the models are often not compared to each other; it is therefore challenging to identify the best deep model for practitioners. In this work, we start with a thorough review of the main families of DL models recently developed for tabular data. We carefully tune and evaluate them on a wide range of datasets and reveal two significant findings. First, we show that the choice between GBDT and DL models highly depends on data and there is still no universally superior solution. Second, we demonstrate that a simple ResNet-like architecture is a surprisingly effective baseline, which outperforms most of the sophisticated models from the DL literature. Finally, we design a simple adaptation of the Transformer architecture for tabular data that becomes a new strong DL baseline and reduces the gap between GBDT and DL models on datasets where GBDT dominates.
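The ResNet-like baseline mentioned above stacks simple residual blocks over the flat feature vector. Below is a minimal NumPy sketch of one such block; the layout (BatchNorm → Linear → ReLU → Linear → residual add) follows the general ResNet recipe, but the hidden width, initialization scale, and omission of dropout and biases are illustrative assumptions, not the authors' reference implementation:

```python
import numpy as np

def batchnorm(x, eps=1e-5):
    # Per-feature normalization over the batch (inference-style sketch,
    # no learned scale/shift for brevity).
    mu = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def resnet_block(x, W1, W2):
    # x -> BatchNorm -> Linear -> ReLU -> Linear -> + x (residual)
    h = batchnorm(x)
    h = np.maximum(h @ W1, 0.0)  # expand to hidden width, ReLU
    h = h @ W2                   # project back to input width
    return x + h                 # residual connection

rng = np.random.default_rng(42)
x = rng.standard_normal((32, 16))         # 32 rows, 16 numeric features
W1 = rng.standard_normal((16, 64)) * 0.1  # hypothetical hidden width 64
W2 = rng.standard_normal((64, 16)) * 0.1
y = resnet_block(x, W1, W2)
```

Because the block maps its input back to the same width before the residual add, any number of such blocks can be stacked, which is what makes the baseline so simple to scale.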

Fri Sep 13 2019
Machine Learning
Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data
Neural Oblivious Decision Ensembles (NODE) is a new deep learning architecture designed to work with any tabular data. NODE benefits from both end-to-end gradient-based optimization and the power of multi-layer hierarchical representation learning.
Fri Aug 06 2021
Machine Learning
Simple Modifications to Improve Tabular Neural Networks
There is growing interest in neural network architectures for tabular data. This paper focuses on several such models, and proposes modifications for improving their performance. When modified, these models are shown to be competitive with leading general-purpose tabular models, including GBDTs.
Tue Aug 20 2019
Machine Learning
TabNet: Attentive Interpretable Tabular Learning
TabNet is a novel high-performance and interpretable canonical deep tabular data learning architecture. TabNet uses sequential attention to choose which features to reason from at each decision step.
Wed Jun 02 2021
Machine Learning
SAINT: Improved Neural Networks for Tabular Data via Row Attention and Contrastive Pre-Training
Tabular data underpins numerous high-impact applications of machine learning. Recent deep learning methods have achieved a degree of performance competitive with popular techniques. We devise a hybrid deep learning approach to solving tabular data problems.
Thu Jun 11 2020
Machine Learning
DNF-Net: A Neural Architecture for Tabular Data
A challenging open question in deep learning is how to handle tabular data. We present a novel generic architecture whose inductive bias elicits models whose structure corresponds to logical Boolean formulas. DNF-Net promotes localized decisions that are taken over small subsets of the features.
Wed May 16 2018
Machine Learning
Regularization Learning Networks: Deep Learning for Tabular Datasets
Deep Neural Networks (DNNs) typically underperform Gradient Boosting Trees (GBTs) on many tabular-dataset learning tasks. Regularization Learning Networks (RLNs) overcome this challenge by introducing an efficient hyperparameter tuning scheme.
Wed Feb 11 2015
Machine Learning
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Batch Normalization allows us to use much higher learning rates and be less careful about initialization. It also acts as a regularizer, in some cases eliminating the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps.
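The transform this entry describes is easy to sketch for a single mini-batch: normalize each feature to zero mean and unit variance over the batch, then apply a learned scale and shift. A minimal NumPy version (the learned parameters `gamma` and `beta` are shown as fixed arrays purely for illustration):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # Normalize each feature over the mini-batch, then apply the
    # learned scale (gamma) and shift (beta).
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(1)
x = rng.standard_normal((64, 8)) * 3.0 + 5.0  # shifted, scaled batch
bn = batch_norm(x, gamma=np.ones(8), beta=np.zeros(8))
```

With identity `gamma`/`beta`, each output feature has (approximately) zero mean and unit variance regardless of the input's scale and offset, which is what stabilizes training at higher learning rates.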
Wed Oct 09 2019
HuggingFace's Transformers: State-of-the-art Natural Language Processing
Transformers is an open-source library with the goal of opening up machine learning advances to the wider machine learning community. The library consists of carefully engineered state-of-the-art Transformer architectures under a unified API. Backing this library is a curated collection of pretrained models made by and available for the community.
Mon Jun 12 2017
Attention Is All You Need
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train.
Wed Feb 12 2020
Machine Learning
GLU Variants Improve Transformer
Gated Linear Units (arXiv:1612.08083) consist of the component-wise product of two linear projections, one of which is first passed through a sigmoid function. Variations on GLU are possible, using different nonlinear (or even linear) functions in place of sigmoid.
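The unit described in this snippet is straightforward to write down: one projection carries the values, the other is squashed through a sigmoid and acts as a gate. A minimal NumPy sketch (the `glu` helper, its signature, and the weight shapes are illustrative assumptions, not the paper's code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def glu(x, W, V):
    # GLU(x) = (x W) * sigmoid(x V): the component-wise product of two
    # linear projections, one gated through a sigmoid (biases omitted).
    a = x @ W            # value projection
    g = sigmoid(x @ V)   # gate in (0, 1)
    return a * g

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))   # batch of 4, 8 features
W = rng.standard_normal((8, 16))
V = rng.standard_normal((8, 16))
out = glu(x, W, V)
```

Because the gate lies strictly in (0, 1), each output component is an attenuated copy of the corresponding value projection; the GLU-variant paper swaps the sigmoid for other functions (e.g. GELU or none at all) in the Transformer feed-forward layer.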
Tue Feb 23 2021
Machine Learning
Do Transformer Modifications Transfer Across Implementations and Applications?
The research community has proposed copious modifications to the Transformer architecture since it was introduced over three years ago. In this paper, we comprehensively evaluate many of these modifications in a shared experimental setting. We conjecture that performance improvements may strongly depend on implementation details.
Tue Apr 20 2021
Machine Learning
Bayesian Optimization is Superior to Random Search for Machine Learning Hyperparameter Tuning: Analysis of the Black-Box Optimization Challenge 2020
This paper presents the results and insights from the black-box optimization (BBO) challenge at NeurIPS 2020, which ran from July to October 2020. The challenge emphasized the importance of evaluating derivative-free optimizers for tuning the hyperparameters of machine learning models.