Published on Wed Nov 25 2020

Supercharging Imbalanced Data Learning With Energy-based Contrastive Representation Transfer

Zidi Xiu, Junya Chen, Ricardo Henao, Benjamin Goldstein, Lawrence Carin, Chenyang Tao

Abstract

Dealing with severe class imbalance poses a major challenge for real-world applications, especially when the accurate classification and generalization of minority classes are of primary interest. In computer vision, learning from long-tailed datasets is a recurring theme, especially for natural image datasets. While existing solutions mostly appeal to sampling or weighting adjustments to alleviate the pathological imbalance, or impose inductive biases to prioritize non-spurious associations, we take a novel perspective to promote sample efficiency and model generalization based on the invariance principles of causality. Our proposal posits a meta-distributional scenario, where the data-generating mechanism is invariant across the label-conditional feature distributions. This causal assumption enables efficient knowledge transfer from the dominant classes to their under-represented counterparts, even if the respective feature distributions show apparent disparities. It also allows us to leverage a causal data-inflation procedure to enlarge the representation of minority classes. Our development is orthogonal to existing extreme-classification techniques and thus can be seamlessly integrated with them. The utility of our proposal is validated with an extensive set of synthetic and real-world computer vision tasks against state-of-the-art (SOTA) solutions.
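
To make the causal data-inflation idea concrete, below is a minimal sketch of how a shared generating mechanism, estimated where data is plentiful, could be reused to synthesize features for a minority class. The paper's energy-based contrastive objective is not reproduced here; the Gaussian model, function names and numbers are illustrative assumptions, not the authors' implementation.

import numpy as np

def inflate_minority(X_major, X_minor, n_new, seed=0):
    # Illustrative assumption: label-conditional features share one generating
    # mechanism, modelled here as a common covariance with class-specific means.
    rng = np.random.default_rng(seed)
    cov_shared = np.cov(X_major, rowvar=False)   # mechanism estimated from the dominant class
    mu_minor = X_minor.mean(axis=0)              # only the class-specific shift needs minority data
    return rng.multivariate_normal(mu_minor, cov_shared, size=n_new)

# Toy usage: 1000 majority samples vs. 20 minority samples in 5 dimensions.
rng = np.random.default_rng(1)
X_major = rng.normal(size=(1000, 5))
X_minor = rng.normal(loc=3.0, size=(20, 5))
X_minor_inflated = np.vstack([X_minor, inflate_minority(X_major, X_minor, n_new=200)])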

Related papers

Thu Feb 20 2020
Computer Vision
A survey on Semi-, Self- and Unsupervised Learning for Image Classification
Deep learning strategies rely heavily on labeled data. In many real-world problems, it is not feasible to create such a large amount of labeled training data. It is therefore common to incorporate unlabeled data into the training process to reach comparable results.
Mon Jul 06 2020
Machine Learning
Learning the Prediction Distribution for Semi-Supervised Learning with Normalising Flows
As data volumes continue to grow, the labelling process increasingly becomes a bottleneck. Impressive results have been achieved in semi-supervised learning (SSL) for image classification.
Wed Mar 03 2021
Machine Learning
Domain Generalization in Vision: A Survey
Domain generalization (DG) aims to achieve out-of-distribution (OOD) generalization by using only source data for model learning. Since it was first introduced in 2011, research in DG has made great progress, covering various vision applications such as object recognition and action recognition.
Mon Oct 05 2020
Machine Learning
Conditional Negative Sampling for Contrastive Learning of Visual Representations
Fri Sep 04 2020
Computer Vision
Imbalanced Image Classification with Complement Cross Entropy
Deep learning models have achieved great success in computer vision applications. However, imbalanced class distributions still limit the wide applicability of these models. To solve this problem, in this paper, we concentrate on the study of cross entropy.
Wed Oct 01 2014
Machine Learning
Learning to Transfer Privileged Information
We introduce a learning framework called learning using privileged information (LUPI) to the computer vision field. Privileged information is available at training time but not at test time. We explore two maximum-margin techniques that are able to make use of this additional source of information.
Wed Jul 10 2019
Machine Learning
Variational Autoencoders and Nonlinear ICA: A Unifying Framework
We build on recent developments in nonlinear ICA, which we extend to the case of noisy, undercomplete or discrete observations, integrated in a maximum-likelihood framework. The result also trivially contains identifiable flow-based generative models as a special case.
Sat Dec 08 2018
Machine Learning
What is the Effect of Importance Weighting in Deep Learning?
Importance-weighted risk minimization is a key ingredient in many machine learning algorithms. Little is known about how it impacts over-parameterized, deep neural networks.
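
For context on the setup this paper analyses: in the class-imbalance setting, importance-weighted risk minimization typically means scaling each class's loss term, for example by inverse class frequency. A minimal PyTorch sketch follows; the class counts and weighting rule are illustrative, not taken from the paper.

import torch
import torch.nn as nn

counts = torch.tensor([900.0, 90.0, 10.0])        # toy per-class training counts
weights = counts.sum() / (len(counts) * counts)   # inverse-frequency importance weights
criterion = nn.CrossEntropyLoss(weight=weights)   # weighted empirical risk

logits = torch.randn(8, 3)                        # stand-in classifier outputs
labels = torch.randint(0, 3, (8,))
loss = criterion(logits, labels)
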
Sat Dec 20 2014
Machine Learning
Explaining and Harnessing Adversarial Examples
Machine learning models consistently misclassify adversarial examples. We argue that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature. This explanation is supported by new quantitative results.
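
This is the paper that introduced the fast gradient sign method (FGSM), which follows directly from the linearity argument: perturb the input one step in the direction of the sign of the loss gradient. A minimal PyTorch sketch, where model, loss_fn and epsilon are placeholders:

import torch

def fgsm_perturb(model, loss_fn, x, y, epsilon=0.03):
    # One-step L-infinity-bounded perturbation in the direction that increases the loss.
    x = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x), y)
    loss.backward()
    return (x + epsilon * x.grad.sign()).detach()
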
Wed Apr 10 2019
Machine Learning
Large-Scale Long-Tailed Recognition in an Open World
Open Long-Tailed Recognition (OLTR) is an algorithm that maps an image to a feature embedding that respects closed-world classification while acknowledging the novelty of the open world. On three large-scale OLTR datasets, our method consistently outperforms the state-of-the-art.
Thu Jun 09 2011
Artificial Intelligence
SMOTE: Synthetic Minority Over-sampling Technique
An approach to the construction of classifiers from imbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented. Under-sampling of the majority (normal) class has been proposed as a good means of increasing the sensitivity of a classifier to the minority class.
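
For reference, the core SMOTE step creates a synthetic minority sample by interpolating between a minority point and one of its k nearest minority neighbours. A plain-NumPy sketch; the helper name and k are illustrative:

import numpy as np

def smote_sample(X_minor, n_new, k=5, seed=0):
    # Interpolate between a randomly chosen minority sample and a random one
    # of its k nearest minority-class neighbours.
    rng = np.random.default_rng(seed)
    dist = np.linalg.norm(X_minor[:, None, :] - X_minor[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)                # exclude each point itself
    nn_idx = np.argsort(dist, axis=1)[:, :k]
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_minor))
        j = rng.choice(nn_idx[i])
        u = rng.random()                          # interpolation weight in [0, 1)
        synthetic.append(X_minor[i] + u * (X_minor[j] - X_minor[i]))
    return np.stack(synthetic)
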
Thu Apr 13 2017
Machine Learning
Virtual Adversarial Training: A Regularization Method for Supervised and Semi-Supervised Learning
Virtual adversarial loss is defined as the robustness of the conditional label distribution around each input data point against local perturbation. Since the directions in which we smooth the model are only "virtually" adversarial, we call our method virtual adversarial training (VAT).
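
In practice the virtual adversarial loss is the KL divergence between the model's prediction at x and at x + r_adv, where r_adv approximately maximises that divergence and is found by a few power iterations. A compact PyTorch sketch; the hyperparameters xi, eps and n_power are illustrative, not the paper's settings.

import torch
import torch.nn.functional as F

def vat_loss(model, x, xi=1e-6, eps=2.5, n_power=1):
    # KL(p(y|x) || p(y|x + r_adv)), with r_adv found by power iteration.
    with torch.no_grad():
        p = F.softmax(model(x), dim=1)
    d = torch.randn_like(x)
    for _ in range(n_power):
        d = (xi * F.normalize(d.flatten(1), dim=1).view_as(x)).requires_grad_(True)
        kl = F.kl_div(F.log_softmax(model(x + d), dim=1), p, reduction="batchmean")
        d = torch.autograd.grad(kl, d)[0]
    r_adv = eps * F.normalize(d.flatten(1), dim=1).view_as(x)
    return F.kl_div(F.log_softmax(model(x + r_adv), dim=1), p, reduction="batchmean")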