Published on Wed Dec 16 2020

Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts

Ji Hou, Benjamin Graham, Matthias Nießner, Saining Xie

The rapid progress in 3D scene understanding has come with growing demand for data. However, collecting and annotating 3D scenes (e.g. point clouds) is hard. We propose Contrastive Scene Contexts, a 3D pre-training method.

Abstract

The rapid progress in 3D scene understanding has come with growing demand for data; however, collecting and annotating 3D scenes (e.g. point clouds) is notoriously hard. For example, the number of scenes (e.g. indoor rooms) that can be accessed and scanned might be limited; even given sufficient data, acquiring 3D labels (e.g. instance masks) requires intensive human labor. In this paper, we explore data-efficient learning for 3D point clouds. As a first step in this direction, we propose Contrastive Scene Contexts, a 3D pre-training method that makes use of both point-level correspondences and spatial contexts in a scene. Our method achieves state-of-the-art results on a suite of benchmarks where training data or labels are scarce. Our study reveals that exhaustive labelling of 3D point clouds might be unnecessary; remarkably, on ScanNet, even using 0.1% of point labels, we still achieve 89% (instance segmentation) and 96% (semantic segmentation) of the baseline performance that uses full annotations.
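To make the abstract's description concrete, below is a minimal sketch of the core idea as stated above: a point-level contrastive (InfoNCE) loss in which positives are matched points across two views of the same scene and negatives are drawn only from the same spatial partition, so the loss respects scene context. The function name partitioned_point_info_nce, the angular-bin partition around the scene centroid, and all hyperparameters are illustrative assumptions for this sketch, not the authors' released implementation.

import math
import torch
import torch.nn.functional as F

def partitioned_point_info_nce(feats_a, feats_b, xyz_a, num_bins=4, tau=0.07):
    # feats_a / feats_b: (N, C) features of matched points from two augmented
    # views of one scene; row i of each tensor is assumed to be the same
    # physical point (the positive pair). xyz_a: (N, 3) coordinates in view A.
    feats_a = F.normalize(feats_a, dim=1)
    feats_b = F.normalize(feats_b, dim=1)

    # Assign each matched point to an angular partition around the scene
    # centroid (an illustrative stand-in for the paper's spatial contexts).
    rel = xyz_a - xyz_a.mean(dim=0, keepdim=True)
    angle = torch.atan2(rel[:, 1], rel[:, 0])                       # (-pi, pi]
    bin_id = ((angle + math.pi) / (2 * math.pi) * num_bins).long()
    bin_id = bin_id.clamp(max=num_bins - 1)

    losses = []
    for b in range(num_bins):
        idx = (bin_id == b).nonzero(as_tuple=True)[0]
        if idx.numel() < 2:                 # need at least one negative
            continue
        # InfoNCE within the partition: the diagonal entries are positives,
        # all other points in the same partition act as negatives.
        logits = feats_a[idx] @ feats_b[idx].t() / tau
        target = torch.arange(idx.numel(), device=logits.device)
        losses.append(F.cross_entropy(logits, target))
    return torch.stack(losses).mean()

# Toy usage: 1024 matched points with 32-dim features from two views.
if __name__ == "__main__":
    xyz = torch.rand(1024, 3)
    fa, fb = torch.randn(1024, 32), torch.randn(1024, 32)
    print(partitioned_point_info_nce(fa, fb, xyz).item())

Restricting negatives to the same partition is what distinguishes this sketch from a plain point-level InfoNCE loss: it forces the network to discriminate points that share a spatial context rather than only globally distant ones.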

Thu May 13 2021
Computer Vision
3D Spatial Recognition without Spatially Labeled 3D
We introduce WyPR, a Weakly-supervised framework for Point cloud Recognition, requiring only scene-level class tags as supervision. WyPR jointly addresses three core 3D recognition tasks: point-level semantic segmentation, 3D proposal generation, and 3D object detection.
Thu Jan 07 2021
Computer Vision
Self-Supervised Pretraining of 3D Features on any Point-Cloud
Pretraining is not widely used for 3D recognition tasks, where state-of-the-art methods train models from scratch. A primary reason is the lack of large annotated datasets: 3D data is both difficult to acquire and time-consuming to label.
Tue Aug 17 2021
Computer Vision
RandomRooms: Unsupervised Pre-training from Synthetic Shapes and Randomized Layouts for 3D Object Detection
RandomRooms is a new method for pre-training 3D point cloud models. It uses a synthetic CAD dataset to boost the learning on real datasets. The method establishes the new state-of-the-art on widely-used 3D detection benchmarks ScanNetV2 and SUN RGB-D.
Thu Nov 30 2017
Computer Vision
3DContextNet: K-d Tree Guided Hierarchical Learning of Point Clouds Using Local and Global Contextual Cues
The method exploits both local and global contextual cues imposed by the k-d tree and is designed to learn representation vectors progressively along the tree structure. Experiments on challenging benchmarks show that the proposed model provides discriminative features.
Mon Sep 30 2019
Computer Vision
Multi-view PointNet for 3D Scene Understanding
Fusion of 2D images and 3D point clouds is important because information from dense images can enhance sparse point clouds. MVPNet significantly outperforms prior point-cloud-based approaches on the task of 3D semantic segmentation.
Mon Apr 01 2019
Computer Vision
JSIS3D: Joint Semantic-Instance Segmentation of 3D Point Clouds with Multi-Task Pointwise Networks and Multi-Value Conditional Random Fields
Deep learning techniques have become the go-to models for most vision-related tasks on 2D images. However, their power has not been fully realised on several tasks in 3D space. In this work, we jointly address the problems of semantic and instance segmentation of 3D point clouds.
Thu May 28 2020
NLP
Language Models are Few-Shot Learners
GPT-3 is an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model. It can perform tasks without gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model.
Sat Jun 13 2020
Machine Learning
Bootstrap your own latent: A new approach to self-supervised Learning
Bootstrap Your Own Latent (BYOL) is a new approach to self-supervised image representation learning. BYOL relies on two neural networks, referred to as online and target networks, that interact and learn from each other.
Wed Jun 17 2020
Computer Vision
Unsupervised Learning of Visual Features by Contrasting Cluster Assignments
Unsupervised image representations have significantly reduced the gap with supervised pretraining, notably with the recent achievements of contrastive learning methods. These contrastive methods typically work online and rely on a large number of explicit pairwise feature comparisons. In this paper, we propose an online algorithm, SwAV, that takes advantage of contrastive methods without requiring pairwise comparisons.
Tue Jul 10 2018
Machine Learning
Representation Learning with Contrastive Predictive Coding
Unsupervised learning has not yet seen widespread adoption and remains an important and challenging endeavor for artificial intelligence. The key insight of our model is to learn useful representations by predicting the future in latent space with powerful autoregressive models.
Sun Jun 24 2012
Machine Learning
Representation Learning: A Review and New Perspectives
The success of machine learning algorithms generally depends on data representation. Different representations can entangle and hide more or less the different explanatory factors of variation behind the data. This motivates longer-term unanswered questions about the appropriate objectives for learning good representations.
Fri Dec 02 2016
Computer Vision
PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation
Point clouds are an important type of geometric data structure. Most researchers transform such data into regular 3D voxel grids or collections of images; this, however, renders the data unnecessarily voluminous. In this paper, we design a novel type of neural network that directly consumes point clouds.