Published on Fri Mar 23 2018

Speeding-up Object Detection Training for Robotics with FALKON

Elisa Maiettini, Giulia Pasquale, Lorenzo Rosasco, Lorenzo Natale

The latest deep learning methods for object detection provide remarkable performance, but have limits when used in robotic applications. One of the main issues is the long training time, which is due to the large size and imbalance of the associated training sets. In this paper we propose a novel pipeline that overcomes this problem and provides comparable performance with a 60x training speedup.

Abstract

The latest deep learning methods for object detection provide remarkable performance, but have limits when used in robotic applications. One of the most relevant issues is the long training time, which is due to the large size and imbalance of the associated training sets, characterized by few positive examples and a large number of negative ones (i.e., background). Existing approaches are based on end-to-end learning by back-propagation [22] or kernel methods trained with Hard Negatives Mining on top of deep features [8]. These solutions are effective, but prohibitively slow for on-line applications. In this paper we propose a novel pipeline for object detection that overcomes this problem and provides comparable performance, with a 60x training speedup. Our pipeline combines (i) the Region Proposal Network and the deep feature extractor from [22], which efficiently select candidate RoIs and encode them into powerful representations, with (ii) the FALKON [23] algorithm, a novel kernel-based method that allows fast training on large-scale problems (millions of points). We address the size and imbalance of the training data by exploiting the stochastic subsampling intrinsic to the method and a novel, fast bootstrapping approach. We assess the effectiveness of the approach on a standard Computer Vision dataset (PASCAL VOC 2007 [5]) and demonstrate its applicability to a real robotic scenario with the iCubWorld Transformations [18] dataset.
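To make the two ideas in the abstract more concrete (random subsampling of kernel centers, and bootstrapping of hard negatives over batches of background examples), here is a minimal sketch in Python with NumPy. It is not the authors' implementation. It assumes that RoI features have already been extracted as fixed-length vectors, and it uses a plain Nyström kernel ridge regression with a direct linear solve in place of FALKON's preconditioned conjugate gradient solver. The function names, the `hard_thresh` parameter, and the batch schedule are hypothetical and chosen only for illustration.

```python
import numpy as np


def gaussian_kernel(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))


def fit_nystrom_krr(X, y, num_centers, sigma, lam, rng):
    """Kernel ridge regression restricted to randomly subsampled centers.

    Assumption: solved directly here; FALKON instead uses a
    preconditioned conjugate gradient solver.
    """
    n = X.shape[0]
    m = min(num_centers, n)
    centers = X[rng.choice(n, size=m, replace=False)]
    Knm = gaussian_kernel(X, centers, sigma)
    Kmm = gaussian_kernel(centers, centers, sigma)
    # Small jitter on the diagonal for numerical stability (illustrative value).
    A = Knm.T @ Knm + lam * n * Kmm + 1e-6 * np.eye(m)
    alpha = np.linalg.solve(A, Knm.T @ y)
    return centers, alpha


def predict(model, X, sigma):
    """Real-valued scores; positive means 'object', negative 'background'."""
    centers, alpha = model
    return gaussian_kernel(X, centers, sigma) @ alpha


def _fit_binary(pos, neg, num_centers, sigma, lam, rng):
    # Labels: +1 for positives, -1 for negatives (background).
    X = np.vstack([pos, neg])
    y = np.concatenate([np.ones(len(pos)), -np.ones(len(neg))])
    return fit_nystrom_krr(X, y, num_centers, sigma, lam, rng)


def train_with_bootstrapping(pos, neg, batch_size, num_centers, sigma, lam,
                             hard_thresh=-0.5, seed=0):
    """Hypothetical bootstrapping loop over batches of negatives.

    Train on the first batch, then add from each later batch only the
    negatives that the current model scores above hard_thresh.
    """
    rng = np.random.default_rng(seed)
    neg = neg[rng.permutation(len(neg))]
    kept = neg[:batch_size]
    model = _fit_binary(pos, kept, num_centers, sigma, lam, rng)
    for start in range(batch_size, len(neg), batch_size):
        batch = neg[start:start + batch_size]
        hard = batch[predict(model, batch, sigma) > hard_thresh]
        if len(hard) == 0:
            continue  # nothing new to learn from this batch
        kept = np.vstack([kept, hard])
        model = _fit_binary(pos, kept, num_centers, sigma, lam, rng)
    return model, len(kept)
```

In this sketch the training cost is kept down in two ways: the linear system has the size of the number of centers rather than the number of examples, and easy background examples are discarded before each retraining step. How the paper actually schedules and prunes negatives is described in the full text, not in this abstract.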

Wed Nov 25 2020
Computer Vision
Fast Region Proposal Learning for Object Detection for Robotics
Object detection is a fundamental task for robots to operate in unstructured environments. Unfortunately, training such systems requires several hours of GPU time. A recent method proposes an architecture that leverages the powerful representation of deep learning descriptors, while permitting fast adaptation time.
Mon Dec 28 2020
Computer Vision
Data-efficient Weakly-supervised Learning for On-line Object Detection under Domain Shift in Robotics
Several object detection methods have recently been proposed in the robotics literature. Learning solely on off-line data may introduce biases and prevent adaptation to novel tasks. We compare several techniques for using weakly-supervised learning in detection pipelines to reduce model (re)training costs without compromising accuracy.
Wed Nov 25 2020
Computer Vision
Fast Object Segmentation Learning with Kernel-based Methods for Robotics
Object segmentation is a key component in the visual system of a robot that performs tasks like grasping and object manipulation. We propose a novel architecture that overcomes this problem and provides comparable performance in a fraction of the time required by the state-of-the-art methods.
Tue Mar 14 2017
Computer Vision
Geometry-Based Region Proposals for Real-Time Robot Detection of Tabletop Objects
We present a novel object detection pipeline for localization and recognition in three-dimensional environments. Our approach makes use of an RGB-D sensor and combines state-of-the-art techniques from the robotics and computer vision communities to create a robust, real-time detection system.
Tue Jul 19 2016
Computer Vision
FusionNet: 3D Object Classification Using Multiple Data Representations
High-quality 3D object recognition is an important component of many vision and robotics systems. We tackle the object recognition problem using two data representations. We introduce new Volumetric CNN (V-CNN) architectures.
Mon Feb 27 2017
Computer Vision
A Dataset for Developing and Benchmarking Active Vision
The dataset includes 20,000+ RGB-D images and 50,000+ 2D bounding boxes of object instances densely captured in 9 unique scenes. The state of the art for object detection is still severely impacted by object scale, occlusion, and viewing direction.