Published on Thu Jul 29 2021

Hierarchical Self-supervised Augmented Knowledge Distillation

Chuanguang Yang, Zhulin An, Linhang Cai, Yongjun Xu

We adopt an alternative self-supervised augmented task to guide the network to learn the joint distribution of the original recognition task and auxiliary task. Our method significantly surpasses the previous SOTA SSKD with an average improvement of 2.56% on CIFAR-100.

Abstract

Knowledge distillation often hinges on how to define and transfer knowledge from teacher to student effectively. Although recent self-supervised contrastive knowledge achieves the best performance, forcing the network to learn such knowledge may damage the representation learning of the original class recognition task. We therefore adopt an alternative self-supervised augmented task to guide the network to learn the joint distribution of the original recognition task and the self-supervised auxiliary task. We demonstrate that this joint distribution is a richer form of knowledge that improves representation power without sacrificing normal classification capability. Moreover, previous methods transfer probabilistic knowledge only between the final layers, which is incomplete. We propose to append several auxiliary classifiers to hierarchical intermediate feature maps to generate diverse self-supervised knowledge, and to perform a one-to-one transfer to teach the student network thoroughly. Our method significantly surpasses the previous SOTA SSKD, with an average improvement of 2.56% on CIFAR-100 and an improvement of 0.77% on ImageNet across widely used network pairs. Code is available at https://github.com/winycg/HSAKD.
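For intuition, here is a minimal PyTorch-style sketch of the two ideas above: a joint (class, rotation) label space for the self-supervised augmented task, and one-to-one KL transfer between auxiliary classifiers attached at matching intermediate stages. The 4-way rotation pretext, the temperature T, and all function names are illustrative assumptions; see the linked repository for the authors' actual implementation.

```python
import torch
import torch.nn.functional as F

def joint_labels(class_labels, num_rot=4):
    """Joint (class, rotation) label for a batch expanded rotation-major:
    an index into a num_classes * num_rot-way softmax."""
    rot = torch.arange(num_rot).repeat_interleave(class_labels.size(0))
    return class_labels.repeat(num_rot) * num_rot + rot

def hierarchical_kd_loss(student_logits, teacher_logits, T=3.0):
    """One-to-one KL transfer between auxiliary classifiers at matching
    stages; each list holds one (B * num_rot, C * num_rot) joint-logit
    tensor per intermediate feature map."""
    loss = 0.0
    for s, t in zip(student_logits, teacher_logits):
        loss = loss + F.kl_div(F.log_softmax(s / T, dim=1),
                               F.softmax(t / T, dim=1),
                               reduction="batchmean") * T * T
    return loss
```

Because the joint softmax spans all (class, rotation) combinations, the teacher's distribution encodes how class evidence interacts with the transformation, which is the "richer knowledge" the abstract refers to.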

Related Papers

Wed Nov 07 2018
Computer Vision
Amalgamating Knowledge towards Comprehensive Classification
Reusing trained models can significantly reduce the cost of training new models from scratch. The goal of knowledge amalgamation is to learn a lightweight student model capable of handling comprehensive classification.
Sat Dec 05 2020
Neural Networks
Knowledge Distillation Thrives on Data Augmentation
Knowledge distillation (KD) is a general deep neural network training framework that uses a teacher model to guide a student model. Many works have explored the rationale for its success; however, its interplay with data augmentation (DA) has not been well recognized so far.
Fri Sep 18 2020
Computer Vision
Densely Guided Knowledge Distillation using Multiple Teacher Assistants
Knowledge distillation, which guides the learning of a small student network from a large teacher network, is being actively studied for model compression and transfer learning. Few studies have addressed the poor learning of the student network when the student and teacher model sizes differ significantly.
Sat May 02 2020
Computer Vision
Heterogeneous Knowledge Distillation using Information Flow Modeling
Knowledge Distillation (KD) methods are capable of transferring the knowledge of a large and complex teacher into a smaller and faster student. Early KD methods were limited to transferring knowledge only between the last layers of the networks, while later approaches were capable of performing multi-layer KD.
Thu Apr 11 2019
Artificial Intelligence
Variational Information Distillation for Knowledge Transfer
We propose an information-theoretic framework for knowledge transfer. We transfer knowledge from a convolutional neural network to a multi-layer perceptron. The resulting MLP significantly outperforms state-of-the-art methods.
Wed Sep 25 2019
Machine Learning
Revisiting Knowledge Distillation via Label Smoothing Regularization
KD aims to distill the knowledge of a cumbersome teacher model into a lightweight student model. Its success is generally attributed to the privileged information on similarities among categories provided by the teacher model. We argue that the success of KD is due not only to the similarity information between categories from teachers, but also to the regularizing effect of soft targets, of which a sketch follows.
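For concreteness, label smoothing produces exactly the kind of soft targets the authors compare KD against. A minimal PyTorch-style sketch (the smoothing factor eps and the function name are assumptions for illustration):

```python
import torch
import torch.nn.functional as F

def smoothed_targets(labels, num_classes, eps=0.1):
    """Label-smoothing soft targets: (1 - eps) on the true class and
    eps / num_classes spread uniformly over all classes. A KD teacher's
    probabilities can be read as a learned, per-example version of this."""
    one_hot = F.one_hot(labels, num_classes).float()
    return one_hot * (1.0 - eps) + eps / num_classes
```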
Wed Mar 21 2018
Computer Vision
Unsupervised Representation Learning by Predicting Image Rotations
Deep convolutional neural networks (ConvNets) have transformed the field of computer vision thanks to their unparalleled capacity to learn high-level semantic image features. In order to successfully learn those features, they usually require massive amounts of manually labeled data, which is both expensive and impractical to scale. We propose to learn image features by training ConvNets to recognize the 2d rotation that is applied to the image they receive as input.
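The pretext task is simple enough to sketch: each image is rotated by 0, 90, 180, or 270 degrees and the network classifies which rotation was applied. A minimal PyTorch-style sketch, assuming square inputs (function names are hypothetical):

```python
import torch
import torch.nn.functional as F

def rotation_batch(images):
    """Expand an unlabeled batch with the four rotations; the rotation
    index is the (free) supervision signal. Assumes square images so
    all rotated copies share the same shape."""
    rotated = torch.cat([torch.rot90(images, k, dims=(2, 3)) for k in range(4)])
    labels = torch.arange(4).repeat_interleave(images.size(0))
    return rotated, labels

def rotnet_step(model, images, optimizer):
    """One training step of the 4-way rotation-prediction pretext task."""
    x, y = rotation_batch(images)
    loss = F.cross_entropy(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```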
Fri Jun 12 2020
Computer Vision
Knowledge Distillation Meets Self-Supervision
Knowledge distillation involves extracting the "dark knowledge" from a teacher network to guide the learning of a student network. By exploiting the similarity between noisy self-supervision signals as an auxiliary task, one can effectively transfer the hidden information from the teacher to the student.
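Sketched below in PyTorch style is the core idea: match the teacher's and student's distributions over pairwise feature similarities on the same augmented batch. This illustrates the principle rather than SSKD's exact loss; the temperature and names are assumptions:

```python
import torch
import torch.nn.functional as F

def similarity_transfer_loss(f_s, f_t, T=0.1):
    """KL between row-wise softmaxes of the cosine-similarity matrices
    of student features f_s and teacher features f_t, both (B, D),
    computed on the same batch of transformed images."""
    f_s = F.normalize(f_s, dim=1)
    f_t = F.normalize(f_t, dim=1)
    sim_s = f_s @ f_s.t() / T  # (B, B) student similarities
    sim_t = f_t @ f_t.t() / T  # (B, B) teacher similarities
    return F.kl_div(F.log_softmax(sim_s, dim=1),
                    F.softmax(sim_t, dim=1),
                    reduction="batchmean")
```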
Wed Mar 30 2016
Computer Vision
Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles
The context-free network (CFN) takes image tiles as input and explicitly limits the receptive field of its early processing units to one tile at a time. By training the CFN to solve Jigsaw puzzles, we can learn both a feature mapping of object parts and their correct spatial arrangement.
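As an illustration of the input pipeline, this PyTorch-style sketch cuts an image into a 3x3 grid and reorders the nine tiles by a chosen permutation, whose index within a fixed candidate set is the classification target (a simplification of the paper's pipeline, which also crops tiles with random gaps):

```python
import torch

def jigsaw_tiles(image, permutation):
    """Cut a (C, H, W) image into a 3x3 grid and reorder the nine tiles
    by `permutation` (a sequence of 9 indices); the permutation's index
    in a fixed candidate set serves as the classification target."""
    C, H, W = image.shape
    th, tw = H // 3, W // 3
    tiles = [image[:, i * th:(i + 1) * th, j * tw:(j + 1) * tw]
             for i in range(3) for j in range(3)]
    return torch.stack([tiles[p] for p in permutation])  # (9, C, th, tw)
```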
Thu Jun 04 2015
Computer Vision
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet and Fast R-CNN have reduced the running time of these detection networks.
Mon Dec 12 2016
Computer Vision
Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer
Attention plays a critical role in human visual experience and can also play an important role in applying artificial neural networks to a variety of tasks. By correctly defining attention for convolutional neural networks, we can significantly improve their performance.
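Activation-based attention transfer is compact enough to sketch: the attention map aggregates squared activations over channels, is L2-normalized, and is matched between teacher and student at corresponding layers. A minimal PyTorch-style sketch following the common implementation (the paper also studies other powers and gradient-based variants):

```python
import torch
import torch.nn.functional as F

def attention_map(feat):
    """Activation-based attention: aggregate squared activations over
    channels, then L2-normalize the flattened spatial map."""
    a = feat.pow(2).mean(dim=1)              # (B, C, H, W) -> (B, H, W)
    return F.normalize(a.flatten(1), dim=1)  # (B, H*W), unit L2 norm

def attention_transfer_loss(feat_s, feat_t):
    """L2 distance between student and teacher attention maps from
    feature maps at corresponding layers."""
    return (attention_map(feat_s) - attention_map(feat_t)).pow(2).sum(1).mean()
```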