Published on Fri May 28 2021

New Image Captioning Encoder via Semantic Visual Feature Matching for Heavy Rain Images

Chang-Hwan Son, Pung-Hwi Ye

Image captioning generates text that describes scenes from input images. In bad weather conditions, such as heavy rain, snow, and dense fog, the poor visibility causes a serious degradation of image quality. To address this problem, this study introduces a new encoder for captioning heavy rain images.

Abstract

Image captioning generates text that describes scenes from input images. It has been developed for high-quality images taken in clear weather. However, in bad weather conditions, such as heavy rain, snow, and dense fog, the poor visibility owing to rain streaks, rain accumulation, and snowflakes causes a serious degradation of image quality. This hinders the extraction of useful visual features and results in deteriorated image captioning performance. To address these practical issues, this study introduces a new encoder for captioning heavy rain images. The central idea is to transform output features extracted from heavy rain input images into semantic visual features associated with words and sentence context. To achieve this, a target encoder is initially trained in an encoder-decoder framework to associate visual features with semantic words. Subsequently, the objects in a heavy rain image are rendered visible by using an initial reconstruction subnetwork (IRS) based on a heavy rain model. The IRS is then combined with another semantic visual feature matching subnetwork (SVFMS) to match the output features of the IRS with the semantic visual features of the pretrained target encoder. The proposed encoder is based on the joint learning of the IRS and SVFMS. It is trained in an end-to-end manner and then connected to the pretrained decoder for image captioning. It is experimentally demonstrated that the proposed encoder can generate semantic visual features associated with words even from heavy rain images, thereby increasing the accuracy of the generated captions.
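The abstract describes a two-subnetwork encoder trained against a frozen target encoder via feature matching. Below is a minimal sketch of that training setup, assuming a PyTorch-style implementation; the layer configurations, the 2048-d feature size, and the placeholder module bodies are illustrative assumptions, not the authors' exact architecture or losses.

```python
# Minimal sketch of the proposed encoder training, assuming PyTorch.
# Layer shapes and the feature dimension are illustrative placeholders.
import torch
import torch.nn as nn

class IRS(nn.Module):
    """Initial reconstruction subnetwork: renders objects in a heavy rain image visible."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1),  # coarse rain-removed image
        )

    def forward(self, rain_img):
        return self.net(rain_img)

class SVFMS(nn.Module):
    """Semantic visual feature matching subnetwork: maps the IRS output toward
    the semantic visual feature space of the pretrained target encoder."""
    def __init__(self, feat_dim=2048):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, feat_dim),
        )

    def forward(self, derained_img):
        return self.net(derained_img)

class TargetEncoder(nn.Module):
    """Stand-in for the target encoder pretrained on clean images within an
    encoder-decoder captioning framework; kept frozen during encoder training."""
    def __init__(self, feat_dim=2048):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )

    def forward(self, clean_img):
        return self.net(clean_img)

irs, svfms, target_encoder = IRS(), SVFMS(), TargetEncoder()
target_encoder.eval()  # frozen; only IRS and SVFMS are updated, jointly and end-to-end
optimizer = torch.optim.Adam(list(irs.parameters()) + list(svfms.parameters()), lr=1e-4)
mse = nn.MSELoss()

def training_step(rain_img, clean_img):
    derained = irs(rain_img)
    pred_feat = svfms(derained)
    with torch.no_grad():
        target_feat = target_encoder(clean_img)  # semantic visual features tied to words
    loss = mse(pred_feat, target_feat)           # feature-matching objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

At inference time, the trained IRS and SVFMS pair would stand in for the clean-image encoder and feed its matched features to the pretrained captioning decoder.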

Fri Aug 30 2019
Computer Vision
Reflective Decoding Network for Image Captioning
The Reflective Decoding Network (RDN) enhances both long-sequence dependency modeling and word-position awareness in the caption decoder. The model learns to collaboratively attend to both visual and textual features while perceiving each word's relative position.
Mon Nov 02 2020
Computer Vision
Dual Attention on Pyramid Feature Maps for Image Captioning
Generating natural sentences from images is a fundamental learning task for visual-semantic understanding in multimedia. The proposed pyramid attention and dual attention methods are highly modular and can be inserted into various image captioning architectures.
Sun Jun 21 2020
Computer Vision
Improving Image Captioning with Better Use of Captions
Image captioning is a multimodal problem that has drawn extensive attention. We present a novel image captioning architecture that better exploits the semantics available in captions and leverages them to enhance both image representation and caption generation.
Tue Nov 27 2018
Computer Vision
Unsupervised Image Captioning
This paper makes the first attempt to train an image captioning model in an unsupervised manner. Instead of relying on manually labeled image-sentence pairs, our proposed model merely requires an image set, a sentence corpus, and an existing visual concept detector.
Wed Jul 14 2021
Computer Vision
From Show to Tell: A Survey on Image Captioning
Connecting Vision and Language plays an essential role in Generative Intelligence. In the last few years, a large research effort has been devoted to image captioning. This work aims to provide a comprehensive overview and categorization of image captioning approaches.
Wed Apr 29 2020
Computer Vision
Image Captioning through Image Transformer
Mon Nov 21 2016
Computer Vision
Image-to-Image Translation with Conditional Adversarial Networks
Conditional adversarial networks are a general-purpose solution to image-to-image translation problems. These networks not only learn the mapping from input image to output image, but also learn a loss function to train this mapping.
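For context, the conditional GAN objective in that work pairs an adversarial term with an L1 reconstruction term; a sketch in that paper's notation (it is not the heavy-rain encoder's own loss):

```latex
\mathcal{L}_{cGAN}(G,D) = \mathbb{E}_{x,y}\big[\log D(x,y)\big]
+ \mathbb{E}_{x,z}\big[\log\big(1 - D(x, G(x,z))\big)\big],
\qquad
G^{*} = \arg\min_{G}\max_{D}\; \mathcal{L}_{cGAN}(G,D)
+ \lambda\,\mathbb{E}_{x,y,z}\big[\lVert y - G(x,z)\rVert_{1}\big].
```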
Tue Feb 10 2015
Machine Learning
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
We introduce an attention based model that automatically learns to describe the content of images. We validate the use of attention with state-of-the-art performance on three benchmark datasets.
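In the soft-attention variant of that model, the context vector fed to the decoder at step t is a weighted sum of CNN annotation vectors a_i, with weights given by a softmax over an attention score; sketched here assuming the original paper's notation:

```latex
e_{ti} = f_{att}(a_i, h_{t-1}), \qquad
\alpha_{ti} = \frac{\exp(e_{ti})}{\sum_{k} \exp(e_{tk})}, \qquad
\hat{z}_t = \sum_{i} \alpha_{ti}\, a_i .
```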
Mon Dec 22 2014
Machine Learning
Adam: A Method for Stochastic Optimization
Adam is an algorithm for first-order gradient-based optimization of stochastic objective functions. The method is straightforward to implement and has low memory requirements. It is well suited for problems that are large in terms of data and parameters.
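For reference, the Adam update for parameters θ with gradient g_t, step size α, decay rates β₁, β₂, and stability constant ε is:

```latex
m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad
v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^{2},
```
```latex
\hat{m}_t = \frac{m_t}{1-\beta_1^{t}}, \qquad
\hat{v}_t = \frac{v_t}{1-\beta_2^{t}}, \qquad
\theta_t = \theta_{t-1} - \alpha\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}.
```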
Tue Dec 06 2016
Artificial Intelligence
Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning
Attention-based neural encoder-decoder frameworks have been widely adopted, but most methods force visual attention to be active for every generated word. However, the decoder likely requires little to no visual information from the image to predict non-visual words such as "the" and "of".
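The adaptive attention in that work mixes the spatial visual context c_t with a learned "visual sentinel" s_t via a gate β_t in [0, 1], so non-visual words can rely on the sentinel rather than the image; a sketch, assuming the original paper's notation:

```latex
\hat{c}_t = \beta_t\, s_t + (1-\beta_t)\, c_t .
```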
Wed Oct 16 2013
NLP
Distributed Representations of Words and Phrases and their Compositionality
The Skip-gram model is an efficient method for learning high-quality distributed vector representations. By subsampling frequent words, we obtain a significant speedup and learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling.
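The negative sampling objective replaces the full softmax: for a center word w_I and observed context word w_O, with k negative words drawn from a noise distribution P_n(w), it maximizes (in that paper's notation):

```latex
\log \sigma\!\big({v'_{w_O}}^{\top} v_{w_I}\big)
+ \sum_{i=1}^{k} \mathbb{E}_{w_i \sim P_n(w)}\Big[\log \sigma\!\big(-{v'_{w_i}}^{\top} v_{w_I}\big)\Big].
```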
Wed May 16 2018
Computer Vision
Lightweight Pyramid Networks for Image Deraining
Existing deep convolutional neural networks have found major success in image deraining, but at the expense of an enormous number of parameters. This limits their potential application, for example on mobile devices. We propose a lightweight pyramid of networks (LPNet) for single-image deraining.