Published on Tue Mar 26 2019

Unpaired Image Captioning via Scene Graph Alignments

Jiuxiang Gu, Shafiq Joty, Jianfei Cai, Handong Zhao, Xu Yang, Gang Wang


Abstract

Most current image captioning models rely heavily on paired image-caption datasets. However, collecting large-scale paired image-caption data is labor-intensive and time-consuming. In this paper, we present a scene graph-based approach to unpaired image captioning. Our framework comprises an image scene graph generator, a sentence scene graph generator, a scene graph encoder, and a sentence decoder. Specifically, we first train the scene graph encoder and the sentence decoder on the text modality. To align the scene graphs between images and sentences, we propose an unsupervised feature alignment method that maps scene graph features from the image modality to the sentence modality. Experimental results show that our model generates promising captions without using any image-caption training pairs, outperforming existing methods by a wide margin.
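The abstract does not specify how the alignment is learned. To illustrate what "unsupervised" means in this setting (two unpaired pools of feature vectors, one from image scene graphs and one from sentence scene graphs, with no correspondence between them), the sketch below fits a per-dimension affine map that makes mapped image features match the mean and standard deviation of sentence features. This moment-matching approach is a deliberately simple stand-in, not the paper's alignment method; the function names and toy feature vectors are hypothetical.

```python
from statistics import fmean, pstdev


def fit_moment_alignment(image_feats, sentence_feats, eps=1e-8):
    """Fit a per-dimension affine map (scale, offset) so that mapped image
    features match the mean and std of sentence features.
    Hypothetical stand-in: uses only unpaired feature pools, no pairs."""
    dims = len(image_feats[0])
    params = []
    for d in range(dims):
        img_col = [v[d] for v in image_feats]
        sen_col = [v[d] for v in sentence_feats]
        mu_i, sd_i = fmean(img_col), pstdev(img_col)
        mu_s, sd_s = fmean(sen_col), pstdev(sen_col)
        scale = sd_s / (sd_i + eps)  # eps guards against constant dimensions
        params.append((scale, mu_s - scale * mu_i))
    return params


def apply_alignment(params, feat):
    """Map one image-side feature vector into the sentence feature space."""
    return [a * x + b for (a, b), x in zip(params, feat)]


# Toy, made-up features: 3 image-side vectors, 3 unrelated sentence-side vectors.
image_feats = [[0.0, 10.0], [2.0, 14.0], [4.0, 12.0]]
sentence_feats = [[1.0, 0.0], [3.0, 1.0], [5.0, 2.0]]
params = fit_moment_alignment(image_feats, sentence_feats)
aligned = [apply_alignment(params, f) for f in image_feats]
```

Matching first and second moments only aligns the marginal statistics of each dimension; it cannot tell which image-side concept should correspond to which sentence-side concept. Capturing that correspondence without pairs requires a richer learned mapping, which is the role the paper's proposed alignment method plays.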

Sun Jun 20 2021
Computer Vision
Exploring Semantic Relationships for Unpaired Image Captioning
Image captioning has aroused great interest in both the academic and industrial worlds. Most existing systems are built upon large-scale datasets with image-sentence pairs. In this work, we achieve unpaired image captioning by bridging vision and language.
Tue Feb 09 2021
Computer Vision
In Defense of Scene Graphs for Image Captioning
The mainstream image captioning models rely on Convolutional Neural Network (CNN) image features to generate captions via recurrent models. Recent studies have noted that the naive use of scene graphs from a black-box scene graph generator harms image captioning performance.
Sat Oct 03 2020
NLP
Unsupervised Cross-lingual Image Captioning
Building vision-language systems only for English deprives a large part of the world's population of the benefits of AI technologies. We present a novel unsupervised cross-lingual method to generate image captions in a target language without using any image-caption corpus.
Wed Mar 14 2018
Computer Vision
Unpaired Image Captioning by Language Pivoting
Image captioning is a multimodal task involving computer vision and natural language processing. We present an approach to the unpaired image captioning problem by language pivoting. Our method can capture the characteristics of an image captioner from the pivot language and align it to the target language.
Fri Sep 25 2020
Computer Vision
Are scene graphs good enough to improve Image Captioning?
Many top-performing image captioning models rely solely on object features. Recent studies propose to directly use scene graphs to introduce information about object relations into captioning.
Sun Jun 21 2020
Computer Vision
Improving Image Captioning with Better Use of Captions
Image captioning is a multimodal problem that has drawn extensive attention. We present a novel image captioning architecture to better explore semantics available in captions and leverage that to enhance both image representation and caption generation.