Published on Tue Apr 09 2019

Exploring Uncertainty Measures for Image-Caption Embedding-and-Retrieval Task

Kenta Hama, Takashi Matsubara, Kuniaki Uehara, Jianfei Cai


Abstract

With the widespread deployment of black-box machine learning algorithms, particularly deep neural networks (DNNs), the practical demand for reliability assessment is rapidly rising. On the basis of the concept that "Bayesian deep learning knows what it does not know," the uncertainty of DNN outputs has been investigated as a reliability measure for classification and regression tasks. However, in the image-caption retrieval task, well-known samples are not always easy-to-retrieve samples. This study investigates two aspects of image-caption embedding-and-retrieval systems. On the one hand, we quantify feature uncertainty by treating image-caption embedding as a regression task, and use it for model averaging, which improves retrieval performance. On the other hand, we quantify posterior uncertainty by treating retrieval as a classification task, and use it as a reliability measure, which greatly improves retrieval performance by rejecting uncertain queries. Both uncertainty measures perform consistently across different datasets (MS COCO and Flickr30k), different deep learning architectures (dropout and batch normalization), and different similarity functions.
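The abstract's two ideas, model averaging over stochastic forward passes and rejecting queries whose uncertainty is too high, can be sketched roughly as follows. This is a minimal illustration under assumptions, not the paper's implementation: `embed_fn` stands in for any encoder run with dropout kept active at test time, and the cosine similarity and the mean-variance rejection threshold are choices made here for concreteness.

```python
import numpy as np

def mc_embed(embed_fn, x, n_samples=20):
    """Embed x n_samples times with stochastic dropout active,
    returning the mean embedding (model averaging) and the
    per-dimension variance (a feature-uncertainty estimate)."""
    samples = np.stack([embed_fn(x) for _ in range(n_samples)])
    return samples.mean(axis=0), samples.var(axis=0)

def retrieve_with_rejection(query_emb, query_unc, gallery, threshold):
    """Cosine-similarity retrieval that rejects queries whose mean
    feature uncertainty exceeds `threshold`; returns the index of the
    best gallery match, or None for a rejected query."""
    if query_unc.mean() > threshold:
        return None  # too uncertain: abstain instead of retrieving
    q = query_emb / np.linalg.norm(query_emb)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return int(np.argmax(g @ q))
```

Rejecting the most uncertain queries trades coverage for accuracy on the queries that are answered, which is the mechanism behind the reported retrieval-performance gains.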

Sun Jun 17 2018
Machine Learning
Learning to Evaluate Image Captioning
Evaluation metrics for image captioning face two challenges: CIDEr, METEOR, ROUGE, and BLEU often do not correlate well with human judgments, and each metric has well-known blind spots to pathological caption constructions.
Tue Nov 17 2020
Machine Learning
Improving Calibration in Deep Metric Learning With Cross-Example Softmax
Cross-Example Softmax combines properties of top-k and threshold relevancy. In each iteration, the proposed loss encourages all queries to be closer to their matching images than all non-matching images. This leads to a globally more calibrated similarity metric.
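Reading the blurb literally, the key difference from an ordinary per-query softmax is that each positive pair competes against all non-matching pairs in the batch, not just those for its own query. A minimal numpy sketch of that idea follows; the paper's actual loss may differ in details (e.g. temperature scaling or batch construction), so treat this as an illustrative assumption.

```python
import numpy as np

def cross_example_softmax_loss(sim):
    """sim: (n, n) query-image similarity matrix with matching pairs on
    the diagonal. Each positive is normalized against ALL non-matching
    pairs in the batch (not only its own row), which pushes similarity
    scores toward being comparable across queries."""
    n = sim.shape[0]
    negatives = np.exp(sim[~np.eye(n, dtype=bool)]).sum()
    positives = np.exp(np.diag(sim))
    return float(-np.log(positives / (positives + negatives)).mean())
```

Because every negative pair enters every query's denominator, lowering any non-matching similarity anywhere in the batch lowers the loss, which is what makes the resulting metric globally calibrated rather than only rank-correct per query.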
Thu Sep 05 2019
Computer Vision
REO-Relevance, Extraness, Omission: A Fine-grained Evaluation for Image Captioning
Popular metrics used for evaluating image captioning systems, such as BLEU and CIDEr, provide a single score to gauge the system's overall effectiveness. REO assesses the quality of captions from three perspectives: relevance, extraness and omission.
Wed Jan 22 2020
Computer Vision
ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised Image-Text Data
A new vision-language pre-trained model for image-text joint embedding. The Transformer-based model takes different modalities as input and models the relationships between them.
Sun Sep 08 2019
NLP
Quality Estimation for Image Captions Based on Large-scale Human Evaluations
Quality Estimation (QE) attempts to model the caption quality from a human perspective. QE models trained over the coarse ratings can effectively detect and filter out low-quality image captions, thereby improving the user experience.
Tue Jul 14 2020
Machine Learning
Compare and Reweight: Distinctive Image Captioning Using Similar Images Sets
A wide range of image captioning models has been developed, achieving significant improvement based on popular metrics, such as BLEU, CIDEr, and SPICE. However, although the generated captions can accurately describe the image, they are generic for similar images and lack distinctiveness.