Published on Tue Nov 19 2019

KISS: Keeping It Simple for Scene Text Recognition

Christian Bartz, Joseph Bethge, Haojin Yang, Christoph Meinel

We introduce a new model for scene text recognition that only consists of off-the-shelf building blocks. Our model (KISS) consists of two ResNet-based feature extractors, a spatial transformer, and a transformer. We train our model only on publicly available, synthetic training

Abstract

Over the past few years, several new methods for scene text recognition have been proposed. Most of these methods introduce novel building blocks for neural networks. These building blocks are tailored specifically to scene text recognition and can therefore hardly be used for any other task. In this paper, we introduce a new model for scene text recognition that only consists of off-the-shelf building blocks for neural networks. Our model (KISS) consists of two ResNet-based feature extractors, a spatial transformer, and a transformer. We train our model only on publicly available, synthetic training data and evaluate it on a range of scene text recognition benchmarks, where we reach state-of-the-art or competitive performance, although our model does not use methods like 2D-attention or image rectification.
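As a rough illustration of one of the off-the-shelf blocks named above, the following sketch computes the sampling grid of a generic affine spatial transformer. It is not taken from the KISS implementation: the function name `affine_grid`, the normalized [-1, 1] coordinate convention, and the example matrices are assumptions made for this sketch, and the abstract does not say how KISS parameterizes or applies its spatial transformer.

```python
from typing import List, Tuple

# A 2x3 affine matrix ((a, b, tx), (c, d, ty)); hypothetical type alias.
Affine = Tuple[Tuple[float, float, float], Tuple[float, float, float]]


def affine_grid(theta: Affine, height: int, width: int) -> List[List[Tuple[float, float]]]:
    """For every output pixel, return the (x, y) source location to sample.

    Coordinates are normalized so that -1 is the left/top edge and +1 is the
    right/bottom edge of the image (an assumed convention for this sketch).
    """

    def norm(i: int, n: int) -> float:
        # Map index 0..n-1 to evenly spaced points in [-1, 1].
        return 0.0 if n == 1 else -1.0 + 2.0 * i / (n - 1)

    (a, b, tx), (c, d, ty) = theta
    grid = []
    for row in range(height):
        y_t = norm(row, height)
        line = []
        for col in range(width):
            x_t = norm(col, width)
            # Apply the affine map to the target coordinate.
            line.append((a * x_t + b * y_t + tx, c * x_t + d * y_t + ty))
        grid.append(line)
    return grid


# The identity transform samples the whole input image.
identity: Affine = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
# Halving the x scale and shifting left samples only the left half of the
# input, e.g. a region of interest around one part of a word.
left_half: Affine = ((0.5, 0.0, -0.5), (0.0, 1.0, 0.0))

full_grid = affine_grid(identity, 2, 3)
crop_grid = affine_grid(left_half, 2, 3)
```

In a full model, a localization network would predict `theta` and a differentiable (for example bilinear) sampler would read the input image at these locations; both parts are omitted here.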

Thu May 07 2020
Computer Vision
Text Recognition in the Wild: A Survey
The history of text can be traced back over thousands of years. Rich and precise semantic information carried by text is important in a wide range of application scenarios. Text recognition in natural scenes has been an active research field in computer vision and pattern recognition.
Thu Jul 27 2017
Computer Vision
STN-OCR: A single Neural Network for Text Detection and Text Recognition
Detecting and recognizing text in natural scene images is a challenging, yet not completely solved task. In this paper we present STN-OCR, a step towards semi-supervised neural networks for scene text recognition.
Mon Nov 21 2016
Computer Vision
TextBoxes: A Fast Text Detector with a Single Deep Neural Network
This paper presents an end-to-end trainable fast scene text detector. TextBoxes detects scene text with both high accuracy and efficiency in a single network forward pass.
Thu Dec 14 2017
Computer Vision
SEE: Towards Semi-Supervised End-to-End Scene Text Recognition
SEE is a network that learns to detect and recognize text from natural images. It is a step towards semi-supervised neural networks for scene text detection and recognition.
Wed Mar 18 2020
Computer Vision
Scene Text Recognition via Transformer
Scene text recognition with arbitrary shape is very challenging due to large variations in text shapes, fonts, colors, backgrounds, etc. Most state-of-the-art algorithms rectify the input image into the normalized image, then treat the recognition as a sequence prediction task. In this paper,
Sat Nov 10 2018
Computer Vision
Scene Text Detection and Recognition: The Deep Learning Era
A survey aimed at summarizing and analyzing the major changes and significant progress of scene text detection and recognition in the deep learning era. We expect that this review paper would serve as a reference book for researchers in this field.
Wed Feb 11 2015
Machine Learning
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Batch Normalization allows us to use much higher learning rates and be less careful about initialization. It also acts as a regularizer, in some cases eliminating the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same
Mon Jun 12 2017
NLP
Attention Is All You Need
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms. Experiments on two machine translation tasks show these models to be superior in
Thu Sep 04 2014
Computer Vision
Very Deep Convolutional Networks for Large-Scale Image Recognition
Convolutional networks of increasing depth can achieve state-of-the-art results. The research was the basis of the team's ImageNet Challenge 2014 submission.
Mon Sep 01 2014
NLP
Neural Machine Translation by Jointly Learning to Align and Translate
Neural machine translation is a recently proposed approach to machine translation. Unlike traditional statistical machine translation, the neural machine translation aims at building a single neural network that can be tuned to maximize translation performance.
Thu Aug 08 2019
Machine Learning
On the Variance of the Adaptive Learning Rate and Beyond
The learning rate warmup heuristic achieves remarkable success in stabilizing training, accelerating convergence and improving generalization. Extensive experimental results on image classification, language modeling, and neural machine translation demonstrate the effectiveness and robustness of our proposed method.
Sat Jun 13 2015
Computer Vision
Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting
The goal of precipitation nowcasting is to predict the future rainfall intensity in a local region over a relatively short period of time. Very few previous studies have examined this crucial and challenging weather forecasting problem from a machine learning perspective.
Wed Dec 23 2020
Machine Learning
On Calibration of Scene-Text Recognition Models
The topic of confidence calibration has been an active research area for the last several decades. In this work, we focus on the calibration of STR models on the word rather than the character level. We apply existing calibration methodologies and new sequence-based extensions to numerous STR models.
Mon Feb 22 2021
Computer Vision
Revisiting Classification Perspective on Scene Text Recognition
The prevalent perspectives on scene text recognition are sequence-to-sequence and segmentation. We revive the classification perspective by devising a model named CSTR. CSTR is as simple as an image classification model like ResNet.
Wed Jul 22 2020
Computer Vision
FedOCR: Communication-Efficient Federated Learning for Scene Text Recognition
Scene text recognition techniques have been widely used in commercial applications. Most existing algorithms have assumed a set of shared or centralized training data. In practice, data may be distributed across different local devices and cannot be centralized for sharing due to privacy restrictions.
Wed Jul 15 2020
Computer Vision
RobustScanner: Dynamically Enhancing Positional Clues for Robust Text Recognition
The attention-based encoder-decoder framework has recently achieved impressive results for scene text recognition. However, it performs poorly on contextless texts. We propose a novel position enhancement branch, and dynamically fuse its outputs with those of the decoder attention module.
Tue May 26 2020
Computer Vision
Pay Attention to What You Read: Non-recurrent Handwritten Text-Line Recognition