Published on Fri Oct 30 2015

Highway Long Short-Term Memory RNNs for Distant Speech Recognition

Yu Zhang, Guoguo Chen, Dong Yu, Kaisheng Yao, Sanjeev Khudanpur, James Glass

Abstract

In this paper, we extend deep long short-term memory (DLSTM) recurrent neural networks by introducing gated direct connections between memory cells in adjacent layers. These direct links, called highway connections, enable unimpeded information flow across different layers and thus alleviate the vanishing gradient problem when building deeper LSTMs. We further introduce latency-controlled bidirectional LSTMs (BLSTMs), which can exploit the whole history while keeping the latency under control. Efficient algorithms are proposed to train these novel networks using both frame and sequence discriminative criteria. Experiments on the AMI distant speech recognition (DSR) task indicate that we can train deeper LSTMs and achieve better improvement from sequence training with highway LSTMs (HLSTMs). Our novel model obtains 43.9/47.7% WER on the AMI (SDM) dev and eval sets, outperforming all previous works. It beats the strong DNN and DLSTM baselines with 15.7% and 5.3% relative improvement respectively.
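The key mechanism is a depth gate d that lets the memory cell of layer l receive the memory cell of layer l-1 directly at the same time step. Below is a minimal single-step sketch in Python/NumPy. The class and parameter names are illustrative, and the depth gate here is simplified to depend only on the layer input and the lower layer's cell (the paper also conditions it on the layer's own previous cell), so read it as an assumption-laden sketch, not the paper's exact formulation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class HighwayLSTMCell:
    """One step of a highway (H)LSTM layer: a standard LSTM plus a
    depth gate d that passes the lower layer's memory cell straight
    into this layer's cell."""

    def __init__(self, n_in, n_hid, seed=0):
        rng = np.random.default_rng(seed)
        init = lambda *shape: rng.normal(0.0, 0.1, shape)
        # standard LSTM parameters: input (i), forget (f), output (o)
        # gates and the candidate update (g)
        self.W = {k: init(n_hid, n_in) for k in "ifog"}
        self.U = {k: init(n_hid, n_hid) for k in "ifog"}
        self.b = {k: np.zeros(n_hid) for k in "ifog"}
        # depth-gate parameters (the highway connection across layers)
        self.W_d = init(n_hid, n_in)
        self.w_d = init(n_hid)
        self.b_d = np.zeros(n_hid)

    def step(self, x, h_prev, c_prev, c_lower):
        """x: layer input at time t; h_prev, c_prev: this layer's state
        at t-1; c_lower: memory cell of the layer below at time t."""
        pre = lambda k: self.W[k] @ x + self.U[k] @ h_prev + self.b[k]
        i, f, o = sigmoid(pre("i")), sigmoid(pre("f")), sigmoid(pre("o"))
        g = np.tanh(pre("g"))
        # depth gate: how much of the lower cell is carried upward
        d = sigmoid(self.W_d @ x + self.w_d * c_lower + self.b_d)
        c = f * c_prev + i * g + d * c_lower  # highway term: d * c_lower
        h = o * np.tanh(c)
        return h, c
```

Stacking layers then amounts to feeding each layer's c upward as c_lower for the layer above at the same time step; when d saturates at zero, the layer reduces to an ordinary DLSTM layer.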

Tue Jan 10 2017
Artificial Intelligence
Residual LSTM: Design of a Deep Recurrent Architecture for Distant Speech Recognition
A plain LSTM has an internal memory cell that can learn long-term dependencies of sequential data. It also provides a temporal shortcut path to avoid vanishing or exploding gradients in the temporal domain. A 10-layer residual LSTM network provided the lowest WER of 41.0%.
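For contrast with the highway connection above, the residual LSTM places its shortcut on the output path rather than on the memory cell. A minimal sketch, assuming a generic LSTM step function lstm_step(x, h, c) -> (h, c) and matching input/hidden sizes (the paper applies a projection when dimensions differ; all names here are illustrative):

```python
def residual_lstm_layer(lstm_step, x_seq, h0, c0):
    """One residual LSTM layer over a sequence: the shortcut adds the
    layer input to the LSTM output (output path), unlike the highway
    LSTM above, which gates the lower layer's memory cell (cell path)."""
    h, c, outputs = h0, c0, []
    for x in x_seq:
        h, c = lstm_step(x, h, c)  # any standard LSTM step function
        outputs.append(h + x)      # identity shortcut on the output
    return outputs
```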
Wed Jun 22 2016
Neural Networks
A Comprehensive Study of Deep Bidirectional LSTM RNNs for Acoustic Modeling in Speech Recognition
We present a comprehensive study of deep bidirectional long short-term memory (LSTM) recurrent neural network (RNN) based acoustic models for automatic speech recognition (ASR). We study the effect of size and depth, and train models of up to 8 layers.
Thu Oct 16 2014
Neural Networks
Constructing Long Short-Term Memory based Deep Recurrent Neural Networks for Large Vocabulary Speech Recognition
Long short-term memory (LSTM) based acoustic modeling methods have recently been shown to give state-of-the-art performance on some speech recognition tasks. To achieve further performance improvements, deep extensions of the LSTM are investigated.
Tue Mar 21 2017
NLP
Deep LSTM for Large Vocabulary Continuous Speech Recognition
Recurrent neural networks (RNNs), especially long short-term memory (LSTM) RNNs, are effective networks for sequential tasks like speech recognition. Deeper LSTM models perform well on large vocabulary continuous speech recognition, but a deeper network is more difficult to train.
Tue Apr 30 2019
Machine Learning
Very Deep Self-Attention Networks for End-to-End Speech Recognition
Deep Transformer networks with high learning capacity are able to exceed the performance of previous end-to-end approaches. An ensemble of these models achieves 9.9% and 17.7% WER on the Switchboard and CallHome test sets respectively.
Tue Sep 19 2017
NLP
Language Modeling with Highway LSTM
Language models (LMs) based on Long Short-Term Memory (LSTM) have shown good gains in many automatic speech recognition tasks. In this paper, we extend an LSTM by adding highway networks inside it and use the resulting Highway LSTM for language modeling.