Published on Wed Oct 28 2020

Non-Autoregressive Transformer ASR with CTC-Enhanced Decoder Input

Xingchen Song, Zhiyong Wu, Yiheng Huang, Chao Weng, Dan Su, Helen Meng

Abstract

Non-autoregressive (NAR) transformer models have achieved significant inference speedup, but at the cost of inferior accuracy compared to autoregressive (AR) models in automatic speech recognition (ASR). Most NAR transformers take a fixed-length sequence filled with MASK tokens or a redundant sequence copied from encoder states as decoder input; such inputs cannot provide efficient target-side information, which leads to accuracy degradation. To address this problem, we propose a CTC-enhanced NAR transformer, which generates the target sequence by refining the predictions of the CTC module. Experimental results show that our method outperforms all previous NAR counterparts and achieves 50x faster decoding speed than a strong AR baseline, with only 0.0 ~ 0.3 absolute CER degradation on the Aishell-1 and Aishell-2 datasets.
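The key idea, using the CTC branch's output as a target-side draft that the decoder refines in a single non-autoregressive pass, can be sketched as follows. This is a minimal PyTorch-style illustration under assumed module names (`encoder`, `ctc_head`, `decoder`); it is not the authors' implementation.

```python
import torch

def ctc_greedy_collapse(log_probs, blank_id=0):
    """Greedy CTC decode for one utterance: frame-wise argmax,
    merge repeated labels, drop blanks. log_probs: (T, vocab)."""
    ids = log_probs.argmax(dim=-1).tolist()
    out, prev = [], None
    for i in ids:
        if i != blank_id and i != prev:
            out.append(i)
        prev = i
    return torch.tensor(out, dtype=torch.long)

@torch.no_grad()
def nar_decode(encoder, ctc_head, decoder, feats, blank_id=0):
    """Single-pass NAR decoding with a CTC-enhanced decoder input (sketch):
    1) encode acoustics, 2) take the CTC branch's greedy hypothesis as a
    target-side draft, 3) let the decoder refine the draft in one shot."""
    enc = encoder(feats)                                   # (T, d_model)
    ctc_log_probs = ctc_head(enc).log_softmax(dim=-1)      # (T, vocab)
    draft = ctc_greedy_collapse(ctc_log_probs, blank_id)   # (L,) token ids
    logits = decoder(draft.unsqueeze(0), enc.unsqueeze(0)) # no causal mask
    return logits.argmax(dim=-1).squeeze(0)                # refined tokens
```

Unlike MASK-filled or encoder-copied inputs, the draft already fixes the output length and carries token identities, so one refinement pass can correct errors caused by CTC's conditional-independence assumption.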

Fri Jun 18 2021
Artificial Intelligence
An Improved Single Step Non-autoregressive Transformer for Automatic Speech Recognition
Non-autoregressive mechanisms can significantly decrease inference time for speech transformers. We propose several methods to improve the accuracy of the end-to-end CASS-NAT. Without using an external language model, the WERs of the improved CASS-NAT, when using the three methods, are 7%~
Wed Oct 28 2020
NLP
CASS-NAT: CTC Alignment-based Single Step Non-autoregressive Transformer for Speech Recognition
We propose a CTC alignment-based single step non-autoregressive transformer (CASS-NAT) for speech recognition. The CASS-NAT incurs a performance reduction in WER, but is 51.2x faster in terms of RTF.
Sun Apr 04 2021
NLP
TSNAT: Two-Step Non-Autoregressvie Transformer Models for Speech Recognition
Mon Oct 26 2020
NLP
Improved Mask-CTC for Non-Autoregressive End-to-End ASR
Mask-CTC achieves remarkably fast inference speed, but its recognition performance falls behind that of conventional autoregressive (AR) systems. We propose new training and decoding methods by introducing an objective to predict the length of a partial target sequence, which allows the model to delete or insert tokens.
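The mask-and-refill step that Mask-CTC builds on can be sketched roughly as below; the length-prediction objective that this paper adds (enabling token deletion and insertion) is not shown. Function and argument names are assumptions, not the authors' code.

```python
import torch

@torch.no_grad()
def mask_ctc_refine(ctc_tokens, ctc_confidences, decoder, enc,
                    mask_id, threshold=0.9):
    """Mask-CTC style refinement (sketch): replace low-confidence CTC tokens
    with MASK and re-predict them with the decoder, conditioned on the
    unmasked context and the encoder states `enc`."""
    tokens = ctc_tokens.clone()
    low_conf = ctc_confidences < threshold
    tokens[low_conf] = mask_id                     # hide uncertain tokens
    logits = decoder(tokens.unsqueeze(0), enc.unsqueeze(0))
    filled = logits.argmax(dim=-1).squeeze(0)
    tokens[low_conf] = filled[low_conf]            # keep confident CTC tokens
    return tokens
```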
Sun Nov 10 2019
Machine Learning
Listen and Fill in the Missing Letters: Non-Autoregressive Transformer for Speech Recognition
A-CMLM and A-FMLM are two different non-autoregressive transformer structures. During training, for both frameworks, input tokens are randomly replaced by special mask tokens. The network is required to predict the tokens corresponding to those mask tokens by taking unmasked context and
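The masked-prediction training that such conditional masked LM structures rely on, randomly replacing target tokens with MASK and scoring the model only on those positions, can be sketched as below; the masking ratio and all names are illustrative assumptions rather than the paper's setup.

```python
import torch

def random_mask_targets(targets, mask_id, mask_prob=0.15, pad_id=0):
    """CMLM-style target corruption for training (sketch): randomly replace
    target tokens with MASK; the decoder must recover the originals at the
    masked positions from the unmasked context (and the acoustics)."""
    decoder_input = targets.clone()
    maskable = targets != pad_id
    chosen = (torch.rand(targets.shape) < mask_prob) & maskable
    labels = torch.full_like(targets, -100)   # ignore index for cross-entropy
    labels[chosen] = targets[chosen]          # only masked positions are scored
    decoder_input[chosen] = mask_id           # corrupt the decoder input
    return decoder_input, labels
```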
Sat May 16 2020
NLP
Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech Recognition