Published on Mon Jul 02 2018

Punctuation Prediction Model for Conversational Speech

Piotr Żelasko, Piotr Szymański, Jan Mizgajski, Adrian Szymczak, Yishay Carmiel, Najim Dehak

An ASR system usually does not predict any punctuation or capitalization. The lack of punctuation causes problems in result presentation and confuses natural language processing algorithms. We train two variants of Deep Neural Network (DNN) sequence labelling models to predict the punctuation.

Abstract

An ASR system usually does not predict any punctuation or capitalization. Lack of punctuation causes problems in result presentation and confuses both the human reader and off-the-shelf natural language processing algorithms. To overcome these limitations, we train two variants of Deep Neural Network (DNN) sequence labelling models - a Bidirectional Long Short-Term Memory (BLSTM) and a Convolutional Neural Network (CNN) - to predict the punctuation. The models are trained on the Fisher corpus, which includes punctuation annotation. In our experiments, we combine time-aligned and punctuated Fisher corpus transcripts using a sequence alignment algorithm. The neural networks are trained on Common Web Crawl GloVe embeddings of the words in Fisher transcripts, aligned with conversation side indicators and word time information. The CNNs yield better precision, while the BLSTMs tend to have better recall. While BLSTMs make fewer mistakes overall, the punctuation predicted by the CNN is more accurate - especially in the case of question marks. Our results constitute significant evidence that the distribution of words in time, as well as pre-trained embeddings, can be useful in the punctuation prediction task.
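The abstract does not name the sequence alignment algorithm used to combine the time-aligned and punctuated transcripts. As a rough illustration only, the sketch below uses Python's standard `difflib.SequenceMatcher` as a stand-in: it matches the words of a punctuated transcript against a time-aligned word list and copies the trailing punctuation mark onto each matched word as a label. The function names, the label set (`PERIOD`, `COMMA`, `QUESTION`, `NONE`) and the `(word, start, duration)` tuple layout are assumptions made for this example, not details taken from the paper.

```python
from difflib import SequenceMatcher

# Hypothetical label set; the paper's actual label inventory may differ.
PUNCT_LABELS = {".": "PERIOD", ",": "COMMA", "?": "QUESTION"}


def split_punct(token):
    """Split a punctuated token into (lowercased bare word, punctuation label)."""
    label = PUNCT_LABELS.get(token[-1], "NONE")
    word = token.rstrip(".,?").lower()
    return word, label


def align_labels(timed_words, punctuated_text):
    """Attach punctuation labels to time-aligned words.

    timed_words: list of (word, start_time, duration) tuples (assumed layout).
    punctuated_text: the same utterance as a punctuated string.
    Words that cannot be matched keep the label "NONE".
    """
    pairs = [split_punct(tok) for tok in punctuated_text.split()]
    pairs = [(w, lab) for w, lab in pairs if w]  # drop stand-alone punctuation

    timed_seq = [w.lower() for w, _, _ in timed_words]
    punct_seq = [w for w, _ in pairs]

    labels = ["NONE"] * len(timed_seq)
    matcher = SequenceMatcher(a=timed_seq, b=punct_seq, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            labels[block.a + k] = pairs[block.b + k][1]

    return [(w, s, d, lab) for (w, s, d), lab in zip(timed_words, labels)]


if __name__ == "__main__":
    timed = [("so", 0.0, 0.2), ("uh", 0.2, 0.3), ("yes", 0.6, 0.2)]
    for row in align_labels(timed, "So, yes."):
        print(row)
```

The output rows pair each word with its timing and a label, which is the kind of per-word target a sequence labelling model such as a BLSTM or CNN could be trained on. Words present in only one transcript (here the filler "uh") are simply left unlabelled in this sketch.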

Mon Aug 03 2020
NLP
Multimodal Semi-supervised Learning Framework for Punctuation Prediction in Conversational Speech
Conventional approaches in speech processing typically use forced alignment to encode per-frame acoustic features into word-level features. As an alternative, we explore attention-based multimodal fusion and compare its performance with forced alignment. We also demonstrate the effectiveness of our semi-supervised learning approach.
Mon Apr 13 2020
Machine Learning
Punctuation Prediction in Spontaneous Conversations: Can We Mitigate ASR Errors with Retrofitted Word Embeddings?
Automatic Speech Recognition (ASR) systems introduce word errors, which often confuse punctuation prediction models. These errors usually take the form of homonyms. We show how retrofitting of the word embeddings on the domain-specific data can mitigate ASR errors.
Sat Jun 12 2021
NLP
Incorporating External POS Tagger for Punctuation Restoration
Punctuation restoration is an important post-processing step in automatic speech recognition. Part-of-speech (POS) taggers provide informative tags that suggest each input token's syntactic role. We incorporate an external POS tagger and fuse its predicted labels into the existing language model.
Thu Dec 03 2020
Machine Learning
End to End ASR System with Automatic Punctuation Insertion
Recent Automatic Speech Recognition systems have been moving towards end-to-end systems that can be trained together. Historically, there has been a lot of interest in incorporating automatic punctuation in textual or speech to text context. We propose a method to generate punctuated transcript for the
Mon Sep 10 2018
NLP
Towards JointUD: Part-of-speech Tagging and Lemmatization using Recurrent Neural Networks
We have extended an LSTM-based neural network designed for sequence tagging. The network was jointly trained to produce lemmas, part-of-speech tags and morphological features. The results demonstrate the viability of the proposed architecture.
Tue Oct 20 2020
Machine Learning
Replacing Human Audio with Synthetic Audio for On-device Unspoken Punctuation Prediction
We present a novel multi-modal unspoken punctuation prediction system for the English language. The system combines acoustic and text features. This is achieved by leveraging hash-based embeddings of automatic speech recognition.