Published on Mon Jun 22 2020

Articulatory-WaveNet: Autoregressive Model For Acoustic-to-Articulatory Inversion

Narjes Bozorg, Michael T. Johnson

Articulatory-WaveNet is a new approach for acoustic-to-articulatory inversion. The proposed system uses the WaveNet speech synthesis architecture and was trained and evaluated on the ElectroMagnetic Articulography corpus of Mandarin Accented English.

Abstract

This paper presents Articulatory-WaveNet, a new approach for acoustic-to-articulatory inversion. The proposed system uses the WaveNet speech synthesis architecture, with dilated causal convolutional layers that use previous values of the predicted articulatory trajectories conditioned on acoustic features. The system was trained and evaluated on the ElectroMagnetic Articulography corpus of Mandarin Accented English (EMA-MAE), consisting of 39 speakers including both native English speakers and native Mandarin speakers speaking English. Results show significant improvement in both correlation and RMSE between the generated and true articulatory trajectories for the new method, with an average correlation of 0.83, representing a 36% relative improvement over the 0.61 correlation obtained with a baseline Hidden Markov Model (HMM)-Gaussian Mixture Model (GMM) inversion framework. To the best of our knowledge, this paper presents the first application of a point-by-point waveform synthesis approach to the problem of acoustic-to-articulatory inversion, and the results show improved performance compared to previous methods for speaker-dependent acoustic-to-articulatory inversion.
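The dilated causal convolution described above is the core building block of WaveNet-style models: each output at time t depends only on inputs at t and earlier, and stacking layers with dilations 1, 2, 4, … grows the receptive field exponentially. The following is a minimal numpy sketch of a single such layer (an illustration of the general mechanism, not the authors' implementation; the function name and single-channel setup are assumptions for clarity):

```python
import numpy as np

def causal_dilated_conv(x, w, dilation):
    """Causal dilated 1-D convolution.

    output[t] depends only on x[t], x[t - dilation], x[t - 2*dilation], ...
    The input is left-padded with zeros so no future sample leaks in.
    """
    k = len(w)                      # kernel size
    pad = dilation * (k - 1)        # causal left padding
    xp = np.concatenate([np.zeros(pad), x])
    return np.array([
        sum(w[i] * xp[t + pad - i * dilation] for i in range(k))
        for t in range(len(x))
    ])

# With kernel [1, 1] and dilation 2, each output is x[t] + x[t-2]
# (zero before the start of the signal):
x = np.arange(8, dtype=float)
y = causal_dilated_conv(x, np.array([1.0, 1.0]), dilation=2)
# y == [0, 1, 2, 4, 6, 8, 10, 12]
```

In the full model, a stack of these layers (with gated activations and skip connections) is conditioned on acoustic features and predicts each articulatory sample from the previously generated ones, point by point.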

Sat May 16 2020
Machine Learning
Learning Joint Articulatory-Acoustic Representations with Normalizing Flows
The articulatory geometric configurations of the vocal tract and the acoustic properties of the resultant speech sound are considered to have a strong causal relationship. This paper aims at finding a joint latent representation between the articulatory and acoustic domain for vowel sounds via invertible neural network models.
Sat Apr 07 2018
Machine Learning
A comparison of recent waveform generation and acoustic modeling methods for neural-network-based speech synthesis
A WaveNet vocoder outperformed classical source-filter-based vocoders. A combination of an autoregressive (AR) acoustic model and vocoder achieved a similar score.
Tue Apr 03 2018
Machine Learning
Neural Autoregressive Flows
Normalizing flows and autoregressive models have been successfully combined. NAF yields state-of-the-art performance on a suite of density estimation tasks and outperforms IAF in variational autoencoders trained on binarized MNIST.
Mon Oct 29 2018
Machine Learning
Neural source-filter-based waveform model for statistical parametric speech synthesis
Neural waveform models such as the WaveNet are used in many recent text-to-speech systems. The original WaveNet is quite slow in waveform generation because of its autoregressive (AR) structure. This study proposes a non-AR neural source-filter waveform model.
Fri Nov 15 2019
NLP
Independent and automatic evaluation of acoustic-to-articulatory inversion models
Reconstruction of articulatory trajectories from the acoustic speech signal has been proposed for improving speech recognition and text-to-speech synthesis. To be useful in these settings, articulatory reconstruction must be speaker independent. Standard evaluation measures are inappropriate for evaluating the speaker-independence of models.
Sat Apr 27 2019
Machine Learning
Neural source-filter waveform models for statistical parametric speech synthesis
Neural waveform models such as WaveNet have demonstrated better performance than conventional vocoders for statistical parametric speech synthesis. WaveNet is limited by a slow sequential waveform generation process. Some new models that use the inverse-autoregressive flow can generate a whole waveform in a one-shot manner.