Published on Mon Sep 19 2016

Multi-view Dimensionality Reduction for Dialect Identification of Arabic Broadcast Speech

Sameer Khurana, Ahmed Ali, Steve Renals

Abstract

In this work, we present a new Vector Space Model (VSM) of speech utterances for the task of spoken dialect identification (DID). DID systems are generally built using two sets of features extracted from speech utterances: acoustic and phonetic. These features are used to form vector representations of speech utterances that encode information about the spoken dialects, yielding a Phonotactic VSM and an Acoustic VSM, each of which can be used for DID. The aim of this paper is to construct a single VSM that encodes dialect information from both the Phonotactic and Acoustic VSMs. Given these two views of the data, we apply a well-known multi-view dimensionality reduction technique, Canonical Correlation Analysis (CCA), to form a single vector representation for each speech utterance that encodes dialect-specific discriminative information from both the phonetic and acoustic representations. We refer to this as the feature-space combination approach and show that our CCA-based feature representation performs better on the Arabic DID task than either the phonetic or the acoustic representation used alone. We also present the feature-space combination approach as a viable alternative to the model-based combination approach, in which two DID systems are built using the two VSMs (Phonotactic and Acoustic) and the final prediction score is a combination of the output scores from the two systems.

Wed Sep 23 2015
NLP
Automatic Dialect Detection in Arabic Broadcast Speech
We investigate different approaches for dialect identification in Arabic. We combined phonetic and lexical features from a speech recognition system with acoustic features using the i-vector framework. We used these features in a binary classifier to discriminate between Modern Standard Arabic and Dialectal Arabic.
Sun Dec 13 2020
NLP
SPARTA: Speaker Profiling for ARabic TAlk
This paper proposes a novel approach to an automatic estimation of three traits from Arabic speech: gender, emotion, and dialect. The dataset was assembled from six publicly available datasets. LSTM and CNN networks were implemented using raw and pre-trained features.
Mon Jan 25 2021
Machine Learning
Domain-Dependent Speaker Diarization for the Third DIHARD Challenge
This report presents the system developed by the ABSP Laboratory team for the third DIHARD speech diarization challenge. We explore speaker embeddings for the acoustic domain identification (ADI) task. The performance substantially improved over that of the baseline.
Mon Sep 19 2016
NLP
The MGB-2 Challenge: Arabic Multi-Dialect Broadcast Media Recognition
This paper describes the Arabic Multi-Genre Broadcast (MGB-2) Challenge for SLT-2016. Unlike last year's English MGB Challenge, which focused on recognizing diverse TV genres, this year, the challenge has an emphasis on handling the diversity in dialect in Arabic speech.
Wed Apr 14 2021
Machine Learning
Unsupervised low-rank representations for speech emotion recognition
Thu May 03 2018
Machine Learning
Supervector Compression Strategies to Speed up I-Vector System Development
Front-end factor analysis (FEFA) is currently the prevalent approach to extract compact utterance-level features for automatic speaker verification (ASV) systems. We study several alternative methods, including PPCA, factor analysis and two supervised approaches. The results suggest that, in terms of ASV accuracy, the