Published on Mon May 18 2020

Defending Your Voice: Adversarial Attack on Voice Conversion

Chien-yu Huang, Yist Y. Lin, Hung-yi Lee, Lin-shan Lee

This paper is the first known attempt to perform an adversarial attack on voice conversion. We introduce human-imperceptible noise into the utterances of a speaker whose voice is to be defended. Preliminary experiments were conducted on two state-of-the-art zero-shot voice conversion models.

Abstract

Substantial improvements have been achieved in recent years in voice conversion, which converts the speaker characteristics of an utterance into those of another speaker without changing the linguistic content of the utterance. Nonetheless, the improved conversion technologies have also led to concerns about privacy and authentication. It is thus highly desirable to prevent one's voice from being improperly utilized by such voice conversion technologies. We therefore report in this paper the first known attempt to perform an adversarial attack on voice conversion. We introduce human-imperceptible noise into the utterances of a speaker whose voice is to be defended. Given these adversarial examples, voice conversion models cannot convert other utterances so that they sound as if produced by the defended speaker. Preliminary experiments were conducted on two state-of-the-art zero-shot voice conversion models. Objective and subjective evaluation results in both white-box and black-box scenarios are reported. The results show that the speaker characteristics of the converted utterances were made obviously different from those of the defended speaker, while the adversarial examples of the defended speaker remain indistinguishable from the authentic utterances.
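The core idea described in the abstract, perturbing a defended speaker's utterances within an imperceptibility bound so that a speaker encoder no longer maps them near the speaker's identity, can be sketched with a single FGSM-style gradient step. Everything below is an illustrative assumption: the linear "encoder" `W`, the function names, and the step size stand in for the paper's actual models and optimization.

```python
import numpy as np

# Hedged sketch, not the authors' method: a toy linear "speaker encoder"
# and one FGSM-style step that pushes the embedding of the perturbed
# utterance away from the speaker's original embedding, while keeping
# the per-sample perturbation bounded by epsilon (the imperceptibility
# constraint).
rng = np.random.default_rng(0)
dim_in, dim_emb = 64, 8
W = rng.standard_normal((dim_emb, dim_in)) / np.sqrt(dim_in)  # toy encoder

def embed(x):
    """Stand-in for a real speaker-embedding network."""
    return W @ x

def fgsm_defend(x, epsilon=0.01):
    """Add an epsilon-bounded perturbation that moves embed(x) away
    from its original location in embedding space."""
    e0 = embed(x)
    # The gradient of ||W x' - e0||^2 w.r.t. x' vanishes at x' = x,
    # so take a tiny random step first, then follow the gradient sign:
    # d||W x' - e0||^2 / dx' = 2 W^T (W x' - e0).
    x_probe = x + 1e-3 * rng.standard_normal(x.shape)
    grad = 2 * W.T @ (embed(x_probe) - e0)
    return x + epsilon * np.sign(grad)  # ascend the embedding distance

x = rng.standard_normal(dim_in)         # stand-in "utterance" samples
x_adv = fgsm_defend(x)
shift = np.linalg.norm(embed(x_adv) - embed(x))
print(f"max sample change: {np.abs(x_adv - x).max():.4f}, "
      f"embedding shift: {shift:.4f}")
```

The sign step caps every sample's change at exactly epsilon, which is how FGSM trades a fixed, small waveform distortion for a comparatively large movement in embedding space; the paper's white-box attack optimizes against the real conversion models rather than this toy encoder.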

Sun Nov 10 2019
NLP
Evaluating Voice Conversion-based Privacy Protection against Informed Attackers
Speech data conveys sensitive speaker attributes like identity or accent. Such attributes can be inferred and exploited for malicious purposes. Anonymization aims to make the data unlinkable, i.e., no utterance can be linked to its original speaker.
Tue Apr 09 2019
Machine Learning
CycleGAN-VC2: Improved CycleGAN-based Non-parallel Voice Conversion
Non-parallel voice conversion (VC) is a technique for learning the mapping from source to target speech. It is challenging because no parallel training data is available. CycleGAN-VC provided a breakthrough and performed comparably to a parallel VC method without relying on extra data.
Mon Aug 05 2019
Machine Learning
V2S attack: building DNN-based voice conversion from automatic speaker verification
This paper presents a new voice impersonation attack using voice conversion. Enrolling personal voices for automatic speaker verification (ASV) offers natural and flexible biometric authentication systems.
Mon Apr 23 2018
Machine Learning
A Spoofing Benchmark for the 2018 Voice Conversion Challenge: Leveraging from Spoofing Countermeasures for Speech Artifact Assessment
Voice conversion (VC) aims at conversion of speaker characteristic without altering content. Due to training data limitations and modeling imperfections, it is difficult to achieve believable speaker mimicry. We address artifact assessment using an objective approach leveraging from prior work on spoofing countermeasures (CMs) for automatic speaker verification.
Fri Mar 02 2018
Machine Learning
Can we steal your vocal identity from the Internet?: Initial investigation of cloning Obama's voice using GAN, WaveNet and low-quality found data
Systems for detecting voice spoofing attacks are becoming more capable. Speech synthesis and voice conversion paradigms are appearing. We developed a speech enhancement system that improves the quality of speech data found in publicly available sources.
Fri Jul 30 2021
Machine Learning
Practical Attacks on Voice Spoofing Countermeasures
Voice authentication has become an integral part in security-critical operations, such as bank transactions and call center conversations. We develop the first practical attack on CMs, and show how a malicious actor may efficiently craft audio samples to bypass voice authentication in its strictest form.