Published on Fri Jul 31 2020

TweepFake: About Detecting Deepfake Tweets

Tiziano Fagni, Fabrizio Falchi, Margherita Gambini, Antonio Martella, Maurizio Tesconi

The first dataset of "real" deepfake tweets, TweepFake. It is real in the sense that each deepfake tweet was actually posted on Twitter. The dataset comprises 25,572 tweets, half of them generated by a total of 23 bots imitating 17 human accounts.

Abstract

The recent advances in language modeling significantly improved the generative capabilities of deep neural models: in 2019 OpenAI released GPT-2, a pre-trained language model that can autonomously generate coherent, non-trivial and human-like text samples. Since then, ever more powerful text generative models have been developed. Adversaries can exploit these tremendous generative capabilities to enhance social bots that will have the ability to write plausible deepfake messages, hoping to contaminate public debate. To prevent this, it is crucial to develop systems that detect deepfake social media messages. However, to the best of our knowledge, no one has ever addressed the detection of machine-generated texts on social networks like Twitter or Facebook. With the aim of helping the research in this detection field, we collected the first dataset of "real" deepfake tweets, TweepFake. It is real in the sense that each deepfake tweet was actually posted on Twitter. We collected tweets from a total of 23 bots, imitating 17 human accounts. The bots are based on various generation techniques, i.e., Markov Chains, RNN, RNN+Markov, LSTM, GPT-2. We also randomly selected tweets from the humans imitated by the bots to obtain an overall balanced dataset of 25,572 tweets (half human- and half bot-generated). The dataset is publicly available on Kaggle. Lastly, we evaluated 13 deepfake text detection methods (based on various state-of-the-art approaches) to both demonstrate the challenges that TweepFake poses and create a solid baseline of detection techniques. We hope that TweepFake can offer the opportunity to tackle deepfake detection on social media messages as well.
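The abstract lists Markov Chains among the generation techniques used by the bots in TweepFake. As a minimal illustration of how such a bot might produce tweet-like text, here is a sketch of a word-level Markov chain generator; the function names and parameters are illustrative, not taken from the paper:

```python
import random
from collections import defaultdict

def build_markov_model(corpus, order=1):
    """Map each tuple of `order` consecutive words to the words observed after it."""
    model = defaultdict(list)
    for text in corpus:
        words = text.split()
        for i in range(len(words) - order):
            key = tuple(words[i:i + order])
            model[key].append(words[i + order])
    return model

def generate_tweet(model, length=10, seed=None):
    """Walk the chain from a random starting key, emitting at most `length` words."""
    rng = random.Random(seed)
    key = rng.choice(list(model.keys()))
    out = list(key)
    while len(out) < length:
        followers = model.get(tuple(out[-len(key):]))
        if not followers:  # dead end: no observed continuation
            break
        out.append(rng.choice(followers))
    return " ".join(out)
```

Trained on a single account's tweets, such a model reproduces local word co-occurrences but tends to lose coherence over longer spans, which is one reason Markov-chain bots are easier to detect than GPT-2-based ones.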

Sat Dec 05 2020
Machine Learning
Enhanced Offensive Language Detection Through Data Augmentation
The ICWSM-2020 Data Challenge Task 2 is aimed at identifying offensive content using a crowd-sourced dataset containing 100k labelled tweets. The dataset suffers from class imbalance, where certain labels are extremely rare. We show that applying Dager can increase the F1 score of the data…
Mon Sep 07 2020
Machine Learning
Team Alex at CLEF CheckThat! 2020: Identifying Check-Worthy Tweets With Transformer Models
Misinformation and disinformation have been thriving in social media for years. With the emergence of the COVID-19 pandemic, political and health misinformation merged. The fight against this infodemic has many aspects, with fact-checking and debunking false and misleading claims among the most important.
Fri Oct 09 2020
Machine Learning
NutCracker at WNUT-2020 Task 2: Robustly Identifying Informative COVID-19 Tweets using Ensembling and Adversarial Training
Mon Jan 11 2021
Machine Learning
Evaluating Deep Learning Approaches for Covid19 Fake News Detection
Social media platforms like Facebook, Twitter, and Instagram have enabled connection and communication on a large scale. These platforms have led to an increase in the creation and spread of fake news. The fake news has not only influenced people in the wrong direction but also claimed human lives.
Thu Nov 02 2017
Machine Learning
A Comprehensive Low and High-level Feature Analysis for Early Rumor Detection on Twitter
The objective of rumor debunking in microblogs is to detect such misinformation as early as possible. In this work, we leverage neural models in learning the hidden representations of individual rumor-related tweets at the very beginning of a rumor.
Fri Dec 04 2020
Machine Learning
TrollHunter [Evader]: Automated Detection [Evasion] of Twitter Trolls During the COVID-19 Pandemic
TrollHunter leverages a unique linguistic analysis of a multi-dimensional set of Twitter content features to detect whether or not a tweet was meant to troll. TrollHunter achieved 98.5% accuracy, 75.4% precision and 69.8% recall over a dataset of 1.3 million tweets.
Tue Jun 12 2018
NLP
Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis
We describe a neural network-based system for text-to-speech (TTS) synthesis. We are able to generate speech audio in the voice of many different speakers, including those unseen during training. We demonstrate that the proposed model is able to transfer the knowledge of speaker variability to a new task.
Wed Oct 09 2019
NLP
HuggingFace's Transformers: State-of-the-art Natural Language Processing
Transformers is an open-source library with the goal of opening up machine learning advances to the wider machine learning community. The library consists of carefully engineered state-of-the-art Transformer architectures under a unified API. Backing this library is a curated collection of pretrained models…
Mon Jun 12 2017
NLP
Attention Is All You Need
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms. Experiments on two machine translation tasks show these models to be superior…
Wed May 29 2019
NLP
Defending Against Neural Fake News
Recent progress in natural language generation has raised dual-use concerns. The technology might enable adversaries to generate neural fake news. This is targeted propaganda that closely mimics the style of real news. Developing robust verification techniques against generators like Grover is critical.
Thu Oct 11 2018
NLP
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT is designed to pre-train deep bidirectional representations from unlabeled text. It can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Sat Aug 24 2019
NLP
Release Strategies and the Social Impacts of Language Models
Large language models have a range of beneficial uses. However, their flexibility and generative capabilities also raise misuse concerns. This report discusses OpenAI's work related to the release of its GPT-2 language model.