A project created during my final year at university that provides a system to detect neural fake news/machine-generated text on Twitter.
Live demo: https://twitter-nfn-bf.anvil.app/
Dissertation Abstract
Neural Fake News (NFN), defined as neurally generated misinformation masquerading as legitimate news, is a critical societal issue. In recent years, unsupervised language models (ULMs) such as Generative Pre-trained Transformer-2 (GPT-2) have proven capable of generating extremely coherent paragraphs of text. These systems enable malicious actors to scale up their operations by delivering automatically generated disinformation across social media. Developing defence mechanisms against NFN is critical to preventing sites such as Twitter from falling victim to an upsurge in the spread of synthetic text. I thus present a system that detects machine-generated text broadcast on Twitter, utilising fine-tuned pre-trained language models (PTLMs) trained to classify outputs released with GPT-2. My system applies the weights released with the OpenAI detector model and two fine-tuned models, DeBERTa and XLNet, to classify real (human-written) and fake (machine-generated) text. I find that DeBERTa achieves 96% classification accuracy with limited resources, competing with the OpenAI detector model, which achieved ~95% across three sampling methods. I argue that with access to more powerful hardware capable of processing large sequence lengths, fine-tuning DeBERTa would likely outperform OpenAI's detector. I also investigate the presence of machine-generated tweets on Twitter and find that they are not currently ubiquitous on social media. I conclude by discussing the importance of research into the detection of machine-generated content and suggest that social media platforms implement classification systems as natural language generation models grow in popularity.
- Transformers
- SimpleTransformers
- Anvil.Works
- Python-Twitter-Tools
- Firebase
- OpenAI Detector
- Trafilatura
- Install models and fine-tuned weights:

```shell
wget https://openaipublic.azureedge.net/gpt-2/detector-models/v1/detector-base.pt
```

Model: https://huggingface.co/roberta-large
Fine-tuned DeBERTa-large (learning rate: 5e-6, batch size: 16, epochs: 4, warmup steps: 50, weight decay: 0.01)
- Weights: https://drive.google.com/drive/folders/1P-EewnfcXvQR5UVzgavB9I_py1YFwQc7?usp=sharing
- MCC: 0.913 | Accuracy: 0.956
- Model: https://huggingface.co/microsoft/deberta-large
Fine-tuned XLNet-large-cased (learning rate: 1e-5, batch size: 16, epochs: 2, warmup steps: 100)
- Weights: https://drive.google.com/drive/folders/1vtJ7Q2GqtOpNM7iIO5nX06we3BQJxUNr?usp=sharing
- MCC: 0.771 | Accuracy: 0.878
- Model: https://huggingface.co/xlnet-large-cased
(XLNet and DeBERTa were fine-tuned on the outputs of the 1.5B-parameter GPT-2 model (xl-1542M) versus WebText, the dataset used to train GPT-2.)
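The MCC values above can be reproduced from a binary confusion matrix. Below is a minimal sketch of how the metric is computed, using illustrative counts only (not the dissertation's actual evaluation data):

```python
import math

def mcc(tp, tn, fp, fn):
    # Matthews correlation coefficient for a binary confusion matrix
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0

# Illustrative counts: 500 real and 500 fake examples,
# with ~96% of each class correctly classified
print(round(mcc(tp=478, tn=478, fp=22, fn=22), 3))  # → 0.912
```

With balanced classes, an accuracy of ~0.956 corresponds to an MCC of roughly 0.91, consistent with the DeBERTa figures reported above.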
- Registration:
- Register for a Twitter Developer Account
- Register for an anvil.works account
- Register and create a new Firebase project
- Download models and fine-tuned weights and store them in an accessible location
- Generate a Firebase SDK private key and place the credentials file within `Project_Main`
- Initialize the Realtime Database with `fake` and `real` nodes (example)
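One hypothetical shape for the `fake` and `real` nodes (the field names here are illustrative, not taken from the project):

```json
{
  "fake": {
    "<tweet_id>": { "text": "…", "probability": 0.97 }
  },
  "real": {
    "<tweet_id>": { "text": "…", "probability": 0.88 }
  }
}
```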
- Clone the anvil app: https://anvil.works/build#clone:YG6YJDUEBCRAHCKA=Y4ALXWHEKMF34YBT4GEPCXJM
- Set the Anvil UPLINK key in `main_.py` and the Twitter API keys in `TweetGetter.py`
- Run the server file:

```shell
# (on the top-level directory of this repository)
pip install -r requirements.txt
python -m main_
```
- Visit the anvil application link!
Load Tweet: Pressing 'Load Tweet' requests and classifies (fake/real probabilities) the latest available Tweet containing "#news", unless specified otherwise.
Custom Input: Selecting the 'Custom Input' checkbox lets you provide a Twitter URL or any arbitrary text to be classified (fake/real probabilities).
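The fake/real probabilities shown by both actions are typically produced by a softmax over the classifier's two output logits. A minimal sketch with illustrative logit values (not actual model output; which index means "fake" depends on the fine-tuned model's label configuration):

```python
import math

def fake_real_probabilities(logits):
    # Softmax over the two class logits -> probabilities summing to 1
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative logits only: a strongly "fake"-leaning prediction
probs = fake_real_probabilities([2.0, -1.5])
print([round(p, 3) for p in probs])
```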