Supplementary Materials for "Detecting Harmful Medical Advice by Analyzing the Characteristics of Retweeters"

This repo holds the supplementary materials for my final project for CS8396: Data Privacy in Biomedicine (Spring 2020), taught by Dr. Bradley Malin at Vanderbilt University. From the abstract:

I study the ability of a model to discern, for tweets about the COVID-19 crisis and based on the characteristics of the users who retweet them, whether a given article or tweet provides beneficial medical advice or could lead to a harmful outcome, either by promoting harmful medical practices or by providing incomplete information that could lead to panicked action against current medical wisdom. The model analyzes the characteristics of people who retweet the article, the pattern of how the article is retweeted, and what Twitter users say when retweeting the article. The study aims to support future work to identify and reduce the spread of panic-inducing misinformation and disinformation in an effort to help authorities better respond to health-threatening epidemics and pandemics.

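The full paper can be found in this repo as detecting-harmful-medical-advice-twitter.pdf, alongside its LaTeX source (.tex) and bibliography (.bib).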

This repo contains the Jupyter notebook used to train and test the model (random_forest_analysis.ipynb) as well as an additional notebook used to split the dataset into train and test segments (split_dataset.ipynb). The tweet-uploader folder contains the code used to upload the tweets (stored in line-delimited JSON files downloaded using twarc) to MongoDB, and the tweet-uploader/viewer folder contains the simple web application used to rate these tweets.
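For readers unfamiliar with line-delimited JSON, the sketch below shows one way to read such a file into documents keyed by tweet ID before uploading them to a database. It is an illustration only, not the code in tweet-uploader: the function names `iter_tweets` and `to_documents` are hypothetical, and it assumes each line is a tweet object with an `id_str` field, as in Twitter API v1.1 output. The database upload itself is left out.

```python
import json
from typing import Dict, Iterable, Iterator


def iter_tweets(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed tweet per non-empty line of line-delimited JSON."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def to_documents(tweets: Iterable[dict]) -> Dict[str, dict]:
    """Key tweets by their string ID so repeated tweets collapse to one document."""
    docs: Dict[str, dict] = {}
    for tweet in tweets:
        # Assumption: every tweet object carries an "id_str" field.
        docs[tweet["id_str"]] = tweet
    return docs


# Hypothetical sample standing in for the lines of a twarc output file.
sample = [
    '{"id_str": "1", "full_text": "first"}',
    "",
    '{"id_str": "2", "full_text": "second"}',
    '{"id_str": "1", "full_text": "first"}',
]
documents = to_documents(iter_tweets(sample))
```

With a real file, `iter_tweets(open(path, encoding="utf-8"))` would stream the tweets one line at a time instead of loading the whole file into memory.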

This study does not reveal groundbreaking findings, but it did allow me to investigate a side of computer science I had not explored before: machine learning. While developing the model used in this paper, I trudged through deep learning models in an attempt to build on the work done by Y. Liu and Y.-F. Wu. Without any prior experience in this field, I ultimately did not get a meaningful result from PyTorch, but I later discovered simpler machine learning techniques (logistic regression and random forests) that led to a more meaningful result. The actual tweets used to train this model cannot be shared due to the terms of service for Twitter's API, but the tweet IDs, along with my rating for each tweet, can be found in the tweets.json file.
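As a rough illustration of how rated tweet IDs could be divided into train and test segments, the sketch below performs a seeded shuffle and split using only the standard library. It is not the code in split_dataset.ipynb: the record layout (`id` and `rating` keys), the 20% test fraction, and the function name `split_records` are all assumptions made for the example, and the actual schema of tweets.json may differ.

```python
import random
from typing import List, Sequence, Tuple


def split_records(
    records: Sequence[dict], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[dict], List[dict]]:
    """Shuffle a copy of the records with a fixed seed and split off a test set."""
    shuffled = list(records)
    # A dedicated Random instance keeps the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical records: a tweet ID plus a binary rating (1 = harmful, 0 = not).
records = [{"id": str(i), "rating": i % 2} for i in range(10)]
train, test = split_records(records)
```

Fixing the seed means the same tweets land in the test segment every time, which keeps later model comparisons on equal footing.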

At the conclusion of this project, I hope to continue investigating machine learning. Having used these models, I understand quite a bit more about the statistics behind them; they make a lot more sense to me and seem more useful and less like "magic".

