
Social BERTerfly 🦋

Open In Colab

Predicts your personality type from among the 16 Myers-Briggs types based on your Twitter handle, and compares your personality type with those of the people you follow.

It uses a machine learning classifier and NLP built on the state-of-the-art language model BERT (Bidirectional Encoder Representations from Transformers) to predict the personality type of a given user based on their recent tweets.

Getting Started: 🙌

How to run locally:

Follow the steps below to run the app and explore your personality type, as well as those of your friends!

  • git clone https://github.com/MLH-Fellowship/Social-BERTerfly.git

  • Download our model weights from the following Drive link:

    BERT_base_model

  • Place the downloaded .h5 model under server/models/.

  • Navigate to the server folder by:

    cd server/

  • Install dependencies by:

    pip install -r requirements.txt (you can install the packages in a virtualenv if you prefer)

  • Add your Twitter API keys and authorization credentials to the .env file (a minimal example sketch of such a file appears after these steps). To get a Twitter API key, you can refer to this article. Do not make a PR or publish the .env file with your Twitter API key and credentials. Create a separate copy of the .env file in your cloned repo and delete it after use, or uncomment the "/server/.env" entry in .gitignore.

  • Create a new folder "twitter_data" in the same directory to store the fetched tweets.

  • Run the following in your terminal:

    flask run

    or, python app.py

  • Wait around 15 seconds for the model to load.

  • Visit the application at http://127.0.0.1:5000/ and enjoy exploring various personality traits for you and your following!

Note: Make sure to click the Submit button first to fetch the tweets and results. After the personality type is displayed on the landing page, click Go to Dashboard for a detailed analysis.
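
For reference, here is a minimal .env sketch. The variable names below are illustrative assumptions rather than the exact names the server code reads, so match them to server/app.py, and as noted above, keep this file out of version control.

# server/.env (illustrative variable names)
CONSUMER_KEY=your_twitter_api_key
CONSUMER_SECRET=your_twitter_api_secret_key
ACCESS_TOKEN=your_access_token
ACCESS_TOKEN_SECRET=your_access_token_secret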

Start contributing! 📣

If you wish to contribute to our model, you can take a look at our notebook, and provide suggestions or comments.

Open In Colab

An Example:

Landing Page:

[Screenshot: landing page]

A brief description of personality types:

[Screenshot: brief descriptions of the 16 personality types]

Try it Out:

Head over to the Get Started section to put in your Twitter handle and press Submit. The model should take approximately 15 seconds to return your predicted personality type on the screen, as shown below:

[Screenshot: homepage showing the predicted personality type]

Head over to the Dashboard:

Click on Go to Dashboard to get detailed personality analysis along with career suggestions.

[Screenshot: dashboard with detailed personality analysis and career suggestions]

Compare personality types:

Now you can also compare your personality type against that of your followers and friends!

[Screenshot: comparison of personality types with followers and friends]

Tech Stack:

  • Twitter API for fetching tweets
  • tweepy for accessing the Twitter API from Python (https://pypi.org/project/tweepy/)
  • Flask for the backend server
  • Google Colaboratory for collaborating on the model and accessing the free TPU 😂
  • Keras for training and testing the BERT model
  • BERT as a SOTA model for tweet predictions. (https://arxiv.org/abs/1810.04805)
  • Bootstrap for the homepage and the dashboard UI
  • chartjs for displaying graphs on the Dashboard

Implementation Details:

P.S.: If you ain't into the boring stuff, head on over to the next section to contribute to our model and the app!

About MBTI

The Myers-Briggs Type Indicator (or MBTI for short) is a personality type system that divides everyone into 16 distinct personality types across 4 axes:


  • Introversion (I) – Extroversion (E)
  • Intuition (N) – Sensing (S)
  • Thinking (T) – Feeling (F)
  • Judging (J) – Perceiving (P)
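
Combining one letter from each axis yields the 16 four-letter codes. As a quick illustration (plain Python, not project code):

from itertools import product

# Each axis contributes one letter; 2 * 2 * 2 * 2 = 16 personality codes.
axes = ["IE", "NS", "TF", "JP"]
mbti_types = ["".join(letters) for letters in product(*axes)]
print(len(mbti_types))   # 16
print(mbti_types[:4])    # ['INTJ', 'INTP', 'INFJ', 'INFP']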

It is one of, if not the, most popular personality tests in the world. It is used in businesses, online, for fun, for research, and lots more. From a scientific or psychological perspective, it is based on Carl Jung's work on cognitive functions, i.e. Jungian typology. That was a model of 8 distinct functions, thought processes, or ways of thinking suggested to be present in the mind. This work was later adapted into several different personality systems to make it more accessible, the most popular of which is of course the MBTI.

Dataset

For the dataset, we have used the well-known Myers-Briggs Personality Type Dataset, which includes a large number of people's MBTI types and content written by them. The dataset contains over 8,600 rows; each row holds one person's:

- Type (this person's 4-letter MBTI code/type)
- A section of each of the last 50 things they have posted (entries separated by "|||", i.e. 3 pipe characters)
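
A rough sketch of how a row can be unpacked with pandas; the file name mbti_1.csv and the column names type/posts follow the common Kaggle release of this dataset and are assumptions here:

import pandas as pd

# Load the MBTI dataset (file and column names assumed, adjust to your copy).
df = pd.read_csv("mbti_1.csv")

# Each row's "posts" field holds the last ~50 posts joined by "|||";
# split them back into a list so each post can be cleaned individually.
df["post_list"] = df["posts"].apply(lambda s: s.split("|||"))

print(df["type"].nunique())          # expect 16 distinct MBTI types
print(len(df["post_list"].iloc[0]))  # roughly 50 posts for the first person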

BERT

Bidirectional Encoder Representations from Transformers (BERT) is a Transformer-based machine learning technique for natural language processing (NLP) pre-training developed by Google. BERT was created and published in 2018 by Jacob Devlin and his colleagues from Google. As of 2019, Google has been leveraging BERT to better understand user searches.

Data Fetching:

Using tweepy and the Twitter API, we fetch the 50 latest tweets posted by the entered username. These tweets are stored in a .csv file and sent for preprocessing, and the cleaned texts are then passed to the Keras model.
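
A minimal sketch of that fetch step using tweepy's v1.1 API; the environment variable names and output path are assumptions, and the project's actual fetching code may differ:

import os
import pandas as pd
import tweepy

# Authenticate with the credentials loaded from .env (variable names are illustrative).
auth = tweepy.OAuthHandler(os.getenv("CONSUMER_KEY"), os.getenv("CONSUMER_SECRET"))
auth.set_access_token(os.getenv("ACCESS_TOKEN"), os.getenv("ACCESS_TOKEN_SECRET"))
api = tweepy.API(auth)

def fetch_tweets(username, count=50):
    """Fetch the user's latest tweets and save them to twitter_data/<username>.csv."""
    tweets = api.user_timeline(screen_name=username, count=count, tweet_mode="extended")
    df = pd.DataFrame({"posts": [t.full_text for t in tweets]})
    df.to_csv(f"twitter_data/{username}.csv", index=False)
    return df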

Data preprocessing:

We use regex to remove special characters such as '@' mentions and emojis from the posts, remove stopwords and punctuation, convert the text to lowercase, and apply stemming to extract the roots of words. The preprocessed data is split with train_test_split and sent to the Keras model for predictions.
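
A hedged sketch of that cleaning pipeline; NLTK stopwords and Porter stemming stand in here for whatever the notebook actually uses:

import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

nltk.download("stopwords", quiet=True)
STOP_WORDS = set(stopwords.words("english"))
stemmer = PorterStemmer()

def clean_post(text):
    """Lowercase, strip URLs/mentions/non-letters, drop stopwords, stem the rest."""
    text = text.lower()
    text = re.sub(r"http\S+|@\w+", " ", text)   # URLs and @mentions
    text = re.sub(r"[^a-z\s]", " ", text)       # punctuation, digits, emojis
    tokens = [stemmer.stem(w) for w in text.split() if w not in STOP_WORDS]
    return " ".join(tokens)

print(clean_post("Loving the new @twitter API!!! https://t.co/xyz"))  # -> "love new api"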

BERT Model summary:

Layer (type)                  Output Shape                      Param #  
=========================================================================
input_word_ids (InputLayer)   [(None, 1500)]                    0        
_________________________________________________________________________
tf_bert_model_1 (TFBertModel) ((None, 1500, 768), (None, 768))  109482240
_________________________________________________________________________
tf_op_layer_strided_slice_1   [(None, 768)]                     0        
_________________________________________________________________________
dense_1 (Dense)               (None, 16)                        12304    
=========================================================================
Total params: 109,494,544
Trainable params: 109,494,544
Non-trainable params: 0
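
The summary above corresponds roughly to the following construction, sketched here with Hugging Face transformers' TFBertModel; the layer names, the 1500-token input width, and the 16-way output are taken from the summary, while everything else is an assumption about the notebook:

import tensorflow as tf
from transformers import TFBertModel

MAX_LEN = 1500       # matches input_word_ids above (note: stock bert-base position
                     # embeddings only cover 512 tokens, so shorter inputs are typical)
NUM_CLASSES = 16     # one output per MBTI type

# Token ids -> BERT encoder -> [CLS] vector -> 16-way softmax classifier.
input_word_ids = tf.keras.layers.Input(shape=(MAX_LEN,), dtype=tf.int32, name="input_word_ids")
bert = TFBertModel.from_pretrained("bert-base-uncased")
sequence_output = bert(input_word_ids)[0]   # (batch, MAX_LEN, 768)
cls_token = sequence_output[:, 0, :]        # strided slice down to the [CLS] embedding
output = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax", name="dense_1")(cls_token)

model = tf.keras.Model(inputs=input_word_ids, outputs=output)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()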

Results achieved:

We tested an LSTM baseline against BERT-base to contrast accuracies.

Model               Train accuracy   Validation accuracy
LSTM baseline       18.96%           16.9%
BERT-base-uncased   85%              79%

Deployment:

Uses Flask for the backend and model deployment, and Bootstrap for building the Dashboard and Homepage UI.
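
A bare-bones sketch of how such a Flask app can tie the pieces together; the route, file path, label order, and the fetch_and_tokenize helper are hypothetical stand-ins, not the project's actual app.py:

import tensorflow as tf
from flask import Flask, jsonify, request
from transformers import TFBertModel

app = Flask(__name__)

# Load the trained weights once at startup (the ~15 s wait noted in Getting Started).
# The path is an assumption based on the setup steps; custom_objects lets Keras
# rebuild the embedded TFBertModel layer from the .h5 file.
model = tf.keras.models.load_model(
    "models/bert_base_model.h5",
    custom_objects={"TFBertModel": TFBertModel},
    compile=False,
)

# Label order is an assumption; it must match the encoding used at training time.
MBTI_TYPES = ["INTJ", "INTP", "ENTJ", "ENTP", "INFJ", "INFP", "ENFJ", "ENFP",
              "ISTJ", "ISFJ", "ESTJ", "ESFJ", "ISTP", "ISFP", "ESTP", "ESFP"]

@app.route("/predict", methods=["POST"])
def predict():
    handle = request.form["handle"]
    # fetch_and_tokenize is a hypothetical helper combining the tweepy fetch and
    # preprocessing sketches above into a (1, MAX_LEN) array of BERT token ids.
    token_ids = fetch_and_tokenize(handle)
    probs = model.predict(token_ids)[0]
    return jsonify(personality=MBTI_TYPES[int(probs.argmax())])

if __name__ == "__main__":
    app.run()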

Contributing:

Social BERTerfly is fully open-source and open for contributions! Please respect our contribution guidelines as defined in our CODE OF CONDUCT and CONTRIBUTING GUIDELINES.

Contributors

Made with ❤️ by Team Social-BERTerfly as part of the MLH Explorer Fall Fellowship 2020, Sprint 3.