This repository is an archive of NLP notebooks, mainly for the purpose of setting a baseline for each task.
(Updated 28 March, 2022)
-
Binary sentiment classification on short sentences using the log-likelihood-ratio method.
-
Task : Sequence Classification
-
Dataset : Google GoEmotions
-
naive_bayes_sentiment
- Task : Binary Sentiment Classification
- Model : Naive Bayes
- Result : 81% accuracy
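
For readers unfamiliar with the log-likelihood-ratio method, here is a minimal sketch of a binary Naive Bayes classifier in plain Python. It is not the notebook's code: the function names `train_naive_bayes` and `predict`, the whitespace tokenization, the add-one smoothing, and the four toy sentences are all assumptions made for illustration. Each word contributes log P(word | positive) - log P(word | negative), and a sentence is labeled positive when the sum of these ratios plus the log prior ratio is above zero.

```python
import math
from collections import Counter


def train_naive_bayes(docs, labels):
    """Return (log prior ratio, per-word log-likelihood ratios).

    Assumes labels are 0/1 and that both classes appear at least once.
    """
    pos_counts, neg_counts = Counter(), Counter()
    for doc, label in zip(docs, labels):
        # Hypothetical preprocessing: lowercase and split on whitespace.
        target = pos_counts if label == 1 else neg_counts
        target.update(doc.lower().split())

    vocab = set(pos_counts) | set(neg_counts)
    n_pos = sum(pos_counts.values())
    n_neg = sum(neg_counts.values())
    v = len(vocab)

    # Add-one (Laplace) smoothing keeps unseen word/class pairs finite.
    llr = {
        w: math.log((pos_counts[w] + 1) / (n_pos + v))
        - math.log((neg_counts[w] + 1) / (n_neg + v))
        for w in vocab
    }

    d_pos = sum(1 for y in labels if y == 1)
    d_neg = len(labels) - d_pos
    log_prior = math.log(d_pos / d_neg)
    return log_prior, llr


def predict(text, log_prior, llr):
    """Return 1 (positive) if the summed log-likelihood ratio is above zero."""
    # Words outside the training vocabulary contribute nothing to the score.
    score = log_prior + sum(llr.get(w, 0.0) for w in text.lower().split())
    return 1 if score > 0 else 0


# Toy data, invented for this example only.
docs = [
    "i am so happy today",
    "what a great day",
    "i am sad and tired",
    "this is terrible",
]
labels = [1, 1, 0, 0]

log_prior, llr = train_naive_bayes(docs, labels)
positive_pred = predict("great happy day", log_prior, llr)
negative_pred = predict("sad terrible", log_prior, llr)
```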
(Updated 26 May, 2022)
-
These notebooks were written for a Kaggle competition.
-
Goal : Automatically extract feature text from human-written patient interview notes.
-
Task : Token Classification (Segal et al., 2020)
-
Dataset : NBME Kaggle dataset
-
NBME_hf
- Uses HuggingFace API
- Model : DistilBERT
- Result : In progress
-
NBME_pt
- Uses PyTorch and Weights & Biases for hyperparameter tuning.
- Model : DistilBERT
- Result : Binary Cross Entropy 0.0139
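
To make the token-classification framing and the binary cross-entropy metric concrete, here is a rough sketch in plain Python. It is an illustration under stated assumptions, not the notebooks' pipeline: the notebooks use DistilBERT's subword tokenizer, whereas this example splits on whitespace, and the helper names `token_labels` and `binary_cross_entropy`, the sample note, the character span, and the predicted probabilities are all hypothetical. The idea is that each annotated character span is turned into per-token 0/1 labels, and the loss is averaged over tokens.

```python
import math
import re


def token_labels(text, spans):
    """Label a token 1 if it overlaps any (start, end) character span, else 0.

    Tokens here are whitespace-delimited, a simplification of subword tokenization.
    """
    tokens, labels = [], []
    for match in re.finditer(r"\S+", text):
        start, end = match.span()
        overlaps = any(start < s_end and s_start < end for s_start, s_end in spans)
        tokens.append(match.group())
        labels.append(1 if overlaps else 0)
    return tokens, labels


def binary_cross_entropy(probs, labels, eps=1e-12):
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)


# Invented example: characters 16..26 cover the phrase "chest pain".
note = "Patient reports chest pain for two days"
spans = [(16, 26)]
tokens, labels = token_labels(note, spans)

# Hypothetical per-token probabilities, standing in for a model's output.
probs = [0.1, 0.1, 0.9, 0.9, 0.1, 0.1, 0.1]
loss = binary_cross_entropy(probs, labels)
```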