Creating-Computable-Knowledge-from-Unstructured-Information

Purpose and Goals

Purpose

This repository is dedicated to the 2022 BioIT Hackathon. The team was tasked with creating and optimizing a natural language processing (NLP) pipeline derived from NVIDIA's Megatron pipeline. The models developed in this repository will be used to analyze abstracts of scientific publications in PubMed in order to build a knowledge graph linking diseases to key genes, proteins, and drug interactions.
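
To make the target data structure concrete, here is a minimal sketch of a knowledge graph stored as (disease, relation, entity) triples. The `KnowledgeGraph` class, the relation labels, and the example triples are illustrative assumptions, not output from the models in this repository.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Toy knowledge graph: nodes are entity names, edges carry a relation label."""

    def __init__(self):
        # Maps a source node to a set of (relation, target) pairs.
        self._edges = defaultdict(set)

    def add_triple(self, subject, relation, obj):
        """Record a directed, labeled edge subject -[relation]-> obj."""
        self._edges[subject].add((relation, obj))

    def neighbors(self, subject, relation=None):
        """Return sorted targets linked to `subject`, optionally filtered by relation."""
        return sorted(
            target
            for rel, target in self._edges.get(subject, ())
            if relation is None or rel == relation
        )


# Hypothetical triples, chosen only to illustrate the structure.
kg = KnowledgeGraph()
kg.add_triple("colorectal cancer", "associated_gene", "KRAS")
kg.add_triple("colorectal cancer", "associated_gene", "APC")
kg.add_triple("colorectal cancer", "treated_by", "cetuximab")

genes = kg.neighbors("colorectal cancer", relation="associated_gene")
```

Storing edges as a set per node keeps repeated mentions of the same relation across many abstracts from creating duplicate edges.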

Goals

The goals of this project are to identify biology-focused training data for NLP and use it to train a deep learning model for disease-drug, disease-gene, and disease-protein interactions. Information gleaned from this model was then visualized in a knowledge graph. We use the standard pipeline of Named Entity Recognition (NER), Relation Extraction (RE), and Entity Linking (EL) commonly used with NVIDIA's Megatron. We compared the accuracy of this pipeline with that of a p-tuning and prompt-tuning pipeline, also within Megatron.
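
The way the three stages feed into one another can be sketched as follows. This is not the Megatron-based implementation: each stage is replaced by a trivial stand-in (a lexicon lookup for NER, same-sentence co-occurrence for RE, and a dictionary lookup for EL). The lexicon, the identifiers such as `GENE:0001`, and all function names are hypothetical.

```python
import re
from itertools import product

# Hypothetical lexicon: surface form -> entity type (stand-in for a trained NER model).
LEXICON = {
    "colorectal cancer": "Disease",
    "kras": "Gene",
    "cetuximab": "Drug",
}

# Made-up identifiers (stand-in for a real ontology used by entity linking).
ENTITY_IDS = {
    "colorectal cancer": "DISEASE:0001",
    "kras": "GENE:0001",
    "cetuximab": "DRUG:0001",
}


def recognize_entities(sentence):
    """NER stand-in: find lexicon terms in the sentence, case-insensitively."""
    found = []
    for term, label in LEXICON.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, sentence, re.IGNORECASE):
            found.append({"text": match.group(0), "type": label, "start": match.start()})
    return sorted(found, key=lambda entity: entity["start"])


def extract_relations(entities):
    """RE stand-in: pair each disease with every other entity in the same sentence."""
    diseases = [e for e in entities if e["type"] == "Disease"]
    others = [e for e in entities if e["type"] != "Disease"]
    return [(d, "disease-" + o["type"].lower(), o) for d, o in product(diseases, others)]


def link_entity(entity):
    """EL stand-in: map a mention to an identifier, or None if it is unknown."""
    return ENTITY_IDS.get(entity["text"].lower())


def run_pipeline(sentence):
    """Chain NER -> RE -> EL and return linked (disease_id, relation, entity_id) triples."""
    entities = recognize_entities(sentence)
    return [
        (link_entity(disease), relation, link_entity(other))
        for disease, relation, other in extract_relations(entities)
    ]


triples = run_pipeline("KRAS mutations in colorectal cancer predict resistance to cetuximab.")
```

The output of `run_pipeline` is a list of identifier triples, which is the form a knowledge graph can ingest directly as labeled edges.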

Experimental Setup

Training Set Data

We utilized the BioCreative data sets to train our models.
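
Several files in this repository carry IOB in their names (for example train_LitCoin_IOB.tsv), which suggests NER training data in the common inside-outside-beginning tagging scheme. The sketch below assumes one tab-separated token and tag per line with blank lines between sentences. That layout, the tag names, and the sample text are assumptions for illustration, not a description of the actual files.

```python
def parse_iob(text):
    """Split IOB-tagged text into sentences, each a list of (token, tag) pairs."""
    sentences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        token, tag = line.split("\t")
        current.append((token, tag))
    if current:
        sentences.append(current)
    return sentences


def iob_to_spans(sentence):
    """Collapse B-/I- tags into (entity_text, entity_type) spans."""
    spans, tokens, label = [], [], None
    for token, tag in sentence:
        if tag.startswith("B-"):
            # A new entity begins; flush any entity still open.
            if tokens:
                spans.append((" ".join(tokens), label))
            tokens, label = [token], tag[2:]
        elif tag.startswith("I-") and tokens and tag[2:] == label:
            tokens.append(token)
        else:
            # "O" tags, and I- tags with no matching open entity, end the span.
            if tokens:
                spans.append((" ".join(tokens), label))
            tokens, label = [], None
    if tokens:
        spans.append((" ".join(tokens), label))
    return spans


# Hypothetical sample in the assumed layout.
SAMPLE = (
    "KRAS\tB-Gene\n"
    "mutations\tO\n"
    "in\tO\n"
    "colorectal\tB-Disease\n"
    "cancer\tI-Disease\n"
    "\n"
    "cetuximab\tB-Drug\n"
)

sentences = parse_iob(SAMPLE)
spans = [iob_to_spans(sentence) for sentence in sentences]
```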

Repository Files

BIO_IT_2022.pptx
BIO_IT_2022_edBB.pptx
BIO_IT_2022_edSS.pptx
LICENSE
LitCoin_EL.ipynb
LitCoin_EL.zip
README.md
WT_LitCoin_NER.ipynb
WT_LitCoin_RE.ipynb
abstract-colorectal-set.txt
abstracts_test.csv
abstracts_train.csv
colorectal_cancer_cms_abstracts_nothing_else
dev_LitCoin_AID.tsv
dev_LitCoin_EID.tsv
dev_LitCoin_IOB.tsv
dev_LitCoin_RE.tsv
entities_train.csv
relation_extraction.py
relations_train.csv
sample_abstract_file
submission_example.csv
trainLabels.txt
trainWords.txt
train_LitCoin_AID.tsv
train_LitCoin_EID.tsv
train_LitCoin_IOB.tsv
train_LitCoin_RE.tsv
