indicnlp/awesome-resources-for-indic-nlp

 
 


Works from our Volunteers

Whole list

Malayalam

mlmorph - Malayalam Morphological Analyzer using Finite State Transducer

Tamil

Datasets

Datasets in Tamil text

Scrapers

  1. Tamil Etymological Dictionary
  2. Newspaper Crawlers

ML models

Text classification model in PyTorch: can be easily applied to other datasets. In fact, the linked repository also contains a dataset of film reviews in Tamil.

Bengali

Bangla2Vec

Bengali News Classification

Scrapers

Bengali News Channel Scraper

Research Papers and Data

Research Papers in Bengali NLP

Hindi

NLP for Hindi

  • Contains a Wikipedia articles dataset (55,000 articles) and the scripts used to scrape Wikipedia and clean the dataset
  • Contains a Hindi movie reviews dataset and the scripts used to scrape those reviews from Hindi news websites
  • Contains a language model with perplexity ~36
  • Contains a movie review classification model with kappa score ~30
  • Contains a BBC News classification model with accuracy ~79
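
For readers unfamiliar with the metrics quoted above, here is a minimal sketch of how perplexity and a kappa score are commonly computed. It assumes the kappa in question is Cohen's kappa and that the ~30 figure is on a 0 to 100 scale (about 0.30); neither is confirmed by the linked repository. The function names `perplexity` and `cohen_kappa` are illustrative and not taken from that code.

```python
import math
from collections import Counter


def perplexity(token_log_probs):
    """Perplexity: exp of the mean negative natural-log probability per token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between labels and predictions, corrected for chance.

    Assumes the two label distributions are not both concentrated on a single
    class (otherwise chance agreement is 1 and the score is undefined).
    """
    n = len(y_true)
    # Observed agreement: fraction of items where prediction matches the label.
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: sum over classes of P(true = c) * P(pred = c).
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # A model that assigns probability 0.25 to every token has perplexity 4.
    print(perplexity([math.log(0.25)] * 3))
    # Toy labels: 3 of 4 predictions agree, chance agreement is 0.5.
    print(cohen_kappa([1, 1, 0, 0], [1, 1, 0, 1]))
```

Lower perplexity means the language model is less "surprised" by held-out text. Kappa is useful alongside accuracy because it discounts the agreement a classifier would reach by chance on imbalanced classes.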

Punjabi

NLP for Punjabi

  • Contains a Wikipedia articles dataset (44,000 articles) and the scripts used to scrape Wikipedia and clean the dataset
  • Contains a BBC Punjabi News dataset and the scripts used to scrape those news articles from Punjabi news websites
  • Contains a language model with perplexity ~13
  • Contains a BBC News classification model with kappa score ~49
