Text Classification using Machine Learning session at Lancaster Summer Schools in Corpus Linguistics

drelhaj/MachineLearning

MachineLearning

This repository was used in the Text Classification using Machine Learning session at the Lancaster Summer Schools in Corpus Linguistics and other Digital Methods (#LancsSS16 and #LancsSS17), held at Lancaster University, UK, from 12th to 15th July 2016 and from 27th to 30th June 2017. http://ucrel.lancs.ac.uk/summerschool/nlp.php

Instructor: Dr. Mahmoud El-Haj http://www.lancaster.ac.uk/staff/elhaj

Slides are available online:

Course: https://lancaster.box.com/s/fi15evvbtcs4ab0tx5zo8nxmy2yylztx

Workspace Setup: https://lancaster.box.com/s/j78l0b4197il98oze2gfqlidlsvg7jlt

The code trains classifiers that identify chairman's statements, governance sections, and remuneration sections, using 1,000 annual financial reports. Using the WEKA Java API, the code does the following:

  • Creates an ARFF file
  • Trains models using different algorithms
  • Extracts n-gram features using WEKA's StringToWordVector filter
  • Reduces the number of features
  • Classifies unseen documents using the trained models
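
To make the first and third steps concrete, here is a minimal sketch in plain Java. It is not taken from the repository, which does this work through WEKA. The class name `ArffSketch`, the relation name `reports`, the attribute names `text` and `class`, and the sample sentences are all assumptions chosen for illustration. The sketch writes labelled text sections in ARFF layout (one string attribute plus one nominal class attribute) and lists word n-grams of the kind a word-vector filter would count.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration; not part of the repository or of WEKA. */
public class ArffSketch {

    /** Quotes a text value for ARFF: collapses whitespace, escapes backslashes and single quotes. */
    public static String quote(String text) {
        String flat = text.replaceAll("\\s+", " ").trim();
        return "'" + flat.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    /** Builds an ARFF document with a string attribute and a nominal class attribute. */
    public static String toArff(String relation, List<String> labels, List<String[]> docs) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        sb.append("@attribute text string\n");
        sb.append("@attribute class {").append(String.join(",", labels)).append("}\n\n");
        sb.append("@data\n");
        for (String[] doc : docs) {
            // doc[0] holds the section text, doc[1] holds its label
            sb.append(quote(doc[0])).append(',').append(doc[1]).append('\n');
        }
        return sb.toString();
    }

    /** Returns lower-cased word n-grams of size n, in order of appearance. */
    public static List<String> ngrams(String text, int n) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            grams.add(String.join(" ", tokens.subList(i, i + n)));
        }
        return grams;
    }

    public static void main(String[] args) {
        // Class labels follow the three section types named above; the texts are invented.
        List<String> labels = Arrays.asList("chairman", "governance", "remuneration");
        List<String[]> docs = new ArrayList<>();
        docs.add(new String[] {"Dear shareholders, this was a year of steady growth.", "chairman"});
        docs.add(new String[] {"The committee reviewed the directors' pay policy.", "remuneration"});

        System.out.print(toArff("reports", labels, docs));
        System.out.println(ngrams(docs.get(0)[0], 2));
    }
}
```

A file written this way can be opened in WEKA, where a filter such as StringToWordVector would turn the `text` attribute into word or n-gram features before a classifier is trained. The escaping in `quote` is deliberately simple and may need extending for texts that contain other special characters.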
