Python scripts to parse the Gigaword collection and perform NER tagging with StanfordNER
- Run parse-gigaword-sgml.py to transform a document from the LDC Gigaword collection into plain text
- Download StanfordNER
- Run it in server mode, as shown in start-server.sh
- Use ner-tag.py to add named-entity tags (i.e., ORG, LOC, PER) to the transformed plain-text document
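The steps above can be sketched in Python roughly as follows. This is a minimal illustration, not the contents of parse-gigaword-sgml.py or ner-tag.py, and it rests on several assumptions: Gigaword files are sequences of `<DOC id="...">` blocks whose body text sits in `<P>` paragraphs inside `<TEXT>`; the server started by start-server.sh is Stanford NER's `NERServer`, which tags one line per connection and replies in its default slashTags format (`Paris/LOCATION`); and port 9199 is a placeholder. All function names (`iter_documents`, `read_gigaword_file`, `tag_with_server`, `parse_slash_tags`, `group_entities`) and the `SHORT_TAGS` mapping are hypothetical.

```python
import gzip
import html
import re
import socket
from typing import Iterator, List, Tuple

# Assumed Gigaword layout: <DOC id="..." type="..."> ... <TEXT> <P> ... </P> </TEXT> </DOC>
DOC_RE = re.compile(r'<DOC id="([^"]+)"[^>]*>(.*?)</DOC>', re.S)
TEXT_RE = re.compile(r"<TEXT>(.*?)</TEXT>", re.S)
P_RE = re.compile(r"<P>(.*?)</P>", re.S)

# Hypothetical mapping from Stanford's 3-class labels to the short tags used here.
SHORT_TAGS = {"ORGANIZATION": "ORG", "LOCATION": "LOC", "PERSON": "PER"}


def iter_documents(sgml: str) -> Iterator[Tuple[str, str]]:
    """Yield (doc_id, plain_text) for each <DOC> in an SGML string."""
    for doc_id, body in DOC_RE.findall(sgml):
        text_match = TEXT_RE.search(body)
        if text_match is None:
            continue  # skip documents without a <TEXT> element
        inner = text_match.group(1)
        paragraphs = P_RE.findall(inner) or [inner]  # some TEXT blocks lack <P> tags
        # Collapse each paragraph's wrapped lines and unescape entities such as &amp;.
        lines = [html.unescape(" ".join(p.split())) for p in paragraphs]
        yield doc_id, "\n".join(line for line in lines if line)


def read_gigaword_file(path: str) -> Iterator[Tuple[str, str]]:
    """Read a plain or gzipped Gigaword file and yield its documents."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as f:
        yield from iter_documents(f.read())


def tag_with_server(text: str, host: str = "localhost", port: int = 9199) -> str:
    """Send text as a single line to a running NER server and return the raw reply."""
    with socket.create_connection((host, port)) as sock:
        # Assumption: the server tags one line per connection, so flatten newlines.
        sock.sendall((" ".join(text.split()) + "\n").encode("utf-8"))
        sock.shutdown(socket.SHUT_WR)  # signal that no more input follows
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8")


def parse_slash_tags(tagged: str) -> List[Tuple[str, str]]:
    """Split slashTags output ("Paris/LOCATION") into (token, tag) pairs."""
    pairs = []
    for item in tagged.split():
        token, _, tag = item.rpartition("/")  # last slash separates the tag
        pairs.append((token, tag))
    return pairs


def group_entities(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Merge consecutive tokens sharing a non-O tag into (entity_text, short_tag) spans."""
    spans: List[Tuple[str, str]] = []
    current: List[str] = []
    current_tag = "O"
    for token, tag in pairs + [("", "O")]:  # sentinel flushes the final span
        if tag != current_tag and current:
            spans.append((" ".join(current), SHORT_TAGS.get(current_tag, current_tag)))
            current = []
        current_tag = tag
        if tag != "O":
            current.append(token)
    return spans
```

With a server running, a loop such as `for doc_id, text in read_gigaword_file(path): print(doc_id, group_entities(parse_slash_tags(tag_with_server(text))))` would print the entities found in each document. Because slashTags output carries no entity boundaries, `group_entities` merges two adjacent entities of the same type into one span; a BIO-style output format would avoid that.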