Skip to content

Latest commit

 

History

History
42 lines (32 loc) · 2.96 KB

part-of-speech_tagging.md

File metadata and controls

42 lines (32 loc) · 2.96 KB

Part-of-speech tagging

Part-of-speech tagging (POS tagging) is the task of tagging a word in a text with its part of speech. A part of speech is a category of words with similar grammatical properties. Common English parts of speech are noun, verb, adjective, adverb, pronoun, preposition, conjunction, etc.

Example:

Vinken , 61 years old
NNP , CD NNS JJ

UD

Universal Dependencies (UD) is a framework for cross-linguistic grammatical annotation, which contains more than 100 treebanks in over 60 languages. Models are typically evaluated based on the average test accuracy across 28 languages.

Model Avg accuracy Paper / Source
Adversarial Bi-LSTM (Yasunaga et al., 2018) 96.73 Robust Multilingual Part-of-Speech Tagging via Adversarial Training
Bi-LSTM (Plank et al., 2016) 96.40 Multilingual Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Models and Auxiliary Loss
Joint Bi-LSTM (Nguyen et al., 2017) 95.55 A Novel Neural Network Model for Joint POS Tagging and Graph-based Dependency Parsing

Penn Treebank

A standard dataset for POS tagging is the Wall Street Journal (WSJ) portion of the Penn Treebank, containing 45 different POS tags. Sections 0-18 are used for training, sections 19-21 for development, and sections 22-24 for testing. Models are evaluated based on accuracy.

Model Accuracy Paper / Source
Meta BiLSTM (Bohnet et al., 2018) 97.96 Morphosyntactic Tagging with a Meta-BiLSTM Model over Context Sensitive Tokenn Encodings
Char Bi-LSTM (Ling et al., 2015) 97.78 Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation
Adversarial Bi-LSTM (Yasunaga et al., 2018) 97.59 Robust Multilingual Part-of-Speech Tagging via Adversarial Training
Yang et al. (2017) 97.55 Transfer Learning for Sequence Tagging with Hierarchical Recurrent Networks
Ma and Hovy (2016) 97.55 End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF
Feed Forward (Vaswani et a. 2016) 97.4 Supertagging with LSTMs
Bi-LSTM (Ling et al., 2017) 97.36 Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation
Bi-LSTM (Plank et al., 2016) 97.22 Multilingual Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Models and Auxiliary Loss

Go back to the README