NLP

Unicode Text Segmentation algorithm

  • paper: 'Unicode Standard Annex #29'
  • implemented in: 'org.apache.lucene.analysis.standard.StandardTokenizer' (word boundary rules)
  • applications: 'Text segmentation (grapheme cluster, word, and sentence boundaries)'
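
To give a feel for word-boundary segmentation, here is a minimal Python sketch. It is only a rough approximation and does not implement the UAX #29 rule set: it assumes that a word is a run of word characters, and that an apostrophe or full stop does not split a word when word characters sit on both sides of it. The names `WORD_RE` and `segment_words` are made up for this illustration.

```python
import re

# Simplified word pattern (an approximation, not the UAX #29 rules):
# runs of word characters, optionally joined by a single apostrophe or
# full stop when word characters appear on both sides of it.
WORD_RE = re.compile(r"\w+(?:['’.]\w+)*")


def segment_words(text: str) -> list[str]:
    """Return the word-like segments of text, dropping spaces and punctuation."""
    return WORD_RE.findall(text)


print(segment_words("Can't stop: 3.14 is pi."))
# → ["Can't", 'stop', '3.14', 'is', 'pi']
```

A full implementation also has to handle scripts without spaces between words, emoji sequences, and the other character classes defined in the annex, which is why a library tokenizer is usually the better choice.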

Kstem

S-Stemmer

Lovins stemming algorithm

  • paper: 'Development of a Stemming Algorithm' (Julie Beth Lovins, 1968)
  • applications: 'Stemming'
  • processed language: 'English'
  • implemented in: 'Snowball'
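
The core idea of the Lovins stemmer, removing the longest matching ending while keeping a minimum stem length, can be sketched in Python as follows. This is a toy illustration, not the published algorithm: the `ENDINGS` tuple is a hypothetical, heavily abbreviated stand-in for the paper's ending list, the per-ending conditions are reduced to a single minimum stem length of 2, and the recoding step that follows ending removal is left out entirely.

```python
# Hypothetical, heavily abbreviated ending list for illustration only.
# The list in the 1968 paper is far longer and attaches a condition to
# each ending.
ENDINGS = ("ational", "ness", "ing", "ed", "s")

# Assumed simplification: every ending requires a stem of at least 2 letters.
MIN_STEM_LENGTH = 2


def strip_longest_ending(word: str) -> str:
    """Remove the longest listed ending that leaves a long enough stem."""
    for ending in sorted(ENDINGS, key=len, reverse=True):
        stem = word[: len(word) - len(ending)]
        if word.endswith(ending) and len(stem) >= MIN_STEM_LENGTH:
            return stem
    return word  # no ending applies, so the word is left unchanged


for word in ("walking", "kindness", "sing", "cats"):
    print(word, "->", strip_longest_ending(word))
# walking -> walk
# kindness -> kind
# sing -> sing
# cats -> cat
```

The minimum stem length is what keeps "sing" intact: removing "ing" would leave only "s".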

Porter stemmer

Lancaster stemmer

Snowball stemmer

  • applications: 'Stemming'
  • processed language: various (e.g. 'English', 'German', 'French')
  • implemented in: 'Lucene', 'nltk.stem.snowball.SnowballStemmer'
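
A short usage sketch for the NLTK class listed above, assuming NLTK is installed (`pip install nltk`). The helper name `stem_all` is made up for this example; `SnowballStemmer(language)` and its `stem` method are the NLTK calls being demonstrated.

```python
def stem_all(words, language="english"):
    """Stem each word with NLTK's Snowball stemmer for the given language."""
    # Imported lazily so this module still loads where NLTK is not installed.
    from nltk.stem.snowball import SnowballStemmer

    stemmer = SnowballStemmer(language)
    return [stemmer.stem(word) for word in words]


if __name__ == "__main__":
    print(stem_all(["running", "stemming", "cats"]))
    # → ['run', 'stem', 'cat']
```

The supported language names are available as `SnowballStemmer.languages`, so other languages can be selected by passing one of those names as `language`.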