simplemma-0.9.1

@adbar released this 20 Jan 17:07

What's Changed

  • smaller language data footprint with minimal impact on performance, achieved through a combination of rules, an upper limit on word length, and better data cleaning (#31)
  • unsupervised approach to affixes activated by default for some languages
  • reviewed rules for English and German (now less greedy)
  • added rules for Dutch, Finnish, Polish and Russian
  • improved Russian and Ukrainian language data (#3)
  • improved tokenizer
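
To make the first item more concrete, here is a minimal sketch of how rules and a word-length limit can shrink a lemma dictionary. It is not simplemma's implementation: `TOY_DICTIONARY`, `MAX_WORD_LENGTH`, `apply_rules` and `toy_lemmatize` are hypothetical names, and the values are made up for illustration. The idea is that forms a rule can handle (such as "stories") do not need a dictionary entry, and overly long tokens are skipped.

```python
from typing import Optional

# Hypothetical values for illustration only; none are taken from simplemma.
MAX_WORD_LENGTH = 30
TOY_DICTIONARY = {"mice": "mouse", "went": "go"}  # irregular forms only


def apply_rules(token: str) -> Optional[str]:
    """Return a lemma from a simple suffix rule, or None if no rule applies."""
    # Assumed toy rule: "-ies" -> "-y" for tokens longer than four characters.
    if len(token) > 4 and token.endswith("ies"):
        return token[:-3] + "y"
    return None


def toy_lemmatize(token: str) -> str:
    """Look the token up, fall back to rules, otherwise return it unchanged."""
    # Tokens above the length limit are never stored, so skip them entirely.
    if len(token) > MAX_WORD_LENGTH:
        return token
    if token in TOY_DICTIONARY:
        return TOY_DICTIONARY[token]
    lemma = apply_rules(token)
    return lemma if lemma is not None else token
```

In this sketch, `toy_lemmatize("stories")` is resolved by the rule rather than by a stored entry, which is the kind of trade-off between data size and coverage that the first item describes.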

Full Changelog: v0.9.0...v0.9.1