
v7 Upgrade, welcome

Vat edited this page Sep 11, 2017 · 10 revisions

🕺

Version 7 is a very exciting and much-needed change to the library. It's a many-times-over rewrite of the existing API, begun in November 2016 and consisting of 700 commits.

It softens many edges in the original workflow, and offers a pretty fresh way of working with English text.

the idea is to make it simple to 1) reach in, 2) make a change, and 3) output simply.

// give it your arbitrary text
var r = nlp(`Finally, the api is stable.`)

//grab a subset and make a transformation..
r.nouns().toUpperCase()

//call a subset-specific method
r.sentences().toExclamation()

//output the new thing as whatever
r.out('text')
//"Finally, the API is stable!"

in short:

nlp(myText).mySubset().subsetFn().out(myOutput)

major takeaways:

  • it's now simply called compromise (Thanks Joshua!)
  • all methods now work with lists of terms, instead of a one-off term - no more looping!
  • includes a clever regex-like matching scheme for grammatical patterns
  • easy-access to common text treatments (contractions, punctuation, etc)
  • one universal input, consistently tagged + parsed
  • smarter dependent/consistent/conflicting POS-tag logic

minor takeaways:

demands less working knowledge of internals + grammar 💥

no longer fusses with lumping/splitting of neighbouring terms 💥

more playful and 'bottom up' design 💥

easier matching of ad-hoc templates 💥

cuter debugging and traceable decision-making 💥

npm install compromise

Words live as groups

Instead of single Term objects having the methods & tooling, the library now hoists all this functionality to the main API, so you can filter down, act upon, and inspect any list of terms just as easily as a single term.

( ie. one word is now just a list of words, of length 1. )

This way, you can work on arbitrary text without arbitrary compromise choices getting in the way:

// declare r once, then reuse it for each example
let r = nlp('singing').verbs().toPastTense()
// sang

r = nlp('would have been singing').verbs().toPastTense()
// would have sang

r = nlp('john is singing. Sara was singing.').verbs().toPastTense().out('array')
//[was, was]
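
To make the "one word is a list of length 1" idea concrete, here is a minimal sketch in plain JavaScript. It is not how compromise is implemented: the `TermList` class and its methods are hypothetical names made up for illustration, and words are split naively on whitespace.

```javascript
// Hypothetical sketch: TermList is an illustration, not compromise's internals.
class TermList {
  constructor(words) {
    this.words = words // always an array, even for a single word
  }
  // build a list from raw text by splitting on whitespace
  static fromText(text) {
    return new TermList(text.split(/\s+/).filter(Boolean))
  }
  // narrow the list down to the words that pass a test
  filter(test) {
    return new TermList(this.words.filter(test))
  }
  // apply one change to every word; the caller never writes a loop
  toUpperCase() {
    return new TermList(this.words.map((w) => w.toUpperCase()))
  }
  // join the words back into a string
  out() {
    return this.words.join(' ')
  }
}

const one = TermList.fromText('singing')
const many = TermList.fromText('john is singing')

console.log(one.words.length) // 1
console.log(one.toUpperCase().out()) // SINGING
console.log(many.toUpperCase().out()) // JOHN IS SINGING
console.log(many.filter((w) => w.length > 2).out()) // john singing
```

Because every method lives on the list and returns a new list, the same call works on one word or on many, and calls can be chained.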

no more nlp.person(), nlp.value()...

every input is now POS-tagged, and each sequence is supplied with the appropriate methods.

let r = nlp('five years old')
r.values().toNumber()
r.out('text')
// '5 years old'

if you don't trust this, you can coerce the POS:

nlp('john is cool').tagAs('Noun').nouns().toPlural().out('text')
//john is cools

Match/subset-lookup .match()

see match syntax

nlp('john is cool and jane is nice').match('#Person is').out('array')
//[ 'john is', 'jane is']

more functionality:

nlp('john is cool and jane is nice').not('#Person is').out('array')
//[ 'cool', 'nice']
nlp('john is cool and jane is nice').matchOne('#Person is').out('array')
//[ 'john is']
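
For a feel of how a tag-aware pattern like '#Person is' can be checked against a sentence, here is a toy matcher in plain JavaScript. It is only a sketch under loud assumptions: the terms are tagged by hand, the helper names (`fits`, `findStarts`, `match`, `matchOne`) are hypothetical, and it supports only `#Tag` and literal-word tokens, nothing like the library's full match syntax.

```javascript
// Hypothetical toy matcher: hand-tagged terms, not compromise's tagger or match engine.
const terms = [
  { text: 'john', tags: ['Person'] },
  { text: 'is', tags: ['Verb'] },
  { text: 'cool', tags: ['Adjective'] },
  { text: 'and', tags: ['Conjunction'] },
  { text: 'jane', tags: ['Person'] },
  { text: 'is', tags: ['Verb'] },
  { text: 'nice', tags: ['Adjective'] },
]

// a '#Tag' token tests the term's tags; any other token tests the literal word
function fits(term, token) {
  return token.startsWith('#')
    ? term.tags.includes(token.slice(1))
    : term.text === token
}

// return the start index of every non-overlapping match of the pattern
function findStarts(list, pattern) {
  const tokens = pattern.split(' ')
  const starts = []
  let i = 0
  while (i + tokens.length <= list.length) {
    if (tokens.every((tok, j) => fits(list[i + j], tok))) {
      starts.push(i)
      i += tokens.length // skip past the matched terms
    } else {
      i += 1
    }
  }
  return starts
}

// all matches, each joined back into a string
function match(list, pattern) {
  const len = pattern.split(' ').length
  return findStarts(list, pattern).map((s) =>
    list.slice(s, s + len).map((t) => t.text).join(' ')
  )
}

// only the first match, if any
function matchOne(list, pattern) {
  return match(list, pattern).slice(0, 1)
}

console.log(match(terms, '#Person is')) // [ 'john is', 'jane is' ]
console.log(matchOne(terms, '#Person is')) // [ 'john is' ]
```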

output

nlp('John is cool').out('normal');
nlp('John is cool').out('text');
nlp('John is cool').out('html');
//show part-of-speech tags, etc
nlp('John is cool').out('terms');
//and soon, adhoc-scripting
nlp('John is cool').out(myFunction);
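
One way to picture a single output method that accepts either a format name or a function is the sketch below. It is a guess at the shape, not the library's code: the standalone `out(text, format)` helper is hypothetical, and the 'normal' behaviour here (lowercase, common punctuation stripped) is an assumption.

```javascript
// Hypothetical sketch of a format switch; the behaviour of each format is assumed.
function out(text, format) {
  // a function is simply called with the text, and its result is returned
  if (typeof format === 'function') {
    return format(text)
  }
  switch (format) {
    case 'normal':
      // assumed: lowercased, common punctuation stripped, whitespace collapsed
      return text
        .toLowerCase()
        .replace(/[.,!?;:]/g, '')
        .replace(/\s+/g, ' ')
        .trim()
    case 'text':
    default:
      // leave the text exactly as it was given
      return text
  }
}

console.log(out('John is cool!', 'normal')) // john is cool
console.log(out('John is cool!', 'text')) // John is cool!
console.log(out('John is cool', (t) => t.length)) // 12
```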

to see all the new features, see compromise.cool/demos

for the full API, see compromise.cool/docs

a huge thank you to our 45! contributors to the work.

for low-hanging fruit, check out our todo list