Gez Quinn edited this page Feb 20, 2020 · 5 revisions

compromise is a modest library that does natural-language processing in javascript.

it was built to make searching and transforming human-text easy and playful.


v12 is the biggest (and proudest) release in its 8-year history. It involves 500 commits over 11 months of work.

You can read about some of the design-decisions for this update here.


Although the release is a near-complete rewrite, most compromise v11 scripts will continue to work in v12.

There are many subtle changes, and this document is intended as an upgrade guide.


Big wins:

  • v12 is considerably faster. In most cases it is 50% faster than v11.

  • v12 is considerably smaller. It is 170kb, instead of 235kb. (~28% smaller)

  • pass-by-reference issues are gone. Most github issues will close.

  • the new plugin scheme makes customization cleaner and simpler

  • .export(), .import() to serialize and compress a document object

  • new @termMethod match-syntax feature - to query non-tag info

  • .json() and .text() outputs are configurable now

  • paragraph support!

  • better unicode support

  • moved all documentation to observablehq

  • cleaned-up internal handling of whitespace/punctuation


Breaking changes:

Removed methods:

  • .getPunctuation() - use .pre() or .post()
  • .setPunctuation() - use .pre(str) or .post(str)
  • .whitespace() - use .json({terms:{whitespace:true}})
  • .flatten() - use .join()
  • .lump() - this was an anti-pattern.
  • .insertAt() - using term indexes is not fun!
  • .reduce() - not sure if this ever even worked?
  • .normal() - use .text('normal')
  • nouns().articles() - use nouns().json()
  • nlp.clone() - removed, now that nlp.extend() is more direct.
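
As a rough migration aid, the replacements above can be wrapped in small helpers. This is only a sketch: `v11Compat` and its function names are hypothetical and not part of compromise, and `doc` is assumed to be a v12 document such as the result of `nlp('some text')`.

```javascript
// Hypothetical shim (not part of compromise): v11-style helpers
// expressed with the v12 replacements listed above.
// `doc` is assumed to be a compromise v12 document, e.g. nlp('hello, world.')
const v11Compat = {
  // .getPunctuation() -> .post() (use .pre() for leading characters)
  getPunctuation: doc => doc.post(),
  // .setPunctuation(str) -> .post(str)
  setPunctuation: (doc, str) => doc.post(str),
  // .normal() -> .text('normal')
  normal: doc => doc.text('normal'),
  // .whitespace() -> whitespace is now requested through .json()
  whitespace: doc => doc.json({ terms: { whitespace: true } }),
  // .flatten() -> .join()
  flatten: doc => doc.join(),
}

// Usage, assuming compromise v12 is installed:
// const nlp = require('compromise')
// v11Compat.normal(nlp('Hello World!'))
```

Methods removed without a replacement (.lump(), .insertAt(), .reduce()) are left out on purpose.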

Removed tags:

the new v12 @termMethod feature lets you query things that are not in a term's tags. This has allowed us to remove the following tags:

  • #Comma - @hasComma
  • #Quotation - .quotations() has been improved
  • #ClauseEnd - .clauses() has been heavily-improved
  • #NumberRange - the compromise-numbers plugin cleans these things up.
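
If a v11 script matched on one of these tags, the nearest v12 query can be looked up from a small table. The sketch below is an illustration only: `removedTagQueries` and `queryRemovedTag` are hypothetical names, `doc` is assumed to be a v12 document, and `#NumberRange` is omitted because it depends on the compromise-numbers plugin.

```javascript
// Hypothetical lookup table (not part of compromise): each removed tag
// mapped to the v12 query suggested in the list above.
const removedTagQueries = {
  '#Comma': doc => doc.match('@hasComma'),
  '#Quotation': doc => doc.quotations(),
  // not an exact equivalent: returns whole clauses, not just their last term
  '#ClauseEnd': doc => doc.clauses(),
}

// Run the nearest v12 equivalent of a removed v11 tag query.
function queryRemovedTag(doc, tag) {
  if (!Object.prototype.hasOwnProperty.call(removedTagQueries, tag)) {
    throw new Error('no v12 equivalent recorded for ' + tag)
  }
  return removedTagQueries[tag](doc)
}

// Usage, assuming compromise v12 is installed:
// const nlp = require('compromise')
// queryRemovedTag(nlp('first, second, third'), '#Comma')
```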

Moved methods:

our new plugin scheme allows us to easily add all sorts of behaviour to compromise classes. This has allowed us to separate some functionality into plugins. These are very easy to include (promise!):

  • .values() - number parsing has been moved to compromise-numbers

  • .ngrams() - ngram functionality has been moved to compromise-ngrams

  • .dates() - date parsing has been moved to compromise-dates

  • .adjectives() - adjective conjugation has been moved to compromise-adjectives

  • .contractions() - now returns only contractions, and not possible-contractions. .contract() is now a stand-alone method.

  • .out('html') - html output has been moved to compromise-output

These plugins can just be applied like this:

const nlp = require('compromise')
nlp.extend(require('compromise-plugin-foo'))

Once the plugin is applied, things should work just as normal.
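
When a script needs several of the moved methods, the plugins can be applied in a loop. This is a minimal sketch: `applyPlugins` is a hypothetical helper, not part of compromise, and it assumes only that `nlp.extend()` accepts one plugin at a time, as shown above.

```javascript
// Hypothetical helper (not part of compromise): apply several plugins in one go.
// `nlp` is the object returned by require('compromise');
// `plugins` is an array of already-required plugin modules.
function applyPlugins(nlp, plugins) {
  plugins.forEach(plugin => nlp.extend(plugin))
  return nlp
}

// Usage, assuming the plugin packages named above are installed:
// const nlp = applyPlugins(require('compromise'), [
//   require('compromise-numbers'),
//   require('compromise-dates'),
// ])
```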

Misc breaking

  • .map(), .forEach(), .filter(), .some() all return full Doc objects of length 1 (instead of an undocumented internal object)

  • results of .canbe() are more like .match()

  • .normalize() doesn't transform numbers anymore - use compromise-numbers

  • more consistent behaviour for .replace('foo [bar]', 'baz')

  • .numbers() results no longer include Units, by default. Get them with .numbers().units()

  • .verbs() results no longer include leading/trailing Adverbs. Get them with .verbs().adverbs()

  • the internal compromise api has changed considerably. If you were 'reaching in' to the internal objects in v11, you'll see many changes.

  • removed no-longer-needed prefix_ and _suffix operators from match syntax

  • .toCamelCase() no longer capitalizes char[0]. Run .toCamelCase().toTitleCase() for this.
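
Two of the changes above are easier to see in code. The sketch below assumes `doc` is a v12 document; `collectTexts` and `upperCamelCase` are hypothetical helper names used only for illustration.

```javascript
// In v12 the .forEach() callback receives a full Doc of length 1,
// so ordinary Doc methods such as .text() can be called on it.
function collectTexts(doc) {
  const texts = []
  doc.forEach(part => {
    texts.push(part.text())
  })
  return texts
}

// .toCamelCase() no longer capitalizes the first character;
// chaining .toTitleCase() restores the v11 result.
function upperCamelCase(doc) {
  return doc.toCamelCase().toTitleCase()
}

// Usage, assuming compromise v12 is installed:
// const nlp = require('compromise')
// collectTexts(nlp('one sentence. another sentence.'))
// upperCamelCase(nlp('natural language processing')).text()
```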

Non-breaking changes:

New Methods:

  • .reverse() - reverse the order of the matches
  • .unique() - remove duplicates using 'root'
  • .cache() - speed-up matches and lookups
  • .uncache() - manually disable the cache
  • .join() - search between sentences, for example
  • .lookAhead() - match through the terms after your current match
  • .lookBehind() - match through the terms before your current match
  • .lists() - find all comma-separated natural-language lists
  • .matchOne() - return the first .match()
  • .segment() - split a document according to a given label
  • .export() - serialize and compress the document for saving/moving
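
A short sketch of how .cache() and .matchOne() might be combined when trying several patterns against the same document. `firstOfMany` is a hypothetical helper, `doc` is assumed to be a v12 document, and the code assumes that .matchOne() returns a document exposing a `found` flag.

```javascript
// Hypothetical helper (not part of compromise): return the first match
// for the first pattern that hits, or null when none do.
function firstOfMany(doc, patterns) {
  doc.cache() // speed up the repeated lookups below
  for (const pattern of patterns) {
    const result = doc.matchOne(pattern)
    // assumption: an empty result has found === false
    if (result.found) {
      return result
    }
  }
  return null
}

// Usage, assuming compromise v12 is installed:
// const nlp = require('compromise')
// firstOfMany(nlp('she walked to the store'), ['#Place', '#Verb'])
```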

New Constructor methods

  • .extend() - change any internal compromise data
  • .load() - create a new document from .export() results
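
The export and load steps pair up roughly as follows. This is a sketch under stated assumptions: `saveDoc` and `restoreDoc` are hypothetical names, and the loader is called `.load()` as in the list above. If your version names it differently (the Big wins list mentions `.import()`), adjust accordingly.

```javascript
// Hypothetical helpers (not part of compromise).
// `nlp` is the object returned by require('compromise').

// Parse the text once and return the serialized form for saving or moving.
function saveDoc(nlp, text) {
  return nlp(text).export()
}

// Rebuild a document from a previously exported value.
function restoreDoc(nlp, exported) {
  return nlp.load(exported)
}

// Usage, assuming compromise v12 is installed:
// const nlp = require('compromise')
// const exported = saveDoc(nlp, 'The quick brown fox.')
// const doc = restoreDoc(nlp, exported)
```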

Misc new features