TitleCased numbers in ProperNouns #993

Open
thegoatherder opened this issue Jan 16, 2023 · 3 comments

Comments

@thegoatherder
Contributor

In v13, any #TextValue such as five or twelve is correctly tagged, regardless of the casing of the string.
In v14, the tagger fails to add #TextValue (or #Value) to any number word that starts with an uppercase letter; instead, the word is tagged as #ProperNoun.

e.g. The Twelve Days of Christmas
Got:
Twelve => #ProperNoun
Want:
Twelve => #TextValue #Value

Possibly related to #992?

Example

const nlp = require('compromise')
const nlpDates = require('compromise-dates')
const nlpNumbers = require('compromise-numbers')
nlp.plugin(nlpDates)
nlp.plugin(nlpNumbers)

const text = [
  // uppercase #TextValue fails to tag as #TextValue
  'I watched the Twelve Days of Christmas',
  'I put him on the Two-Week Wait cohort',
  'I saw him Five years ago',
  // lowercase #TextValue correctly tagged as #TextValue
  'I watched the twelve days of christmas',
  'I put him on the two-week wait cohort',
  'I saw him five years ago',
]

text.forEach((t) => {
  const doc = nlp(t)
  const result = doc.has('#Value') ? 'MATCH SUCCESS' : 'MATCH FAIL'
  console.log(result)
  console.log(doc.debug())
})

v14 output

MATCH FAIL

  ┌─────────
  │ 'I'        - Noun, Pronoun
  │ 'watched'  - Verb, PastTense
  │ 'the'      - Determiner
  │ 'Twelve'   - ProperNoun, Noun
  │ 'Days'     - Noun
  │ 'of'       - Preposition, Date
  │ 'Christmas'  - Date, Noun, Holiday


View { ptrs: undefined }
MATCH FAIL

  ┌─────────
  │ 'I'        - Noun, Pronoun
  │ 'put'      - Verb, PresentTense, Infinitive
  │ 'him'      - Noun, Pronoun
  │ 'on'       - Preposition
  │ 'the'      - Determiner
  │ 'Two'      - ProperNoun, Noun
  │ 'Week'     - Date, Noun, Duration
  │ 'Wait'     - Noun, Singular, ProperNoun
  │ 'cohort'   - Noun, Singular


View { ptrs: undefined }
MATCH FAIL

  ┌─────────
  │ 'I'        - Noun, Pronoun
  │ 'saw'      - Verb, PastTense
  │ 'him'      - Noun, Pronoun
  │ 'Five'     - ProperNoun, Noun
  │ 'years'    - Date, Noun, Duration
  │ 'ago'      - Date


View { ptrs: undefined }
MATCH SUCCESS

  ┌─────────
  │ 'I'        - Noun, Pronoun
  │ 'watched'  - Verb, PastTense
  │ 'the'      - Determiner
  │ 'twelve'   - Value, TextValue, Cardinal, Date
  │ 'days'     - Date, Noun, Duration
  │ 'of'       - Preposition, Date
  │ 'christmas'  - Date, Noun, Holiday


View { ptrs: undefined }
MATCH SUCCESS

  ┌─────────
  │ 'I'        - Noun, Pronoun
  │ 'put'      - Verb, PresentTense, Infinitive
  │ 'him'      - Noun, Pronoun
  │ 'on'       - Preposition, Date
  │ 'the'      - Determiner, Date
  │ 'two'      - Value, TextValue, Cardinal, Date
  │ 'week'     - Date, Noun, Duration
  │ 'wait'     - Noun, Singular
  │ 'cohort'   - Noun, Singular


View { ptrs: undefined }
MATCH SUCCESS

  ┌─────────
  │ 'I'        - Noun, Pronoun
  │ 'saw'      - Verb, PastTense
  │ 'him'      - Noun, Pronoun
  │ 'five'     - Value, TextValue, Cardinal, Date, DateShift
  │ 'years'    - Date, Noun, Duration, DateShift
  │ 'ago'      - Date, DateShift

v13 output

MATCH SUCCESS
=====
  -----
  | 'I'        - Pronoun, Noun, Singular
  | 'watched'  - PastTense, Verb
  | 'the'      - Determiner
  | 'Twelve'   - TextValue, Value, Cardinal, ProperNoun, Noun, Date
  | 'Days'     - Duration, Date, Noun, Plural
  | 'of'       - Date
  | 'Christmas'  - Holiday, Date, Noun

Doc$1 {
  list: [ Phrase$3 { start: 'i-gsfbh1W', length: 7, isA: 'Phrase' } ]
}
MATCH SUCCESS
=====
  -----
  | 'I'        - Pronoun, Noun, Singular
  | 'put'      - Infinitive, PresentTense, Verb
  | 'him'      - Pronoun, Noun, Singular
  | 'on'       - Date
  | 'the'      - Determiner, Date
  | 'Two'      - TextValue, Value, Cardinal, ProperNoun, Noun, Singular, Date
  | 'Week'     - Duration, Date, Noun, Singular
  | 'Wait'     - ProperNoun, Noun, Singular
  | 'cohort'   - Noun, Singular

Doc$1 {
  list: [ Phrase$3 { start: 'i-okIEpXA', length: 9, isA: 'Phrase' } ]
}
MATCH SUCCESS
=====
  -----
  | 'I'        - Pronoun, Noun, Singular
  | 'saw'      - PastTense, Verb
  | 'him'      - Pronoun, Noun, Singular
  | 'Five'     - TextValue, Value, Cardinal, ProperNoun, Noun, Date, DateShift
  | 'years'    - Duration, Date, Noun, Plural, DateShift
  | 'ago'      - Date, DateShift

Doc$1 {
  list: [ Phrase$3 { start: 'i-fzB8oMG', length: 6, isA: 'Phrase' } ]
}
MATCH SUCCESS
=====
  -----
  | 'I'        - Pronoun, Noun, Singular
  | 'watched'  - PastTense, Verb
  | 'the'      - Determiner
  | 'twelve'   - TextValue, Value, Cardinal, Date
  | 'days'     - Duration, Date, Noun, Plural
  | 'of'       - Date
  | 'christmas'  - Holiday, Date, Noun

Doc$1 {
  list: [ Phrase$3 { start: 'i-kLi0Yga', length: 7, isA: 'Phrase' } ]
}
MATCH SUCCESS
=====
  -----
  | 'I'        - Pronoun, Noun, Singular
  | 'put'      - Infinitive, PresentTense, Verb
  | 'him'      - Pronoun, Noun, Singular
  | 'on'       - Date
  | 'the'      - Determiner, Date
  | 'two'      - TextValue, Value, Cardinal, Date
  | 'week'     - Duration, Date, Noun, Singular
  | 'wait'     - Infinitive, PresentTense, Verb
  | 'cohort'   - Noun, Singular

Doc$1 {
  list: [ Phrase$3 { start: 'i-rDNGLHS', length: 9, isA: 'Phrase' } ]
}
MATCH SUCCESS
=====
  -----
  | 'I'        - Pronoun, Noun, Singular
  | 'saw'      - PastTense, Verb
  | 'him'      - Pronoun, Noun, Singular
  | 'five'     - TextValue, Value, Cardinal, Date, DateShift
  | 'years'    - Duration, Date, Noun, Plural, DateShift
  | 'ago'      - Date, DateShift

Doc$1 {
  list: [ Phrase$3 { start: 'i-VCEua12', length: 6, isA: 'Phrase' } ]
}
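To pin down exactly what changed, the tags printed for the three capitalised number words can be diffed between the two versions. This small sketch copies the tag lists from the v13 and v14 debug output in this issue and reports which tags v14 no longer assigns. The helper `missingTags` is an illustrative name, not part of compromise.

```javascript
// Tags copied from the debug output in this issue for the capitalised number words.
const v13Tags = {
  Twelve: ['TextValue', 'Value', 'Cardinal', 'ProperNoun', 'Noun', 'Date'],
  Two: ['TextValue', 'Value', 'Cardinal', 'ProperNoun', 'Noun', 'Singular', 'Date'],
  Five: ['TextValue', 'Value', 'Cardinal', 'ProperNoun', 'Noun', 'Date', 'DateShift'],
}
const v14Tags = {
  Twelve: ['ProperNoun', 'Noun'],
  Two: ['ProperNoun', 'Noun'],
  Five: ['ProperNoun', 'Noun'],
}

// Illustrative helper: tags present in `before` but absent from `after`.
function missingTags(before, after) {
  const kept = new Set(after)
  return before.filter((tag) => !kept.has(tag))
}

// Print the tags each word lost between v13 and v14.
for (const word of Object.keys(v13Tags)) {
  console.log(word, missingTags(v13Tags[word], v14Tags[word]))
}
```

For all three words the diff includes TextValue, Value, Cardinal and Date, which is why the doc.has('#Value') check fails in v14.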

@spencermountain changed the title from "v14 regression in capitalised #TextValue tagging" to "ambiguously TitleCased numbers" on Jan 17, 2023
@spencermountain
Owner

tricky one: 'the Twelve Days of Christmas' is a ProperNoun, just like 'Visa One Express' or whatever.

for your purposes, if you want to avoid this, you could just add doc.match('(one|two|..twelve)').tag('Value').

not sure what to do about this
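The suggested workaround matches a list of number words as one alternation and re-tags each hit as Value. A minimal sketch of assembling such a pattern from a word list follows. The `numberWordPattern` helper and the sample words (only the ones used in this issue's test sentences) are hypothetical, and the compromise call itself is left as a comment because it needs the library loaded.

```javascript
// Hypothetical helper: turn a list of number words into an
// alternation pattern such as '(two|five|twelve)'.
function numberWordPattern(words) {
  // Lowercase and de-duplicate so the pattern stays tidy.
  const unique = [...new Set(words.map((w) => w.toLowerCase()))]
  return '(' + unique.join('|') + ')'
}

// Sample list: only the number words that appear in this issue's examples.
const pattern = numberWordPattern(['Two', 'Five', 'Twelve'])
console.log(pattern) // (two|five|twelve)

// With compromise loaded, the suggested workaround would then be:
//   doc.match(pattern).tag('Value')
```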

@spencermountain changed the title from "ambiguously TitleCased numbers" to "TitleCased numbers in ProperNouns" on Jan 17, 2023
@thegoatherder
Contributor Author

I’m not sure I understand why they have to be mutually exclusive. Can’t it be a #Value, #TextValue and #ProperNoun all at the same time? That seems semantically correct to me.

@thegoatherder
Contributor Author

@spencermountain the strings were correctly labelled #TextValue, #Value and #ProperNoun in v13. Can’t we just follow the same logic in v14? Why can’t a #ProperNoun also be a #Value?
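The argument that these tags need not be mutually exclusive can be illustrated with a toy model in which each term carries a set of tags, so a capitalised number word gains the number tags on top of ProperNoun. This is a hypothetical sketch of the requested behaviour, not compromise's actual tagger; the `NUMBER_WORDS` lexicon and the `tagTerm` function are assumptions made for illustration.

```javascript
// Hypothetical lexicon of number words (illustrative, not compromise's own).
const NUMBER_WORDS = new Set(['two', 'five', 'twelve'])

// Toy tagger: tags accumulate in a Set instead of replacing each other.
function tagTerm(text) {
  const tags = new Set()
  // Title-cased words get ProperNoun, as v14 already does.
  if (/^[A-Z]/.test(text)) {
    tags.add('ProperNoun')
    tags.add('Noun')
  }
  // The number lookup ignores case, so 'Twelve' and 'twelve' both match.
  if (NUMBER_WORDS.has(text.toLowerCase())) {
    tags.add('Value')
    tags.add('TextValue')
    tags.add('Cardinal')
  }
  return tags
}

console.log([...tagTerm('Twelve')].join(', ')) // ProperNoun, Noun, Value, TextValue, Cardinal
console.log([...tagTerm('twelve')].join(', ')) // Value, TextValue, Cardinal
```

Under this model a #Value check succeeds whether or not the word is capitalised, while a ProperNoun match on 'the Twelve Days of Christmas' remains possible.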


2 participants