
[Issue]: "My favorite time of the year" in .nouns() response #1096

Open
MarketingPip opened this issue Mar 13, 2024 · 3 comments

@MarketingPip
Contributor

Just playing around again, and I saw that this whole phrase gets tagged as all nouns.

import nlp from "https://esm.sh/compromise"

let doc = nlp("my favorite time of the year")

console.log(doc.nouns().out("array")) // ["my favorite time of the year"]

I'm sure there are various other sentences that need more rules added. If you have any ideas for POS-tagged datasets we could throw at this to identify other patterns, let me know.

Otherwise, I'm sure we could open issues like this one all day long.

@spencermountain
Owner

hey Jared, good catch. The tags are correct, but this is a case of .nouns() getting overly-excited.


.nouns() has always been noun-phrasey, and I'm not sure how best to tokenize this phrase, if we were to split it up further.
it does seem awkward though, I agree.
cheers

@spencermountain changed the title from '[Issue]: "My favorite time of the year" tagged as all nouns.' to '[Issue]: "My favorite time of the year" in .nouns() response' on Apr 1, 2024
@MarketingPip
Contributor Author

@spencermountain maybe split by compound nouns? Those should be grouped together, but other nouns shouldn't...?

PS: I think I have a list of compound nouns from a while back somewhere that I can throw at you too!
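
To make the "split by compound nouns" idea concrete, here is a rough sketch. It is not how compromise works internally: the `terms` array and the `splitCompoundNouns` helper are hypothetical, and the tags are hand-written for illustration rather than copied from the library's output.

```javascript
// Hypothetical, hand-tagged terms for illustration only.
// These tags are an assumption, NOT compromise's real output.
const terms = [
  { text: "my", tag: "Possessive" },
  { text: "favorite", tag: "Adjective" },
  { text: "time", tag: "Noun" },
  { text: "of", tag: "Preposition" },
  { text: "the", tag: "Determiner" },
  { text: "year", tag: "Noun" },
];

// Group only runs of adjacent Noun terms; any other tag closes the group.
function splitCompoundNouns(tagged) {
  const groups = [];
  let current = [];
  for (const term of tagged) {
    if (term.tag === "Noun") {
      current.push(term.text);
    } else if (current.length > 0) {
      groups.push(current.join(" "));
      current = [];
    }
  }
  // Flush a trailing run of nouns, if any.
  if (current.length > 0) groups.push(current.join(" "));
  return groups;
}

console.log(splitCompoundNouns(terms)); // ["time", "year"]
```

With this grouping, adjacent nouns such as "ice cream truck" would stay together as one result, while "time" and "year" come out separately because "of the" sits between them.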

@MarketingPip
Contributor Author

MarketingPip commented Apr 14, 2024

@spencermountain - I don't know if the best approach here is writing a rule for this, such as "[#PossessiveNoun] #Adjective" > tag group (0) as a possessive determiner.

This response from GPT might help with writing the rule:

In English grammar, possessive pronouns typically do not directly modify adjectives. Instead, they typically modify nouns. For example:

Possessive pronoun modifying a noun: "That is my favorite book."
Adjective modifying a noun: "That is a beautiful book."
However, there are cases where possessive pronouns can indirectly modify adjectives through the noun they are associated with:

"That is my favorite red book."
Here, "my" is a possessive pronoun modifying the noun "book," and "red" is an adjective modifying "book." So indirectly, "my" can influence the adjective "red" by modifying the noun "book."

Then, regardless, still tokenize all nouns out as single nouns or compound nouns. This doesn't occur with just this phrase; there were countless phrasings that tokenize almost as chunks. (Maybe we have to peek into the library and see if something is going on...?)
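
Roughly what I mean by that rule, as a plain-JavaScript sketch that does not touch the library. The `phrase` data, the `tagPossessiveDeterminers` helper, and the `PossessiveDeterminer` tag name are all assumptions for illustration; the tag names mirror the rule text above and may not match compromise's actual tagset.

```javascript
// Hypothetical, hand-tagged input; tag names follow the rule text,
// not necessarily compromise's real tagset.
const phrase = [
  { text: "my", tags: ["PossessiveNoun"] },
  { text: "favorite", tags: ["Adjective"] },
  { text: "time", tags: ["Noun"] },
];

// Rule sketch: a PossessiveNoun term directly followed by an Adjective
// also gets an (assumed) PossessiveDeterminer tag. Returns a new array.
function tagPossessiveDeterminers(tagged) {
  return tagged.map((term, i) => {
    const next = tagged[i + 1];
    const matches =
      term.tags.includes("PossessiveNoun") &&
      next !== undefined &&
      next.tags.includes("Adjective");
    return matches
      ? { ...term, tags: [...term.tags, "PossessiveDeterminer"] }
      : term;
  });
}

const retagged = tagPossessiveDeterminers(phrase);
console.log(retagged[0].tags); // ["PossessiveNoun", "PossessiveDeterminer"]
```

Only the first term is retagged here; "favorite" and "time" pass through unchanged, so a later step could still pull the nouns out on their own.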

As always, hope you had an awesome weekend.
