My overall goal is to identify PresentTense sentences with a #Person or #Pronoun subject.
While troubleshooting, I ran into an issue with the following sentence.
nlp("She led a diverse advisory committee throughout the process that included membership from schools, city planning departments, state engineering departments, transit agencies, and AARP.").sentences().json()
Clearly the verb should be “led”, not “state engineering”. I think the trouble here is that both “state” and “engineering” can be verbs, but in this sentence they act as modifiers of “departments.”
This does work though:
nlp("She led a diverse advisory committee throughout the process that included membership from city planning departments and state engineering departments.").sentences().json()
{
subject: "she"
verb: "led"
predicate: "a diverse advisory committee throughout the proces…ing departments and state engineering departments"
grammar: { tense: "PastTense"}
}
In both cases, the subject is properly identified (as “she”), but when there is a comma-separated list of items that includes potential verbs (“state” and “engineering”), it seems to skip over the sentence’s actual verb (“led”) and choose one of the potential verbs from the list (“state”).
Is there something I can do on my side to resolve this, or is there a fix that can be done within this package?
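For the stated goal, a minimal sketch of the filtering step might look like the following. It assumes each record has the shape shown in the output above (subject, verb, predicate, grammar.tense); the exact nesting of the real .sentences().json() output may differ. The helper name, the pronoun list, and the isSubjectOk parameter are hypothetical, and the sketch does not attempt #Person detection.

```javascript
// A small, deliberately incomplete set of subject pronouns (an assumption for this sketch).
const PRONOUNS = new Set(['i', 'you', 'he', 'she', 'it', 'we', 'they']);

// `records` is assumed to be an array of objects shaped like the output in this thread:
// { subject, verb, predicate, grammar: { tense } }.
// `isSubjectOk` is a hypothetical hook; swap in a check that also accepts people's names.
function presentTenseWithPronounSubject(records, isSubjectOk = (s) => PRONOUNS.has(s)) {
  return records.filter((r) => {
    const tense = r.grammar && r.grammar.tense;
    const subject = (r.subject || '').trim().toLowerCase();
    return tense === 'PresentTense' && isSubjectOk(subject);
  });
}
```

This only helps once the verb and tense are detected correctly, which is the part that fails on the long sentence.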
hey Ryan - yep, your intuitions are correct. The tagger gets borked on the ambiguous words and the long list.
I think the error is in .clauses() - it may be confusing those commas with sentence fragments, then interpreting '^state engineering ..' like '^keep engineering..'.
You can poke around in the tagger logs if you're interested, by adding nlp.verbose('tagger'). Fraid I don't see a quick solution for this sentence, other than passing nlp(txt, {'state engineering':'Noun'}) as a quick fix.
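A rough sketch of that quick fix, generalized to several phrases, could look like this. The helper names buildLexicon and parseWithOverrides are hypothetical, and nlp is passed in as an argument so the sketch makes no assumption about how compromise is imported. It relies only on the nlp(txt, lexicon) call form mentioned above.

```javascript
// Build a lexicon object that maps each listed phrase to a single tag.
// `buildLexicon` is a hypothetical helper, not part of compromise.
function buildLexicon(phrases, tag = 'Noun') {
  const lexicon = {};
  for (const phrase of phrases) {
    lexicon[phrase] = tag;
  }
  return lexicon;
}

// Parse `text` with the overrides applied, using the nlp(txt, lexicon) form.
// `nlp` is expected to be the compromise function supplied by the caller.
function parseWithOverrides(nlp, text, phrases) {
  return nlp(text, buildLexicon(phrases));
}
```

Usage would be along the lines of parseWithOverrides(nlp, txt, ['state engineering']).sentences().json(). This is still a per-phrase workaround, not a general fix for the clause logic.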
In all, I'd really like to improve the logic for parsing the SVO of a sentence. You can see the logic right now is really poor.
Let me know if you find anything else
cheers
Thank you for the suggestion for the quick fix. That obviously fixes just that sentence. I'll keep poking around to see if I can solve it more generally. Thanks for your quick response!
My friend who lives in London looks like Homer Simpson.
This should have two clauses: "My friend looks like Homer Simpson" and "who lives in London".
Compromise finds only a single clause and identifies the sentence's verb as "lives" instead of "looks".
I'll keep working on clause and main-clause identification that handles both comma-separated lists and dependent clauses nested within independent clauses.
ryancasburn-KAI changed the title from "Issue with subject and verb determination in .sentences() with comma separated list of items which include potential verbs" to "Inaccurate Clause determination leads to wrong choice of verb for the sentence." on Sep 6, 2023