Inaccurate Clause determination leads to wrong choice of verb for the sentence. #1035

Open
ryancasburn-KAI opened this issue Sep 6, 2023 · 3 comments

@ryancasburn-KAI
Contributor

My overall goal is to identify PresentTense sentences with a #Person or #Pronoun subject.

While troubleshooting, I ran into an issue with the following sentence.

```javascript
import nlp from 'compromise'
nlp("She led a diverse advisory committee throughout the process that included membership from schools, city planning departments, state engineering departments, transit agencies, and AARP.").sentences().json()
```

Which returns

```
{
  subject: "she",
  verb: "state engineering",
  predicate: "departments",
  grammar: { tense: "PresentTense" }
}
```

Clearly the verb should be “led”, not “state engineering”. I think the trouble here is that both “state” and “engineering” can be verbs, but in this sentence they should be read as adjectives modifying “departments”.

This does work though:

```javascript
nlp("She led a diverse advisory committee throughout the process that included membership from city planning departments and state engineering departments.").sentences().json()
```

```
{
  subject: "she",
  verb: "led",
  predicate: "a diverse advisory committee throughout the proces…ing departments and state engineering departments",
  grammar: { tense: "PastTense" }
}
```

In both cases the subject is correctly identified (as “she”), but when there is a comma-separated list of items that includes potential verbs (“state” and “engineering”), it seems to skip over the sentence’s actual verb (“led”) and choose one of the potential verbs from the list (“state”).
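To make the impact on the original goal concrete, here is a minimal sketch of a filter over sentence objects shaped like the abridged output above. It relies only on the `subject` and `grammar.tense` fields shown in this issue, and the hypothetical `PRONOUNS` set stands in for checking the #Pronoun tag; it is not how compromise itself matches tags. Because the clause error reports the first sentence as PresentTense, a filter like this lets it through as a false positive.

```javascript
// Hypothetical stand-in for a #Pronoun tag check (assumption, not compromise's API).
const PRONOUNS = new Set(['i', 'you', 'he', 'she', 'it', 'we', 'they']);

// True when a sentence object (shaped like the abridged .json() output above)
// is reported as PresentTense and its subject is one of the listed pronouns.
function isPresentWithPronounSubject(s) {
  const tense = s.grammar ? s.grammar.tense : undefined;
  return tense === 'PresentTense' && PRONOUNS.has(String(s.subject || '').toLowerCase());
}

// Objects copied from the two outputs in this issue (predicate abridged as in the original).
const withList = {
  subject: 'she',
  verb: 'state engineering',
  predicate: 'departments',
  grammar: { tense: 'PresentTense' },
};
const withoutList = {
  subject: 'she',
  verb: 'led',
  predicate: 'a diverse advisory committee throughout the proces…ing departments and state engineering departments',
  grammar: { tense: 'PastTense' },
};

console.log(isPresentWithPronounSubject(withList)); // true: a false positive, since "led" is past tense
console.log(isPresentWithPronounSubject(withoutList)); // false
```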

Is there something I can do on my side to resolve this, or is there a fix that can be done within this package?

@spencermountain
Owner

hey Ryan - yep, your intuitions are correct. The tagger gets borked on the ambiguous words and the long list.
I think the error is in .clauses() - it may be confusing those commas with sentence fragments, then interpreting '^state engineering ..' like '^keep engineering..'.
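One way to picture the suspected failure: if every comma is treated as a clause boundary, the list breaks into fragments such as “state engineering departments”, which can then be read like “keep engineering…”. The sketch below contrasts naive comma splitting with a list-aware variant that keeps a serial list (“A, B, C, and D”) inside one clause. It is a hypothetical heuristic, not the logic inside compromise's `.clauses()`, and the `MAX_ITEM_WORDS` threshold is an arbitrary assumption.

```javascript
// Hypothetical heuristic for illustration; not compromise's .clauses() implementation.
const MAX_ITEM_WORDS = 4; // assumption: serial-list items are short noun phrases

const wordCount = (s) => s.split(/\s+/).filter(Boolean).length;
const startsWithConjunction = (s) => /^(and|or)\s/i.test(s);

// Treats every comma as a clause boundary.
function naiveClauses(text) {
  return text.split(',').map((s) => s.trim()).filter(Boolean);
}

// Keeps "X, item, item, and item" together when the middle segments are short
// and the run ends in a segment starting with "and"/"or".
function listAwareClauses(text) {
  const parts = naiveClauses(text);
  const out = [];
  let i = 0;
  while (i < parts.length) {
    let j = i + 1;
    // Advance over short segments that could be list items.
    while (j < parts.length && !startsWithConjunction(parts[j]) && wordCount(parts[j]) <= MAX_ITEM_WORDS) {
      j++;
    }
    // A serial list needs at least one middle item plus a final "and/or" item.
    if (j > i + 1 && j < parts.length && startsWithConjunction(parts[j])) {
      out.push(parts.slice(i, j + 1).join(', '));
      i = j + 1;
    } else {
      out.push(parts[i]);
      i += 1;
    }
  }
  return out;
}

const text = 'She led a diverse advisory committee throughout the process that included membership from schools, city planning departments, state engineering departments, transit agencies, and AARP.';
console.log(naiveClauses(text).length); // 5
console.log(listAwareClauses(text).length); // 1
```

Word counts are a crude proxy: a real fix would more likely lean on the tagger (e.g. noun-phrase detection) to decide which comma segments are list items.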

You can poke around at the tagger logs if you're interested, by adding nlp.verbose('tagger'). 'Fraid I don't see a general fix for this sentence; the quickest workaround is to pass a lexicon entry: nlp(txt, {'state engineering':'Noun'}).
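To illustrate why the lexicon workaround helps, here is a toy tagger in which a multi-word user entry is matched before the ambiguous single-word defaults. It is hypothetical and does not reflect compromise's internals; `BASE_LEXICON`, its Verb defaults for “state” and “engineering”, and the `tag` function are all invented for the example.

```javascript
// Toy tagger, hypothetical: not how compromise applies a lexicon internally.
const BASE_LEXICON = { she: 'Pronoun', led: 'Verb', state: 'Verb', engineering: 'Verb', departments: 'Noun' };

// Tags each word, preferring the longest matching multi-word user entry.
function tag(text, userLexicon = {}) {
  const words = text.toLowerCase().replace(/[.,]/g, '').split(/\s+/).filter(Boolean);
  const phrases = Object.keys(userLexicon)
    .map((k) => ({ words: k.toLowerCase().split(/\s+/), tag: userLexicon[k] }))
    .sort((a, b) => b.words.length - a.words.length); // longest entries first
  const out = [];
  let i = 0;
  while (i < words.length) {
    const hit = phrases.find((p) => p.words.every((w, k) => words[i + k] === w));
    if (hit) {
      for (const w of hit.words) out.push([w, hit.tag]);
      i += hit.words.length;
    } else {
      out.push([words[i], BASE_LEXICON[words[i]] || 'Unknown']);
      i += 1;
    }
  }
  return out;
}

const show = (pairs) => pairs.map(([w, t]) => `${w}/${t}`).join(' ');
console.log(show(tag('state engineering departments'))); // state/Verb engineering/Verb departments/Noun
console.log(show(tag('state engineering departments', { 'state engineering': 'Noun' }))); // state/Noun engineering/Noun departments/Noun
```

As Ryan notes below, an override like this only fixes the phrases you list; it does not address the clause-splitting problem itself.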

Overall, I'd really like to improve the logic for parsing the subject-verb-object (SVO) structure of a sentence. You can see the logic right now is really poor.
Let me know if you find anything else
cheers

@ryancasburn-KAI
Contributor Author

Thank you for the suggestion for the quick fix. That obviously fixes just that sentence. I'll keep poking around to see if I can solve it more generally. Thanks for your quick response!

@ryancasburn-KAI
Contributor Author

I'm an engineer, not a linguist, but I do agree that .clauses() is the origin and I think there may be other issues too:

Found this here:

My friend who lives in London looks like Homer Simpson.

This should have two clauses: "My friend looks like Homer Simpson" and "who lives in London"

Compromise only finds a single clause and identifies the sentence's verb as "lives" instead of "looks".
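A minimal sketch of one heuristic for this case, assuming the words are already tagged (the tags below are hand-written labels, not compromise output): after a relative pronoun, the first verb belongs to the relative clause, so the main verb is the next verb after it. The `RELATIVE` set and `mainVerb` function are hypothetical.

```javascript
// Hypothetical heuristic; the tags below are hand-written for this example.
const RELATIVE = new Set(['who', 'which', 'that']);

// Returns the first verb that is not the verb of a relative clause.
function mainVerb(tokens) {
  let inRelative = false;
  for (const { word, tag } of tokens) {
    if (RELATIVE.has(word.toLowerCase())) {
      inRelative = true; // the next verb belongs to the relative clause
      continue;
    }
    if (tag === 'Verb') {
      if (inRelative) {
        inRelative = false; // skip the relative-clause verb and keep looking
        continue;
      }
      return word;
    }
  }
  return null;
}

const tokens = [
  { word: 'My', tag: 'Possessive' },
  { word: 'friend', tag: 'Noun' },
  { word: 'who', tag: 'Pronoun' },
  { word: 'lives', tag: 'Verb' },
  { word: 'in', tag: 'Preposition' },
  { word: 'London', tag: 'Place' },
  { word: 'looks', tag: 'Verb' },
  { word: 'like', tag: 'Preposition' },
  { word: 'Homer', tag: 'Person' },
  { word: 'Simpson', tag: 'Person' },
];

console.log(tokens.find((t) => t.tag === 'Verb').word); // lives (first verb, the reported behaviour)
console.log(mainVerb(tokens)); // looks
```

“that” also introduces non-relative clauses (“She said that…”), so a real implementation would need more context than this single flag.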

I'll keep working through considerations for clause and main-clause identification that handle comma-separated lists and dependent clauses within independent clauses.

@ryancasburn-KAI ryancasburn-KAI changed the title Issue with subject and verb determination in .sentences() with comma separated list of items which include potential verbs Inaccurate Clause determination leads to wrong choice of verb for the sentence. Sep 6, 2023

2 participants