
Invalid lemma for had contraction #5373

Closed
pszpetkowski opened this issue Apr 28, 2020 · 2 comments · Fixed by #5379
Labels
feat / lemmatizer Feature: Rule-based and lookup lemmatization lang / en English language data and models perf / accuracy Performance: accuracy

Comments

@pszpetkowski

I'm not sure whether this issue is in scope for this project, since as far as I know the only way to tell whether the 'd contraction means had or would is from the context of the sentence. Still, spaCy handles contractions as expected most of the time, and it would be nice to be able to rely on it here too.

How to reproduce the behaviour

import spacy
nlp = spacy.load("en_core_web_lg")
doc = nlp("I'd a dream")
print(doc[1].lemma_)  # prints: would

I'd expect this to print have rather than would.
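Since the only signal for had versus would is context, here is a minimal sketch of what a context-based choice could look like. `lemma_for_d` is a hypothetical helper, not a spaCy function; it assumes an upstream tagger has already assigned a Penn Treebank tag to the token after 'd. It is deliberately incomplete: "I'd never seen it", for example, falls through to would.

```python
# Hypothetical heuristic: guess whether a "'d" contraction is "had" or "would"
# from the Penn Treebank tag of the token that follows it.
# Illustrative sketch only; this is not how spaCy's lemmatizer works.


def lemma_for_d(next_word: str, next_tag: str) -> str:
    """Return "have" or "would" as the lemma for "'d"."""
    word = next_word.lower()
    # "I'd better go" means "I had better go".
    if word == "better":
        return "have"
    # A past participle usually signals the perfect: "I'd gone" -> "I had gone".
    if next_tag == "VBN":
        return "have"
    # A bare infinitive usually signals the modal: "I'd go" -> "I would go".
    if next_tag == "VB":
        return "would"
    # A determiner, noun, possessive pronoun, or number suggests main-verb
    # "have", as in the reported sentence "I'd a dream".
    if next_tag in {"DT", "NN", "NNS", "PRP$", "CD"}:
        return "have"
    # Otherwise fall back to "would", which the original rule always returned.
    return "would"


# The word after "'d" and the tag a tagger might assign to it.
examples = [("a", "DT"), ("gone", "VBN"), ("go", "VB"), ("better", "RBR"), ("rather", "RB")]
for word, tag in examples:
    print(word, lemma_for_d(word, tag))
```

The heuristic is only as good as the tag it receives; a tagger that confuses VBN with VBD, for instance, will mislead it.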

Your Environment

  • spaCy version: 2.2.4
  • Platform: Linux-5.6.7-arch1-1-x86_64-with-glibc2.2.5
  • Python version: 3.8.2
svlandeg added the labels feat / lemmatizer, lang / en, and perf / accuracy on Apr 29, 2020
@adrianeboyd
Contributor

Thanks for the report! This is coming from a rule (in the tokenizer exceptions) that assigns the lemma/tag would/MD to the contraction 'd. I think it would make sense to remove would/MD and let the tagger handle it instead. The tagger is still probably going to get this wrong a fair amount of the time (and the tagger will probably do better on 3rd person pronouns than 1st/2nd), but it doesn't make sense for a rule to say it's always would.
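To make the proposed change concrete, below is a simplified model (not spaCy's implementation) of how an exception that fixes lemma/tag for 'd overrides whatever the tagger predicts. The dictionary layout is only loosely modeled on the attribute dicts used in tokenizer exceptions; `LEMMA_BY_TAG` and `annotate` are hypothetical names. The example assumes the tagger predicts VBD (past tense had) for 'd in "I'd a dream".

```python
# Simplified model, not spaCy's actual code: a tokenizer exception can pin a
# token's lemma and tag, overriding the statistical tagger's prediction.
from typing import Dict, Tuple

# Before the proposed change: the exception hard-codes lemma and tag for "'d".
EXCEPTIONS_BEFORE: Dict[str, Dict[str, str]] = {
    "'d": {"LEMMA": "would", "TAG": "MD"},
}
# After: the exception only splits off "'d"; lemma and tag are left unset.
EXCEPTIONS_AFTER: Dict[str, Dict[str, str]] = {
    "'d": {},
}

# Hypothetical lemma lookup keyed by (lowercase form, tag).
LEMMA_BY_TAG: Dict[Tuple[str, str], str] = {
    ("'d", "MD"): "would",
    ("'d", "VBD"): "have",
}


def annotate(token: str, predicted_tag: str,
             exceptions: Dict[str, Dict[str, str]]) -> Tuple[str, str]:
    """Return (lemma, tag), preferring values fixed by an exception."""
    attrs = exceptions.get(token, {})
    # An exception's TAG wins over the tagger's prediction.
    tag = attrs.get("TAG", predicted_tag)
    # An exception's LEMMA wins; otherwise look the lemma up from the tag.
    lemma = attrs.get("LEMMA") or LEMMA_BY_TAG.get((token.lower(), tag), token.lower())
    return lemma, tag


# "I'd a dream": suppose the tagger predicts VBD for "'d".
print(annotate("'d", "VBD", EXCEPTIONS_BEFORE))  # ('would', 'MD')
print(annotate("'d", "VBD", EXCEPTIONS_AFTER))   # ('have', 'VBD')
```

Once the exception stops pinning the values, the lemma follows the tagger, so it will be right exactly when the tag is.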

@github-actions
Contributor

github-actions bot commented Nov 5, 2021

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 5, 2021

3 participants