Warn about nonexistent OMW offsets #2974

Merged
merged 1 commit into from
Apr 12, 2022


Contributor

@ekaf ekaf commented Apr 3, 2022

Fixes #2973: when an offset is not found in WordNet, issue a warning instead of raising a fatal error, so that batch processing can continue. This is especially useful with OMW, which is known to contain some references to nonexistent offsets (cf. omwn/omw-data#24 (comment)).

from nltk.corpus import wordnet as wn

print(wn.synsets('osim', lang='hrv'))

With this PR, a list of synsets is returned, with None in place of each nonexistent offset, alongside the synsets found for any valid offsets:

[Synset('apart.r.01'), None]
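Since the returned list can now contain None, callers that iterate over it may want to filter those entries first. A minimal sketch, assuming a hypothetical helper named `drop_missing` (not part of NLTK) and using plain strings as stand-ins for real Synset objects so that it runs without the WordNet data:

```python
from typing import List, Optional, TypeVar

T = TypeVar("T")


def drop_missing(synsets: List[Optional[T]]) -> List[T]:
    """Return only the entries that are not None, preserving order.

    Hypothetical helper for illustration; it is not provided by NLTK.
    """
    return [s for s in synsets if s is not None]


# Stand-in for the list shown above: a string replaces the real Synset.
result = ["Synset('apart.r.01')", None]
valid = drop_missing(result)
print(valid)
```

With a real installation, the same filter could presumably be applied directly to the value returned by `wn.synsets('osim', lang='hrv')`.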

Meanwhile, a non-fatal warning is written to the standard error stream, with details about the nonexistent synset offset:

nltk/corpus/reader/wordnet.py:1536: UserWarning: No WordNet synset found for pos=a at offset=2002046.
  f"No WordNet synset found for pos={pos} at offset={offset}."
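Because the message goes through the standard `warnings` module, a batch job can collect it or turn it back into an error if strict behaviour is preferred. The sketch below illustrates both options; `lookup` and the `known` table are made-up stand-ins for the patched reader, not NLTK code, so the example runs without any corpus data:

```python
import warnings


def lookup(pos, offset, known):
    """Hypothetical stand-in for the patched lookup: warn and return None on a miss."""
    key = (pos, offset)
    if key not in known:
        # Default category of warnings.warn() is UserWarning, as in the output above.
        warnings.warn(f"No WordNet synset found for pos={pos} at offset={offset}.")
        return None
    return known[key]


# Made-up lookup table for illustration only.
known = {("r", 1): "apart.r.01"}

# Option 1: record warnings so the batch can continue and report them later.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    results = [lookup("r", 1, known), lookup("a", 2002046, known)]
messages = [str(w.message) for w in caught]

# Option 2: escalate the warning to an exception inside a limited scope.
with warnings.catch_warnings():
    warnings.simplefilter("error", UserWarning)
    try:
        lookup("a", 2002046, known)
        strict_failed = False
    except UserWarning:
        strict_failed = True
```

The filters set inside each `catch_warnings()` block are restored on exit, so neither option changes warning handling for the rest of the program.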

Note: it may be worthwhile to go through the 12 uses of the WordNetError class in wordnet.py, to see whether some of the other raised errors could also be replaced by a non-fatal call to warnings.warn().

@stevenbird stevenbird merged commit da22bb0 into nltk:develop Apr 12, 2022
@stevenbird
Member

Thanks @ekaf. Please feel free to raise an issue concerning the other uses of WordNetError.

@ekaf ekaf mentioned this pull request Apr 14, 2022
@ekaf ekaf deleted the warn_omw branch April 16, 2022 05:39
Development

Successfully merging this pull request may close these issues.

Nonexistent OMW offsets
2 participants