Support OMW 1.4 #2899
Example use, adapted from #2423 (comment)
Wordnet v. 3.1
Synset('dog.n.01') Synset('spy.n.01') |
Wordnet v. 3.0
als: 5988 words in 4675 synsets |
After the latest commit, pytest succeeds on Windows but fails on macOS and Ubuntu.
|
@ekaf |
Annoyingly, this does not seem to be working: the broken cache of nltk_data is still being used. I can't really tell why. This is getting a bit frustrating. |
@tomaarsen, there seem to be workarounds at r-lib/actions#86 |
@tomaarsen, here is a commit that looks like it worked: geocompx/geocompr@9189efb |
@tomaarsen: prefixing the key with "new-" in .github/workflows/ci.yaml actually cleared the cache. The same change also needs to be made in a second place in the file, so that the new cache is used instead of the old one. One possible explanation for why changing only the secret didn't work is that this variable is interpreted as void (I'm just guessing...). |
@tomaarsen, the changes to .github/workflows/ci.yaml don't belong in this PR, since they solve a completely different problem. So maybe that part should be split out into another PR about clearing cached dependencies. However, since I don't control the ${{ secrets.CACHE_VERSION }} variable, I feel that you would be better equipped to handle this. On the other hand, the update to wordnet.py is urgently needed to fix the new issue #2905 (comment), which arises because the new OMW package was merged into nltk_data without the present PR also being merged. Please let me know if there is anything I can do about this. |
Even if this fails because of some cache issue, we can just merge as I know this passes locally
I've scheduled some time this morning to look at this PR. I've reverted to using the "normal" key, and it seems like the cache has been refreshed by now. I've also created a helper method for Synset.definition() and Synset.examples(), as the code for these was near identical.
Beyond that, I had to update some doctests which were failing due to the nltk_data changes.
If these tests are failing for you locally, then either:
- Your omw nltk_data is outdated, or
- You've updated your omw nltk_data, but the old files were not removed. Deleting the omw folder within nltk_data and re-downloading will solve this. Alternatively, you can delete the entire nltk_data and redownload it all.
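The cleanup described above could be scripted roughly as follows. This is only a sketch: the helper name `remove_stale_omw` is hypothetical, and it assumes that the stale data sits in `corpora/omw` under the nltk_data root, possibly next to a downloaded `omw.zip` archive (the archive is an assumption, not something stated in this thread).

```python
import shutil
from pathlib import Path


def remove_stale_omw(nltk_data_dir):
    """Delete stale omw data under <nltk_data_dir>/corpora, if present.

    Hypothetical helper; returns the list of paths that were removed.
    """
    corpora = Path(nltk_data_dir) / "corpora"
    removed = []
    # Assumption: the old files live in corpora/omw, and a leftover
    # corpora/omw.zip archive may sit next to that folder.
    for name in ("omw", "omw.zip"):
        target = corpora / name
        if target.is_dir():
            shutil.rmtree(target)
            removed.append(target)
        elif target.is_file():
            target.unlink()
            removed.append(target)
    return removed
```

After running it against your own nltk_data directory, re-download the omw data with the NLTK downloader as usual.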
This PR is ready for merging as far as I can tell.
The problem is that nltk_data cannot be pinned to an older version. People can't say "Oh, my NLTK is locked to 3.2.5, so I'll use the nltk_data that works with that version". Because of these changes, no NLTK version works as expected, with the exception of this PR.
It is a priority that we merge this PR, and publish a new version.
In part due to this PR and its consequences, I believe it's time to release 3.7.0 rather than 3.6.6. After all, the nltk_data changes essentially deprecate all currently released NLTK versions, I'm afraid.
@tomaarsen, I'm sorry for all the trouble you have with this PR.
|
@ekaf This is likely a consequence of having an outdated |
@tomaarsen yes, you are right, with the new inaugural package all tests now succeed. |
Glad to hear! I'll merge this, so people with issues like #2905 at least have a solution that isn't just using this PR. Thanks for these changes, and thanks for bearing with me while we've been having these cache issues. |
Definitions and examples also work with Albanian ('als'):
Wordnet v. 2021
als lemmas:[Lemma('school.n.02.mësonjëtore'), Lemma('school.n.02.shkollë')]
als definition:['institucion arsimor ku mëson dhe edukohet në mënyrë të organizuar brezi i ri; një institucion i tillë i specializuar; ndërtesa e këtij institucioni']
als examples:['Shkolla është ndërtuar më 1932', 'Ai shkon në shkoll çdo ditë'] |
This PR adapts the multilingual functions in wordnet.py to use the new OMW-data 1.4 (nltk/nltk_data#171), the recent release of the Open Multilingual Wordnet.
The directory structure of the new nltk_data/corpora/omw package has a slightly different layout, where each folder name indicates the provenance of any number of wordnets included in the corresponding folder.
For English and Italian, OMW now includes wordnets from two different provenances, so the lang parameter will eventually need to encode the provenance in cases where more than one wordnet exists for the same language.
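One way to picture how a lang id could encode the provenance is the sketch below. It is not necessarily what wordnet.py does: the function name `index_provenances`, the `provenance/wn-data-<lang>.tab` file naming, and the `<lang>_<provenance>` suffix scheme are all assumptions made for illustration, and the folder names `provA` and `provB` are made up.

```python
import posixpath


def index_provenances(fileids):
    """Map each language id to the provenance folder its wordnet came from.

    Hypothetical sketch: fileids are assumed to look like
    "<provenance>/wn-data-<lang>.tab".
    """
    provenances = {}
    for fileid in fileids:
        prov, filename = posixpath.split(fileid)
        stem, ext = posixpath.splitext(filename)
        if ext != ".tab":
            continue  # skip READMEs, licenses and other non-data files
        lang = stem.split("-")[-1]  # "wn-data-ita" -> "ita"
        if lang in provenances:
            # Another wordnet is already registered for this language,
            # so qualify the id with the provenance folder name.
            lang = f"{lang}_{prov}"
        provenances[lang] = prov
    return provenances


files = ["provA/wn-data-ita.tab", "provB/wn-data-ita.tab", "als/wn-data-als.tab"]
print(index_provenances(files))
# {'ita': 'provA', 'ita_provB': 'provB', 'als': 'als'}
```

Note that in this sketch the first wordnet seen for a language keeps the bare id, so the result depends on the order in which the files are listed.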
In addition to lemmas, some wordnets in OMW 1.4 now include definitions (def) and examples (exe).
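To make the three record types concrete, here is a minimal parsing sketch. The line layout is an assumption: tab-separated rows of the form `<synset-key> <lang>:<type> [<index>] <text>`, where lemma rows have three fields and def/exe rows carry an extra index column. The function name `parse_omw_tab` and the synset key `00000000-n` are placeholders, not taken from the real data.

```python
from collections import defaultdict


def parse_omw_tab(lines):
    """Collect lemmas, definitions and examples per synset key.

    Hypothetical sketch of a reader for one wordnet file; the assumed
    layout is described in the text above.
    """
    data = {"lemma": defaultdict(list), "def": defaultdict(list), "exe": defaultdict(list)}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # not a data row
        key = fields[0]
        kind = fields[1].split(":")[-1]  # "als:def" -> "def"
        if kind in data:
            # The text is always the last field, with or without an index column.
            data[kind][key].append(fields[-1])
    return {kind: dict(entries) for kind, entries in data.items()}


sample = [
    "# placeholder header",
    "00000000-n\tals:lemma\tshkollë",
    "00000000-n\tals:exe\t0\tShkolla është ndërtuar më 1932",
]
parsed = parse_omw_tab(sample)
print(parsed["lemma"])  # {'00000000-n': ['shkollë']}
```

Because the text is read from the last field, a file that contains only three-field lemma rows parses the same way as one that also has def and exe rows.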
This PR supports both the new and the old omw formats.