Using the latest Wordnet 2021 version #2885
Comments
Since globalwordnet/english-wordnet#777 (comment) was recently solved, the only problem remaining is the sense ordering. So the difference between the candidate packages is not so big now, and the safest choice might be to use the official package at https://john.mccr.ae/oewn2021/english-wordnet-2021.zip
OEWN 2021 is now supported through #2860
@ekaf So are there remaining problems with OEWN 2021 support?
The present issue is closed because the problem is not in NLTK support, but in OEWN 2021 itself, as explained in the issues cited above (globalwordnet/english-wordnet#773 (comment), globalwordnet/english-wordnet#774 (comment)), which are still open.
Open English Wordnet 2021 (https://github.com/globalwordnet/english-wordnet) was recently released in a format compatible with NLTK:
https://john.mccr.ae/oewn2021/english-wordnet-2021.zip
It can work out-of-the-box with NLTK's wordnet.py module, by just replacing nltk_data/corpora/wordnet. However, a few thousand problems have already been reported in three issues (globalwordnet/english-wordnet#773 (comment), globalwordnet/english-wordnet#774 (comment) and globalwordnet/english-wordnet#777 (comment)). In particular, the examples are not quoted, so no parser would be able to separate them from the definitions.
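
To illustrate the point about examples, here is a minimal sketch of how a gloss can be split, assuming the usual WNDB convention that examples are double-quoted inside the gloss. The function name `split_gloss` and both sample glosses are made up for illustration; this is not NLTK's actual code, only the same general idea.

```python
import re

def split_gloss(gloss):
    """Split a gloss into (definition, examples).

    Assumes examples are the double-quoted parts of the gloss;
    everything else is treated as the definition.
    """
    examples = re.findall(r'"([^"]*)"', gloss)
    # Remove the quoted parts, then trim leftover separators.
    definition = re.sub(r'"[^"]*"', "", gloss).strip("; ")
    return definition, examples

# Hypothetical glosses, not taken from any wordnet release.
quoted = 'a domesticated animal; "the dog barked"; "she walked her dog"'
unquoted = 'a domesticated animal; the dog barked; she walked her dog'

print(split_gloss(quoted))    # definition and two examples
print(split_gloss(unquoted))  # everything ends up in the definition
```

With unquoted examples there is nothing left to distinguish an example from a further clause of the definition, so the whole gloss is returned as the definition and the example list stays empty.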
An alternative package with fewer problems is available from the X-englishwordnet project (https://github.com/x-englishwordnet/wndb):
https://x-englishwordnet.github.io/wndb/xewn_compat.zip
This package has quoted examples, and only half the sense number instability, compared with the official package. So the dilemma is which package to use: the official or the alternative?
Open English Wordnet has announced that support for this legacy database format will be discontinued sometime in the future, and recommends using the XML format instead. However, the XML format appears to have the same sense stability problems as the WNDB format used by NLTK.