How to use it with NLTK? #771

Closed
mirfan899 opened this issue Nov 2, 2021 · 5 comments
Labels
enhancement New feature or request

Comments

@mirfan899

mirfan899 commented Nov 2, 2021

Is your feature request related to a problem? Please describe.
I'm trying to use english-wordnet-2021.zip with NLTK. I copied the files into the wordnet directory of nltk_data.

Some synsets work, but when I iterate over all the synsets it throws an error that I am unable to fix.

Here is what it looks like.

from nltk.corpus import wordnet
from nltk.corpus.reader.wordnet import WordNetError

output = []
for synset in wordnet.all_synsets():
    try:
        # Resolve the synset by name and list its lemmas.
        lemmas = wordnet.synset(synset.name()).lemmas()
        print(lemmas)
        for lemma in lemmas:
            # Use pos() rather than splitting the name, which breaks on names containing ".".
            pos = lemma.synset().pos()
            key = lemma.key()
            print(key)
            # ss2of returns "<offset>-<pos>"; keep the offset part and append the POS tag.
            ss_id = str(wordnet.ss2of(lemma.synset()))
            offset = ss_id[: ss_id.index("-")] + pos
            print(offset)
            output.append(key + " wn:" + offset)
    except WordNetError as err:
        print(synset, "name not found:", err)

For example, the word "agricultural" throws an error.

from nltk.corpus import wordnet
print(wordnet.synsets("agricultural"))
Traceback (most recent call last):
  File "/home/irfan/environments/ewiser/lib/python3.6/site-packages/nltk/corpus/reader/wordnet.py", line 1440, in _synset_from_pos_and_line
    columns_str, gloss = data_file_line.strip().split("|")
ValueError: too many values to unpack (expected 2)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "<input>", line 2, in <module>
  File "/home/irfan/environments/ewiser/lib/python3.6/site-packages/nltk/corpus/reader/wordnet.py", line 1611, in synsets
    for p in pos
  File "/home/irfan/environments/ewiser/lib/python3.6/site-packages/nltk/corpus/reader/wordnet.py", line 1613, in <listcomp>
    for offset in index[form].get(p, [])
  File "/home/irfan/environments/ewiser/lib/python3.6/site-packages/nltk/corpus/reader/wordnet.py", line 1414, in synset_from_pos_and_offset
    synset = self._synset_from_pos_and_line(pos, data_file_line)
  File "/home/irfan/environments/ewiser/lib/python3.6/site-packages/nltk/corpus/reader/wordnet.py", line 1530, in _synset_from_pos_and_line
    head_lemma = synset.similar_tos()[0]._lemmas[0]
  File "/home/irfan/environments/ewiser/lib/python3.6/site-packages/nltk/corpus/reader/wordnet.py", line 198, in similar_tos
    return self._related("&")
  File "/home/irfan/environments/ewiser/lib/python3.6/site-packages/nltk/corpus/reader/wordnet.py", line 1079, in _related
    r = [get_synset(pos, offset) for pos, offset in pointer_tuples]
  File "/home/irfan/environments/ewiser/lib/python3.6/site-packages/nltk/corpus/reader/wordnet.py", line 1079, in <listcomp>
    r = [get_synset(pos, offset) for pos, offset in pointer_tuples]
  File "/home/irfan/environments/ewiser/lib/python3.6/site-packages/nltk/corpus/reader/wordnet.py", line 1414, in synset_from_pos_and_offset
    synset = self._synset_from_pos_and_line(pos, data_file_line)
  File "/home/irfan/environments/ewiser/lib/python3.6/site-packages/nltk/corpus/reader/wordnet.py", line 1524, in _synset_from_pos_and_line
    raise WordNetError("line %r: %s" % (data_file_line, e)) from e
nltk.corpus.reader.wordnet.WordNetError: line '02020442 00 a 01 rural 0 011 ! 02022522 a 0101 + 04961506 n 0102 & 02020981 a 0000 & 02021158 a 0000 & 02021320 a 0000 & 02021613 a 0000 & 02021727 a 0000 & 02021895 a 0000 & 02022057 a 0000 & 02022225 a 0000 & 02022388 a 0000 | of or relating to the countryside as opposed to the city; living in or characteristic of farming or country life| living in or characteristic of farming or country life; rural people; large rural households; unpaved rural roads; an economy that is basically rural; rural electrification; rural free delivery  \n': too many values to unpack (expected 2)
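
The first traceback comes from NLTK unpacking data_file_line.strip().split("|") into exactly two values, so any data line containing more than one "|" fails this way. A minimal sketch for locating such lines before loading the corpus follows. The function name lines_with_extra_pipes is hypothetical, and the sketch assumes that header lines in the data file start with whitespace.

```python
from typing import Iterable, Iterator, Tuple


def lines_with_extra_pipes(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, offset_field) for data lines with more than one '|'."""
    for number, line in enumerate(lines, start=1):
        # Assumption: license/header lines begin with whitespace; skip them.
        if line[:1].isspace():
            continue
        # NLTK expects exactly one '|' separating the columns from the gloss.
        if line.count("|") > 1:
            yield number, line[:8]
```

For example, passing an open handle on data.adj (opened with encoding="utf-8" from inside the wordnet directory) would list every affected line; that path is an assumption about where the files were copied.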

After removing the extra | at line 11279 of data.adj, it throws a different error.

from nltk.corpus import wordnet
print(wordnet.synsets("agricultural"))
Traceback (most recent call last):
  File "<input>", line 2, in <module>
  File "/home/irfan/environments/ewiser/lib/python3.6/site-packages/nltk/corpus/reader/wordnet.py", line 1611, in synsets
    for p in pos
  File "/home/irfan/environments/ewiser/lib/python3.6/site-packages/nltk/corpus/reader/wordnet.py", line 1613, in <listcomp>
    for offset in index[form].get(p, [])
  File "/home/irfan/environments/ewiser/lib/python3.6/site-packages/nltk/corpus/reader/wordnet.py", line 1419, in synset_from_pos_and_offset
    raise WordNetError("No WordNet synset found for pos={0} at offset={1}.".format(pos,offset))
nltk.corpus.reader.wordnet.WordNetError: No WordNet synset found for pos=a at offset=2020981.
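
One possible cause of this second error, offered as an assumption rather than a diagnosis: in the WordNet database format, the 8-digit number at the start of each data line is that line's byte offset in the file, and pointers refer to synsets by that offset. Deleting a character by hand shifts every later line, so a lookup such as offset 2020981 may no longer land on the start of a line. The sketch below reports lines whose declared offset differs from their actual byte position; the helper name misaligned_offsets is hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def misaligned_offsets(raw_lines: Iterable[bytes]) -> Iterator[Tuple[int, int, int]]:
    """Yield (line_number, declared_offset, actual_offset) where the two differ."""
    position = 0  # byte position of the current line within the file
    for number, raw in enumerate(raw_lines, start=1):
        field = raw[:8]
        # Header lines do not start with 8 digits, so they are only counted, not checked.
        if len(field) == 8 and field.isdigit() and int(field) != position:
            yield number, int(field), position
        position += len(raw)
```

The file should be opened in binary mode ("rb") so that line lengths are counted in bytes, for example by iterating over an open handle on data.adj.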

Describe the solution you'd like
Is there a way to use it with NLTK properly?


mirfan899 added the enhancement label Nov 2, 2021
@jmccrae
Member

jmccrae commented Nov 2, 2021

My understanding is that support is under development from the NLTK team. I think @ekaf can comment in more detail.

@ekaf
Contributor

ekaf commented Nov 2, 2021

It is still too early to use the EWN 2021 release candidate with NLTK, because issues are still being fixed. The situation is already much improved in the very latest release, though. There are more details under issue #747.

@mirfan899
Author

Okay, thanks. Will check later.

@arademaker
Member

Maybe you can consider using the new wn Python package?

https://github.com/goodmami/wn
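
As a rough sketch of what that could look like, the function below looks up a word in Open English WordNet 2021 through the wn package. The function name oewn_synsets is hypothetical, and the sketch assumes that wn is installed and that the lexicon has already been fetched once, for example with wn.download("oewn:2021").

```python
from typing import List, Tuple


def oewn_synsets(word: str, lexicon: str = "oewn:2021") -> List[Tuple[str, str]]:
    """Return (synset id, definition) pairs for `word` from the given lexicon."""
    # Imported lazily so this module still loads where `wn` is not installed.
    import wn

    # Assumption: the lexicon was downloaded beforehand, e.g. wn.download("oewn:2021").
    wordnet = wn.Wordnet(lexicon)
    return [(synset.id, synset.definition()) for synset in wordnet.synsets(word)]
```

Calling oewn_synsets("agricultural") would then return the matching synsets without going through the NLTK data-file reader at all.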

@ekaf
Contributor

ekaf commented Dec 7, 2021

The 'develop' branch of NLTK now supports OEWN 2021 through nltk/nltk#2860
