Synset offset problems in data.verb #776

Closed
ekaf opened this issue Nov 10, 2021 · 2 comments
Labels
release format This issue refers to the WNDB or RDF export, so no changes will be made to this repository

Comments

ekaf (Contributor) commented Nov 10, 2021

Release format
WNDB

Describe the bug
Synset offset problems in data.verb in the new OEWN 2021 release

To Reproduce

grep 2766066 {data.verb,index.sense,index.verb}

index.sense:stress%2:29:01:: 02766066 4 0
index.verb:stress v 4 3 + @ ~ 4 0 01007737 00977858 01791013 02766066  
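The same check can be done at the byte level. This is a minimal sketch, assuming the WNDB convention that a synset_offset is the byte position of that synset's line in the data file, and that the line begins with the offset itself. The helper name `offset_in_data_file` is hypothetical.

```python
def offset_in_data_file(data_path: str, offset: str) -> bool:
    """Return True if `offset` (an 8-digit string such as "02766066") starts a
    synset line in the WNDB data file at `data_path`.

    Assumes the WNDB layout: a synset_offset is the byte position of the
    synset's line in the data file, and that line begins with the offset.
    """
    with open(data_path, "rb") as f:
        f.seek(int(offset))  # jump straight to the claimed byte position
        line = f.readline()  # empty bytes if the offset is past end of file
    return line.startswith(offset.encode("ascii") + b" ")


# Hypothetical usage against a local copy of the release:
# offset_in_data_file("data.verb", "02766066")
```

For the release described here, that call should return False, consistent with grep finding no match in data.verb.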

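Expected behavior

Offset 02766066, which index.sense assigns to stress%2:29:01:: and index.verb lists for the verb "stress", should exist in data.verb, but grep finds no match there.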

Additional context

The test by @mirfan899 (#771), which recently succeeded, now fails:

  File "/home/E/Prog/Script/py/Nltk/Branches/Develop/nltk/corpus/reader/wordnet.py", line 1554, in _synset_from_pos_and_line
    sense_index = offsets.index(synset._offset)
ValueError: 2764280 is not in list

This is the converse problem: offset 02764280 is present in data.verb but missing from the index files.
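The failing line appears to look up the synset's offset in the list of offsets loaded from the index file for that lemma, and `list.index` raises `ValueError` when the offset is not listed. The sketch below reproduces that lookup on the index.verb line from the grep output. It parses the line according to the WNDB index file layout; the function names are hypothetical and this is not NLTK's code.

```python
from typing import List, Optional, Tuple


def parse_index_line(line: str) -> Tuple[str, str, List[int]]:
    """Split a WNDB index.<pos> line into (lemma, pos, synset offsets).

    Field layout: lemma pos synset_cnt p_cnt [ptr_symbol...]
    sense_cnt tagsense_cnt synset_offset [synset_offset...]
    """
    fields = line.split()
    lemma, pos = fields[0], fields[1]
    synset_cnt = int(fields[2])
    p_cnt = int(fields[3])
    # Skip the p_cnt pointer symbols, then sense_cnt and tagsense_cnt.
    first = 4 + p_cnt + 2
    offsets = [int(x) for x in fields[first:first + synset_cnt]]
    return lemma, pos, offsets


def sense_number(index_line: str, synset_offset: int) -> Optional[int]:
    """1-based position of synset_offset in the lemma's offset list, or None."""
    _, _, offsets = parse_index_line(index_line)
    try:
        return offsets.index(synset_offset) + 1
    except ValueError:  # list.index raises this when the offset is not listed
        return None


line = "stress v 4 3 + @ ~ 4 0 01007737 00977858 01791013 02766066"
print(sense_number(line, 2766066))  # → 4 (listed in index.verb)
print(sense_number(line, 2764280))  # → None (present only in data.verb)
```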

ekaf added the release format label Nov 10, 2021
ekaf (Contributor, Author) commented Nov 10, 2021

Joining data.verb with the verb offsets in index.sense gives the following mismatches:

  70 Missing index
  69 Missing data
 139 total

Maybe the fact that the two counts are nearly balanced (70 vs. 69) could explain something?
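The join itself is not shown, so the sketch below is only one way to compute comparable counts. It reads "Missing index" as data.verb offsets that no verb sense key in index.sense points to, and "Missing data" as the reverse; that interpretation is an assumption. It counts distinct offsets, so the results could differ from the figures above if those were counted per line. Verb sense keys are recognized by ss_type 2 after the `%`, as in stress%2:29:01::. All function names are hypothetical.

```python
from typing import Iterable, Set, Tuple


def data_offsets(data_lines: Iterable[str]) -> Set[str]:
    """Synset offsets in a data.<pos> file (license header lines start with spaces)."""
    return {line.split()[0] for line in data_lines
            if line.strip() and not line.startswith(" ")}


def verb_offsets_in_index_sense(sense_lines: Iterable[str]) -> Set[str]:
    """Offsets of index.sense entries whose sense key has ss_type 2 (verb)."""
    offsets = set()
    for line in sense_lines:
        fields = line.split()
        # Sense key layout: lemma%ss_type:lex_filenum:lex_id:head_word:head_id
        if len(fields) >= 2 and fields[0].partition("%")[2].startswith("2:"):
            offsets.add(fields[1])
    return offsets


def join_report(data_lines: Iterable[str],
                sense_lines: Iterable[str]) -> Tuple[int, int]:
    """Return (missing index, missing data) counts of distinct offsets."""
    data = data_offsets(data_lines)
    index = verb_offsets_in_index_sense(sense_lines)
    missing_index = data - index  # synsets in data.verb with no verb sense key
    missing_data = index - data   # verb sense keys pointing at no synset line
    return len(missing_index), len(missing_data)


# Hypothetical usage against a local copy of the release:
# with open("data.verb") as d, open("index.sense") as s:
#     print(join_report(d, s))
```

In a consistent release both counts should be zero, since every verb synset should carry at least one sense key and every verb sense key should point at an existing synset line.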

jmccrae (Member) commented Nov 10, 2021

Fixed (the cause was something stupid I did).

jmccrae closed this as completed Nov 10, 2021