Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewind Wordnet data files? #2867

Closed
ekaf opened this issue Oct 27, 2021 · 1 comment · Fixed by #2868
Closed

Rewind Wordnet data files? #2867

ekaf opened this issue Oct 27, 2021 · 1 comment · Fixed by #2868
Labels

Comments

@ekaf
Copy link
Contributor

ekaf commented Oct 27, 2021

Before the recent merge of #2860, wordnet.get_version() would fail whenever a read had already occurred from the data.adj file:

import nltk
from nltk.corpus import wordnet as wn
print(wn.get_version())

3.0

print(wn.synset_from_pos_and_offset('a',1740))

Synset('able.a.01')

print(wn.get_version())

None

The cause of the problem is that synset_from_pos_and_offset goes to a given offset in a data file, reads a line, and stays there. This function is called all the time by other functions in the wordnet module. For ex., in wordnet.doctest, synset_from_pos_and_offset is called indirectly over 10,000 times.

But now that get_version ensures that reading the data.adj starts at byte offset 0, it probably doesn't matter anywhere else, because there may not be another function that needs to read from the beginning of the data file.

It might be safer to add data_file.seek(0) at the end of synset_from_pos_and_offset, but I am not sure that it is worth the additional processing. On the other hand, seek(0) is so cheap an operation that the time penalty would hardly be noticeable.

@tomaarsen
Copy link
Member

I wouldn't worry much about additional processing costs of adding seek(0). I'm more concerned about correct program execution here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants