Duplicates in wordnet hypernyms closure #3244

Open
ekaf opened this issue Mar 25, 2024 · 0 comments · May be fixed by #3245

The relation closure for WordNet synsets is supposed to prevent duplicates in its output. However, the duplicate check fails to detect some repetitions, which occur when there are multiple paths to a given synset. In the following output, for example, the branch going from 'taxonomic_group.n.01' to 'entity.n.01' appears twice, because it is reachable by two different paths:

from nltk.corpus import wordnet as wn
ss = wn.synset("calamagrostis.n.01")
print(list(ss.closure(lambda s: s.hypernyms())))

[Synset('gramineae.n.01'), Synset('monocot_genus.n.01'), Synset('monocot_family.n.01'), Synset('genus.n.02'), Synset('family.n.06'), Synset('taxonomic_group.n.01'), Synset('taxonomic_group.n.01'), Synset('biological_group.n.01'), Synset('biological_group.n.01'), Synset('group.n.01'), Synset('group.n.01'), Synset('abstraction.n.06'), Synset('abstraction.n.06'), Synset('entity.n.01'), Synset('entity.n.01')]
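
For comparison, here is a minimal sketch of a breadth-first closure that marks each node as seen at the moment it is first discovered, so a node reachable by several paths is yielded only once. It is not NLTK's implementation, nor necessarily the approach taken in the linked pull request; the function name closure_unique and the toy graph are hypothetical and used only for illustration.

```python
from collections import deque


def closure_unique(start, rel):
    """Yield every node reachable from `start` via `rel` exactly once, breadth-first.

    `rel` is any callable that maps a node to an iterable of related nodes.
    Nodes must be hashable.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in rel(node):
            # Mark the node when it is discovered, not when it is expanded,
            # so a second path to the same node cannot enqueue it again.
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
                yield nxt


# Hypothetical diamond-shaped graph: "top" is reachable via "left" and "right".
toy_graph = {
    "start": ["left", "right"],
    "left": ["top"],
    "right": ["top"],
    "top": ["root"],
    "root": [],
}

result = list(closure_unique("start", lambda n: toy_graph[n]))
print(result)  # ['left', 'right', 'top', 'root']
```

Assuming synsets are hashable, the same function could presumably be called as closure_unique(ss, lambda s: s.hypernyms()). As a stopgap on the existing output, list(dict.fromkeys(ss.closure(lambda s: s.hypernyms()))) should drop the repeats while keeping first-seen order.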

Produce an SVG image to illustrate this graph:

from nltk.parse.dependencygraph import dot2img
print(dot2img(wn.digraph([ss])))

[Image: calamagrostis-out-closureyield]

ekaf linked a pull request on Mar 25, 2024 that will close this issue