Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add reference to entropy implementation used #3229

Merged
merged 9 commits into from Jan 29, 2024
Merged

Conversation

mbauwens
Copy link
Contributor

This is in reference to issue #2961.

This is just a simple addition to the docs, making it more clear that the entropy implementation in NLTK is the one based on the Shannon-McMillan-Breiman theorem, as used and referenced by Jurafsky (Chapter 3 page 23 - 3.49) and Jordan Boyd-Graber (lm.txt file in this comment).

This PR does not include the alternative implementation of the entropy function that was proposed in the issue.

Making it more clear that the entropy implementation in NLTK is the one based on the Shannon-McMillan-Breiman theorem, as used and referenced by Jurafsky and Jordan Boyd-Graber.
Copy link
Contributor

@iliakur iliakur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nits, otherwise lgtm, thanks!

nltk/lm/api.py Outdated Show resolved Hide resolved
nltk/lm/api.py Show resolved Hide resolved
mbauwens and others added 2 commits January 26, 2024 17:34
Co-authored-by: Ilia Kurenkov <ilia.kurenkov@gmail.com>
Co-authored-by: Ilia Kurenkov <ilia.kurenkov@gmail.com>
Copy link
Contributor Author

@mbauwens mbauwens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor changes in formatting of text. Look's good!

@mbauwens mbauwens requested a review from iliakur January 26, 2024 16:37
add entropy_extended with length normalisation and relative frequency weighting. Also adapt perplexity_extended to work with this new entropy measure.
nltk/lm/api.py Outdated Show resolved Hide resolved
@mbauwens mbauwens marked this pull request as draft January 26, 2024 18:58
@mbauwens mbauwens marked this pull request as ready for review January 26, 2024 19:11
@mbauwens mbauwens requested a review from iliakur January 26, 2024 19:11
@mbauwens
Copy link
Contributor Author

@iliakur yeah, so a bit messy here, but I've reverted the PR to just the update to the docs. I now have a separate branch on my fork that I'll be getting ready for a PR for an alternative entropy/perplexity method, including tests :)

@iliakur
Copy link
Contributor

iliakur commented Jan 27, 2024

@stevenbird @tomaarsen this PR is ready to merge. I've been away from nltk development long enough that I figured I'd check in with you two about any formalities that may be missing.

@tomaarsen
Copy link
Member

I'm afraid I've also not had much time for NLTK recently. That said, I think this is good to go. Thanks for taking care of this, and nice to see you here again @mbauwens.

  • Tom Aarsen

@tomaarsen tomaarsen linked an issue Jan 29, 2024 that may be closed by this pull request
@tomaarsen tomaarsen merged commit f2a92bd into nltk:develop Jan 29, 2024
20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

nltk.lm.api entropy formula source?
3 participants