Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: add a stats.differential_entropy function #13631

Merged
merged 15 commits into from
Apr 1, 2021

Conversation

vnmabus
Copy link
Contributor

@vnmabus vnmabus commented Mar 1, 2021

Adds a function to compute differential entropy given a sample from a continuous distribution.

Reference issue

Closes #4080.

What does this implement/fix?

Adds a differential_entropy function, which computes differential entropy using the Vasicek estimator.

@tylerjereddy tylerjereddy added enhancement A new feature or improvement scipy.stats labels Mar 2, 2021
Copy link
Contributor

@tylerjereddy tylerjereddy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

100 % patch diff coverage of the new code and the paper is really well cited as Matt recently noted.

I added a few comments before the stats regulars have a deeper look.

scipy/stats/_distn_infrastructure.py Outdated Show resolved Hide resolved
scipy/stats/_distn_infrastructure.py Outdated Show resolved Hide resolved
scipy/stats/tests/test_distributions.py Outdated Show resolved Hide resolved
scipy/stats/tests/test_distributions.py Outdated Show resolved Hide resolved
scipy/stats/_distn_infrastructure.py Show resolved Hide resolved
Copy link
Member

@ilayn ilayn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor changes in the text

scipy/stats/_distn_infrastructure.py Outdated Show resolved Hide resolved
scipy/stats/_distn_infrastructure.py Outdated Show resolved Hide resolved
scipy/stats/_distn_infrastructure.py Outdated Show resolved Hide resolved
Copy link
Contributor

@mdhaber mdhaber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great. Good documentation, good input validation, good tests, good PEP8. "No" might be an acceptable answer to all these comments. (Haven't checked the code, yet, but it is short so there probably won't be much to say.)

scipy/stats/tests/test_distributions.py Outdated Show resolved Hide resolved
scipy/stats/_distn_infrastructure.py Show resolved Hide resolved
Copy link
Contributor

@mdhaber mdhaber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirming that this is in very good shape and could merge as-is. But before I do, I'd appreciate another maintainer's thoughts on which file this should be in (here) and whether it is important to add a heuristic for the window size (here).

Copy link
Member

@ilayn ilayn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for the general parts this looks good. I'll leave it to stats residents about your points Matt.

@vnmabus
Copy link
Contributor Author

vnmabus commented Mar 9, 2021

There is a segmentation fault in a MacOS test, but I think it has nothing to do with these changes.

@mdhaber mdhaber closed this Mar 20, 2021
@mdhaber mdhaber reopened this Mar 20, 2021
@mdhaber
Copy link
Contributor

mdhaber commented Mar 20, 2021

Re-running tests. I think this just needs opinions from a second maintainer who is familiar with stats. Depending on their thoughts, it may be ready to merge as-is.
Update: email sent to developer mailing list.

Copy link

@tupuinui tupuinui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just a few remarks on tests on my side. Also, if you know you will add more methods, I would still suggest to have the parameter method now in case the following PR do not make it for the release. Just my opinion for discussion here.

Copy link
Member

@tupui tupui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about that, I used the wrong account.

scipy/stats/tests/test_distributions.py Outdated Show resolved Hide resolved
scipy/stats/_distn_infrastructure.py Outdated Show resolved Hide resolved
scipy/stats/tests/test_distributions.py Outdated Show resolved Hide resolved
- Testing of scalar results now also uses `assert_allclose`.
- We use the context manager form of `assert_raises`.
scipy/stats/_distn_infrastructure.py Show resolved Hide resolved
scipy/stats/_distn_infrastructure.py Outdated Show resolved Hide resolved
Copy link
Member

@tupui tupui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving as @mdhaber proposes to do a follow-up PR to move the functions to morestats.py. The small doctest remark I had could be addressed there (if you agree OC).

Copy link
Contributor

@mdhaber mdhaber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's see if there's any more feedback from the mailing list. If not, I'll go ahead and merge this weekend, then submit a followup to move this into morestats.py if it still seems like a good idea.

@tirthasheshpatel
Copy link
Member

tirthasheshpatel commented Mar 29, 2021

@vnmabus Can you also merge master, please? This will help get rid of some unrelated documentation build errors. Once you do that, you can add autodoc_typehints='none' to doc/source/conf.py to get rid of the other warnings. Please see this diff for reference (ignore the .github/workflows change.).

@mdhaber
Copy link
Contributor

mdhaber commented Mar 29, 2021

I merged master.

@tirthasheshpatel
Copy link
Member

tirthasheshpatel commented Mar 30, 2021

Thanks for the changes @vnmabus! The doc builds pass now and the annotations also look correct. So, +1 for merging! @rgommers You were complaining about warnings with autodoc_typehints='none' in #13631 (comment) and gh-11749 but the builds pass. So, do you agree that this is safe to merge?

@tupui
Copy link
Member

tupui commented Mar 30, 2021

Indeed the logs are clean. It's great if we can finally do inline type hints! I wonder why it works now.

@mdhaber
Copy link
Contributor

mdhaber commented Apr 1, 2021

OK I'm going to go ahead and merge this. In a followup PR I'll suggest moving it to a more appropriate file.

@mdhaber mdhaber merged commit 25286d5 into scipy:master Apr 1, 2021
@mdhaber
Copy link
Contributor

mdhaber commented Apr 1, 2021

Thanks @vnmabus and reviewers @tupui @ilayn!

@tupui
Copy link
Member

tupui commented Apr 1, 2021

Thanks @mdhaber. So it means we can do inline types now right? We should make sure everyone is aware of this and start enforcing it. What do you think?

@mdhaber
Copy link
Contributor

mdhaber commented Apr 1, 2021

I don't know. As for enforcing it, that should be determined by the maintainers group as a whole. Please explain the advantages when proposing it. I don't know anything about it yet. My only experience with inline types so far is CI failures and unfamiliar-looking function signatures, so I would need to understand that the benefits outweigh the hassle before supporting it. I'm happy to learn things that will improve SciPy for users or make SciPy more maintainable, though.

@tupui
Copy link
Member

tupui commented Apr 2, 2021

The immediate advantage with type hints is the support from IDE. If the whole code is typed, your IDE can catch type errors better, and just help you when you type. It's difficult to come back to not typed code after that 😉.
The only thing I don't like is that it would be redundant with types in docstrings. There might be a way to tell Sphinx to use the types. Or it's a matter of time before someone does a module for that.

@tirthasheshpatel
Copy link
Member

tirthasheshpatel commented Apr 2, 2021

Would also like to mention the mypyc project that generates efficient C/Cython code from typed python code. It's very premature and not ready to use now but it can offer a good performance benefit to SciPy in the future.

@tupui
Copy link
Member

tupui commented Apr 2, 2021

Would also like to mention the mypyc project that generated efficient C/Cython code from typed python code. It's very premature and not ready to use now but it can offer a good performance benefit to SciPy in the future.

Never heard of it, thanks! Sounds a bit like Pythran.

@ilayn
Copy link
Member

ilayn commented Apr 3, 2021

While it totally helps a professional programmer in the general setting which something I notice everyday in my colleagues' workflow professionally in Go or other langs, I find it not so helpful in scientific code since most types that are relevant are ints floats bools and similar. In turn they make the function signatures very ugly and very often unreadable. As an example, I use Spyder and it already handles the signatures just fine enough and the complicated object linking like BankCustomerReport object is passed to a BankReporter function etc. is not present so I think we can handle . To be honest, I don't see the point of using types in Python but what do I know as a regular user.

@vnmabus
Copy link
Contributor Author

vnmabus commented Apr 3, 2021

To be honest, I don't see the point of using types in Python but what do I know as a regular user.

Well, to be honest I have found a lot of errors thanks to type annotations in my thesis project (which is scientific code), but what do I know as a regular user 😜.

@vnmabus
Copy link
Contributor Author

vnmabus commented Apr 3, 2021

The immediate advantage with type hints is the support from IDE. If the whole code is typed, your IDE can catch type errors better, and just help you when you type. It's difficult to come back to not typed code after that wink.
The only thing I don't like is that it would be redundant with types in docstrings. There might be a way to tell Sphinx to use the types. Or it's a matter of time before someone does a module for that.

There is a way to tell Sphinx to use the types. Use autodoc_typehints="description". But that does not currently work for class parameters and attributes in my experience, and the types shown are the same that you use in the annotation, so programmers should be familiar with type hints in order to read them.

@tirthasheshpatel
Copy link
Member

In turn they make the function signatures very ugly and very often unreadable.

I absolutely agree they can be ugly at times. I also agree that it makes code unreadable at first but once you get used to it you can read and understand it just like normal Python code. Anyway, we can always create separate .pyi files to let the end-users enjoy the merits of type hints and not get bothered by them in the source code.

@vnmabus vnmabus deleted the differential_entropy branch April 5, 2021 08:43
@rgommers rgommers changed the title Differential entropy ENH: add a stats.differential_entropy function Apr 5, 2021
@rgommers rgommers mentioned this pull request Apr 5, 2021
10 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement A new feature or improvement scipy.stats
Projects
None yet
Development

Successfully merging this pull request may close these issues.

entropy in Scipy
9 participants