wip: add statistical tests between histograms #503

ikrommyd · 2023-06-10T22:43:30Z

I'm starting an early PR to add statistical tests to histograms after this feature was requested in #500 and I actually believe it is a good and useful feature.
This is an early WIP PR but I'm starting it now in order to make it publicly visible and ask for feedback on the statistics.

ikrommyd · 2023-06-18T19:43:53Z

I think all four implemented tests now have decent core functionality. I checked this by running toys and looking at the p-value distributions, which should be uniform(0, 1) under the null hypothesis.
Next steps are to add an option to consider flow bins in the tests. I also think there is a way to take bin errors into account when performing Kolmogorov tests.
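
For readers who want to reproduce the toy check described above, here is a minimal sketch. It assumes plain NumPy histograms and `scipy.stats.chi2_contingency` as a stand-in two-sample chi2 test; it is not the code in this PR, and the names `toy_pvalue`, `edges`, and the sample sizes are illustrative only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(12345)
edges = np.linspace(-3.0, 3.0, 11)  # 10 regular bins, flow is ignored


def toy_pvalue(n_events=2000):
    """Draw two samples from the same parent and return the chi2 p-value."""
    a, _ = np.histogram(rng.normal(size=n_events), bins=edges)
    b, _ = np.histogram(rng.normal(size=n_events), bins=edges)
    # Stand-in two-sample test: chi2 homogeneity on the 2 x nbins count table.
    return stats.chi2_contingency(np.vstack([a, b]), correction=False)[1]


pvalues = np.array([toy_pvalue() for _ in range(500)])

# Under the null hypothesis the p-values should look uniform on (0, 1).
uniformity = stats.kstest(pvalues, "uniform")
print(f"mean p-value = {pvalues.mean():.3f}")
print(f"KS test against uniform(0, 1): p = {uniformity.pvalue:.3f}")
```

A histogram of `pvalues` that is visibly sloped or peaked at either end would suggest the test statistic or its degrees of freedom are off.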

ikrommyd · 2023-06-19T18:48:12Z

@nsmith- I don't think an option to consider flow bins in ks_1samp tests makes sense, right?
Where exactly in the flow bins would you compare the empirical CDF with the actual CDF? The flow bins technically extend to infinity.
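
To make the question concrete, here is a rough sketch of a binned one-sample KS statistic. It assumes the empirical CDF is compared with the model CDF at the upper edge of each regular bin, which is a simplification chosen for illustration and not necessarily what this PR does (a later commit switches to bin centers).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
edges = np.linspace(-4.0, 4.0, 41)
counts, _ = np.histogram(rng.normal(size=5000), bins=edges)

# Binned empirical CDF, evaluated at the upper edge of every regular bin.
ecdf = np.cumsum(counts) / counts.sum()
model_cdf = stats.norm.cdf(edges[1:])
d_stat = np.max(np.abs(ecdf - model_cdf))

# An overflow bin spans (edges[-1], +inf). The only natural comparison point
# is +inf, where the comparison is trivially 1 versus 1.
overflow_point = np.inf
print(f"D = {d_stat:.4f}, model CDF at +inf = {stats.norm.cdf(overflow_point)}")
```

There is no finite position inside a flow bin at which the two CDFs could be compared, which is the concern raised above.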

henryiii · 2023-06-20T17:28:28Z

I'm thinking flow=True doesn't ever make sense. How about we just remove that argument for all of them?

src/hist/basehist.py

ikrommyd · 2023-06-20T17:40:43Z

> I'm thinking flow=True doesn't ever make sense. How about we just remove that argument for all of them?

Hmmm, for chi2 tests it does. You are including the observed counts in the flow bins in your test statistic and comparing them against the expected counts (in one-sample tests) or against the observed counts of another histogram (in two-sample tests).
For Kolmogorov tests it's iffy. It doesn't make sense to me at all for one-sample tests, and for two-sample tests it's kind of weird. You can see I used a trick there where I sent the flow bin centers to infinity. ROOT does include this for two-sample tests between TH1s, but we can remove that.
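
As an illustration of the chi2 case, the sketch below builds a one-sample chi2 test in which the underflow and overflow counts are simply two extra bins. It uses plain NumPy and `scipy.stats.chisquare` rather than the hist API, and the bin edges and sample size are made up for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
data = rng.normal(size=3000)
edges = np.linspace(-2.0, 2.0, 9)  # 8 regular bins

# Observed counts: underflow, regular bins, overflow.
# np.histogram closes the last bin on the right, so overflow is strictly above.
regular, _ = np.histogram(data, bins=edges)
underflow = np.sum(data < edges[0])
overflow = np.sum(data > edges[-1])
observed = np.concatenate(([underflow], regular, [overflow]))

# Expected counts from the model CDF, with the flow bins extending to +-inf.
full_edges = np.concatenate(([-np.inf], edges, [np.inf]))
expected = data.size * np.diff(stats.norm.cdf(full_edges))

result = stats.chisquare(observed, f_exp=expected)
print(f"chi2 = {result.statistic:.2f}, p-value = {result.pvalue:.3f}")
```

Because the expected probabilities now sum to one over the whole real line, the flow bins enter the statistic like any other bin.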

ikrommyd · 2023-06-20T17:43:05Z

By the way, pre-commit is complaining about "Too many public methods (23/20)" in basehist.

ikrommyd · 2023-06-23T00:21:17Z

@fabriceMUKARAGE Why did you change it from draft?

fabriceMUKARAGE · 2023-06-23T00:57:38Z

@iasonkrom Oh, that was a mistake I made while looking at the failing checks. My bad, can we move it back to draft?

ikrommyd · 2023-06-23T01:08:14Z

Yup, it's back to draft. I was just scared of an accidental merge. Don't worry about the failing pytests, though. I'm changing things in the statistics, so the tests fail unless I update the values there. The "real" tests are run offline, where I run toy models to see whether my statistics are correct. Proper pytests will be written later, since I'm tired of changing them all the time to make them pass.

fabriceMUKARAGE · 2023-06-23T01:19:57Z

I see it's back to draft mode now. That's true, the proper pytests can be done later, thanks for the insights.

Added pytest.importorskip("mplhep")

henryiii · 2023-06-30T21:09:26Z

src/hist/stats.py

+        )
+
+    if mode == "exact":
+        success, d, pvalue = _attempt_exact_2kssamp(n1, n2, g, d, alternative)


FYI, this really worries me. Using a hidden object is a really bad idea - see copier-org/copier#1225 for the most recent time I've seen someone broken by this! Is it possible to avoid this?

ikrommyd · 2023-06-30

If you mean the function written by scipy, I was trying to avoid writing too many lines, since once the test statistic is defined from the histograms, performing the test is the same. I could write a similar function and keep it here in hist, but it would mostly be copied from scipy.

ikrommyd · 2023-07-03

I've just added the code to the stats.py file instead of importing it from scipy. Do you think this is resolved?
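
For context on the public-API route, here is a hedged sketch of turning a precomputed two-sample KS statistic into a p-value using only public scipy objects. It covers the asymptotic two-sided case via `scipy.stats.kstwobign`, not the exact mode discussed in this thread, and the helper name `ks_2samp_asymptotic_pvalue` is hypothetical.

```python
import numpy as np
from scipy import stats


def ks_2samp_asymptotic_pvalue(d_stat, n1, n2):
    """Two-sided asymptotic p-value for a precomputed two-sample KS statistic."""
    effective_n = n1 * n2 / (n1 + n2)
    # kstwobign is the public limiting (Kolmogorov) distribution of sqrt(n) * D.
    return float(stats.kstwobign.sf(np.sqrt(effective_n) * d_stat))


# Example: D = 0.136 measured between two histograms with 200 entries each.
print(ks_2samp_asymptotic_pvalue(0.136, 200, 200))
```

The asymptotic formula is only an approximation for small samples, which is presumably why an exact mode was wanted in the first place.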

ikrommyd · 2023-06-30T21:16:40Z

@henryiii BTW, I was just busy with other things, I’ll get back to this PR soon. Sorry if I’m being slow.

ikrommyd added 4 commits June 11, 2023 01:38

add only very basic structure

0510a3a

first attemptt at chisquare tests

60443c4

split tests into 4 functions

70e5a71

small updates in chisquare tests

4f8ea69

github-actions bot added the needs changelog label Jun 10, 2023

ikrommyd marked this pull request as draft June 10, 2023 22:43

ikrommyd added 7 commits June 11, 2023 16:05

some fixes for now

07eee77

first attempt at ks_1samp

7efd00e

Merge branch 'main' into feat-stat-tests

b09154c

Merge branch 'main' into feat-stat-tests

5653638

first attempt at ks_2samp

0e0819e

Merge branch 'main' into feat-stat-tests

aab240a

use dtype=int when calculating total entries

75a262b

ikrommyd added 3 commits June 19, 2023 02:13

add flow option in chisquare tests

d21759f

bin centers for ecdf is more correct

d0ed901

add flow option in ks_2samp

4d517f7

henryiii reviewed Jun 20, 2023

src/hist/basehist.py (resolved)

henryiii reviewed Jun 20, 2023

src/hist/basehist.py (resolved)

Fabricefabfab and others added 2 commits June 22, 2023 21:42

wip: chisquare test suggestions

7d60389

style: pre-commit fixes

223ef96

fabriceMUKARAGE marked this pull request as ready for review June 22, 2023 21:46

ikrommyd marked this pull request as draft June 23, 2023 01:04

fabriceMUKARAGE and others added 3 commits June 27, 2023 21:41

fix: Hist requires mplhep to plot

0c0d92a

Added pytest.importorskip("mplhep")

fix: mplhep required to plot

abcda1b

Added pytest.importorskip("mplhep")

style: pre-commit fixes

e630e3f

henryiii reviewed Jun 30, 2023

ikrommyd added 4 commits July 3, 2023 14:43

avoid hidden scipy object

85e9955

Merge branch 'main' into feat-stat-tests

a6ecc71

similar docstrings

d171bd9

pylint compliance

ccc3dae
