[FEATURE] Add a function that gives values split into categories #438

Open
Dominic-Stafford opened this issue Aug 8, 2022 · 2 comments
Labels
enhancement New feature or request

Comments

@Dominic-Stafford

We're currently migrating our analysis framework from coffea histograms [1] to this package. One difference we've encountered is that, while in coffea the values method returns a mapping from the identifiers of the category axes to the corresponding bin contents:

>>> hist.values()
{('duck',): array(5.), ('goose',): array(6.)}

hist would just give the bare array:

>>> hist.values()
array([5., 6.])

Sometimes the latter is more useful, but it would also be nice to have a function that returns the first form, since working out which bins correspond to which categories can be hard for larger histograms. I'm not entirely sure what this function should be called (or whether it should be an option of values), but if you feel this would be helpful I'd be happy to try implementing it.

[1] https://coffeateam.github.io/coffea/modules/coffea.hist.html

@Dominic-Stafford Dominic-Stafford added the enhancement New feature or request label Aug 8, 2022
@henryiii
Member

henryiii commented Aug 8, 2022

You have an array for each category. Could this hold more than one value, e.g. ('duck',): array([1,2,3])? If not, then I think that's just zip(hist.axes[0], hist.values()), and I'd rather document a simple procedure than add a method you have to learn and look up, unless it's quite natural and expected.
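As a minimal sketch of the zip approach for a single category axis: the helper name values_by_label is hypothetical and not part of hist, and it takes plain sequences so that it runs without hist installed. With a real histogram one would presumably pass hist.axes[0] and hist.values(), assuming the category axis iterates over its labels.

```python
def values_by_label(labels, values):
    """Pair each category label with its bin content (hypothetical helper)."""
    labels = list(labels)
    values = list(values)
    if len(labels) != len(values):
        raise ValueError("expected one value per category label")
    return dict(zip(labels, values))


# Stand-ins for hist.axes[0] and hist.values() from the example in this issue.
by_label = values_by_label(["duck", "goose"], [5.0, 6.0])
print(by_label)  # {'duck': 5.0, 'goose': 6.0}
```

Unlike the coffea output, the keys here are the bare labels rather than one-element tuples; wrapping each label as (label,) would reproduce the coffea form.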

I think this would be a Stack method, actually. Maybe even .values on a Stack? Also we might not currently support a Stack of single bin histograms, but that could be fixed if we don't.

@Dominic-Stafford
Author

Yes, the arrays can hold more than one value. In general it's a mapping from a tuple for each combination of categories to an array of the values along the remaining axes. For instance, for a hist with two category axes, "species" and "colour", and a regular axis with three bins, the output might be:

{('duck', 'red'): array([5., 4., 2.]), ('goose', 'red'): array([6., 3., 7.]), ('duck', 'blue'): array([3., 1., 5.]), ('goose', 'blue'): array([1., 2., 4.])}
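A mapping of this shape could be built roughly as follows. This is only a sketch under stated assumptions: the function name values_by_category is hypothetical, the category axes are assumed to be the leading axes of the values, and nested lists stand in for the array that hist.values() would return (the same indexing also works on a NumPy array).

```python
from itertools import product


def values_by_category(category_labels, values):
    """Map each combination of category labels to the values of the remaining axes.

    category_labels: one sequence of labels per leading (category) axis.
    values: nested sequence or array indexed as values[i][j]...[k].
    Hypothetical helper, not part of hist.
    """
    category_labels = [list(labels) for labels in category_labels]
    result = {}
    # Walk over every combination of bin indices along the category axes.
    for index in product(*(range(len(labels)) for labels in category_labels)):
        key = tuple(labels[i] for labels, i in zip(category_labels, index))
        block = values
        for i in index:
            block = block[i]  # peel off one category axis at a time
        result[key] = block
    return result


# Data from the example above, indexed as values[species][colour][bin].
values = [
    [[5.0, 4.0, 2.0], [3.0, 1.0, 5.0]],  # duck: red, blue
    [[6.0, 3.0, 7.0], [1.0, 2.0, 4.0]],  # goose: red, blue
]
mapping = values_by_category([["duck", "goose"], ["red", "blue"]], values)
print(mapping[("goose", "red")])  # [6.0, 3.0, 7.0]
```

If the category axes are not the leading ones, the values would first have to be reordered so that they are; that step is left out here.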

I hadn't looked at the stack function until now, since it seems the only operation one can currently do directly on a Stack is plot it (and I would prefer the values as a dict to inspect in the terminal or manipulate in code). Adding a .values function there that returned a dict might be nice, though it wouldn't be sufficient for the case of multiple categories. Maybe one could also add the ability to call .stack on a Stack and then finally call .values, but I think it might be simpler to have a single function to do this.
