Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Smarter UHI in slices #485

Open
henryiii opened this issue Dec 19, 2020 · 10 comments
Open

Smarter UHI in slices #485

henryiii opened this issue Dec 19, 2020 · 10 comments
Labels
good first issue Good for newcomers

Comments

@henryiii
Copy link
Member

Axis proxy objects passed to the UHI locators will gain _slice_ and _slice_index_ properties that allow a UHI locator to detect the context of use. They will be optional from the locator's point of view - existing locators will continue to work.

This will allow locators to detect which part of a slice they are in, allowing much more powerful locators to be written by users.

We will use this new power to fix bh.overflow as an endpoint of a slice.

@HDembinski, on the matter of modifying bh.loc to avoid flow bins - You were in favor of it not hiding flow bins in slices either, @jpivarski was neutral, and I was in favor of it avoiding flow bins in slices. I'm willing to concede that now; it's easier to understand it as "the index at this point" if it uniformly includes flow bins in both cases, and matches Boost.Histogram better. As long as we have the ability to write a smart locator, I'm okay with the default one being straightforward but possibly exposing flow bins. For most users, I'm hoping they will leave of the end point for inclusive of flow, and use 0 / len for cutting flow bins off, not relying on bh.loc at the edges.

I do think a smart, range inclusive locator that does not include flow would be quite useful, and might be introduced in hist one day. But I do think that's not bh.loc / hist.loc.

Originally posted by @henryiii in #479 (comment)

@henryiii henryiii added this to the 1.1.0 milestone Dec 19, 2020
@HDembinski
Copy link
Member

I made this comment before: it is generally bad design to make objects context aware. I wrote down the alternative, which is to replace the slice object instead of making locators slice aware. The drawback of that is that the special slice syntax does not work anymore, but it is clean.

I think we also agreed that we don't need slice-aware locators right now. Adding infrastructure for stuff that is not currently needed is adding unnecessary complexity.

@HDembinski
Copy link
Member

I forgot the argument about bh.overflow.

@henryiii
Copy link
Member Author

In our call, the main outcome was that that locators needed to be smarter (this was @jpivarski's idea, and we all agreed on this). We came to that decision because we were discussing whether bh.loc should include the flow bins or not when in a slice, and the only way to write a locator that could even do that was to make the locators smarter. The secondary outcome was that we decided to make the default locators not include the flow bins; you were against it, I was for it, and Jim really just cared about the point above, because that gives users the ability to write both locators and ignore what we chose for the default. Looking at it further, I'm still slightly in favor of avoiding it, but I'm willing to concede your point for the default bh.loc.

This is completely orthogonal to the other problem which I didn't even realize was causing an issue in the call - the upper limit of a slice is non-inclusive, so the meaning of a locator in the upper bin might be different than the lower bin. This is a second argument in favor of smart locators - this is valuable information that users writing locators need to be able to express all possibilities in a locator.

There probably is a decision to be made here - do we want the default locators to be smart, expressing 'meaning' vs. a simple transform. Currently bh.loc is a dumb locator: loc(.5) turns into the bin number, and that's it. But that's why you were confused and tried to write this sort of test: h[loc(-100):loc(100):sum] == h[::sum]. loc(100) turns into len(axis) exactly as you'd expect, but Python slices do not include the upper bound, so this does not include the bin with loc(100) in it. You would have to write h[loc(-100):loc(100)+1:sum]. You wrote your tests assuming that bh.loc was smart - if you were confused, imagine new users!

This is also the issue with bh.overflow - h[bh.underflow:bh.overflow:sum] does not include the overflow bin! Regardless of what we do with bh.loc, bh.overflow should include the overflow even if it is the endpoint! It should have the meaning of "overflow", not just exactly convert to len.

I am willing to concede and let bh.loc include the flow bins if we want it to be just a simple value-to-bin meaning. Changing the meaning of bh.loc to be smarter will break older code; if we were writing it from scratch, I'd probably make it smart and be inclusive - but that's a pretty big change. Keeping the default simple and letting users add fancier ones is probably best here.

Your new value_slice object is completely against the design of UHI. The U in UHI stands for Unified or Universal - anyone can write a locator, and they don't need to get us to modify boost-histogram to use it. Coffea may want locators that work for them instead of ours - and they can write them, there's nothing special about ours. Slices are provided by Python, they can come from anywhere. You want to add a custom, boost-histogram only object that no other library can provide and you are designing it to fix only one, very specific problem. It will complicate the internals of boost-histogram to check for this special object and transform it.

it is generally bad design to make objects context aware

Not if it is useful information to know where you are in a slice. The slices do not know which axis (as in, what axis number) they are operating with - that would be too much context information. But being able to tell h[a] from h[:a] is not too much context information. However, to your point, maybe removing _slice_ from the context information, and only including _slice_index_ might be a good idea. I wasn't being conservative with context information, but maybe it's better to be.

_slice_index_ would be None in a non-slice index, and 0 or 1 in a slice.

@henryiii
Copy link
Member Author

henryiii commented Dec 21, 2020

PS: I also don't want to duplicate the interface, which is why I am not proposing we add a separate "smart loc", just that we make it possible for a future user like Hist, Coffea, etc to write one. If we really want a smart loc, we should probably find some way to transition to bh.loc being smart. That's another strike against value_slice too.

@HDembinski
Copy link
Member

In our call, the main outcome was that that locators needed to be smarter (this was @jpivarski's idea, and we all agreed on this).

That was not the main outcome for me. The main outcome was the decision to use the behavior of bh.loc that you suggested, which is simple and consistent. I never agreed to the motion that locators should be context aware.

But that's why you were confused and tried to write this sort of test: h[loc(-100):loc(100):sum] == h[::sum]. loc(100) turns into len(axis) exactly as you'd expect, but Python slices do not include the upper bound, so this does not include the bin with loc(100) in it. You would have to write h[loc(-100):loc(100)+1:sum]. You wrote your tests assuming that bh.loc was smart - if you were confused, imagine new users!

I was confused, because I remembered that we decided a while ago that value selections should be inclusive on both ends and I expected boost-histogram to work like Boost.Histogram. You explained very well why that would be weird, and I eventually agreed. By no means did I agree to make bh.loc context aware. It is better if loc is dumb, because then its behavior is easily predictable. Things should behave consistently and interfaces should be plain. There should be no hidden modification of the behavior depending on context. I want bh.loc to behave the same way in a slice as it getitem. That's the main point we agree to the in call.

@HDembinski
Copy link
Member

HDembinski commented Dec 22, 2020

I can include the underflow or overflow bin already by not specifying the lower or upper bound in the slice, which is perfect for me and I think also for the users. This whole discussion about h[loc(-100):loc(100):sum] was only to clarify that there is now yet another difference between C++ Boost.Histogram and boost-histogram. We should continue to keep these differences minimal, but internal consistency with Python idioms was and is a primary goal for boost-histogram, and so some sacrifices have to be made.

@HDembinski
Copy link
Member

HDembinski commented Dec 22, 2020

Your new value_slice object is completely against the design of UHI. The U in UHI stands for Unified or Universal - anyone can write a locator, and they don't need to get us to modify boost-histogram to use it. Coffea may want locators that work for them instead of ours - and they can write them, there's nothing special about ours. Slices are provided by Python, they can come from anywhere. You want to add a custom, boost-histogram only object that no other library can provide and you are designing it to fix only one, very specific problem. It will complicate the internals of boost-histogram to check for this special object and transform it.

You are just fixated on using locators and use the Python slice. The natural object to express a range is a slice-like object, not some context-aware locators inside a Python slice. Be honest, you want the short-hand slice syntax and that's your main motivation here. You could just add custom slice objects to the UHI if you wanted to. Then everyone can write slice objects with special semantics.

If you insist on context-aware locators, we need another call with Jim, because I am strongly against that. I also don't see a need to add context-aware locators. The original problem is solved, one-sided slices include the corresponding overflow bins.

@HDembinski
Copy link
Member

HDembinski commented Dec 22, 2020

Summary:
Context-aware locators are bad. For consistency, locators should behave in a slice exactly like they behave in getitem.

@jpivarski
Copy link
Member

If you insist on context-aware locators, we need another call with Jim, because I am strongly against that.

Which needs another call, making context-aware locators available for users writing their own locators or using that context-dependence in locators provided by boost-histogram?

We did talk about both on the last call. If we do another call, let's do it over a Google Doc and post the conclusions as minutes. I don't mind another conversation, but I think we're going in circles.

@henryiii
Copy link
Member Author

We did talk about both on the last call. If we do another call, let's do it over a Google Doc and post the conclusions as minutes. I don't mind another conversation, but I think we're going in circles.

Sorry, doesn't work. I took notes last time and immediately posted them: #479 (comment) explicitly to avoid issues like this, as they've happened in the past. But it didn't work.


Outcome of meeting:

  • UHI locators need to be able to detect if they are in a slice to enable advanced tricks. They only receive the axis as an argument, so my thought would be to include a _slice_ object on the axis (proxy) that they receive - this would even allow them to detect what the action was. You would also need to indicate which item in the slice was being called, _slice_index_?
  • The default bh.loc would avoid including overflow/underflow bins by default, following the standard indexing rule that including an endpoint does not include flow. Writing a custom loc function that would include these would be easy.
  • Boost.Histogram should follow the current design and include the underflow/overflow bin if included in the crop command. We don't actually use any boost-histogram indexing commands that data coordinates, because we always convert to indices first using UHI.

Originally posted by @henryiii in #479 (comment)

@henryiii henryiii modified the milestones: 1.1.0, 1.2.0 Jul 7, 2021
@henryiii henryiii removed this from the 1.3.0 milestone Apr 15, 2022
@gohil-jay gohil-jay added the good first issue Good for newcomers label Aug 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers
Projects
Status: No status
Development

No branches or pull requests

4 participants