Adds collector accumulator #378

HDembinski · 2020-06-22T17:51:26Z

C++ collector class for weights
Python wrapper for collector (halfway)
Tests
Docs

I don't know how to wrap this in an awkward array as a view. Without awkward, it would most naturally be represented as array((<shape of histogram>), dtype=object)

HDembinski · 2020-06-22T18:14:45Z

@henryiii To add a new accumulator, I currently have to change the code in many places. Not only do I need to register my accumulator in C++, but I also need to add it explicitly to src/boost_histogram/accumulators.py and src/boost_histogram/cpp/accumulators.py. It would be great to automate this. Adding things should be easy.

HDembinski · 2020-06-22T21:11:38Z

@henryiii mypy fails with a false positive. When are we dropping Python 2 support? It is hindering this patch.

henryiii · 2020-06-22T21:21:05Z

You can disable mypy with # type: ignore if you need to.
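
For illustration, a minimal sketch of that suppression. The `Registry` class and its `collector` attribute are placeholders invented for this example, not code from this patch; the point is only that a trailing `# type: ignore` comment silences mypy for that single line.

```python
class Registry(object):
    """Placeholder class, invented for this sketch."""


# The attribute is attached at run time, so a static checker cannot see it.
setattr(Registry, "collector", list)

# Without the trailing comment, mypy would typically report that
# "Registry" has no attribute "collector". The comment affects this line only.
Collector = Registry.collector  # type: ignore
```

The code runs identically with or without the comment; only the static check changes.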

But "Adding things should be easy." - we need to be careful - the procedure is clear and standard - if we automate too much, either with runtime magic (bad) or generation scripts (better), then that introduces more tooling to maintain, more special things unique to this one library only. Are we really planning for that many additions here? We have to recompile anyway, and we don't have this exposed as a public API for external extension modules, so keeping it a little repetitive but simple should benefit us in the long run.

Now if we come up with a way to add custom additions (which should be doable for storages), then we would benefit from a generation tool. That would be a public API and should be designed as such (and then used internally, too).

> When are we dropping Python 2 support?

With Version 1.0, probably mid-Summer. However, it is acceptable to leave off some features as Python 3 only.

HDembinski · 2020-06-22T21:39:39Z

> But "Adding things should be easy." - we need to be careful - the procedure is clear and standard - if we automate too much, either with runtime magic (bad) or generation scripts (better), then that introduces more tooling to maintain, more special things unique to this one library only. Are we really planning for that many additions here? We have to recompile anyway, and we don't have this exposed as a public API for external extension modules, so keeping it a little repetitive but simple should benefit us in the long run.

I can't follow your reasoning. The accumulators are a customization point, perhaps not for users but for us devs. When I add an accumulator, I don't want to manually change the code in several places.

Why not drop Python 2 support now? 1.0 seems arbitrary. Either we drop Python 2, or I have to rewrite my code for this patch.

HDembinski · 2020-06-22T21:41:29Z

If you look into the code, you can see how I automated this.

HDembinski · 2020-06-22T21:46:27Z

Any repetition in code is bad, we want to be DRY.

henryiii · 2020-06-23T01:20:19Z

> array((<shape of histogram>), dtype=object)

It's slow and ugly, but fine for a first run. We could easily add Awkward support later.

henryiii · 2020-06-23T01:32:11Z

> Why not drop Python 2 support now? 1.0 seems arbitrary.

Randomly deciding that a feature patch should cause a major Python compatibility change is arbitrary. I have an outline and plan that has been announced and followed for about a year. Only the timing has been thrown off (mostly by COVID-19 creating an extra month of work for me). We need a roughly feature complete version (1.0), and then we can drop Python 2 support. That way, if we are picked up by experiment stacks that are stuck in Python 2, we can still be used, and we can back port fixes if needed. That's why I've put so much work into the Python 2 porting of a variety of features. If you want to wait until 1.0 is ready to merge this patch, though, that's fine with me. I can also help fix it in the near future.

henryiii · 2020-06-23T02:30:40Z

> Any repetition in code is bad, we want to be DRY.

This is not an absolute rule, just a guiding principle. Also, we really aren't talking about code duplication, but rather the equivalent of definitions - it's a little irritating to list items in multiple places, but it provides static code analysis benefits - not just for MyPy, but also for code completion tools, Sphinx (which can't build the C++ code, so relies on the Python files only), and for human readers of the code. For a simplified and rather bad comparison, this is why from x import * is bad - you can't see where things come from without digging further, but if you list each item instead of *, you can trace down where things come from easily. Not an absolute rule either, but often, for mostly static code, being explicit helps tools and users down the line.

I'm not against code duplication, but I don't like additions that break static analysis.

henryiii · 2020-06-23T02:33:29Z

src/boost_histogram/accumulators.py

@@ -1,15 +1,23 @@
 # -*- coding: utf-8 -*-
 from __future__ import absolute_import, division, print_function

-from ._core.accumulators import Sum, Mean, WeightedSum, WeightedMean


This is taking a very simple static list, and doing run time manipulations with a function that has more lines than the code it replaces, breaking static analysis. We are also losing any ability to not follow the specific naming scheme in the future if something different is added.

If we add unit tests for a new type here, that will immediately break if a developer forgets to update this static list.
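
One way to get that safety net, sketched under assumptions: `core` and `wrapper` below are stand-ins invented for this example, and `Collector` is a hypothetical name for the new type. The check simply compares the public names of the two modules.

```python
import types

# Stand-ins invented for this sketch: `core` plays the compiled module,
# `wrapper` plays the hand-maintained Python module that re-exports it.
core = types.SimpleNamespace(Sum=object(), Mean=object(), Collector=object())
wrapper = types.SimpleNamespace(Sum=core.Sum, Mean=core.Mean)  # Collector forgotten


def missing_exports(core_module, wrapper_module):
    """Return the public names of `core_module` that `wrapper_module` lacks, sorted."""
    public = [name for name in vars(core_module) if not name.startswith("_")]
    return sorted(name for name in public if not hasattr(wrapper_module, name))


def check_all_exported(core_module, wrapper_module):
    """Raise AssertionError if the static list in the wrapper is out of date."""
    missing = missing_exports(core_module, wrapper_module)
    assert not missing, "not re-exported: {}".format(missing)
```

Calling `check_all_exported(core, wrapper)` from a test would fail until the forgotten name is added to the wrapper, which keeps the list explicit while still catching omissions.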

Also, PyBind11 is anything but DRY...

Remember, _core is monkey-patched for documentation, so everything in it should be explicitly imported. It is also ignored for static analysis, so there again, everything should be explicitly imported. Explicit is better than implicit.

This (unrelated to list accumulators) change is also what is breaking Python 2!

HDembinski · 2020-06-30

I don't know what pybind11 has to do with it; on the contrary, it is a good example of being DRY. Their docs even state that they strongly prefer minimal code to do the work. Minimal code equates to avoiding redundancy.

Your counterarguments make no sense to me. The code is explicit: explicit in the forwarding and transformation rules. I don't have a problem with not being able to do static analysis here.

If you have another solution that allows me to easily add an accumulator without changing the code in several places, then go ahead. For now this is better than it was before.

HDembinski · 2020-06-30T08:40:49Z

> > Any repetition in code is bad, we want to be DRY.

> This is not an absolute rule, just a guiding principle. Also, we really aren't talking about code duplication, but rather the equivalent of definitions - it's a little irritating to list items in multiple places, but it provides static code analysis benefits - not just for MyPy, but also for code completion tools, Sphinx (which can't build the C++ code, so relies on the Python files only), and for human readers of the code. For a simplified and rather bad comparison, this is why from x import * is bad - you can't see where things come from without digging further, but if you list each item instead of *, you can trace down where things come from easily. Not an absolute rule either, but often, for mostly static code, being explicit helps tools and users down the line.

> I'm not against code duplication, but I don't like additions that break static analysis.

We have different priorities. I consider static analysis a minor priority, because it is really not that important in this library. A good design is one that requires changes in only one place to add a new accumulator. One of the core principles of boost::histogram is to make it easy to add new storages, axes, and accumulators. I want the same to be true for boost-histogram. We have rules for how the Pythonic names relate to the C++ names. These rules can be written in code.

Edit: To be precise, I want it to be easy to add accumulators, axes, and storages in C++. The wrapping to Python should work largely automatically, using TMP in C++ and dynamic processing on the Python side.
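
As a rough sketch of what "rules written in code" could look like: the snake_case to CamelCase mapping below is an assumption inferred from the existing names (`weighted_sum` in C++, `WeightedSum` in Python), and `python_name` and `build_exports` are hypothetical helpers, not the functions in this patch.

```python
def python_name(cpp_name):
    """Assumed rule: snake_case C++ name -> CamelCase Python name."""
    return "".join(part.capitalize() for part in cpp_name.split("_"))


def build_exports(cpp_names):
    """Map each derived Python name to the C++ name it wraps."""
    return {python_name(name): name for name in cpp_names}


# Lower-case names follow the boost::histogram accumulator spelling.
exports = build_exports(["sum", "mean", "weighted_sum", "weighted_mean"])
```

With a rule like this, adding an accumulator on the C++ side would need no extra entry on the Python side, at the cost of tying every future type to the same naming scheme.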

HDembinski · 2020-06-30T08:42:01Z

> We need a roughly feature complete version (1.0), and then we can drop Python 2 support.

Why do we need that? Only because you wrote it in a plan?

wip

115a781

tests for weight_collector and generic accumulators modules

c95c3bd

return and test an array view

7df2b0e

deleting _load

9ddb6f9

henryiii reviewed Jun 23, 2020


henryiii mentioned this pull request Sep 28, 2020

refactor!: view should return a consistent object #459

Closed
