Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Cannot write Int64 hist (with np.uint64 backend) to root file using uproot #526

Open
op3 opened this issue Aug 17, 2023 · 3 comments

Comments

@op3
Copy link

op3 commented Aug 17, 2023

Describe the bug

It is not possible to write hist objects with storage=hist.storage.Int64() to root files using uproot

Steps to reproduce

The following code reproduces the problem:

import hist
import uproot

with uproot.recreate("out.root") as f:
    hist_double = hist.Hist(hist.axis.Regular(10, 0, 10), storage=hist.storage.Double())
    hist_int64 = hist.Hist(hist.axis.Regular(10, 0, 10), storage=hist.storage.Int64())
    f["hist_double"] = hist_double
    f["hist_int64"] = hist_int64

hist_double is written as expected. root files can easily store int64 histograms, however, the last line results in the following exception:

ValueError: data to convert to TArray must have signed integer or floating-point type, not dtype('>i8')

The uproot to_TArray function looks it only supports writing signed integers, and hist.storage.Int64() (surprisingly?) uses a np.uint64 backend for Int64(). Is there a workaround to write hist objects to root files using uproot without conversion to Double?

@jpivarski
Copy link
Member

Just as a side-comment about surprise: if an unsigned 64-bit integer needs to be extended to handle signed values, the only possible extension is floating point:

>>> np.array([1, 2, 3], np.uint64) + np.array([1, 2, 3], np.int8)
array([2., 4., 6.])

(Floating point numbers are the only ones that cover the domain of large uint64 values, greater than 2**63 - 1, as well as negative values. There isn't an integer type with both features.)


On the actual issue, the Uproot to_TArray function only handles the dtypes listed because those are the only ones that have a TArray specialization in ROOT:

We could change to_TArray to do some conversions for the TArray types that don't exist in ROOT. Technically, the proper thing to do with uint64 is to expand it to float64, not truncate it to int64. Although we rarely care about the values between 2**63 - 1 and 2**64 - 1, this type promotion is well established, to the point where someone else would think it's a bug if uint64 got converted into int64.

Should this instead be an issue on boost-histogram about hist.storage.Int64 being unsigned? Does this Storage type support negative weights (or any weights at all)? If so, then it can't be unsigned. Anyway, the name is misleading; "Int64" is taken to mean "signed" in many languages and libraries.

@op3
Copy link
Author

op3 commented Aug 18, 2023

boost-histogram defines the different storage types here, and int64 is defined to use int64_t. This was previously changed from uint64_t to int64_t.

Indeed, despite conflicting documentation but in agreement with its name, Int64 hist-objects will happily accept negative integers:

myh = hist.Hist.new.Regular(10, 0, 10).Int64()
myh[0] = -2  # works
assert(myh.values().dtype == np.int64)  # True
assert(myh.values().dtype.type == np.int64)  # False
assert(myh.values().dtype.type == np.long)  # True

So, there is no reason why storing Int64 histograms (hist/boost-histogram) into root files should not be possible, but uproot disagrees (because issubclass(np.longlong, np.int64) == False).

See also this numpy bug report.

@henryiii
Copy link
Member

Conflicting docs probably came from the original start using uint. Only having 9223372036854775807 as a maximum instead of 18446744073709551615 isn't a bit deal, and it's nice to be able to support negative weights, histogram subtraction, etc. And it was needed to match NumPy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants