
Support for uint16, uint32, and uint64 #58734

Open
Tracked by #58743
pmeier opened this issue May 21, 2021 · 22 comments
Labels
ezyang's list Stuff ezyang doesn't want to lose feature A request for a proper, new feature. module: python array api Issues related to the Python Array API oncall: pt2 triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@pmeier
Collaborator

pmeier commented May 21, 2021

The array API specification stipulates the data types that we need to support to be compliant. Currently we are missing support for uint16, uint32, and uint64.

cc @mruberry @rgommers @asmeurer @leofang @AnirudhDagar @asi1024 @emcastillo @kmaehashi @ezyang @msaroufim @wconstab @bdhirsh @anijain2305 @zou3519 @gchanan @soumith @ngimel

@pmeier pmeier added the module: python array api Issues related to the Python Array API label May 21, 2021
@H-Huang H-Huang added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label May 22, 2021
@rgommers
Collaborator

Someone just asked about other uint types on Slack. And from pytorch/vision#4326 (comment):

This is because PyTorch doesn't (yet) support uint16, and is also a problem when reading PNG images of type uint16.

My guess beforehand had been that 16-bit image support makes uint16 the most interesting of the missing dtypes.

I don't think there are any plans to work on this issue currently, unless more demand materializes.

@NicolasHug
Member

NicolasHug commented Nov 13, 2021

There are no plans to work on this issue currently I think, unless more demand materializes.

I'll add one :)

Another tangible need for uint16 support in torchvision is pytorch/vision#4731

We added support for native 16-bit PNG decoding in torchvision, but we can't make this API public for now because we output int32 tensors, which wouldn't be compatible with the rest of our transforms.
It would be great if we could make it public, because Pillow's 16-bit PNG support is fairly limited.
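
For context, a minimal sketch of the kind of workaround this forces today (not from the thread; "depth.png" is a placeholder path, and it assumes Pillow decodes the file into a 16-bit mode that NumPy exposes as uint16):

```python
import numpy as np
import torch
from PIL import Image

# Decode a 16-bit PNG with Pillow, then widen to int32 before handing the
# data to torch, since torch.uint16 is not available. 16-bit values always
# fit in int32 without loss, but the tensor is twice as large.
arr = np.asarray(Image.open("depth.png"))        # typically dtype uint16
tensor = torch.from_numpy(arr.astype(np.int32))  # widened copy
print(tensor.dtype, int(tensor.max()))
```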

@kernelmethod

kernelmethod commented Feb 20, 2022

Bumping this.

My research collaborators and I are working on some cryptographic applications where we could really use uint32 / uint64. Some operations on Z_{2^64} that we'd like to calculate with secure multi-party computation, e.g. comparison, would be a lot more straightforward to implement with unsigned integers as the underlying dtype.
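
One common stopgap for the comparison case (a sketch, not from the thread): keep the uint64 bit patterns in int64 tensors and flip the sign bit before a signed comparison, which recovers the unsigned ordering.

```python
import torch

SIGN_BIT = -(2**63)  # bit pattern 0x8000_0000_0000_0000 as an int64 scalar

def unsigned_lt(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Unsigned a < b for int64 tensors that hold uint64 bit patterns."""
    return (a ^ SIGN_BIT) < (b ^ SIGN_BIT)

a = torch.tensor([1, -1])  # bit patterns of 1 and 2**64 - 1
b = torch.tensor([2,  2])
print(unsigned_lt(a, b))   # tensor([ True, False])
```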

@a-gn

a-gn commented Jan 18, 2023

We have a GIS pipeline that uses image transforms from multiple projects, some of which only support uint, but uint8 leads to too much loss of color information. We could really use uint16 support.

@neelnanda-io

I would appreciate uint16 support! I'm trying to do NLP stuff with a large dataset of tokens between 0 and 51000, and it's annoying to consume double the storage to keep them as int32s (I'm currently storing them as uint16 via HuggingFace, but I need to load them as NumPy and manually convert them)
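
For reference, the manual conversion described here looks roughly like this ("tokens.bin" is a hypothetical file name; token ids below 65536 are stored as uint16 on disk and only widened in memory):

```python
import numpy as np
import torch

tokens_u16 = np.fromfile("tokens.bin", dtype=np.uint16)  # compact on disk
tokens = torch.from_numpy(tokens_u16.astype(np.int64))   # widen for use as indices
```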

@oliver-batchelor

I'm doing work on HDR imaging and we read images from the camera as 16-bit unsigned. It's possible to work around it by using other frameworks but it would be really useful.

@VladShtompel

I'm doing work on HDR imaging and we read images from the camera as 16-bit unsigned. It's possible to work around it by using other frameworks but it would be really useful.

This is exactly the issue my team and I are facing right now.

@StrongChris

I'm working with DICOM data that is often 10-bit or even 14-bit unsigned. A uint16 dtype would be very nice for these! My work is focused on speed, so being able to use the smallest possible datatype would be much appreciated.

@ezyang
Contributor

ezyang commented Apr 22, 2023

We should add these dtypes, and then build out support via PT2. We probably aren't going to add kernels for everything but Triton makes it very easy to JIT compile these operations.

@NicolasHug
Member

@ezyang would Triton be able to enable CPU support?

@ezyang
Contributor

ezyang commented Apr 24, 2023

Not Triton per se, but we have a CPU inductor backend, so the answer is yes!
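
A minimal sketch of this PT2 route, assuming a PyTorch build (2.3 or later) where torch.uint16 exists and the compiled op is supported by the inductor CPU backend; eager kernel coverage for these dtypes remains limited:

```python
import torch

@torch.compile
def brighten(x):
    # Inductor JIT-compiles this elementwise op for the input dtype,
    # so no hand-written uint16 eager kernel is required.
    return x + 256

img = torch.tensor([0, 1024, 65000], dtype=torch.uint16)
print(brighten(img))
```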

@soulitzer
Contributor

From triage review: We still need some limited eager support, e.g. factory functions and conversion functions. Autocast also needs some consideration (maybe not too bad?).

@vadimkantorov
Contributor

Also, if I understand correctly, bit ops are only well-defined/standardized on CPUs for unsigned dtypes: #105465
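
The signed/unsigned difference is easy to see with the dtypes that already exist; the same bit pattern shifts differently depending on signedness (illustrative sketch):

```python
import torch

signed   = torch.tensor([-8],  dtype=torch.int8)   # bit pattern 0xF8
unsigned = torch.tensor([248], dtype=torch.uint8)  # same bit pattern

print(signed >> 1)    # arithmetic shift, typically tensor([-4], dtype=torch.int8)
print(unsigned >> 1)  # logical shift, always tensor([124], dtype=torch.uint8)
```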

@vadimkantorov
Contributor

uint16 would also be useful for interop with OpenCV (the CV_16U dtype).

@tchaton

tchaton commented Nov 22, 2023

Hey, torch.uint16 would be good for encoding text into tokens, to reduce the memory footprint compared to uint32 when the vocab isn't too big.

@smdrnks

smdrnks commented Nov 22, 2023

+1, I also have a language modelling use case where uint16 could save quite a bit of memory. Would be great to have this.

@DrDryg

DrDryg commented Dec 7, 2023

I would also appreciate support for unsigned integer types. I'm developing image-processing software with libtorch as a backend, and unsigned integer support would be very useful, in particular uint16, but uint32 and uint64 would be nice too.

@penguinwu penguinwu added the feature A request for a proper, new feature. label Dec 12, 2023
@vadimkantorov
Contributor

A related issue on having uint16 images:

@ezyang ezyang added the ezyang's list Stuff ezyang doesn't want to lose label Jan 1, 2024
@ezyang
Contributor

ezyang commented Jan 1, 2024

Some dumb problems we will have to work out.

  • A uint8 tensor accumulates into an int64 tensor, which makes sense when you don't have a uint64 tensor but makes a lot less sense when you do. This leads to a potential inconsistency with the larger types; in particular, it is an incredibly bad idea for uint64 to accumulate into int64, and uint32 is probably not a good idea either. A short-term stopgap is to leave sum unimplemented, but chances are someone will come asking for it. By the way, this is a use case for defining arithmetic operations on our bits types: uint should have a wider accumulation type to prevent overflow, but bits would never widen and would always do modular arithmetic.
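
The current widening behavior is easy to demonstrate with uint8, and it is exactly the rule that cannot carry over to uint64 (values above 2**63 - 1 would not fit in an int64 accumulator):

```python
import torch

x = torch.full((1000,), 255, dtype=torch.uint8)
s = x.sum()
print(s, s.dtype)  # tensor(255000) torch.int64 -- uint8 accumulates into int64
```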

@vadimkantorov
Contributor

I think it's better to go ahead and have the dtype representations even if meaningful ops are not supported at first and only conversions/casts/reinterprets/restrides are implemented, mainly for interop and faithfulness of representation. As long as there is a dedicated docs page for each dtype explaining the quirks, I think it's fine.

Same reasoning might apply for the following :)

@vadimkantorov
Contributor

Regarding sum, a transitional option might be to require explicit args specifying the out_dtype and acc_dtype? (It would also be nice to elide temporary full-upcasting allocations: #55366)
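
For comparison, torch.sum already accepts an explicit dtype argument (the input is cast to it before the reduction), which is roughly the shape such an API could take; a sketch with an existing dtype:

```python
import torch

x = torch.full((1000,), 255, dtype=torch.uint8)
print(x.sum(dtype=torch.int32))  # accumulate/return as int32 instead of the default int64
```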

pytorchmergebot pushed a commit that referenced this issue Jan 7, 2024
The dtypes are very useless right now (not even fill works), but it makes torch.uint16, uint32 and uint64 available as a dtype.

Towards #58734

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: #116594
Approved by: https://github.com/albanD
ghstack dependencies: #116698, #116693
@ionutmodo

ionutmodo commented Mar 6, 2024

Hi guys! I would like to add another use case for uint16 on GPUs: designing efficient adaptive sparse optimizers.

thewtex added a commit to thewtex/itk-wasm that referenced this issue Apr 18, 2024
When working with torch, the output is often float32. Torch does not
have good support for conversion to uint types:

  pytorch/pytorch#58734

Support float32 and the signed integer types for convenience.