Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ABI break] Add new structs with version info and readonly flag #101

Closed
wants to merge 5 commits into from

Conversation

tirthasheshpatel
Copy link
Contributor

@tirthasheshpatel tirthasheshpatel commented Mar 25, 2022

Closes #34
Supersedes #72

This PR adds version info to the DLTensor struct. This is under discussion in #34 (see this comment). Since this is an ABI break, the ABI version has been incremented to 2 and the DLPack Version has been bumped to v0.7.0. This also updates the Python spec with a detailed upgrade policy and future ABI compatibility. The upgrade policy has been designed such that the libraries that want to keep supporting (exporting and importing) the old ABI or support multiple DLPack and ABI versions can do so without breaking compatibility with the old Python API.

Comment on lines 212 to 220
/*! \brief Mark the data readonly. */
uint8_t readonly;
/*!
* \brief Endianness of the data. 1 for non-native endianness and
* 0 for native endianness.
*/
uint8_t endianness;
/*! \brief Alignment of the data. */
uint32_t alignment;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes still need to be discussed. Since we are breaking the ABI by adding the version info, it would be good to discuss these here and add them too (so we don't need to break the ABI again)

*/
struct {
uint8_t dlpack;
uint8_t abi;
Copy link
Member

@tqchen tqchen Mar 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let us define DLPackVersion struct, for backward compatibility, consider put it in the end(after byte_offset), we need to consider the alignment properties of these fields. Specifically, Version should align to 32 or 64 bit.

Putting them after the main struct will likely help alignment of the current fields as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will also make followup compiler support easier

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

include/dlpack/dlpack.h Outdated Show resolved Hide resolved
Comment on lines 74 to 75
The producer must set the ``PyCapsule`` name to ``"dltensor"`` if ABI
version 1 is requested and ``"versioned_dltensor"`` if ABI version >= 2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I asked in #98 (review) and I will ask again: With all these changes flush in, shouldn't we bump the DLPack version at least one more time to include #98 #100 before breaking ABIs? Otherwise, ABI version is never set to 1 in any outside work; it only lives in the dev branch here, and this doc change doesn't really make much sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, in the release note we can warn downstream that in the next release we'll introduce ABI breaking changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently this is the third time I am asking the same question: #34 (comment). Please, can we get it addressed? @rgommers @tqchen @tirthasheshpatel

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I asked in #98 (review) and I will ask again: With all these changes flush in, shouldn't we bump the DLPack version at least one more time to include #98 #100 before breaking ABIs? Otherwise, ABI version is never set to 1 in any outside work; it only lives in the dev branch here, and this doc change doesn't really make much sense.

Oh, yes, thanks for noticing! I will propose a PR bumping the DLPack version and rebase here once that's merged.

Apparently this is the third time I am asking the same question: #34 (comment).

Sorry for not answering sooner!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have bumped the DLPack version here itself since it makes sense to upgrade both the DLPack and ABI version for an ABI break. Sounds good?

Comment on lines 156 to 164
* ``__dlpack__`` should accept a ``version`` keyword (a Python tuple
``(dlpack_version, dlpack_abi_version)``) which is set to ``None`` by default.
Consumers can use this kwarg to request certain versions of DLPack. If
``version=None`` or the ABI version 1 is requested:

* a capsule named ``"dltensor"`` which uses the old ABI should be returned
(if the producer wants to keep supporting it) or
* a ``BufferError`` should be raised (if the producer doesn't want to keep
support for the old ABI)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am sorry, but I think we are moving too fast in this PR. Removing __dlpack_info__ is a design flaw (see also #34 (comment)): Without the producer providing this info upon the consumer's request, how does the consumer know

  • the max version supported by the producer (without a try-except loop until no error)?
  • the sensible combination of dlpack_version and dlpack_abi_version? For example I can pass meaningless combos like (0.5, 2) (such combo does not exist).

Shouldn't the burden be fallen on the producer since they know the best? Maybe we should move the discussion back to #34?

@wjakob
Copy link

wjakob commented Apr 2, 2022

An ABI break may also be an opportunity to reexamine the signed/unsignedness of all DLTensor members (PR #102).

@tirthasheshpatel tirthasheshpatel changed the title [ABI break] Add version info, readonly, endianness, and alignment fields in DLTensor [ABI break] Add new structs with version info and readonly flag Apr 4, 2022
@tirthasheshpatel
Copy link
Contributor Author

Thanks for the help so far everyone! I think the discussion on #34 has settled on adding the __dlpack_info__ method which returns the max version supported by the producer. I have updated the PR with the new spec (that includes __dlpack_info__). I haven't changed capsule renaming yet. @leofang Let us know if remaining the capsule makes sense to you when the new ABI is exported (as mentioned in #34, it provides some extra protection without breaking backward compatibility).

@mattip
Copy link

mattip commented Apr 4, 2022

I originally was in favor of capsule renaming, but I think it is not needed given the consensus around the protocol. While it does provide another layer of safety, I think it is redundant.

@wjakob
Copy link

wjakob commented Apr 4, 2022

There are still a number of things I find confusing about the Python specification. Some of them are related to this specific PR. Some of them are things that are unclear in general.

  1. The stream argument of __dlpack__() can presumably be used to pass a CUDA stream object. Is the interface specific to CUDA? If so, the documentation should probably say so.

CUDA streams are not a natural representation in CUDA. Even in C++, they are simply a pointer to an opaque NVIDIA data structure. Are they passed as capsules, uintptr_t cast to a python int, etc.? The documented interface lacks type annotations to make this clear.

  1. The document mentions a new __dlpack_info__ function but does not provide a clear type-checkable signature of this new function. This is a recipe for inconsistencies. I recommend providing an example raw python version with MyPy-style type signature as an implementation guide.

  2. What are the requirements on the version attribute of __dlpack__? Is it a Python int? If so, this could be stated in the type signature.

@tirthasheshpatel
Copy link
Contributor Author

tirthasheshpatel commented Apr 4, 2022

  • The document mentions a new __dlpack_info__ function but does not provide a clear type-checkable signature of this new function. This is a recipe for inconsistencies. I recommend providing an example raw python version with MyPy-style type signature as an implementation guide.

  • What are the requirements on the version attribute of __dlpack__? Is it a Python int? If so, this could be stated in the type signature.

Yeah, making the signatures explicit should make it easier to understand the spec, thanks.

  1. The stream argument of __dlpack__() can presumably be used to pass a CUDA stream object. Is the interface specific to CUDA? If so, the documentation should probably say so.

stream is an integer; the ID of the GPU device where the data is present. It is specific to CUDA and ROCm (and the devices that use a stream mechanism). This is explicitly stated in the semantics section of the python specification. Is it not clear enough?

@tirthasheshpatel
Copy link
Contributor Author

I originally was in favor of capsule renaming, but I think it is not needed given the consensus around the protocol. While it does provide another layer of safety, I think it is redundant.

Okay, I have removed capsule renaming now.

Also, I have added release notes and updated the DLPack diagram with the new structs. This PR is ready for review again from my side.

@tqchen
Copy link
Member

tqchen commented Apr 5, 2022

@tirthasheshpatel just realized that we checkedin the png image to the repo in a previous path. Is it possible to checkin the image to a different location/repo? On one hand we would like to keep repo self-contained, on the other hand, it can be a bit annoying for a repo to contain a rich history of binaries through multiple revisions.

One approach is to send a PR to a separate repo instead (we used https://github.com/dmlc/web-data) for some of that purposes then link to via https://gitcontent

@tirthasheshpatel
Copy link
Contributor Author

One approach is to send a PR to a separate repo instead (we used https://github.com/dmlc/web-data) for some of that purposes then link to via https://gitcontent

Sounds good! I will remove the image here and propose a PR on dmlc/web-data.

@tqchen
Copy link
Member

tqchen commented Apr 5, 2022

#104

doesn't support any version below the producer's maximum version, a
``BufferError`` should be raised. Similarly, If the producer doesn't
support the requested version, it should raise a ``BufferError``.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if the consumer does not specify a version? As I understand things, for the forseeable future the producer should return a V1 structure. So effectively the default is 1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if the consumer does not specify a version? As I understand things, for the forseeable future the producer should return a V1 structure. So effectively the default is 1.

Yes, the default is 1. I will update the spec to say that instead.

@wjakob
Copy link

wjakob commented Apr 5, 2022

stream is an integer; the ID of the GPU device where the data is present. It is specific to CUDA and ROCm (and the devices that use a stream mechanism). This is explicitly stated in the semantics section of the python specification. Is it not clear enough?

I actually find this part of the documentation quite confusing (That said, I am only familiar with the CUDA part and can't speak about ROCm).

There are three concepts in CUDA that play a role here:

  • Each compatible graphics card/accelerator in a machine is given CUDA device ID.
  • An application can talk to such a CUDA device by creating a CUDA context (details). The CUDA runtime API creates a default context (primary context), but this is not necessarily only one. An application could in principle create multiple contexts to launch applications on the same GPU, each with its own virtual address space on the GPU (meaning that a memory allocation in one context cannot be accessed by another context).
  • In each context, the application can create multiple CUDA streams (details). A CUDA stream has the purpose of ordering kernel launches with respect to each other. It does not play a role with regards to the residency of memory allocations. (Although, to make this even more complicated, the latest CUDA versions now also have what is called stream-ordered memory allocations,I digress...)

Of these three, only the CUDA device ID is representable as an integer. Both CUDA context and CUDA stream are represented by opaque handles to a proprietary data structures, in both CUDA runtime and driver APIs. Essentially a void* pointer.

I think that there is I think a fundamental confusion between the word "stream" and "CUDA context" or "CUDA device ID" in the DLPack documentation. I suspect that will you want to call this "device ID" and not "stream" and also establish the convention that CUDA memory allocations are assumed to be located in the primary context associated with the associated CUDA device ID.

There is one situation in which the notion of a stream would make sense to me: if the tensor is not yet computed and the __dlpack__ function needs to perform a kernel launch to actually compute the data that will then be wrapped. In that case, we might want this computation to be ordered with respect to other computation performed by the caller (which might be issued to the same context+stream pair). However, in this case, providing the stream as an int would not be a suitable interface.

DLPackVersion version;
/*! \brief Mark the data readonly. */
uint8_t readonly;
} DLTensorVersioned;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually one thing that might be useful is to think about how to minimizing the change. Specifically, in light of S0 in #104. We may not need to introduce a separate versioned struct.

The old framework can still use the DLTensor(as implemented in the versioned version) as it is, as long as there is no reliance of sizeof(DLTensor). Perhaps we can rename the old one as DLTensorLegacy; In the similar spirit, it is safe to static_cast a DLTensor* to DLTensorLegacy*, as a result, we do not need to introduce two versions of DLManagedTensor

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old framework can still use the DLTensor(as implemented in the versioned version) as it is, as long as there is no reliance of sizeof(DLTensor). Perhaps we can rename the old one as DLTensorLegacy; In the similar spirit, it is safe to static_cast a DLTensor* to DLTensorLegacy*, as a result, we do not need to introduce two versions of DLManagedTensor

But DLManagedTensor doesn't have a pointer to the DLTensor struct. Instead, it is just a plain DLTensor dl_tensor field. So, changing the sizeof(DLTensor) will change alignment and size of DLManagedTensor. Which is why we need to rename both structs. I also considered renaming the old structs to DLTensorLegacy and DLManagedTensorLegacy but it seems equivalent to instead add new structs with a different name (new implementations will be able to see both and can use the struct they want to export...)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry, you are indeed right. This would make the change ABI breaking. Although it would make future ABI change hard because all of them will be non-backward compact due to the change of deleter offset. Would be great to have more deliberation of this topic.

One possible choice is to append other flags(version, read_only) to DLManagedTensor instead. This of course won't be too ideal in case we want to change fields in DLTensor, but would maintain backward compact

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this just push the problem one abstraction layer higher? Now DLManagedTensor has a breaking ABI change.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, I see. If we move them to DLManagedTensor, we can ensure the location of the deleter function is still at the same offset. Then the code

DLManagedTensorVersioned *managed = (DLManagedTensorVersioned *)PyCapsule_GetPointer(self, "dltensor");
if (managed->deleter) {
         managed->deleter(managed);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although not ideal, I too think it'd be better to add the new fields to the DLManagedTensor instead. This would eliminate the requirement of the *Versioned structs and also make it easier to implement support for the new ABI.

on the array object, which will be called from within ``from_dlpack``,
to access the data, to get the maximum supported DLPack version, and
to query what device the array is on (may be needed to pass in the
correct stream, e.g. in the case of multiple GPUs).
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you break this into 3 separate points? They can be subpoints of 2

@@ -96,7 +112,7 @@ C API which is called either when the refcount on the capsule named
PyObject *type, *value, *traceback;
PyErr_Fetch(&type, &value, &traceback);

DLManagedTensor *managed = (DLManagedTensor *)PyCapsule_GetPointer(self, "dltensor");
DLManagedTensorVersioned *managed = (DLManagedTensorVersioned *)PyCapsule_GetPointer(self, "dltensor");
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can we distinguish between these two cases? For legacy capsules, we must cast to the older struct, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can we distinguish between these two cases? For legacy capsules, we must cast to the older struct, right?

The offset of dl_tensor.device doesn't change with this ABI breaking change (and we also don't anticipate such a change in the future). So, casting it to either struct and accessing the field should work.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the location of the deleter function called below does change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the location of the deleter function called below does change.

Oh, sorry. I misunderstood: I thought you were talking about the __dlpack_device__ method. We can have different deleter functions (one for each supported ABI). Not the cleanest way, but can be done using templating.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, this is code created by the producer, not the consumer. When the consumer calls obj.__dlpack__, the producer creates a capsule, and the capsule's deleter function consists of the code here.

docs/source/python_spec.rst Outdated Show resolved Hide resolved
Co-authored-by: Matti Picus <matti.picus@gmail.com>
@leofang leofang mentioned this pull request May 24, 2022
@tqchen
Copy link
Member

tqchen commented Jan 5, 2023

closed in favor of #113

We would like to thanks @tirthasheshpatel for your effort on bringing in this change onward

@tqchen tqchen closed this Jan 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Future ABI compatibility
5 participants