[ABI break] Add new structs with version info and readonly flag #101

tirthasheshpatel · 2022-03-25T12:01:22Z

Closes #34
Supersedes #72

This PR adds version info to the DLTensor struct. This is under discussion in #34 (see this comment). Since this is an ABI break, the ABI version has been incremented to 2 and the DLPack Version has been bumped to v0.7.0. This also updates the Python spec with a detailed upgrade policy and future ABI compatibility. The upgrade policy has been designed such that the libraries that want to keep supporting (exporting and importing) the old ABI or support multiple DLPack and ABI versions can do so without breaking compatibility with the old Python API.

tirthasheshpatel · 2022-03-25T12:05:28Z

include/dlpack/dlpack.h

+  /*! \brief Mark the data readonly. */
+  uint8_t readonly;
+  /*!
+  * \brief Endianness of the data. 1 for non-native endianness and
+  *  0 for native endianness.
+  */
+  uint8_t endianness;
+  /*! \brief Alignment of the data. */
+  uint32_t alignment;


These changes still need to be discussed. Since we are breaking the ABI by adding the version info, it would be good to discuss these here and add them too (so we don't need to break the ABI again)

tqchen · 2022-03-25T14:53:23Z

include/dlpack/dlpack.h

+  */
+  struct {
+    uint8_t dlpack;
+    uint8_t abi;


Let us define a DLPackVersion struct. For backward compatibility, consider putting it at the end (after byte_offset); we need to consider the alignment properties of these fields. Specifically, the version should align to 32 or 64 bits.

Putting them after the main struct will likely help alignment of the current fields as well.

This will also make followup compiler support easier

Done, thanks!
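To make the layout concern concrete, here is a minimal ctypes sketch of appending a version struct after byte_offset. The legacy field list follows DLTensor as published in dlpack.h, the two uint8_t members of DLPackVersion are taken from the quoted diff, and the name DLTensorWithVersion is hypothetical. ctypes follows the platform C compiler's default layout rules, so the sizes it reports are platform-dependent.

```python
import ctypes


class DLDevice(ctypes.Structure):
    _fields_ = [("device_type", ctypes.c_int), ("device_id", ctypes.c_int)]


class DLDataType(ctypes.Structure):
    _fields_ = [
        ("code", ctypes.c_uint8),
        ("bits", ctypes.c_uint8),
        ("lanes", ctypes.c_uint16),
    ]


class DLPackVersion(ctypes.Structure):
    # Field types as in the quoted diff (two uint8_t members).
    _fields_ = [("dlpack", ctypes.c_uint8), ("abi", ctypes.c_uint8)]


LEGACY_FIELDS = [
    ("data", ctypes.c_void_p),
    ("device", DLDevice),
    ("ndim", ctypes.c_int),
    ("dtype", DLDataType),
    ("shape", ctypes.POINTER(ctypes.c_int64)),
    ("strides", ctypes.POINTER(ctypes.c_int64)),
    ("byte_offset", ctypes.c_uint64),
]


class DLTensor(ctypes.Structure):
    _fields_ = LEGACY_FIELDS


class DLTensorWithVersion(ctypes.Structure):
    # Hypothetical name: legacy fields first, new field appended at the end.
    _fields_ = LEGACY_FIELDS + [("version", DLPackVersion)]


def offsets(struct_type):
    """Map each field name to its byte offset inside struct_type."""
    return {name: getattr(struct_type, name).offset for name, _ in struct_type._fields_}


legacy = offsets(DLTensor)
extended = offsets(DLTensorWithVersion)

# Appending keeps every legacy field at its old offset ...
prefix_unchanged = all(extended[name] == off for name, off in legacy.items())
# ... but the struct itself grows, which matters to anything embedding it by value.
print(prefix_unchanged, ctypes.sizeof(DLTensor), ctypes.sizeof(DLTensorWithVersion))
print("alignment of DLPackVersion:", ctypes.alignment(DLPackVersion))
```

With two uint8_t members the version struct only has byte alignment, so any 32- or 64-bit alignment would have to come from wider field types or explicit padding.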

include/dlpack/dlpack.h

leofang · 2022-03-25T15:13:12Z

docs/source/python_spec.rst

+The producer must set the ``PyCapsule`` name to ``"dltensor"`` if ABI
+version 1 is requested and ``"versioned_dltensor"`` if ABI version >= 2


I asked in #98 (review) and I will ask again: With all these changes flush in, shouldn't we bump the DLPack version at least one more time to include #98 #100 before breaking ABIs? Otherwise, ABI version is never set to 1 in any outside work; it only lives in the dev branch here, and this doc change doesn't really make much sense.

Also, in the release note we can warn downstream that in the next release we'll introduce ABI breaking changes.

Apparently this is the third time I am asking the same question: #34 (comment). Please, can we get it addressed? @rgommers @tqchen @tirthasheshpatel

> I asked in #98 (review) and I will ask again: With all these changes flush in, shouldn't we bump the DLPack version at least one more time to include #98 #100 before breaking ABIs? Otherwise, ABI version is never set to 1 in any outside work; it only lives in the dev branch here, and this doc change doesn't really make much sense.

Oh, yes, thanks for noticing! I will propose a PR bumping the DLPack version and rebase here once that's merged.

> Apparently this is the third time I am asking the same question: #34 (comment).

Sorry for not answering sooner!

I have bumped the DLPack version here itself since it makes sense to upgrade both the DLPack and ABI version for an ABI break. Sounds good?

leofang · 2022-03-25T15:25:21Z

docs/source/python_spec.rst

+* ``__dlpack__`` should accept a ``version`` keyword (a Python tuple
+  ``(dlpack_version, dlpack_abi_version)``) which is set to ``None`` by default.
+  Consumers can use this kwarg to request certain versions of DLPack. If
+  ``version=None`` or the ABI version 1 is requested:
+
+  * a capsule named ``"dltensor"`` which uses the old ABI should be returned
+    (if the producer wants to keep supporting it) or
+  * a ``BufferError`` should be raised (if the producer doesn't want to keep
+    support for the old ABI)


I am sorry, but I think we are moving too fast in this PR. Removing __dlpack_info__ is a design flaw (see also #34 (comment)): Without the producer providing this info upon the consumer's request, how does the consumer know

* the max version supported by the producer (without a try-except loop until no error)?
* the sensible combination of dlpack_version and dlpack_abi_version? For example, I can pass meaningless combos like (0.5, 2) (such a combo does not exist).

Shouldn't the burden fall on the producer, since they know best? Maybe we should move the discussion back to #34?

wjakob · 2022-04-02T10:25:30Z

An ABI break may also be an opportunity to reexamine the signed/unsignedness of all DLTensor members (PR #102).

tirthasheshpatel · 2022-04-04T06:39:42Z

Thanks for the help so far everyone! I think the discussion on #34 has settled on adding the __dlpack_info__ method, which returns the max version supported by the producer. I have updated the PR with the new spec (that includes __dlpack_info__). I haven't changed capsule renaming yet. @leofang Let us know if renaming the capsule makes sense to you when the new ABI is exported (as mentioned in #34, it provides some extra protection without breaking backward compatibility).

mattip · 2022-04-04T06:59:05Z

I originally was in favor of capsule renaming, but I think it is not needed given the consensus around the protocol. While it does provide another layer of safety, I think it is redundant.

wjakob · 2022-04-04T07:48:14Z

There are still a number of things I find confusing about the Python specification. Some of them are related to this specific PR. Some of them are things that are unclear in general.

The stream argument of __dlpack__() can presumably be used to pass a CUDA stream object. Is the interface specific to CUDA? If so, the documentation should probably say so.

CUDA streams are not a natural representation in CUDA. Even in C++, they are simply a pointer to an opaque NVIDIA data structure. Are they passed as capsules, uintptr_t cast to a python int, etc.? The documented interface lacks type annotations to make this clear.

The document mentions a new __dlpack_info__ function but does not provide a clear type-checkable signature of this new function. This is a recipe for inconsistencies. I recommend providing an example raw python version with MyPy-style type signature as an implementation guide.
What are the requirements on the version attribute of __dlpack__? Is it a Python int? If so, this could be stated in the type signature.

tirthasheshpatel · 2022-04-04T15:53:25Z

> The document mentions a new __dlpack_info__ function but does not provide a clear type-checkable signature of this new function. This is a recipe for inconsistencies. I recommend providing an example raw python version with MyPy-style type signature as an implementation guide.

> What are the requirements on the version attribute of __dlpack__? Is it a Python int? If so, this could be stated in the type signature.

Yeah, making the signatures explicit should make it easier to understand the spec, thanks.
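As a starting point for such explicit signatures, here is a minimal sketch using typing.Protocol. It is not the signature adopted by the spec: the protocol name, the int types inside the version tuple (chosen to mirror the uint8_t fields in the quoted diff), the return type of __dlpack_info__, and the value (70, 2) are all assumptions made for illustration.

```python
from typing import Any, Optional, Protocol, Tuple, runtime_checkable

# Assumed shape of the tuple: (dlpack_version, dlpack_abi_version), both ints.
Version = Tuple[int, int]


@runtime_checkable
class SupportsVersionedDLPack(Protocol):
    """Hypothetical protocol describing a producer that can report its version."""

    def __dlpack__(
        self, stream: Optional[int] = None, version: Optional[Version] = None
    ) -> Any:  # a PyCapsule in a real implementation
        ...

    def __dlpack_device__(self) -> Tuple[int, int]:  # (device_type, device_id)
        ...

    def __dlpack_info__(self) -> Version:  # assumed: max version the producer supports
        ...


class LegacyArray:
    """Toy producer that only knows the pre-versioning protocol."""

    def __dlpack__(self, stream=None):
        return object()  # placeholder for a capsule

    def __dlpack_device__(self):
        return (1, 0)  # kDLCPU, device 0


class VersionedArray(LegacyArray):
    """Toy producer that also implements the proposed additions."""

    def __dlpack__(self, stream=None, version=None):
        return object()  # placeholder for a capsule

    def __dlpack_info__(self):
        return (70, 2)  # illustrative values only


print(isinstance(VersionedArray(), SupportsVersionedDLPack))
print(isinstance(LegacyArray(), SupportsVersionedDLPack))
```

Note that a runtime isinstance check against a Protocol only verifies that the methods exist; checking the argument and return types still needs a static checker such as MyPy.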

> The stream argument of __dlpack__() can presumably be used to pass a CUDA stream object. Is the interface specific to CUDA? If so, the documentation should probably say so.

stream is an integer; the ID of the GPU device where the data is present. It is specific to CUDA and ROCm (and the devices that use a stream mechanism). This is explicitly stated in the semantics section of the python specification. Is it not clear enough?

tirthasheshpatel · 2022-04-05T12:10:04Z

> I originally was in favor of capsule renaming, but I think it is not needed given the consensus around the protocol. While it does provide another layer of safety, I think it is redundant.

Okay, I have removed capsule renaming now.

Also, I have added release notes and updated the DLPack diagram with the new structs. This PR is ready for review again from my side.

tqchen · 2022-04-05T12:58:58Z

@tirthasheshpatel I just realized that we checked in the png image to the repo in a previous patch. Is it possible to check in the image to a different location/repo? On one hand we would like to keep the repo self-contained; on the other hand, it can be a bit annoying for a repo to contain a rich history of binaries through multiple revisions.

One approach is to send a PR to a separate repo instead (we used https://github.com/dmlc/web-data for some of those purposes) and then link to it via https://gitcontent

tirthasheshpatel · 2022-04-05T13:02:13Z

> One approach is to send a PR to a separate repo instead (we used https://github.com/dmlc/web-data) for some of that purposes then link to via https://gitcontent

Sounds good! I will remove the image here and propose a PR on dmlc/web-data.

tqchen · 2022-04-05T13:40:05Z

#104

mattip · 2022-04-05T19:13:59Z

docs/source/python_spec.rst

+doesn't support any version below the producer's maximum version, a
+``BufferError`` should be raised. Similarly, If the producer doesn't
+support the requested version, it should raise a ``BufferError``.
+


What happens if the consumer does not specify a version? As I understand things, for the foreseeable future the producer should return a V1 structure. So effectively the default is 1.

> What happens if the consumer does not specify a version? As I understand things, for the foreseeable future the producer should return a V1 structure. So effectively the default is 1.

Yes, the default is 1. I will update the spec to say that instead.
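The behaviour discussed in this thread can be sketched as follows. This is a toy model under stated assumptions, not the reference implementation: ToyProducer, import_with_negotiation, the supported_abis attribute, and the version values are hypothetical, and a dict stands in for the real PyCapsule.

```python
from typing import Optional, Tuple

Version = Tuple[int, int]  # assumed (dlpack_version, dlpack_abi_version)


class ToyProducer:
    """Hypothetical producer; returns a placeholder instead of a real PyCapsule."""

    max_version: Version = (70, 2)  # illustrative values only
    supported_abis = (1, 2)

    def __dlpack_info__(self) -> Version:
        return self.max_version

    def __dlpack__(self, stream: Optional[int] = None, version: Optional[Version] = None):
        # No version requested: fall back to the old ABI, i.e. the default is 1.
        abi = 1 if version is None else version[1]
        if abi not in self.supported_abis:
            raise BufferError(f"ABI version {abi} is not supported by this producer")
        return {"abi": abi}  # stand-in for the capsule


def import_with_negotiation(producer, consumer_max_abi: int):
    """Consumer side: request the newest ABI that both sides understand."""
    info = getattr(producer, "__dlpack_info__", None)
    if info is None:
        return producer.__dlpack__()  # producer predates version negotiation
    dlpack_version, producer_abi = info()
    abi = min(producer_abi, consumer_max_abi)
    return producer.__dlpack__(version=(dlpack_version, abi))


print(ToyProducer().__dlpack__())                     # no version given
print(import_with_negotiation(ToyProducer(), 1))      # old consumer
print(import_with_negotiation(ToyProducer(), 5))      # consumer newer than producer
```

Because the consumer asks for the maximum first, a single call is enough and no try-except loop over candidate versions is needed.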

wjakob · 2022-04-05T20:22:49Z

> stream is an integer; the ID of the GPU device where the data is present. It is specific to CUDA and ROCm (and the devices that use a stream mechanism). This is explicitly stated in the semantics section of the python specification. Is it not clear enough?

I actually find this part of the documentation quite confusing (That said, I am only familiar with the CUDA part and can't speak about ROCm).

There are three concepts in CUDA that play a role here:

1. Each compatible graphics card/accelerator in a machine is given a CUDA device ID.
2. An application can talk to such a CUDA device by creating a CUDA context (details). The CUDA runtime API creates a default context (primary context), but this is not necessarily the only one. An application could in principle create multiple contexts to launch applications on the same GPU, each with its own virtual address space on the GPU (meaning that a memory allocation in one context cannot be accessed by another context).
3. In each context, the application can create multiple CUDA streams (details). A CUDA stream has the purpose of ordering kernel launches with respect to each other. It does not play a role with regard to the residency of memory allocations. (Although, to make this even more complicated, the latest CUDA versions now also have what is called stream-ordered memory allocations, but I digress...)

Of these three, only the CUDA device ID is representable as an integer. Both CUDA context and CUDA stream are represented by opaque handles to proprietary data structures, in both the CUDA runtime and driver APIs. Essentially a void* pointer.

I think there is a fundamental confusion between the word "stream" and "CUDA context" or "CUDA device ID" in the DLPack documentation. I suspect that you will want to call this "device ID" and not "stream", and also establish the convention that CUDA memory allocations are assumed to be located in the primary context associated with that CUDA device ID.

There is one situation in which the notion of a stream would make sense to me: if the tensor is not yet computed and the __dlpack__ function needs to perform a kernel launch to actually compute the data that will then be wrapped. In that case, we might want this computation to be ordered with respect to other computation performed by the caller (which might be issued to the same context+stream pair). However, in this case, providing the stream as an int would not be a suitable interface.
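The difference between a small integer ID and an opaque handle can be illustrated without CUDA. The sketch below uses ctypes only; FakeStream is a stand-in for a vendor-owned struct and is not a real CUDA type, and the sketch makes no claim about what the DLPack spec requires. It only shows that a void*-style handle can be carried through Python as a pointer-sized integer (the uintptr_t convention mentioned earlier in the thread) and recovered on the other side.

```python
import ctypes


class FakeStream(ctypes.Structure):
    """Stand-in for an opaque vendor struct; not a real CUDA type."""

    _fields_ = [("_opaque", ctypes.c_byte * 16)]


device_id = 0  # a device ID is just a small integer index

stream_obj = FakeStream()
handle = ctypes.pointer(stream_obj)  # plays the role of the opaque stream handle

# Pass the handle across a Python API boundary as a pointer-sized integer.
handle_as_int = ctypes.cast(handle, ctypes.c_void_p).value

# The receiving side turns the integer back into a typed pointer.
restored = ctypes.cast(ctypes.c_void_p(handle_as_int), ctypes.POINTER(FakeStream))

same_object = ctypes.addressof(restored.contents) == ctypes.addressof(stream_obj)
print(device_id, hex(handle_as_int), same_object)
```

An integer used this way is an address, not an index, which is why a type annotation or a sentence in the documentation is needed to tell the two apart.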

tqchen · 2022-04-05T20:55:14Z

include/dlpack/dlpack.h

+  DLPackVersion version;
+  /*! \brief Mark the data readonly. */
+  uint8_t readonly;
+} DLTensorVersioned;


Actually, one thing that might be useful is to think about how to minimize the change. Specifically, in light of S0 in #104, we may not need to introduce a separate versioned struct.

The old framework can still use the DLTensor (as implemented in the versioned version) as it is, as long as there is no reliance on sizeof(DLTensor). Perhaps we can rename the old one as DLTensorLegacy. In a similar spirit, it is safe to static_cast a DLTensor* to DLTensorLegacy*; as a result, we do not need to introduce two versions of DLManagedTensor.

> The old framework can still use the DLTensor(as implemented in the versioned version) as it is, as long as there is no reliance of sizeof(DLTensor). Perhaps we can rename the old one as DLTensorLegacy; In the similar spirit, it is safe to static_cast a DLTensor* to DLTensorLegacy*, as a result, we do not need to introduce two versions of DLManagedTensor

But DLManagedTensor doesn't have a pointer to the DLTensor struct. Instead, it is just a plain DLTensor dl_tensor field. So, changing the sizeof(DLTensor) will change alignment and size of DLManagedTensor. Which is why we need to rename both structs. I also considered renaming the old structs to DLTensorLegacy and DLManagedTensorLegacy but it seems equivalent to instead add new structs with a different name (new implementations will be able to see both and can use the struct they want to export...)

Ah sorry, you are indeed right. This would make the change ABI breaking. It would also make future ABI changes hard, because all of them would be non-backward-compatible due to the change of the deleter offset. It would be great to have more deliberation on this topic.

One possible choice is to append the other flags (version, read_only) to DLManagedTensor instead. This of course won't be ideal in case we want to change fields in DLTensor, but it would maintain backward compatibility.

Doesn't this just push the problem one abstraction layer higher? Now DLManagedTensor has a breaking ABI change.

Ahh, I see. If we move them to DLManagedTensor, we can ensure the location of the deleter function is still at the same offset. Then the code

      DLManagedTensorVersioned *managed = (DLManagedTensorVersioned *)PyCapsule_GetPointer(self, "dltensor");
      if (managed->deleter) {
          managed->deleter(managed);

Although not ideal, I too think it'd be better to add the new fields to the DLManagedTensor instead. This would eliminate the requirement of the *Versioned structs and also make it easier to implement support for the new ABI.
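The offset argument in this thread can be checked with a small ctypes sketch. The legacy layout follows DLTensor and DLManagedTensor as published in dlpack.h; the new fields follow the quoted diff. DLManagedTensorAppended is a hypothetical name for the "append to DLManagedTensor" option, and the deleter is simplified to take a void* argument. Exact offsets depend on the platform, so only the comparisons matter.

```python
import ctypes


class DLDevice(ctypes.Structure):
    _fields_ = [("device_type", ctypes.c_int), ("device_id", ctypes.c_int)]


class DLDataType(ctypes.Structure):
    _fields_ = [("code", ctypes.c_uint8), ("bits", ctypes.c_uint8), ("lanes", ctypes.c_uint16)]


class DLPackVersion(ctypes.Structure):
    _fields_ = [("dlpack", ctypes.c_uint8), ("abi", ctypes.c_uint8)]


TENSOR_FIELDS = [
    ("data", ctypes.c_void_p),
    ("device", DLDevice),
    ("ndim", ctypes.c_int),
    ("dtype", DLDataType),
    ("shape", ctypes.POINTER(ctypes.c_int64)),
    ("strides", ctypes.POINTER(ctypes.c_int64)),
    ("byte_offset", ctypes.c_uint64),
]
NEW_FIELDS = [("version", DLPackVersion), ("readonly", ctypes.c_uint8)]
DELETER = ctypes.CFUNCTYPE(None, ctypes.c_void_p)  # simplified deleter signature


class DLTensor(ctypes.Structure):
    _fields_ = TENSOR_FIELDS


class DLTensorVersioned(ctypes.Structure):
    _fields_ = TENSOR_FIELDS + NEW_FIELDS


class DLManagedTensor(ctypes.Structure):
    _fields_ = [("dl_tensor", DLTensor), ("manager_ctx", ctypes.c_void_p), ("deleter", DELETER)]


class DLManagedTensorVersioned(ctypes.Structure):
    # Option A: new fields live inside the embedded (by-value) DLTensor.
    _fields_ = [("dl_tensor", DLTensorVersioned), ("manager_ctx", ctypes.c_void_p), ("deleter", DELETER)]


class DLManagedTensorAppended(ctypes.Structure):
    # Option B (hypothetical name): new fields appended after the deleter.
    _fields_ = [("dl_tensor", DLTensor), ("manager_ctx", ctypes.c_void_p), ("deleter", DELETER)] + NEW_FIELDS


legacy_off = DLManagedTensor.deleter.offset
versioned_off = DLManagedTensorVersioned.deleter.offset
appended_off = DLManagedTensorAppended.deleter.offset

print("deleter offsets:", legacy_off, versioned_off, appended_off)
```

Under option A the deleter moves, so code compiled against the old struct would call through the wrong slot; under option B it stays where legacy code expects it, at the cost of keeping the new flags outside DLTensor.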

mattip · 2022-04-07T12:44:56Z

docs/source/python_spec.rst

+   on the array object, which will be called from within ``from_dlpack``,
+   to access the data, to get the maximum supported DLPack version, and
+   to query what device the array is on (may be needed to pass in the
+   correct stream, e.g. in the case of multiple GPUs).


Can you break this into 3 separate points? They can be subpoints of 2

mattip · 2022-04-07T12:53:11Z

docs/source/python_spec.rst

@@ -96,7 +112,7 @@ C API which is called either when the refcount on the capsule named
      PyObject *type, *value, *traceback;
      PyErr_Fetch(&type, &value, &traceback);

-      DLManagedTensor *managed = (DLManagedTensor *)PyCapsule_GetPointer(self, "dltensor");
+      DLManagedTensorVersioned *managed = (DLManagedTensorVersioned *)PyCapsule_GetPointer(self, "dltensor");


How can we distinguish between these two cases? For legacy capsules, we must cast to the older struct, right?

> How can we distinguish between these two cases? For legacy capsules, we must cast to the older struct, right?

The offset of dl_tensor.device doesn't change with this ABI breaking change (and we also don't anticipate such a change in the future). So, casting it to either struct and accessing the field should work.

I think the location of the deleter function called below does change.

> I think the location of the deleter function called below does change.

Oh, sorry. I misunderstood: I thought you were talking about the __dlpack_device__ method. We can have different deleter functions (one for each supported ABI). Not the cleanest way, but can be done using templating.

Ahh, this is code created by the producer, not the consumer. When the consumer calls obj.__dlpack__, the producer creates a capsule, and the capsule's deleter function consists of the code here.

docs/source/python_spec.rst

Co-authored-by: Matti Picus <matti.picus@gmail.com>

tqchen · 2023-01-05T18:09:12Z

closed in favor of #113

We would like to thank @tirthasheshpatel for the effort put into bringing this change forward.

Add version info, readonly, endianness, and alignment fields in DLTensor

efbf825

tirthasheshpatel mentioned this pull request Mar 25, 2022

Future ABI compatibility #34

Closed

tirthasheshpatel commented Mar 25, 2022

tqchen reviewed Mar 25, 2022

Add DLPackVersion and remove alignment and endianness

3d35d82

leofang reviewed Mar 25, 2022

leofang suggested changes Mar 25, 2022

Add versioned structs and update spec

a96ada5

tirthasheshpatel changed the title ~~[ABI break] Add version info, readonly, endianness, and alignment fields in DLTensor~~ [ABI break] Add new structs with version info and readonly flag Apr 4, 2022

Remove capsule renaming, add release notes, update disgram

165f09f

tirthasheshpatel mentioned this pull request Apr 5, 2022

Add DLPack diagram dmlc/web-data#268

Merged

tqchen mentioned this pull request Apr 5, 2022

[DISCUSS][RFC] DLPack Versioning and ABI Update #104

Open

mattip reviewed Apr 5, 2022

tqchen requested changes Apr 5, 2022

mattip reviewed Apr 7, 2022

Update docs/source/python_spec.rst

56b1128

Co-authored-by: Matti Picus <matti.picus@gmail.com>

tirthasheshpatel mentioned this pull request Apr 15, 2022

[ABI break] Append new fields to DLManagedTensor #105

Closed

leofang mentioned this pull request May 24, 2022

Request to release 0.7 #107

Closed

tqchen closed this Jan 5, 2023
