
[REVIEW] Add cuda_event type #870

Open · wants to merge 11 commits into base: branch-23.02
Conversation


@achirkin achirkin commented Sep 15, 2021

  1. Add new cuda_event and cuda_event_view wrappers, similar to cuda_stream and cuda_stream_view.
  2. Add extra functionality to cuda_stream and cuda_stream_view to interact with the added event types.

@achirkin achirkin requested review from a team as code owners September 15, 2021 08:27
@github-actions github-actions bot added CMake cpp Pertains to C++ code labels Sep 15, 2021
@achirkin (Author)

A few days ago @jrhemstad mentioned you'd welcome this wrapper. I tried to follow the cuda_stream approach as closely as possible, so there is some copy-pasting.

@jrhemstad (Contributor) left a comment

My feeling is that both wait and record should actually be members of cuda_stream_view. In thinking of member functions as verbs, a stream waits on an event or a stream records an event. The relationship of the thing doing the action and the thing being acted on feels inverted to make these members of the event.

@caryr35 caryr35 added this to PR-WIP in v21.10 Release via automation Sep 15, 2021
@caryr35 caryr35 moved this from PR-WIP to PR-Needs review in v21.10 Release Sep 15, 2021
@achirkin achirkin changed the title [REVIEW] Add cuda_event type [WIP] Add cuda_event type Sep 15, 2021
@achirkin (Author)

Thanks, @jrhemstad! I wouldn't mind moving wait and record, but then we need to make a choice: shall I add another event_view, or just make duplicate methods for rmm::cuda_event and cudaEvent_t? There is also wait without arguments, which synchronizes with the host (cudaEventSynchronize) and which I cannot move to stream_view.

On the other hand, I can just change the wording to something like... synchronize/wait_by and push/record_in?

@jrhemstad (Contributor) commented Sep 15, 2021

Yes, I think we will need an event_view type.

And there is also wait without arguments, which synchronises with the host (cudaEventSynchronize), which I cannot move to stream_view.

Good point. We should keep that as a wait function in the event type.

@achirkin (Author) commented Sep 15, 2021

There would also be some inconvenience in having to explicitly convert stream types to stream_view to access its member functions if we move wait and record there, right? Still better than having them in cuda_event? :)

@jrhemstad (Contributor)

We should design for a world where cudaStream_t isn't used directly anymore.

Eventually these types will all exist in libcu++.

@achirkin achirkin changed the title [WIP] Add cuda_event type [REVIEW] Add cuda_event type Sep 16, 2021
@achirkin (Author)

Done. Sorry, I don't have the rights to set GitHub labels here.

@achirkin (Author) commented Sep 16, 2021

Also note that I added a function that doesn't strictly belong to this PR (I hope that's ok). I need it in cuml to decide whether I need one worker stream or two (when I cannot use the default). I thought it would be useful for others too:

/**
 * @brief Tells whether the viewed CUDA stream is implicitly synchronized with the given stream.
 *
 * This can happen, e.g., if the two views point to the same stream,
 * or sometimes when one of them is the legacy default stream.
 */
bool is_implicitly_synchronized(cuda_stream_view other) const
{
  // any stream is "synchronized" with itself
  if (value() == other.value()) return true;
  // legacy + blocking streams
  unsigned int flags = 0;
  if (is_default()) {
    RMM_CUDA_TRY(cudaStreamGetFlags(other.value(), &flags));
    if ((flags & rmm::STREAM_NON_BLOCKING) == 0) return true;
  }
  if (other.is_default()) {
    RMM_CUDA_TRY(cudaStreamGetFlags(value(), &flags));
    if ((flags & rmm::STREAM_NON_BLOCKING) == 0) return true;
  }
  return false;
}

Comment on lines 29 to 38
enum cuda_event_flags {
/** Default event flag. */
EVENT_DEFAULT = cudaEventDefault,
/** Event uses blocking synchronization. */
EVENT_BLOCKING_SYNC = cudaEventBlockingSync,
/** Event will not record timing data. */
EVENT_DISABLE_TIMING = cudaEventDisableTiming,
/** Event is suitable for interprocess use. cudaEventDisableTiming must be set. */
EVENT_INTERPROCESS = cudaEventInterprocess
};
Contributor

Come to think of it, I wonder if these things should be encoded in the type of the cuda_event/event_view. For example, the elapsed_time_since function currently accepts any event_view, but really it only works with an event that was created where cudaEventDisableTiming wasn't specified. That's the kind of thing that the type system should be used to enforce.

Author

Hmm, I'm personally not sure it's worth the cost. Lifting cudaEventDisableTiming into something like cuda_event_with_timing would be nice, but the same thing would cause problems for cuda_event_view: the exception from elapsed_time_since would move to the implicit conversion/constructor from cudaEvent_t. cudaEventBlockingSync does not seem to affect the API at all, so there is no point in lifting it. I'm not sure it makes sense for cudaEventInterprocess or future flags either, and it may seem illogical to a user that the meaning of some of the flags is duplicated in the type system.

Contributor

I was thinking something more like cuda_event<Properties...>.

So the API for elapsed_time_since would be something like:

template <typename... Properties>
float elapsed_time_since(event_view<Properties...> e)
{
  static_assert( /* Properties does not contain `cudaEventDisableTiming` */ );
}

cudaEventBlockingSync does not seem to affect the api at all, so there is no point in lifting it

Maybe. Maybe not though. I might want to specialize/overload a function for an event that is blocking vs. non-blocking.

Likewise with cudaEventInterprocess, I might write a function that requires the specified event to be capable of working with IPC. I don't believe there is even a way to query a cudaEvent_t after the fact to detect whether it was created with cudaEventInterprocess, so if a user passed in an incorrectly created event, the error likely wouldn't be detected until I attempted to use that event in another API like cudaIpcGetEventHandle.

My point is that there are a number of ways a user can misuse events. We can either detect those at runtime and throw exceptions, or we can detect them at compile time. Personally, I try to push as many errors to compile time as possible.

Thinking through all the complexities and corner cases like this is exactly why we shouldn't have to be designing these things ourselves and should be provided by CUDA, but I digress...

Member

Can you convince me, @jrhemstad that there is never a need to decide the properties of an event at runtime?

Contributor

Nope, I definitely can't. I think the right thing to do here is to use a pattern like std::span's.

A span can have a statically known size: std::span<int, 5> is a span of 5 ints.

Or it can have a dynamically known size: std::span<int>. The way they do this is with a special sentinel value for the second template argument that says "the size of this span is dynamic". It's called std::dynamic_extent, and it's the default value for the second template argument.

So we could use this same pattern for an event type where you could have:

namespace cuda_event {
struct timing {};
struct IPC {};
struct dynamic {
  cudaEventFlags f;
  dynamic(cudaEventFlags f) : f{f} {}
};
}  // namespace cuda_event

cuda_event<timing> e0;       // an event where timing hasn't been disabled
cuda_event<timing, IPC> e1;  // an event that supports timing and IPC
cuda_event<dynamic> e2{/*dynamic value*/};

This way you can have the best of both worlds. If I'm writing a function and want to statically declare that an event passed into it supports timing, I can specify cuda_event<timing, ...>. If I want to support an event with dynamic properties, I can just use cuda_event<dynamic>.

Don't get me wrong, it's a lot of machinery to make this work, but that's precisely why CUDA should be providing it :)

Author

I think having a variable number of template arguments would not be ideal, because it implies the markers can go in any order, and users would later have problems with cuda_event<timing, IPC> and cuda_event<IPC, timing> being different types. Perhaps I could force the order and other constraints using lots of static_asserts...

In general, I'm always up for more type safety. It just feels a bit foreign to me in rmm, where there are no similar constructs for streams (legacy/default/(non-)blocking) or stream pools (non-empty?:).

@achirkin (Author) · Sep 22, 2021

How about this, @jrhemstad ?

/** @brief An event view with flags provided at runtime. */
class cuda_event_view_ {
  ...
};

template <cuda_event_flags Flags = EVENT_DEFAULT>
class cuda_event_view : public cuda_event_view_ {
  ...
};

This way, cuda_event_view (without template arguments) defaults to the most commonly used, statically enforced cuda_event_view<EVENT_DEFAULT>, while it's still possible to set flags dynamically via cuda_event_view_. And since I use cuda_event_flags as the template argument, I can use the union operator (|) and not worry about the order in which the flags appear.

@harrism harrism added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change 2 - In Progress Currently a work in progress labels Sep 16, 2021
@harrism (Member) commented Sep 16, 2021

I think this still requires a bit of design work, and we are in burn down for 21.10 so I'm moving this to the next release. @achirkin please change the target branch.

@harrism harrism removed this from PR-Needs review in v21.10 Release Sep 16, 2021
@harrism harrism added this to PR-WIP in v21.12 Release via automation Sep 16, 2021
@achirkin (Author)

Ok, though I had hoped to get it into 21.10, because I need events in rapidsai/cuml#4201.

@harrism (Member) commented Sep 20, 2021

@achirkin you can use CUDA events now without a wrapper class. We already use events in RMM. I think we should take our time and get this right, and perhaps it should just live in libcu++ from the beginning.

@github-actions github-actions bot added gpuCI Python Related to RMM Python API labels Sep 29, 2021
@achirkin achirkin changed the base branch from branch-21.10 to branch-21.12 September 29, 2021 10:09
@achirkin achirkin requested review from a team as code owners September 29, 2021 10:09
@ajschmidt8 ajschmidt8 removed the request for review from a team September 30, 2021 13:17
@ajschmidt8 (Member)

Removing ops-codeowners from the required reviews since it doesn't seem there are any file changes that we're responsible for. Feel free to add us back if necessary.

@harrism harrism removed this from PR-WIP in v21.12 Release Nov 9, 2021
@harrism harrism added this to PR-WIP in v22.02 Release via automation Nov 9, 2021
@github-actions (bot)

This PR has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be labeled inactive-90d if there is no activity in the next 60 days.

@achirkin achirkin changed the base branch from branch-21.12 to branch-22.02 November 18, 2021 18:02
@harrism harrism removed this from PR-WIP in v22.02 Release Jan 11, 2022
@harrism harrism added this to PR-WIP in v22.04 Release via automation Jan 11, 2022
@github-actions (bot)

This PR has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates.

@harrism harrism removed this from PR-WIP in v22.04 Release Mar 14, 2022
@achirkin achirkin changed the base branch from branch-22.02 to branch-23.02 January 9, 2023 09:27
@github-actions github-actions bot removed the Python Related to RMM Python API label Jan 9, 2023