
[REVIEW] Add cuda_event type #870

Open · wants to merge 11 commits into base: branch-23.02
Conversation


@achirkin achirkin commented Sep 15, 2021

  1. Add new cuda_event and cuda_event_view wrappers, similar to cuda_stream and cuda_stream_view.
  2. Add extra functionality to cuda_stream and cuda_stream_view to interact with the added event types.

@achirkin achirkin requested review from a team as code owners September 15, 2021 08:27
@github-actions github-actions bot added CMake cpp Pertains to C++ code labels Sep 15, 2021
@achirkin (Author)

A few days ago @jrhemstad mentioned you'd welcome this wrapper. I tried to follow the cuda_stream approach as closely as possible, so there is some copy-pasting.

@jrhemstad (Contributor) left a comment

My feeling is that both wait and record should actually be members of cuda_stream_view. In thinking of member functions as verbs, a stream waits on an event or a stream records an event. The relationship of the thing doing the action and the thing being acted on feels inverted to make these members of the event.

@caryr35 caryr35 added this to PR-WIP in v21.10 Release via automation Sep 15, 2021
@caryr35 caryr35 moved this from PR-WIP to PR-Needs review in v21.10 Release Sep 15, 2021
@achirkin achirkin changed the title [REVIEW] Add cuda_event type [WIP] Add cuda_event type Sep 15, 2021
@achirkin (Author)

Thanks, @jrhemstad! I wouldn't mind moving wait and record, but then we need to make a choice: shall I add another event_view, or just make duplicate methods for rmm::cuda_event and cudaEvent_t? There is also wait without arguments, which synchronizes with the host (cudaEventSynchronize) and which I cannot move to stream_view.

On the other hand, I can just change the wording to something like... synchronize/wait_by and push/record_in?

@jrhemstad (Contributor) commented Sep 15, 2021

Yes, I think we will need an event_view type.

And there is also wait without arguments, which synchronises with the host (cudaEventSynchronize), which I cannot move to stream_view.

Good point. We should keep that as a wait function in the event type.

@achirkin (Author) commented Sep 15, 2021

There would also be some inconvenience in having to explicitly convert stream types to stream_view to access its member functions if we move wait and record there, right? Still better than having them in cuda_event? :)

@jrhemstad (Contributor)

We should design for a world where cudaStream_t isn't used directly anymore.

Eventually these types will all exist in libcu++.

@achirkin achirkin changed the title [WIP] Add cuda_event type [REVIEW] Add cuda_event type Sep 16, 2021
@achirkin (Author)

Done. Sorry, I don't have the rights to set GitHub labels here.

@achirkin (Author) commented Sep 16, 2021

Also note that I added a function that doesn't strictly belong to this PR (I hope that's ok). I need it in cuml to decide whether I need one worker stream or two (when I cannot use the default). I thought it would be useful for others too:

/**
 * @brief Tells whether the viewed CUDA stream is implicitly synchronized with the given stream.
 *
 * This can happen, e.g., if the two views point to the same stream,
 * or sometimes when one of them is the legacy default stream.
 */
bool is_implicitly_synchronized(cuda_stream_view other) const
{
  // any stream is "synchronized" with itself
  if (value() == other.value()) return true;
  // legacy + blocking streams
  unsigned int flags = 0;
  if (is_default()) {
    RMM_CUDA_TRY(cudaStreamGetFlags(other.value(), &flags));
    if ((flags & rmm::STREAM_NON_BLOCKING) == 0) return true;
  }
  if (other.is_default()) {
    RMM_CUDA_TRY(cudaStreamGetFlags(value(), &flags));
    if ((flags & rmm::STREAM_NON_BLOCKING) == 0) return true;
  }
  return false;
}

Comment on lines 29 to 38
enum cuda_event_flags {
/** Default event flag. */
EVENT_DEFAULT = cudaEventDefault,
/** Event uses blocking synchronization. */
EVENT_BLOCKING_SYNC = cudaEventBlockingSync,
/** Event will not record timing data. */
EVENT_DISABLE_TIMING = cudaEventDisableTiming,
/** Event is suitable for interprocess use. cudaEventDisableTiming must be set. */
EVENT_INTERPROCESS = cudaEventInterprocess
};
Contributor

Come to think of it, I wonder if these things should be encoded in the type of the cuda_event/event_view. For example, the elapsed_time_since function currently accepts any event_view, but really it only works with an event that was created where cudaEventDisableTiming wasn't specified. That's the kind of thing that the type system should be used to enforce.

Author

Hmm, I'm personally not sure it's worth the cost. Lifting cudaEventDisableTiming into something like cuda_event_with_timing would be nice, but the same thing would cause problems for cuda_event_view: the exception from elapsed_time_since would move to the implicit conversion/constructor from cudaEvent_t. cudaEventBlockingSync does not seem to affect the API at all, so there is no point in lifting it. I'm not sure it makes sense for cudaEventInterprocess or future flags either, and it may seem illogical to a user that the meaning of some of the flags is duplicated in the type system.

Contributor

I was thinking something more like cuda_event<Properties...>.

So the API for elapsed_time_since would be something like:

template <typename... Properties>
float elapsed_time_since(event_view<Properties...> e)
{
  static_assert( /* Properties does not contain `cudaEventDisableTiming` */ );
}

cudaEventBlockingSync does not seem to affect the api at all, so there is no point in lifting it

Maybe. Maybe not though. I might want to specialize/overload a function for an event that is blocking vs. non-blocking.

Likewise with cudaEventInterprocess, I might write a function that requires the specified event to be capable of working with IPC. I don't believe there is even a way to query a cudaEvent_t after the fact to detect whether it was created with cudaEventInterprocess, so if a user passed in an incorrectly created event, the error likely wouldn't be detected until I attempted to use that event in another API like cudaIpcGetEventHandle.

My point is that there are a number of ways a user can misuse events. We can either detect those at runtime and throw exceptions, or we can detect them at compile time. Personally, I try to push as many errors to compile time as possible.

Thinking through all the complexities and corner cases like this is exactly why we shouldn't have to be designing these things ourselves and should be provided by CUDA, but I digress...

Member

Can you convince me, @jrhemstad that there is never a need to decide the properties of an event at runtime?

Contributor

Nope, I definitely can't. I think the right thing to do here is to use a pattern like std::span's.

A span can have a statically known size: std::span<int, 5> is a span of 5 ints.

Or it can have a dynamically known size: std::span<int>. The way they do this is with a special sentinel value for the second template argument that says "the size of this span is dynamic". It's called std::dynamic_extent, and it's the default value for the second template argument.

So we could use this same pattern for an event type where you could have:

namespace cuda_event {
struct timing {};
struct IPC {};
struct dynamic {
  cudaEventFlags f;
  dynamic(cudaEventFlags f) : f{f} {}
};
}  // namespace cuda_event

cuda_event<timing> e0;       // an event where timing hasn't been disabled
cuda_event<timing, IPC> e1;  // an event that supports timing and IPC
cuda_event<dynamic> e2{/*dynamic value*/};

This way you can have the best of both worlds. If I'm writing a function and want to statically declare that an event passed into it supports timing, I can specify cuda_event<timing, ...>. If I want to support an event with dynamic properties, I can just use cuda_event<dynamic>.

Don't get me wrong, it's a lot of machinery to make this work, but that's precisely why CUDA should be providing it :)

Author

I think having a variable number of template arguments would not be ideal, because it implies the markers can go in any order, and users would later have problems with cuda_event<timing, IPC> and cuda_event<IPC, timing> being different types. Perhaps I could force the order and other constraints using lots of static_asserts...

In general, I'm always up for more type safety. It just feels a bit foreign to me in rmm, where there are no similar constructs for streams (legacy/default/(non-)blocking) or stream pools (non-empty?:).

@achirkin (Author) · Sep 22, 2021

How about this, @jrhemstad ?

/** @brief An event view with flags provided at runtime. */
class cuda_event_view_ {
  ...
};

template <cuda_event_flags Flags = EVENT_DEFAULT>
class cuda_event_view : public cuda_event_view_ {
  ...
};

This way, cuda_event_view (without template arguments) defaults to the most commonly used, statically enforced cuda_event_view<EVENT_DEFAULT>, while it's still possible to set flags dynamically via cuda_event_view_. And since I use cuda_event_flags as the template argument, I can use the union operator (|) and not worry about the order in which the flags appear.

@harrism harrism added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change 2 - In Progress Currently a work in progress labels Sep 16, 2021
@harrism (Member) commented Sep 16, 2021

I think this still requires a bit of design work, and we are in burn down for 21.10 so I'm moving this to the next release. @achirkin please change the target branch.

@harrism harrism removed this from PR-Needs review in v21.10 Release Sep 16, 2021
@harrism harrism added this to PR-WIP in v21.12 Release via automation Sep 16, 2021
@achirkin (Author)

Ok, though I had hoped to get it into 21.10, because I need events in rapidsai/cuml#4201.

@harrism (Member) commented Sep 20, 2021

@achirkin you can use CUDA events now without a wrapper class. We already use events in RMM. I think we should take our time and get this right, and perhaps it should just live in libcu++ from the beginning.

@github-actions github-actions bot added gpuCI Python Related to RMM Python API labels Sep 29, 2021
@achirkin achirkin changed the base branch from branch-21.10 to branch-21.12 September 29, 2021 10:09
@achirkin achirkin requested review from a team as code owners September 29, 2021 10:09
@ajschmidt8 ajschmidt8 removed the request for review from a team September 30, 2021 13:17
@ajschmidt8 (Member)

Removing ops-codeowners from the required reviews since it doesn't seem there are any file changes that we're responsible for. Feel free to add us back if necessary.

@harrism harrism removed this from PR-WIP in v21.12 Release Nov 9, 2021
@harrism harrism added this to PR-WIP in v22.02 Release via automation Nov 9, 2021
@github-actions (bot)

This PR has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be labeled inactive-90d if there is no activity in the next 60 days.

@achirkin achirkin changed the base branch from branch-21.12 to branch-22.02 November 18, 2021 18:02
@harrism harrism removed this from PR-WIP in v22.02 Release Jan 11, 2022
@harrism harrism added this to PR-WIP in v22.04 Release via automation Jan 11, 2022
@github-actions (bot)

This PR has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates.

@harrism harrism removed this from PR-WIP in v22.04 Release Mar 14, 2022
@achirkin achirkin changed the base branch from branch-22.02 to branch-23.02 January 9, 2023 09:27
@github-actions github-actions bot removed the Python Related to RMM Python API label Jan 9, 2023