Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EventEngine Test Suite: Timers #27496

Merged
merged 23 commits into from
Oct 4, 2021
Merged

EventEngine Test Suite: Timers #27496

merged 23 commits into from
Oct 4, 2021

Conversation

drfloob
Copy link
Member

@drfloob drfloob commented Sep 27, 2021

A reusable test suite for EventEngine implementations.

To exercise a custom EventEngine, simply link against :event_engine_test_suite
and provide a testing main function that sets a custom EventEngine factory:

#include "path/to/my_custom_event_engine.h"
#include "src/core/event_engine/test_suite/event_engine_test.h"

int main(int argc, char** argv) {
  ::testing::InitGoogleTest(&argc, argv);
  SetEventEngineFactory(
      []() { return absl::make_unique<MyCustomEventEngine>(); });
  auto result = RUN_ALL_TESTS();
  return result;
}

See the :simple_event_engine_test blaze target for an example.

@Vignesh2208 @dennycd

@ctiller
Copy link
Member

ctiller commented Sep 28, 2021

Watch out for backwards references here... there are build systems that will likely choke on this.

We may be better off with the following layout:

eventengine-tests:

  function<EventEngine*()> NewEventEngine();
  void SetEventEngineFactory(function<EventEngine*()>);

specific-event-engine-test:

  void main() {
    SetEventEngineFactory(MyEventEngineFactory);
    gtest::RunTheStuff();
  }

@drfloob
Copy link
Member Author

drfloob commented Sep 28, 2021

Thanks for the heads up about linker backreference issues. I don't think it's an issue in this case, since implementation-specific test suite targets will always have a one-way dependency on the :event_engine_test_suite target, and nothing else should depend on that :event_engine_test_suite target, leaving that method undefined. But it is a "clever" thing, which is debatably a bad idea. I've made the changes you suggested. It's a bit more code now (I didn't want an unmanaged global ptr that never gets deleted, for example), but it does the job fine.

@drfloob drfloob added area/test release notes: yes Indicates if PR needs to be in release notes labels Sep 29, 2021
@drfloob drfloob marked this pull request as ready for review September 29, 2021 00:31
drfloob added a commit to drfloob/grpc that referenced this pull request Sep 29, 2021
drfloob added a commit to drfloob/grpc that referenced this pull request Sep 29, 2021
std::atomic<int> call_count{0};
std::atomic<int> failed_timer_count{0};
absl::BitGen bitgen;
for (int i = 0; i < thread_count; ++i) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/timer_count/thread_count through here...

We probably want 100 threads creating 100 timers, for 10000 timers scheduled total. (i.e. 100 threads each run this loop).

This will help catch any threading issues with scheduling new timers too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added the 10k RunAt calls (split across 100 threads). As a result, I decided to delete the SimpleEventEngine example implementation rather than improve it, because TSAN was not able to deal with 10k threads in that naive implementation. The LibuvEventEngine PR will exercise this suite, and at the very least, the SimpleEventEngine implementation was helpful to verify the suite up to this point.

++count;
cv_.Signal();
});
engine->RunAt(absl::Now(), [&]() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be a source of flakiness: if we get pre-empted on line 88 for 1 second or more then this test will fail.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted on line 76. I considered removing this test, since it does require some timeout hackery which is inherently brittle, and could take a lot of time if it's too conservatively configured. We're already testing the timer's scheduling accuracy in StressTestTimersNotCalledBeforeScheduled, and ordering is not guaranteed if both timers come due at the same time (presuming some delay in clock evaluation). I'm thinking again maybe we remove this. WDYT?

ASSERT_FALSE(signaled_);
}

TEST_F(EventEngineTimerTest, TimersRespectScheduleOrdering) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would make a very good stress test too: have 100 threads do this for 100 different vectors (and 100 different mutexes), assert all vectors are ordered at the end.

(this could be later work)

ASSERT_TRUE(engine->Cancel(handle));
}

TEST_F(EventEngineTimerTest, CancelledCallbackIsNotExecuted) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would also make an excellent stress test

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

100 threads schedule infinite future callbacks, and then in phase 2 100 different threads cancel them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(again, future work suggestion)

Copy link
Member

@ctiller ctiller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some future work suggestions... I expect we should do something about TimersRespectScheduleOrdering potential flakiness, but I'm not sure it should block submission either.

@drfloob
Copy link
Member Author

drfloob commented Oct 4, 2021

(import notes) All tests had previously passed except for the ruby tests, which are failing due to an RVM bug. #27538

@drfloob drfloob merged commit cdf5965 into grpc:master Oct 4, 2021
@copybara-service copybara-service bot added the imported Specifies if the PR has been imported to the internal repository label Oct 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/test imported Specifies if the PR has been imported to the internal repository lang/core release notes: yes Indicates if PR needs to be in release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants