Don't enable SimpleCov in CI #125

Merged
1 commit merged into karafka:main
Aug 4, 2020

geoff2k
Contributor

@geoff2k geoff2k commented Jul 5, 2020

While working on #123, a number of my builds segfaulted and failed during their runs under Travis (example 1). I investigated and noticed that segfaults also occur sporadically in other builds that did not contain my code: example 2, example 3.

I have been unable to reproduce the segfaults locally under any of the versions of Ruby we test with. 😞

Per Google searches, SimpleCov has been anecdotally linked to segmentation faults in CI builds, so this PR checks for the well-known environment variable CI (set by both Travis and CircleCI) and doesn't enable SimpleCov if it is present.

If this does not solve the issue, we may need to use something like travis-codedump to diagnose the segfaults.
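
A minimal sketch of the check described above, as it might appear in a spec helper. This is an illustration and not necessarily the exact code in this PR: the helper name `coverage_enabled?` is hypothetical, and treating an empty `CI` value as "not set" is an assumption.

```ruby
# Hypothetical helper (not part of the PR): returns true when coverage
# should be collected, i.e. when the CI environment variable is absent.
# Assumption: an empty CI value is treated the same as an unset one.
def coverage_enabled?(env = ENV)
  ci = env["CI"]
  ci.nil? || ci.empty?
end

if coverage_enabled?
  begin
    # Only load and start SimpleCov outside of CI.
    require "simplecov"
    SimpleCov.start
  rescue LoadError
    # Keep the sketch runnable on machines without the gem installed.
    warn "simplecov is not installed; skipping coverage"
  end
end
```

Passing the environment in as an argument keeps the decision easy to check in isolation, for example `coverage_enabled?({ "CI" => "true" })`.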

Per Google searches, SimpleCov seems to be anecdotally related to
segmentation faults experienced running builds in CI.

This checks for a well-known environment variable (supported under
both Travis and CircleCI) and doesn't enable SimpleCov if present.
@mensfeld
Member

mensfeld commented Jul 5, 2020

I have a similar problem - but that is not because of SimpleCov. That's due to some issues with FFI and Ruby's GC. There were some workarounds suggested (including by me), but they don't seem to work in all scenarios.

@geoff2k
Contributor Author

geoff2k commented Jul 6, 2020

@mensfeld In your estimation, are these worth trying to track down? In other words, while the failures in CI are annoying, I'd be more willing to ignore them if we knew they didn't point to some problem with the code under test, if that makes sense.

@Adithya-copart
Contributor

@geoff2k I ran into similar issues when open sockets were closed during the shutdown sequence.

See #108 (comment) and CI run.

I ended up calling close in the specs where sockets were left open after some debugging at that time.
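
A rough sketch of that cleanup idea, assuming nothing about the actual specs: `FakeClient` is a hypothetical stand-in for a real producer or consumer, and `with_client` is an illustrative helper name. The point is only that `close` runs even when the example body raises.

```ruby
# Hypothetical stand-in for a client that holds an open socket.
class FakeClient
  def initialize
    @closed = false
  end

  # Releases the underlying resource.
  def close
    @closed = true
  end

  def closed?
    @closed
  end
end

# Yields the client and always closes it afterwards,
# even if the block raises an error.
def with_client(client)
  yield client
ensure
  client.close
end

client = FakeClient.new
with_client(client) { |c| c.closed? }
```

In an RSpec suite, the same cleanup would more typically live in an `after` hook that calls `close` on whatever the example opened.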

@mensfeld
Member

mensfeld commented Jul 7, 2020

> @mensfeld In your estimation, are these worth trying to track down? In other words, while the failures in CI are annoying, I'd be more willing to ignore them if we knew they didn't point to some problem with the code under test, if that makes sense.

The question here is: will it occur in production? And the answer is: I don't know. I took several countermeasures to make sure it won't happen in production (in Waterdrop). I tackled the majority of cases in specs as well, but there's still something off.

I've been running Waterdrop 2.0 (with rdkafka) at a scale of up to 10k rq/s, with auto-scaling on AWS and other things that could potentially trigger this, as well as with instrumentation to detect something like that upon process exit. For the past several months it has never occurred anywhere beyond CI.

However, I still see this as an issue worth investigating.

@tombruijn tombruijn changed the base branch from master to main July 8, 2020 09:33
@thijsc
Collaborator

thijsc commented Aug 4, 2020

> I've been running Waterdrop 2.0 (with rdkafka) in scale of up to 10k rq/s with auto-scaling on aws and things that could potentially trigger this as well as with instrumentation to detect something like that upon process exit and for the past several months it never occurred anywhere beyond CI.
> However, I still see this as an issue worth investigating.

Same for us, we run this in production and have never once seen it happen. Still unsure why the spec context matters so much. Probably because in production you never close and recreate consumers/producers.

@thijsc thijsc merged commit 33aa3ba into karafka:main Aug 4, 2020
@geoff2k geoff2k deleted the gt/disable_simplecov_in_ci branch September 18, 2020 22:56