Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: possible state leak of buffer_count state between stackprof invocations #207

Open
dalehamel opened this issue Apr 17, 2023 · 2 comments

Comments

@dalehamel
Copy link

Currently the buffer is only reset after a sample is recorded:

// record the sample previously buffered by stackprof_buffer_sample
static void
stackprof_record_buffer(void)
{
stackprof_record_sample_for_stack(_stackprof.buffer_count, _stackprof.buffer_time.timestamp_usec, _stackprof.buffer_time.delta_usec);
// reset the buffer
_stackprof.buffer_count = 0;
}

However, when stopping stackprof, we may skip this:

stackprof_job_record_buffer(void *data)
{
if (!_stackprof.running) return;
stackprof_record_buffer();
}

This could leave a sample buffered, or for stackprof to believe it has a sample still buffered anyways. Since we never reset this counter between stackprof runs in stackprof_results, it is thus possible that this could bleed state into a future invocation of stackprof

At minimum, I think we must reset this counter as part of stackprof_results, following the existing idiom of resetting values there during reporting. A better behaviour would probably be to pop any buffered but not recorded data off as well, rather than discarding it outright

@dalehamel
Copy link
Author

I think this is perhaps a known issue, as documented here https://github.com/Shopify/app_profiler/blob/main/lib/app_profiler/profiler.rb#L66-L77

My theory is that because the postponed job is responsible for recording the buffer, it is possible there is a race between recording the buffer and summarizing the results.

Perhaps this issue can be solved by having a synchronization point that waits for all jobs to complete as part of stackprof_results?

@dalehamel
Copy link
Author

Perhaps the way to fix this conclusively is to have each postponed job increment a semaphore when enqueued, and decrement it when finished, then have stackprof_results poll until this equal 0. It might even suffice to poll on the existing buffer_count variable, if we can ensure that the postponed jobs never return early and always decrement it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant