Fix Trace Cache Iteration Crash #692
Conversation
🦙 MegaLinter status: ❌ ERROR
See error details in the MegaLinter reports artifact on the CI job page.
I apologize that I didn't see this earlier. I had this one on my to-do list, but I didn't realize you'd already opened a PR for it until just now.

I don't believe this fix will work, since the crash occurred during the list() cast prior to the loop running. If the list casting didn't fix it, I don't believe a copy will fix it either. My comment on the issue was a bit rushed and not thought through: this type of fix would only work if the code inside the loop were modifying the _cache (and the list casting would also have fixed that kind of issue). Since the crash is still happening, I don't think that's what's going on here, nor do I see any code inside the loop that modifies the size of the _cache. I think the _cache is being modified in a separate thread, so the two threads are accessing it simultaneously.

I believe we need to add a lock here. I've opened a draft PR that does that: https://github.com/newrelic/newrelic-python-agent/pull/702/files. We can either open that draft PR, or modify this fix to match what's in that draft.
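To illustrate the failure mode being described, here is a minimal sketch (not the agent's actual code; `cache` and `locked_snapshot` are hypothetical names) showing that growing a dict while a live iterator exists raises `RuntimeError`, and the shape of the lock-based approach from the draft PR:

```python
import threading

cache = {i: f"trace-{i}" for i in range(3)}
lock = threading.Lock()

def unsafe_iter():
    # Simulate the race in a single thread: insert a new key while a
    # live iterator over the same dict is being advanced.
    for key in cache:
        cache[99] = "inserted"  # grows the dict mid-iteration

try:
    unsafe_iter()
    raised = False
except RuntimeError:  # "dictionary changed size during iteration"
    raised = True

def locked_snapshot():
    # Sketch of the lock-based approach: if writers also take this
    # lock, the dict cannot change size while we iterate it here.
    with lock:
        return list(cache.items())
```

With the lock held by both readers and writers, the snapshot is taken while any concurrent mutation is blocked, at the cost of serializing access to the cache.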
The issue with list() is that it isn't a cast: it iterates dict.items() into a list, then iterates that list. The crash has a smaller window in which to happen, but it can still occur if the change lands during the short time the list constructor is iterating over dict.items() (rather than during the longer-running for loop). That being said, I'm not opposed to using locking, but I wonder whether it causes performance issues, and it's certainly harder to maintain. I think the simpler solution may be more appropriate, considering this only affects event loop wait time and debug logic, as opposed to potentially impacting every access to the trace cache.
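A sketch of the narrowed-window argument above (`snapshot_then_iterate` and its arguments are illustrative, not code from the agent): once the list snapshot exists, the long-running loop iterates private data, so only the single list() call can still race with a writer.

```python
def snapshot_then_iterate(cache):
    # list() still iterates the live dict, so a concurrent resize can
    # only raise during this one call -- a much shorter window than a
    # long-running for loop over the dict itself.
    items = list(cache.items())
    seen = []
    for key, _trace in items:
        # Once the snapshot exists, mutating the original dict is
        # harmless to this loop.
        cache["concurrent_key"] = object()
        seen.append(key)
    return seen
```

For example, `snapshot_then_iterate({1: "a", 2: "b"})` returns `[1, 2]` even though the dict is mutated on every loop pass.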
Superseded by #704
Before contributing, please read our contributing guidelines and code of conduct.
Overview
Describe the changes present in the pull request
Related GitHub Issue
Include a link to the related GitHub issue, if applicable
Testing
The agent includes a suite of tests which should be used to
verify your changes don't break existing functionality. These tests will run with
GitHub Actions when a pull request is made. More details on running the tests locally can be found
here.
For most contributions it is strongly recommended to add additional tests which
exercise your changes.