Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Java Source context to stack frames #46497

Merged
merged 18 commits into from
Apr 18, 2023
Merged

Conversation

adinauer
Copy link
Member

@adinauer adinauer commented Mar 29, 2023

Adds a new processor to the JavaPlugin called JavaSourceLookupStacktraceProcessor which looks up stack frames in available source bundles and tries to find the source code line plus some context lines.

JavaSourceLookupStacktraceProcessor wraps the existing JavaStacktraceProcessor as only the first processor that handles a certain frame is called but we need both of them to run. In case of obfuscated sources, JavaStacktraceProcessor deobfuscates them (possibly exploding a single frame into multiple) then we look up sources in JavaSourceLookupStacktraceProcessor.

We're using a new debug file type jvm (see getsentry/relay#2002 and getsentry/sentry-cli#1551). This causes tests to fail because other PRs haven't been merged and released yet.

getsentry/sentry-java#633

@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Mar 29, 2023
@codecov
Copy link

codecov bot commented Apr 13, 2023

Codecov Report

Merging #46497 (f09223f) into master (64b8c20) will increase coverage by 0.00%.
The diff coverage is 98.05%.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #46497   +/-   ##
=======================================
  Coverage   80.73%   80.74%           
=======================================
  Files        4761     4761           
  Lines      201288   201386   +98     
  Branches    11626    11626           
=======================================
+ Hits       162512   162608   +96     
- Misses      38518    38520    +2     
  Partials      258      258           
Impacted Files Coverage Δ
src/sentry/lang/java/plugin.py 98.31% <97.87%> (+2.91%) ⬆️
src/sentry/lang/java/utils.py 89.47% <100.00%> (+1.06%) ⬆️

... and 5 files with indirect coverage changes

@adinauer adinauer marked this pull request as ready for review April 13, 2023 07:30
@adinauer adinauer requested a review from a team April 13, 2023 07:30
@iker-barriocanal iker-barriocanal requested a review from a team April 13, 2023 10:21
@iker-barriocanal
Copy link
Member

This seems like stack trace processing so requesting a review from the processing team, as they own this part.

Copy link
Member

@Swatinem Swatinem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests are not really happy :-(

Comment on lines 223 to 235
# TODO unable to use dif cache as file can't be recognized as ZIP by ArtifactBundleArchive(file)
difs = ProjectDebugFile.objects.find_by_debug_ids(self.project, self.images)

for new_frame in new_frames:
lineno = new_frame.get("lineno")
if not lineno:
continue

source_file_name = self._build_source_file_name(new_frame)

for key, dif in difs.items():
file = dif.file.getfile(prefetch=True)
archive = ArtifactBundleArchive(file)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we do a database query for every frame, and then open up the ArtifactBundleArchive for at least every frame (plus, every inlined frame and every debug-id).

We should definitely do this in the constructor of this class, or lazily on-demand.
Are we guaranteed to only have a single debug-id by event? What would happen if we have more than one?

Copy link
Member Author

@adinauer adinauer Apr 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should definitely do this in the constructor of this class, or lazily on-demand.

Moved it to init

Are we guaranteed to only have a single debug-id by event? What would happen if we have more than one?

It should try all of them until it's able to find the source file. Can add a test for this case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we guaranteed to only have a single debug-id by event?

Not there could be multiple per event.


platform = frame.get("platform") or self.data.get("platform")
self._handles_frame = (
platform == "java" and self.available and "module" in frame and "lineno" in frame
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are looking at the lineno of the raw frame, before proguard mapping. Which means this is out of sync with whatever frames you will actually process in process_frame.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assumed that even with proguard if there's no line number in the first place we'd be unable to produce one via proguard mapping. No lineno means no source lookup possible. Maybe my assumption is wrong. I can remove the lineno check. Though the check here doesn't mean we use the "raw" lineno for lookup. In case JavaStacktraceProcessor (doing proguard mapping) remaps it, we take the remapped lineno. This behaviour is also tested by the last test.

src/sentry/lang/java/utils.py Outdated Show resolved Hide resolved
@@ -138,16 +138,23 @@ def __init__(self, *args, **kwargs):
self._handles_frame = None
self.images = get_jvm_images(self.data)
self.available = len(self.images) > 0
difs = ProjectDebugFile.objects.find_by_debug_ids(self.project, self.images)
self._archives = {}
for key, dif in difs.items():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you never use the key anywhere, so I think you can just remove it and use a list instead of a dict.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed it to a list

Copy link
Member

@antonpirker antonpirker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good and can bemerged IF:

  • the Query ProjectDebugFile.objects.find_by_debug_ids() is ok. (Read somewhere that because of hybrid cloud there a new API to use to query stuff, but I am not sure if this is a concern here. Someone with more knowledge of how all this works should be consulted about this)
  • it is OK to load one (or maybe multiple ArtifactBundleArchive(file) into memory on start of the JavaSourceLookupStacktraceProcessor. Again someone with more knowledge about how this is run should be asked if there is a concern of using too much memory here.

@Swatinem
Copy link
Member

  • it is OK to load one (or maybe multiple ArtifactBundleArchive(file) into memory on start of the JavaSourceLookupStacktraceProcessor. Again someone with more knowledge about how this is run should be asked if there is a concern of using too much memory here.

I believe that is fine. The JS processor and lookup API is also loading a couple of bundles at the same time.

src/sentry/lang/java/utils.py Outdated Show resolved Hide resolved
@adinauer adinauer merged commit edd5420 into master Apr 18, 2023
54 checks passed
@adinauer adinauer deleted the feat/java-source-context branch April 18, 2023 08:42
@github-actions github-actions bot locked and limited conversation to collaborators May 3, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Scope: Backend Automatically applied to PRs that change backend components
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants