Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Access embedded PortablePDB in DLL #752

Merged
merged 6 commits into from Feb 3, 2023

Conversation

vaind
Copy link
Collaborator

@vaind vaind commented Feb 1, 2023

replaces & closes #659

PE (a.k.a a .dll) may contain embedded Portable PDB (when built with <DebugType>embedded</DebugType>) as shown in the following screenshot from dotPeek
image

We need to access that but it seems it's currently impossible with the used PE reader, see the tracking issue: m4b/goblin#314

The metadata buffer is in ECMA 335 format and should contain one module record with
a Module Version ID that we are ultimately interested in.
@codecov
Copy link

codecov bot commented Feb 1, 2023

Codecov Report

Merging #752 (5b37e98) into master (498d9c5) will increase coverage by 0.05%.
The diff coverage is 79.10%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #752      +/-   ##
==========================================
+ Coverage   73.82%   73.88%   +0.05%     
==========================================
  Files          69       69              
  Lines       14884    14951      +67     
==========================================
+ Hits        10988    11046      +58     
- Misses       3896     3905       +9     

@vaind vaind changed the title feat: Support embedded PDB in DLL feat: Support embedded PortablePDB in DLL Feb 1, 2023
@vaind vaind marked this pull request as ready for review February 2, 2023 12:36
@vaind vaind requested a review from a team February 2, 2023 12:36
loewenheim
loewenheim previously approved these changes Feb 2, 2023
symbolic-debuginfo/src/pe.rs Outdated Show resolved Hide resolved
Copy link
Member

@Swatinem Swatinem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome work! Reminds me that I have m4b/goblin#319 just laying there since half a year :-D

symbolic-debuginfo/src/pe.rs Outdated Show resolved Hide resolved
symbolic-debuginfo/src/pe.rs Outdated Show resolved Hide resolved
symbolic-ppdb/tests/test_ppdb.rs Outdated Show resolved Hide resolved
symbolic-ppdb/tests/test_ppdb.rs Outdated Show resolved Hide resolved
@loewenheim loewenheim dismissed their stale review February 2, 2023 13:18

Missed some necessary changes

@Swatinem
Copy link
Member

Swatinem commented Feb 2, 2023

As for the Symbolicator integration, I believe there are two ways to approach this.

  1. As you mentioned, we will not use the PE itself anymore, so we can decide to "unpack" the Portable PDB when fetching the object here: https://github.com/getsentry/symbolicator/blob/9014ed70d127fb33d051b37efa781b01e7e598e2/crates/symbolicator-service/src/services/objects/data_cache.rs#L135-L172
    That way, we can use it "cheaply" for source lookups, and directly create a ppdb cache from it.
    The advantage here is that the source lookups and ppdb cache creation are unchanged, and fast (no decompression needed anymore). The disadvantage is that we throw away the "outer" PE.
  2. You could do the get_embedded_ppdb dance both in source lookup, and in ppdb conversion.
    This is a bit more work, and would still have to do decompression on every source lookup. But it has the advantage of keeping the original PE intact.

Given that we (at this time) are not using the outer PE for anything, I would recommend to just go with 1 and unpack directly when downloading.

@vaind vaind changed the title feat: Support embedded PortablePDB in DLL feat: Access embedded PortablePDB in DLL Feb 2, 2023
@vaind
Copy link
Collaborator Author

vaind commented Feb 2, 2023

As for the Symbolicator integration, I believe there are two ways to approach this.

  1. As you mentioned, we will not use the PE itself anymore, so we can decide to "unpack" the Portable PDB when fetching the object here: getsentry/symbolicator@9014ed7/crates/symbolicator-service/src/services/objects/data_cache.rs#L135-L172
    That way, we can use it "cheaply" for source lookups, and directly create a ppdb cache from it.
    The advantage here is that the source lookups and ppdb cache creation are unchanged, and fast (no decompression needed anymore). The disadvantage is that we throw away the "outer" PE.
  2. You could do the get_embedded_ppdb dance both in source lookup, and in ppdb conversion.
    This is a bit more work, and would still have to do decompression on every source lookup. But it has the advantage of keeping the original PE intact.

Given that we (at this time) are not using the outer PE for anything, I would recommend to just go with 1 and unpack directly when downloading.

I was thinking of a combination of both. The first point for actual symbolication, and the second one (or just a part of it) to display the correct information in the UI (that it has sources). On the other hand, doing an unpack just to check whether there are sources doesn't sound like a great idea and maybe we should skip that for now...

Copy link
Member

@Swatinem Swatinem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

almost there :-)

Comment on lines +313 to +317
let mut signature: [u8; 4] = [0; 4];
self.data
.gread_inout(&mut offset, &mut signature)
.map_err(PeError::new)?;
if signature != "MPDB".as_bytes() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let mut signature: [u8; 4] = [0; 4];
self.data
.gread_inout(&mut offset, &mut signature)
.map_err(PeError::new)?;
if signature != "MPDB".as_bytes() {
let signature: [u8; 4] = self.data
.gread(&mut offset)
.map_err(PeError::new)?;
if signature != b"MPDB" {

First time I saw gread_inout.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not the same thing - gread doesn't work with slices

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

error[E0277]: the trait bound `[u8; 4]: TryFromCtx<'_, _>` is not satisfied
   --> symbolic-debuginfo\src\pe.rs:313:42
    |
313 |                 let signature: [u8; 4] = self.data.gread(&mut offset).map_err(PeError::new)?;
    |                                          ^^^^^^^^^ ----- required by a bound introduced by this call
    |                                          |
    |                                          the trait `TryFromCtx<'_, _>` is not implemented for `[u8; 4]`
    |
    = help: the trait `TryFromCtx<'a, usize>` is implemented for `&'a [u8]`
note: required by a bound in `gread`
   --> C:\Users\dlugo\.cargo\registry\src\github.com-1ecc6299db9ec823\scroll-0.11.0\src\pread.rs:88:21
    |
88  |     fn gread<'a, N: TryFromCtx<'a, Ctx, Self, Error = E>>(
    |                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ required by this bound in `gread`
    ```

symbolic-debuginfo/src/pe.rs Show resolved Hide resolved
@Swatinem Swatinem merged commit 2fe64b8 into getsentry:master Feb 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants