-
Notifications
You must be signed in to change notification settings - Fork 155
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parsing PE export #88
Conversation
I haven't looked at changes yet, but thanks for your work here! I briefly tested on some PE binaries which gave goblin parser issues before, and they seem resolved, which is really awesome. Since the PR is so large, it will be good to have a couple of people check this out. @willglynn and anyone else are welcome to review. Also, from a git perspective, I'd like to see the commit history cleaned up a bit; there's a lot of merges, and not very informative git messages. I know that there isn't a policy on this repo for commit messages but I don't think its useful to have a separate commit with, to pick on a couple, "code cleanup", or "minor" as content. So I think we can do this two ways:
If you don't know how to do the latter or can't be bothered, that's ok, I or others can help out here :) And thanks for your hard work! |
Many thanks for this encouraging response, and sorry for this messy PR. I would prefer the second approach (i.e. rebase + squash). I still do not know how to do this yet, but never mind, I would handle this myself. |
Hello all!
Please do not hesitate to correct me if you find any problem. |
hmmm, assuming you just pushed the rebased version, seems you've still got some redundant commits. I think the easiest way to fix this would be to get the sum of the changes, which now appear to be in 0f87046, and then do something like:
I've verified this works for me. Once the patch is on top of master, you should be able to force push to your branch:
Also, I noticed there are some whitespace "errors" (git show should highlight whitespace with red); if you could remove those, i think the patch is ready for review! |
Hello all, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes are great!
However, the public facing API changes, specifically the changes to Export and the exports vector need to be revised as per comments above.
Otherwise, I think this will be ready to merge; thanks for your patience and work here; I'll also note that the goblin parser is now much more robust with this change and parses successfully the last remaining 64 pe files it failed on previously, so again, great work, we're almost there!
@@ -173,16 +200,16 @@ impl<'a> Reexport<'a> { | |||
#[derive(Debug, Default)] | |||
/// An exported symbol in this binary, contains synthetic data (name offset, etc., are computed) | |||
pub struct Export<'a> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the export has every field optional? This doesn't seem right somehow, and i'm not sure what it means exactly for every field to be optional now.
I don't like this, and don't really think this is an appropriate API to expose to consumers; if its somehow:
Export {
name: None,
offset: None,
rva: None
reexport: None
}
Then the export is really just None, and the api/construction of exports should drop it, not push optionalness into the fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many thanks!!! Your response is reasonable for me.
When making all fields optional, I thought about parsing as much information as possible and return always parsed information. So even when an offset cannot be obtained from rva, the parser continues; or even when an ordinal cannot be obtained (but name is), the parser returns an export entry (with only name), etc.
The existence of an export will every fields are None
is indeed weird, it should be just None
as you have said.
I've changed the PR to reflect this.
src/pe/export.rs
Outdated
name_offset.and_then(|offset| bytes.pread::<&str>(offset).ok()) | ||
}); | ||
|
||
let ordinal = ordinals.get(idx); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check should be in the loop below which constructs the export vector; not here, this way you don't have to make every field in export None; I believe your switching name to None is valid, but the other fields all cannot be, for the reasons I gave above.
src/pe/export.rs
Outdated
pointers.get(idx as usize).map(|v| *v) | ||
}); | ||
|
||
let export = bytes.pread_with(0, ExportCtx { ptr: ptr, idx: idx as usize, sections: sections, addresses: addresses, ordinals: ordinals })?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I.e., here check if idx is valid, if not, you can skip the export
src/pe/mod.rs
Outdated
@@ -42,7 +42,7 @@ pub struct PE<'a> { | |||
/// Data for any imported symbols, and from which `dll`, etc., in this binary | |||
pub import_data: Option<import::ImportData<'a>>, | |||
/// The list of exported symbols in this binary, contains synthetic information for easier analysis | |||
pub exports: Vec<export::Export<'a>>, | |||
pub exports: Option<Vec<export::Export<'a>>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I typically prefer an empty vector for zero exports, as its less heavy-weight on the user, and in principle, they still need to check whether the vector is empty anyway.
In other words, in this usecase, making it optional doesn't really give us any extra semantic help here, so I'd just leave it as a regular vector (and empty if no exports), unless there's some other compelling reason I'm not seeing
Oh, and I'll note goblin cannot currently parse these (65) binaries, but with these changes it can:
|
Also, feel free to push new commits with the changes; no need to rebase and squash, etc. |
Many thanks for your review. I've changed the PR to fix the problems as you have shown. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tathanhdinh First, thank you for your patience and hard work here! We are in the "home stretch".
I have similar concerns about use of optionals in export data, which should clean up the code in other places, and reduce API breakage.
All things being equal, I think we'll be able to merge after the next round.
Again, thank you for your hard work!
@philipc @willglynn Do you have any other concerns here?
src/pe/export.rs
Outdated
pub export_ordinal_table: ExportOrdinalTable, | ||
pub export_address_table: ExportAddressTable, | ||
pub export_name_pointer_table: Option<ExportNamePointerTable>, | ||
pub export_ordinal_table: Option<ExportOrdinalTable>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be optional? Ditto for export_address_table below; the reason I ask is:
- Similar to
Export
, when I see almost everything optional, it indicates that the object should be skipped, etc. - It's more breaking changes anyone who's using ExportData has to deal with; the less
pub
fields you touch, the better, so if we can require ordinal_table or address_table to be non optional, its better all around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say ditto for export_name_pointer_table
too. I would only use Option
for these if you can come up with a reason why you want to distinguish between None
and Some(vec![])
, ie between invalid RVA and zero size. Otherwise it's simpler for consumers if it's plain Vec
that is empty for both invalid RVA and zero size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many thanks to @m4b and @philipc for the encouraging response.
Yes, my argument for Option
is to distinguish the case where a correct offset cannot be obtained (i.e. the table does not exist), from the case where the table's offset can be obtained (i.e. the table exists) but its size is zero.
For executable PE
(i.e. .exe
files), I think that the Windows's PE loader will not care much if the export-related tables are valid or not (it is possible to make a runable .exe
having invalid export_name_pointer_table
). So these tables "can be skipped".
But I am happy also if we choose to unify these cases :), I don't know if this choice is better or not but, as you have said, it will not break the API and maybe simpler for the consumers.
I have changed the PR to restore these tables to not optional.
src/pe/export.rs
Outdated
table.push(name_rva); | ||
} | ||
else { | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if i understand, we stop parsing all name_pointers
here (and other break
below for export_address_table
) if a single one is bad, yes? is it possible to bump the offset by sizeof(name_rva), and continue?
This might not be a good idea for various reasons, and feel free not to implement this if its too much work, etc., since your PR already successfully parses my PE binary suite, so it would only be super-malformed binaries which isn't really a high priority.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in the case where a name_rva
cannot be read
for _ in 0..number_of_name_pointers {
if let Ok(name_rva) = bytes.gread_with(&mut offset, scroll::LE) {
table.push(name_rva);
} else {
break;
}
}
there would be a serious error in the PE file for which data cannot be read from bytes
at the given offset
(e.g. the file is truncated), in such a case continuing to read would not help (personal opinion).
src/pe/export.rs
Outdated
table.push(ExportAddressTableEntry::ExportRVA(func_rva)); | ||
} | ||
} | ||
else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: else on newline (doesn't match style above)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I did not think of the style.
src/pe/export.rs
Outdated
pub offset: usize, | ||
pub rva: usize, | ||
pub size: usize, | ||
// pub size: usize, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this field removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, because I do not understand why this field is needed, I have restored size
but I still do not understand :)
src/pe/export.rs
Outdated
} | ||
else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: else on newline
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checked :)
src/pe/export.rs
Outdated
pointers.get(idx as usize).map(|v| *v) | ||
}); | ||
|
||
if let Ok(export) = bytes.pread_with(0, ExportCtx { ptr: ptr, idx: idx as usize, sections: sections, addresses: addresses, ordinals: ordinals }) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great usecase for type punning/field initializer syntax, e.g., ExportCtx { ptr, sections, addresses, ordinals: idx: idx as usize }
src/pe/export.rs
Outdated
|
||
let mut addresses = &Vec::new(); | ||
if let Some(ref table) = export_data.export_address_table { | ||
addresses = table; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That you are doing this here, and below, in order to get an empty vec immediately suggests to me my concern above about the optional values is well-founded, and you should unmark it optional, and the empty vec is the "None" case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I think I get your point. Letting empty vec as None
case would minimize the change, if not we should change also ExportCtx
!?
Many thanks for the encouraging response and the exchange of ideas. I have modified the PR to reflect this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, we've minimized public API breakage, and now PE parser passes everything in my PE suite of 3k system binaries. Really appreciate your work and patience here @tathanhdinh !
Many thanks!!! |
This issue seems to be fixed in the master, but it is not provided in the version |
Done, version 0.0.16 is out :) |
IMHO, goblin parser might be "too strict" in parsing export directory table and export entries of "abnormal" PEs.
First, goblin will refuse parsing a PE if it cannot get the RVA of the name pointer table. In fact, an executable PE can have this RVA invalid, for example this sample from Corkami is a totally valid PE (it can even execute), but goblin will reject this PE.
Second, goblin parses export entries using number of names it found in the
export name pointer table
(and not fromnumber of names
), it will stop parsing and reject if there is a name with invalid RVA. But adll
can have invalid RVAs in theexport name pointer table
, in the following example ofwmi.dll
in Windows:I have modified the first entry (photo on the right), which contains RVA to the name of the first exported function
ControlTrace
, to zero;Adlice PE viewer
still be able to list other entries (other tools likeFileAlyzer
andPE Insider
are be able too). Moreover, the modified file can be loaded happily byLoadLibraryEx
API (withDONT_RESOLVE_DLL_REFERENCES
flag), and other entries can be found byGetProcAddress
using the returned handle (ofLoadLibraryEx
)This pull request try to fix these problems (and the problem I have reported in issue 76). Of course YMMV, but please give me some ideas.