Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parsing PE export #88

Merged
merged 3 commits into from
Jun 10, 2018
Merged

Parsing PE export #88

merged 3 commits into from
Jun 10, 2018

Conversation

tathanhdinh
Copy link
Contributor

IMHO, goblin parser might be "too strict" in parsing export directory table and export entries of "abnormal" PEs.

First, goblin will refuse parsing a PE if it cannot get the RVA of the name pointer table. In fact, an executable PE can have this RVA invalid, for example this sample from Corkami is a totally valid PE (it can even execute), but goblin will reject this PE.

Second, goblin parses export entries using number of names it found in the export name pointer table (and not from number of names), it will stop parsing and reject if there is a name with invalid RVA. But a dll can have invalid RVAs in the export name pointer table, in the following example of wmi.dll in Windows:

wmi_dll

I have modified the first entry (photo on the right), which contains RVA to the name of the first exported function ControlTrace, to zero; Adlice PE viewer still be able to list other entries (other tools like FileAlyzer and PE Insider are be able too). Moreover, the modified file can be loaded happily by LoadLibraryEx API (with DONT_RESOLVE_DLL_REFERENCES flag), and other entries can be found by GetProcAddress using the returned handle (of LoadLibraryEx)

This pull request try to fix these problems (and the problem I have reported in issue 76). Of course YMMV, but please give me some ideas.

@m4b
Copy link
Owner

m4b commented Apr 30, 2018

I haven't looked at changes yet, but thanks for your work here!

I briefly tested on some PE binaries which gave goblin parser issues before, and they seem resolved, which is really awesome.

Since the PR is so large, it will be good to have a couple of people check this out. @willglynn and anyone else are welcome to review.

Also, from a git perspective, I'd like to see the commit history cleaned up a bit; there's a lot of merges, and not very informative git messages.

I know that there isn't a policy on this repo for commit messages but I don't think its useful to have a separate commit with, to pick on a couple, "code cleanup", or "minor" as content.

So I think we can do this two ways:

  1. If you want to preserve several commits assuming this gets ok'd and merged, which I could understand, since its non-trivial amount of work and more commits shows up better in repo stats, etc., then I'd like to ask you to clean up the patch into 2-4 commits.

  2. If that sounds boring and you don't care, then I'd prefer you to either rebase and squash all your commits into one with a good summary of the changes, or whoever merges this should do a squash and merge.

If you don't know how to do the latter or can't be bothered, that's ok, I or others can help out here :)

And thanks for your hard work!

@tathanhdinh
Copy link
Contributor Author

Many thanks for this encouraging response, and sorry for this messy PR. I would prefer the second approach (i.e. rebase + squash). I still do not know how to do this yet, but never mind, I would handle this myself.
Again, thanks for your feedback (I have been hesitated a lot before making the PR, I know that it is messy and though that it would be rejected immediately).

@tathanhdinh
Copy link
Contributor Author

tathanhdinh commented May 6, 2018

Hello all!
I've rebased my fork and merged my commit into a single commit. I am still not sure that is what you are suggested. In brief, what I have done is:

git rebase -i HEAD~26
(move external commits to the top, leave them as pick)
pick my first commit
fixup others

Please do not hesitate to correct me if you find any problem.

@m4b
Copy link
Owner

m4b commented May 6, 2018

hmmm, assuming you just pushed the rebased version, seems you've still got some redundant commits.

I think the easiest way to fix this would be to get the sum of the changes, which now appear to be in 0f87046, and then do something like:

git show > MY_PATCH.diff # assuming all your changes are in HEAD, which looks like they are
git checkout -b branch_backup # save branch just in case
git checkout origin/master && git pull # get latest master, assuming origin is this repo, i.e.: github.com/m4b/goblin
git checkout <your branch name> # i think your branch is actually on master?
git reset --hard origin/master
git apply MY_PATCH.diff # this should only complain about whitespace errors 

I've verified this works for me.

Once the patch is on top of master, you should be able to force push to your branch:

git push --force

Also, I noticed there are some whitespace "errors" (git show should highlight whitespace with red); if you could remove those, i think the patch is ready for review!

@tathanhdinh tathanhdinh reopened this May 22, 2018
@tathanhdinh
Copy link
Contributor Author

Hello all,
Sorry for the delayed update. I have just reapplied the patch following the guide of @m4b (many thanks for your help); the PR now contains a single monolithic commit.

Copy link
Owner

@m4b m4b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes are great!

However, the public facing API changes, specifically the changes to Export and the exports vector need to be revised as per comments above.

Otherwise, I think this will be ready to merge; thanks for your patience and work here; I'll also note that the goblin parser is now much more robust with this change and parses successfully the last remaining 64 pe files it failed on previously, so again, great work, we're almost there!

@@ -173,16 +200,16 @@ impl<'a> Reexport<'a> {
#[derive(Debug, Default)]
/// An exported symbol in this binary, contains synthetic data (name offset, etc., are computed)
pub struct Export<'a> {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the export has every field optional? This doesn't seem right somehow, and i'm not sure what it means exactly for every field to be optional now.

I don't like this, and don't really think this is an appropriate API to expose to consumers; if its somehow:

Export {
  name: None,
  offset: None,
  rva: None
  reexport: None
}

Then the export is really just None, and the api/construction of exports should drop it, not push optionalness into the fields.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many thanks!!! Your response is reasonable for me.

When making all fields optional, I thought about parsing as much information as possible and return always parsed information. So even when an offset cannot be obtained from rva, the parser continues; or even when an ordinal cannot be obtained (but name is), the parser returns an export entry (with only name), etc.

The existence of an export will every fields are None is indeed weird, it should be just None as you have said.

I've changed the PR to reflect this.

src/pe/export.rs Outdated
name_offset.and_then(|offset| bytes.pread::<&str>(offset).ok())
});

let ordinal = ordinals.get(idx);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check should be in the loop below which constructs the export vector; not here, this way you don't have to make every field in export None; I believe your switching name to None is valid, but the other fields all cannot be, for the reasons I gave above.

src/pe/export.rs Outdated
pointers.get(idx as usize).map(|v| *v)
});

let export = bytes.pread_with(0, ExportCtx { ptr: ptr, idx: idx as usize, sections: sections, addresses: addresses, ordinals: ordinals })?;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I.e., here check if idx is valid, if not, you can skip the export

src/pe/mod.rs Outdated
@@ -42,7 +42,7 @@ pub struct PE<'a> {
/// Data for any imported symbols, and from which `dll`, etc., in this binary
pub import_data: Option<import::ImportData<'a>>,
/// The list of exported symbols in this binary, contains synthetic information for easier analysis
pub exports: Vec<export::Export<'a>>,
pub exports: Option<Vec<export::Export<'a>>>,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I typically prefer an empty vector for zero exports, as its less heavy-weight on the user, and in principle, they still need to check whether the vector is empty anyway.

In other words, in this usecase, making it optional doesn't really give us any extra semantic help here, so I'd just leave it as a regular vector (and empty if no exports), unless there's some other compelling reason I'm not seeing

@m4b
Copy link
Owner

m4b commented May 28, 2018

Oh, and I'll note goblin cannot currently parse these (65) binaries, but with these changes it can:

psmsrv.dll
shunimpl.dll
winload.exe
nlsfunc.exe
mscdexnt.exe
DeviceUxRes.dll
ntvdmd.dll
vdmredir.dll
ntvdm.exe
fhuxgraphics.dll
winresume.exe
ntasn1.dll
ctl3dv2.dll
msacm.dll
HalExtIntcLpioDMA.dll
mfc100u.dll
compobj.dll
debug.exe
mswdat10.dll
prncache.dll
sysedit.exe
miguiresource.dll
apisetschema.dll
mem.exe
dosx.exe
append.exe
exe2bin.exe
win87em.dll
wscript.exe
uDWM.dll
mswstr10.dll
profapi.dll
olecli.dll
setver.exe
SetProxyCredential.dll
fastopen.exe
dnsext.dll
pmspl.dll
WinSyncMetastore.dll
typelib.dll
mfc100.dll
odbc16gt.dll
ole2nls.dll
msidle.dll
ext-ms-win-gdi-private-l1-1-0.dll
redir.exe
avifile.dll
WinSync.dll
oleacchooks.dll
wfapigp.dll
ole2.dll
edlin.exe
lzexpand.dll
cscript.exe
krnl386.exe
ole2disp.dll
share.exe
XpsGdiConverter.dll
storage.dll
netapi.dll
bootux.dll
HalExtIntcUartDMA.dll
ver.dll
tapi.dll
msrepl40.dll

@m4b
Copy link
Owner

m4b commented May 28, 2018

Also, feel free to push new commits with the changes; no need to rebase and squash, etc.

@tathanhdinh
Copy link
Contributor Author

Many thanks for your review.

I've changed the PR to fix the problems as you have shown.

Copy link
Owner

@m4b m4b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tathanhdinh First, thank you for your patience and hard work here! We are in the "home stretch".

I have similar concerns about use of optionals in export data, which should clean up the code in other places, and reduce API breakage.

All things being equal, I think we'll be able to merge after the next round.

Again, thank you for your hard work!

@philipc @willglynn Do you have any other concerns here?

src/pe/export.rs Outdated
pub export_ordinal_table: ExportOrdinalTable,
pub export_address_table: ExportAddressTable,
pub export_name_pointer_table: Option<ExportNamePointerTable>,
pub export_ordinal_table: Option<ExportOrdinalTable>,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be optional? Ditto for export_address_table below; the reason I ask is:

  1. Similar to Export, when I see almost everything optional, it indicates that the object should be skipped, etc.
  2. It's more breaking changes anyone who's using ExportData has to deal with; the less pub fields you touch, the better, so if we can require ordinal_table or address_table to be non optional, its better all around.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say ditto for export_name_pointer_table too. I would only use Option for these if you can come up with a reason why you want to distinguish between None and Some(vec![]), ie between invalid RVA and zero size. Otherwise it's simpler for consumers if it's plain Vec that is empty for both invalid RVA and zero size.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many thanks to @m4b and @philipc for the encouraging response.

Yes, my argument for Option is to distinguish the case where a correct offset cannot be obtained (i.e. the table does not exist), from the case where the table's offset can be obtained (i.e. the table exists) but its size is zero.

For executable PE(i.e. .exe files), I think that the Windows's PE loader will not care much if the export-related tables are valid or not (it is possible to make a runable .exe having invalid export_name_pointer_table). So these tables "can be skipped".

But I am happy also if we choose to unify these cases :), I don't know if this choice is better or not but, as you have said, it will not break the API and maybe simpler for the consumers.

I have changed the PR to restore these tables to not optional.

src/pe/export.rs Outdated
table.push(name_rva);
}
else {
break;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if i understand, we stop parsing all name_pointers here (and other break below for export_address_table) if a single one is bad, yes? is it possible to bump the offset by sizeof(name_rva), and continue?

This might not be a good idea for various reasons, and feel free not to implement this if its too much work, etc., since your PR already successfully parses my PE binary suite, so it would only be super-malformed binaries which isn't really a high priority.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in the case where a name_rva cannot be read

for _ in 0..number_of_name_pointers {
  if let Ok(name_rva) = bytes.gread_with(&mut offset, scroll::LE) {
    table.push(name_rva);
    } else {
      break;
    }
}

there would be a serious error in the PE file for which data cannot be read from bytes at the given offset (e.g. the file is truncated), in such a case continuing to read would not help (personal opinion).

src/pe/export.rs Outdated
table.push(ExportAddressTableEntry::ExportRVA(func_rva));
}
}
else {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: else on newline (doesn't match style above)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I did not think of the style.

src/pe/export.rs Outdated
pub offset: usize,
pub rva: usize,
pub size: usize,
// pub size: usize,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this field removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, because I do not understand why this field is needed, I have restored size but I still do not understand :)

src/pe/export.rs Outdated
}
else {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: else on newline

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked :)

src/pe/export.rs Outdated
pointers.get(idx as usize).map(|v| *v)
});

if let Ok(export) = bytes.pread_with(0, ExportCtx { ptr: ptr, idx: idx as usize, sections: sections, addresses: addresses, ordinals: ordinals }) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great usecase for type punning/field initializer syntax, e.g., ExportCtx { ptr, sections, addresses, ordinals: idx: idx as usize }

src/pe/export.rs Outdated

let mut addresses = &Vec::new();
if let Some(ref table) = export_data.export_address_table {
addresses = table;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That you are doing this here, and below, in order to get an empty vec immediately suggests to me my concern above about the optional values is well-founded, and you should unmark it optional, and the empty vec is the "None" case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I think I get your point. Letting empty vec as None case would minimize the change, if not we should change also ExportCtx !?

@tathanhdinh
Copy link
Contributor Author

Many thanks for the encouraging response and the exchange of ideas. I have modified the PR to reflect this.

Copy link
Owner

@m4b m4b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, we've minimized public API breakage, and now PE parser passes everything in my PE suite of 3k system binaries. Really appreciate your work and patience here @tathanhdinh !

@m4b m4b merged commit 3115e6b into m4b:master Jun 10, 2018
@tathanhdinh
Copy link
Contributor Author

Many thanks!!!

@koutheir
Copy link

This issue seems to be fixed in the master, but it is not provided in the version 0.0.15 of goblin, which is declared to be the latest version (See https://docs.rs/crate/goblin/). Could you please create a new version with this fix?

@m4b
Copy link
Owner

m4b commented Jul 14, 2018

Done, version 0.0.16 is out :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants