Parsing PE export #88

tathanhdinh · 2018-04-29T21:19:10Z

IMHO, goblin parser might be "too strict" in parsing export directory table and export entries of "abnormal" PEs.

First, goblin will refuse parsing a PE if it cannot get the RVA of the name pointer table. In fact, an executable PE can have this RVA invalid, for example this sample from Corkami is a totally valid PE (it can even execute), but goblin will reject this PE.

Second, goblin parses export entries using number of names it found in the export name pointer table (and not from number of names), it will stop parsing and reject if there is a name with invalid RVA. But a dll can have invalid RVAs in the export name pointer table, in the following example of wmi.dll in Windows:

I have modified the first entry (photo on the right), which contains RVA to the name of the first exported function ControlTrace, to zero; Adlice PE viewer still be able to list other entries (other tools like FileAlyzer and PE Insider are be able too). Moreover, the modified file can be loaded happily by LoadLibraryEx API (with DONT_RESOLVE_DLL_REFERENCES flag), and other entries can be found by GetProcAddress using the returned handle (of LoadLibraryEx)

This pull request try to fix these problems (and the problem I have reported in issue 76). Of course YMMV, but please give me some ideas.

m4b · 2018-04-30T02:01:43Z

I haven't looked at changes yet, but thanks for your work here!

I briefly tested on some PE binaries which gave goblin parser issues before, and they seem resolved, which is really awesome.

Since the PR is so large, it will be good to have a couple of people check this out. @willglynn and anyone else are welcome to review.

Also, from a git perspective, I'd like to see the commit history cleaned up a bit; there's a lot of merges, and not very informative git messages.

I know that there isn't a policy on this repo for commit messages but I don't think its useful to have a separate commit with, to pick on a couple, "code cleanup", or "minor" as content.

So I think we can do this two ways:

If you want to preserve several commits assuming this gets ok'd and merged, which I could understand, since its non-trivial amount of work and more commits shows up better in repo stats, etc., then I'd like to ask you to clean up the patch into 2-4 commits.
If that sounds boring and you don't care, then I'd prefer you to either rebase and squash all your commits into one with a good summary of the changes, or whoever merges this should do a squash and merge.

If you don't know how to do the latter or can't be bothered, that's ok, I or others can help out here :)

And thanks for your hard work!

tathanhdinh · 2018-05-01T14:04:36Z

Many thanks for this encouraging response, and sorry for this messy PR. I would prefer the second approach (i.e. rebase + squash). I still do not know how to do this yet, but never mind, I would handle this myself.
Again, thanks for your feedback (I have been hesitated a lot before making the PR, I know that it is messy and though that it would be rejected immediately).

tathanhdinh · 2018-05-06T21:07:47Z

Hello all!
I've rebased my fork and merged my commit into a single commit. I am still not sure that is what you are suggested. In brief, what I have done is:

git rebase -i HEAD~26
(move external commits to the top, leave them as pick)
pick my first commit
fixup others

Please do not hesitate to correct me if you find any problem.

m4b · 2018-05-06T22:33:39Z

hmmm, assuming you just pushed the rebased version, seems you've still got some redundant commits.

I think the easiest way to fix this would be to get the sum of the changes, which now appear to be in 0f87046, and then do something like:

git show > MY_PATCH.diff # assuming all your changes are in HEAD, which looks like they are
git checkout -b branch_backup # save branch just in case
git checkout origin/master && git pull # get latest master, assuming origin is this repo, i.e.: github.com/m4b/goblin
git checkout <your branch name> # i think your branch is actually on master?
git reset --hard origin/master
git apply MY_PATCH.diff # this should only complain about whitespace errors

I've verified this works for me.

Once the patch is on top of master, you should be able to force push to your branch:

git push --force

Also, I noticed there are some whitespace "errors" (git show should highlight whitespace with red); if you could remove those, i think the patch is ready for review!

tathanhdinh · 2018-05-22T16:36:22Z

Hello all,
Sorry for the delayed update. I have just reapplied the patch following the guide of @m4b (many thanks for your help); the PR now contains a single monolithic commit.

m4b

These changes are great!

However, the public facing API changes, specifically the changes to Export and the exports vector need to be revised as per comments above.

Otherwise, I think this will be ready to merge; thanks for your patience and work here; I'll also note that the goblin parser is now much more robust with this change and parses successfully the last remaining 64 pe files it failed on previously, so again, great work, we're almost there!

m4b · 2018-05-28T02:42:25Z

src/pe/export.rs

@@ -173,16 +200,16 @@ impl<'a> Reexport<'a> {
 #[derive(Debug, Default)]
 /// An exported symbol in this binary, contains synthetic data (name offset, etc., are computed)
 pub struct Export<'a> {


So the export has every field optional? This doesn't seem right somehow, and i'm not sure what it means exactly for every field to be optional now.

I don't like this, and don't really think this is an appropriate API to expose to consumers; if its somehow:

Export { name: None, offset: None, rva: None reexport: None }

Then the export is really just None, and the api/construction of exports should drop it, not push optionalness into the fields.

Many thanks!!! Your response is reasonable for me.

When making all fields optional, I thought about parsing as much information as possible and return always parsed information. So even when an offset cannot be obtained from rva, the parser continues; or even when an ordinal cannot be obtained (but name is), the parser returns an export entry (with only name), etc.

The existence of an export will every fields are None is indeed weird, it should be just None as you have said.

I've changed the PR to reflect this.

m4b · 2018-05-28T02:45:28Z

src/pe/export.rs

+            name_offset.and_then(|offset| bytes.pread::<&str>(offset).ok())
+        });
+
+        let ordinal = ordinals.get(idx);


This check should be in the loop below which constructs the export vector; not here, this way you don't have to make every field in export None; I believe your switching name to None is valid, but the other fields all cannot be, for the reasons I gave above.

m4b · 2018-05-28T02:46:15Z

src/pe/export.rs

+                pointers.get(idx as usize).map(|v| *v)
+            });
+
+            let export = bytes.pread_with(0, ExportCtx { ptr: ptr, idx: idx as usize, sections: sections, addresses: addresses, ordinals: ordinals })?;


I.e., here check if idx is valid, if not, you can skip the export

m4b · 2018-05-28T02:49:51Z

src/pe/mod.rs

@@ -42,7 +42,7 @@ pub struct PE<'a> {
    /// Data for any imported symbols, and from which `dll`, etc., in this binary
    pub import_data: Option<import::ImportData<'a>>,
    /// The list of exported symbols in this binary, contains synthetic information for easier analysis
-    pub exports: Vec<export::Export<'a>>,
+    pub exports: Option<Vec<export::Export<'a>>>,


I typically prefer an empty vector for zero exports, as its less heavy-weight on the user, and in principle, they still need to check whether the vector is empty anyway.

In other words, in this usecase, making it optional doesn't really give us any extra semantic help here, so I'd just leave it as a regular vector (and empty if no exports), unless there's some other compelling reason I'm not seeing

m4b · 2018-05-28T03:01:57Z

Oh, and I'll note goblin cannot currently parse these (65) binaries, but with these changes it can:

psmsrv.dll
shunimpl.dll
winload.exe
nlsfunc.exe
mscdexnt.exe
DeviceUxRes.dll
ntvdmd.dll
vdmredir.dll
ntvdm.exe
fhuxgraphics.dll
winresume.exe
ntasn1.dll
ctl3dv2.dll
msacm.dll
HalExtIntcLpioDMA.dll
mfc100u.dll
compobj.dll
debug.exe
mswdat10.dll
prncache.dll
sysedit.exe
miguiresource.dll
apisetschema.dll
mem.exe
dosx.exe
append.exe
exe2bin.exe
win87em.dll
wscript.exe
uDWM.dll
mswstr10.dll
profapi.dll
olecli.dll
setver.exe
SetProxyCredential.dll
fastopen.exe
dnsext.dll
pmspl.dll
WinSyncMetastore.dll
typelib.dll
mfc100.dll
odbc16gt.dll
ole2nls.dll
msidle.dll
ext-ms-win-gdi-private-l1-1-0.dll
redir.exe
avifile.dll
WinSync.dll
oleacchooks.dll
wfapigp.dll
ole2.dll
edlin.exe
lzexpand.dll
cscript.exe
krnl386.exe
ole2disp.dll
share.exe
XpsGdiConverter.dll
storage.dll
netapi.dll
bootux.dll
HalExtIntcUartDMA.dll
ver.dll
tapi.dll
msrepl40.dll

m4b · 2018-05-28T04:55:01Z

Also, feel free to push new commits with the changes; no need to rebase and squash, etc.

tathanhdinh · 2018-05-29T16:26:42Z

Many thanks for your review.

I've changed the PR to fix the problems as you have shown.

m4b

@tathanhdinh First, thank you for your patience and hard work here! We are in the "home stretch".

I have similar concerns about use of optionals in export data, which should clean up the code in other places, and reduce API breakage.

All things being equal, I think we'll be able to merge after the next round.

Again, thank you for your hard work!

@philipc @willglynn Do you have any other concerns here?

m4b · 2018-06-03T17:58:01Z

src/pe/export.rs

-    pub export_ordinal_table: ExportOrdinalTable,
-    pub export_address_table: ExportAddressTable,
+    pub export_name_pointer_table: Option<ExportNamePointerTable>,
+    pub export_ordinal_table: Option<ExportOrdinalTable>,


Can this be optional? Ditto for export_address_table below; the reason I ask is:

Similar to Export, when I see almost everything optional, it indicates that the object should be skipped, etc.

It's more breaking changes anyone who's using ExportData has to deal with; the less pub fields you touch, the better, so if we can require ordinal_table or address_table to be non optional, its better all around.

I would say ditto for export_name_pointer_table too. I would only use Option for these if you can come up with a reason why you want to distinguish between None and Some(vec![]), ie between invalid RVA and zero size. Otherwise it's simpler for consumers if it's plain Vec that is empty for both invalid RVA and zero size.

Many thanks to @m4b and @philipc for the encouraging response.

Yes, my argument for Option is to distinguish the case where a correct offset cannot be obtained (i.e. the table does not exist), from the case where the table's offset can be obtained (i.e. the table exists) but its size is zero.

For executable PE(i.e. .exe files), I think that the Windows's PE loader will not care much if the export-related tables are valid or not (it is possible to make a runable .exe having invalid export_name_pointer_table). So these tables "can be skipped".

But I am happy also if we choose to unify these cases :), I don't know if this choice is better or not but, as you have said, it will not break the API and maybe simpler for the consumers.

I have changed the PR to restore these tables to not optional.

m4b · 2018-06-03T18:01:04Z

src/pe/export.rs

+                    table.push(name_rva);
+                }
+                else {
+                    break;


if i understand, we stop parsing all name_pointers here (and other break below for export_address_table) if a single one is bad, yes? is it possible to bump the offset by sizeof(name_rva), and continue?

This might not be a good idea for various reasons, and feel free not to implement this if its too much work, etc., since your PR already successfully parses my PE binary suite, so it would only be super-malformed binaries which isn't really a high priority.

I think in the case where a name_rva cannot be read

for _ in 0..number_of_name_pointers { if let Ok(name_rva) = bytes.gread_with(&mut offset, scroll::LE) { table.push(name_rva); } else { break; } }

there would be a serious error in the PE file for which data cannot be read from bytes at the given offset (e.g. the file is truncated), in such a case continuing to read would not help (personal opinion).

m4b · 2018-06-03T18:01:34Z

src/pe/export.rs

+                        table.push(ExportAddressTableEntry::ExportRVA(func_rva));
+                    }
+                }
+                else {


Nit: else on newline (doesn't match style above)

Sorry, I did not think of the style.

m4b · 2018-06-03T18:02:05Z

src/pe/export.rs

    pub offset: usize,
    pub rva: usize,
-    pub size: usize,
+    // pub size: usize,


Why was this field removed?

Sorry, because I do not understand why this field is needed, I have restored size but I still do not understand :)

m4b · 2018-06-03T18:03:03Z

src/pe/export.rs

            }
+            else {


Nit: else on newline

m4b · 2018-06-03T18:07:48Z

src/pe/export.rs

+                pointers.get(idx as usize).map(|v| *v)
+            });
+
+            if let Ok(export) = bytes.pread_with(0, ExportCtx { ptr: ptr, idx: idx as usize, sections: sections, addresses: addresses, ordinals: ordinals }) {


Great usecase for type punning/field initializer syntax, e.g., ExportCtx { ptr, sections, addresses, ordinals: idx: idx as usize }

m4b · 2018-06-03T18:09:30Z

src/pe/export.rs

+
+        let mut addresses = &Vec::new();
+        if let Some(ref table) = export_data.export_address_table {
+            addresses = table;


That you are doing this here, and below, in order to get an empty vec immediately suggests to me my concern above about the optional values is well-founded, and you should unmark it optional, and the empty vec is the "None" case.

Ah, I think I get your point. Letting empty vec as None case would minimize the change, if not we should change also ExportCtx !?

tathanhdinh · 2018-06-05T01:30:55Z

Many thanks for the encouraging response and the exchange of ideas. I have modified the PR to reflect this.

m4b

LGTM, we've minimized public API breakage, and now PE parser passes everything in my PE suite of 3k system binaries. Really appreciate your work and patience here @tathanhdinh !

tathanhdinh · 2018-06-11T07:02:52Z

Many thanks!!!

koutheir · 2018-07-14T02:31:12Z

This issue seems to be fixed in the master, but it is not provided in the version 0.0.15 of goblin, which is declared to be the latest version (See https://docs.rs/crate/goblin/). Could you please create a new version with this fix?

m4b · 2018-07-14T22:17:03Z

Done, version 0.0.16 is out :)

tathanhdinh force-pushed the master branch from b13c912 to 0f87046 Compare May 6, 2018 14:14

tathanhdinh closed this May 22, 2018

tathanhdinh force-pushed the master branch from 0f87046 to d83b4ec Compare May 22, 2018 15:39

Parse export data directory/entries in case of corrupted data

dc0ef43

tathanhdinh reopened this May 22, 2018

m4b requested changes May 28, 2018

View reviewed changes

Changed by request

941c0c6

tathanhdinh force-pushed the master branch from 277bcce to 941c0c6 Compare May 29, 2018 16:04

m4b requested changes Jun 3, 2018

View reviewed changes

Changed by request

ffbcd2a

tathanhdinh force-pushed the master branch from 87f898b to ffbcd2a Compare June 4, 2018 22:06

m4b approved these changes Jun 10, 2018

View reviewed changes

m4b merged commit 3115e6b into m4b:master Jun 10, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Parsing PE export #88

Parsing PE export #88

tathanhdinh commented Apr 29, 2018

m4b commented Apr 30, 2018

tathanhdinh commented May 1, 2018

tathanhdinh commented May 6, 2018 •

edited

m4b commented May 6, 2018

tathanhdinh commented May 22, 2018

m4b left a comment

m4b May 28, 2018

tathanhdinh May 29, 2018

m4b May 28, 2018

m4b May 28, 2018

m4b May 28, 2018

m4b commented May 28, 2018

m4b commented May 28, 2018

tathanhdinh commented May 29, 2018

m4b left a comment

m4b Jun 3, 2018

philipc Jun 4, 2018

tathanhdinh Jun 5, 2018

m4b Jun 3, 2018

tathanhdinh Jun 5, 2018 •

edited

m4b Jun 3, 2018

tathanhdinh Jun 5, 2018

m4b Jun 3, 2018

tathanhdinh Jun 5, 2018 •

edited

m4b Jun 3, 2018

tathanhdinh Jun 5, 2018 •

edited

m4b Jun 3, 2018

m4b Jun 3, 2018

tathanhdinh Jun 5, 2018

tathanhdinh commented Jun 5, 2018

m4b left a comment

tathanhdinh commented Jun 11, 2018

koutheir commented Jul 14, 2018

m4b commented Jul 14, 2018

Parsing PE export #88

Parsing PE export #88

Conversation

tathanhdinh commented Apr 29, 2018

m4b commented Apr 30, 2018

tathanhdinh commented May 1, 2018

tathanhdinh commented May 6, 2018 • edited

m4b commented May 6, 2018

tathanhdinh commented May 22, 2018

m4b left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

m4b commented May 28, 2018

m4b commented May 28, 2018

tathanhdinh commented May 29, 2018

m4b left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tathanhdinh Jun 5, 2018 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tathanhdinh Jun 5, 2018 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tathanhdinh Jun 5, 2018 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tathanhdinh commented Jun 5, 2018

m4b left a comment

Choose a reason for hiding this comment

tathanhdinh commented Jun 11, 2018

koutheir commented Jul 14, 2018

m4b commented Jul 14, 2018

tathanhdinh commented May 6, 2018 •

edited

tathanhdinh Jun 5, 2018 •

edited

tathanhdinh Jun 5, 2018 •

edited

tathanhdinh Jun 5, 2018 •

edited