Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

COFF stringtable created with 'strings' function truncates the first 4 characters when used with symbol offset #373

Open
DasStone opened this issue Jul 4, 2023 · 9 comments

Comments

@DasStone
Copy link

DasStone commented Jul 4, 2023

Detailed Problem Description with Example

A symbol with a name longer than 8 characters, will be stored in the string table, as per the PE File Format Documentation given by Microsoft. The offset to the string table can then be determined by the last 4 bytes of the 8 byte long name field if the first 4 bytes are 0 (see SymbolNameRepresentation).

This does not seem to work correctly in goblin. Take a look at this example:

// read file to u8 buffer
let buffer = fs::read(path_pe).unwrap();

let pe = pe::PE::parse(&buffer).unwrap_or_else(|e| {
    println!("err: {:?}", e);
    process::exit(2);
});

let symbol_count = pe.header.coff_header.number_of_symbol_table as usize;
let symbols = pe.header.coff_header.symbols(&buffer).unwrap();
let strtab = pe.header.coff_header.strings(&buffer).unwrap();

// search symbol fibonacci
for i in 0..symbol_count {
    let named_sym = symbols.get(i).unwrap();

    let name = match named_sym {
        (Some(name), _) => name,
        (None, sym) => {
            let strtab_offset = u32::from_le_bytes(sym.name[4..8].try_into().unwrap());
            strtab.get_at(strtab_offset as usize).unwrap()
        },
    };

    if name == "fibonacci" {
        println!("Correct Name");
        println!("{:#?}", named_sym.1);
        println!("Calculated strtab offset: {}", u32::from_le_bytes(named_sym.1.name[4..8].try_into().unwrap()));
    } else if name == "nacci" {
        println!("Name offset by 4 characters");
        println!("{:#?}", named_sym.1);
        println!("Calculated strtab offset: {}", u32::from_le_bytes(named_sym.1.name[4..8].try_into().unwrap()));
    }
}

The provided PE file contains a function called "fibonacci". This can easily be confirmed with objdump -t <file> | grep fibonacci.

[ 88](sec  1)(fl 0x00)(ty   20)(scl   2) (nx 0) 0x0000000000001670 fibonacci

However when executing the provided Rust code on the same file, the output will stem from the second case (name == "nacci"). Here is the exact output:

Name offset by 4 characters
Symbol {
    name: [
        0,
        0,
        0,
        0,
        78,
        4,
        0,
        0,
    ],
    value: 5744,
    section_number: 1,
    typ: 32,
    storage_class: 2,
    number_of_aux_symbols: 0,
}
Calculated strtab offset: 1102

This is also the same entry that was previously found by objdump.

Short Problem Description

It seems, that the strtab parsed by the strings function on a CoffHeader accidentally offsets every string by 4 characters.

Expected Output

I would have expected to receive the full name with the provided Rust code. Not a truncated Version.

I hope that I didn't do anything wrong when parsing the necessary information...

Other Information:

goblin-version: 0.7.1
rustc-version and cargo-version: 1.70.0
OS: Windows 11 Pro Version 22H2

@DasStone DasStone changed the title COFF Stringtable created with 'strings' Function truncates the first 4 characters when used with symbol offset COFF stringtable created with 'strings' function truncates the first 4 characters when used with symbol offset Jul 4, 2023
@m4b
Copy link
Owner

m4b commented Jul 5, 2023

Hmmm, thank you for the detailed writeup! Can you paste the binary by any chance with your fibonnaci function? I'm a little surprised this wouldn't have come up earlier, but it is possibly a regression since I know a change went in recently with pe and string table

@DasStone
Copy link
Author

DasStone commented Jul 5, 2023

Yes of course!

Here is the source code if you want to compile it yourself (but the binary is also attached, see below):

fn main() {
    println!("Here are some fibonacci numbers");
    println!("Fib 50: {}", fibonacci(50));
}

#[no_mangle]
fn fibonacci(n: u64) -> u64 {
    println!("addr of fib: {:#?}", fibonacci as *const u8);
    let mut a = 0;
    let mut b = 1;

    if n == 0 {
        return 0;
    }

    for _ in 1..n {
        let tmp = a + b;
        a = b;
        b = tmp;
    }

    b
}

(The program was called hash-test, but I stripped it of any meaningful computation. All that is left is a main function that calls a fibonacci calculation)
hash-test.zip

@m4b
Copy link
Owner

m4b commented Jul 9, 2023

if you git revert eaba4ed4ae6a8054fd489dd011aeaf8609378b48 can you test if this repro's for you?

@DasStone
Copy link
Author

DasStone commented Jul 9, 2023

Sadly not, undoing the changes from the specified commit does not resolve the issue. I ran the code I provided again (ofc. with the changed version of goblin), and the results are still the same. The name is still offset by 4 characters.

@m4b
Copy link
Owner

m4b commented Jul 9, 2023

hmmm, if i run bingrep on your exe, all the libraries functions it imports seem ok, so i don't think this is a general problem in our strtab parsing itself.

In your example code could you try using the name_offset() api on Symbol? e.g. change this line fromlet strtab_offset = u32::from_le_bytes(sym.name[4..8].try_into().unwrap()); to use let stratb_offset = sym.name_offset().unwrap()

@m4b
Copy link
Owner

m4b commented Jul 9, 2023

@DasStone
Copy link
Author

DasStone commented Jul 10, 2023

In your example code could you try using the name_offset() api on Symbol? e.g. change this line fromlet strtab_offset = u32::from_le_bytes(sym.name[4..8].try_into().unwrap()); to use let stratb_offset = sym.name_offset().unwrap()

I needed to change your suggested fix a bit (since the unwrap() would not work in the current context. But this is no meaningful change):

let strtab_offset = match sym.name_offset() {
                    Some(val) => val,
                    None => continue,
                };

The Problem still persists when testing the "fixed" code with my local version of goblin (the one with git revert eaba4ed4ae6a8054fd489dd011aeaf8609378b48 ).


I also checked how the code would behave with the current release of goblin 0.7.1. It crashes with attempt to subtract with overflow on the exact line you already suggested (src\pe\symbol.rs:240:36).


Running in release seems to "fix" the issue for version 0.7.1

Matching the returned Option from get_at() and just continuing on a None value "resolves" the issue.

let name = match named_sym {
            (Some(name), _) => name,
            (None, sym) => {
                let strtab_offset = match sym.name_offset() {
                    Some(val) => val,
                    None => continue,
                };
                match strtab.get_at(strtab_offset as usize) {
                    Some(name) => name,
                    None => continue,
                }
            }
        };

But this "fix" seems highly suspicious to me. First of all, there should not be a difference when running debug and release (regarding the code in question). Second, I just printed all symbols found by this code and there are some symbols that can't be found with objdump -t <.exe> | grep <sym_name>. I assume this might be due to the overflowing subtraction.

Based on this, I still think that there might be some deeply rooted problem within goblins PE module...

@philipc
Copy link
Collaborator

philipc commented Jul 10, 2023

for i in 0..symbol_count {

This is wrong because it will try to parse auxiliary symbols as regular symbols. Use symbols.iter() instead.

First of all, there should not be a difference when running debug and release

Agreed. name_offset should use checked_sub instead so that we gracefully handle malformed files, but once you fix the iteration it won't matter for the file you are testing on.

@DasStone
Copy link
Author

DasStone commented Jul 10, 2023

This is wrong because it will try to parse auxiliary symbols as regular symbols. Use symbols.iter() instead.

True. It seems to be working now.

Thank you for your help @m4b and @philipc !

(I will leave closing the issue up to you, due to the checked_sub code change)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants