Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle multiple LOAD program headers with different biases #107

Merged
merged 2 commits into from
Oct 10, 2018

Conversation

philipc
Copy link
Collaborator

@philipc philipc commented Oct 6, 2018

Partial fix for #106

Note that this changes elf::dyn::DynamicInfo so that it is no longer the same as elf::dyn::dyn32::DynamicInfo.

@philipc philipc requested a review from m4b October 6, 2018 09:08
@m4b
Copy link
Owner

m4b commented Oct 7, 2018

Running this code on patch, I'm seeing a lot of what could be garbage (indicative of bad bias calculations?):

                st_name: 256160347 UNKNOWN_STB OBJECT st_other: 40 st_shndx: 10544 st_value: a3416c12eaa1bae1 st_size: 4343740557238714799,
                st_name: 1231268623 GLOBAL FUNC st_other: 13 st_shndx: 2453 st_value: f5267cdcea69ca7 st_size: 17000724336414452430,
                st_name: 322171258 UNKNOWN_STB OBJECT st_other: 52 st_shndx: 39099 st_value: 5d8533ab0f5267d3 st_size: 938747773043678002,
                st_name: 3467145249 UNKNOWN_STB UNKNOWN_STT st_other: 202 st_shndx: 27083 st_value: 93ac3a015d805dba st_size: 7622048091411052839,
                st_name: 3793167682 UNKNOWN_STB TLS st_other: 105 st_shndx: 41898 st_value: 397f9993f4c22f23 st_size: 14891172709048592475,
                st_name: 625681951 UNKNOWN_STB UNKNOWN_STT st_other: 130 st_shndx: 46492 st_value: 304b1a28162944b7 st_size: 14891172735670931007,
                st_name: 810239313 UNKNOWN_STB COMMON st_other: 161 st_shndx: 21404 st_value: ceaef39821260a17 st_size: 578240182440425589,
                st_name: 2746643623 UNKNOWN_STB UNKNOWN_STT st_other: 81 st_shndx: 3629 st_value: 7614838dc57e59b6 st_size: 8196851217518267431,
                st_name: 3976518154 UNKNOWN_STB UNKNOWN_STT st_other: 248 st_shndx: 56631 st_value: a423f7dfa3b674ad st_size: 870023370323442660,
                st_name: 1870234271 UNKNOWN_STB GNU_IFUNC st_other: 142 st_shndx: 29772 st_value: 3c481138ceabc1e3 st_size: 1381626612315706365,
                st_name: 3573891993 UNKNOWN_STB TLS st_other: 4 st_shndx: 9547 st_value: e8e78c35563cdbc6 st_size: 11014612366144553449,
                st_name: 1011355967 UNKNOWN_STB UNKNOWN_STT st_other: 137 st_shndx: 3622 st_value: a09c27a1f87554dc st_size: 14893380077069025244,
                st_name: 257056982 UNKNOWN_STB UNKNOWN_STT st_other: 161 st_shndx: 23163 st_value: 27dbfbbfceafea46 st_size: 1019653916751153770,
                st_name: 2753820660 UNKNOWN_STB OBJECT st_other: 206 st_shndx: 24942 st_value: 5d7df065ceafea4a st_size: 85899345924,
                st_name: 3 UNKNOWN_STB NUM st_other: 78 st_shndx: 85 st_value: 6165aa2043794870 st_size: 17404001873853089997,
                st_name: 3401280212 LOCAL NOTYPE st_other: 0 st_shndx: 0 st_value: 0 st_size: 0,
                st_name: 0 LOCAL NOTYPE st_other: 0 st_shndx: 0 st_value: 0 st_size: 0,
                st_name: 0 LOCAL NOTYPE st_other: 0 st_shndx: 0 st_value: 0 st_size: 0,
                st_name: 0 LOCAL NOTYPE st_other: 0 st_shndx: 0 st_value: 0 st_size: 0,
                st_name: 0 LOCAL NOTYPE st_other: 0 st_shndx: 0 st_value: 0 st_size: 0,
                st_name: 0 LOCAL NOTYPE st_other: 0 st_shndx: 0 st_value: 0 st_size: 0,
                st_name: 0 LOCAL NOTYPE st_other: 0 st_shndx: 0 st_value: 0 st_size: 0,
                st_name: 0 LOCAL NOTYPE st_other: 0 st_shndx: 0 st_value: 0 st_size: 0,

Similarly, when i run bingrep using the new patch (it's much better than before, and gets past the program headers), but fails printing the remainder, in particular the size of a note is bad (and I'm sure if it got to the symbols, it would fail there as well.

Before:

Malformed entity: Strtable size (9556) + offset (4280848) is out of bounds for 2070376 #bytes. Overflowed: false
m4b@efrit ::  [ ~/projects/bingrep ] RUST_LOG=goblin=debug cargo run -- ../goblin/numpy/core/multiarray.so
    Finished dev [unoptimized + debuginfo] target(s) in 0.10s                                                                                         
     Running `target/debug/bingrep ../goblin/numpy/core/multiarray.so`
ELF DYN X86_64-little-endian @ 0x1eee0:

e_phoff: 0x40 e_shoff: 0x1f2040 e_flags: 0x0 e_ehsize: 64 e_phentsize: 56 e_phnum: 9 e_shentsize: 64 e_shnum: 28 e_shstrndx: 23

ProgramHeaders(9):
  Idx   Type              Flags   Offset      Vaddr       Paddr       Filesz      Memsz       Align     
  0     PT_LOAD           R+X     0x0         0x0         0x0         0x1d905c    0x1d905c    0x200000  
  1     PT_GNU_STACK      RW      0x0         0x0         0x0         0x0         0x0         0x10      
  2     PT_NOTE           R       0x1c8       0x1c8       0x1c8       0x24        0x24        0x4       
  3     PT_GNU_EH_FRAME   R       0x1a0fa4    0x1a0fa4    0x1a0fa4    0x8a94      0x8a94      0x4       
  4     PT_LOAD           RW      0x1d9060    0x3d9060    0x3d9060    0x18e60     0x18e60     0x200000  
  5     PT_TLS            R       0x1d9060    0x3d9060    0x3d9060    0x0         0x0         0x10      
  6     PT_LOAD           RW      0x1f3000    0x411000    0x411000    0x3010      0x3010      0x1000    
  7     PT_DYNAMIC        RW      0x1f7000    0x415000    0x415000    0x210       0x210       0x8       
  8     PT_LOAD           RW      0x1f7000    0x415000    0x415000    0x2768      0x2768      0x1000    
DEBUG:<unknown>: NoteIterator - 0x1c8
DEBUG:<unknown>: NoteHeader { n_namesz: 2, n_descsz: 6, n_type: 2060288 } - 0xc
DEBUG:<unknown>: note name  - 0x10
DEBUG:<unknown>: desc [0, 80, 65, 0, 0, 0] - 0x18
DEBUG:<unknown>: NoteIterator - 0x1e0
DEBUG:<unknown>: NoteHeader { n_namesz: 4280320, n_descsz: 0, n_type: 528 } - 0xc
type is too big (4280319) for 2069884

Copy link
Owner

@m4b m4b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Swapping out goblin backend for bingrep seems to confirm this still has trouble with the numpy binaries; need to get it figured out before can merge.

} // end if_std
}

if_alloc! {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious: how come this got moved to alloc?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was always in alloc. I just moved it into another macro and kept it in alloc without thinking about whether it should be. I can move it out as part of this PR if you want, but it's probably better as another PR.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nah just leave

@@ -347,7 +324,6 @@ if_sylvan! {
is_64: is_64,
is_lib: is_lib,
entry: entry as u64,
bias: bias as u64,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm; I don't like to see this field go, but given now we're supporting multiple biases we have to. Or maybe I'm wrong and its useless? Still might be nice to expose the iteration macro as a function hanging off the Elf type for users to pass an unbiased value in and get biased value out?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there's no single bias, this field seems wrong or at least misleading to me. Exposing the iteration might be ok, but we can wait until someone needs it. The current iteration can't be part of the Elf type because we need it during construction of the Elf type, but any exposed version would be best as part of the Elf type.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, makes sense

macro_rules! elf_dynamic_info_std_impl {
($size:ident, $phdr:ty) => {
/// Convert a virtual memory address to a file offset
fn vm_to_offset(phdrs: &[$phdr], address: $size) -> Option<$size> {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some reservations about looping over the phdrs everytime an address lookup is requested; we really need to create an interval tree (and we could also use this in PE, which has very similar lookups).

I know you're trying to avoid allocation, but core dumps, for example, can have hundreds of phdrs, but then that might not ever have any bias calculations? I guess if this only occurs in the single pass over the dynamic info we're ok. I dunno, just some thoughts here, as opposed to allocating an associative list or a proper interval tree (which rust afaics doesn't have a robust one with a nice api unfortunately...)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be overkill to do that for the single pass over the dynamic info.

@philipc
Copy link
Collaborator Author

philipc commented Oct 8, 2018

Running this code on patch, I'm seeing a lot of what could be garbage (indicative of bad bias calculations?):

No, this is because the dynamic info only has the start of the symtab, not it's size. The current calculation assumes it finishes at the strtab, and for this file, the symtab is at the start and the strtab is at the end. I guess we could do better by assuming it ends at the end of the program header containing its virtual address, it's still too long though so there will still be invalid symbols.

In my opinion the best fix is to use the .dynsym section if it exists.

@philipc
Copy link
Collaborator Author

philipc commented Oct 8, 2018

in particular the size of a note is bad

Looks to me like the file really does have a bad size for the note.. or more likely, that note segment is garbage, because the previous note doesn't look correct either, and there is no corresponding note section to match it. Note that binutils 'readelf -n' doesn't display notes in NOTE segments unless it is a core file. So in my opinion the fix is bingrep should handle bad notes.

@m4b
Copy link
Owner

m4b commented Oct 8, 2018

Wow ok, will check if skipping the note can let me parse rest of binary. Just realized that the debug dump using rdr example can parse it, so might be ok. Let me get back to you; great work though, thanks!

I think we’ll publish a new 0.18 once this goes in, its a good feature :)

@m4b
Copy link
Owner

m4b commented Oct 9, 2018

@philipc are you sure the symbol table is correctly constructed in this new patch? I'm seeing complete garbage using both example=rdr and bingrep to parse the multiarray.so example:

  451e78c085fffff6    LOCAL         NOTYPE        BAD NAME                               0x4cda8948c931c031             BAD_IDX=40424   0x0    
  c489411ff8c1ffff    UNKNOWN_STB   UNKNOWN_STT   BAD NAME                               0xaed6e8fffffbf2e9             BAD_IDX=38294   0xe8   
  7b2e058b48fffffd    UNKNOWN_STB   NOTYPE        BAD NAME                               0xaebee8188b480034             BAD_IDX=33156   0xf    
   fc085fff8abb3e8    UNKNOWN_STB   NUM           BAD NAME                               0xbc86e8fffffd5f84             BAD_IDX=56969   0x48   
  7c8b48fffffbb2e9    GLOBAL        UNKNOWN_STT   BAD NAME                               0xff4824448b482024                             0x44   
  894118247c8b48ff    UNKNOWN_STB   COMMON        BAD NAME                               0x8b48fff8a88ae8c4             BAD_IDX=65534   0x3d   
    1f0ffffffb82e9    UNKNOWN_STB   NOTYPE        BAD NAME                               0xc08548fff8ae63e8    .fini(9)                 0xe3   
   824448b49fffffd    UNKNOWN_STB   UNKNOWN_STT   BAD NAME                               0x79e93050ffe7894c              BAD_IDX=2537   0xff   
  e9e93050ffe7894c    UNKNOWN_STB   UNKNOWN_STT   BAD NAME                               0xfff8a9d0e8fffffc              BAD_IDX=2084   0x44   
  90fffffcd5e90009    WEAK          NOTYPE        BAD NAME                               0xd3894853cd894855             BAD_IDX=58166   0xe8   
          2404c748    UNKNOWN_STB   UNKNOWN_STT   BAD NAME                               0x102444c748                    BAD_IDX=1849   0x6    
  ffff8420e8e28948    UNKNOWN_STB   UNKNOWN_STT   BAD NAME                               0x74c0854824048b48              BAD_IDX=4132   0x4c   
  247cf7489948c3af    UNKNOWN_STB   UNKNOWN_STT   BAD NAME                               0x48c0310045894810              BAD_IDX=3912   0x32   
            841f0f    UNKNOWN_STB   UNKNOWN_STT   BAD NAME                               0xbb7504473904468b             BAD_IDX=11878   0xc3   
     1ba10244c8b48    NUM           OBJECT        BAD NAME                               0x2948c3af0f480000             BAD_IDX=57323   0xc0   
  c4834800458948f9    UNKNOWN_STB   UNKNOWN_STT   BAD NAME                               0x2e66c35d5bc03128             BAD_IDX=63304   0x99   
  66a8ebffffffffb8    LOCAL         NOTYPE        BAD NAME                               0x841f0f                                       0x0    
  4855cc89495441d5    UNKNOWN_STB   OBJECT        BAD NAME                               0x8348f3894853fd89             BAD_IDX=35137   0x55   
    c6f7000000a8b7    UNKNOWN_STB   UNKNOWN_STT   BAD NAME                               0xbf840f180000                 BAD_IDX=35656   0x8    
        ab87f60000    WEAK          FILE          BAD NAME                               0x102444c74810                                 0x20   
  48de894901038348    UNKNOWN_STB   UNKNOWN_STT   BAD NAME                               0x24748d482024548d                             0x1    
     3b7880fc085ff    UNKNOWN_STB   UNKNOWN_STT   BAD NAME                               0x85482024448b4800             BAD_IDX=63674   0x27   
     167840f03f883    LOCAL         OBJECT        BAD NAME                               0x8d4810247c8b4800             BAD_IDX=18432   0x0    
    302444c7480000    UNKNOWN_STB   UNKNOWN_STT   BAD NAME                               0x45e8ff3145000000    .rodata(10)              0xba   
  24442b483024448b    UNKNOWN_STB   UNKNOWN_STT   BAD NAME                               0xf412024443b4810              BAD_IDX=18468   0x4    
   fff854500000384    WEAK          UNKNOWN_STT   BAD NAME                               0xf63145000000a884             BAD_IDX=33807   0x1    
   445c70000000e00    UNKNOWN_STB   COMMON        BAD NAME                               0x906623eb00000001             BAD_IDX=17863   0x33   
   14e840fff007d83    UNKNOWN_STB   NOTYPE        BAD NAME                               0xf63145df89480000             BAD_IDX=12404   0x1    
  f0894448c4834824    UNKNOWN_STB   UNKNOWN_STT   BAD NAME                               0x5e415d415c415d5b              BAD_IDX=1161   0x49   

objdump doesn't appear to have any issues parsing the dynamic and symbol table, so I think we're in error here...

@m4b
Copy link
Owner

m4b commented Oct 9, 2018

Also we have an infinite loop in our note iterator implementation...

@philipc
Copy link
Collaborator Author

philipc commented Oct 9, 2018

objdump doesn't have a problem because it uses the .dynsym section, not the DT_SYMTAB value.

re the infinite loop, gimli-rs/object#74 may be of interest.

@m4b
Copy link
Owner

m4b commented Oct 9, 2018

Well i'm shocked, had no idea objdump (and readelf!) relied on the section headers, which are technically strippable. We don't need to use .dynamic though, the dynamic linker has to know which dynamic symbols are available without section headers, so imho we should only use that (downstream clients can use the .dynamic section if they want manually, or we can implement a specific iterator over it if needed).

There just shouldn't be garbage coming out, it's just a bug in our implementation; i'm just trying to figure out if it was already present, or is a result of this patch? probably the former?

@philipc
Copy link
Collaborator Author

philipc commented Oct 9, 2018

The bug was already there. It's at this line where it is calculating the number of symbols. The problem is that without a .dynsym section, we don't know the size of the the dynamic symbol table, and that code makes a wild guess, which is wrong in this case. The dynamic linker doesn't need to know the size because it uses DT_HASH to index into it (at least, that's my understanding, this isn't something I've looked into in detail). So maybe what we could do is iterate over the hash table instead, if that's possible?

@philipc
Copy link
Collaborator Author

philipc commented Oct 9, 2018

https://flapenguin.me/2017/04/24/elf-lookup-dt-hash/ and https://flapenguin.me/2017/05/10/elf-lookup-dt-gnu-hash/ have good explanations of the hash tables. For DT_HASH we can directly use nchains as the number of symbols. For DT_GNU_HASH, we would have to walk the last bucket to determine its length, and add symoffset (the link has a longer description of this).

@m4b
Copy link
Owner

m4b commented Oct 9, 2018

Ok it sounds like that’s somewhat unrelated to this patch, and this is ready to go ? I’ll let you merge. Thanks for this and the subsequent discussion :)

@philipc philipc merged commit c94b2e5 into m4b:master Oct 10, 2018
@philipc philipc deleted the issue-106 branch October 10, 2018 03:44
@philipc
Copy link
Collaborator Author

philipc commented Oct 10, 2018

readelf will read the hash table instead if you use readelf --syms -D. It skips the undefined symbols at the start though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants