Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use u64/i64 instead of usize/isize for r_offset and r_addend #86

Merged
merged 2 commits into from
Apr 16, 2018

Conversation

Amanieu
Copy link
Contributor

@Amanieu Amanieu commented Apr 15, 2018

This avoids these values getting truncated when reading a 64-bit ELF files from a 32-bit platform.

[breaking-change]

This avoids these values getting truncated when reading a 64-bit
ELF file from a 32-bit platform.
@Amanieu
Copy link
Contributor Author

Amanieu commented Apr 15, 2018

The same issue probably applies to other places as well, it's just that this is the one that I am actually using.

@m4b
Copy link
Owner

m4b commented Apr 15, 2018

Hi @Amanieu thanks for the PR!

Could you provide a motivating usecase for this particular kind of workflow, however?

Specifically, goblin assumes the parser/analyzer (which is the “unified” structs) is on a 64-bit machine if you want to read 64-bit binaries.

As you’ve encountered, reading a 64-bit binary is not supported very well on 32-bit machines. I considered this but did not find a good motivating reason to want to do this, so I’m hoping you could elaborate what exactly you’re doing/ why parsing 64bit binaries on a 32-bit machine is necessary.

Notwithstanding the breakage this change would cause, I’m not sure how it really solves the problem in many ways. Specifically, various r_offset fields will be used as indexes into symbol tables, etc, which are indexed vectors.

Hence the truncation occurs here or at time of indexing, I believe.

@Amanieu
Copy link
Contributor Author

Amanieu commented Apr 15, 2018

I have a tool which works with ELF files of other architectures and it would be nice (though not strictly necessary) for it to work on 32-bit platforms.

I am particularly worried that this truncation could cause a silent error when running on a 32-bit platforms. It would be fine if it explicitly failed, but silent data corruption is worrying.

Now, regarding this change in particular: in dynamic relocations (.dynrel), r_offset contains the virtual address of the value being relocated (so that the dynamic linker can patch it) rather than its offset in the ELF file. This means that even if the size of the ELF file itself is < 4G, r_offset can have a value that does not fit in 32 bits.

r_addend is an even more obvious case: this is added to 64-bit values that are being relocated, so it must be a full 64-bit value.

Neither of these two values are used for indexing into any tables, so I think they should be changed to 64-bit values.

@m4b
Copy link
Owner

m4b commented Apr 15, 2018

I am particularly worried that this truncation could cause a silent error when running on a 32-bit platforms. It would be fine if it explicitly failed, but silent data corruption is worrying.

Yes this is a legitimate worry, would be nice to warn on 32-bit platforms that reading a 64-bit file is not well supported, or something? But maybe this is client downstream consumer responsibility? Dunno.

Neither of these two values are used for indexing into any tables, so I think they should be changed to 64-bit values.

Ok, you have convinced me. I think this will break faerie here, but this is probably a good thing: https://github.com/m4b/faerie/blob/b25c392a265936dd1e005a76df271e15ec528a7c/src/elf.rs#L274

Thinking about it more, it is somewhat reasonable to want to emit 64-bit object files on a 32-bit system (why not?), and above is instance where this capability is broken.

Lastly, do you think we could add a test for some of the issues described here, and if so, what does it look like?

src/elf/reloc.rs Outdated
@@ -132,7 +132,7 @@ macro_rules! elf_rela_std_impl { ($size:ident, $isize:ty) => {
impl From<Rel> for Reloc {
fn from(rel: Rel) -> Self {
Reloc {
r_offset: rel.r_offset as usize,
r_offset: rel.r_offset as u64,
r_addend: 0,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may also want this to be an Option, but I didn't do it cause i was being lazy ;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a breaking change anyways, I turned it into an Option as part of this PR.

@m4b m4b merged commit 971744d into m4b:master Apr 16, 2018
@m4b
Copy link
Owner

m4b commented Apr 16, 2018

Ok, this looks good to me, will merge; if anyone else has any concerns, feel free to add here.

It might be a good idea to have someone review other uses of vm addresses, etc., in other places where usize is being used.

I'm still not sure on best policy for the remaining usizes, e.g., r_sym, etc. Any comments or insights welcome.

@Amanieu
Copy link
Contributor Author

Amanieu commented Apr 16, 2018

I left r_sym as a usize since it is a 32-bit even for 64-bit ELF and therefore it won't get truncated.

One potential issue is DynamicInfo which uses usize everywhere despite many of these values being virtual addresses. However that doesn't affect me personally since I work around that by parsing the Dyn array directly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants