Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return CStr in goblin::strtab::Strtab instead of str #345

Open
koutheir opened this issue Nov 14, 2022 · 5 comments
Open

Return CStr in goblin::strtab::Strtab instead of str #345

koutheir opened this issue Nov 14, 2022 · 5 comments

Comments

@koutheir
Copy link

There is no way to get access to null-terminated C strings that are not valid UTF-8, through goblin::strtab::Strtab. Strtab.get() and Strtab.get_at() should return Option<&CStr> instead of optional UTF-8 strs.

@m4b
Copy link
Owner

m4b commented Nov 22, 2022

This would be a pretty enormous breaking change for goblin, and I'd hate to think of the ergonomic issues around dealing with strings from this Strtab after a change like this. Do you have a particular example where using rusts str doesn't work today (e.g., a string in a binary that isn't valid utf-8?) I assume this issue would have come up before if people were encountering it in the wild?

@willglynn
Copy link
Collaborator

@m4b I agree about the breakage/ergonomics concerns, but I also agree that those strings are just a bag of bytes with unknown encoding and a NUL terminator, i.e. CStr.

What would you think about adding get_cstr() and get_at_cstr(), implementing Strtab in terms of those, and moving UTF-8 validation into get()/get_at()?

@m4b
Copy link
Owner

m4b commented Nov 22, 2022

get_cstr() sounds like a great compromise, did not think of that :)

@koutheir
Copy link
Author

This would be a pretty enormous breaking change for goblin

Isn't this the reason behind goblin still in major version zero?

and I'd hate to think of the ergonomic issues around dealing with strings from this Strtab after a change like this

Transforming a &CStr into a &str is fairly easy, and it is honest about the reality of ELF string tables.

Do you have a particular example where using rusts str doesn't work today (e.g., a string in a binary that isn't valid utf-8?)

No, but I don't want to assume UTF-8 encoding, especially if I'm building tools around goblin.

What would you think about adding get_cstr() and get_at_cstr(), implementing Strtab in terms of those, and moving UTF-8 validation into get()/get_at()?

I would argue the opposite, because UTF-8 validation is an additional operation that should not be included in the default get(). Providing a more ergonomic, but also more expensive, get_str() seems better to me.

@m4b
Copy link
Owner

m4b commented Jan 1, 2023

So for now I don't think I want to proceed with this; it will be too intrusive a change for everyone involved. I'm not convinced that switching everything to Cstr is correct without a compelling example where we fail today.
As far as I can see, it should remain sound to return &str on all platforms binaries since they are either regular ansi encoded strings (and hence compatible with utf-8). While it may be true that something could emit a non-utf8 string, and this is used somehow by something (e.g., a function call as a sequence of bytes instead of a regular string) but I haven't seen any binaries in the wild yet that have this feature (and I suspect most binary tools just assume that strings in the string table are char* (hence utf-8 compatible) and make many assumptions about this) but I could be wrong.
Regardless I think I would require a non manufactured/stilted example that is compelling in order to justify what on the face of it is a pretty user hostile/annoying change.
The long of it is that people have been using this library for many years without complaining that the strings returned from string table aren't CStr, so I'm not sure why it would be important now, but this could perhaps be survivorship bias?
Lastly, we could always add a get_at_cstr() option as suggested for those who want to opt into this kind of feature if it becomes urgent somehow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants