Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse Strtab only once on Creation #275

Merged
merged 3 commits into from Jul 2, 2021
Merged

Parse Strtab only once on Creation #275

merged 3 commits into from Jul 2, 2021

Conversation

Lichtso
Copy link
Contributor

@Lichtso Lichtso commented Jun 2, 2021

Together with @jackcmay I am working on a BPF VM, which utilizes goblin for loading user provided ELF files.

So, last week we encountered a problem with the way Strtab::get() works, tried to fix it outside, which did not work and now have found a fix which would require upstream changes. Furthermore, I think these changes can also help other projects depending on goblin which are exposed to potentially hand-crafted adversary / worst case inputs too.

The problem

A get request performs a linear scan to find the separators / NULL termination and possibly invalid UTF-8 variable length encodings on every single request. So it is possible to construct e.g. an ELF file which is only a few MiB in size but has massive overlap in the Strtab. This way everything involving the Strtab suddenly becomes O(n^2) in runtime complexity because of the overlap (as opposed to independent strings which can be amorphized to be constant length) and just reading the file can take minutes to hours.

What did not work

We tried to fix it on our side by caching the results of Strtab::get(). But

  • there are places like this:
    pub fn get_libraries<'a>(&self, strtab: &Strtab<'a>) -> Vec<&'a str> {
    which are always called when loading an ELF file:
    libraries = dynamic.get_libraries(&dynstrtab);
  • and caching afterwards does not help because each overlapping byte creates a new entry which again scans half the Strtab on average. So still O(n^2) runtime complexity in total.

What does work

The proposed fix in this PR instead moves the parsing from the get request to the constructor of the Strtab, which in turn also moves the error handling with it. The trick is to use a binary search (in each get request) over the cached positions of the separators (found by one linear scan in the constructor). This is only done when dynamic (heap) allocations are enabled via #[cfg(feature = "alloc")], otherwise we default to the original behavior.

Missing

I did not address or test UTF-8 support yet, so it is probably broken. I will invest more time into it if this PR looks otherwise promising.

Copy link
Owner

@m4b m4b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is super interesting! so i think we need to figure out the breaking changes situation, whether it's worth it (i'm sort of ok with it, in particular the get method return signature (from Ok(Some to Some seems probably ok, but the various methods removed/renamed doesn't seem necessary?, but maybe you can change my mind :)

otherwise this is looking good; only other concern i have is for perf, parsing entire table at once could have some perf consequences? But maybe not, majority of symbols might be looked up intiially when we construct the dyn sym table + the symbol table, i'm not sure; do you have any insight into this aspect?

lasty, i'll have to reread your explanation why doing an initial parse avoids the o(n^2) runtime when doing gets, because it's not clear to me how that would happen; do you happen to have an elf binary you can share that exhibits this quadratic behavior?

otherwise this is super cool, thank you for making the PR and writing up your findings, this is very interesting :) (and in case it's not clear, yes i think this PR is welcome, and modulo concerns above, i'm ok with merging something like this)

src/strtab.rs Outdated Show resolved Hide resolved
src/strtab.rs Outdated Show resolved Hide resolved
src/strtab.rs Outdated Show resolved Hide resolved
src/strtab.rs Show resolved Hide resolved
src/strtab.rs Outdated Show resolved Hide resolved
src/strtab.rs Show resolved Hide resolved
src/strtab.rs Outdated Show resolved Hide resolved
src/strtab.rs Outdated Show resolved Hide resolved
src/strtab.rs Outdated Show resolved Hide resolved
@Lichtso Lichtso requested a review from m4b June 8, 2021 06:17
@Lichtso
Copy link
Contributor Author

Lichtso commented Jun 10, 2021

Could parsing the entire table at once have some perf consequences? But maybe not, majority of symbols might be looked up initially when we construct the dyn sym table + the symbol table, i'm not sure; do you have any insight into this aspect?

Uses cases which only request a few strings from a huge table will suffer, but even if you request every possible entry only once and there is overlap (shared suffices) then this already provides gains. Obviously, it would improve even further if an entry is requested multiple times.

lasty, i'll have to reread your explanation why doing an initial parse avoids the o(n^2) runtime when doing gets, because it's not clear to me how that would happen

The worst case is that the Strtab is made from one big string (no separators inside). And there are symbol name refs to each possible offset of the Strtab. Then each get request costs O(n) because it has to parse half of this one string on average. And there can be O(n) different get requests, even with caching as each offset is unique (even if the share the same suffixes). So it total you get O(n^2) for all get requests.

However, if the Strtab is parsed once (costing O(n)), and then each get request only does a binary search it costs just O(log(n)) per request or O(n*log(n)) in total, which is a lot better. Now the worst case is that the Strtab consists only of separators, but even that does not change the efficiency of the binary search and the resulting complexity class.

do you happen to have an elf binary you can share that exhibits this quadratic behavior?

We have multiple and I wouldn't want to publish them before this is resolved.

@philipc
Copy link
Collaborator

philipc commented Jun 10, 2021

Do tools like readelf, objdump, llvm-readobj and the libraries they use also have this problem?

@willglynn
Copy link
Collaborator

This reminds me of pdb::ItemFinder. The situation there is similar: there's a data structure containing variable length items, which might be ill-formed, to which the user needs rapid access by index.

pdb::ItemFinder is to be updated by the user from a (fallible) iterator which parses one item at a time. The associated data structures form a directed acyclic graph, so the most common situation is to scan and then to refer to items which have already been seen, but separating the iterator and the finder allows (forces) the user to choose whichever index-building and error handling strategy is best for their application.

Additionally, given N records, pdb::ItemFinder stores N >> x offsets for some small value of x. This reduces memory requirements by a factor of 2^x in exchange for iterating over up to 2^x - 1 records to retrieve a particular item. Iteration is so cheap – the data's usually in the same cache line anyway – that real code using this structure is actually faster due to the reduced recordkeeping overhead. Neither pdb nor goblin yet depend on a version of Rust with const generics so it's not easy to let the user parameterize this tradeoff, but my suspicion is that a similar tradeoff for some small value of x would be worthwhile. Note also that this doesn't have to make retrieval fallible: if parsing a &[u8] worked before, it would be safe for get() to .expect() that parsing it will work again.

instead of on demand when entries are requested.
@Lichtso
Copy link
Contributor Author

Lichtso commented Jun 14, 2021

Do tools like readelf, objdump, llvm-readobj and the libraries they use also have this problem?

@philipc That is a good question, but I will have to craft different ELF files because the current ones are not completely valid in unrelated areas and it makes these tools return early with a warning.

@m4b I added two unit tests for UTF-8 parsing and the CI is passing, can you review it again?

@m4b
Copy link
Owner

m4b commented Jun 18, 2021

We have multiple and I wouldn't want to publish them before this is resolved.

Could you clarify what you mean here? I.e., why would resolution of this issue enable you to publish / share the problematic binaries? It seems to me if you can't share them now (which is fine), you wouldn't be able to share them after, regardless of whether this issue is resolved.

@Lichtso
Copy link
Contributor Author

Lichtso commented Jun 18, 2021

We have multiple and I wouldn't want to publish them before this is resolved.

Could you clarify what you mean here? I.e., why would resolution of this issue enable you to publish / share the problematic binaries? It seems to me if you can't share them now (which is fine), you wouldn't be able to share them after, regardless of whether this issue is resolved.

We would simply like to keep using the upstream repo, not a fork with our security patches applied. That is all. Once this is merged we can deploy a new version of our software packages and release the exploits as they would be disarmed by the time. By release I mean first to you specifically and later to the general public, as we would want other projects depending on this one to have a change to adopt as well.

Copy link
Owner

@m4b m4b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok so other than to_vec being a breaking change, which if you feel strongly about, i guess i'm ok with it, the only other major concern i have is whether we're pessimizing 99% of binary's strtables in what seems like a very niche problem case.
as far as i understand it, the only security implications I see are a kind of DoS on accessing strtables for these malformed binaries, but it's not clear to me how that is impactful, other than a tool taking a long time to run? it would be good to get some further motivations here.

I'm not sure why, but I almost wish we had a variant constructor for Strtab that uses the pre-allocated form via a config we'd pass into parsing, and defaults to a non-preallocated form otherwise, but this is a much bigger PR. However, it might be worth doing, because a config for parsing options is something i've thought would be nice for a while.

with that in mind, i don't see anything in this PR that would prevent such a feature from landing in the future?

example:

enum StrtabKind {
  Preallocated(Vec<&str>),
  ZeroCopy
}

pub struct Strtab {
  bytes: &[u8],
  inner: StrtabKind
}

where get's dispatch depending on the enum variant accordingly. Anyway, just some food for thought, as this would allow users to opt into whether they want to pay for the preallocated penalty or not, at a programmatic, library level choice.

ok i've blabbed enough :D

that being said, i'm in favor of merging, unless anyone else has any other concerns?

pub unsafe fn from_raw(ptr: *const u8, len: usize, delim: u8) -> Strtab<'a> {
Self::from_slice_unsafe(core::slice::from_raw_parts(ptr, len), 0, len, delim)
}
#[deprecated(since = "0.4.2", note = "Bad performance, use get_at() instead")]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome

delim: ctx::StrCtx,
bytes: &'a [u8],
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: layout here didn't need to change (bytes after delim in field declarations), this and some of the functions moving around did make this PR a bit hard to review, just fyi

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was for consistency with the constructors (see line 30 of the prev version).

src/strtab.rs Outdated
pub fn new(bytes: &'a [u8], delim: u8) -> Self {
Strtab {
/// Creates a `Strtab` directly without bounds check and without parsing it.
pub fn from_slice_unsafe(bytes: &'a [u8], offset: usize, len: usize, delim: u8) -> Self {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g., this function here makes a lot of diff noise; also does this function need to be pub? one can construct a strtable now, but the vec isn't filled, which means any get_at calls would fail, which feels like the strtab is "half constructed"; also it is better to not introduce any new pub apis unless necessary, since once they're pub, we can't take them back

src/strtab.rs Outdated
/// Safely parses and gets a str reference from the backing bytes starting at byte `offset`.
#[cfg(feature = "alloc")]
/// Converts the string table to a vector of parsed strings.
pub fn to_vec(&self) -> Vec<&'a str> {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unfortunately this is still a breaking change, since return signature changed. i'm 50/50 on fence whether it's worth it (can just Ok wrap at end and it's non-breaking, but maybe it's fine to break?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ironically, I was on the fence of breaking it even further by including the starting offset of each entry in the result.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so I've thought about this, and I think we should return Result here, so that we don't have a breaking change, and if we switch to using the enum, it will work without changes; if we get that done i'm ok with merging this and making a minor release asap.

As a bonus, we can perhaps construct the vector on the fly if it's empty (i.e., it was constructed via the unsafe method), but that is probably a niche usecase.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both done, reverted to use the Result return type and implemented a fallback.

/// Construct a strtab from a `ptr`, and a `size`, using `delim` as the delimiter
///
/// This function creates a `Strtab` directly from a raw pointer and size
pub unsafe fn from_raw(ptr: *const u8, len: usize, delim: u8) -> Strtab<'a> {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah i see now why from_slice_unsafe is pub; so i'm not sure this should be deprecated? i used this in the past, e.g., to construct strtables for a dynamic linker, which doesn't want to allocate (since it's already mapped into memory). i'm not sure what others are doing, but it would force someone to first call from raw_parts on their pointer and len as you do here.

alternatively, it would be feasible to have from_slice_unsafe just get the ptr + len and pass it to this. however, it's probably better to get rid of / deprecate these unsafe apis, since it doesn't add much (the user can just call the unsafe from_raw_parts themselves in their own unsafe context, so we push it onto them, which i think is fine).

so i guess what i'm saying is maybe it might be good to either:

  1. delete this
  2. deprecate as you do with warning it will be removed, then delete next release

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The deprecated marker is a few lines above (the comment) in 103.

@Lichtso
Copy link
Contributor Author

Lichtso commented Jun 18, 2021

ok so other than to_vec being a breaking change, which if you feel strongly about, i guess i'm ok with it, the only other major concern i have is whether we're pessimizing 99% of binary's strtables in what seems like a very niche problem case.

Well, we have no data what the "usual" user of this crate is doing on average, so it is hard to optimize for.
But, as I said, this is only penalizing use cases where only a few symbols are requested once. All others such as:

  • Symbols being requested multiple times
  • Symbols having suffix overlap
  • Most symbols being requested

Would not be hit at all or even profit from the pre-parsing.

as far as i understand it, the only security implications I see are a kind of DoS on accessing strtables for these malformed binaries, but it's not clear to me how that is impactful, other than a tool taking a long time to run? it would be good to get some further motivations here.

I agree that many CLI tools some developer runs and cancels, if they take to long, are not affected by this.
However, think about the automated processes like CI services, etc. Furthermore, we run a decentralized network which validates ELF files on the fly, so it would easily grind to halt if somebody wanted it to, which is not acceptable.

I'm not sure why, but I almost wish we had a variant constructor for Strtab that uses the pre-allocated form via a config we'd pass into parsing, and defaults to a non-preallocated form otherwise, but this is a much bigger PR. However, it might be worth doing, because a config for parsing options is something i've thought would be nice for a while.

with that in mind, i don't see anything in this PR that would prevent such a feature from landing in the future?

example:

enum StrtabKind {
  Preallocated(Vec<&str>),
  ZeroCopy
}

Might not be so complicated, why not just default back to the original implementation of get in get_at if the strings Vec is empty, because it was constructed by from_slice_unsafe not by parse?

That being said, maybe from_slice_unparsed would be a better name than from_slice_unsafe ...

@Lichtso
Copy link
Contributor Author

Lichtso commented Jul 1, 2021

I made the changes you requested, could you rerun the CI tests once more?

Copy link
Owner

@m4b m4b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is great, thanks for doing this :)

@m4b
Copy link
Owner

m4b commented Jul 2, 2021

I made the changes you requested, could you rerun the CI tests once more?

I believe the CI has run on your last change Reverts breaking to_vec() result and adds a fallback mode. , there is a green check mark at least next to it :)

Anyway, thank you for your patience! I think this is ready to merge; perhaps we can do the enum variant I talked about earlier, but it's fine for now I believe; we can do a minor crates release as well, since we don't (i believe!) have any breaking changes.

Thanks again for everyone's input here!

@m4b m4b merged commit 8febb72 into m4b:master Jul 2, 2021
@Lichtso Lichtso deleted the parse_strtab_in_constructor branch July 2, 2021 07:56
@Lichtso
Copy link
Contributor Author

Lichtso commented Jul 2, 2021

Thank you as well.

Once you release the minor crate version I will send you the exploits via email to m4b.github.io a gmail.com

@m4b
Copy link
Owner

m4b commented Jul 4, 2021

ok i've published 0.4.2; do note that i caught a breaking change in the new function whose return signature changed; i restored the original new, and added a new_preparsed that has the Result return type so that a minor release should be non-breaking now. So if you had a branch using these changes, you'll have to rename your Strtab::new -> Strtab::new_preparsed is all.

Thanks for the PR!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants