Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

read: cache abbreviations at offset 0 #628

Merged
merged 2 commits into from Oct 4, 2022
Merged

Conversation

philipc
Copy link
Collaborator

@philipc philipc commented Sep 16, 2022

This optimises for the case where there is a single set of abbreviations that is used for all units.

Closes #626
Closes #627

@Swatinem
Copy link
Contributor

This would work perfectly for us, we wouldn’t even need any code changes vs the normal gimli calls.

Drawbacks as you said is a Mutex, plus the additional public field on Dwarf that is technically a breaking change.

@bjorn3
Copy link
Contributor

bjorn3 commented Sep 16, 2022

Maybe you could use an AtomicPtr pointing to the cached data and have a null pointer represent not yet computed. Then multiple threads can race to compute it and when setting it compare_exchange can be used to detect if there was a race or not. If there was a race the local version can be discarded and the new version read.

@philipc
Copy link
Collaborator Author

philipc commented Sep 16, 2022

Perhaps what I'll do is use AtomicPtr for no_std, and Mutex otherwise.

@Swatinem
Copy link
Contributor

Using arc-swap might also be an option I think. I would argue in most cases you are single threaded anyway.

@philipc
Copy link
Collaborator Author

philipc commented Sep 16, 2022

Another option might be to make it generic over traits from lock_api, and default to Mutex when using std. Probably not something I'll look into though unless someone wants this.

I would argue in most cases you are single threaded anyway.

We do need to support multi-threaded still. The dwarfdump example uses threads, and I think not-perf does too.

@Swatinem
Copy link
Contributor

I think the locking question boils down to doing all this lazily. In my workaround I just pre-parse abbrevs at offset 0 at the very beginning when constructing the Dwarf, so I don’t need interior mutability at all. But the current API design (with everything being a public member) does not make that too easy either.

@philipc
Copy link
Collaborator Author

philipc commented Sep 18, 2022

I think the locking question boils down to doing all this lazily. In my workaround I just pre-parse abbrevs at offset 0 at the very beginning when constructing the Dwarf, so I don’t need interior mutability at all.

Right, but do you think that interior mutability a problem? If we didn't need to support no_std, would this PR be fine as is? Does the benefit of all users automatically getting this improvement outweigh the cost of one mutex lock per unit?

But the current API design (with everything being a public member) does not make that too easy either.

Can you expand on how that does not make it easy? The current API design is because we don't have a strong commitment to stability, and having public members is meant to make things more flexible for users. For example, I think your workaround is only possible because of the public members?

@Swatinem
Copy link
Contributor

Indeed, the fields being public was one reason why I was able to work around this in the first place.

I was just thinking that because of public members and struct literals, you don’t have a constructor to do work in. So you would either have to do lazy parsing with internal mutability, or put the burden on the user to correctly initialize the cache ahead of time. Which I found out I might have done wrong and was too overeager to just parse at offset 0 and propagate EOF errors for objects that have no abbrevs at all.

I don’t think there is anything wrong with that interior mutability, and I think its a good tradeoff to have locking there.

@Swatinem
Copy link
Contributor

BTW, I blogged about this issue here: https://swatinem.de/blog/abbreviations/
There is also instructions to reproduce all this together with some benchmark results that show a 2 orders of magnitude improvement with proper caching/deduplication.

@philipc
Copy link
Collaborator Author

philipc commented Sep 20, 2022

@bjorn3 Are you aware of any examples of using AtomicPtr with Arc? I think I want AtomicPtr<Abbreviations>, where the pointer is obtained from Arc::into_raw. I've seen an example of using it with Box::into_raw to initialize a static, but that's not what I want here.

@philipc
Copy link
Collaborator Author

philipc commented Sep 20, 2022

Using arc-swap might also be an option I think.

Unfortunately it doesn't appear to support no_std.

@philipc philipc force-pushed the issue-626 branch 4 times, most recently from 40534df to de3b572 Compare September 24, 2022 06:21
This optimises for the case where there is a single set of
abbreviations that is used for all units.
@philipc philipc marked this pull request as ready for review September 24, 2022 06:38
@fitzgen
Copy link
Member

fitzgen commented Sep 26, 2022

Still looking into the code, but: I would normally use once_cell::sync::Lazy for this kind of thing. Unfortunately it looks like once_cell::sync::* is unavailable if the std feature is not enabled.

Copy link
Member

@fitzgen fitzgen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me with some minor comments below.

src/read/lazy.rs Outdated
if !value_ptr.is_null() {
// SAFETY: all writes to `self.value` are pointers obtained from `Arc::into_raw`.
unsafe {
Arc::from_raw(value_ptr);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When using Arc::from_raw or Box::from_raw specifically just to run the Drop implementation, I like to wrap it in drop() for clarity:

Suggested change
Arc::from_raw(value_ptr);
drop(Arc::from_raw(value_ptr));

src/read/lazy.rs Outdated
Comment on lines 51 to 52
Ordering::Release,
Ordering::Relaxed,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think some kind of explanation of why these orderings are chosen is called for here.

When I see SeqCst I feel okay (as far as I feel okay about atomics code in general!) and when I see AcqRel I know to double check that basically we are dealing with one memory location and not relying on orderings between operations on different memory locations, and when I see anything else I basically know that I am out of my depth. Sometimes I feel better if there is a link to some comment in std's implementation that explains why weaker orderings are okay in this specific case, and how the code I am actually reading just does the same thing std is doing. But yeah I definitely don't have the hubris required to say I understand justifications for weaker orderings. Mostly I just try to make sure that whoever wrote the code actually really understands what they're doing and has a good justification, and the way I do that is by asking them to write a comment :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion. I've improved my understanding and decided that Relaxed is wrong here, and I've added comments.

src/read/lazy.rs Outdated
Comment on lines 11 to 12
// Only written once with a value obtained from `Arc<T>::into_raw`.
value: AtomicPtr<T>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a comment about how this is holding a ref count for us, so we can always safely load it and return a Arc::clone of it without fear of someone else racing in, decrementing its ref count to zero, and dropping it while we are still managing it / giving out clones?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add a way to share parsed Abbreviations between Units
4 participants