read: cache abbreviations at offset 0 #628

philipc · 2022-09-16T03:55:11Z

This optimises for the case where there is a single set of abbreviations that is used for all units.

Closes #626
Closes #627

Swatinem · 2022-09-16T06:45:06Z

This would work perfectly for us, we wouldn’t even need any code changes vs the normal gimli calls.

Drawbacks as you said is a Mutex, plus the additional public field on Dwarf that is technically a breaking change.

bjorn3 · 2022-09-16T07:54:46Z

Maybe you could use an AtomicPtr pointing to the cached data and have a null pointer represent not yet computed. Then multiple threads can race to compute it and when setting it compare_exchange can be used to detect if there was a race or not. If there was a race the local version can be discarded and the new version read.

philipc · 2022-09-16T08:45:50Z

Perhaps what I'll do is use AtomicPtr for no_std, and Mutex otherwise.

Swatinem · 2022-09-16T08:46:15Z

Using arc-swap might also be an option I think. I would argue in most cases you are single threaded anyway.

philipc · 2022-09-16T08:56:31Z

Another option might be to make it generic over traits from lock_api, and default to Mutex when using std. Probably not something I'll look into though unless someone wants this.

I would argue in most cases you are single threaded anyway.

We do need to support multi-threaded still. The dwarfdump example uses threads, and I think not-perf does too.

Swatinem · 2022-09-16T09:02:48Z

I think the locking question boils down to doing all this lazily. In my workaround I just pre-parse abbrevs at offset 0 at the very beginning when constructing the Dwarf, so I don’t need interior mutability at all. But the current API design (with everything being a public member) does not make that too easy either.

philipc · 2022-09-18T04:26:17Z

I think the locking question boils down to doing all this lazily. In my workaround I just pre-parse abbrevs at offset 0 at the very beginning when constructing the Dwarf, so I don’t need interior mutability at all.

Right, but do you think that interior mutability a problem? If we didn't need to support no_std, would this PR be fine as is? Does the benefit of all users automatically getting this improvement outweigh the cost of one mutex lock per unit?

But the current API design (with everything being a public member) does not make that too easy either.

Can you expand on how that does not make it easy? The current API design is because we don't have a strong commitment to stability, and having public members is meant to make things more flexible for users. For example, I think your workaround is only possible because of the public members?

Swatinem · 2022-09-18T07:14:58Z

Indeed, the fields being public was one reason why I was able to work around this in the first place.

I was just thinking that because of public members and struct literals, you don’t have a constructor to do work in. So you would either have to do lazy parsing with internal mutability, or put the burden on the user to correctly initialize the cache ahead of time. Which I found out I might have done wrong and was too overeager to just parse at offset 0 and propagate EOF errors for objects that have no abbrevs at all.

I don’t think there is anything wrong with that interior mutability, and I think its a good tradeoff to have locking there.

Swatinem · 2022-09-19T14:08:23Z

BTW, I blogged about this issue here: https://swatinem.de/blog/abbreviations/
There is also instructions to reproduce all this together with some benchmark results that show a 2 orders of magnitude improvement with proper caching/deduplication.

philipc · 2022-09-20T05:57:48Z

@bjorn3 Are you aware of any examples of using AtomicPtr with Arc? I think I want AtomicPtr<Abbreviations>, where the pointer is obtained from Arc::into_raw. I've seen an example of using it with Box::into_raw to initialize a static, but that's not what I want here.

philipc · 2022-09-20T06:00:38Z

Using arc-swap might also be an option I think.

Unfortunately it doesn't appear to support no_std.

This optimises for the case where there is a single set of abbreviations that is used for all units.

fitzgen · 2022-09-26T17:04:02Z

Still looking into the code, but: I would normally use once_cell::sync::Lazy for this kind of thing. Unfortunately it looks like once_cell::sync::* is unavailable if the std feature is not enabled.

fitzgen

Looks good to me with some minor comments below.

fitzgen · 2022-09-26T17:10:01Z

src/read/lazy.rs

+            if !value_ptr.is_null() {
+                // SAFETY: all writes to `self.value` are pointers obtained from `Arc::into_raw`.
+                unsafe {
+                    Arc::from_raw(value_ptr);


When using Arc::from_raw or Box::from_raw specifically just to run the Drop implementation, I like to wrap it in drop() for clarity:

Suggested change

Arc::from_raw(value_ptr);

drop(Arc::from_raw(value_ptr));

fitzgen · 2022-09-26T17:19:37Z

src/read/lazy.rs

+                Ordering::Release,
+                Ordering::Relaxed,


I think some kind of explanation of why these orderings are chosen is called for here.

When I see SeqCst I feel okay (as far as I feel okay about atomics code in general!) and when I see AcqRel I know to double check that basically we are dealing with one memory location and not relying on orderings between operations on different memory locations, and when I see anything else I basically know that I am out of my depth. Sometimes I feel better if there is a link to some comment in std's implementation that explains why weaker orderings are okay in this specific case, and how the code I am actually reading just does the same thing std is doing. But yeah I definitely don't have the hubris required to say I understand justifications for weaker orderings. Mostly I just try to make sure that whoever wrote the code actually really understands what they're doing and has a good justification, and the way I do that is by asking them to write a comment :)

Good suggestion. I've improved my understanding and decided that Relaxed is wrong here, and I've added comments.

fitzgen · 2022-09-26T17:22:38Z

src/read/lazy.rs

+        // Only written once with a value obtained from `Arc<T>::into_raw`.
+        value: AtomicPtr<T>,


Maybe add a comment about how this is holding a ref count for us, so we can always safely load it and return a Arc::clone of it without fear of someone else racing in, decrementing its ref count to zero, and dropping it while we are still managing it / giving out clones?

philipc force-pushed the issue-626 branch from 6b36de8 to 200fa84 Compare September 16, 2022 03:56

philipc mentioned this pull request Sep 16, 2022

Make Abbreviations refcounted and add Unit::with_abbreviations constructor #627

Closed

philipc force-pushed the issue-626 branch 4 times, most recently from 40534df to de3b572 Compare September 24, 2022 06:21

read: cache abbreviations at offset 0

fef8947

This optimises for the case where there is a single set of abbreviations that is used for all units.

philipc force-pushed the issue-626 branch from de3b572 to a15e7e2 Compare September 24, 2022 06:26

philipc marked this pull request as ready for review September 24, 2022 06:38

philipc requested a review from fitzgen September 24, 2022 06:39

fitzgen approved these changes Sep 26, 2022

View reviewed changes

philipc force-pushed the issue-626 branch from 3e4c089 to 02ce736 Compare September 30, 2022 04:12

Address review

2505658

philipc force-pushed the issue-626 branch from 02ce736 to 2505658 Compare September 30, 2022 04:19

philipc merged commit e33d1ac into gimli-rs:master Oct 4, 2022

philipc deleted the issue-626 branch October 4, 2022 02:50

philipc mentioned this pull request Sep 14, 2023

read: add Dwarf::populate_abbreviations_cache #679

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

read: cache abbreviations at offset 0 #628

read: cache abbreviations at offset 0 #628

philipc commented Sep 16, 2022 •

edited

Swatinem commented Sep 16, 2022

bjorn3 commented Sep 16, 2022

philipc commented Sep 16, 2022

Swatinem commented Sep 16, 2022

philipc commented Sep 16, 2022

Swatinem commented Sep 16, 2022

philipc commented Sep 18, 2022

Swatinem commented Sep 18, 2022

Swatinem commented Sep 19, 2022

philipc commented Sep 20, 2022

philipc commented Sep 20, 2022

fitzgen commented Sep 26, 2022

fitzgen left a comment

fitzgen Sep 26, 2022

fitzgen Sep 26, 2022

philipc Sep 30, 2022

fitzgen Sep 26, 2022

		// Only written once with a value obtained from `Arc<T>::into_raw`.
		value: AtomicPtr<T>,

read: cache abbreviations at offset 0 #628

read: cache abbreviations at offset 0 #628

Conversation

philipc commented Sep 16, 2022 • edited

Swatinem commented Sep 16, 2022

bjorn3 commented Sep 16, 2022

philipc commented Sep 16, 2022

Swatinem commented Sep 16, 2022

philipc commented Sep 16, 2022

Swatinem commented Sep 16, 2022

philipc commented Sep 18, 2022

Swatinem commented Sep 18, 2022

Swatinem commented Sep 19, 2022

philipc commented Sep 20, 2022

philipc commented Sep 20, 2022

fitzgen commented Sep 26, 2022

fitzgen left a comment

Choose a reason for hiding this comment

fitzgen Sep 26, 2022

Choose a reason for hiding this comment

fitzgen Sep 26, 2022

Choose a reason for hiding this comment

philipc Sep 30, 2022

Choose a reason for hiding this comment

fitzgen Sep 26, 2022

Choose a reason for hiding this comment

philipc commented Sep 16, 2022 •

edited