Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider not dropping entries on panicky rehashes #444

Open
Manishearth opened this issue Jul 16, 2023 · 2 comments
Open

Consider not dropping entries on panicky rehashes #444

Manishearth opened this issue Jul 16, 2023 · 2 comments

Comments

@Manishearth
Copy link
Member

Context: I maintain elsa, a crate which contains append-only collections that can be appended to with only &self references. Recently @steffahn brought this bug to my attention, where elsa's assumption that "a hashmap that only experiences insertions (not replacements or removals) will never drop values". This is incorrect in the (relatively rare) case Hash panics on subsequent calls. (It's a very interesting edge case, nice catch @steffahn!)

I do have a way to fix this: we can ensure that Hash is only called by our code (by using a wrapper type with a cached hash; a common pattern). There are other potential fixes too.

However, I do think that the behavior that insert() may drop values not being replaced is surprising, and while HashMap makes no promises of its behavior when you write silly Hash and Eq implementations, I do think a different route for the garbage-in-garbage-out behavior is possible: when Hash panics during realloc, stuff the rest of the entries back into the hashmap in order regardless of hash. It will be totally unpredictable on .get() (but we knew that; that's what is expected when you write silly Hash and Eq implementations), but it will end up calling the destructors at the right time. An alternate route would be to just leak the values.

Basically, I don't think the current behavior of going out of our way to clean stuff up is super necessary, and it makes things harder for unsafe code (elsa is a crate that heavily relies on this, but I can imagine other unsafe code writers making similar assumptions in more specific cases).

Thoughts?

@Amanieu
Copy link
Member

Amanieu commented Jul 17, 2023

The relevant code is here: https://github.com/rust-lang/hashbrown/blob/3b348eaac93c5ee60f7ce1ff5a246e86782497d1/src/raw/mod.rs#L2277C45-L2277C46

I suppose we could just mark the elements as having a hash value 0 instead of deleting them, however this is a deep implementation detail of the hash table. Consider the future where we replace the standard library hash table implementation yet again. It would not be possible to guarantee that this behavior remains the same.

My recommendation would be to look into a different workaround in your own code.

@Manishearth
Copy link
Member Author

I am looking into other workarounds in my code; but I think this is a useful property for both hashbrown and the stdlib to try and maintain for the uses of all unsafe code, not just mine. And it's not just useful, it's an easy assumption to make.

Append-only collections are a common pattern and I've seen this kind of logic used outside of elsa as well. I think others will make this assumption in the future; and ideally they would be able to.

I could ask upstream that this property be codified in the docs once we have it settled here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants