Fix nil panic on remove #127
Are you using simplelru directly or are you using it via the main lru.Cache object?
I'm using This PR does fix the panic, I moved to a local copy that has the fix applied and haven't seen it since. However, I would like to be able to explain how it's happening...I just haven't been able to find the reason.
I agree about the cause of the nil, but what's running around my head is that the operations calling the list should be behind the mutex in lru.Cache. It seems like some operation affecting the list is being called outside of the mutex. I don't have time to dig in at the moment but I'd like to try to figure out the root cause.
Yeah, it's puzzling. The panic seemed to be fairly random; my application could run anywhere from 1 to 8 hours before seeing the issue. The LRU was being shared between ~2000 goroutines doing about 40k operations per second. In a smaller environment, it never saw the panic.
What version of Go is your application built with?
Took another look at the code. I can't figure out why e.list would be nil, although there are a few other checks for e.list being nil elsewhere in the code -- but that might purely be safety. I also wonder if there is some actual issue with the mutex since if the mutex is working properly I can't find any path where it wouldn't be protected. That's why I'm interested in your Go version, to see if there have been any known bugs fixed between then and now. In your code, you don't happen to be passing around an lru.Cache object as a non-pointer, do you? The mutex contained within is not a pointer, so if the lru.Cache object is not a pointer then that could certainly cause issues. Finally, I'm not sure that the PR is correct because the panic is on
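To make the non-pointer concern above concrete, here is a minimal sketch. The cache type and the copyHasIndependentLock helper are hypothetical stand-ins, not the library's actual types; the only assumption carried over from the discussion is that the mutex is stored by value inside the struct. Copying such a struct copies the mutex too, so the copy can be locked while the original is held, even though both still share the same map.

```go
package main

import (
	"fmt"
	"sync"
)

// cache is a hypothetical stand-in for a type that holds its mutex by value.
type cache struct {
	mu    sync.Mutex
	items map[string]int
}

// copyHasIndependentLock copies a cache by value, locks the original, and
// reports whether the copy's mutex can still be acquired.
func copyHasIndependentLock() bool {
	orig := cache{items: map[string]int{}}
	dup := orig // by-value copy; go vet's copylocks check flags this line

	orig.mu.Lock()
	defer orig.mu.Unlock()

	// TryLock requires Go 1.18 or later.
	ok := dup.mu.TryLock()
	if ok {
		dup.mu.Unlock()
	}
	return ok
}

func main() {
	// If this prints true, the two values no longer exclude each other,
	// while dup.items and orig.items still point at the same map.
	fmt.Println(copyHasIndependentLock())
}
```

Passing the value around as a pointer keeps a single mutex guarding the shared state.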
I included the Go version in the description, saw it in both
I do feel the same way, perhaps there is a flaw in the list itself. #71 is a similar issue but it uses
It's not clear to me that e isn't nil; if the issue is that sometimes the value being passed to remove is nil when it shouldn't be, then e could be nil. Also note that the panic can't be that The list here is a modified
Was able to reproduce this by changing the underlying key pointer. Changing the underlying data of the key happens outside the mutex and can be changed after the entry is pulled for Closing PR since it's most likely an implementation issue
Fix for panic:
Not really sure how this panic happens and I can only reproduce it in a highly concurrent environment. From what I can tell the panic happens when e.prev is nil, and the only time e.prev is set to nil is when it's being removed from the list. removeElement is supposed to delete the key from the items map in LRU, but somehow we still seem to get the entry back. If it is an entry we've already removed, the list would be set to nil; adding a condition to check if the entry has the correct list would avoid this panic.

Go version: 1.20.3/1.20.2
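A minimal sketch of the kind of guard described above, using a hypothetical doubly linked list (entry, lruList, pushFront and remove are illustrative names, not the library's code). The assumption is that each entry records its owning list and that removal clears that field, so a second removal of the same entry can be detected instead of dereferencing a nil prev.

```go
package main

import "fmt"

// entry is a hypothetical list node that remembers which list owns it.
type entry struct {
	next, prev *entry
	list       *lruList
	key        string
}

// lruList is a circular doubly linked list with a sentinel root node.
type lruList struct {
	root entry
	len  int
}

func newList() *lruList {
	l := &lruList{}
	l.root.next = &l.root
	l.root.prev = &l.root
	return l
}

// pushFront links a new entry right after the sentinel.
func (l *lruList) pushFront(key string) *entry {
	e := &entry{key: key, list: l}
	e.prev = &l.root
	e.next = l.root.next
	e.prev.next = e
	e.next.prev = e
	l.len++
	return e
}

// remove unlinks e and reports whether anything was removed. The guard
// skips entries that are nil or no longer belong to this list.
func (l *lruList) remove(e *entry) bool {
	if e == nil || e.list != l {
		return false
	}
	e.prev.next = e.next
	e.next.prev = e.prev
	e.next, e.prev, e.list = nil, nil, nil // mark the entry as detached
	l.len--
	return true
}

func main() {
	l := newList()
	e := l.pushFront("a")
	fmt.Println(l.remove(e)) // first removal unlinks the entry
	fmt.Println(l.remove(e)) // second removal is a no-op, not a panic
}
```

A guard like this hides the symptom only; a stale entry still being reachable from the items map would remain a separate problem.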