Fixing incorrect model parse when XGB buf starts with 'binff' or 'binfn' #2162
Hi @slundberg, I believe this PR fixes #1864 and addresses the root cause, which is an unintended side effect of #1220. Please update me when you can, thank you!
Well spotted @TheZL! This bug has crept in because .lstrip looks for any characters in the provided string and keeps stripping until it hits a character not provided, whereas what we want here is more the behaviour of .removeprefix, although that was only added in Python 3.9, so it can't be used here as shap supports older Python versions.
Maybe this could be written slightly more succinctly as something like
    self.buf = xgb_model.save_raw()
    if self.buf.startswith(b'binf'):
        self.buf = self.buf[4:]
Have you been able to find an example of a model where the raw dump starts with 'binff' or something similar? If there is an easy example then it would be good to add that as a test case, but if not then this should be merged either way, as it is clearly better than the current bug.
@slundberg, thoughts?
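To make the .lstrip point concrete, here is a small sketch. The byte strings are made up for illustration and are not real save_raw() dumps; only the built-in bytes methods are used.

```python
# Hypothetical buffer for illustration only; a real XGBoost dump looks different.
payload = b"fn\x00\x01rest-of-model"
buf = b"binf" + payload

# bytes.lstrip treats b"binf" as the set of bytes {b, i, n, f} and keeps
# stripping until it meets a byte outside that set, so the leading "fn" of
# the payload is removed as well.
stripped = buf.lstrip(b"binf")

# Checking for the exact prefix and slicing removes only the 4-byte marker.
sliced = buf[4:] if buf.startswith(b"binf") else buf

print(stripped)
print(sliced)
```

With this input, the lstrip result has lost the first two payload bytes, while the sliced result equals the payload.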
Hi @lrjball, thanks for the reply! I agree with your opinion on this issue.
The committed code on this branch works well with the example. I also tried your suggested code, but I got an error for it: |
Looks like that is just a typo, it should be startswith, not startwith. Good one on the example though, I've tried a few others and e.g. |
@lrjball Yeah, you are right. The code works well with "startswith". I have updated the code accordingly on this branch.
Hi @lrjball, just want to follow up on the progress of this issue. Will the bug be fixed in the next released version of shap? We are using the shap package for an application study. Currently we manually adjust the input data when the error occurs. If this bug could be fixed in the next release, that would be very helpful. Thanks!
…ction Added test for buffer strip update
@lrjball Thanks for the help! The change has been merged.
Just wanted to add: I have the same problem with several datasets and models. It even seems to have been happening more frequently recently, and for us changing the input data manually is not an option. If there is any chance that this fix could be merged soon, that would be awesome, thanks!
I just checked the failing test on 3.8 because I thought I might be able to help push this over the finish line.
Codecov Report
@@ Coverage Diff @@
## master #2162 +/- ##
=======================================
Coverage 51.51% 51.52%
=======================================
Files 90 90
Lines 13116 13118 +2
=======================================
+ Hits 6757 6759 +2
Misses 6359 6359
In XGBoost 1.1, the output of .save_raw() gained a 'binf' prefix on the model buffer.
#1220 addressed this by using lstrip to remove 'binf'. However, lstrip strips any leading run of the characters 'b', 'i', 'n' and 'f', so it has the unintended consequence of also removing valid data from buffers that start with 'binff', 'binfn', 'binfb' or 'binfi', which occasionally occur as valid starts to a buffer.
The fix here is to check for exactly 'binf' and, only if the buffer starts with that prefix, move the buffer forward by 4 bytes.
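The fix can be sketched as a standalone helper. The name strip_binf_prefix and the sample buffers below are hypothetical and are not part of shap or XGBoost; the sketch only assumes the buffer is a bytes or bytearray object.

```python
from typing import Union

BINF_PREFIX = b"binf"  # marker described above for XGBoost >= 1.1 dumps


def strip_binf_prefix(buf: Union[bytes, bytearray]) -> Union[bytes, bytearray]:
    """Remove one leading b'binf' marker if present; otherwise return buf unchanged."""
    if buf.startswith(BINF_PREFIX):
        # Skip exactly len(b"binf") == 4 bytes, whatever byte follows the marker.
        return buf[len(BINF_PREFIX):]
    return buf


# Made-up buffers covering the problematic starts listed above.
for tail in (b"f123", b"n123", b"b123", b"i123"):
    assert strip_binf_prefix(BINF_PREFIX + tail) == tail

# A buffer without the marker is left untouched.
assert strip_binf_prefix(b"\x00\x01model") == b"\x00\x01model"
```

Unlike lstrip, this never removes more than the four marker bytes, and on Python 3.9+ it behaves the same as buf.removeprefix(b"binf").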