Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace grid_resolution np.inf value with large number #940

Merged
merged 6 commits into from Jul 7, 2023

Conversation

mauicv
Copy link
Collaborator

@mauicv mauicv commented Jul 4, 2023

What is this

Fixes CI failing due to sklearn parameter validation in partial_dependence function

Sklearn has added parameter validation for partial_dependence here which we use in the tests for our partial dependence implementation. This means the ci is failing as we set grid_resolution to np.inf where now Sklearn expects that parameter to be less than infinity.

This PR just replaces np.inf in the relevant tests to a large number. This is to check if the test suite failed elsewhere. This is still a work in progress.

@mauicv mauicv added the WIP This PR is a Work in Progress label Jul 4, 2023
@codecov
Copy link

codecov bot commented Jul 4, 2023

Codecov Report

Merging #940 (e88d4c8) into master (c41b6d8) will decrease coverage by 0.05%.
The diff coverage is n/a.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #940      +/-   ##
==========================================
- Coverage   85.28%   85.23%   -0.05%     
==========================================
  Files          74       74              
  Lines        8832     8832              
==========================================
- Hits         7532     7528       -4     
- Misses       1300     1304       +4     

see 4 files with indirect coverage changes

@RobertSamoilescu
Copy link
Collaborator

The inf value was used to avoid the division of the categorical feature axis into equidistant bins controlled by the grid_resolution parameter. In principle, any value bigger than the cardinal of all categorical feature values should work.

At that time, sklearn was supporting only numerical features, and using the inf value was a way around to support categorical features as well. Note that at the moment, sklearn included support for categorical features (see PR here). Thus, I would recommend to use this new feature instead of the large number, only if we don't run into any dependency problems.

@mauicv
Copy link
Collaborator Author

mauicv commented Jul 6, 2023

I've updated the sklearn partial_dependence call to use categorical_features instead of grid_resolution. This fixes the issue for python >= 3.8 but sklearn 1.3.0 is not supported for python == 3.7 so the categorical_features functionality has not been added and this introduces an error. I've added some logic so that the test checks for which sklearn version we're using.

Note: Python 3.7 is technically end-of-life so we should drop support for it (see here). If we do so we could remove the above changes. It's perhaps worth leaving them in case we're testing on supported python versions with sklearn<=-1.3.0.

@ascillitoe ascillitoe self-requested a review July 6, 2023 16:49
@ascillitoe
Copy link
Contributor

Ci failing due to #943

@ascillitoe
Copy link
Contributor

I'm in favour of the second option i.e. keeping the conditional sklearn version behaviour in to support running tests with a new Python version but older sklearn version.

@ascillitoe ascillitoe merged commit edd11aa into SeldonIO:master Jul 7, 2023
9 of 11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
WIP This PR is a Work in Progress
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants