Fix support for scikit-learn>=1.2.0 and Numpy=1.24.0 #52

Open
ehudkr opened this issue Dec 21, 2022 · 1 comment

Collaborator

ehudkr commented Dec 21, 2022

Scikit-learn version 1.2.0 enforces two API changes that currently break tests.

  1. LinearRegression no longer supports the normalize keyword argument, which some of the tests use.
    The fix should be fairly simple: replace LinearRegression with a Pipeline object that has a StandardScaler preprocessing step.
  2. Scikit-learn now enforces strict column name restrictions.
    First, all column names must be of the same type, and second, column names should match between fit and predict.
    This might require a broader solution.
    The first part will require a "safe join" that is aware of column-name types, replacing every instance where we join the covariates X with the treatment assignment a.
    The second part requires validating that column names are consistent/preserved when new data is passed in. This might mostly affect the time-pooled survival models, where a time range is artificially created and added as a predictor.
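
A rough sketch of both ideas, to make the discussion concrete. The names `make_scaled_linear_regression` and `safe_join` are hypothetical and not existing causallib functions, the data is synthetic, and casting every column name to `str` is only one possible "safe join" policy. Note also that a `StandardScaler` step is the substitution suggested above and may not reproduce the old `normalize=True` scaling exactly.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_scaled_linear_regression():
    """Hypothetical stand-in for the removed `LinearRegression(normalize=True)`."""
    return make_pipeline(StandardScaler(), LinearRegression())


def safe_join(X, a):
    """Hypothetical helper: join covariates X with treatment a so that
    all resulting column names are strings (assumed policy)."""
    X = X.copy()
    X.columns = X.columns.astype(str)
    a = a.rename(str(a.name) if a.name is not None else "a")
    if a.name in X.columns:
        raise ValueError(f"Treatment name {a.name!r} collides with a covariate name.")
    return X.join(a)


# Synthetic data: integer covariate names plus a string treatment name,
# i.e. the mixed-type case that scikit-learn>=1.2.0 rejects.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(50, 2)), columns=[0, 1])
a = pd.Series(rng.integers(0, 2, size=50), name="a")
y = X[0] + 2 * a + rng.normal(scale=0.1, size=50)

Xa = safe_join(X, a)
model = make_scaled_linear_regression().fit(Xa, y)

# Reusing the same join at predict time keeps the column names consistent with fit.
predictions = model.predict(safe_join(X, a))
```

Routing every X/a join through a single helper like this would also give one place to validate column names for newly supplied data.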

A more minor exception was also raised with NumPy v1.24.0: generating calibration plots calls matplotlib's fill_between, which fails with TypeError: ufunc 'isfinite' not supported for the input types.
Need to dig deeper into that and whether it's a causallib problem (providing bad fill values) or some external matplotlib-numpy mismatch.
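
One way to test the "bad fill values" hypothesis would be to inspect the arrays before they reach fill_between. The sketch below uses a hypothetical helper, `describe_fill_values`, and relies only on the fact that `np.isfinite` raises a TypeError for non-numeric dtypes such as object arrays. It does not reproduce the actual failure.

```python
import numpy as np


def describe_fill_values(values):
    """Hypothetical diagnostic: report whether `values` can be passed to np.isfinite."""
    arr = np.asarray(values)
    try:
        finite = np.isfinite(arr)
    except TypeError:
        # Object or string dtypes have no `isfinite` implementation.
        return f"non-numeric dtype {arr.dtype}: np.isfinite cannot be applied"
    return f"dtype {arr.dtype}, {int(finite.sum())} of {arr.size} values finite"


numeric_report = describe_fill_values([0.1, 0.5, 0.9])
object_report = describe_fill_values(np.array([0.1, None], dtype=object))
```

If the values passed by the calibration plot turn out to be plain float arrays, that would point away from causallib and toward the external mismatch.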

In the meantime, PR #50 limited the allowed dependency versions.

@ehudkr ehudkr self-assigned this Dec 21, 2022
Collaborator Author

ehudkr commented Mar 22, 2023

The numpy 1.24.0 bug indeed seems to be a matplotlib problem (matplotlib/matplotlib#24106), which was fixed in matplotlib/matplotlib#24115 and released in matplotlib v3.6.1, so updating matplotlib should allow updating numpy too.
