
ENH Array API support for f1_score #27369

Open · wants to merge 20 commits into main
Conversation

OmarManzoor
Contributor

Reference Issues/PRs

Towards #26024

What does this implement/fix? Explain your changes.

  • Adds array API support for f1_score and the functions it relies on.

Any other comments?

CC: @ogrisel @betatim
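For context, the metric this PR generalizes can be computed from per-class confusion-matrix counts. Here is a minimal NumPy sketch (the function name and the zero-division guard are illustrative, not scikit-learn's API); only functions in the array API standard subset are used (`asarray`, `where`, `maximum`), so swapping `np` for another namespace is mechanical:

```python
import numpy as np

def f1_from_counts(tp, fp, fn):
    """F1 = 2*tp / (2*tp + fp + fn), computed per class.

    tp, fp, fn are arrays of true-positive, false-positive and
    false-negative counts; a class with an empty denominator gets
    F1 = 0, matching the zero_division=0 convention.
    """
    tp = np.asarray(tp, dtype=np.float64)
    fp = np.asarray(fp, dtype=np.float64)
    fn = np.asarray(fn, dtype=np.float64)
    denom = 2 * tp + fp + fn
    # np.maximum avoids a divide-by-zero warning; np.where then
    # selects the 0.0 branch for empty classes.
    return np.where(denom > 0, 2 * tp / np.maximum(denom, 1), 0.0)
```

For example, `f1_from_counts([5], [2], [3])` evaluates 2*5 / (10 + 2 + 3) = 2/3.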

@github-actions

github-actions bot commented Sep 14, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 5cd9a11. Link to the linter CI: here

@OmarManzoor OmarManzoor marked this pull request as ready for review May 20, 2024 10:07
@OmarManzoor
Copy link
Contributor Author

@ogrisel Could you kindly have a look at this PR?

Member

@ogrisel ogrisel left a comment


Overall this looks good to me. I am surprised it works without being very specific about devices and dtypes, but as long as the tests pass (and they do), I am happy.

tp = np.array(tp)
fp = np.array(fp)
fn = np.array(fn)
sample_weight = xp.asarray(sample_weight)
Member


We should probably make sure that it matches the device of the inputs, no? It's curious that the existing tests do not fail with PyTorch on an MPS device (or on CUDA devices).

I am also wondering whether we should convert to a specific dtype. However, looking at the tests, I never see a case where we pass non-integer sample weights. And even for integer weights, it is only done to check an error message, not an actual computation. So I am not sure our sample_weight support is correct, even outside of array API concerns.

I guess this is only indirectly tested by classification metrics that rely on multilabel_confusion_matrix internally. But then the array API compliance tests for F1 score do not fail with floating point weights (I just checked) and I am not sure why.
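One hedged way to address the device/dtype concern above (the helper name is hypothetical and not part of this PR): convert `sample_weight` with an explicit dtype taken from a reference input array and, when the namespace supports it, that array's device:

```python
import numpy as np

def asarray_like(values, reference, xp):
    """Hypothetical sketch: convert `values` to an array in namespace
    `xp` with the dtype of `reference` and, when supported, its device."""
    device = getattr(reference, "device", None)
    if device is not None:
        try:
            # Array API namespaces accept a `device` keyword in asarray.
            return xp.asarray(values, dtype=reference.dtype, device=device)
        except TypeError:
            pass  # this namespace's asarray has no device keyword
    return xp.asarray(values, dtype=reference.dtype)
```

With PyTorch inputs this would place the weights on the same CUDA/MPS device as `y_true`; with older NumPy (no `.device` attribute) it degrades to a dtype-only conversion.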

Member


Here is the output of my cuda run on this PR (updated to check that boolean array indexing also works, but this should be orthogonal):

$ pytest -vlx -k "array_api and f1_score" sklearn/ 

================================================================================================== test session starts ===================================================================================================
platform linux -- Python 3.10.12, pytest-7.4.2, pluggy-1.3.0
collected 34881 items / 34863 deselected / 2 skipped / 18 selected                                                                                                                                                       

sklearn/metrics/tests/test_common.py::test_array_api_compliance[f1_score-check_array_api_binary_classification_metric-numpy-None-None] PASSED                                                                      [  5%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[f1_score-check_array_api_binary_classification_metric-array_api_strict-None-None] PASSED                                                           [ 11%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[f1_score-check_array_api_binary_classification_metric-cupy-None-None] PASSED                                                                       [ 16%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[f1_score-check_array_api_binary_classification_metric-cupy.array_api-None-None] PASSED                                                             [ 22%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[f1_score-check_array_api_binary_classification_metric-torch-cpu-float64] PASSED                                                                    [ 27%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[f1_score-check_array_api_binary_classification_metric-torch-cpu-float32] PASSED                                                                    [ 33%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[f1_score-check_array_api_binary_classification_metric-torch-cuda-float64] PASSED                                                                   [ 38%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[f1_score-check_array_api_binary_classification_metric-torch-cuda-float32] PASSED                                                                   [ 44%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[f1_score-check_array_api_binary_classification_metric-torch-mps-float32] SKIPPED (Skipping MPS device test because PYTORCH_ENABLE_MPS_FALLBACK...) [ 50%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[f1_score-check_array_api_multiclass_classification_metric-numpy-None-None] PASSED                                                                  [ 55%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[f1_score-check_array_api_multiclass_classification_metric-array_api_strict-None-None] PASSED                                                       [ 61%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[f1_score-check_array_api_multiclass_classification_metric-cupy-None-None] PASSED                                                                   [ 66%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[f1_score-check_array_api_multiclass_classification_metric-cupy.array_api-None-None] PASSED                                                         [ 72%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[f1_score-check_array_api_multiclass_classification_metric-torch-cpu-float64] PASSED                                                                [ 77%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[f1_score-check_array_api_multiclass_classification_metric-torch-cpu-float32] PASSED                                                                [ 83%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[f1_score-check_array_api_multiclass_classification_metric-torch-cuda-float64] PASSED                                                               [ 88%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[f1_score-check_array_api_multiclass_classification_metric-torch-cuda-float32] PASSED                                                               [ 94%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[f1_score-check_array_api_multiclass_classification_metric-torch-mps-float32] SKIPPED (Skipping MPS device test because PYTORCH_ENABLE_MPS_FALL...) [100%]

============================================================================= 16 passed, 4 skipped, 34863 deselected, 105 warnings in 15.59s =============================================================================

@OmarManzoor
Contributor Author

@ogrisel @betatim Does this look okay now?

Member

@ogrisel ogrisel left a comment


I do not have time to finish the review today but here is some quick feedback:

@ogrisel
Copy link
Member

ogrisel commented Jun 5, 2024

I merged main to be able to launch the new CUDA GPU CI workflow on this PR:

EDIT: tests are green.

Member

@ogrisel ogrisel left a comment


LGTM once the following is addressed:

@ogrisel
Member

ogrisel commented Jun 6, 2024

@betatim this is ready for a second review.

@ogrisel ogrisel added the Waiting for Second Reviewer First reviewer is done, need a second one! label Jun 6, 2024
@ogrisel
Member

ogrisel commented Jun 6, 2024

I launched the CUDA GPU CI at:

EDIT: CUDA tests are green.

Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues: none yet.

2 participants