-
-
Notifications
You must be signed in to change notification settings - Fork 25.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FEA Fused sparse-dense support for PairwiseDistancesReduction
#23585
Merged
ogrisel
merged 83 commits into
scikit-learn:main
from
jjerphan:maint/pdr-sparse-support
Sep 20, 2022
Merged
Changes from 41 commits
Commits
Show all changes
83 commits
Select commit
Hold shift + click to select a range
b8bd875
MAINT Implement CSR support for all DistanceMetric
jjerphan 7b07188
Merge branch 'main' into maint/dist-metrics-csr-support
jjerphan fb99680
TST Remove useless guard
jjerphan d39d2b2
TST Skip JaccardDistance on 32bit architecture
jjerphan 011e2a2
MAINT Define dtype alias for sparse matrices indices
jjerphan a579630
MAINT Do not shadow dtype names in Tempita templating
jjerphan 98e9d21
fixup! MAINT Define dtype alias for sparse matrices indices
jjerphan 8aa4e44
TST Use cdist and pdist appropriately
jjerphan 9edfa11
DOC Improve comments
jjerphan ee5c6bf
Fixups
jjerphan bf5eb59
MAINT Wrap of indptr values to support sparse-dense
jjerphan 92b8a6c
Apply review comments
jjerphan dc6f8cf
More interesting boolean data for tests
ogrisel bb06f59
FIX Various corrections
jjerphan a5eb20d
FIX Make Jaccard, Hamming and Hashing robust to explicit zeros
jjerphan 19edf11
FIX Make the other boolean DistanceMetric also robust to explicit zeros
jjerphan de86802
TST Remove xfail for Jaccard on 32bit arch.
jjerphan bb920cf
Cast to np.float64_t where appropriate
jjerphan b3759fe
Rename methods and correctly format their signatures
jjerphan 7f89236
fixup! TST Remove xfail for Jaccard on 32bit arch.
jjerphan 01a0c33
FEA CSR support for HaversineDistance
jjerphan 7d8a717
Fix typo
jjerphan 563e359
Do not upcast to 64bit yet keep the same precision
jjerphan f863a51
Do use the default rtol
jjerphan 5ba0fbe
Set rtol explicitly in test_distance_metrics_dtype_consistency
ogrisel 4f45839
Implement the sparse-dense and the dense-sparse case for c-contiguity
jjerphan 3e3e888
Add validation on X and Y, accepting CSR as inputs
jjerphan ddc49d5
Remove left-overs
jjerphan a83887c
Merge branch 'main' into maint/dist-metrics-csr-support
jjerphan e8bb70a
Merge branch 'main' into maint/pdr-sparse-support
jjerphan dec0aa8
Add support for all combinations of {dense,sparse} datasets pairs
jjerphan 63c6fe3
Const-qualify X and Y
jjerphan 0bb368f
Merge branch 'main' into maint/pdr-sparse-support
jjerphan 30e84af
Only pass Y_norm_squared for the dense-dense case
jjerphan 780d7bb
Update comments
jjerphan 72f4ae7
Pop unused keywords arguments
jjerphan a1ce042
Remove unused import
jjerphan 713b932
DOC Add whats_new entry
jjerphan ac6208d
Merge branch 'main' into maint/pdr-sparse-support
jjerphan 5570f71
fixup! Pop unused keywords arguments
jjerphan 4c455ea
DOC Update comment and changelog
jjerphan 0f0ea70
MAINT Test second alternative for sparse-dense support
jjerphan 80b8c02
Merge branch 'main' into maint/pdr-sparse-support
jjerphan a3cf4d8
DOC Update comment and changelog
jjerphan e9da38b
Merge branch 'maint/pdr-sparse-support' into alt/feat/pdr-sparse-support
jjerphan e9ecbbc
DOC Improve comments and code self-documentation
jjerphan c8bacc6
Merge branch 'main' into maint/pdr-sparse-support
jjerphan 180a54e
Merge branch 'maint/pdr-sparse-support' into alt/feat/pdr-sparse-support
jjerphan de52371
TST `dtype`-parametrize `test_format_agnosticism`
jjerphan 3243153
fixup! TST `dtype`-parametrize `test_format_agnosticism`
jjerphan e521992
MNT Pushing data up instead of indices
thomasjpfan 972fff9
DOC Improve comment
thomasjpfan 3086c0b
REV Revert back to memoryviews for indices
thomasjpfan afa0c35
DOC Spelling
thomasjpfan bc78747
Merge pull request #16 from thomasjpfan/alt/feat/pdr-sparse-support
jjerphan bd48ef0
Merge pull request #15 from jjerphan/alt/feat/pdr-sparse-support
jjerphan be59297
TST Suggest logic adaptation for _pairwise_{dense_sparse,sparse_dense}
jjerphan f8ab496
Merge branch 'main' into maint/pdr-sparse-support
jjerphan ec1d4f9
DOC Add co-authors in `whats_new` entry
jjerphan fbf311e
Do not support CSR matrices without non-zero elements
jjerphan 511c6e6
Merge branch 'main' into maint/pdr-sparse-support
jjerphan 8fddffd
fixup! Merge branch 'main' into maint/pdr-sparse-support
jjerphan 4b879f1
MAINT Do not pop Y_norm_squared when unused
jjerphan 0b1ce13
Merge branch 'main' into maint/pdr-sparse-support
jjerphan ca49236
Explicitly do not support CSR matrices with int64 indices and indptr
jjerphan d7b3649
DOC Update and improve comment for the alternative CSR representation
jjerphan 5e13663
fixup! DOC Update and improve comment for the alternative CSR represe…
jjerphan 1de8acb
CI Retrigger CI due to faulty runs
jjerphan 7766388
Merge branch 'main' into maint/pdr-sparse-support
jjerphan 8f43a5a
DOC Update whats_new entry
jjerphan a229b35
Test and document Isomap on sparse data
jjerphan 3e357a2
Test and document TSNE on sparse data
jjerphan 9723115
Test and document pairwise_distances_argmin on sparse data
jjerphan faf704a
Test and document LocalOutlierFactor on sparse data
jjerphan c66bb82
DOC Add support for sparse data for NearestNeighbors, KNeighbors*, Ra…
jjerphan 1eb5b2c
DOC Remove formatting change
jjerphan fcf15b6
TST Do not test on full cartesian product
jjerphan 58453d7
fixup! TST Do not test on full cartesian product
jjerphan 63fda8c
TST Add TODO for consistency checks on results for sparse and dense data
jjerphan 1d7bcc7
MAINT Mark PairwiseDistancesReductions as unusable for some config.
jjerphan fec55bf
fixup! MAINT Mark PairwiseDistancesReductions as unusable for some co…
jjerphan d55bcec
TST Improve test_format_agnosticism
jjerphan c21187a
DOC Update comment regarding the use of pairwise_distances_chunked
jjerphan File filter
Filter by extension
Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the plan to Tempita all of these? Specifically,
DTYPE_t
to becomefloat32
andfloat64
andSPARSE_INDEX_TYPE_t
to becomeint32
andint64
.If so, does this extend to all combinations? For example, sparse data with
float32
data andint64
indices paired with a sparse data withfloat64
data andint32
indices. 🤯There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potentially yes, especially for
DTYPE_t
which will remain fixed datasetpair-wise.I don't see how to easily handle various the case on
SPARSE_INDEX_TYPE_t
yet/using Cython, and I fear the complex logic and manual templating…Other languages would have allowed to ease some logic, but going done this road come after broader discussions IMO.