ENH: Speed up sparse.csgraph.dijkstra 2.0 #20717

Open
wants to merge 8 commits into
base: main

tsery-ns
Contributor

Introduction

Rebase, reorganization, and code review of PR 17019.

Original description

Another pull request to speed up sparse.csgraph.dijkstra, succeeding #16696
Thank you @jjerphan for reviewing the previous pull request and giving me advice

  • Replaces FibonacciHeap with std::priority_queue which makes sparse.csgraph.dijkstra faster
    • asymptotically, due to the removal of a bug in the FibonacciHeap implementation
    • with regard to the constant factor due to better memory usage(?)
  • Unifies the four similar dijkstra functions into one
    • Performance overhead was negligible, probably because the priority_queue is the bottleneck
  • Adds the star graph to the benchmark as the worst case for the previous version
  • Fixes a typo in the test file name
  • Part of the test file for dijkstra is modified for better readability; its behavior is unchanged
    (See benchmarks in PR 17019.)
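
To illustrate the first bullet, here is a minimal pure-Python sketch of Dijkstra's algorithm on a binary heap with lazy deletion, which is the pattern a queue without a decrease-key operation (such as std::priority_queue) usually implies. It is not the PR's Cython code: the names `dijkstra_lazy` and `star_graph` are hypothetical, and the adjacency-list input is an assumption made to keep the example free of dependencies. The star graph mirrors the case added to the benchmark.

```python
import heapq
import math


def dijkstra_lazy(adjacency, source):
    """Single-source shortest paths using a binary heap with lazy deletion.

    adjacency[u] is a list of (v, weight) pairs (hypothetical input format).
    Returns (dist, pred); unreachable nodes keep dist=inf and pred=-1.
    """
    n = len(adjacency)
    dist = [math.inf] * n
    pred = [-1] * n
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            # Stale entry: a shorter path to u was already found.
            continue
        for v, w in adjacency[u]:
            candidate = d + w
            if candidate < dist[v]:
                dist[v] = candidate
                pred[v] = u
                # Push a new entry instead of decreasing the old key.
                heapq.heappush(heap, (candidate, v))
    return dist, pred


def star_graph(n, weight=1.0):
    """Undirected star: node 0 is the hub, nodes 1..n-1 are leaves."""
    adjacency = [[] for _ in range(n)]
    for leaf in range(1, n):
        adjacency[0].append((leaf, weight))
        adjacency[leaf].append((0, weight))
    return adjacency


graph = star_graph(5)
dist_from_hub, _ = dijkstra_lazy(graph, 0)
dist_from_leaf, pred_from_leaf = dijkstra_lazy(graph, 1)
print(dist_from_hub)   # [0.0, 1.0, 1.0, 1.0, 1.0]
print(dist_from_leaf)  # [1.0, 0.0, 2.0, 2.0, 2.0]
```

Lazy deletion trades a somewhat larger heap (at most one entry per successful relaxation) for not needing a decrease-key operation at all.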

Changes from original PR

  • Squashed tiny commits into larger commits (e.g., this).
  • Removed commits made irrelevant by changes in the repo since the original PR was opened (e.g., this one).
  • Adapted Yen's algorithm to the new Dijkstra internal implementation (under this commit).
  • Removed random testing (as I agree with the comment here) and added a star-graph test instead.

@github-actions github-actions bot added the scipy.sparse, scipy.sparse.csgraph, Cython, Meson, and enhancement labels May 15, 2024
@j-bowhay j-bowhay requested a review from Kai-Striega May 15, 2024 20:16
Member

@Kai-Striega Kai-Striega left a comment

Good to get this moving again :) Thanks for the work @tsery-ns!

I've had a quick scan of it and it looks good. I won't approve it yet as I need to do a more thorough review first. I'm pretty busy with work at the moment, but I'll try to fit it in either over this weekend or the following one.

scipy/sparse/csgraph/tests/test_shortest_path.py (outdated, resolved)
@j-bowhay
Member

@jjerphan I believe you had some involvement in #17019, would you be willing to take a look here?

@jjerphan
Contributor

Hi @j-bowhay,

I would like to, but I just need some time.

scipy/sparse/csgraph/_shortest_path.pyx (outdated, resolved)
@@ -419,9 +449,11 @@ def test_yen_undirected():
source=0,
sink=3,
K=4,
directed=False,
Contributor

What's going on here? Is a bug being fixed/behavior being changed?

Contributor Author

Bug fixed. The test is meant to be undirected, but the parameter was not passed properly.

@tsery-ns
Contributor Author

Is there a build expert in the audience who can help me resolve this build warning:

In file included from ../../../../../../opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/numpy/core/include/numpy/ndarraytypes.h:1929,
                 from ../../../../../../opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/numpy/core/include/numpy/ndarrayobject.h:12,
                 from ../../../../../../opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/numpy/core/include/numpy/arrayobject.h:5,
                 from scipy/sparse/csgraph/_shortest_path.cpython-310-x86_64-linux-gnu.so.p/_shortest_path.cpp:1227:
../../../../../../opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:17:2: error: #warning "Using deprecated NumPy API, disable it with " "#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Werror=cpp]
   17 | #warning "Using deprecated NumPy API, disable it with " \
      |  ^~~~~~~

@rgommers perhaps?

cython_gen_csgraph_for_cpp.process(pyx_file[1]),
cpp_args: cython_cpp_args,
include_directories: inc_np,
dependencies: py3_dep,
Member

py3_dep isn't needed. np_dep is needed I assume - and that should take care of the build warning about deprecated API.

Contributor

@jjerphan jjerphan left a comment

Thank you for superseding this work, @tsery-ns!

I have limited time, but here is another pass. I think a few allocations can be removed; I believe the implementations can be changed so that a scalar placeholder value (like None) can be passed instead of placeholder arrays whose unneeded allocations are costly.
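
As a rough illustration of that suggestion, the sketch below shows the general pattern in plain Python: an optional output buffer that defaults to None, so callers who do not need predecessors never create a placeholder. The function name `relax_edges` and its signature are hypothetical and are not taken from `_shortest_path.pyx`.

```python
import math


def relax_edges(u, neighbors, dist, pred=None):
    """Relax all edges leaving u; record predecessors only if requested.

    pred=None means the caller does not want predecessors, so no
    placeholder buffer is allocated and nothing is written to it.
    Returns the list of nodes whose distance improved.
    """
    improved = []
    for v, w in neighbors:
        candidate = dist[u] + w
        if candidate < dist[v]:
            dist[v] = candidate
            if pred is not None:
                pred[v] = u
            improved.append(v)
    return improved


# Without predecessors: no dummy array is created or passed.
dist = [0.0, math.inf, math.inf]
updated = relax_edges(0, [(1, 2.0), (2, 5.0)], dist)

# With predecessors: the caller owns and supplies the buffer.
dist_with_pred = [0.0, math.inf, math.inf]
pred = [-1, -1, -1]
relax_edges(0, [(1, 2.0), (2, 5.0)], dist_with_pred, pred)
```

The cost of this design is one extra `is not None` branch per successful relaxation, which has to be weighed against the allocation it avoids.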

If I had more time, I would look into its algorithmic and Cython details.


Side-note: scikit-learn sets the following Cython directives:

  • language_level=3
  • boundscheck=False
  • wraparound=False
  • initializedcheck=False
  • nonecheck=False
  • cdivision=True
  • profile=False

Would it be relevant for SciPy?

dist_matrix[0, source] = 0
int[:] predecessor_matrix = np.full((N), NULL_IDX, dtype=ITYPE)
double[:] dist_matrix = np.full((N), np.inf, dtype=DTYPE)
int[:] dummy_source_matrix = np.empty((0), dtype=ITYPE) # unused
Contributor

Same comment: is it possible to avoid this allocation?

@@ -582,48 +584,56 @@ def dijkstra(csgraph, directed=True, indices=None,
else:
predecessor_matrix = np.empty((len(indices), N), dtype=ITYPE)
predecessor_matrix.fill(NULL_IDX)
source_matrix = np.empty((len(indices), 0), dtype=ITYPE) # unused
Contributor

I would recommend against allocating this array if it is not required.

else:
predecessor_matrix = np.empty((0, N), dtype=ITYPE)
predecessor_matrix = np.empty((len(indices), 0), dtype=ITYPE)
source_matrix = np.empty((len(indices), 0), dtype=ITYPE) # unused
Contributor

Similarly, I would recommend against allocating this array if it is not required.

scipy/sparse/csgraph/_shortest_path.pyx (resolved)
Comment on lines +1173 to +1190
dummy_double_array = np.empty(0, dtype=DTYPE)
dummy_int_array = np.empty(0, dtype=ITYPE)
Contributor

Is there a way to omit those unneeded allocations?

@@ -68,7 +70,7 @@ def shortest_path(csgraph, method='auto',
Computational cost is approximately ``O[N^3]``.
The input csgraph will be converted to a dense representation.

'D' -- Dijkstra's algorithm with Fibonacci heaps.
'D' -- Dijkstra's algorithm with priority queue.
Computational cost is approximately ``O[N(N*k + N*log(N))]``,
Contributor

Does the complexity change in this case?

Contributor Author

It appears the stated complexity was wrong even before this change. Dijkstra's worst-case complexity without a priority queue is O(N^2), so the current complexity is definitely not O(N^2 * log(N)).
The correct complexity should be O[(N*k + N)*log(N)] (assuming N*k is a reasonable estimate of the number of edges in the graph).
Should we state the known complexity of O[(E + N)*log(N)], where E is the number of edges?
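
For reference, the textbook bounds behind this question can be written side by side. This is a sketch under stated assumptions: a binary heap with lazy deletion (at most one push per successful edge relaxation, so each heap operation costs O(log E) = O(log N) because E ≤ N^2), E edges, and the docstring figure read as the cost of running from all N sources. None of these assumptions is confirmed in this thread.

```latex
\begin{align*}
T_{\text{binary heap}}(\text{one source})  &= O\big((E + N)\log N\big) \\
T_{\text{binary heap}}(N\ \text{sources})  &= O\big(N\,(E + N)\log N\big) \\
T_{\text{Fibonacci heap}}(N\ \text{sources}) &= O\big(N\,(E + N\log N)\big)
\end{align*}
```

Substituting E ≈ N*k, the second line becomes O[N(N*k + N)*log(N)], and the third line has the same form as the O[N(N*k + N*log(N))] figure in the docstring.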

Contributor

I would recommend compiling this file with code annotation to understand if those Cython implementations can be technically improved further with a few changes to the Cython directives.

Member

_shortest_path.txt

I've run the annotation for you, here is the output file. Unfortunately, GitHub doesn't let you upload HTML files. I changed the file extension to .txt — you'll just have to change it back to .html again.

@tsery-ns
Contributor Author

@jjerphan thank you for taking the time to review my PR.

I think a few allocations can be removed; I believe the implementations can be changed so that a scalar placeholder value (like None) can be passed instead of placeholder arrays whose unneeded allocations are costly.

I have implemented a change that removes these unused allocations. However, it comes at a price: with the change, the inner methods are not completely 'Cythonic' (i.e., there are yellow lines in the annotation). This is not critical for performance, as the innermost method _dijkstra_scan_heap() and the loop around it are still fully 'Cythonic'.
Take a look at the commit here and let me know if you'd like me to add it to the PR. It might require more work, but all the main logic is implemented and all tests pass.

Side-note: scikit-learn sets the following Cython directives

boundscheck=False is used by some of the methods. Still, this module needs optimization work (not just Dijkstra, and not only adding directives), which is left for future work.

@tsery-ns
Contributor Author

tsery-ns commented Jun 5, 2024

Friendly reminder for reviewers :)
@jjerphan @Kai-Striega

@jjerphan
Contributor

jjerphan commented Jun 5, 2024

I can't guarantee having some time soon for a second comprehensive review. 😕

@dschmitz89
Contributor

@Kai-Striega
Member

I'll take a look over the weekend @tsery-ns.

@tsery-ns
Contributor Author

tsery-ns commented Jun 7, 2024

There is one test failure left: https://github.com/scipy/scipy/actions/runs/9316113382/job/25643666763?pr=20717

I require assistance with this issue:
../build/scipy/sparse/csgraph/_shortest_path.cpython-310-x86_64-linux-gnu.so: too many public symbols!
I looked online and tried using Copilot, but failed to solve the issue.
Reaching out to the community for help.
Thank you.

@Kai-Striega
Member

I've run the benchmarks locally and here are the results.

On the current main branch:

[50.00%] ··· ===== =============== ============= ================ ==============
             --                          min_only / format                      
             ----- -------------------------------------------------------------
               n    True / random   True / star   False / random   False / star 
             ===== =============== ============= ================ ==============
               30      406±0μs        380±0μs        245±0μs         460±0μs    
              300      570±0μs        468±0μs        4.02±0ms        556±0μs    
              900      1.91±0ms       563±0μs        88.9±0ms        4.75±0ms   
             ===== =============== ============= ================ ==============

On this branch:

[ 0.00%] ·· Benchmarking existing-py_home_kai_miniconda3_envs_scipy-dev_bin_python
[50.00%] ··· sparse_csgraph_dijkstra.Dijkstra.time_dijkstra_multi    ok
[50.00%] ··· ===== =============== ============= ================ ==============
             --                          min_only / format                      
             ----- -------------------------------------------------------------
               n    True / random   True / star   False / random   False / star 
             ===== =============== ============= ================ ==============
               30      534±0μs        476±0μs        546±0μs         384±0μs    
              300      523±0μs        515±0μs        2.33±0ms        960±0μs    
              900      2.76±0ms       570±0μs        45.9±0ms        5.07±0ms   
             ===== =============== ============= ================ ==============

Member

@Kai-Striega Kai-Striega left a comment

I really like this!

The main issue I have at the moment is the unneeded allocations mentioned by @jjerphan.

There's also a lot of yellow in the Cython annotations, although I don't think that needs to be fixed in this PR.

@jjerphan
Contributor

jjerphan commented Jun 9, 2024

@da-woods : I think we need your expertise to understand the remaining Cython compilation error with CPython's free threaded build.

@da-woods
Contributor

da-woods commented Jun 9, 2024

@da-woods : I think we need your expertise to understand the remaining Cython compilation error with CPython's free threaded build.

Just to confirm, that's:

/home/runner/work/scipy/scipy/scipy/sparse/csgraph/_shortest_path.pyx:713:25: C++ class must have a nullary constructor to be stack allocated

????

I'd suspect that's a bug in the current Cython master vs whatever Cython release you're using for the rest of the builds. I doubt it's related to free-threading particularly. I'll try to have a more detailed look later.

@da-woods
Contributor

da-woods commented Jun 9, 2024

Right... got to the bottom of your free-threading build issue... It is indeed a slight behaviour change from 3.0 -> master.

Cython behaves slightly differently depending on whether it believes it's compiling in C++ mode or in C mode, specifically with regard to having multiple definitions of the same function.

This is controlled by the flag --cplus to Cython, or in the distutils/setuptools world the language argument to "extensions". You've moved beyond the distutils/setuptools world so I don't know quite what's going on, but you aren't passing the flag --cplus for this file.

The upshot is that Cython believes it's operating in C mode, so it doesn't apply the extra C++ rules.

On the Cython side I'm going to make a small modification so that it always treats cppclass as C++ (whatever it thinks about the file as a whole). I think that makes sense and would stop this little bit of breakage.

From your side I think you should be passing the --cplus flag to Cython (at least for everything that uses C++). I don't know how you do it in your build system, but hopefully you know and this is enough information to point you in the right direction.

da-woods added a commit to da-woods/cython that referenced this pull request Jun 9, 2024
but add some warnings for people who are using them outside C++
mode (since some features, especially free-functions likely
won't work as they expect).

Related to scipy issue scipy/scipy#20717
and change in cython#3235.
@lucascolley
Member

FYI @tsery-ns , the merge conflict is from gh-20913.

@tsery-ns
Contributor Author

tsery-ns commented Jun 9, 2024

Thank you everyone for helping out with the build issue. I will sort it out, together with the conflicts, in the next couple of days.

The main issue I have at the moment is the unneeded allocations mentioned by @jjerphan.

As I described in an earlier comment, it is possible to avoid the allocation with a commit I currently left out of the PR. Including this commit introduces more yellow lines in the annotation, but not on the heavy loop of the algorithm.
@Kai-Striega, @jjerphan Please find the time to review the commit, and let me know if you want it included in the PR (or not).
On that matter, I'd like to point out that allocating an array with zero as one of its dimensions does not allocate memory, only the wrapper required for the object.

There's also a lot of yellow in the Cython annotations, although I don't think that needs to be fixed in this PR.

Once this PR is merged, I intend to work on adding benchmarks and optimizations to this module. I have a few ideas in mind, working on yellow annotations is one of them.
