[oneMKL][spblas] Update sparse blas API #522

Rbiessy · 2024-03-05T16:47:04Z

Update the sparse blas API to better support other sparse backends.

This is a large change to the existing sparse API. Most notable changes are:

Introduce handle opaque types that store the USM pointer or SYCL buffer.
Introduce matrix properties and views.
Merge gemv, symv and trmv to spmv operation; gemm is renamed to spmm; trsv is renamed to spsv.
Drop support for gemvdot as there didn't seem to be enough use-cases nor support in other backends.
Add support for external workspace provided by the user which size is queried by *_buffer_size functions.

I have rendered the spec in html and verified it looks sane.

@gajanan-choudhary and @spencerpatty FYI.

Update sparse blas API to better support other backends.

source/elements/oneMKL/source/domains/spblas/operations/spsv.rst

source/elements/oneMKL/source/domains/spblas/data_types/destroy_sparse_matrix.rst

source/elements/oneMKL/source/domains/spblas/data_types/set_matrix_property.rst

source/elements/oneMKL/source/domains/spblas/matrix_view.rst

source/elements/oneMKL/source/domains/spblas/supported-types.rst

source/elements/oneMKL/source/domains/sparse_linear_algebra.inc.rst

source/elements/oneMKL/source/domains/spblas/data_types/create_coo_matrix.rst

source/elements/oneMKL/source/domains/spblas/data_types/format-descriptions.rst

source/elements/oneMKL/source/domains/spblas/operations/spmm.rst

source/elements/oneMKL/source/domains/spblas/operations/spmv.rst

source/elements/oneMKL/source/domains/spblas/operations/spsv.rst

source/elements/oneMKL/source/domains/spblas/spblas.rst

andrewtbarker · 2024-03-07T18:25:49Z

Can you share your motivation for introducing the dense_matrix_handle_t type? The dense BLAS parts of the oneAPI spec do not use such a handle and I'm not sure it's helpful for standardization, but I am willing to be convinced.

gajanan-choudhary · 2024-03-08T21:44:17Z

Can you share your motivation for introducing the dense_matrix_handle_t type? The dense BLAS parts of the oneAPI spec do not use such a handle and I'm not sure it's helpful for standardization, but I am willing to be convinced.

@andrewtbarker, both cuSPARSE and rocSPARSE have introduced "generic APIs" in the last few years. These contain (likely lightweight) wrappers over dense matrices. For example, see section 6.4. Dense Matrix APIs of cuSPARSE documentation. Mainly, the wrapper stores nrows, ncols, layout, floating point type, and the matrix pointer.

While I agree that the dense BLAS domain does not have such wrappers over dense matrices, the APIs in the sparse domain look cleaner with these wrappers. Without these wrappers, implementations of the oneAPI specification that dispatch to cuSPARSE/rocSPARSE APIs in the backend must repeatedly create and destroy the dense matrix wrapper object any and every time their API is called.

The BLAS domain is fortunately well-standardized, so adding dense_matrix_handle_t at oneapi::mkl namespace level may not be appropriate, but the sparse BLAS domain isn't, and so we'd add such a handle under the oneapi::mkl::sparse namespace to indicate it's intention of use only with sparse BLAS APIs.

andrewtbarker · 2024-03-08T22:50:19Z

@gajanan-choudhary Thanks for your explanation, that makes a lot of sense.

My only remaining concern would be to make sure that the actual data in the dense_matrix_handle_t is transparent to users so they can use these objects in BLAS functions. For sparse_matrix_handle_t it is intentionally opaque because you do not want to expose implementation details (eg CSR versus COO) to users, but for the dense matrix I think it needs to be transparent for interoperability.

spencerpatty · 2024-03-10T04:45:55Z

@gajanan-choudhary Thanks for your explanation, that makes a lot of sense.

My only remaining concern would be to make sure that the actual data in the dense_matrix_handle_t is transparent to users so they can use these objects in BLAS functions. For sparse_matrix_handle_t it is intentionally opaque because you do not want to expose implementation details (eg CSR versus COO) to users, but for the dense matrix I think it needs to be transparent for interoperability.

We will want to have constructors that can take an std::mdspan or the raw data arrays/sizes/leading_dimensions/layouts for our dense_matrix_handle_t/dense_vector_handle_t and we will want to have an operator function like the following in the dense_matrix_handle_t class that returns an mdspan view:

operator std::mdspan<T, std::dextents<2>> const {return  {data, nrow, ncols}; }

using whatever our internal representation is to construct an mdspan from our internal matrix representation, since that is the direction of C++ for dense linear algebra (whenever we start supporting mdspans :) ) ... likewise an std::mdspan<T, std::dextents<1> > const {return {data, length}; } for the dense_vector_t as well ... That way our local object is lightweight and non-owning and usable in any function that takes in mdspans ...

of course it might be that we need to return a slightly more complicated mdspan with the layout and user defined strides, but that is the idea, that it is easily interpreted as an mdspan, so it can be immediately plugged into any other function that uses mdspans.

These really could be thought of as a dense_matrix_view_t or dense_vector_view_t which are lightweight and non-owning ... so maybe we want to reconsider our name "handle" (C style name) to switch over to "view" which is C++ style name indicating it is a wrapper object that is lightweight and non-owning...

Co-authored-by: HJA Bird <9040797+hjabird@users.noreply.github.com>

source/elements/oneMKL/source/domains/sparse_linear_algebra.inc.rst

source/elements/oneMKL/source/domains/spblas/data_types/destroy_sparse_matrix.rst

source/elements/oneMKL/source/domains/spblas/data_types/create_coo_matrix.rst

source/elements/oneMKL/source/domains/spblas/data_types/create_dense_matrix.rst

source/elements/oneMKL/source/domains/spblas/data_types/destroy_dense_matrix.rst

spencerpatty · 2024-05-03T16:06:45Z

source/elements/oneMKL/source/domains/spblas/data_types/format-descriptions.rst

+   arrays, ``row_ind``, ``col_ind`` and ``val``, and an ``index`` parameter. The
+   ``i``-th defined element in the sparse matrix is represented by its row
+   index, column index, and value, that is, (``row_ind[i]``, ``col_ind[i]``,
+   ``val[i]``). The entries need not be in a sorted order.


Suggested change

``val[i]``). The entries need not be in a sorted order.

``val[i]``). The entries need not be in a sorted order, though performance of Sparse BLAS oeprations may be improved if there it is sorted in some logical way, for instance by row index and then column index subordinate to each row set."

This seems like a perfect place to mention that while sortedness is not required, it can have a performance impact. What do you think ?

Sure I've added your suggestion in ebc779c with a minor correction and formatted to split the line.

source/elements/oneMKL/source/domains/spblas/data_types/init_coo_matrix.rst

spencerpatty · 2024-05-03T16:15:52Z

source/elements/oneMKL/source/domains/spblas/data_types/set_coo_matrix_data.rst

+   namespace oneapi::mkl::sparse {
+
+       template <typename dataType, typename indexType>
+       void set_coo_matrix_data (sycl::queue                          &queue,


what is the difference between init_coo_matrix and set_coo_matrix_data ? And should this set case then support asynchronous DAG modification ? or is it also blocking ?

since queue is there, I presume it may have kernels called inside it which would prevent it from being embedded in a host_task to get asynchronous update ...

same question for CSR set/init

The init_* function initializes the structure with a host allocation and sets the data while set_* only sets the data. They are both blocking right now.

I don't see in which case we would need set_* functions to be asynchronous since they do not use the data, they only update the handles on the host side. I think having asynchronous init_* or set_* functions is not practical since the user would not know when the handles are updated so they would need to wait before using a handle that has been initialized or reset.

The queue does not necessarily mean that it is needed to submit kernels or host_tasks. Every functions in oneMKL need the queue to select at runtime which backend to dispatch to. Currently we also need it to support the MKLCPU and MKLGPU backends with 2024.1.

I am thinking about the potential future case of SpGEMM C = A * B sparse * sparse matrix multiplication, but omatadd is also a place where this could show up

init A and B with data

create empty C sparse handle

call spgemm_compute( q, A,B,C) to do some work to determine size of C

query size of C index_type nnz = sgemm_get_nnz(q, A,B,C) I suppose that this might need to involve futures to get nnz value asynchronously ( but if if we never do that then it won't matter if set is asynchronous or not)

allocate C arrays and fill it (could be done with asynchronous allocations to keep chain going)

call spgemm_finalize(q, A,B,C) to fill C matrix

it all depends on whether we eventually will have a way to get nnz value and allocate asynchronously or not, to affect whether we need an asynchronous set_xyz_matrix_data() function ... maybe we will never do that, so this is fine.

Right so this would be part of this separate API as well.

As you said being able to allocate asynchronously is not support in SYCL yet as far as I know. Being able to allocate asynchronously with the size being a future type is probably even harder!

ok, if the time comes that we get such features, we will be able to address this . for now, it is fine as is

spencerpatty · 2024-05-03T16:21:24Z

source/elements/oneMKL/source/domains/spblas/data_types/set_matrix_properties.rst

+   Matrix properties are optional and "strong" guarantees. Unlike
+   :ref:`onemkl_sparse_matrix_view`, the user must ensure that the handle's data
+   holds all the given properties. A property can be set to optimize some
+   operations.


Suggested change

Matrix properties are optional and "strong" guarantees. Unlike

:ref:`onemkl_sparse_matrix_view`, the user must ensure that the handle's data

holds all the given properties. A property can be set to optimize some

operations.

Matrix properties are optional and "strong" guarantees. Unlike

:ref:`onemkl_sparse_matrix_view`, the user must ensure that the handle's data

holds all the given properties. A property can be set as a hint for library

to optimize some operations.

Done in 7247b6d

spencerpatty · 2024-05-03T16:22:10Z

source/elements/oneMKL/source/domains/spblas/data_types/set_matrix_properties.rst

+          | For COO this guarantees that that the row indices are sorted in
+            ascending order.


do we need column indices to also be sorted within the row sets for COO format?

This is a question for @gajanan-choudhary.

Yes, column indices should also be sorted. In Excel speak, that's "Sort by rowptr then by colind". If we were okay with COO column indices not being sorted, we wouldn't have a sorted property for CSR matrices. If we have to deal with unsorted COO column indices some day, we can pick a new enum name and add it in.

I clarified this in 669a4e7

source/elements/oneMKL/source/domains/spblas/operations/spmv.rst

Rbiessy · 2024-05-06T09:22:14Z

I have reverted a modification on set_matrix_property in 1292ad6. We discussed before we could use bit_mask in order to set multiple properties but this also implies that a property can be unset. This is not supported with oneMKL 2024.1 as far as I can see. Since this is the only backend supporting this feature I'm keeping this aligned with the close-source library for now.

source/elements/oneMKL/source/domains/spblas/operations/spsv.rst

gajanan-choudhary

There are a few unresolved threads, but I'm going ahead and adding my approval. I think we are in a decent shape and very close to merging in this PR.

I would suggest keeping this PR open for a while, at least till all review comments are resolved. @Rbiessy can continue working on an implementation expecting no major changes going forward unless big glaring mistakes pop up, which I don't foresee having done multiple passes of reviews for this PR. If there are any mistakes or things missed out, we will have time till whenever the next release of the oneAPI specification is created.

@rscohn2, please make a note that the next oneAPI version, whenever that is, will need a major version change to 2.x once this PR is merged in.

Great work here, @Rbiessy. Thank you everyone for your reviews, it took us a long time and a lot of effort to get to these spec changes! Cheers!

Rbiessy · 2024-05-07T12:36:18Z

I added clarifications to the description of dependencies and the returned event of the operations when buffers are used: e279fa3

spencerpatty

Fantastic work, putting all this together! I approve of this pull request!

Rbiessy · 2024-05-07T15:44:21Z

Thank you @spencerpatty and thanks for the reviews!
I fixed the warning that made the GitHub pipeline fail. Sorry I didn't see it earlier, I'm not seeing any warnings when I build it locally now.

spencerpatty · 2024-05-07T16:36:31Z

it appears that all build checks have passed, whenever @gajanan-choudhary and @Rbiessy decide it is time to merge, let me know, and I will pull the trigger :)

Rbiessy · 2024-05-08T08:45:41Z

Thank you @spencerpatty this is ready to be merged as far as I am concerned!

gajanan-choudhary · 2024-05-08T15:36:51Z

Feel free to merge this, @spencerpatty.

[oneMKL][spblas] Update sparse blas API

8b413e1

Update sparse blas API to better support other backends.

Rbiessy commented Mar 5, 2024

View reviewed changes

Rbiessy added 2 commits March 5, 2024 17:30

Reword high level sparse_blas domain page

3e5c258

Fix set_dense_vector_data alignment

0b92a22

spencerpatty reviewed Mar 5, 2024

View reviewed changes

source/elements/oneMKL/source/domains/spblas/supported-types.rst Outdated Show resolved Hide resolved

Rename template types to dataType and indexType

ce62ad3

hjabird reviewed Mar 7, 2024

View reviewed changes

Rbiessy and others added 2 commits March 11, 2024 11:47

Add comma

52a3a6e

Co-authored-by: HJA Bird <9040797+hjabird@users.noreply.github.com>

Capitalize acronym

c272e3f

Co-authored-by: HJA Bird <9040797+hjabird@users.noreply.github.com>

gajanan-choudhary reviewed Mar 11, 2024

View reviewed changes

source/elements/oneMKL/source/domains/spblas/data_types/create_coo_matrix.rst Outdated Show resolved Hide resolved

Rbiessy added 7 commits March 11, 2024 16:36

Remove /

e6811a4

Clarify non-zero entries

96bee85

Add requirements on workspace alignment

2545e7b

Clarify alpha and beta type

6db7514

Do not italicise T and H

fda672d

Reword and format spsv description

2b00cf5

Reorder operations alphabetically

c863160