Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[oneMKL][spblas] Update sparse blas API #522

Merged
merged 53 commits into from
May 8, 2024

Conversation

Rbiessy
Copy link
Contributor

@Rbiessy Rbiessy commented Mar 5, 2024

Update the sparse blas API to better support other sparse backends.

This is a large change to the existing sparse API. Most notable changes are:

  • Introduce handle opaque types that store the USM pointer or SYCL buffer.
  • Introduce matrix properties and views.
  • Merge gemv, symv and trmv to spmv operation; gemm is renamed to spmm; trsv is renamed to spsv.
  • Drop support for gemvdot as there didn't seem to be enough use-cases nor support in other backends.
  • Add support for external workspace provided by the user which size is queried by *_buffer_size functions.

I have rendered the spec in html and verified it looks sane.

@gajanan-choudhary and @spencerpatty FYI.

Update sparse blas API to better support other backends.
@andrewtbarker
Copy link
Contributor

Can you share your motivation for introducing the dense_matrix_handle_t type? The dense BLAS parts of the oneAPI spec do not use such a handle and I'm not sure it's helpful for standardization, but I am willing to be convinced.

@gajanan-choudhary
Copy link
Contributor

gajanan-choudhary commented Mar 8, 2024

Can you share your motivation for introducing the dense_matrix_handle_t type? The dense BLAS parts of the oneAPI spec do not use such a handle and I'm not sure it's helpful for standardization, but I am willing to be convinced.

@andrewtbarker, both cuSPARSE and rocSPARSE have introduced "generic APIs" in the last few years. These contain (likely lightweight) wrappers over dense matrices. For example, see section 6.4. Dense Matrix APIs of cuSPARSE documentation. Mainly, the wrapper stores nrows, ncols, layout, floating point type, and the matrix pointer.

While I agree that the dense BLAS domain does not have such wrappers over dense matrices, the APIs in the sparse domain look cleaner with these wrappers. Without these wrappers, implementations of the oneAPI specification that dispatch to cuSPARSE/rocSPARSE APIs in the backend must repeatedly create and destroy the dense matrix wrapper object any and every time their API is called.

The BLAS domain is fortunately well-standardized, so adding dense_matrix_handle_t at oneapi::mkl namespace level may not be appropriate, but the sparse BLAS domain isn't, and so we'd add such a handle under the oneapi::mkl::sparse namespace to indicate it's intention of use only with sparse BLAS APIs.

@andrewtbarker
Copy link
Contributor

@gajanan-choudhary Thanks for your explanation, that makes a lot of sense.

My only remaining concern would be to make sure that the actual data in the dense_matrix_handle_t is transparent to users so they can use these objects in BLAS functions. For sparse_matrix_handle_t it is intentionally opaque because you do not want to expose implementation details (eg CSR versus COO) to users, but for the dense matrix I think it needs to be transparent for interoperability.

@spencerpatty
Copy link
Contributor

spencerpatty commented Mar 10, 2024

@gajanan-choudhary Thanks for your explanation, that makes a lot of sense.

My only remaining concern would be to make sure that the actual data in the dense_matrix_handle_t is transparent to users so they can use these objects in BLAS functions. For sparse_matrix_handle_t it is intentionally opaque because you do not want to expose implementation details (eg CSR versus COO) to users, but for the dense matrix I think it needs to be transparent for interoperability.

We will want to have constructors that can take an std::mdspan or the raw data arrays/sizes/leading_dimensions/layouts for our dense_matrix_handle_t/dense_vector_handle_t and we will want to have an operator function like the following in the dense_matrix_handle_t class that returns an mdspan view:

operator std::mdspan<T, std::dextents<2>> const {return  {data, nrow, ncols}; }

using whatever our internal representation is to construct an mdspan from our internal matrix representation, since that is the direction of C++ for dense linear algebra (whenever we start supporting mdspans :) ) ... likewise an std::mdspan<T, std::dextents<1> > const {return {data, length}; } for the dense_vector_t as well ... That way our local object is lightweight and non-owning and usable in any function that takes in mdspans ...

of course it might be that we need to return a slightly more complicated mdspan with the layout and user defined strides, but that is the idea, that it is easily interpreted as an mdspan, so it can be immediately plugged into any other function that uses mdspans.

These really could be thought of as a dense_matrix_view_t or dense_vector_view_t which are lightweight and non-owning ... so maybe we want to reconsider our name "handle" (C style name) to switch over to "view" which is C++ style name indicating it is a wrapper object that is lightweight and non-owning...

Rbiessy and others added 2 commits March 11, 2024 11:47
Co-authored-by: HJA Bird <9040797+hjabird@users.noreply.github.com>
Co-authored-by: HJA Bird <9040797+hjabird@users.noreply.github.com>
arrays, ``row_ind``, ``col_ind`` and ``val``, and an ``index`` parameter. The
``i``-th defined element in the sparse matrix is represented by its row
index, column index, and value, that is, (``row_ind[i]``, ``col_ind[i]``,
``val[i]``). The entries need not be in a sorted order.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
``val[i]``). The entries need not be in a sorted order.
``val[i]``). The entries need not be in a sorted order, though performance of Sparse BLAS oeprations may be improved if there it is sorted in some logical way, for instance by row index and then column index subordinate to each row set."

This seems like a perfect place to mention that while sortedness is not required, it can have a performance impact. What do you think ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure I've added your suggestion in ebc779c with a minor correction and formatted to split the line.

namespace oneapi::mkl::sparse {

template <typename dataType, typename indexType>
void set_coo_matrix_data (sycl::queue &queue,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the difference between init_coo_matrix and set_coo_matrix_data ? And should this set case then support asynchronous DAG modification ? or is it also blocking ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since queue is there, I presume it may have kernels called inside it which would prevent it from being embedded in a host_task to get asynchronous update ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same question for CSR set/init

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The init_* function initializes the structure with a host allocation and sets the data while set_* only sets the data. They are both blocking right now.

I don't see in which case we would need set_* functions to be asynchronous since they do not use the data, they only update the handles on the host side. I think having asynchronous init_* or set_* functions is not practical since the user would not know when the handles are updated so they would need to wait before using a handle that has been initialized or reset.

The queue does not necessarily mean that it is needed to submit kernels or host_tasks. Every functions in oneMKL need the queue to select at runtime which backend to dispatch to. Currently we also need it to support the MKLCPU and MKLGPU backends with 2024.1.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking about the potential future case of SpGEMM C = A * B sparse * sparse matrix multiplication, but omatadd is also a place where this could show up

  1. init A and B with data
  2. create empty C sparse handle
  3. call spgemm_compute( q, A,B,C) to do some work to determine size of C
  4. query size of C index_type nnz = sgemm_get_nnz(q, A,B,C) I suppose that this might need to involve futures to get nnz value asynchronously ( but if if we never do that then it won't matter if set is asynchronous or not)
  5. allocate C arrays and fill it (could be done with asynchronous allocations to keep chain going)
  6. call spgemm_finalize(q, A,B,C) to fill C matrix

it all depends on whether we eventually will have a way to get nnz value and allocate asynchronously or not, to affect whether we need an asynchronous set_xyz_matrix_data() function ... maybe we will never do that, so this is fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right so this would be part of this separate API as well.

As you said being able to allocate asynchronously is not support in SYCL yet as far as I know. Being able to allocate asynchronously with the size being a future type is probably even harder!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, if the time comes that we get such features, we will be able to address this . for now, it is fine as is

Comment on lines 26 to 29
Matrix properties are optional and "strong" guarantees. Unlike
:ref:`onemkl_sparse_matrix_view`, the user must ensure that the handle's data
holds all the given properties. A property can be set to optimize some
operations.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Matrix properties are optional and "strong" guarantees. Unlike
:ref:`onemkl_sparse_matrix_view`, the user must ensure that the handle's data
holds all the given properties. A property can be set to optimize some
operations.
Matrix properties are optional and "strong" guarantees. Unlike
:ref:`onemkl_sparse_matrix_view`, the user must ensure that the handle's data
holds all the given properties. A property can be set as a hint for library
to optimize some operations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 7247b6d

Comment on lines 46 to 47
| For COO this guarantees that that the row indices are sorted in
ascending order.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need column indices to also be sorted within the row sets for COO format?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a question for @gajanan-choudhary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, column indices should also be sorted. In Excel speak, that's "Sort by rowptr then by colind". If we were okay with COO column indices not being sorted, we wouldn't have a sorted property for CSR matrices. If we have to deal with unsorted COO column indices some day, we can pick a new enum name and add it in.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I clarified this in 669a4e7

@Rbiessy
Copy link
Contributor Author

Rbiessy commented May 6, 2024

I have reverted a modification on set_matrix_property in 1292ad6. We discussed before we could use bit_mask in order to set multiple properties but this also implies that a property can be unset. This is not supported with oneMKL 2024.1 as far as I can see. Since this is the only backend supporting this feature I'm keeping this aligned with the close-source library for now.

Copy link
Contributor

@gajanan-choudhary gajanan-choudhary left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few unresolved threads, but I'm going ahead and adding my approval. I think we are in a decent shape and very close to merging in this PR.

I would suggest keeping this PR open for a while, at least till all review comments are resolved. @Rbiessy can continue working on an implementation expecting no major changes going forward unless big glaring mistakes pop up, which I don't foresee having done multiple passes of reviews for this PR. If there are any mistakes or things missed out, we will have time till whenever the next release of the oneAPI specification is created.

@rscohn2, please make a note that the next oneAPI version, whenever that is, will need a major version change to 2.x once this PR is merged in.

Great work here, @Rbiessy. Thank you everyone for your reviews, it took us a long time and a lot of effort to get to these spec changes! Cheers!

@Rbiessy
Copy link
Contributor Author

Rbiessy commented May 7, 2024

I added clarifications to the description of dependencies and the returned event of the operations when buffers are used: e279fa3

Copy link
Contributor

@spencerpatty spencerpatty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic work, putting all this together! I approve of this pull request!

@Rbiessy
Copy link
Contributor Author

Rbiessy commented May 7, 2024

Thank you @spencerpatty and thanks for the reviews!
I fixed the warning that made the GitHub pipeline fail. Sorry I didn't see it earlier, I'm not seeing any warnings when I build it locally now.

@spencerpatty
Copy link
Contributor

it appears that all build checks have passed, whenever @gajanan-choudhary and @Rbiessy decide it is time to merge, let me know, and I will pull the trigger :)

@Rbiessy
Copy link
Contributor Author

Rbiessy commented May 8, 2024

Thank you @spencerpatty this is ready to be merged as far as I am concerned!

@gajanan-choudhary
Copy link
Contributor

Feel free to merge this, @spencerpatty.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants