
[WIP] Expose OpenCL optimization pass #1353

Draft · wants to merge 20 commits into master
Conversation

@SteveBronder (Contributor) commented Aug 29, 2023

Submission Checklist

  • Run unit tests
  • Documentation
    • If a user-facing change was made, the documentation PR is here:
    • OR, no user-facing changes were made

Summary

Creates a user-exposed flag -fopencl that attempts to promote log_prob for reverse mode so that the entire computation can run on a GPU via OpenCL.

This is about halfway there, with a few things left to figure out:

  1. The lower-level C++ MIR needs to be able to promote vectors and scalars so that they get moved over to the GPU. I think this means we need to add a Mem_pattern.t tag to arrays and scalars in the MIR. @WardBrian, can you think of another way to do this that wouldn't require that? It would be annoying, since we would have to touch a lot of code.

  2. The optimization pass is all or nothing: either we can move the entire log_prob over to the GPU, or we fall back to running only on the CPU. If we go with this scheme, I think I also need to add a logger so that when a parameter fails we can give the user a reason why their model could not be moved over to the GPU (a sketch follows this list). (UPDATE: done, but it's just a writer to stderr.)

  3. The current scheme is:
    a. Do the same pass as the SoA optimization, using the monotone framework.
    b. At the end, the compiler checks whether all the parameters declared in the parameters, transformed parameters, and model blocks are able to go on the GPU. If so, we continue; otherwise we stop here and throw an error.
    c. If (b) passes, do another pass over those blocks, collecting the names of all of the data used in the model.
    d. Take the data names from (c) and, in the data section, add declarations and assignments that move that data over to the GPU using to_matrix_cl, like in the code below (see also the sketch after this list).

    matrix_cl<{TYPE}> {DATA_NAME}_opencl__ = to_matrix_cl({DATA_NAME});
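For point 2, a minimal sketch of the stderr writer, assuming nothing beyond Printf; the function name and message format are made up for illustration:

(* Hypothetical logger: report why a parameter blocked the OpenCL promotion. *)
let warn_opencl_demotion ~(param : string) ~(reason : string) : unit =
  Printf.eprintf "Cannot move model to the GPU: parameter %s %s\n" param reason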
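And for steps (c) and (d), a self-contained OCaml sketch of the idea; the function and its arguments are hypothetical stand-ins, not the PR's actual code:

module StringSet = Set.Make (String)

(* Intersect the variables referenced in log_prob with the declared data
   names, then emit one OpenCL copy per name, matching the template above. *)
let opencl_data_decls ~(data_names : StringSet.t)
    ~(vars_in_log_prob : StringSet.t) : string list =
  StringSet.inter data_names vars_in_log_prob
  |> StringSet.elements
  |> List.map (fun name ->
         Printf.sprintf "matrix_cl<double> %s_opencl__ = to_matrix_cl(%s);"
           name name)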

I ripped out all the previous OpenCL code and my goal right now is just to get all of those tests compiling and working correctly.

  4. I think it's a good idea for now to leave the target on the CPU: writing the scalar from the GPU back to the CPU is the stopping point where the async OpenCL code knows it needs to finish before passing that scalar back.

  5. Mem_pattern.t now has an OpenCL variant that indicates a statement or expression can be used on the GPU. For functions, every function in the math library that supports OpenCL also supports the new matrix type, but not every function that supports the new matrix type supports OpenCL. So, in the table of available function signatures, anything tagged OpenCL is assumed to support both SoA and OpenCL. For now I just tagged everything, but before we merge and start testing I need to go through the math library and check which functions actually support OpenCL. A sketch of the lookup rule is below.
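To make the lookup rule concrete, here is a minimal sketch with a local stand-in for Mem_pattern.t (not the PR's actual code):

type mem = AoS | SoA | OpenCL

(* An OpenCL tag satisfies an SoA request, but an SoA tag does not
   satisfy an OpenCL request. AoS handling is elided in this sketch. *)
let satisfies ~(requested : mem) ~(tagged : mem) : bool =
  match (requested, tagged) with
  | SoA, (SoA | OpenCL) -> true
  | OpenCL, OpenCL -> true
  | _ -> false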

Release notes

Added a -fopencl flag that performs a pass on log_prob, attempting to promote the model to run on the GPU via OpenCL.

Copyright and Licensing

By submitting this pull request, the copyright holder is agreeing to
license the submitted work under the BSD 3-clause license (https://opensource.org/licenses/BSD-3-Clause)

Commit: …fopencl everywhere. Still need to figure out how to handle data assignment. Wrote the code to parse data from log_prob and make new decls for the OpenCL data
@rok-cesnovar (Member):

Awesome stuff!! Let me know if I can help in any way.

@SteveBronder (Author):

Ty! At this point it's mostly just brainstorming nice patterns for all this. If you can look at stan-dev/stan#3219, that's a PR we are waiting on before we can merge this.

@andrjohns (Contributor):

This is great! Looking forward to this!

@WardBrian (Member) left a comment:

First round - really more asking questions than anything proper review-y


 let lub_mem_pat lst =
-  let find_soa mem_pat = mem_pat = SoA in
+  let find_soa mem_pat = match mem_pat with SoA -> true | _ -> false in
@WardBrian (Member):

Equivalent to is_soa above

Comment on lines +13 to +22
type t = AoS | SoA | OpenCL [@@deriving sexp, compare, map, hash, fold, equal]

let pp ppf = function
| AoS -> Fmt.string ppf "AoS"
| SoA -> Fmt.string ppf "SoA"
| OpenCL -> Fmt.string ppf "OpenCL"

let is_soa mem = match mem with SoA -> true | _ -> false
let is_aos mem = match mem with AoS -> true | _ -> false
let is_opencl mem = match mem with OpenCL -> true | _ -> false
@WardBrian (Member):

There is a relationship between these right? All "OpenCL" memory layouts are automatically SoA, right?

Should is_soa take this into account?
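For concreteness, such an is_soa might look like this sketch (assuming OpenCL implies SoA):

let is_soa mem = match mem with SoA | OpenCL -> true | _ -> false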

@SteveBronder (Author):

All "OpenCL" memory layouts are automatically SoA, right?

Yes, but I think we should keep them different. For instance like printing the cpp we probably want to know the difference. I need to think about a right way to describe this

Comment on lines +241 to +250
let is_eigen_type st =
match st with
| (SVector (mem, _) | SRowVector (mem, _) | SMatrix (mem, _, _))
when Mem_pattern.is_opencl mem ->
false
| SVector _ | SRowVector _ | SMatrix _ | SComplexRowVector _
| SComplexVector _ | SComplexMatrix _ ->
true
| _ -> false

@WardBrian (Member):

I feel a little nervous that SizedType.is_eigen_type st and UnsizedType.is_eigen_type (SizedType.to_unsized st) would return different things in some circumstances. I suppose it depends on how/where both of them are used, but I'd feel a bit more confident with an is_eigen_type that matches the Unsized one, using is_eigen st && not (is_opencl st) where needed.

What do you think?

@SteveBronder (Author):

Yeah, I agree it's very shaky. I think splitting it out is a better idea.
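A sketch of that split; the helper names (is_opencl_type, is_cpu_eigen) are hypothetical:

(* Matches UnsizedType.is_eigen_type: no dependence on the mem tag. *)
let is_eigen_type st =
  match st with
  | SVector _ | SRowVector _ | SMatrix _ | SComplexRowVector _
  | SComplexVector _ | SComplexMatrix _ ->
      true
  | _ -> false

(* Hypothetical helper for call sites that must exclude OpenCL matrices. *)
let is_opencl_type st =
  match st with
  | SVector (mem, _) | SRowVector (mem, _) | SMatrix (mem, _, _) ->
      Mem_pattern.is_opencl mem
  | _ -> false

let is_cpu_eigen st = is_eigen_type st && not (is_opencl_type st)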

@@ -103,13 +126,16 @@ let query_stan_math_mem_pattern_support (name : string)
|> Result.is_ok )
namematches in
let is_soa ((_ : UnsizedType.returntype), _, mem) =
mem = Mem_pattern.SoA in
match requested_mem with
| Mem_pattern.SoA -> mem = Mem_pattern.SoA || mem = OpenCL
@WardBrian (Member):

This seems like one place where the SoA and OpenCL things are considered equivalent, so I wonder if the is functions in mem_pattern.ml should do the same?

@SteveBronder (Author):

For checking the functions, OpenCL -> SoA, but it is not true that SoA -> OpenCL.

@WardBrian (Member):

Are there any functions which are SoA but not OpenCL? It seems like this was a total find-and-replace, but that also seems incorrect? (The math opencl/ folder is much smaller than the others, no?)

@SteveBronder (Author):

Yes, for now I just did a find-and-replace, but before we merge I need to go through and sort out what is SoA and what is OpenCL.

@@ -220,6 +220,15 @@ let modify_sizedtype_mem (mem_pattern : Mem_pattern.t) st =
 match mem_pattern with
 | AoS -> demote_sizedtype_mem st
 | SoA -> promote_sizedtype_mem st
+| OpenCL -> promote_sizedtype_mem st
@WardBrian (Member):

OpenCL is "promoted" to SoA by this function?

| SRowVector (_, dim) -> SRowVector (mem_pattern, dim)
| SMatrix (_, dim1, dim2) -> SMatrix (mem_pattern, dim1, dim2)
| SArray (inner_type, dim) -> SArray (promote_mem mem_pattern inner_type, dim)
| _ -> st
@WardBrian (Member):

Does this need to consider things inside tuples?

@SteveBronder (Author):

If we want to allow OpenCL to use tuples, then yes.

Also, the Decls having a mem_pattern tag won't work because of tuples either :( I think we just need to tag all sized types as having a memory pattern.

@WardBrian (Member):

Ah, that's too bad. You could do something like what we had to do for the autodiff level and have a tuple-specific variant in the type, but I wasn't super happy with that either.

@SteveBronder (Author):

Yeah, at the end of the day I think it's going to look a little odd in some places, but I think it's fine to just add it to the sized types.
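For concreteness, a sketch of how promote_mem could recurse through tuples once all sized types carry a memory pattern, assuming an STuple constructor holding a list of sized types as in the stanc3 MIR:

let rec promote_mem (mem_pattern : Mem_pattern.t) st =
  match st with
  | SVector (_, dim) -> SVector (mem_pattern, dim)
  | SRowVector (_, dim) -> SRowVector (mem_pattern, dim)
  | SMatrix (_, dim1, dim2) -> SMatrix (mem_pattern, dim1, dim2)
  | SArray (inner, dim) -> SArray (promote_mem mem_pattern inner, dim)
  | STuple subtypes -> STuple (List.map (promote_mem mem_pattern) subtypes)
  | _ -> st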


let rec promote_mem (mem_pattern : Mem_pattern.t) st =
@WardBrian (Member):

This seems like it should be called replace_mem rather than promote

@@ -43,6 +43,13 @@ let apply ~default ~merge op (ind : 'a t) =
| Between (expr_top, expr_bottom) -> merge (op expr_top) (op expr_bottom)
| MultiIndex exprs -> op exprs

let map_expr ~f = function
@WardBrian (Member):

We already derive map for the type t, so this is equivalent to Index.map
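For reference, [@@deriving map] on type 'a t generates map : ('a -> 'b) -> 'a t -> 'b t, so the hand-written version reduces to this sketch:

let map_expr ~f ind = map f ind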

@SteveBronder (Author):

Yeah, this needs to be deleted.

@@ -1281,7 +1333,8 @@ let settings_const b =
 ; lazy_code_motion= b
 ; optimize_ad_levels= b
 ; preserve_stability= not b
-; optimize_soa= b }
+; optimize_soa= b
+; optimize_opencl= false }
@WardBrian (Member):

I think we'd want this to be enabled as part of all_optimizations, so this should be b.

@SteveBronder (Author):

idk, personally this is a very specific optimization for a piece of hardware. I'd rather the user explicitly flip it on.

@codecov (bot) commented Sep 5, 2023

Codecov Report

Merging #1353 (51b3ba9) into master (743d0dd) will decrease coverage by 0.44%.
Report is 2 commits behind head on master.
The diff coverage is 87.50%.

@@            Coverage Diff             @@
##           master    #1353      +/-   ##
==========================================
- Coverage   89.39%   88.95%   -0.44%     
==========================================
  Files          65       65              
  Lines       10607    10814     +207     
==========================================
+ Hits         9482     9620     +138     
- Misses       1125     1194      +69     
Files Changed                                        Coverage           Δ
src/frontend/Pretty_printing.ml                      91.08% <0.00%>     (ø)
src/middle/Mem_pattern.ml                            40.00% <16.66%>    (-26.67%) ⬇️
src/middle/Stmt.ml                                   79.47% <66.66%>    (ø)
src/analysis_and_optimization/Mir_utils.ml           77.38% <75.00%>    (ø)
src/frontend/Ast_to_Mir.ml                           94.19% <75.00%>    (+0.01%) ⬆️
src/middle/SizedType.ml                              79.77% <77.27%>    (-4.75%) ⬇️
src/analysis_and_optimization/Memory_patterns.ml     84.17% <77.88%>    (-6.41%) ⬇️
src/middle/Index.ml                                  82.35% <80.00%>    (-0.41%) ⬇️
src/stan_math_backend/Transform_Mir.ml               95.16% <87.50%>    (-0.58%) ⬇️
src/stan_math_backend/Cpp.ml                         85.78% <91.66%>    (-0.19%) ⬇️
... and 14 more

... and 1 file with indirect coverage changes
