GitHub - xcompact3d/backends_example: A simple example demonstrating a cpu and gpu backend using fortran OO constructs

Modern Fortran implementation of a template method pattern with two hardware backend specialisations (pure CPU and CPU/GPU backends).

Given an array of numbers $\mathbf{a} = [a_1, \ldots, a_n]$, we want to compute

$$R = f(\mathbf{a}) + g(\mathbf{a})$$

with $f(\mathbf{a}) = \sum_{i=1}^{n} (a_i + 1)$ and $g(\mathbf{a}) = \max_{i} (2 a_i)$.
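
As a quick check of these definitions, take the three-element array $\mathbf{a} = [1, 2, 3]$ (values chosen here purely for illustration):

$$f(\mathbf{a}) = (1 + 1) + (2 + 1) + (3 + 1) = 9, \qquad g(\mathbf{a}) = \max(2, 4, 6) = 6, \qquad R = 9 + 6 = 15$$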

Given an input array a, the algorithm is:

1. Compute f(a)
2. Compute g(a)
3. Compute f(a) + g(a)

This algorithm depends only on the interface of f and g, that is, the arguments they take and the values they return. It does not depend on how f and g are implemented.

Compiling and running the programs

cd backends_example
FC=nvfortran cmake -S . -B build

The above configures a build directory with three executable targets:

main_cpu: CPU-only version.
main_gpu: Version with f and g implemented as accelerated GPU kernels.
main_hybrid: Execution of GPU kernels is enabled/disabled at runtime.

From the build directory:

$ make main_hybrid # Build once, run everywhere.
$ ./main_hybrid
Executing on CPU only
    184.000
$ ./main_hybrid --gpu
Executing CUDA kernels
    184.00

Implementation

The algorithm itself is defined once, as a procedure doit bound to the abstract type basetype (base.f90). This type is abstract because, although doit is defined, f and g are not, which makes it impossible to instantiate an object of type basetype. The point is that we can now extend basetype with concrete types that provide a definition for both functions.

The basetype abstract type is extended by cputype (cpu/cpu.f90) and gputype (gpu/gpu.f90). The former implements f and g using standard Fortran to be executed on a CPU. The latter, gputype, provides an implementation of f and g based on CUDA Fortran, using kernel procedures to be executed on NVIDIA GPUs.
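
The relationship between basetype and a CPU extension can be sketched in standard Fortran as follows. This is a minimal illustration, not the code in base.f90 or cpu/cpu.f90: the module names, the `reduce_iface` interface and the `f_cpu`/`g_cpu` procedure names are hypothetical, and f and g take a plain real array here rather than a memory block. The input 1, ..., 16 is also an assumption, chosen because it gives 152 + 32 = 184, the value shown in the run output above.

```fortran
module base_sketch
  implicit none
  private
  public :: basetype

  ! Abstract type: doit is defined here, f and g are deferred.
  type, abstract :: basetype
   contains
     procedure :: doit
     procedure(reduce_iface), deferred :: f
     procedure(reduce_iface), deferred :: g
  end type basetype

  ! Hypothetical common interface for f and g: array in, scalar out.
  abstract interface
     function reduce_iface(self, a) result(r)
       import :: basetype
       class(basetype), intent(in) :: self
       real, intent(in) :: a(:)
       real :: r
     end function reduce_iface
  end interface

contains

  ! The template method: written once, against the interface only.
  function doit(self, a) result(r)
    class(basetype), intent(in) :: self
    real, intent(in) :: a(:)
    real :: r
    r = self%f(a) + self%g(a)
  end function doit

end module base_sketch

module cpu_sketch
  use base_sketch, only: basetype
  implicit none
  private
  public :: cputype

  ! Concrete extension: supplies f and g, inherits doit unchanged.
  type, extends(basetype) :: cputype
   contains
     procedure :: f => f_cpu
     procedure :: g => g_cpu
  end type cputype

contains

  function f_cpu(self, a) result(r)
    class(cputype), intent(in) :: self
    real, intent(in) :: a(:)
    real :: r
    r = sum(a + 1.0)
  end function f_cpu

  function g_cpu(self, a) result(r)
    class(cputype), intent(in) :: self
    real, intent(in) :: a(:)
    real :: r
    r = maxval(a * 2.0)
  end function g_cpu

end module cpu_sketch

program template_demo
  use cpu_sketch, only: cputype
  implicit none
  type(cputype) :: backend
  real :: a(16)
  integer :: i

  a = [(real(i), i = 1, 16)]   ! assumed input: 1, 2, ..., 16
  print *, backend%doit(a)     ! value is 184 for this input
end program template_demo
```

A GPU extension would follow the same shape, binding f and g to procedures that launch kernels, without any change to doit.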

The input array $\mathbf{a}$ is abstracted into a memblock (memblock.f90) type (memory block). The cpublock (cpu/cpublock.f90) type holds an allocatable real array, whilst the gpublock (gpu/gpublock) holds an allocatable real device array.
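
A host-only sketch of this abstraction might look like the following. The component name `values`, the constructor `new_cpublock` and the module name are assumptions made for illustration; the actual memblock.f90 and cpu/cpublock.f90 may be organised differently. The device counterpart is left out because a device array requires CUDA Fortran.

```fortran
module memblock_sketch
  implicit none
  private
  public :: memblock, cpublock

  ! Backend-agnostic handle; concrete types decide where the data lives.
  type, abstract :: memblock
  end type memblock

  ! Host-side block holding an allocatable real array.
  type, extends(memblock) :: cpublock
     real, allocatable :: values(:)
  end type cpublock

  ! Overload the type name so that cpublock(n) allocates n elements.
  interface cpublock
     module procedure new_cpublock
  end interface cpublock

contains

  function new_cpublock(n) result(blk)
    integer, intent(in) :: n
    type(cpublock) :: blk
    allocate(blk%values(n))
    blk%values = 0.0   ! hypothetical initial contents
  end function new_cpublock

end module memblock_sketch

program memblock_demo
  use memblock_sketch, only: cpublock
  implicit none
  type(cpublock) :: blk

  blk = cpublock(16)
  print *, size(blk%values)   ! 16
end program memblock_demo
```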

The current implementation (main_hybrid.f90) uses a pointer to the right type depending on the execution target:

  case('gpu')
     gpublk = gpublock(16); blk => gpublk

A preferable approach would be to rely on automatic (re)allocation of polymorphic entities:

class(memblock), allocatable :: blk

case('gpu')
   blk = gpublock(16) ! Automatic allocation of the dynamic type

Unfortunately, this is not supported by the NVIDIA Fortran compiler (nvfortran 22.5).
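
The pointer-based selection can be sketched in standard Fortran as below. This is an illustration under several assumptions, not the contents of main_hybrid.f90: `devblock` is a hypothetical host-only stand-in for gpublock (a real device array needs CUDA Fortran), the component name `values` is invented, and the target is read from a `--gpu` command-line flag as in the run shown earlier.

```fortran
module blocks_sketch
  implicit none
  private
  public :: memblock, cpublock, devblock

  type, abstract :: memblock
  end type memblock

  type, extends(memblock) :: cpublock
     real, allocatable :: values(:)
  end type cpublock

  ! Stand-in for gpublock: this hypothetical type keeps its data
  ! on the host so that the sketch compiles without CUDA Fortran.
  type, extends(memblock) :: devblock
     real, allocatable :: values(:)
  end type devblock

end module blocks_sketch

program select_backend
  use blocks_sketch, only: memblock, cpublock, devblock
  implicit none
  ! One concrete variable per backend; both must be targets.
  type(cpublock), target :: cpublk
  type(devblock), target :: devblk
  class(memblock), pointer :: blk
  character(len=16) :: arg

  call get_command_argument(1, arg)   ! all blanks when no argument is given

  select case (trim(arg))
  case ('--gpu')
     allocate(devblk%values(16)); blk => devblk
  case default
     allocate(cpublk%values(16)); blk => cpublk
  end select

  ! From here on the code only sees class(memblock).
  select type (blk)
  type is (cpublock)
     print *, 'host block, size', size(blk%values)
  type is (devblock)
     print *, 'stand-in device block, size', size(blk%values)
  end select
end program select_backend
```

The cost of this approach is that one variable per backend has to be declared up front, even though only one of them is used in a given run.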
