VkFFT error code handling #40

Open
tbirdso opened this issue Apr 7, 2022 · 3 comments

Comments

@tbirdso
Contributor

tbirdso commented Apr 7, 2022

Exceptions thrown by ITK VkFFT classes are disruptive to pipeline flow and difficult to read. For example, from a Python benchmark run on the BIL workshop2 node:

Experiment   Image Len (px)     Volume (px)     CPU FFT Time (s)        GPU FFT Time(s) Speedup
         0                 10   1.00e+03                0.000204        0.320318        -156668.8%
         1                 30   2.70e+04                0.001416        0.159839        -11186.4%
         2                100   1.00e+06                0.042245        0.153889        -264.3%
         3                200   8.00e+06                0.450544        0.824002        -82.9%
         4                300   2.70e+07                1.304689        0.294664        77.4%
         5                600   2.16e+08                12.178864       1.481467        87.8%
         6                800   5.12e+08                44.749890       3.230387        92.8%
         7               1000   1.00e+09                59.368654       6.215513        89.5%
         8               1200   1.73e+09                123.027794      10.488541       91.5%
Traceback (most recent call last):
  File "/bil/users/tbirdso/vkfft/benchmark.py", line 42, in <module>
    vk_interval = run_fft(vk_type, image_size)
  File "/bil/users/tbirdso/vkfft/benchmark.py", line 28, in run_fft
    return benchmark_fft(fft_filter)
  File "/bil/users/tbirdso/vkfft/benchmark.py", line 20, in benchmark_fft
    itk_fft_filter.Update()
RuntimeError: ../../../include/itkVkForwardFFTImageFilter.hxx:100:
ITK ERROR: VkFFT third-party library failed with error code 4039.

Proposals:

  1. Investigate whether there is a better way to handle ITK filter failures. (The answer may simply be to use try/catch blocks in the C++ or Python calling script, but it would be good to research.)
  2. Translate VkFFT error codes into human-readable errors inline rather than requiring users to consult the user manual. This may involve contributions back to VkFFT.
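As a rough illustration of the try/catch option in Proposal 1, here is a minimal sketch for a Python calling script. It relies only on what the traceback above shows: the failure surfaces in Python as a `RuntimeError` whose message contains the VkFFT error code. The names `run_guarded` and `fake_update` are hypothetical, and `fake_update` stands in for `itk_fft_filter.Update()` so the sketch runs without ITK installed.

```python
import re
from typing import Callable, Optional

# Matches the message format seen in the traceback above.
_VKFFT_CODE = re.compile(r"VkFFT third-party library failed with error code (\d+)")


def run_guarded(update: Callable[[], None]) -> Optional[int]:
    """Call update(); return None on success or the VkFFT error code on failure.

    A RuntimeError that does not look like a VkFFT failure is re-raised.
    """
    try:
        update()
    except RuntimeError as err:
        match = _VKFFT_CODE.search(str(err))
        if match is None:
            raise
        return int(match.group(1))
    return None


def fake_update() -> None:
    # Hypothetical stand-in for itk_fft_filter.Update(); not part of ITK.
    raise RuntimeError(
        "../../../include/itkVkForwardFFTImageFilter.hxx:100:\n"
        "ITK ERROR: VkFFT third-party library failed with error code 4039."
    )


code = run_guarded(fake_update)
if code is not None:
    print(f"GPU FFT skipped: VkFFT error code {code}")
```

In a benchmark loop this would let the remaining experiments continue, or fall back to the CPU filter, instead of ending the run with a traceback.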
@Leengit
Collaborator

Leengit commented Apr 7, 2022

It may differ case by case, but I am thinking that generally we do not need to make working examples catch these exceptions or make these messages more user friendly. Instead, I think the better approach is to treat them as run-time asserts and then debug the code so that these run-time asserts don't fail. In particular, if there is something that typically leads to one of these failures, then we probably should check for those circumstances before asking VkFFT to handle it. For example, if VkFFT is complaining that a particular needed input is missing, we should probably check for that input earlier in our code, do whatever ITK code typically does when that kind of input is missing, and not call the failing VkFFT at all in this case.

@tbirdso
Contributor Author

tbirdso commented Apr 7, 2022

@Leengit For context, I typically see these failures (buffer copy, kernel copy, fence, etc.) when trying to run GPU FFT on very large images, in this case on a 1400x1400x1400 image. If we were to introduce a check, it would need to involve asking the GPU whether it has sufficient buffer size for everything the FFT requires (input buffer, output buffer, kernel, other?).

Disadvantages of trying to implement an explicit GPU memory precheck:

  1. GPU availability may depend on other processes and could decrease after our check
  2. We can get nearly the same information by trying to run and failing out.

Advantages of a GPU memory precheck:

  1. Cleaner error handling
  2. Potentially introduce handling to split even larger images into sequential chunks that will fit on the GPU. For instance, if we know we can fit a 1000x1000x1000 image on GPU but our image is 2000x1000x1000, we could run FFT sequentially on two 1000x1000x1000 images. This would be much easier to implement for 1D FFT where we can mandate that the image is not split in the FFT direction; for ND FFT it would be more complicated and would likely need to involve N 1D FFT operations. I'm uncertain how this would impact speedup over CPU-based FFT implementations.
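To make the 1D case concrete, here is a minimal sketch of the chunking idea, assuming the FFT runs along each row and the image is split only across rows, never along the FFT direction. A naive pure-Python DFT stands in for the GPU FFT, and `max_rows` is a hypothetical stand-in for however many rows fit on the device; none of these names come from ITK or VkFFT.

```python
import cmath
from typing import List, Sequence

Row = List[complex]


def dft_1d(row: Sequence[complex]) -> Row:
    """Naive O(n^2) DFT of one row; a stand-in for a GPU 1D FFT."""
    n = len(row)
    return [
        sum(row[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def fft_rows(rows: Sequence[Sequence[complex]]) -> List[Row]:
    """Transform every row in one pass (the 'whole image fits' case)."""
    return [dft_1d(row) for row in rows]


def fft_rows_chunked(rows: Sequence[Sequence[complex]], max_rows: int) -> List[Row]:
    """Transform the rows in sequential chunks of at most max_rows rows."""
    if max_rows < 1:
        raise ValueError("max_rows must be at least 1")
    out: List[Row] = []
    for start in range(0, len(rows), max_rows):
        # Each chunk is independent because no row is split.
        out.extend(fft_rows(rows[start:start + max_rows]))
    return out


image = [[1, 2, 3, 4], [0, 1, 0, 1], [5, 5, 5, 5]]
chunked = fft_rows_chunked(image, max_rows=2)
```

Because each row is transformed independently, the chunked result matches the single-pass result. The ND case does not decompose this simply, which is the complication noted above.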

@Leengit
Collaborator

Leengit commented Apr 7, 2022

Good, persuasive. Yes, in scenarios like "GPU availability may depend on other processes and could decrease after our check", it is sounding like letting VkFFT try is the way to go. We would try to parse VkFFT's error code in our ITK code and present it to the user in a manner typical of other ITK modules. Hopefully, VkFFT will rarely return a code that we haven't anticipated, but we should be prepared for that case anyway -- reporting what VkFFT has told us, and potentially contributing back to VkFFT to make those messages as clear as possible. In short, that's looking like your Proposal no. 2.
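A minimal sketch of that translation step, including the fallback for codes we have not anticipated. It is written in Python only for illustration; the real change would presumably live in the ITK C++ filter. The function name and the table entries are hypothetical placeholders, not actual VkFFT error descriptions, which would need to come from the VkFFT user manual.

```python
from typing import Mapping


def describe_vkfft_error(code: int, known: Mapping[int, str]) -> str:
    """Build a readable message for a VkFFT error code.

    Codes missing from `known` are still reported verbatim so that
    nothing VkFFT told us is lost.
    """
    text = known.get(code)
    if text is None:
        return (
            f"VkFFT failed with unrecognized error code {code}; "
            "see the VkFFT user manual."
        )
    return f"VkFFT failed with error code {code}: {text}"


# Hypothetical table: the description is a placeholder, not VkFFT's wording.
EXAMPLE_TABLE = {4039: "placeholder description from the VkFFT manual"}

print(describe_vkfft_error(4039, EXAMPLE_TABLE))
print(describe_vkfft_error(12345, EXAMPLE_TABLE))
```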
