How to enable VkFFTBackend when building itk 5.3.0 on windows #64

zhusihan-python · 2023-04-06T07:41:14Z

Hello, I'm trying to build ITK 5.3.0 with ITK_USE_CUFFTW ON on Windows, but I'm running into problems.
Is VkFFTBackend a substitute for cuFFT? If so, how do I enable VkFFTBackend in ITK? Will ITKMontage benefit from turning VkFFTBackend ON?

My environment: Windows 10, CMake 3.26.1, CUDA v12.1

dzenanz · 2023-04-06T12:33:51Z

Montage might benefit from doing FFTs on the GPU, but the GPU needs to be more powerful than the CPU and tiles need to be big enough to justify the time it takes to transfer them to the GPU. The only sure way to know is to try it.

@Leengit or @tbirdso might be able to help with building, but you need to better describe the problem, e.g. provide the error message.

zhusihan-python · 2023-04-06T12:59:41Z

Thanks for the reply. Here is my CMake configuration:

CMake configure output: cmake_configure.txt
Build log: itk_build_log.txt

One of the error messages:

683>ITKOptimizersv4-5.4.lib(ITKOptimizersv4-5.4.dll) : error LNK2005: "public: class itk::LBFGSOptimizerBaseHelperv4<class vnl_lbfgs> * __cdecl itk::LBFGSOptimizerBasev4<class vnl_lbfgs>::GetOptimizer(void)" (?GetOptimizer@?$LBFGSOptimizerBasev4@Vvnl_lbfgs@@@itk@@QEAAPEAV?$LBFGSOptimizerBaseHelperv4@Vvnl_lbfgs@@@2@XZ) 已经在 itkLBFGSOptimizerBasev4Python.obj 中定义
683>  正在创建库 E:/BUILD/master-cuda/itk/lib/Release/ITKOptimizersv4Python.lib 和对象 E:/BUILD/master-cuda/itk/lib/Release/ITKOptimizersv4Python.exp
683>E:\BUILD\master-cuda\itk\Wrapping\Generators\Python\itk\_ITKOptimizersv4Python.pyd : fatal error LNK1169: 找到一个或多个多重定义的符号

Sorry for the Chinese in the messages. For reference:
已经在 itkLBFGSOptimizerBasev4Python.obj 中定义: already defined in itkLBFGSOptimizerBasev4Python.obj
正在创建库 ... 和对象 ...: creating library ... and object ...
找到一个或多个多重定义的符号: one or more multiply defined symbols found

tbirdso · 2023-04-06T15:32:24Z

Hi @zhusihan-python, a few thoughts:

I'm trying to build ITK 5.3.0 with ITK_USE_CUFFTW ON on Windows, but I'm running into problems.

Does ITK 5.3.0 compile successfully for you when ITK_USE_CUFFTW is turned off? Have you tested to verify that this is the flag that causes the compilation failure you noted above?

Is VkFFTBackend a substitute for cuFFT?

ITKVkFFTBackend is a wrapper to use the VkFFT cross-platform library in ITK image processing pipelines. VkFFT itself sits on top of FFT implementations such as cuFFT, meaning ITKVkFFTBackend is not a substitute for cuFFT, but can provide an alternate path for compiling ITK with cuFFT support.

How do I enable VkFFTBackend in ITK?

To compile ITKVkFFTBackend as a remote module alongside ITK, set the Module_VkFFTBackend parameter to ON in your ITK-build cmake configuration.
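
For a command-line configure instead of the CMake GUI, the same cache entry can be passed with `-D`. The following is a minimal sketch that only assembles such a command. The source and build directory names are hypothetical placeholders, and `Module_Montage` is included on the assumption that ITKMontage should be built as well.

```python
import shlex

def itk_configure_command(source_dir, build_dir, modules):
    """Build a `cmake` configure command that turns the given ITK modules ON."""
    cmd = ["cmake", "-S", source_dir, "-B", build_dir]
    for name in modules:
        # Remote modules are exposed as Module_<Name> boolean cache entries.
        cmd.append(f"-DModule_{name}:BOOL=ON")
    return cmd

# Hypothetical paths; adjust to your own checkout and build tree.
command = itk_configure_command("ITK", "ITK-build", ["VkFFTBackend", "Montage"])
print(shlex.join(command))
```

The printed line can be pasted into a shell; any other options from your existing configuration would need to be appended in the same way.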

zhusihan-python · 2023-04-07T02:03:20Z

Hi Tom (@tbirdso), I have already built ITK 5.3.0 successfully on Windows 10, both plain and with ITK_USE_MKL ON. But compilation fails with ITK_USE_CUFFTW ON, or with FFTWD FFTWF ITK_USE_FFTWF_DEFAULT ON, or with ITK_USE_CUFFTW FFTWD FFTWF ITK_USE_FFTWF_DEFAULT ON.

I want to compare the performance of ITKMontage's CompleteMontage with the default FFT, MKL FFT, and CUDA FFT backends.
The first two are ready, so compiling ITK with cuFFT support is exactly what I need now. That is why I want to try ITKVkFFTBackend.

By default Module_VkFFTBackend did not show up in the configure step, so I tried moving this part from Modules/Remote/CMakeLists.txt to the root CMakeLists.txt:

file(GLOB Modules/Remote/remotes "*.remote.cmake")
foreach(remote_module ${remotes})
  include(${remote_module})
endforeach()

or adding an entry by hand.

Neither seems to work, so my question now is how to set Module_VkFFTBackend ON in the CMake GUI.

dzenanz · 2023-04-07T02:07:55Z

Try setting ITK_MINIMUM_COMPLIANCE_LEVEL to 1. That should expose more module ON/OFF settings in CMake GUI.

zhusihan-python · 2023-04-07T06:54:31Z

I found that CMake's advanced mode shows the remote module entries, so I enabled Module_VkFFTBackend and Module_Montage and removed ITK_WRAP_PYTHON.
I still get compile errors:

272>PhaseCorrelationImageRegistration.obj : error LNK2001: unresolved external symbol "void __cdecl itk::FFTWFFTImageFilterInitFactoryRegister__Private(void)" (?FFTWFFTImageFilterInitFactoryRegister__Private@itk@@YAXXZ)
272>E:\BUILD\master-cuda\itk\Wrapping\Generators\Python\itk\PhaseCorrelationImageRegistration.exe : fatal error LNK1120: 1 unresolved externals
271>RefineMontage.obj : error LNK2001: unresolved external symbol "void __cdecl itk::FFTWFFTImageFilterInitFactoryRegister__Private(void)" (?FFTWFFTImageFilterInitFactoryRegister__Private@itk@@YAXXZ)
271>E:\BUILD\master-cuda\itk\Wrapping\Generators\Python\itk\RefineMontage.exe : fatal error LNK1120: 1 unresolved externals

Full log: build_vkfft_nopy.txt

dzenanz · 2023-04-07T11:51:09Z

Is this InsightSoftwareConsortium/ITKMontage#214 reappearing?

Does the error persist if you do a clean build, in a new directory?

zhusihan-python · 2023-04-07T17:03:27Z

Is this InsightSoftwareConsortium/ITKMontage#214 reappearing?

Does the error persist if you do a clean build, in a new directory?

Yes, sir. I compiled several times in separate destination directories from separate copies of the source code. The difference is that last time, in the issue you referred to, I didn't set cuFFT ON.

zhusihan-python · 2023-04-07T17:06:24Z

The error message also involves the FFT image filter and looks similar, but it is not exactly the same. Maybe it has the same cause.

tbirdso · 2023-04-10T13:30:40Z

Hi @zhusihan-python , for your use case would it be reasonable to install and use prebuilt ITKVkFFTBackend OpenCL Python packages instead of building them yourself, similar to the workaround in InsightSoftwareConsortium/ITKMontage#214?

$ python -m pip install itk==5.3.0 itk-montage itk-vkfft

itk-vkfft packages available on PyPI will look for an OpenCL implementation on your machine by default. Your CUDA installation likely provides this OpenCL DLL. The ITK Python factory loading mechanism takes care of setting up the accelerated FFT filters at runtime, so you will be able to use accelerated FFTs with ITKMontage filters without any additional lines of Python code.

zhusihan-python · 2023-04-11T01:57:59Z

hi @tbirdso
The Python version of ITK is rather slow at stitching. Reading and stitching 150 images of 3072*2048 takes more than 55 s, while the C++ CompleteMontage takes less than 12 s.

I can use the 5.2.1 CUDA version for now. I'm just curious about the speed of 5.3.0 with the VkFFT backend, and I hope I can try it in a following release.

dzenanz · 2023-04-11T12:34:18Z

ITKPython has a big startup cost. Just loading all the relevant DLLs takes ~10 seconds. But most of the computation is done in C++ libraries. I wonder whether you have a relatively slow graphics card, which would make the Python+vkfft slower.

@tbirdso is there a way to disable vkFFT via environment variable or something similar? That way @zhusihan-python could compare C++ CPU vs Python CPU, and test my theory of a slow graphics card.
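
To put the numbers quoted so far side by side, here is a small arithmetic sketch. It treats "more than 55 s" and "less than 12 s" as 55 s and 12 s, and takes the roughly 10 s startup estimate at face value, so the results are rough bounds rather than measurements.

```python
# Figures quoted in this thread.
n_tiles = 150
width, height = 3072, 2048
python_seconds = 55.0      # Python run, reading and stitching
cpp_seconds = 12.0         # C++ CompleteMontage
startup_seconds = 10.0     # approximate DLL loading cost mentioned above

megapixels = n_tiles * width * height / 1e6

def throughput(seconds):
    """Megapixels processed per second for the whole tile set."""
    return megapixels / seconds

raw_ratio = python_seconds / cpp_seconds
adjusted_ratio = (python_seconds - startup_seconds) / cpp_seconds

print(f"total size: {megapixels:.1f} MP")
print(f"C++: {throughput(cpp_seconds):.1f} MP/s, Python: {throughput(python_seconds):.1f} MP/s")
print(f"slowdown: {raw_ratio:.2f}x raw, {adjusted_ratio:.2f}x after subtracting startup")
```

Even after subtracting the startup estimate, the Python run is still several times slower under these assumptions, so startup cost alone would not account for the gap.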

zhusihan-python · 2023-04-11T12:54:29Z

The Python version of ITK that I tested is 5.3.0 installed from PyPI. I think the VkFFT backend is not enabled by default, right?

dzenanz · 2023-04-11T12:59:07Z

Correct, it is not enabled by default. Could it be that Python version is single-threaded? If you look at CPU usage when you run, does Python version use all CPU cores?

zhusihan-python · 2023-04-11T13:08:53Z

I think it does use multiple CPU cores. Reading 132 images of 3000*4096, computing the transforms, and resampling takes 53.8 s, not including writing the output.

dzenanz · 2023-04-11T13:13:54Z

Then I don't know why the Python variant is that much slower.

tbirdso · 2023-04-11T13:45:44Z

Hi @zhusihan-python , there are a couple of things that might be happening here.

Echoing @dzenanz 's note, ITK uses lazy library loading and as a result the first ITK filter execution in a script can seem to take much longer to execute. To account for this we can force libraries to load before timing later executions. itk.auto_progress can be used to turn on and off verbose messages about lazy library loading to give better insight into what is happening. Try adding the following before the body of your script:

import itk
itk.auto_progress(2)
itk.TileMergeImageFilter # Forces underlying libraries to load
itk.auto_progress(0)
...
# your code here

As part of the step above ITK will load any modules that define FFT implementations. If you have installed itk-vkfft then you should see that VkFFTBackend loads by default when you run the code above. Likewise, the easiest way to disable VkFFT filters is by uninstalling the itk-vkfft module and re-running your script. There is also a way to disable VkFFT filters through the ITK object factory.
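
The warm-up-then-time idea described in this comment can be sketched without ITK at all. The workload below is a hypothetical stand-in that simulates a one-time loading cost; in a real script it would be replaced by the montage pipeline call.

```python
import time

def time_after_warmup(func, repeats=3):
    """Call func once as a warm-up, then return (warm-up time, best timed run)."""
    warmup_start = time.perf_counter()
    func()  # first call pays one-time costs such as lazy library loading
    warmup_seconds = time.perf_counter() - warmup_start

    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return warmup_seconds, min(timings)

# Stand-in workload (hypothetical); replace with your own pipeline call.
_loaded = False

def fake_pipeline():
    global _loaded
    if not _loaded:
        time.sleep(0.05)  # simulates one-time loading on the first call
        _loaded = True
    sum(range(10_000))

warmup, steady = time_after_warmup(fake_pipeline)
print(f"first call: {warmup:.3f} s, steady state: {steady:.3f} s")
```

Reporting the first call separately keeps the library loading cost out of the number that is compared against the C++ executable.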

Would you please confirm that the GPU is in use when itk-vkfft is installed and itk.TileMergeImageFilter runs on your data? In the screenshot above it seems that only the CPU is used, it would be good to confirm that the GPU is in use when we expect it.

There is a possibility that the overhead of moving images to GPU outweighs the advantage of FFT computation for your data. In our benchmarking we focused primarily on the performance of convolution of large 3D kernels compared with large 3D images. While your data represents large 2D images, it might be the case that they are not large enough to benefit from VkFFT GPU acceleration.
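
As a rough illustration of that trade-off, the sketch below uses a deliberately simple model: the GPU path only pays off if the transfer time plus the GPU FFT time is less than the CPU FFT time. All timings and the bandwidth figure are hypothetical placeholders, not measurements, and the tile is assumed to be single-precision float.

```python
def gpu_is_worthwhile(image_bytes, cpu_fft_s, gpu_fft_s, bandwidth_bytes_per_s):
    """Return (worthwhile, gpu_total_s) for one FFT under a simple transfer model."""
    # The image goes to the GPU and the result comes back: two transfers.
    transfer_s = 2 * image_bytes / bandwidth_bytes_per_s
    gpu_total_s = transfer_s + gpu_fft_s
    return gpu_total_s < cpu_fft_s, gpu_total_s

# Hypothetical numbers, for illustration only.
tile_bytes = 3000 * 4096 * 4  # one float32 tile of the size mentioned in this thread
ok, total = gpu_is_worthwhile(
    image_bytes=tile_bytes,
    cpu_fft_s=0.200,
    gpu_fft_s=0.020,
    bandwidth_bytes_per_s=8e9,
)
print(f"GPU total {total * 1000:.1f} ms, worthwhile: {ok}")
```

Plugging in timings measured on the actual CPU and GPU would show which side of the break-even point a given tile size falls on.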

Would you please comment on what CPU and GPU you are using to run the script?

zhusihan-python · 2023-04-11T14:22:05Z

The GPU in this PC is a 1050 Ti, and I only installed itk from PyPI, without itk-vkfft. I will try auto_progress as you suggest tomorrow and upload the results here.

zhusihan-python · 2023-04-12T02:22:53Z

Running SimpleMontage with the itk-vkfft-0.2.0 backend gives an error:

Loading ITKPyBase... done
Loading ITKCommon... done
Loading ITKStatistics... done
Loading ITKImageFilterBase... done
Loading ITKTransform... done
Loading ITKImageFrequency... done
Loading ITKIOImageBase... Loading ITKIOBMP... done
Loading ITKIOMRC... done
Loading ITKIOMeta... done
Loading ITKIONIFTI... done
Loading ITKIONRRD... done
Loading ITKIOPNG... done
Loading ITKIOStimulate... done
Loading ITKIOVTK... done
done
Loading ITKImageFunction... done
Loading ITKImageGrid... done
Loading ITKFFT... Loading ITKImageSources... done
Loading ITKMesh... done
Loading ITKSpatialObjects... done
Loading ITKImageCompose... done
Loading ITKImageStatistics... done
Loading ITKPath... done
Loading ITKImageIntensity... done
Loading ITKThresholding... done
Loading ITKConvolution... done
Loading ITKSmoothing... done
Loading ITKOptimizers... done
Loading ITKImageGradient... done
Loading ITKImageFeature... done
Loading ITKFiniteDifference... done
Loading ITKDisplacementField... done
Loading ITKRegistrationCommon... done
Loading VkFFTBackend... done
done
Loading Montage... done
Computing tile registration transforms
D:\a\im\src\itkVkCommon.cxx(407): clEnqueueWriteBuffer returned -4
D:\a\im\src\itkVkCommon.cxx(407): clEnqueueWriteBuffer returned -4
D:\a\im\src\itkVkCommon.cxx(407): clEnqueueWriteBuffer returned -4
D:\a\im\src\itkVkCommon.cxx(407): clEnqueueWriteBuffer returned -4
D:\a\im\src\itkVkCommon.cxx(407): clEnqueueWriteBuffer returned -4
D:\a\im\src\itkVkCommon.cxx(407): clEnqueueWriteBuffer returned -4
D:\a\im\src\itkVkCommon.cxx(407): clEnqueueWriteBuffer returned -4
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -6
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -6
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -6
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -6
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -6
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -6
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -6
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -6
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -6
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -6
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -6
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -6
Traceback (most recent call last):
  File "E:\projects\ITKMontage\examples\SimpleMontage.py", line 71, in <module>
    montage.Update()
RuntimeError: D:\a\im\include\itkVkRealToHalfHermitianForwardFFTImageFilter.hxx:100:
ITK ERROR: VkFFT third-party library failed with error code 4045.

tbirdso · 2023-04-12T13:44:48Z

@zhusihan-python Thanks for the error printout. We can examine the error code definitions as defined in the OpenCL header cl.h and in the VkFFT library at vkFFT.h to translate what each failure means.

From VkFFT we see that VKFFT_ERROR_FAILED_TO_CREATE_CONTEXT = 4045, which agrees with the log messages showing that clCreateContext failed.

From cl.h we find CL_MEM_OBJECT_ALLOCATION_FAILURE = -4 and CL_OUT_OF_HOST_MEMORY = -6, which seems to suggest you may be running out of either PC (RAM) or GPU memory. In an earlier comment you said you are using 132 3000*4096 images, could you provide more details on what kind of images these are? I.e., are you using 8-bit grayscale images, floating-point RGB images, etc? And, does your Task Manager window show that you are running out of memory on your PC when you run your script?
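
The lookup described above can be automated for longer logs. This is a minimal sketch that counts OpenCL failures by function and error name. The regular expression assumes the `<function> returned <code>` message format seen in this thread, and the table covers only three codes from cl.h (-5, `CL_OUT_OF_RESOURCES`, is added alongside the two named above).

```python
import re
from collections import Counter

# Values from the OpenCL header cl.h.
OPENCL_ERRORS = {
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
}

LOG_PATTERN = re.compile(r"(cl\w+) returned (-?\d+)")

def summarize_opencl_errors(log_text):
    """Count (function, error name) pairs found in a VkFFTBackend log."""
    counts = Counter()
    for func, code in LOG_PATTERN.findall(log_text):
        name = OPENCL_ERRORS.get(int(code), f"unknown ({code})")
        counts[(func, name)] += 1
    return counts

sample = """\
D:\\a\\im\\src\\itkVkCommon.cxx(407): clEnqueueWriteBuffer returned -4
D:\\a\\im\\src\\itkVkCommon.cxx(125): clCreateContext returned -6
D:\\a\\im\\src\\itkVkCommon.cxx(125): clCreateContext returned -6
"""
for (func, name), n in summarize_opencl_errors(sample).items():
    print(f"{func}: {name} x{n}")
```

Codes missing from the table are reported as unknown rather than guessed, so the table can be extended from cl.h as needed.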

@dzenanz, maybe you could comment on how ITKMontage filters schedule FFTs for tile inputs and whether there is room for optimization in that scheduling?

dzenanz · 2023-04-12T14:12:07Z

Here is the scheduling logic:
https://github.com/InsightSoftwareConsortium/ITKMontage/blob/d8c9a3fff6dfc765fc3eda8da48a40d72b621eec/include/itkTileMontage.hxx#L657-L699
I am trying to be conservative with memory.

zhusihan-python · 2023-04-13T02:58:54Z

The JPEG image type in ITK is <class 'itk.itkImagePython.itkImageRGBUC2'>.
The grayscale image type is <class 'itk.itkImagePython.itkImageF2'>.
My PC has 48 GB of memory in total and is not running out of memory.

The GPU has 4 GB of memory in total, and usage goes above 3.6 to 3.7 GB.
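
For a sense of scale, here is a back-of-the-envelope estimate of the memory one tile needs for a real-to-complex FFT. It is a sketch under simplifying assumptions: float32 pixels, no padding, and no temporary buffers or OpenCL context overhead, none of which are confirmed for VkFFTBackend.

```python
def tile_fft_bytes(width, height, real_bytes=4):
    """Approximate bytes for one real tile plus its half-Hermitian complex spectrum."""
    real_image = width * height * real_bytes
    # A real-to-complex FFT keeps about half of one dimension, as complex values.
    spectrum = (width // 2 + 1) * height * (2 * real_bytes)
    return real_image + spectrum

MIB = 1024 ** 2
GIB = 1024 ** 3

per_tile = tile_fft_bytes(3000, 4096)  # float32 tiles, as in itkImageF2
gpu_total = 4 * GIB                    # the 4 GB card reported above
print(f"per tile: {per_tile / MIB:.1f} MiB")
print(f"tiles that would fit in 4 GiB: {gpu_total // per_tile}")
```

Under these assumptions a single tile is small compared with the card, so the number of buffers and contexts alive at the same time is the more likely quantity to watch.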

zhusihan-python · 2023-04-13T07:51:31Z

I tested the same JPEG image type on another PC with 64 GB of memory and 8 GB of GPU memory. It also fails, but this time clCreateContext returned -5, which is different from last time.

Loading ITKPyBase... done
Loading ITKCommon... done
Loading ITKStatistics... done
Loading ITKImageFilterBase... done
Loading ITKTransform... done
Loading ITKImageFrequency... done
Loading ITKIOImageBase... Loading ITKIOBMP... done
Loading ITKIOBioRad... done
Loading ITKIOBruker... done
Loading ITKIOGDCM... done
Loading ITKIOIPL... done
Loading ITKIOGE... done
Loading ITKIOGIPL... done
Loading ITKIOHDF5... done
Loading ITKIOJPEG... done
Loading ITKIOJPEG2000... done
Loading ITKIOTIFF... done
Loading ITKIOLSM... done
Loading ITKIOMINC... done
Loading ITKIOMRC... done
Loading ITKIOMeta... done
Loading ITKIONIFTI... done
Loading ITKIONRRD... done
Loading ITKIOPNG... done
Loading ITKIOStimulate... done
Loading ITKIOVTK... done
done
Loading ITKImageFunction... done
Loading ITKImageGrid... done
Loading ITKFFT... Loading ITKImageSources... done
Loading ITKMesh... done
Loading ITKSpatialObjects... done
Loading ITKImageCompose... done
Loading ITKImageStatistics... done
Loading ITKPath... done
Loading ITKImageIntensity... done
Loading ITKThresholding... done
Loading ITKConvolution... done
Loading ITKSmoothing... done
Loading ITKOptimizers... done
Loading ITKImageGradient... done
Loading ITKImageFeature... done
Loading ITKFiniteDifference... done
Loading ITKDisplacementField... done
Loading ITKRegistrationCommon... done
Loading VkFFTBackend... done
done
Loading Montage... done
Computing tile registration transforms
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -5
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -5
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -5
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -5
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -5
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -5
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -5
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -5
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -5
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -5
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -5
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -5
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -5
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -5
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -5
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -5
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -5
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -5
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -5
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -5
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -5
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -5
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -5
Traceback (most recent call last):
  File "E:\projects\ITKMontageCuda\examples\SimpleMontage.py", line 69, in <module>
    montage.Update()
RuntimeError: D:\a\im\include\itkVkHalfHermitianToRealInverseFFTImageFilter.hxx:103:
ITK ERROR: VkFFT third-party library failed with error code 4045.

Memory usage: GPU memory is almost at its limit.

tbirdso · 2023-04-14T18:43:34Z

Hi @zhusihan-python , it looks like the GPU may be running out of dedicated memory, hence the failures. Unfortunately, unless @dzenanz has additional thoughts on the operations of ITKMontage I don't have intuition here on where optimizations may need to take place to allow your processing to proceed.

My advice at this point is to open an issue on the main ITK repository for visibility in regards to your ITKOptimizersv4 C++ compilation failure preventing you from using cuFFT for acceleration. I am unable to recreate that issue on Windows 11, but someone else there may be able to help you move forward.

dzenanz · 2023-04-14T20:49:23Z

First easy thing to try is reducing montage->SetNumberOfWorkUnits(); which is initialized here: https://github.com/InsightSoftwareConsortium/ITKMontage/blob/d8c9a3fff6dfc765fc3eda8da48a40d72b621eec/include/itkTileMontage.hxx#L55-L60.
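
One way to choose a reduced value is to cap the work-unit count by a memory budget. The helper below is a hypothetical heuristic, not part of ITK or ITKMontage, and the per-work-unit footprint is an assumed figure that would have to be measured for real data.

```python
import os

def reduced_work_units(memory_budget_bytes, bytes_per_work_unit, max_units=None):
    """Pick a work-unit count whose combined footprint stays within the budget."""
    if max_units is None:
        max_units = os.cpu_count() or 1
    fits = memory_budget_bytes // bytes_per_work_unit
    return max(1, min(max_units, fits))

# Hypothetical figures: 3 GiB of usable GPU memory, 512 MiB per work unit.
units = reduced_work_units(3 * 1024**3, 512 * 1024**2, max_units=16)
print(units)
# With the ITK Python package installed, the value would then be applied as:
#   montage.SetNumberOfWorkUnits(units)
```

Starting from a small value such as 1 or 2 and increasing it until the OpenCL errors reappear would give an empirical limit for a particular card.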

zhusihan-python · 2023-04-18T04:00:34Z

@tbirdso Thank you, Tom. I actually opened a similar issue last year, right after 5.3.0 was released, and then used ITK 5.2.1 with cuFFT instead. I also built 5.3.0 with cuFFT ON on Windows 11 and got the same build errors as on Windows 10.

tbirdso mentioned this issue Apr 20, 2023

build itk with ITK_USE_CUFFTW ITK_USE_GPU ON gets error InsightSoftwareConsortium/ITK#3744

Open
