How to enable VkFFTBackend when building itk 5.3.0 on windows #64

Open
zhusihan-python opened this issue Apr 6, 2023 · 26 comments

Comments

@zhusihan-python

Hello, I'm trying to build ITK 5.3.0 with ITK_USE_CUFFTW set to ON on Windows, but I'm running into problems.
Is VkFFTBackend a substitute for cuFFT? If so, how do I enable VkFFTBackend in ITK? Will ITK Montage benefit from setting VkFFTBackend ON?

My environment: Windows 10, CMake 3.26.1, CUDA v12.1

@dzenanz
Member

dzenanz commented Apr 6, 2023

Montage might benefit from doing FFTs on the GPU, but the GPU needs to be more powerful than the CPU and tiles need to be big enough to justify the time it takes to transfer them to the GPU. The only sure way to know is to try it.

@Leengit or @tbirdso might be able to help with building, but you need to better describe the problem, e.g. provide the error message.

@zhusihan-python
Author

Thanks for the reply. Here is my CMake configuration (screenshot attached).
CMake configure output: cmake_configure.txt
Build log: itk_build_log.txt
One of the error messages:

683>ITKOptimizersv4-5.4.lib(ITKOptimizersv4-5.4.dll) : error LNK2005: "public: class itk::LBFGSOptimizerBaseHelperv4<class vnl_lbfgs> * __cdecl itk::LBFGSOptimizerBasev4<class vnl_lbfgs>::GetOptimizer(void)" (?GetOptimizer@?$LBFGSOptimizerBasev4@Vvnl_lbfgs@@@itk@@QEAAPEAV?$LBFGSOptimizerBaseHelperv4@Vvnl_lbfgs@@@2@XZ) already defined in itkLBFGSOptimizerBasev4Python.obj
683>  Creating library E:/BUILD/master-cuda/itk/lib/Release/ITKOptimizersv4Python.lib and object E:/BUILD/master-cuda/itk/lib/Release/ITKOptimizersv4Python.exp
683>E:\BUILD\master-cuda\itk\Wrapping\Generators\Python\itk\_ITKOptimizersv4Python.pyd : fatal error LNK1169: one or more multiply defined symbols found

@tbirdso
Contributor

tbirdso commented Apr 6, 2023

Hi @zhusihan-python, a few thoughts:

I'm trying to build ITK 5.3.0 with ITK_USE_CUFFTW set to ON on Windows, but I'm running into problems

Does ITK 5.3.0 compile successfully for you when ITK_USE_CUFFTW is turned off? Have you tested to verify that this is the flag that causes the compilation failure you noted above?

Is VkFFTBackend a substitute for cuFFT?

ITKVkFFTBackend is a wrapper to use the VkFFT cross-platform library in ITK image processing pipelines. VkFFT itself sits on top of FFT implementations such as cuFFT, meaning ITKVkFFTBackend is not a substitute for cuFFT, but can provide an alternate path for compiling ITK with cuFFT support.

How do I enable VkFFTBackend in ITK?

To compile ITKVkFFTBackend as a remote module alongside ITK, set the Module_VkFFTBackend parameter to ON in your ITK-build cmake configuration.
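
For a scripted build, the same option can be passed to `cmake` on the command line rather than through the GUI. The following is only a minimal sketch: `Module_VkFFTBackend` and `Module_Montage` are the option names used in this thread, while the helper name `itk_configure_args` and the `ITK` / `ITK-build` directory names are hypothetical placeholders.

```python
from pathlib import Path


def itk_configure_args(source_dir, build_dir, extra_options=None):
    """Return a cmake argument list that turns on the VkFFTBackend remote module."""
    options = {"Module_VkFFTBackend": "ON"}  # option name taken from this thread
    options.update(extra_options or {})      # e.g. {"Module_Montage": "ON"}
    args = ["cmake", "-S", str(Path(source_dir)), "-B", str(Path(build_dir))]
    args += [f"-D{name}:BOOL={value}" for name, value in options.items()]
    return args


if __name__ == "__main__":
    # Hypothetical directory names; replace them with your own checkout and build tree.
    args = itk_configure_args("ITK", "ITK-build", {"Module_Montage": "ON"})
    print(" ".join(args))
    # To actually configure: import subprocess; subprocess.run(args, check=True)
```

Passing the option as a cache entry with `-D` has the same effect as ticking it in the GUI, so it does not matter whether the entry is visible there.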

@zhusihan-python
Author

zhusihan-python commented Apr 7, 2023

Hi Tom (@tbirdso), I have already built ITK 5.3.0, both with default settings and with ITK_USE_MKL ON, successfully on Windows 10. However, the build fails with ITK_USE_CUFFTW ON, with FFTWD, FFTWF and ITK_USE_FFTWF_DEFAULT ON, or with ITK_USE_CUFFTW, FFTWD, FFTWF and ITK_USE_FFTWF_DEFAULT all ON.

I want to compare the performance of ITK Montage's CompleteMontage with the default FFT, MKL FFT and CUDA FFT backends.
The first two are ready; compiling ITK with cuFFT support is exactly what I need now, so I want to try ITKVkFFTBackend.

By default, Module_VkFFTBackend did not show up in the configuration, so I tried moving this part from Modules/Remote/CMakeLists.txt to the root CMakeLists.txt:

file(GLOB Modules/Remote/remotes "*.remote.cmake")
foreach(remote_module ${remotes})
  include(${remote_module})
endforeach()

or adding an entry by hand (screenshot attached).
Neither seems to work, so my question now is how to set Module_VkFFTBackend ON in the CMake GUI.

@dzenanz
Member

dzenanz commented Apr 7, 2023

Try setting ITK_MINIMUM_COMPLIANCE_LEVEL to 1. That should expose more module ON/OFF settings in CMake GUI.

@zhusihan-python
Author

I found that CMake's advanced mode shows the remote module entries. I then enabled Module_VkFFTBackend and Module_Montage and removed ITK_WRAP_PYTHON,
but the build still fails:

272>PhaseCorrelationImageRegistration.obj : error LNK2001: unresolved external symbol "void __cdecl itk::FFTWFFTImageFilterInitFactoryRegister__Private(void)" (?FFTWFFTImageFilterInitFactoryRegister__Private@itk@@YAXXZ)
272>E:\BUILD\master-cuda\itk\Wrapping\Generators\Python\itk\PhaseCorrelationImageRegistration.exe : fatal error LNK1120: 1 unresolved externals
271>RefineMontage.obj : error LNK2001: unresolved external symbol "void __cdecl itk::FFTWFFTImageFilterInitFactoryRegister__Private(void)" (?FFTWFFTImageFilterInitFactoryRegister__Private@itk@@YAXXZ)
271>E:\BUILD\master-cuda\itk\Wrapping\Generators\Python\itk\RefineMontage.exe : fatal error LNK1120: 1 unresolved externals

Full log: build_vkfft_nopy.txt

@dzenanz
Member

dzenanz commented Apr 7, 2023

Is this InsightSoftwareConsortium/ITKMontage#214 reappearing?

Does the error persist if you do a clean build, in a new directory?

@zhusihan-python
Author

Is this InsightSoftwareConsortium/ITKMontage#214 reappearing?

Does the error persist if you do a clean build, in a new directory?

Yes. I compiled several times, each time in a separate build directory from a separate copy of the source code. The difference is that in the issue you referred to, I had not set cuFFT ON.

@zhusihan-python
Author

The error message looks similar (it also involves an FFT image filter) but is not exactly the same. Maybe it has the same cause.

@tbirdso
Contributor

tbirdso commented Apr 10, 2023

Hi @zhusihan-python , for your use case would it be reasonable to install and use prebuilt ITKVkFFTBackend OpenCL Python packages instead of building them yourself, similar to the workaround in InsightSoftwareConsortium/ITKMontage#214?

$ python -m pip install itk==5.3.0 itk-montage itk-vkfft

itk-vkfft packages available on PyPI will look for an OpenCL implementation on your machine by default. Your CUDA installation likely provides this OpenCL DLL. The ITK Python factory loading mechanism takes care of setting up the accelerated FFT filters at runtime, so you will be able to use accelerated FFTs with ITKMontage filters without any additional lines of Python code.
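
Because the backend is picked up automatically whenever the package is present, it can help to record which of these packages are actually installed before comparing timings. A small sketch using only the standard library follows; the package names come from the `pip` command above, and the helper names `installed_version` and `report` are hypothetical.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def report(distributions=("itk", "itk-montage", "itk-vkfft")):
    """Map each distribution name to its installed version (or None)."""
    return {name: installed_version(name) for name in distributions}


if __name__ == "__main__":
    for name, version in report().items():
        print(f"{name}: {version or 'not installed'}")
```

If `itk-vkfft` shows as not installed, the run being timed is a CPU-only run.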

@zhusihan-python
Author

Hi @tbirdso,
the Python version of ITK is rather slow at stitching. Reading and stitching 150 images of 3072*2048 takes more than 55 s, while the C++ CompleteMontage takes less than 12 s.

I can use the 5.2.1 CUDA build for now. I'm just curious about the speed of 5.3.0 with the VkFFT backend, and I hope I can try it in a future release.

@dzenanz
Member

dzenanz commented Apr 11, 2023

ITKPython has a big startup cost. Just loading all the relevant DLLs takes ~10 seconds. But most of the computation is done in C++ libraries. I wonder whether you have a relatively slow graphics card, which would make the Python+vkfft slower.

@tbirdso is there a way to disable vkFFT via environment variable or something similar? That way @zhusihan-python could compare C++ CPU vs Python CPU, and test my theory of a slow graphics card.

@zhusihan-python
Author

The Python ITK I tested is 5.3.0 installed from PyPI. I think the VkFFT backend is not enabled by default, right?

@dzenanz
Member

dzenanz commented Apr 11, 2023

Correct, it is not enabled by default. Could it be that the Python version is single-threaded? If you look at CPU usage when you run, does the Python version use all CPU cores?

@zhusihan-python
Author

I think it does use multiple CPU cores. Reading 132 images of 3000*4096, computing the transforms and resampling takes 53.8 s, not including writing the output (screenshot attached).

@dzenanz
Member

dzenanz commented Apr 11, 2023

Then I don't know why the Python variant is that much slower.

@tbirdso
Contributor

tbirdso commented Apr 11, 2023

Hi @zhusihan-python , there are a couple of things that might be happening here.

Echoing @dzenanz 's note, ITK uses lazy library loading and as a result the first ITK filter execution in a script can seem to take much longer to execute. To account for this we can force libraries to load before timing later executions. itk.auto_progress can be used to turn on and off verbose messages about lazy library loading to give better insight into what is happening. Try adding the following before the body of your script:

import itk
itk.auto_progress(2)
itk.TileMergeImageFilter # Forces underlying libraries to load
itk.auto_progress(0)
...
# your code here

As part of the step above ITK will load any modules that define FFT implementations. If you have installed itk-vkfft then you should see that VkFFTBackend loads by default when you run the code above. Likewise, the easiest way to disable VkFFT filters is by uninstalling the itk-vkfft module and re-running your script. There is also a way to disable VkFFT filters through the ITK object factory.
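
To keep the one-time loading cost out of a benchmark, the timed section can be separated from a warm-up run. This is a generic sketch rather than an ITK API: `time_after_warmup` is a hypothetical helper, and `stand_in_task` is a placeholder for your own montage code.

```python
import time


def time_after_warmup(task, warmup_runs=1, timed_runs=3):
    """Run `task` untimed first, then return the wall-clock duration of each timed run."""
    for _ in range(warmup_runs):
        task()  # absorbs one-time costs such as lazy library loading
    durations = []
    for _ in range(timed_runs):
        start = time.perf_counter()
        task()
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    def stand_in_task():
        # Placeholder workload; substitute the montage pipeline being measured.
        sum(i * i for i in range(100_000))

    for seconds in time_after_warmup(stand_in_task):
        print(f"{seconds:.4f} s")
```

For the warm-up to be meaningful, it has to touch the same ITK classes as the timed run, so that the same modules are loaded beforehand.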

Would you please confirm that the GPU is in use when itk-vkfft is installed and itk.TileMergeImageFilter runs on your data? In the screenshot above it seems that only the CPU is used, so it would be good to confirm that the GPU is in use when we expect it.

There is a possibility that the overhead of moving images to GPU outweighs the advantage of FFT computation for your data. In our benchmarking we focused primarily on the performance of convolution of large 3D kernels compared with large 3D images. While your data represents large 2D images, it might be the case that they are not large enough to benefit from VkFFT GPU acceleration.

Would you please comment on what CPU and GPU you are using to run the script?

@zhusihan-python
Author

The GPU in this PC is a 1050 Ti, and I only installed itk from PyPI, without itk-vkfft. I will try auto_progress as you suggest tomorrow and upload the results here.

@zhusihan-python
Author

Running SimpleMontage with the itk-vkfft 0.2.0 backend gives an error:

Loading ITKPyBase... done
Loading ITKCommon... done
Loading ITKStatistics... done
Loading ITKImageFilterBase... done
Loading ITKTransform... done
Loading ITKImageFrequency... done
Loading ITKIOImageBase... Loading ITKIOBMP... done
Loading ITKIOMRC... done
Loading ITKIOMeta... done
Loading ITKIONIFTI... done
Loading ITKIONRRD... done
Loading ITKIOPNG... done
Loading ITKIOStimulate... done
Loading ITKIOVTK... done
done
Loading ITKImageFunction... done
Loading ITKImageGrid... done
Loading ITKFFT... Loading ITKImageSources... done
Loading ITKMesh... done
Loading ITKSpatialObjects... done
Loading ITKImageCompose... done
Loading ITKImageStatistics... done
Loading ITKPath... done
Loading ITKImageIntensity... done
Loading ITKThresholding... done
Loading ITKConvolution... done
Loading ITKSmoothing... done
Loading ITKOptimizers... done
Loading ITKImageGradient... done
Loading ITKImageFeature... done
Loading ITKFiniteDifference... done
Loading ITKDisplacementField... done
Loading ITKRegistrationCommon... done
Loading VkFFTBackend... done
done
Loading Montage... done
Computing tile registration transforms
D:\a\im\src\itkVkCommon.cxx(407): clEnqueueWriteBuffer returned -4
D:\a\im\src\itkVkCommon.cxx(407): clEnqueueWriteBuffer returned -4
D:\a\im\src\itkVkCommon.cxx(407): clEnqueueWriteBuffer returned -4
D:\a\im\src\itkVkCommon.cxx(407): clEnqueueWriteBuffer returned -4
D:\a\im\src\itkVkCommon.cxx(407): clEnqueueWriteBuffer returned -4
D:\a\im\src\itkVkCommon.cxx(407): clEnqueueWriteBuffer returned -4
D:\a\im\src\itkVkCommon.cxx(407): clEnqueueWriteBuffer returned -4
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -6
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -6
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -6
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -6
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -6
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -6
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -6
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -6
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -6
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -6
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -6
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -6
Traceback (most recent call last):
  File "E:\projects\ITKMontage\examples\SimpleMontage.py", line 71, in <module>
    montage.Update()
RuntimeError: D:\a\im\include\itkVkRealToHalfHermitianForwardFFTImageFilter.hxx:100:
ITK ERROR: VkFFT third-party library failed with error code 4045.

@tbirdso
Contributor

tbirdso commented Apr 12, 2023

@zhusihan-python Thanks for the error printout. We can examine the error code definitions as defined in the OpenCL header cl.h and in the VkFFT library at vkFFT.h to translate what each failure means.

From VkFFT we see that VKFFT_ERROR_FAILED_TO_CREATE_CONTEXT = 4045, which agrees with the log messages showing that clCreateContext failed.

From cl.h we find CL_MEM_OBJECT_ALLOCATION_FAILURE = -4 and CL_OUT_OF_HOST_MEMORY = -6, which seems to suggest you may be running out of either PC (RAM) or GPU memory. In an earlier comment you said you are using 132 3000*4096 images, could you provide more details on what kind of images these are? I.e., are you using 8-bit grayscale images, floating-point RGB images, etc? And, does your Task Manager window show that you are running out of memory on your PC when you run your script?

@dzenanz, maybe you could comment on how ITKMontage filters schedule FFTs for tile inputs and whether there is room for optimization in that scheduling?
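
For longer logs, the translation described above can be automated. This is a minimal sketch: the three OpenCL codes are the standard values from cl.h (including CL_OUT_OF_RESOURCES = -5), the VkFFT code is the one quoted above, and the function name `summarize` and the sample log text are only for illustration.

```python
import re
from collections import Counter

# Standard OpenCL error codes from cl.h (only the ones relevant here).
OPENCL_ERRORS = {
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
}
# VkFFT result code quoted in this thread.
VKFFT_ERRORS = {4045: "VKFFT_ERROR_FAILED_TO_CREATE_CONTEXT"}

# Matches fragments such as "clCreateContext returned -6".
_CALL = re.compile(r"(cl\w+) returned (-?\d+)")


def summarize(log_text):
    """Count (OpenCL call, code, symbolic name) triples found in a log."""
    counts = Counter()
    for call, code in _CALL.findall(log_text):
        number = int(code)
        counts[(call, number, OPENCL_ERRORS.get(number, "unknown"))] += 1
    return counts


if __name__ == "__main__":
    sample = r"""
D:\a\im\src\itkVkCommon.cxx(407): clEnqueueWriteBuffer returned -4
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -6
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -6
"""
    for (call, code, name), count in summarize(sample).items():
        print(f"{count} x {call} -> {code} ({name})")
    print(VKFFT_ERRORS[4045])
```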

@dzenanz
Member

dzenanz commented Apr 12, 2023

Here is the scheduling logic:
https://github.com/InsightSoftwareConsortium/ITKMontage/blob/d8c9a3fff6dfc765fc3eda8da48a40d72b621eec/include/itkTileMontage.hxx#L657-L699
I am trying to be conservative with memory.

@zhusihan-python
Author

In reply to @tbirdso's questions above:

The JPEG image type in ITK is <class 'itk.itkImagePython.itkImageRGBUC2'>.
The grayscale image type is <class 'itk.itkImagePython.itkImageF2'>.
My PC has 48 GB of memory in total and is not running out of memory (screenshot attached).
The GPU has 4 GB of memory in total, and usage is over 3.6 to 3.7 GB (screenshot attached).

@zhusihan-python
Author

Testing the same JPEG image type on another PC with 64 GB of memory and 8 GB of GPU memory also gives errors, but this time clCreateContext returned -5 instead of -6:

Loading ITKPyBase... done
Loading ITKCommon... done
Loading ITKStatistics... done
Loading ITKImageFilterBase... done
Loading ITKTransform... done
Loading ITKImageFrequency... done
Loading ITKIOImageBase... Loading ITKIOBMP... done
Loading ITKIOBioRad... done
Loading ITKIOBruker... done
Loading ITKIOGDCM... done
Loading ITKIOIPL... done
Loading ITKIOGE... done
Loading ITKIOGIPL... done
Loading ITKIOHDF5... done
Loading ITKIOJPEG... done
Loading ITKIOJPEG2000... done
Loading ITKIOTIFF... done
Loading ITKIOLSM... done
Loading ITKIOMINC... done
Loading ITKIOMRC... done
Loading ITKIOMeta... done
Loading ITKIONIFTI... done
Loading ITKIONRRD... done
Loading ITKIOPNG... done
Loading ITKIOStimulate... done
Loading ITKIOVTK... done
done
Loading ITKImageFunction... done
Loading ITKImageGrid... done
Loading ITKFFT... Loading ITKImageSources... done
Loading ITKMesh... done
Loading ITKSpatialObjects... done
Loading ITKImageCompose... done
Loading ITKImageStatistics... done
Loading ITKPath... done
Loading ITKImageIntensity... done
Loading ITKThresholding... done
Loading ITKConvolution... done
Loading ITKSmoothing... done
Loading ITKOptimizers... done
Loading ITKImageGradient... done
Loading ITKImageFeature... done
Loading ITKFiniteDifference... done
Loading ITKDisplacementField... done
Loading ITKRegistrationCommon... done
Loading VkFFTBackend... done
done
Loading Montage... done
Computing tile registration transforms
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -5
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -5
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -5
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -5
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -5
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -5
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -5
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -5
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -5
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -5
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -5
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -5
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -5
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -5
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -5
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -5
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -5
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -5
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -5
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -5
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -5
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -5
D:\a\im\src\itkVkCommon.cxx(125): clCreateContext returned -5
Traceback (most recent call last):
  File "E:\projects\ITKMontageCuda\examples\SimpleMontage.py", line 69, in <module>
    montage.Update()
RuntimeError: D:\a\im\include\itkVkHalfHermitianToRealInverseFFTImageFilter.hxx:103:
ITK ERROR: VkFFT third-party library failed with error code 4045.

Memory usage: (screenshot attached)
GPU memory usage almost reaches its limit: (screenshot attached)

@tbirdso
Contributor

tbirdso commented Apr 14, 2023

Hi @zhusihan-python , it looks like the GPU may be running out of dedicated memory, hence the failures. Unfortunately, unless @dzenanz has additional thoughts on the operations of ITKMontage I don't have intuition here on where optimizations may need to take place to allow your processing to proceed.
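
A rough size estimate makes the memory pressure plausible. The sketch below only counts raw pixel data, assuming 3000*4096 tiles stored as 32-bit float (the itkImageF2 type reported earlier). FFT work buffers, plans and intermediate results are not counted, and how many tiles ITKMontage keeps on the GPU at once is not assumed here. The helper names are hypothetical.

```python
GIB = 2**30


def tile_bytes(width, height, bytes_per_pixel):
    """Raw pixel storage for one tile, in bytes."""
    return width * height * bytes_per_pixel


def tiles_that_fit(memory_bytes, width, height, bytes_per_pixel):
    """How many raw tiles of this size fit into the given amount of memory."""
    return memory_bytes // tile_bytes(width, height, bytes_per_pixel)


if __name__ == "__main__":
    per_tile = tile_bytes(4096, 3000, 4)  # float32 is 4 bytes per pixel
    total = 132 * per_tile                # all 132 tiles from this thread
    print(f"one float tile: {per_tile / 2**20:.1f} MiB")
    print(f"132 float tiles: {total / GIB:.2f} GiB")
    print(f"raw tiles fitting in 4 GiB: {tiles_that_fit(4 * GIB, 4096, 3000, 4)}")
```

The full set of tiles as float comes to roughly 6 GiB of raw pixels, which is already more than a 4 GB card holds before any FFT buffers are allocated.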

My advice at this point is to open an issue on the main ITK repository for visibility in regards to your ITKOptimizersv4 C++ compilation failure preventing you from using cuFFT for acceleration. I am unable to recreate that issue on Windows 11, but someone else there may be able to help you move forward.

@dzenanz
Member

dzenanz commented Apr 14, 2023

The first easy thing to try is reducing the value passed to montage->SetNumberOfWorkUnits(), which is initialized here: https://github.com/InsightSoftwareConsortium/ITKMontage/blob/d8c9a3fff6dfc765fc3eda8da48a40d72b621eec/include/itkTileMontage.hxx#L55-L60.
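
From Python, that experiment could look like the sketch below. It assumes the montage filter exposes `SetNumberOfWorkUnits()` and `Update()` and that failures surface as `RuntimeError`, as in the traceback earlier in this thread. `update_with_fewer_work_units` and `FakeMontage` are hypothetical names; the fake class only stands in for a configured filter so the sketch runs without ITK. Whether a real filter can simply be updated again after a failed run is also an assumption.

```python
def update_with_fewer_work_units(process_object, start_units, min_units=1):
    """Call Update(), halving the work-unit count after each RuntimeError."""
    units = max(min_units, start_units)
    while True:
        process_object.SetNumberOfWorkUnits(units)
        try:
            process_object.Update()
            return units  # the work-unit count that finally succeeded
        except RuntimeError:
            if units <= min_units:
                raise  # nothing left to reduce; let the error propagate
            units = max(min_units, units // 2)


class FakeMontage:
    """Stand-in for a configured montage filter (hypothetical, for demonstration)."""

    def __init__(self, max_units_that_fit):
        self.max_units_that_fit = max_units_that_fit
        self.units = None

    def SetNumberOfWorkUnits(self, units):
        self.units = units

    def Update(self):
        if self.units > self.max_units_that_fit:
            raise RuntimeError("simulated VkFFT failure")


if __name__ == "__main__":
    print(update_with_fewer_work_units(FakeMontage(3), start_units=16))
```

Fewer work units means fewer FFTs in flight at once, trading speed for a smaller peak memory footprint.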

@zhusihan-python
Author

@tbirdso Thank you, Tom. I actually opened a similar issue last year, right after 5.3.0 was released, and then used ITK 5.2.1 with cuFFT instead. I also built 5.3.0 with cuFFT ON on Windows 11 and got the same build errors as on Windows 10.
