CITK - an architecture and examples of CUDA enabled ITK filters

Beare, Richard¹*; Micevski, Daniel; Share, Chris; Parkinson, Luke; Ward, Phil; Goscinski, Wojtek; Kuiper, Mike
¹ Monash University
Abstract

There is great interest in the use of graphics processing units (GPUs) for general-purpose applications because their highly parallel architectures offer the potential for large performance increases. The use of GPUs in image analysis applications has been under investigation for a number of years. This article describes modifications to the Insight Toolkit (ITK) that provide a simple architecture for transparent use of GPU-enabled filters, together with examples of how to write GPU-enabled filters using the NVIDIA CUDA tools. This work was performed between late 2009 and early 2010 and is being published as modifications to ITK 3.20. It is hoped that publication will help inform development of more general GPU support in ITK 4.0 and facilitate experimentation by users who require the functionality of ITK 3.20 or wish to pursue CUDA-based development.

Keywords

CUDA, GPU
Source Code and Data

cuda-insight-toolkit (file listing condensed; sizes given only for selected files)

Examples/CMake/cuda: FindCUDA.cmake (52.4 KB), with helper scripts make2cmake.cmake, parse_cubin.cmake and run_nvcc.cmake in FindCUDA/; CMakeLists.txt (1.5 KB).

src, filter directories, each holding Cuda<Name>.h, Cuda<Name>.txx, Cuda<Name>Kernel.cu and Cuda<Name>Kernel.h: AbsImageFilter, AddConstantToImageFilter, AddImageFilter, BinaryThresholdImageFilter, DivideByConstantImageFilter, DivideImageFilter, GrayscaleDilateImageFilter, GrayscaleErodeImageFilter, MaximumImageFilter, MinimumImageFilter, MeanImageFilter, MultiplyByConstantImageFilter, MultiplyImageFilter, NeighborhoodFilter, RescaleIntensityImageFilter, StatisticsImageFilter, SubtractConstantFromImageFilter, SubtractImageFilter.

src, filter directories holding only Cuda<Name>.h and Cuda<Name>.txx: GrayscaleMorphologicalClosingImageFilter, GrayscaleMorphologicalOpeningImageFilter.

src, shared code: Common (CudaImageToImageFilter.h, CudaImageToImageFilter.txx, CudaInPlaceImageFilter.h, CudaInPlaceImageFilter.txx); CudaFunctions (CudaNeighborhoodFunctions.cu, CudaTextureFunctions.cu); FilterTemplate (CudaFilter.h, CudaFilter.txx, CudaFilterKernel.cu, CudaFilterKernel.h); BackUpCudaFunctions; BackUpNeighborhoodFilter.

src, other source and build files: CudaTest.h, EclipseCompat.h, ImageCompare.cxx, external_dependency.h, external_dependency3.h, timer.h, main.cc, main_for_lib.cc, test_bin.cu, test_lib.cu, thrust_test.cu, mean_perf_test.cxx, simple_perf_test.cxx, CMakeLists.txt (23.8 KB), CMakeCache.txt and a CMakeFiles directory.

src, per-operation programs: itk-cpu-<op>.cxx and itk-gpu-<op>.cxx for abs, add, addc, binarythreshold, close, dilate, divide, dividec, erode, maximum, mean, minimum, multiply, multiplyc, open, rescaleintensity, statistics, subtract and subtractc; itk-gpu-multiply.cu; itk-cuda-driver.cxx.

src, data: LukeITK_513x385.mhd, affine.mhd, input.mhd, input3D.mhd, output.mhd, outputCPU.mhd, outputCPU3D.mhd, outputGPU.mhd, outputGPU3D.mhd; images/cthead1.png (192.6 KB), images/cthead2.png (14.6 KB).

thrust: a bundled copy of the Thrust library headers (CHANGELOG, algorithm headers such as copy.h, reduce.h, scan.h, sort.h and transform.h, and the detail, experimental, iterator, random and system subdirectories).

Profiling: add_profile.csv (601 B), config.txt (32 B), setup_profiling.sh (123 B).

article: Article.tex (19.1 KB), InsightArticle.cls, InsightJournal.bib, InsightJournal.sty, Makefile, and the LaTeX style files algorithm.sty, algorithmic.sty, amssymb.sty, fancyhdr.sty, floatflt.sty, fncychap.sty and times.sty.

patch.3.20.0.dif (114.7 KB)
