__shfl_up_sync with mask for CUDA >= 9 #13658

Merged (1 commit), Jan 21, 2019

@nglee nglee commented Jan 19, 2019

For the warp shuffle functions introduced in CUDA 9, passing a full mask may cause the program to hang on some devices. This PR attempts to fix this for the __shfl_up_sync function. Without this PR, the following tests hang on an RTX 2080 Ti:

./bin/opencv_test_cudev --gtest_filter=*Integral*
./bin/opencv_test_cudaarithm --gtest_filter=*Integral*
./bin/opencv_test_cudaimgproc --gtest_filter=*CLAHE*

Resolves #13014
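
As a rough illustration of the mask issue described above (this is not the code in this PR), the sketch below passes to __shfl_up_sync a mask that names only the lanes that actually execute the call, instead of a full mask. The names lane_mask, shfl_up_compat, partial_warp_scan and check_partial_warp_scan are hypothetical and chosen only for this example. It assumes a single warp of 32 threads in which only the first n lanes take part.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: mask with the lowest n bits set (1 <= n <= 32).
__host__ __device__ inline unsigned lane_mask(int n)
{
    return n >= 32 ? 0xFFFFFFFFu : ((1u << n) - 1u);
}

// Hypothetical wrapper: uses __shfl_up_sync on CUDA >= 9 and the legacy
// __shfl_up on older toolkits, where no mask argument exists.
template <typename T>
__device__ __forceinline__ T shfl_up_compat(unsigned mask, T val, unsigned delta, int width = 32)
{
#if defined(CUDART_VERSION) && CUDART_VERSION >= 9000
    return __shfl_up_sync(mask, val, delta, width);
#else
    (void)mask;
    return __shfl_up(val, delta, width);
#endif
}

// Inclusive prefix sum over the first n lanes of one warp.
__global__ void partial_warp_scan(const int* in, int* out, int n)
{
    const int lane = threadIdx.x & 31;
    if (lane >= n) return;              // these lanes never reach the shuffle

    const unsigned mask = lane_mask(n); // name only the participating lanes
    int v = in[lane];
    for (int d = 1; d < 32; d <<= 1) {
        const int up = shfl_up_compat(mask, v, d);
        if (lane >= d) v += up;         // lanes below d received their own value
    }
    out[lane] = v;
}

// Host-side check: scanning n ones must give 1, 2, ..., n.
bool check_partial_warp_scan(int n)
{
    int h_in[32], h_out[32] = {0};
    for (int i = 0; i < 32; ++i) h_in[i] = 1;

    int *d_in = 0, *d_out = 0;
    if (cudaMalloc(&d_in, sizeof(h_in)) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, sizeof(h_out)) != cudaSuccess) { cudaFree(d_in); return false; }
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);
    cudaMemset(d_out, 0, sizeof(h_out));

    partial_warp_scan<<<1, 32>>>(d_in, d_out, n);
    bool ok = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_in);
    cudaFree(d_out);

    for (int i = 0; ok && i < n; ++i) ok = (h_out[i] == i + 1);
    return ok;
}
```

Calling check_partial_warp_scan(20) exercises the partial-warp case, and check_partial_warp_scan(32) the full-warp case, where the mask is the full mask again.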

force_builders=Custom
docker_image:Custom=ubuntu-cuda:16.04
buildworker:Custom=linux-1

@nglee nglee force-pushed the dev_CudaShflUpCompat branch 2 times, most recently from e483c48 to 5cdf377 Compare January 21, 2019 01:27
nglee commented Jan 21, 2019

The following tests were performed:

opencv_test_cudev --gtest_filter=*BlockScan*
opencv_test_cudev --gtest_filter=*Integral*
opencv_test_cudaarithm --gtest_filter=*Integral*
opencv_test_cudaimgproc --gtest_filter=*CLAHE*

Verified on the following test environments:

CUDA 10.0 on 2080 Ti
CUDA 10.0 on 1080
CUDA 8.0 on 1080

I'll squash commits into one.

* __shfl_up_sync with proper mask value for CUDA >= 9

* BlockScanInclusive for CUDA >= 9

* compatible_shfl_up for use in integral.hpp

* Use CLAHE in cudev

* Add tests for BlockScan
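
For readers unfamiliar with the block scan mentioned in the commit list above, here is a minimal sketch of a block-wide inclusive scan built from warp shuffles, together with a host-side check of the kind a BlockScan test might perform. It is not the BlockScanInclusive implementation from this PR. All names (warp_scan_inclusive, block_scan_inclusive, block_scan_kernel, check_block_scan) are hypothetical. It assumes CUDA >= 9 and a block size that is a multiple of 32 and at most 1024, so every warp is fully populated and a full mask is valid at each shuffle.

```cuda
#include <cuda_runtime.h>

// Inclusive scan within one warp. All 32 lanes must call this together,
// which is what makes the full mask valid here.
__device__ int warp_scan_inclusive(int v)
{
    const int lane = threadIdx.x & 31;
    for (int d = 1; d < 32; d <<= 1) {
        const int up = __shfl_up_sync(0xFFFFFFFFu, v, d);
        if (lane >= d) v += up;
    }
    return v;
}

// Block-wide inclusive scan; smem must hold THREADS / 32 ints.
// Assumes THREADS is a multiple of 32 and at most 1024.
template <int THREADS>
__device__ int block_scan_inclusive(int v, int* smem)
{
    const int lane = threadIdx.x & 31;
    const int warp = threadIdx.x >> 5;
    const int nwarps = THREADS / 32;

    v = warp_scan_inclusive(v);          // step 1: scan inside each warp
    if (lane == 31) smem[warp] = v;      // step 2: publish each warp's total
    __syncthreads();

    if (warp == 0) {                     // step 3: warp 0 scans the totals
        int s = (lane < nwarps) ? smem[lane] : 0;
        s = warp_scan_inclusive(s);      // the whole warp takes this branch
        if (lane < nwarps) smem[lane] = s;
    }
    __syncthreads();

    if (warp > 0) v += smem[warp - 1];   // step 4: add preceding warps' sum
    return v;
}

template <int THREADS>
__global__ void block_scan_kernel(const int* in, int* out)
{
    __shared__ int warp_sums[32];
    out[threadIdx.x] = block_scan_inclusive<THREADS>(in[threadIdx.x], warp_sums);
}

// Host-side check against a sequential prefix sum.
template <int THREADS>
bool check_block_scan()
{
    int h_in[THREADS], h_out[THREADS];
    for (int i = 0; i < THREADS; ++i) { h_in[i] = i % 7; h_out[i] = -1; }

    int *d_in = 0, *d_out = 0;
    if (cudaMalloc(&d_in, sizeof(h_in)) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, sizeof(h_out)) != cudaSuccess) { cudaFree(d_in); return false; }
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);
    cudaMemcpy(d_out, h_out, sizeof(h_out), cudaMemcpyHostToDevice);

    block_scan_kernel<THREADS><<<1, THREADS>>>(d_in, d_out);
    bool ok = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_in);
    cudaFree(d_out);

    int running = 0;
    for (int i = 0; ok && i < THREADS; ++i) {
        running += h_in[i];
        ok = (h_out[i] == running);
    }
    return ok;
}
```

For example, check_block_scan<128>() runs the scan over four warps, and check_block_scan<32>() covers the single-warp case.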
@nglee nglee force-pushed the dev_CudaShflUpCompat branch from 46374a1 to 970293a Compare January 21, 2019 15:36
@alalek alalek left a comment

Well done! Thank you 👍

I will merge these changes into the master branch in a few days.

Labels
bug category: gpu/cuda (contrib) OpenCV 4.0+: moved to opencv_contrib feature