Skip to content

add cuda::Stream constructor with cuda stream flags #19286

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 3, 2021
Merged

add cuda::Stream constructor with cuda stream flags #19286

merged 1 commit into from
Feb 3, 2021

Conversation

diablodale
Copy link
Contributor

@diablodale diablodale commented Jan 8, 2021

adds feature described in issue #19285
includes code, test, and doc.

TLDR: adds a constructor for cuda::Stream which accepts a raw cudaStream_t and assumes ownership of it including destroying the raw CUDA stream in the Stream's destructor.

Pull Request Readiness Checklist

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or other license that is incompatible with OpenCV
  • The PR is proposed to proper branch
  • There is reference to original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake
force_builders=Custom
buildworker:Custom=linux-4
build_image:Custom=ubuntu-cuda:18.04

@diablodale
Copy link
Contributor Author

updated PR to remove python wrap for the new constructor. Python and move semantics are not supported. And not workable for python to have a true cudaStream_t from OpenCV or numby. Might be possible via CuPy, but that is out of scope.

Copy link
Member

@alalek alalek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for contribution!

@diablodale diablodale changed the title add cuda::Stream constructor from cudaStream_t add cuda::Stream constructor with cuda stream flags Jan 21, 2021
@diablodale
Copy link
Contributor Author

I see two sets of warnings/errors in the CI builds. I request assist as I do not understand them

warnings caused by the Py wrapper infrastructure

https://pullrequest.opencv.org/buildbot/builders/precommit_opencl_linux/builds/27820/steps/compile%20release/logs/warnings%20%282%29. The parameter on the C++ constructor is intentionally required and should not have a default value. This is the generated code.

{
    PyObject* pyobj_cudaFlags = NULL;
    unsigned cudaFlags;

    const char* keywords[] = { "cudaFlags", NULL };
    if( PyArg_ParseTupleAndKeywords(py_args, kw, "O:Stream", (char**)keywords, &pyobj_cudaFlags) &&
        pyopencv_to(pyobj_cudaFlags, cudaFlags, ArgInfo("cudaFlags", 0)) )
    {
        new (&(self->v)) Ptr<cv::cuda::Stream>(); // init Ptr with placement new
        if(self) ERRWRAP2(self->v.reset(new cv::cuda::Stream(cudaFlags)));
        return 0;
    }
}

fail in the "custom" CI platform

I see errors in some .cu file that is likely generated by something that is part of a maybe a contrib module. I am unable to discern the process and/or initial source code to do any diagnostics.

-- Generating /build/precommit_custom_linux/build/modules/cudaoptflow/CMakeFiles/cuda_compile.dir/src/cuda/./cuda_compile_generated_nvidiaOpticalFlow.cu.o
/usr/local/cuda/bin/nvcc /build/precommit_custom_linux/opencv_contrib/modules/cudaoptflow/src/cuda/nvidiaOpticalFlow.cu -c -o /build/precommit_custom_linux/build/modules/cudaoptflow/CMakeFiles/cuda_compile.dir/src/cuda/./cuda_compile_generated_nvidiaOpticalFlow.cu.o -ccbin /usr/bin/gcc-5 -m64 -D__OPENCV_BUILD=1 -D_USE_MATH_DEFINES -D__STDC_CONSTANT_MACROS -D__STDC_LIMIT_MACROS -D__STDC_FORMAT_MACROS -Xcompiler ,\"-fsigned-char\",\"-W\",\"-Wall\",\"-Werror=return-type\",\"-Werror=non-virtual-dtor\",\"-Werror=address\",\"-Werror=sequence-point\",\"-Wformat\",\"-Werror=format-security\",\"-Winit-self\",\"-Wpointer-arith\",\"-Wuninitialized\",\"-Wno-comment\",\"-fdiagnostics-show-option\",\"-Wno-long-long\",\"-pthread\",\"-fomit-frame-pointer\",\"-ffunction-sections\",\"-fdata-sections\",\"-msse\",\"-msse2\",\"-msse3\",\"-fvisibility=hidden\",\"-Wno-undef\",\"-Wno-missing-declarations\",\"-Wno-shadow\",\"-Wno-strict-aliasing\",\"-Wno-unused-but-set-variable\",\"-O3\",\"-DNDEBUG\",\"-DNDEBUG\" -gencode arch=compute_61,code=sm_61 -D_FORCE_INLINES -Xcompiler -DCVAPI_EXPORTS -Xcompiler -fPIC --std=c++11 -DNVCC -I/usr/local/cuda/include -I/build/precommit_custom_linux/build/3rdparty/ippicv/ippicv_lnx/icv/include -I/build/precommit_custom_linux/build/3rdparty/ippicv/ippicv_lnx/iw/include -I/build/precommit_custom_linux/build -I/usr/include/gdal -I/usr/include/eigen3 -I/usr/include -I/build/precommit_custom_linux/opencv_contrib/modules/cudaoptflow/include -I/build/precommit_custom_linux/build/modules/cudaoptflow -I/build/precommit_custom_linux/opencv_contrib/modules/cudev/include -I/build/precommit_custom_linux/opencv/modules/core/include -I/build/precommit_custom_linux/opencv_contrib/modules/cudaarithm/include -I/build/precommit_custom_linux/opencv/modules/flann/include -I/build/precommit_custom_linux/opencv/modules/imgproc/include -I/build/precommit_custom_linux/opencv_contrib/modules/cudafilters/include -I/build/precommit_custom_linux/opencv_contrib/modules/cudaimgproc/include -I/build/precommit_custom_linux/opencv_contrib/modules/cudawarping/include -I/build/precommit_custom_linux/opencv/modules/dnn/include -I/build/precommit_custom_linux/opencv/modules/features2d/include -I/build/precommit_custom_linux/opencv/modules/imgcodecs/include -I/build/precommit_custom_linux/opencv/modules/calib3d/include -I/build/precommit_custom_linux/opencv/modules/objdetect/include -I/build/precommit_custom_linux/opencv/modules/video/include -I/build/precommit_custom_linux/opencv_contrib/modules/ximgproc/include -I/build/precommit_custom_linux/opencv_contrib/modules/cudalegacy/include -I/build/precommit_custom_linux/opencv_contrib/modules/optflow/include -I/usr/local/cuda/include
/build/precommit_custom_linux/opencv_contrib/modules/cudaoptflow/src/cuda/nvidiaOpticalFlow.cu(76): error: no instance of overloaded function "surf2Dwrite" matches the argument list
            argument types are: (short2, cudaSurfaceObject_t, unsigned long, int, cudaSurfaceBoundaryMode)

1 error detected in the compilation of "/tmp/tmpxft_000010f5_00000000-7_nvidiaOpticalFlow.cpp1.ii".
-- Removing /build/precommit_custom_linux/build/modules/cudaoptflow/CMakeFiles/cuda_compile.dir/src/cuda/./cuda_compile_generated_nvidiaOpticalFlow.cu.o
/usr/bin/cmake -E remove /build/precommit_custom_linux/build/modules/cudaoptflow/CMakeFiles/cuda_compile.dir/src/cuda/./cuda_compile_generated_nvidiaOpticalFlow.cu.o
CMake Error at cuda_compile_generated_nvidiaOpticalFlow.cu.o.cmake:264 (message):
  Error generating file
  /build/precommit_custom_linux/build/modules/cudaoptflow/CMakeFiles/cuda_compile.dir/src/cuda/./cuda_compile_generated_nvidiaOpticalFlow.cu.o


modules/cudaoptflow/CMakeFiles/opencv_cudaoptflow.dir/build.make:84: recipe for target 'modules/cudaoptflow/CMakeFiles/cuda_compile.dir/src/cuda/cuda_compile_generated_nvidiaOpticalFlow.cu.o' failed
make[2]: *** [modules/cudaoptflow/CMakeFiles/cuda_compile.dir/src/cuda/cuda_compile_generated_nvidiaOpticalFlow.cu.o] Error 1
make[2]: Leaving directory '/build/precommit_custom_linux/build'
CMakeFiles/Makefile2:23199: recipe for target 'modules/cudaoptflow/CMakeFiles/opencv_cudaoptflow.dir/all' failed
make[1]: *** [modules/cudaoptflow/CMakeFiles/opencv_cudaoptflow.dir/all] Error 2

@alalek
Copy link
Member

alalek commented Jan 22, 2021

const unsigned cudaFlags

try int cudaFlags


nvidiaOpticalFlow.cu

Problem is from here: opencv/opencv_contrib#2807 (comment)
Workarounded through using Ubuntu 18.04 build image (with CUDA 10.0).

@diablodale
Copy link
Contributor Author

diablodale commented Jan 22, 2021

The cuda api parameter signature is unsigned int. It is unsafe to use a signed parameter or to cast it.
I just learned that python has no unsigned integer variables. Ugh. Python. Yuck.
I can't cast, because I can't guarantee the size of each "int" in C++ and/or Python across 32/64/Win/Mac/Linux.
So....I recommend we remove the python wrapper for this constructor because of the python limitation.

I use unsigned because the python wrapper infrastructure has a bug if you use const unsigned int...it thinks the type is const unsigned and the variable is int xxxx 😂 a variable name with a space in it. Therefore, I use const unsigned.

nvidiaOpticalFlow.cu

This is part of the OpenCV CI. I have no control of that system. Do you want to fix that or turn off the failing platform?

@alalek
Copy link
Member

alalek commented Jan 22, 2021

This is not about Python only. Bindings wrappers support only subset of types.
Unsigned is not supported due to a multiple reasons. Use 'int'. It is by design.


I can't cast

This doesn't matter until you have such platform in your hands (with CUDA).

@diablodale
Copy link
Contributor Author

I will experiment today, but no guarantees. The most important user to me is C++. And I will not compromise c++ because of python itself or the python infrastructure. The C++ constructor will have unsigned int. I will experiment to see if I can somehow have python/infrastructure have instead an int. Debugging/enhancing the python wrappers is out-of-scope for this PR -- that is something else for someone else.

I will include a disclaimer in the code/docs that the sign is unreliable and may cause problems now or in the future.
I still suggest we not python wrap this constructor.

@diablodale
Copy link
Contributor Author

I do not see anything that I would change in the constructor.

The two warnings I reported above only appear in 3 platforms. The two test cases I wrote pass and my app itself works well.
If it is desired to not have these warnings, I recommend someone with experience on the python wrapper infrastructure debug this issue.

The python wrapping infrastructure already has some code to support size_t and unsigned int. Perhaps this code is not working correctly or incomplete.

template<typename T>
struct PyOpenCV_Converter
< T, typename std::enable_if< std::is_same<unsigned int, T>::value && !std::is_same<unsigned int, size_t>::value >::type >
{
static inline PyObject* from(const unsigned int& value)
{
return PyLong_FromUnsignedLong(value);
}
static inline bool to(PyObject* obj, unsigned int& value, const ArgInfo& info)
{
CV_UNUSED(info);
if(!obj || obj == Py_None)

template<>
PyObject* pyopencv_from(const size_t& value)
{
return PyLong_FromSize_t(value);
}
template<>
bool pyopencv_to(PyObject* obj, size_t& value, const ArgInfo& info)
{
if (!obj || obj == Py_None)
{
return true;
}
if (isBool(obj))

I also found at least one function in opencv that uses the size_t support and I don't see warnings about it in the logs.
CV_WRAP void cv::cuda::DeviceInfo::queryMemory(size_t& totalMemory, size_t& freeMemory) const;

cv::cuda::Stream cvStream(cudaStreamNonBlocking);
@endcode
*/
CV_WRAP Stream(const unsigned cudaFlags);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove CV_WRAP from here to keep this C++ only.

We can add CV_WRAP static Stream create(int cudaFlags) or similar for bindings later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought of an alternative. I will need to code it and force push to see if it removes the two warnings in the CI platforms. (Functionality has not been a problem...only two warnings on a subset of platforms).

I can code the constructor to accept an int (you suggest this is preferred in OpenCV) and also reject negative numbers. By accepting only positive int values...I can safely cast it to an unsigned int. This alternative supports all the values that the NVidia API currently supports. If in the future NVidia defines a value that is larger than a signed int can represent and someone uses that value, then this constructor will fail. And OpenCV will need to make a choice how to handle NVidia's new value.

I would prefer to not have the cost of a branch/if in the constructor. What is the OpenCV philosophy to use CV_DbgAssert(param>=0). I would prefer that so there is no cost in release builds.
If the core team doesn't like that, then I can use CV_Assert(param>=0).

@@ -9,6 +9,10 @@
#include "opencv2/core/eigen.hpp"
#endif

#if defined(HAVE_CUDA)
#include <cuda_runtime.h>
#endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It make sense to create new file test_cuda.cpp for this test

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. I will move it into a new file.

@@ -2355,5 +2359,15 @@ TEST(Mat, regression_18473)
EXPECT_EQ((int)5, (int)m.at<short>(19, 49, 99));
}

TEST(CUDA_Stream, construct_cudaFlags)
{
#ifdef HAVE_CUDA
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guard the whole test (or the file as mentioned above) with #ifdef HAVE_CUDA.

IMHO, "#else" branch is not really needed (it is hard enough to introduce such regression).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I'll incorporate with the coming PR update.

Copy link
Member

@alalek alalek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is wrong with cv::cuda::StreamAccessor?
It was specially added to work with CUDA SDK directly (see all CUDA enums / types, respect type safety).

Stream API has limitation, which doesn't allow to use 3rdparty dependencies (CUDA SDK).


StreamAccessor is not a part of bindings.

@@ -33,6 +33,8 @@ def test_cuda_interop(self):
self.assertTrue(cuMat.cudaPtr() != 0)
stream = cv.cuda_Stream()
self.assertTrue(stream.cudaPtr() != 0)
asyncstream = cv.cuda_Stream(1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding a comment with description for magic numbers:

asyncstream = cv.cuda_Stream(1)  # cudaStreamNonBlocking

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CUDA api defines are documented in the doxygen for the constructor. I can also add it to this test file, no problem.

@diablodale
Copy link
Contributor Author

Did you write again "What is wrong with cv::cuda::StreamAccessor?"?
If you did, this was discussed multiples times before.

cv::cuda::Stream cvStream(cudaStreamNonBlocking);
@endcode
*/
CV_WRAP Stream(const int cudaFlags);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a look a bit more at bindings generators.
There is size_t type support in bindings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am pushing again to check for any warnings on the CI platforms. Using size_t in this way is not in alignment with intention of the type https://en.cppreference.com/w/cpp/types/size_t
It may work on all common platforms/data models today. And it may fail in the future due to using size_t in a way the C++ standard did not intend.

Copy link
Member

@alalek alalek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you 👍

@opencv-pushbot opencv-pushbot merged commit 6ca46af into opencv:master Feb 3, 2021
@alalek alalek mentioned this pull request Apr 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: gpu/cuda (contrib) OpenCV 4.0+: moved to opencv_contrib feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants