Error reported by Tesla K20c sm_35 but not GeForce GTX 295 sm_13

unknown error. Caused by wrong shared memory request

[NiftyReg CUDA ERROR] file 'gpu.cu' in line 191 : unknown error.

In a development version of CUDA 5.0 code, an image compiled without -arch ran on both a GTX 295 and a K20c. On the GTX 295 the correct answer is reported despite a coding error. On the Tesla K20c the kernel fails and "unknown error." is reported.
In other words, the compute capability 3.5 GPU spots and reports the error, but compute capability 1.1 and 1.3 GPUs ignore it. The error may also cause compute capability 2.0 Tesla GPUs to abort the kernel, although other 2.0 hardware did not.
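Because older GPUs may silently tolerate the bug, it helps to check for errors after every kernel launch so that the failure is at least tied to a file and line. The following is only a minimal sketch: cuda_ok, CUDA_OK, fill and run_fill are names made up for illustration and are not NiftyReg's own error macro or kernels.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not part of NiftyReg): print a CUDA error with its
// source location and say whether the call succeeded.
static bool cuda_ok(cudaError_t err, const char *file, int line)
{
    if (err == cudaSuccess) return true;
    fprintf(stderr, "[CUDA ERROR] file '%s' in line %d : %s.\n",
            file, line, cudaGetErrorString(err));
    return false;
}
#define CUDA_OK(call) cuda_ok((call), __FILE__, __LINE__)

// Illustrative kernel: write each element's index into out.
__global__ void fill(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = (float)i;
}

// Launch the kernel and check both the launch and the execution.
// Returns true only if every step succeeded.
bool run_fill(int n)
{
    float *out_d = NULL;
    if (!CUDA_OK(cudaMalloc(&out_d, n * sizeof(float)))) return false;

    fill<<<(n + 127) / 128, 128>>>(out_d, n);

    // Errors in the launch itself (e.g. a bad configuration) show up here.
    bool ok = CUDA_OK(cudaGetLastError());
    // Errors raised while the kernel runs only show up after a synchronise.
    ok = CUDA_OK(cudaDeviceSynchronize()) && ok;

    cudaFree(out_d);
    return ok;
}
```

Calling run_fill(1000) should return true on a working GPU. In the failing case described above, it is a check of this kind after the kernel that would report "unknown error.".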

Work around

Correct the code so that the required number of shared memory bytes is requested when the kernel is launched. E.g. replace zero (the third launch parameter) with block_size*8*sizeof(float) in
reg_spline_getDeformationField3D
                <<< G1, B1, block_size*8*sizeof(float) >>>
	  (*positionFieldImageArray_d,

nvcc used to compile but doesn't anymore and you didn't change the program source

Have you created a file called new anywhere in the compiler's include path, especially in the current directory?

Somewhere the gcc I/O library includes a file called new. If you have a file of that name in the include path, the compiler picks it up instead of the one from the I/O library. The error messages it produces depend upon what is in your new file. Usually they are very confusing. E.g.:

/usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/bits/locale_facets.h(3319): error: incomplete type is not allowed

/usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/bits/locale_facets.h(3351): error: incomplete type is not allowed

Error limit reached.

Work around

Rename your file called new, or move it out of the compiler's include path.

This error (when I fell for it the first time) was reported to nVidia. Their response was basically "that is what C does".


W.B.Langdon 21 Jan 2014