Skip to content

Commit 0838f49

Browse files
authored
Update README.md
1 parent 6554c32 commit 0838f49

File tree

1 file changed

+54
-19
lines changed
  • Tools/ApplicationDebugger/guided_matrix_mult_SLMSize

1 file changed

+54
-19
lines changed

Tools/ApplicationDebugger/guided_matrix_mult_SLMSize/README.md

Lines changed: 54 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -237,45 +237,58 @@ In `1_matrix_mul_SLM_size`, the local_accessor class is used to reserve an illeg
237237

238238
#### Root-Cause the Issue
239239

240-
You can see that there is something wrong in the submit at line `104`. You need some more information to understand what is happening. For that we need to capture the lower-level API calls using the `onetrace` tool.
240+
You can see that there is something wrong in the submit at line `104`, but we need more information to understand what is happening. For that we need to capture the lower-level API calls using the `onetrace` tool.
241241

242242
>**Note**: You must have already built the [Tracing and Profiling Tool](https://github.com/intel/pti-gpu/tree/master/tools/onetrace). Once you have built the utility, you can invoke it before your program (similar to GBD).
243243
244-
One of the things that the Tracing and Profiling utility can help us identify is printing every low-level API call made to OpenCL™ or Level Zero. This is the features that we will use to attempt to match the source to the events.
244+
Among other things, the Tracing and Profiling utility can print every low-level API call made to OpenCL™ or Level Zero. This is the feature that we will use to get more information about the crash.
245245

246-
2. Run the program with `onetrace` and enable the RT debug messages:
246+
2. Run the program with `onetrace` and enable the runtime debug messages:
247247
```
248248
onetrace -c ./1_matrix_mul_SLM_size
249249
```
250250

251-
3. Continue listing the output until the error occurs and the program stops.
251+
3. Let the output continue until the error occurs and the program stops.
252252
```
253-
<<<< [504780292] zeEventHostReset [3564 ns] -> ZE_RESULT_SUCCESS(0x0)
254-
>>>> [504789109] zeCommandListAppendLaunchKernel: hCommandList = 0x4cf64b0 hKernel = 0x53b0350 (_ZTSZZ4mainENKUlRN4sycl3_V17handlerEE_clES2_EUlNS0_7nd_itemILi1EEEE_) pLaunchFuncArgs = 0x7ffcc0831cec {16385, 1, 1} hSignalEvent = 0x48332d0 numWaitEvents = 0 phWaitEvents = 0
255-
<<<< [504818879] zeCommandListAppendLaunchKernel [17599 ns] -> ZE_RESULT_ERROR_OUT_OF_DEVICE_MEMORY(0x1879048195)
253+
:
254+
>>>> [1066578697845396] zeKernelSetGroupSize: hKernel = 55242736 groupSizeX = 10 groupSizeY = 1 groupSizeZ = 1
255+
<<<< [1066578697849285] zeKernelSetGroupSize [1449 ns] -> ZE_RESULT_SUCCESS(0x0)
256+
>>>> [1066578697854047] zeCommandListCreateImmediate: hContext = 41540224 hDevice = 37134192 altdesc = 140733241819552 {ZE_STRUCTURE_TYPE_COMMAND_QUEUE_DESC(0xe) 0 0 0 0 2 0} phCommandList = 140733241819544 (hCommandList = 0)
257+
<<<< [1066578698107437] zeCommandListCreateImmediate [248694 ns] hCommandList = 61984688 -> ZE_RESULT_SUCCESS(0x0)
258+
>>>> [1066578698115446] zeEventHostReset: hEvent = 39536208
259+
<<<< [1066578698119590] zeEventHostReset [1854 ns] -> ZE_RESULT_SUCCESS(0x0)
260+
>>>> [1066578698126085] zeCommandListAppendLaunchKernel: hCommandList = 61984688 hKernel = 55242736 (_ZTSZZ4mainENKUlRN4sycl3_V17handlerEE_clES2_EUlNS0_7nd_itemILi1EEEE_) pLaunchFuncArgs = 140733241820008 {16385, 1, 1} hSignalEvent = 39536208 numWaitEvents = 0 phWaitEvents = 0
261+
<<<< [1066578698169233] zeCommandListAppendLaunchKernel [34637 ns] -> ZE_RESULT_ERROR_OUT_OF_DEVICE_MEMORY(0x1879048195)
256262
terminate called after throwing an instance of 'sycl::_V1::runtime_error'
257263
what(): Native API failed. Native API returns: -5 (PI_ERROR_OUT_OF_RESOURCES) -5 (PI_ERROR_OUT_OF_RESOURCES)
258264
Aborted (core dumped)
259265
```
260266

261-
**Clue**: Due to the running the program under onetrace we can see that the error happens during launching of the kernel:
262-
```
263-
<<<< [504818879] zeCommandListAppendLaunchKernel [17599 ns] -> ZE_RESULT_ERROR_OUT_OF_DEVICE_MEMORY(0x1879048195)
264-
```
267+
**Clue**: By running the program under onetrace we can see that the error happens when launching a kernel called `(_ZTSZZ4mainENKUlRN4sycl3_V17handlerEE_clES2_EUlNS0_7nd_itemILi1EEEE_`), and that this fails with an `ZE_RESULT_ERROR_OUT_OF_DEVICE_MEMORY` error.
268+
269+
A note about the output above. You will see that is has two lines that read:
270+
271+
```
272+
>>>> [1066578697845396] zeKernelSetGroupSize: hKernel = 55242736 groupSizeX = 10 groupSizeY = 1 groupSizeZ = 1
273+
:
274+
>>>> [1066578698126085] zeCommandListAppendLaunchKernel: hCommandList = 61984688 hKernel = 55242736 (_ZTSZZ4mainENKUlRN4sycl3_V17handlerEE_clES2_EUlNS0_7nd_itemILi1EEEE_) pLaunchFuncArgs = 140733241820008 {16385, 1, 1} hSignalEvent = 39536208 numWaitEvents = 0 phWaitEvents = 0
275+
```
276+
277+
We used the form of `parallel_for` that takes the `nd_range`, which specifies the global iteration range (163850) and the local work-group size (10) like so: `nd_range<1>{{163850}, {10}}`. The first line above shows the workgroup size (`groupSizeX = 10 groupSizeY = 1 groupSizeZ = 1`), and the second shows how many total workgroups will be needed to process the global iteration range (`{16385, 1, 1}`).
265278

266279
#### Determine Device Limits
267280

268-
If you have access to a version of the graphics drivers built with debug functionality, you can get even more information about this error by setting two NEO variables and values: `PrintDebugMessages=1` and `NEOReadDebugKeys=1` ().
281+
If you have access to a version of the graphics drivers built with debug functionality, you can get even more information about this error by setting two NEO variables to the following values:
269282

270283
```
271-
$ export NEOReadDebugKeys=1
272-
$ export PrintDebugMessages=1
284+
export NEOReadDebugKeys=1
285+
export PrintDebugMessages=1
273286
```
274287

275-
When you set these environment variables and and re-run the program, you should see results similar to the following:
288+
When you set these environment variables and re-run the program, you should see results similar to the following:
276289

277290
```
278-
$ ./1_matrix_mul_SLM_size
291+
./1_matrix_mul_SLM_size
279292
Initializing
280293
:Problem size: c(150,600) = a(150,300) * b(300,600)
281294
Ignored kernel-scope Patch Token: 21
@@ -287,10 +300,32 @@ terminate called after throwing an instance of 'sycl::_V1::runtime_error'
287300
what(): Native API failed. Native API returns: -5 (PI_ERROR_OUT_OF_RESOURCES) -5 (PI_ERROR_OUT_OF_RESOURCES)
288301
Aborted (core dumped)
289302
```
290-
The new message is `Size of SLM (656384) larger than available (131072)`. This tells you the size of the Shared Local Memory (SLM) memory on the device, 131072 bytes (128Kb), is smaller than the requested size of 656384 bytes.
291303

292-
If the `parallel_for` were operating over a multi-dimensional range (for example, if `acc` were two or three-dimensional), you need to multiply the dimensions together to determine the number of floating point numbers we are trying to store in SLM. In our case, the calculation is easy: 163850 (`globalSizeX`) times 1 (`glocalSizeY`) times 1 (`globalSizeZ`). So the problem is that the size of work-group local memory we tried to allocate, (163850 floats or 4*163850=655,400 bytes), doesn't fit in the SLM on this device.
293-
You should notice that the different devices will have different amounts of memory set aside as SLM. In SYCL, you can query this number by passing `info::device::local_mem_size` to the `get_info` member of the `device` class.
304+
The new message of interest is `Size of SLM (656384) larger than available (131072)`. This tells you that the size of the Shared Local Memory (SLM) memory on the device, 131072 bytes (128Kb), is smaller than the requested size of 656384 bytes.
305+
306+
If the `parallel_for` were operating over a multi-dimensional range (for example, if `acc` were two or three-dimensional), you need to multiply the dimensions together to determine the number of floating point numbers we are trying to store in SLM. In our case, the calculation is easy: the first argument to the `nd_range` in the `parallel_for` is single-dimensional, so it's just 163850. Thus the problem is that the size of work-group local memory we tried to allocate, (163850 floats or 4*163850=655,400 bytes rounded up to the nearest 64-byte cache line), doesn't fit in the SLM on this device.
307+
308+
You should know that different devices will have different amounts of memory set aside as SLM. In SYCL, you can query this number by passing `info::device::local_mem_size` to the `get_info` member of the `device` class.
309+
310+
Finally, running under `onetrace -c` you see:
311+
312+
```
313+
>>>> [1066578697845396] zeKernelSetGroupSize: hKernel = 55242736 groupSizeX = 10 groupSizeY = 1 groupSizeZ = 1
314+
<<<< [1066578697849285] zeKernelSetGroupSize [1449 ns] -> ZE_RESULT_SUCCESS(0x0)
315+
>>>> [1066578697854047] zeCommandListCreateImmediate: hContext = 41540224 hDevice = 37134192 altdesc = 140733241819552 {ZE_STRUCTURE_TYPE_COMMAND_QUEUE_DESC(0xe) 0 0 0 0 2 0} phCommandList = 140733241819544 (hCommandList = 0)
316+
<<<< [1066578698107437] zeCommandListCreateImmediate [248694 ns] hCommandList = 61984688 -> ZE_RESULT_SUCCESS(0x0)
317+
>>>> [1066578698115446] zeEventHostReset: hEvent = 39536208
318+
<<<< [1066578698119590] zeEventHostReset [1854 ns] -> ZE_RESULT_SUCCESS(0x0)
319+
>>>> [1066578698126085] zeCommandListAppendLaunchKernel: hCommandList = 61984688 hKernel = 55242736 (_ZTSZZ4mainENKUlRN4sycl3_V17handlerEE_clES2_EUlNS0_7nd_itemILi1EEEE_) pLaunchFuncArgs = 140733241820008 {16385, 1, 1} hSignalEvent = 39536208 numWaitEvents = 0 phWaitEvents = 0
320+
Size of SLM (656384) larger than available (131072)
321+
<<<< [1066578698169233] zeCommandListAppendLaunchKernel [34637 ns] -> ZE_RESULT_ERROR_OUT_OF_DEVICE_MEMORY(0x1879048195)
322+
terminate called after throwing an instance of 'sycl::_V1::runtime_error'
323+
what(): Native API failed. Native API returns: -5 (PI_ERROR_OUT_OF_RESOURCES) -5 (PI_ERROR_OUT_OF_RESOURCES)
324+
```
325+
326+
This is useful because it shows you the kernel being called that caused the error (`_ZTSZZ4mainENKUlRN4sycl3_V17handlerEE_clES2_EUlNS0_7nd_itemILi1EEEE_` which `c++filt` resolves to `typeinfo name for main::{lambda(sycl::_V1::handler&)#1}::operator()(sycl::_V1::handler&) const::{lambda(sycl::_V1::nd_item<1>)#1} `) in addition to the amount of memory requested vs. the available size of SLM.
327+
328+
294329

295330
#### Resolving the Problem
296331

0 commit comments

Comments
 (0)