Tools/ApplicationDebugger/guided_matrix_mult_SLMSize/README.md
Lines changed: 54 additions & 19 deletions
@@ -237,45 +237,58 @@ In `1_matrix_mul_SLM_size`, the local_accessor class is used to reserve an illeg
237
237
238
238
#### Root-Cause the Issue
239
239
240
-
You can see that there is something wrong in the submit at line `104`. You need some more information to understand what is happening. For that we need to capture the lower-level API calls using the `onetrace` tool.
240
+
You can see that there is something wrong in the submit at line `104`, but we need more information to understand what is happening. For that we need to capture the lower-level API calls using the `onetrace` tool.
241
241
242
242
>**Note**: You must have already built the [Tracing and Profiling Tool](https://github.com/intel/pti-gpu/tree/master/tools/onetrace). Once you have built the utility, you can invoke it before your program (similar to GDB).
243
243
244
-
One of the things that the Tracing and Profiling utility can help us identify is printing every low-level API call made to OpenCL™ or Level Zero. This is the features that we will use to attempt to match the source to the events.
244
+
Among other things, the Tracing and Profiling utility can print every low-level API call made to OpenCL™ or Level Zero. This is the feature that we will use to get more information about the crash.
245
245
246
-
2. Run the program with `onetrace` and enable the RT debug messages:
246
+
2. Run the program with `onetrace` and enable the runtime debug messages:
247
247
```
248
248
onetrace -c ./1_matrix_mul_SLM_size
249
249
```
250
250
251
-
3.Continue listing the output until the error occurs and the program stops.
251
+
3. Let the output continue until the error occurs and the program stops.
**Clue**: By running the program under `onetrace`, we can see that the error happens when launching a kernel called `_ZTSZZ4mainENKUlRN4sycl3_V17handlerEE_clES2_EUlNS0_7nd_itemILi1EEEE_`, and that the launch fails with a `ZE_RESULT_ERROR_OUT_OF_DEVICE_MEMORY` error.
268
+
269
+
A note about the output above. You will see that it has two lines that read:
We used the form of `parallel_for` that takes the `nd_range`, which specifies the global iteration range (163850) and the local work-group size (10) like so: `nd_range<1>{{163850}, {10}}`. The first line above shows the workgroup size (`groupSizeX = 10 groupSizeY = 1 groupSizeZ = 1`), and the second shows how many total workgroups will be needed to process the global iteration range (`{16385, 1, 1}`).
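The relationship between the two traced lines can be checked with a little arithmetic. The following is a minimal sketch, not code from the sample: the helper names `divides_evenly` and `work_group_count` are hypothetical, and only the values 163850 and 10 come from the `nd_range` shown above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: an nd_range requires the global size to be a
// multiple of the local (work-group) size in each dimension.
constexpr bool divides_evenly(std::size_t global_size, std::size_t local_size) {
    return local_size != 0 && global_size % local_size == 0;
}

// Hypothetical helper: number of work-groups along one dimension.
// Assumes divides_evenly(global_size, local_size) is true.
constexpr std::size_t work_group_count(std::size_t global_size, std::size_t local_size) {
    return global_size / local_size;
}

// Values taken from nd_range<1>{{163850}, {10}} in the text.
static_assert(divides_evenly(163850, 10), "global size must be a multiple of local size");
static_assert(work_group_count(163850, 10) == 16385, "matches the traced group count");
```

For the values in this sample, `work_group_count(163850, 10)` evaluates to 16385, which is the `{16385, 1, 1}` group count reported in the trace.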
265
278
266
279
#### Determine Device Limits
267
280
268
-
If you have access to a version of the graphics drivers built with debug functionality, you can get even more information about this error by setting two NEO variables and values: `PrintDebugMessages=1` and `NEOReadDebugKeys=1` ().
281
+
If you have access to a version of the graphics drivers built with debug functionality, you can get even more information about this error by setting two NEO variables to the following values:
269
282
270
283
```
271
-
$ export NEOReadDebugKeys=1
272
-
$ export PrintDebugMessages=1
284
+
export NEOReadDebugKeys=1
285
+
export PrintDebugMessages=1
273
286
```
274
287
275
-
When you set these environment variables and and re-run the program, you should see results similar to the following:
288
+
When you set these environment variables and re-run the program, you should see results similar to the following:
@@ -287,10 +300,32 @@ terminate called after throwing an instance of 'sycl::_V1::runtime_error'
287
300
what(): Native API failed. Native API returns: -5 (PI_ERROR_OUT_OF_RESOURCES) -5 (PI_ERROR_OUT_OF_RESOURCES)
288
301
Aborted (core dumped)
289
302
```
290
-
The new message is `Size of SLM (656384) larger than available (131072)`. This tells you the size of the Shared Local Memory (SLM) memory on the device, 131072 bytes (128Kb), is smaller than the requested size of 656384 bytes.
291
303
292
-
If the `parallel_for` were operating over a multi-dimensional range (for example, if `acc` were two or three-dimensional), you need to multiply the dimensions together to determine the number of floating point numbers we are trying to store in SLM. In our case, the calculation is easy: 163850 (`globalSizeX`) times 1 (`glocalSizeY`) times 1 (`globalSizeZ`). So the problem is that the size of work-group local memory we tried to allocate, (163850 floats or 4*163850=655,400 bytes), doesn't fit in the SLM on this device.
293
-
You should notice that the different devices will have different amounts of memory set aside as SLM. In SYCL, you can query this number by passing `info::device::local_mem_size` to the `get_info` member of the `device` class.
304
+
The new message of interest is `Size of SLM (656384) larger than available (131072)`. This tells you that the size of the Shared Local Memory (SLM) on the device, 131072 bytes (128 KB), is smaller than the requested size of 656384 bytes.
305
+
306
+
If the `parallel_for` were operating over a multi-dimensional range (for example, if `acc` were two- or three-dimensional), you would need to multiply the dimensions together to determine the number of floating-point numbers you are trying to store in SLM. In our case, the calculation is easy: the first argument to the `nd_range` in the `parallel_for` is single-dimensional, so it's just 163850. Thus the problem is that the work-group local memory we tried to allocate (163850 floats, or 4*163850=655,400 bytes, which the driver reports as 656384 bytes, a figure consistent with rounding up to the next multiple of 1024 bytes) doesn't fit in the SLM on this device.
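The size calculation described above can be sketched in plain C++. This is an illustration only, not part of the sample: the helper names are hypothetical, and the 1024-byte granularity is an assumption chosen because it reproduces the 656384 bytes in the debug message (rounding 655400 up to a multiple of 64 would give 655424 instead). The driver's actual rounding rule is not documented here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: raw bytes needed for `element_count` elements.
constexpr std::size_t slm_bytes_requested(std::size_t element_count, std::size_t element_size) {
    return element_count * element_size;
}

// Hypothetical helper: round `value` up to the next multiple of `granularity`.
constexpr std::size_t round_up(std::size_t value, std::size_t granularity) {
    return (value + granularity - 1) / granularity * granularity;
}

// Hypothetical helper: does the request fit in the available SLM?
constexpr bool fits_in_slm(std::size_t requested, std::size_t available) {
    return requested <= available;
}

// 163850 floats at 4 bytes each, as in the text.
static_assert(slm_bytes_requested(163850, 4) == 655400, "raw request in bytes");
// Assumption: 1024-byte granularity, inferred from the reported number.
static_assert(round_up(655400, 1024) == 656384, "matches the debug message");
// 131072 bytes is the available SLM reported by the driver.
static_assert(!fits_in_slm(656384, 131072), "request exceeds available SLM");
```

Under these assumptions the request is roughly five times larger than the 131072 bytes available, which is why the kernel launch is rejected.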
307
+
308
+
You should know that different devices will have different amounts of memory set aside as SLM. In SYCL, you can query this number by passing `info::device::local_mem_size` to the `get_info` member of the `device` class.
terminate called after throwing an instance of 'sycl::_V1::runtime_error'
323
+
what(): Native API failed. Native API returns: -5 (PI_ERROR_OUT_OF_RESOURCES) -5 (PI_ERROR_OUT_OF_RESOURCES)
324
+
```
325
+
326
+
This is useful because it shows you the kernel that caused the error (`_ZTSZZ4mainENKUlRN4sycl3_V17handlerEE_clES2_EUlNS0_7nd_itemILi1EEEE_`, which `c++filt` resolves to `typeinfo name for main::{lambda(sycl::_V1::handler&)#1}::operator()(sycl::_V1::handler&) const::{lambda(sycl::_V1::nd_item<1>)#1}`), in addition to the amount of memory requested versus the available size of SLM.
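If `c++filt` is not at hand, the same demangling can be done from a small C++ helper. The sketch below is not part of the sample; it assumes a GCC- or Clang-based toolchain that provides `<cxxabi.h>` and `abi::__cxa_demangle`, and the wrapper name `demangle` is hypothetical. The exact spelling of lambda names in the result can differ between C++ runtime libraries.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Hypothetical wrapper around abi::__cxa_demangle (Itanium C++ ABI).
// Returns the demangled name, or the input unchanged if demangling fails.
std::string demangle(const std::string& mangled) {
    int status = 0;
    // With a null output buffer, __cxa_demangle allocates the result with malloc.
    char* buffer = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || buffer == nullptr) {
        return mangled;
    }
    std::string result(buffer);
    std::free(buffer);  // the caller owns the malloc'ed buffer
    return result;
}
```

Passing the kernel name from the trace to `demangle` should yield a readable name that mentions `nd_item<1>`, which helps confirm that the failing kernel is the lambda handed to `parallel_for` in `main`.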