DirectProgramming/DPC++FPGA/ReferenceDesigns/qrd/README.md (6 additions, 7 deletions)
@@ -174,11 +174,10 @@ After learning how to use the extensions for Intel oneAPI Toolkits, return to th
 You can compile and run this Reference Design in the Eclipse* IDE (in Linux*) and the Visual Studio* IDE (in Windows*). For instructions, refer to the following link: [Intel® oneAPI DPC++ FPGA Workflows on Third-Party IDEs](https://software.intel.com/en-us/articles/intel-oneapi-dpcpp-fpga-workflow-on-ide)
 
 ## Running the Reference Design
-You can apply QR decomposition to a number of matrices, as shown below. This step performs the following:
-* Generates the number of random matrices specified as the command line argument (defaults to 128).
-* Computes QR decomposition on all matrices.
-* Evaluates performance.
-NOTE: The design is optimized to perform best when run on a large number of matrices, where the total number of matrices is a power of 2.
+You can perform the QR decomposition of 8 matrices repeatedly, as shown below. This step performs the following:
+* Generates 8 random matrices.
+* Computes the QR decomposition of the 8 matrices.
+* Repeats the decomposition multiple times (specified as a command line argument) to evaluate performance.
 
 
 1. Run the sample on the FPGA emulator (the kernel executes on the CPU).
@@ -191,7 +190,7 @@ NOTE: The design is optimized to perform best when run on a large number of matr
 qrd.fpga_emu.exe (Windows)
 ```
 
-2. Run the sample on the FPGA device. It is recommended to pass in an optional argument (as shown) when invoking the sample on hardware. Otherwise, the performance will not be representative of the design's throughput. Indeed, the throughput is measured as the total kernel execution time divided by the number of matrices decomposed. However, the transfer of the matrices from the host/device to the device/host also takes some time. This memory transfer is performed by chunks of matrices in parallel to the compute kernel. The first/last chunk of matrices transferred will therefore occur with the computation kernel doing nothing. Thus, the higher the number of matrices to be decomposed, the more accurate the throughput result will be.
+2. Run the sample on the FPGA device.
 ```
 ./qrd.fpga (Linux)
 ```
@@ -223,7 +222,7 @@ Verifying results on matrix 0
 PASSED
 ```
 
-Example output when running on Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX) for the decomposition of 8 matrices 409600 times (each matrix consisting of 256*256 complex numbers):
+Example output when running on Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX) for the decomposition of 8 matrices 819200 times (each matrix consisting of 256*256 complex numbers):
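
The three steps added to the qrd README in this diff (generate 8 random matrices, decompose them, repeat the decomposition to evaluate performance) can be sketched on the host as follows. This is a plain-Python illustration, not the reference design's DPC++ code: the modified Gram-Schmidt routine, the 4x4 real matrix size, and the names `random_matrix`, `qr_decompose`, and `run_benchmark` are assumptions chosen for illustration.

```python
import random
import time

MATRIX_COUNT = 8  # the updated README fixes the batch at 8 matrices
SIZE = 4          # hypothetical small size so the sketch runs quickly


def random_matrix(n, rng):
    """Return an n x n matrix of uniform random reals as a list of rows."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]


def qr_decompose(a):
    """Modified Gram-Schmidt QR of a square matrix; returns (q, r) with a = q r."""
    n = len(a)
    columns = [[a[i][j] for i in range(n)] for j in range(n)]
    q_columns = []
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = columns[j][:]
        # Remove the components along the orthonormal columns found so far.
        for k in range(j):
            r[k][j] = sum(q_columns[k][i] * v[i] for i in range(n))
            v = [v[i] - r[k][j] * q_columns[k][i] for i in range(n)]
        r[j][j] = sum(x * x for x in v) ** 0.5
        q_columns.append([x / r[j][j] for x in v])
    q = [[q_columns[j][i] for j in range(n)] for i in range(n)]
    return q, r


def run_benchmark(repetitions, seed=0):
    """Decompose the same 8 matrices `repetitions` times and time the loop."""
    if repetitions < 1:
        raise ValueError("repetitions must be at least 1")
    rng = random.Random(seed)
    matrices = [random_matrix(SIZE, rng) for _ in range(MATRIX_COUNT)]
    start = time.perf_counter()
    for _ in range(repetitions):
        results = [qr_decompose(m) for m in matrices]
    elapsed = time.perf_counter() - start
    throughput = MATRIX_COUNT * repetitions / elapsed  # matrices per second
    return matrices, results, throughput


if __name__ == "__main__":
    _, _, throughput = run_benchmark(repetitions=1000)
    print(f"Throughput: {throughput:.0f} matrices/s")
```

Reusing one fixed batch keeps input generation out of the timed region, so only the repeated decomposition contributes to the reported figure.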
DirectProgramming/DPC++FPGA/ReferenceDesigns/qri/README.md (9 additions, 11 deletions)
@@ -147,12 +147,10 @@ When compiling for FPGA hardware, it is recommended to increase the job timeout
 You can compile and run this Reference Design in the Eclipse* IDE (in Linux*) and the Visual Studio* IDE (in Windows*). For instructions, refer to the following link: [Intel® oneAPI DPC++ FPGA Workflows on Third-Party IDEs](https://software.intel.com/en-us/articles/intel-oneapi-dpcpp-fpga-workflow-on-ide)
 
 ## Running the Reference Design
-You can apply QR matrix inversion to a number of matrices, as shown below. This step performs the following:
-* Generates the number of random matrices specified as the command line argument (defaults to 128).
-* Computes QR matrix inversion on all matrices.
-* Evaluates performance.
-NOTE: The design is optimized to perform best when run on a large number of matrices, where the total number of matrices is a power of 2.
-
+You can perform the QR-based inversion of 8 matrices repeatedly, as shown below. This step performs the following:
+* Generates 8 random matrices.
+* Computes the QR-based inversion of the 8 matrices.
+* Repeats the decomposition multiple times (specified as a command line argument) to evaluate performance.
 
 1. Run the sample on the FPGA emulator (the kernel executes on the CPU).
 Increase the amount of memory that the emulator runtime is permitted to allocate by setting the CL_CONFIG_CPU_FORCE_PRIVATE_MEM_SIZE environment variable before running the executable.
@@ -164,9 +162,9 @@ NOTE: The design is optimized to perform best when run on a large number of matr
 qri.fpga_emu.exe (Windows)
 ```
 
-2. Run the sample on the FPGA device. It is recommended to pass in an optional argument (as shown) when invoking the sample on hardware. Otherwise, the performance will not be representative of the design's throughput. Indeed, the throughput is measured as the total kernel execution time divided by the number of matrices inverted. However, the transfer of the matrices from the host/device to the device/host also takes some time. This memory transfer is performed by chunks of matrices in parallel to the compute kernel. The first/last chunk of matrices transferred will therefore occur with the computation kernel doing nothing. Then, the higher the number of matrices to be inverted, the more accurate the throughput result will be.
+2. Run the sample on the FPGA device.
 ```
-./qri.fpga 40960 (Linux)
+./qri.fpga (Linux)
 ```
 ### Application Parameters
 
@@ -176,7 +174,7 @@ NOTE: The design is optimized to perform best when run on a large number of matr
 
 ### Example of Output
 
-Example output when running the emulator on 2048 matrices (each consisting of 32*32 real numbers):
+Example output when running the emulator on 8 matrices (each consisting of 32*32 real numbers):
 
 ```
 Device name: Intel(R) FPGA Emulation Device
@@ -196,7 +194,7 @@ Verifying results on matrix 0
 PASSED
 ```
 
-Example output when running on Intel® PAC with Intel Arria® 10 GX FPGA for 32768 matrices (each consisting of 32*32 real numbers):
+Example output when running on Intel® PAC with Intel Arria® 10 GX FPGA for 8 matrices (each consisting of 32*32 real numbers):
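
The step 2 text removed from both READMEs argues that host/device transfers at the start and end of a run leave the compute kernel idle, so the measured throughput only becomes representative when enough work is queued. A minimal model of that argument is sketched below. It assumes a fixed idle overhead per run and a constant compute time per matrix; the function names and the timing values are hypothetical and are not measurements from either design. Throughput is expressed here as matrices per second.

```python
def measured_throughput(repetitions, compute_s, overhead_s, matrices_per_batch=8):
    """Matrices per second when a fixed idle overhead is included in the timing.

    compute_s:  assumed kernel time per matrix, in seconds (hypothetical value)
    overhead_s: assumed time the kernel sits idle during the first and last
                transfers, in seconds (hypothetical value)
    """
    total_matrices = matrices_per_batch * repetitions
    total_time = total_matrices * compute_s + overhead_s
    return total_matrices / total_time


def ideal_throughput(compute_s):
    """Matrices per second if the kernel were never idle."""
    return 1.0 / compute_s


if __name__ == "__main__":
    compute_s = 1e-5   # illustrative only
    overhead_s = 1e-3  # illustrative only
    for repetitions in (1, 100, 10_000, 1_000_000):
        value = measured_throughput(repetitions, compute_s, overhead_s)
        share = value / ideal_throughput(compute_s)
        print(f"{repetitions:>9} repetitions: {share:.2%} of ideal throughput")
```

Under these assumptions the measured figure rises toward the ideal value as the repetition count grows, which is consistent with the larger repetition count (819200 instead of 409600) used for the updated qrd hardware example.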