Commit 64df0b5

FPGA: Correct the qrd and qri readmes (oneapi-src#866)
* correcting the qrd and qri readmes
  Signed-off-by: Yohann Uguen <[email protected]>
* rephrasing the qrd and qri readmes
  Signed-off-by: Yohann Uguen <[email protected]>
1 parent 72532ab commit 64df0b5
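
The READMEs touched by this commit now describe the benchmark as a fixed batch of 8 matrices processed repeatedly, with the repetition count taken from the command line. As a rough sketch of the arithmetic this implies, assuming throughput is reported as matrices per second over the measured run time (the helper name and the elapsed time below are hypothetical and not taken from the qrd/qri sources):

```python
# Hypothetical helper, not part of the qrd/qri reference designs.
# Assumption: throughput = (batch size * repetitions) / elapsed time.

MATRICES_PER_BATCH = 8  # the updated READMEs fix the batch at 8 matrices


def matrices_per_second(repetitions: int, elapsed_seconds: float,
                        batch_size: int = MATRICES_PER_BATCH) -> float:
    """Return the number of matrices processed per second."""
    if repetitions <= 0:
        raise ValueError("repetitions must be positive")
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return batch_size * repetitions / elapsed_seconds


if __name__ == "__main__":
    # 819200 repetitions matches the qrd hardware example in the README;
    # the 10.0 s elapsed time is made up purely for illustration.
    print(matrices_per_second(repetitions=819200, elapsed_seconds=10.0))
```

Keeping the batch fixed and scaling only the repetition count means a longer run changes the measurement time, not the amount of matrix data that has to be generated and transferred.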

2 files changed (+15 -18 lines changed)

DirectProgramming/DPC++FPGA/ReferenceDesigns/qrd/README.md

Lines changed: 6 additions & 7 deletions
@@ -174,11 +174,10 @@ After learning how to use the extensions for Intel oneAPI Toolkits, return to th
 You can compile and run this Reference Design in the Eclipse* IDE (in Linux*) and the Visual Studio* IDE (in Windows*). For instructions, refer to the following link: [Intel® oneAPI DPC++ FPGA Workflows on Third-Party IDEs](https://software.intel.com/en-us/articles/intel-oneapi-dpcpp-fpga-workflow-on-ide)
 
 ## Running the Reference Design
-You can apply QR decomposition to a number of matrices, as shown below. This step performs the following:
-* Generates the number of random matrices specified as the command line argument (defaults to 128).
-* Computes QR decomposition on all matrices.
-* Evaluates performance.
-NOTE: The design is optimized to perform best when run on a large number of matrices, where the total number of matrices is a power of 2.
+You can perform the QR decomposition of 8 matrices repeatedly, as shown below. This step performs the following:
+* Generates 8 random matrices.
+* Computes the QR decomposition of the 8 matrices.
+* Repeats the decomposition multiple times (specified as a command line argument) to evaluate performance.
 
 
 1. Run the sample on the FPGA emulator (the kernel executes on the CPU).
@@ -191,7 +190,7 @@ NOTE: The design is optimized to perform best when run on a large number of matr
 qrd.fpga_emu.exe (Windows)
 ```
 
-2. Run the sample on the FPGA device. It is recommended to pass in an optional argument (as shown) when invoking the sample on hardware. Otherwise, the performance will not be representative of the design's throughput. Indeed, the throughput is measured as the total kernel execution time divided by the number of matrices decomposed. However, the transfer of the matrices from the host/device to the device/host also takes some time. This memory transfer is performed by chunks of matrices in parallel to the compute kernel. The first/last chunk of matrices transferred will therefore occur with the computation kernel doing nothing. Thus, the higher the number of matrices to be decomposed, the more accurate the throughput result will be.
+2. Run the sample on the FPGA device.
 ```
 ./qrd.fpga (Linux)
 ```
@@ -223,7 +222,7 @@ Verifying results on matrix 0
 PASSED
 ```
 
-Example output when running on Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX) for the decomposition of 8 matrices 409600 times (each matrix consisting of 256*256 complex numbers):
+Example output when running on Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX) for the decomposition of 8 matrices 819200 times (each matrix consisting of 256*256 complex numbers):
 
 ```
 Device name: pac_s10 : Intel PAC Platform (pac_f100000)

DirectProgramming/DPC++FPGA/ReferenceDesigns/qri/README.md

Lines changed: 9 additions & 11 deletions
@@ -147,12 +147,10 @@ When compiling for FPGA hardware, it is recommended to increase the job timeout
 You can compile and run this Reference Design in the Eclipse* IDE (in Linux*) and the Visual Studio* IDE (in Windows*). For instructions, refer to the following link: [Intel® oneAPI DPC++ FPGA Workflows on Third-Party IDEs](https://software.intel.com/en-us/articles/intel-oneapi-dpcpp-fpga-workflow-on-ide)
 
 ## Running the Reference Design
-You can apply QR matrix inversion to a number of matrices, as shown below. This step performs the following:
-* Generates the number of random matrices specified as the command line argument (defaults to 128).
-* Computes QR matrix inversion on all matrices.
-* Evaluates performance.
-NOTE: The design is optimized to perform best when run on a large number of matrices, where the total number of matrices is a power of 2.
-
+You can perform the QR-based inversion of 8 matrices repeatedly, as shown below. This step performs the following:
+* Generates 8 random matrices.
+* Computes the QR-based inversion of the 8 matrices.
+* Repeats the decomposition multiple times (specified as a command line argument) to evaluate performance.
 
 1. Run the sample on the FPGA emulator (the kernel executes on the CPU).
 Increase the amount of memory that the emulator runtime is permitted to allocate by setting the CL_CONFIG_CPU_FORCE_PRIVATE_MEM_SIZE environment variable before running the executable.
@@ -164,9 +162,9 @@ NOTE: The design is optimized to perform best when run on a large number of matr
 qri.fpga_emu.exe (Windows)
 ```
 
-2. Run the sample on the FPGA device. It is recommended to pass in an optional argument (as shown) when invoking the sample on hardware. Otherwise, the performance will not be representative of the design's throughput. Indeed, the throughput is measured as the total kernel execution time divided by the number of matrices inverted. However, the transfer of the matrices from the host/device to the device/host also takes some time. This memory transfer is performed by chunks of matrices in parallel to the compute kernel. The first/last chunk of matrices transferred will therefore occur with the computation kernel doing nothing. Then, the higher the number of matrices to be inverted, the more accurate the throughput result will be.
+2. Run the sample on the FPGA device.
 ```
-./qri.fpga 40960 (Linux)
+./qri.fpga (Linux)
 ```
 ### Application Parameters
 
@@ -176,7 +174,7 @@ NOTE: The design is optimized to perform best when run on a large number of matr
 
 ### Example of Output
 
-Example output when running the emulator on 2048 matrices (each consisting of 32*32 real numbers):
+Example output when running the emulator on 8 matrices (each consisting of 32*32 real numbers):
 
 ```
 Device name: Intel(R) FPGA Emulation Device
@@ -196,7 +194,7 @@ Verifying results on matrix 0
 PASSED
 ```
 
-Example output when running on Intel® PAC with Intel Arria® 10 GX FPGA for 32768 matrices (each consisting of 32*32 real numbers):
+Example output when running on Intel® PAC with Intel Arria® 10 GX FPGA for 8 matrices (each consisting of 32*32 real numbers):
 
 ```
 Device name: pac_a10 : Intel PAC Platform (pac_f100000)
@@ -246,4 +244,4 @@ The performance was measured by Intel on Jan 31, 2022.
 
 Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries.
 
-(C) Intel Corporation.
+(C) Intel Corporation.
