Commit 32f6f6c

Pushing iso3dfd_omp_offload code sample (oneapi-src#105)
* Initial commit for iso3dfd_dpcpp code sample
  Signed-off-by: Gogar, Sunny L <[email protected]>
* Update License.txt
* Update sample.json
* Adding iso3dfd_omp_offload and changing dpc++ compile for windows to dpcpp
* Delete .nfs000000043228fc3f00000140
* Removing build directory accidently checked in
* Update sample.json
  Fixing a missing comma
* Adding couple of changes as per Paul's recommendation
* Updating some variable names as per guidelines
* Moving iso3dfd_omp_offload to C++ folder
1 parent a915158 commit 32f6f6c

10 files changed: +1006 -5 lines changed

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
# CMakeLists.txt for ISO3DFD_OMP_OFFLOAD project
cmake_minimum_required (VERSION 3.0)
set(CMAKE_CXX_COMPILER "icpx")
project (iso3dfd_omp_offload)
add_subdirectory (src)

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
Copyright Intel Corporation

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Lines changed: 140 additions & 0 deletions
@@ -0,0 +1,140 @@
# `ISO3DFD OpenMP Offload` Sample

The ISO3DFD sample refers to Three-Dimensional Finite-Difference Wave Propagation in Isotropic Media. It is a three-dimensional stencil that simulates a wave propagating in a 3D isotropic medium, and it shows some of the more common challenges and techniques for achieving good performance when targeting OpenMP Offload devices (GPUs) in more complex applications.

| Optimized for | Description
|:--- |:---
| OS | Linux* Ubuntu* 18.04
| Hardware | Skylake with GEN9 or newer
| Software | Intel® oneAPI DPC++/C++ Compiler
| What you will learn | How to offload the computation to GPU using Intel® oneAPI DPC++/C++ Compiler
| Time to complete | 15 minutes

Performance numbers (speedup relative to the default baseline version):

| iso3dfd_omp_offload sample | Performance data
|:--- |:---
| Default Baseline version | 1.0x
| Optimized version 1 | 1.11x
| Optimized version 2 | 1.48x
| Optimized version 3 | 1.60x

## Purpose

ISO3DFD is a finite difference stencil kernel for solving the 3D acoustic isotropic wave equation, which can be used as a proxy for propagating a seismic wave. Kernels in this sample are implemented as a scheme that is 16th order in space, with symmetric coefficients, and 2nd order in time, without boundary conditions. Using OpenMP Offload, the sample can explicitly run on the GPU to propagate a seismic wave, which is a compute-intensive task.

The code attempts to find an available GPU or other OpenMP Offload capable device and exits if no compatible device is detected. By default, the output prints the name of the device on which the OpenMP Offload code ran, along with the grid computation metrics: flops and effective throughput. To validate results, an OpenMP CPU-only version of the application is run on the host CPU and its results are compared with those of the OpenMP Offload version.

The code also demonstrates some of the common optimization techniques which can be used to improve performance of 3D-stencil code running on a GPU device.
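
To make the stencil description concrete, here is a minimal host-only sketch of one time step. It is not the sample's kernel: the function name `StepOnce`, the use of `std::vector`, and the assumption that `vel` already holds the velocity term premultiplied by the time step are illustrative choices. The neighbour sum follows the shape of the `STENCIL_LOOKUP` macro in this commit's header, and the half length of 8 corresponds to the 16th-order spatial scheme.

```cpp
#include <cstddef>
#include <vector>

// 16th order in space: 8 neighbours on each side along every axis.
constexpr std::size_t kHalfLength = 8;

// Hypothetical helper: advances the wavefield by one time step on the
// interior of an n1 x n2 x n3 grid (n1 is the fastest-varying dimension).
// Cells within kHalfLength of a face are left untouched (no boundary
// conditions), and `vel` is assumed to be premultiplied by dt^2.
void StepOnce(std::vector<float>& next, const std::vector<float>& prev,
              const std::vector<float>& vel, const float* coeff,
              std::size_t n1, std::size_t n2, std::size_t n3) {
  if (n1 <= 2 * kHalfLength || n2 <= 2 * kHalfLength || n3 <= 2 * kHalfLength) {
    return;  // grid too small to contain any interior cell
  }
  const std::size_t dimn1n2 = n1 * n2;
  for (std::size_t iz = kHalfLength; iz < n3 - kHalfLength; ++iz) {
    for (std::size_t iy = kHalfLength; iy < n2 - kHalfLength; ++iy) {
      for (std::size_t ix = kHalfLength; ix < n1 - kHalfLength; ++ix) {
        const std::size_t i = iz * dimn1n2 + iy * n1 + ix;
        float value = coeff[0] * prev[i];
        // Symmetric coefficients: the same weight applies to +ir and -ir.
        for (std::size_t ir = 1; ir <= kHalfLength; ++ir) {
          value += coeff[ir] * (prev[i + ir] + prev[i - ir] +
                                prev[i + ir * n1] + prev[i - ir * n1] +
                                prev[i + ir * dimn1n2] + prev[i - ir * dimn1n2]);
        }
        // 2nd order in time: next currently holds the field at t - dt.
        next[i] = 2.0f * prev[i] - next[i] + value * vel[i];
      }
    }
  }
}
```

Calling `StepOnce` repeatedly while swapping the roles of `next` and `prev` gives the time loop; the offload variants in this sample parallelize the three spatial loops.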

## Key Implementation Details

The basic OpenMP Offload implementation explained in the code includes the use of the following:
* OpenMP offload `target data map` construct
* Default Baseline version demonstrates use of the OpenMP offload `target parallel for` construct with `collapse`
* Optimized version 1 demonstrates use of the OpenMP offload `teams distribute` construct and of the `num_teams` and `thread_limit` clauses
* Incremental Optimized version 2 demonstrates use of the OpenMP offload `teams distribute` construct with an improved data-access pattern
* Incremental Optimized version 3 demonstrates use of OpenMP CPU threads along with the OpenMP offload `target` construct
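
The constructs listed above can be illustrated on a toy kernel. The sketch below is an assumption-laden illustration, not code from the sample: the function names, the "add one to every cell" kernel, and the choice of one team per row are all hypothetical. Only the pragmas themselves are standard OpenMP, and the thread limit of 256 mirrors `kMaxTeamSizeLimit` from this commit's header. If the compiler is not invoked with OpenMP offload enabled, the pragmas are ignored and the loops simply run on the host.

```cpp
// Assumed thread limit, mirroring kMaxTeamSizeLimit in the sample header.
constexpr int kThreadLimit = 256;

// Baseline style: a single combined construct, with both loops collapsed
// into one iteration space.
void AddOneBaseline(float* grid, int rows, int cols) {
  const int count = rows * cols;
#pragma omp target parallel for collapse(2) map(tofrom : grid[0:count])
  for (int r = 0; r < rows; ++r) {
    for (int c = 0; c < cols; ++c) {
      grid[r * cols + c] += 1.0f;
    }
  }
}

// Teams style: rows are distributed across teams, and the threads of each
// team share the columns of their row.
void AddOneTeams(float* grid, int rows, int cols) {
  const int count = rows * cols;
#pragma omp target teams distribute num_teams(rows) \
    thread_limit(kThreadLimit) map(tofrom : grid[0:count])
  for (int r = 0; r < rows; ++r) {
#pragma omp parallel for
    for (int c = 0; c < cols; ++c) {
      grid[r * cols + c] += 1.0f;
    }
  }
}

// target data keeps the grid resident on the device across all steps, so
// the map clauses on the inner constructs do not copy it again.
void RunSteps(float* grid, int rows, int cols, int steps) {
  const int count = rows * cols;
#pragma omp target data map(tofrom : grid[0:count])
  {
    for (int s = 0; s < steps; ++s) {
      AddOneTeams(grid, rows, cols);
    }
  }
}
```

The enclosing `target data` region is what makes a multi-step time loop practical: without it, every kernel launch would transfer the full grid to and from the device.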

## License

This code sample is licensed under the MIT license.


## Building the `ISO3DFD` Program for GPU

### Running Samples In DevCloud
If running a sample in the Intel DevCloud, remember that you must specify the compute node (CPU, GPU) as well as whether to run in batch or interactive mode. For more information, see the Intel® oneAPI Base Toolkit Get Started Guide (https://devcloud.intel.com/oneapi/get-started/base-toolkit/) and the Intel® oneAPI HPC Toolkit Get Started Guide (https://devcloud.intel.com/oneapi/get-started/hpc-toolkit/).

### On a Linux* System
Perform the following steps:
1. Build the program using the following `cmake` commands.
```
$ mkdir build
$ cd build
$ cmake ..
$ make -j
```

> Note: By default, the executable is built with the default baseline version. You can build the kernel with one of the optimized versions using the following commands:
```
cmake -DUSE_OPT1=1 ..
make -j
```
```
cmake -DUSE_OPT2=1 ..
make -j
```
```
cmake -DUSE_OPT3=1 ..
make -j
```

2. Run the program:
```
make run
```

3. Clean the program using:
```
make clean
```
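
The `USE_OPT1`, `USE_OPT2` and `USE_OPT3` options from step 1 are turned into `-DUSE_OPT1`, `-DUSE_OPT2` or `-DUSE_OPT3` compiler definitions by `src/CMakeLists.txt` in this commit, with `-DUSE_BASELINE` as the fallback. A minimal sketch of how a source file could branch on those definitions follows; the function name `SelectedVariant` is hypothetical and the sample's own dispatch code may look different. The returned strings are the messages printed by the CMake file.

```cpp
#include <string>

// Hypothetical helper: reports which kernel variant was selected at
// compile time. The precedence (OPT3, then OPT2, then OPT1) matches the
// if/elseif chain in src/CMakeLists.txt.
std::string SelectedVariant() {
#if defined(USE_OPT3)
  return "Optimized target code - version 3";
#elif defined(USE_OPT2)
  return "Optimized target code - version 2";
#elif defined(USE_OPT1)
  return "Optimized target code - version 1";
#else
  return "Baseline target code";
#endif
}
```

Because the choice is made by the preprocessor, switching variants requires re-running `cmake` with a different option and rebuilding.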

## Running the Sample
```
make run
```

### Application Parameters
You can modify the ISO3DFD parameters from the command line.
* Configurable Application Parameters

        Usage: src/iso3dfd n1 n2 n3 n1_block n2_block n3_block Iterations

        n1 n2 n3                   : Grid sizes for the stencil
        n1_block n2_block n3_block : cache block sizes for CPU
                                   : OR TILE sizes for OMP Offload
        Iterations                 : No. of timesteps.
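
As an illustration of the usage line above, the seven positional parameters could be read as sketched below. This is not the sample's parser: the `Params` struct and the `ParsePositive` and `ParseArgs` helpers are hypothetical names, and rejecting zero or non-numeric values is an assumed policy.

```cpp
#include <cctype>
#include <cstdlib>

// Hypothetical container for the seven positional parameters.
struct Params {
  unsigned int n1, n2, n3;                    // grid sizes
  unsigned int n1_block, n2_block, n3_block;  // cache block or tile sizes
  unsigned int iterations;                    // number of time steps
};

// Parses a strictly positive decimal integer; returns false otherwise.
bool ParsePositive(const char* text, unsigned int& out) {
  if (text == nullptr || !std::isdigit(static_cast<unsigned char>(text[0]))) {
    return false;
  }
  char* end = nullptr;
  const unsigned long value = std::strtoul(text, &end, 10);
  if (*end != '\0' || value == 0 || value > 0xFFFFFFFFul) {
    return false;
  }
  out = static_cast<unsigned int>(value);
  return true;
}

// Expects argv[0] to be the program name followed by exactly seven values.
bool ParseArgs(int argc, const char* const* argv, Params& out) {
  if (argc != 8) {
    return false;
  }
  Params p{};
  unsigned int* fields[] = {&p.n1,       &p.n2,       &p.n3,        &p.n1_block,
                            &p.n2_block, &p.n3_block, &p.iterations};
  for (int i = 0; i < 7; ++i) {
    if (!ParsePositive(argv[i + 1], *fields[i])) {
      return false;
    }
  }
  out = p;
  return true;
}
```

For reference, the `run` target in this commit invokes the executable with `256 256 256 16 8 64 100`, that is, a 256 x 256 x 256 grid, block sizes 16, 8 and 64, and 100 time steps.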

### Example of Output with default baseline version
```
Grid Sizes: 256 256 256
Tile sizes ignored for OMP Offload
--Using Baseline version with omp target with collapse
Memory Usage (MBytes): 230
--------------------------------------
time : 4.827 secs
throughput : 347.57 Mpts/s
flops : 21.2018 GFlops
bytes : 4.17084 GBytes/s

--------------------------------------

--------------------------------------
Checking Results ...
Final wavefields from OMP Offload device and CPU are equivalent: Success
--------------------------------------
```

### Example of Output with Optimized version 3
```
Grid Sizes: 256 256 256
Tile sizes: 16 8 64
Using Optimized target code - version 3:
--OMP Threads + OMP_Offload with Tiling and Z Window
Memory Usage (MBytes): 230
--------------------------------------
time : 3.014 secs
throughput : 556.643 Mpts/s
flops : 33.9552 GFlops
bytes : 6.67971 GBytes/s

--------------------------------------

--------------------------------------
Checking Results ...
Final wavefields from OMP Offload device and CPU are equivalent: Success

```
Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
//==============================================================
// Copyright © 2020 Intel Corporation
//
// SPDX-License-Identifier: MIT
// =============================================================

#include <omp.h>
#include <chrono>
#include <cmath>
#include <cstring>
#include <ctime>
#include <fstream>
#include <iostream>
#include <string>  // std::string, used by Usage()

constexpr float dt = 0.002f;
constexpr float dxyz = 50.0f;
constexpr unsigned int kHalfLength = 8;
constexpr unsigned int kMaxTeamSizeLimit = 256;

// Weighted sum of the six neighbours at distance ir along the three axes.
// Expects coeff, ptr_prev, ix, n1 and dimn1n2 in the calling scope.
#define STENCIL_LOOKUP(ir)                                          \
  (coeff[ir] * ((ptr_prev[ix + ir] + ptr_prev[ix - ir]) +           \
                (ptr_prev[ix + ir * n1] + ptr_prev[ix - ir * n1]) + \
                (ptr_prev[ix + ir * dimn1n2] + ptr_prev[ix - ir * dimn1n2])))

// Variant that takes the third-axis neighbours from the front and back
// arrays. Expects coeff, front, back, ptr_prev_base, gid and n1 in scope.
#define STENCIL_LOOKUP_Z(ir)                                        \
  (coeff[ir] * (front[ir] + back[ir - 1] + ptr_prev_base[gid + ir] + \
                ptr_prev_base[gid - ir] + ptr_prev_base[gid + ir * n1] + \
                ptr_prev_base[gid - ir * n1]))

void Usage(const std::string& programName);

void PrintStats(double time, unsigned int n1, unsigned int n2, unsigned int n3,
                unsigned int num_iterations);

bool WithinEpsilon(float* output, float* reference, unsigned int dim_x,
                   unsigned int dim_y, unsigned int dim_z, unsigned int radius,
                   const int zadjust, const float delta);

void Initialize(float* ptr_prev, float* ptr_next, float* ptr_vel,
                unsigned int n1, unsigned int n2, unsigned int n3);

bool VerifyResults(float* next_base, float* prev_base, float* vel_base,
                   float* coeff, unsigned int n1, unsigned int n2,
                   unsigned int n3, unsigned int num_iterations,
                   unsigned int n1_block, unsigned int n2_block,
                   unsigned int n3_block);

bool ValidateInput(unsigned int n1, unsigned int n2, unsigned int n3,
                   unsigned int n1_block, unsigned int n2_block,
                   unsigned int n3_block, unsigned int num_iterations);
Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
{
  "guid": "E3407632-7F3D-4B5B-A956-5155408D7468",
  "name": "iso3dfd_omp_offload",
  "categories": [ "Toolkit/Intel® oneAPI HPC Toolkit" ],
  "description": "A finite difference stencil kernel for solving 3D acoustic isotropic wave equation",
  "toolchain": [ "icpx" ],
  "targetDevice": [ "GPU" ],
  "languages": [ { "cpp": {} } ],
  "os": [ "linux" ],
  "builder": [ "cmake" ],
  "ciTests": {
    "linux": [{
      "steps": [
        "mkdir build",
        "cd build",
        "cmake ..",
        "make",
        "make run"
      ]
    }]
  }
}
Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
OPTION(VERIFY_RESULTS "Use Results Validation" ON)
OPTION(USE_OPT1 "Select Optimized target code - version 1" OFF)
OPTION(USE_OPT2 "Select Optimized target code - version 2" OFF)
OPTION(USE_OPT3 "Select Optimized target code - version 3" OFF)

set(CMAKE_BUILD_TYPE "RelWithDebInfo")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fiopenmp -std=c++17 -fopenmp-targets=spir64 -O3 -D__STRICT_ANSI__ ")

set(SOURCES iso3dfd.cpp utils.cpp)

if(USE_OPT3)
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DUSE_OPT3")
    message("-- Using Optimized target code - version 3")
elseif(USE_OPT2)
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DUSE_OPT2")
    message("-- Using Optimized target code - version 2")
elseif(USE_OPT1)
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DUSE_OPT1")
    message("-- Using Optimized target code - version 1")
else()
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DUSE_BASELINE")
    message("-- Using Baseline target code")
endif()

if(VERIFY_RESULTS)
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DVERIFY_RESULTS")
    set(SOURCES ${SOURCES} iso3dfd_verify.cpp)
endif(VERIFY_RESULTS)


add_executable (iso3dfd ${SOURCES})

add_custom_target (run
    COMMAND iso3dfd 256 256 256 16 8 64 100
    WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
)
