|
| 1 | +This SNAP code is ported to AMD's HCC programming environment to support |
| 2 | +execution on accelerator devices, specifically GPUs. |
| 3 | + |
| 4 | +This HCC port started with the OpenCL C code contributed by Tom Deakin of |
| 5 | +the University of Bristol. That code is found at: |
| 6 | +https://github.com/UoB-HPC/SNAP_MPI_OpenCL |
| 7 | + |
| 8 | +Minor modifications were made to the code before the HCC porting effort. |
| 9 | +These changes include adding an inner loop counter and print statement to more |
| 10 | +closely match the original SNAP Fortran code, resolving an issue with |
| 11 | +correctness of the first iteration, and arranging some parameter defaults to |
| 12 | +more closely match the original Fortran. All of these changes were primarily |
| 13 | +for validation purposes so that the OpenCL output would match the Fortran |
| 14 | +output for a nearly matching input file. After these changes, it is possible |
| 15 | +to directly compare the OpenCL output to the Fortran output. |
| 16 | + |
| 17 | +Next, the code was ported to the AMD HCC environment. |
| 18 | + |
| 19 | +The port parallelizes and offloads key parts of the SNAP program, |
| 20 | +including the dim3 sweep inner loop, scalar flux and moment calculations, |
| 21 | +and inner and outer source computations. |
| 22 | + |
| 23 | +The compiler used is an experimental Clang compiler. This compiler accepts |
| 24 | +HCC (C++ AMP) and emits code suitable for a GPU target device, in particular AMD |
| 25 | +APUs and recent AMD discrete GPUs. The compiler revision in use at the time of |
| 26 | +writing is from package hcc_hsail 0.10.16253-6ceea64-ec648b0. |
| 27 | + |
| 28 | +The primary GPU work arrangement is to run the xyz space coordinates at the |
| 29 | +team level, and groups*angles at the thread level. |
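The arrangement above can be pictured as two index mappings. The following is a minimal sketch only: the names team_to_cell and thread_to_lane are hypothetical, and the orderings (x varying fastest across teams, angle varying fastest across threads) are assumptions for illustration, not taken from the port's kernels. How the flat groups*angles index is tiled into workgroups is also not covered here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical index types, used only for this illustration.
struct Cell { std::size_t x, y, z; };
struct Lane { std::size_t group, angle; };

// Map a flat team index to a spatial cell, assuming x varies fastest,
// then y, then z (an assumed ordering).
inline Cell team_to_cell(std::size_t team, std::size_t nx, std::size_t ny) {
    assert(nx > 0 && ny > 0);
    return Cell{ team % nx, (team / nx) % ny, team / (nx * ny) };
}

// Map a flat thread index over groups*angles to a (group, angle) pair,
// assuming angle varies fastest (an assumed ordering).
inline Lane thread_to_lane(std::size_t thread, std::size_t nang) {
    assert(nang > 0);
    return Lane{ thread / nang, thread % nang };
}
```

With the sample input shown later (nx=ny=nz=8, nang=64, ng=30), this sketch would cover 512 cells and 1920 (group, angle) pairs per cell.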
| 30 | + |
| 31 | +Reductions are fully working on a workgroup basis. The reductions are all |
| 32 | +computed across the number of angles. Since the code implements these |
| 33 | +reductions only across a workgroup, the NANG parameter is limited to the size |
| 34 | +of a workgroup, which is 1024. |
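To illustrate why a workgroup-only reduction caps NANG, here is a host-side sketch of a pairwise (tree) sum over one value per angle. It is not the port's kernel: the function name reduce_over_angles and the constant kMaxWorkgroupSize are hypothetical, the inner loop stands in for the lanes of one workgroup, and on a GPU each halving step would be separated by a workgroup barrier.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Workgroup size limit quoted in the text above.
constexpr std::size_t kMaxWorkgroupSize = 1024;

// Sum one value per angle with a pairwise (tree) pattern.
// The whole reduction must fit in a single workgroup, hence the NANG check.
inline double reduce_over_angles(std::vector<double> vals) {
    const std::size_t nang = vals.size();
    assert(nang > 0 && nang <= kMaxWorkgroupSize);

    // Pad with zeros up to the next power of two so each step halves cleanly.
    std::size_t width = 1;
    while (width < nang) width *= 2;
    vals.resize(width, 0.0);

    for (std::size_t stride = width / 2; stride > 0; stride /= 2) {
        // Each "lane" adds its partner's partial sum.
        for (std::size_t lane = 0; lane < stride; ++lane) {
            vals[lane] += vals[lane + stride];
        }
        // On a GPU, a workgroup barrier would go here.
    }
    return vals[0];
}
```

Because lanes can only exchange partial sums within their own workgroup in this pattern, supporting more than 1024 angles would need a second reduction stage across workgroups.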
| 35 | + |
| 36 | + |
| 37 | +On our AMD system, typing "make" in the source directory builds the |
| 38 | +target executable snap. |
| 39 | +To run, we use the command: |
| 40 | +snap snap_input |
| 41 | + |
| 42 | +Sample output: |
| 43 | + SNAP: SN (Discrete Ordinates) Application Proxy |
| 44 | + MPI+HCC port |
| 45 | + Run on Wed Aug 17 09:36:10 2016 |
| 46 | + |
| 47 | + |
| 48 | +******************************************************** |
| 49 | + Input Parameters |
| 50 | +******************************************************** |
| 51 | + Geometry |
| 52 | + Problem size: 0.100 x 0.100 x 0.100 |
| 53 | + Cells: 8 x 8 x 8 |
| 54 | + Cell size: 0.013 x 0.013 x 0.013 |
| 55 | + |
| 56 | + Discrete Ordinates |
| 57 | + Angles per octant: 64 |
| 58 | + Moments: 2 |
| 59 | + "Computational" moments: 4 |
| 60 | + |
| 61 | + Energy groups |
| 62 | + Number of groups: 30 |
| 63 | + |
| 64 | + Timesteps |
| 65 | + Timesteps: 10 |
| 66 | + Simulation time: 0.100 |
| 67 | + Time delta: 0.010 |
| 68 | + |
| 69 | + Iterations |
| 70 | + Max outers per timestep: 10 |
| 71 | + Max inners per outer: 5 |
| 72 | + Stopping criteria |
| 73 | + Inner convergence: 1.00E-04 |
| 74 | + Outer convergence: 1.00E-02 |
| 75 | + |
| 76 | + MPI decomposition |
| 77 | + Rank layout: 1 x 1 x 1 |
| 78 | + Chunk size: 8 |
| 79 | + |
| 80 | +device : AMD HSA Agent Kaveri0 |
| 81 | +tile static memory: 65536 |
| 82 | +required memory: 131MB |
| 83 | +******************************************************** |
| 84 | + Iteration Monitor |
| 85 | +******************************************************** |
| 86 | + Timestep 0 |
| 87 | + Outer Difference Inners |
| 88 | + 0 5.0678e-02 3 |
| 89 | + 1 4.1302e-02 3 |
| 90 | + 2 7.4568e-04 2 |
| 91 | + |
| 92 | + Timestep= 0 No. Outers= 3 No. Inners= 196 |
| 93 | + |
| 94 | + Population: 0.00 |
| 95 | + |
| 96 | + Timestep 1 |
| 97 | + Outer Difference Inners |
| 98 | + 0 7.2102e-02 3 |
| 99 | + 1 2.9102e-02 2 |
| 100 | + 2 4.3693e-04 2 |
| 101 | + |
| 102 | + Timestep= 1 No. Outers= 3 No. Inners= 195 |
| 103 | + |
| 104 | + Population: 0.00 |
| 105 | + |
| 106 | + Timestep 2 |
| 107 | + Outer Difference Inners |
| 108 | + 0 6.8614e-02 3 |
| 109 | + 1 1.9153e-02 2 |
| 110 | + 2 2.3149e-04 2 |
| 111 | + |
| 112 | + Timestep= 2 No. Outers= 3 No. Inners= 195 |
| 113 | + |
| 114 | + Population: 0.00 |
| 115 | + |
| 116 | + Timestep 3 |
| 117 | + Outer Difference Inners |
| 118 | + 0 6.7457e-02 3 |
| 119 | + 1 1.6605e-02 2 |
| 120 | + 2 1.8773e-04 2 |
| 121 | + |
| 122 | + Timestep= 3 No. Outers= 3 No. Inners= 195 |
| 123 | + |
| 124 | + Population: 0.00 |
| 125 | + |
| 126 | + Timestep 4 |
| 127 | + Outer Difference Inners |
| 128 | + 0 6.6755e-02 3 |
| 129 | + 1 1.6078e-02 2 |
| 130 | + 2 1.9246e-04 2 |
| 131 | + |
| 132 | + Timestep= 4 No. Outers= 3 No. Inners= 195 |
| 133 | + |
| 134 | + Population: 0.00 |
| 135 | + |
| 136 | + Timestep 5 |
| 137 | + Outer Difference Inners |
| 138 | + 0 6.4709e-02 3 |
| 139 | + 1 1.5481e-02 2 |
| 140 | + 2 1.8692e-04 2 |
| 141 | + |
| 142 | + Timestep= 5 No. Outers= 3 No. Inners= 195 |
| 143 | + |
| 144 | + Population: 0.00 |
| 145 | + |
| 146 | + Timestep 6 |
| 147 | + Outer Difference Inners |
| 148 | + 0 6.4372e-02 3 |
| 149 | + 1 1.6188e-02 2 |
| 150 | + 2 1.9009e-04 2 |
| 151 | + |
| 152 | + Timestep= 6 No. Outers= 3 No. Inners= 195 |
| 153 | + |
| 154 | + Population: 0.00 |
| 155 | + |
| 156 | + Timestep 7 |
| 157 | + Outer Difference Inners |
| 158 | + 0 6.6715e-02 3 |
| 159 | + 1 1.5658e-02 2 |
| 160 | + 2 1.8843e-04 2 |
| 161 | + |
| 162 | + Timestep= 7 No. Outers= 3 No. Inners= 195 |
| 163 | + |
| 164 | + Population: 0.00 |
| 165 | + |
| 166 | + Timestep 8 |
| 167 | + Outer Difference Inners |
| 168 | + 0 6.7384e-02 3 |
| 169 | + 1 1.5943e-02 2 |
| 170 | + 2 1.8962e-04 2 |
| 171 | + |
| 172 | + Timestep= 8 No. Outers= 3 No. Inners= 195 |
| 173 | + |
| 174 | + Population: 0.00 |
| 175 | + |
| 176 | + Timestep 9 |
| 177 | + Outer Difference Inners |
| 178 | + 0 6.7405e-02 3 |
| 179 | + 1 1.5835e-02 2 |
| 180 | + 2 1.8873e-04 2 |
| 181 | + |
| 182 | + Timestep= 9 No. Outers= 3 No. Inners= 194 |
| 183 | + |
| 184 | + Population: 0.00 |
| 185 | + |
| 186 | + |
| 187 | +******************************************************** |
| 188 | + Timing Report |
| 189 | +******************************************************** |
| 190 | + Setup 0.020s |
| 191 | + Outer source 0.000s |
| 192 | + Outer parameters 0.000s |
| 193 | + Inner source 0.000s |
| 194 | + Sweeps 5.208s |
| 195 | + MPI Send time 0.000s |
| 196 | + MPI Recv time 0.076s |
| 197 | + PCIe transfer time 0.000s |
| 198 | + Compute time 5.132s |
| 199 | + Scalar flux reductions 0.000s |
| 200 | + Convergence checking 0.013s |
| 201 | + Other 2.223s |
| 202 | + Total simulation 7.444s |
| 203 | + |
| 204 | + Grind time 13.332ns |
| 205 | +******************************************************** |
| 206 | + |
| 207 | +snap_input: |
| 208 | +! Input from namelist |
| 209 | +&invar |
| 210 | + nthreads=1 |
| 211 | + nnested=1 |
| 212 | + npex=1 |
| 213 | + npey=1 |
| 214 | + npez=1 |
| 215 | + ndimen=3 |
| 216 | + nx=8 |
| 217 | + lx=0.1 |
| 218 | + ny=8 |
| 219 | + ly=0.1 |
| 220 | + nz=8 |
| 221 | + lz=0.1 |
| 222 | + ichunk=8 |
| 223 | + nmom=2 |
| 224 | + nang=64 |
| 225 | + ng=30 |
| 226 | + mat_opt=0 |
| 227 | + src_opt=0 |
| 228 | + timedep=1 |
| 229 | + it_det=0 |
| 230 | + tf=0.1 |
| 231 | + nsteps=10 |
| 232 | + iitm=5 |
| 233 | + oitm=10 |
| 234 | + epsi=1.E-4 |
| 235 | + fluxp=0 |
| 236 | + scatp=0 |
| 237 | + fixup=1 |
| 238 | + angcpy=2 |
| 239 | +/ |
| 240 | + |
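Some values in the sample output follow directly from this input file. The sketch below shows that arithmetic for the cell size and the time delta. The struct and function names are hypothetical, and the formulas (length divided by cell count, final time divided by step count) are assumptions that match the printed values rather than code taken from the port.

```cpp
#include <cassert>
#include <cmath>

// Subset of the namelist parameters used in this illustration.
struct Input {
    int nx, ny, nz;      // cells per dimension
    double lx, ly, lz;   // problem size per dimension
    double tf;           // simulation time
    int nsteps;          // number of timesteps
};

// Derived quantities as they appear in the "Input Parameters" report.
struct Derived {
    double dx, dy, dz;   // cell size
    double dt;           // time delta
};

inline Derived derive(const Input& in) {
    assert(in.nx > 0 && in.ny > 0 && in.nz > 0 && in.nsteps > 0);
    return Derived{ in.lx / in.nx, in.ly / in.ny, in.lz / in.nz,
                    in.tf / in.nsteps };
}
```

For the input above, dx is 0.1/8 = 0.0125, which the report prints as 0.013, and dt is 0.1/10 = 0.010.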