
Commit d8ee396

Update matmul-advisor readme (oneapi-src#584)
Added instructions for how to run in batch mode on DevCloud.
1 parent 496592c commit d8ee396

File tree

2 files changed: +133 -3 lines changed


Tools/Advisor/matrix_multiply_advisor/README.md

Lines changed: 132 additions & 2 deletions
@@ -24,15 +24,19 @@ Code samples are licensed under the MIT license. See

Third party program Licenses can be found here: [third-party-programs.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/third-party-programs.txt)

### Running Samples In DevCloud
Running samples in the Intel DevCloud requires you to specify a compute node. For specific instructions, jump to [Run the Matrix Multiply Advisor sample on the DevCloud](#run-matmul-advisor-on-devcloud).

## How to Build

This sample contains 3 versions of matrix multiplication using DPC++:

multiply1 – basic implementation of matrix multiply using DPC++
multiply1_1 – basic implementation that replaces the buffer store with a local accessor “acc” to reduce memory traffic
multiply1_2 – the basic implementation, plus adding the local accessor and matrix tiling

-Edit the line in multiply.h to select the version of the multiply function:
+Edit the line in src/multiply.hpp to select the version of the multiply function:
#define MULTIPLY multiply1

@@ -68,8 +72,134 @@ Edit the line in multiply.h to select the version of the multiply function:

Elapsed Time: 0.539631s

## Running an Intel Advisor analysis
------------------------------------------

See the Advisor Cookbook here: https://software.intel.com/en-us/advisor-cookbook
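
If you prefer the command line, the following is a minimal sketch of collecting and viewing a Survey analysis with the Advisor CLI; the executable name (`matrix.dpcpp`) and the project directory are assumptions for illustration, so follow the cookbook for the exact workflow used with this sample.
```
# Sketch: run a Survey collection on the built binary, then print a summary report.
# The executable name (matrix.dpcpp) and project directory are assumed, not taken from this sample.
# The CLI is named advisor in oneAPI releases (advixe-cl in older releases).
advisor --collect=survey --project-dir=./advi_results -- ./matrix.dpcpp
advisor --report=survey --project-dir=./advi_results
```
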
### Running the Matrix Multiply Advisor sample in the DevCloud<a name="run-matmul-advisor-on-devcloud"></a>
This sample contains 3 versions of matrix multiplication using DPC++:

multiply1 – basic implementation of matrix multiply using DPC++
multiply1_1 – basic implementation that replaces the buffer store with a local accessor “acc” to reduce memory traffic
multiply1_2 – the basic implementation, plus adding the local accessor and matrix tiling

Edit the line in src/multiply.hpp to select the version of the multiply function:
#define MULTIPLY multiply1

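If you want to switch kernels without opening an editor (handy when preparing batch jobs after you have cloned the sample in the steps below), a one-line sed sketch:
```
# Sketch: select the tiled kernel (multiply1_2) non-interactively; run from the sample directory.
sed -i 's/^#define MULTIPLY .*/#define MULTIPLY multiply1_2/' src/multiply.hpp
```
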
1. Open a terminal on your Linux system.
2. Log in to DevCloud.
```
ssh devcloud
```
3. Download the samples.
```
git clone https://github.com/oneapi-src/oneAPI-samples.git
```

4. Change directories to the Matrix Multiply Advisor sample directory.
```
cd ~/oneAPI-samples/Tools/Advisor/matrix_multiply_advisor
```
#### Build and run the sample in batch mode
The following describes the process of submitting build and run jobs to PBS.
A job is a script that is submitted to PBS through the qsub utility. By default, the qsub utility does not inherit the current environment variables or your current working directory. For this reason, it is necessary to submit jobs as scripts that handle the setup of the environment variables. To address the working directory issue, you can either use absolute paths or pass the -d \<dir\> option to qsub to set the working directory, as in the sketch below.
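
For example, a minimal sketch of both options (the node type and paths here are assumptions; the actual submissions for this sample are shown in the steps below):
```
# Option 1: have qsub set the job's working directory with -d
qsub -l nodes=1:gpu:ppn=2 -d ~/oneAPI-samples/Tools/Advisor/matrix_multiply_advisor build.sh
# Option 2: keep the default working directory and cd to an absolute path inside the job itself
echo 'cd ~/oneAPI-samples/Tools/Advisor/matrix_multiply_advisor && bash build.sh' | qsub -l nodes=1:gpu:ppn=2
```
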
#### Create the Job Scripts
1. Create a build.sh script with your preferred text editor:
```
nano build.sh
```
2. Add this text into the build.sh file:
```
source /opt/intel/inteloneapi/setvars.sh > /dev/null 2>&1
mkdir build
cd build
cmake ..
make
```

3. Save and close the build.sh file.

4. Create a run.sh script with your preferred text editor:
```
nano run.sh
```

5. Add this text into the run.sh file:
```
source /opt/intel/inteloneapi/setvars.sh > /dev/null 2>&1
cd build
make run
```
6. Save and close the run.sh file.

#### Build and run
Jobs submitted in batch mode are placed in a queue, waiting for the necessary resources (compute nodes) to become available. Jobs are executed on a first-come, first-served basis on the first available node(s) that have the requested property or label.
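
To see which property labels (such as gpu) the cluster's nodes advertise, a quick sketch using the Torque pbsnodes utility (assumed to be available on the DevCloud login node):
```
# Sketch: list the distinct "properties" lines advertised by the compute nodes.
pbsnodes | grep "properties =" | sort -u
```
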
1. Build the sample on a gpu node.

```
qsub -l nodes=1:gpu:ppn=2 -d . build.sh
```

Note: -l nodes=1:gpu:ppn=2 (lower case L) is used to assign one full GPU node to the job.
Note: The -d . is used to configure the current folder as the working directory for the task.

2. To inspect the job progress, use the qstat utility.
```
watch -n 1 qstat -n -1
```
Note: The watch -n 1 command runs qstat -n -1 and displays its results every second. If no results are displayed, the job has completed.

3. After the build job completes successfully, run the sample on a gpu node:
```
qsub -l nodes=1:gpu:ppn=2 -d . run.sh
```
4. When a job terminates, two files are written to disk:

<script_name>.sh.eXXXX, which is the job stderr

<script_name>.sh.oXXXX, which is the job stdout

Here XXXX is the job ID, which gets printed to the screen after each qsub command.

5. Inspect the output of the sample.
```
cat run.sh.oXXXX
```
You should see output similar to this:

```
Scanning dependencies of target run
Address of buf1 = 0x7f570456f010
Offset of buf1 = 0x7f570456f180
Address of buf2 = 0x7f5703d6e010
Offset of buf2 = 0x7f5703d6e1c0
Address of buf3 = 0x7f570356d010
Offset of buf3 = 0x7f570356d100
Address of buf4 = 0x7f5702d6c010
Offset of buf4 = 0x7f5702d6c140
Using multiply kernel: multiply1
Running on Intel(R) UHD Graphics P630 [0x3e96]
Elapsed Time: 1.79388s
Built target run
```

6. Remove the stdout and stderr files and clean up the project files.
```
rm build.sh.*; rm run.sh.*; make clean
```
7. Disconnect from the Intel DevCloud.
```
exit
```
## Running an Intel Advisor analysis
------------------------------------------

See the Advisor Cookbook here: https://software.intel.com/en-us/advisor-cookbook

### Build and run additional samples
Several sample programs are available for you to try, many of which can be compiled and run in a similar fashion to this sample. Experiment with running the various samples on different kinds of compute nodes or adjust their source code to experiment with different workloads.

Tools/Advisor/matrix_multiply_advisor/src/multiply.hpp

Lines changed: 1 addition & 1 deletion
@@ -32,7 +32,7 @@ typedef TYPE Array[NUM];

// Select which multiply kernel to use via the following macro so that the
// kernel being used can be reported when the test is run.
-#define MULTIPLY multiply1
+#define MULTIPLY multiply1_1

extern void multiply1(int msize, int tidx, int numt, TYPE a[][NUM],
                      TYPE b[][NUM], TYPE c[][NUM], TYPE t[][NUM]);
