Commit 45715b8

add horovod with tensorflow sample
1 parent c16efc7 commit 45715b8

4 files changed

+492
-0
lines changed

Lines changed: 131 additions & 0 deletions
@@ -0,0 +1,131 @@
# `Enable distributed deep learning using Intel® Optimization for Horovod and TensorFlow*` Sample

The `Enable distributed deep learning using Intel® Optimization for Horovod and TensorFlow*` sample guides you through running inference and training workloads across multiple cards using Intel® Optimization for Horovod and TensorFlow* on Intel® discrete GPUs (dGPUs).


| Area | Description
|:--- |:---
| What you will learn | How to enable distributed deep learning using Intel® Optimization for Horovod and TensorFlow*
| Time to complete | 10 minutes
| Category | Code Optimization

## Purpose

By working through an end-to-end deep learning example, this sample demonstrates an important concept:
- The performance benefits of distributing a deep learning workload among multiple dGPUs

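To make the idea concrete, here is a minimal, framework-free sketch of data-parallel training. Each simulated worker computes a gradient on its own shard of the data, and the gradients are averaged before the weight update, which is the role an allreduce operation plays in Horovod. The toy linear model and the names `local_gradient`, `allreduce_mean`, and `train_step` are illustrative assumptions; they are not part of the sample or of the Horovod API.

```python
# Toy simulation of data-parallel gradient averaging (no Horovod required).
# All names here are hypothetical and only illustrate the concept.

def local_gradient(w, shard):
    """Gradient of the mean squared error for y = w * x on one worker's shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def allreduce_mean(values):
    """Stand-in for an allreduce: every worker ends up with the mean value."""
    return sum(values) / len(values)

def train_step(w, shards, lr=0.01):
    """One update: per-worker gradients, averaged, then applied to the weight."""
    grads = [local_gradient(w, shard) for shard in shards]  # one per worker
    return w - lr * allreduce_mean(grads)

data = [(float(x), 3.0 * x) for x in range(1, 9)]  # true weight is 3.0
shards = [data[0::2], data[1::2]]                  # two simulated workers

w = 0.0
for _ in range(200):
    w = train_step(w, shards)
print(round(w, 3))
```

With equally sized shards, the averaged gradient matches the full-batch gradient, so each worker only has to process a fraction of the data per step.
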
## Prerequisites

| Optimized for | Description
|:--- |:---
| OS | Linux; Ubuntu* 18.04 or newer
| Hardware | Intel® Data Center GPU Max/Flex Series
| Software | Intel® AI Analytics Toolkit (AI Kit)

### For Local Development Environments

You will need to download and install the following toolkits, tools, and components to use the sample.

- **Intel® AI Analytics Toolkit (AI Kit)**

  You can get the AI Kit from [Intel® oneAPI Toolkits](https://www.intel.com/content/www/us/en/developer/tools/oneapi/toolkits.html#analytics-kit). <br> See [*Get Started with the Intel® AI Analytics Toolkit for Linux**](https://www.intel.com/content/www/us/en/develop/documentation/get-started-with-ai-linux) for AI Kit installation information and post-installation steps and scripts.

- **Jupyter Notebook**

  Install using pip: `pip install notebook`. <br> Alternatively, see [*Installing Jupyter*](https://jupyter.org/install) for detailed installation instructions.

- **Intel® Extension for TensorFlow**

  See *Intel® Extension for TensorFlow* [*Installation*](https://www.tensorflow.org/tfx/serving/setup) for detailed installation options.


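Once the components above are installed, a quick check like the following can confirm that Python can locate them. This is a sketch under assumptions: the top-level import names (`tensorflow`, `intel_extension_for_tensorflow`, `horovod`, `notebook`) are what the corresponding packages are expected to expose, and `find_missing` is a hypothetical helper, so verify the names against your installation.

```python
import importlib.util

# Assumption: these are the top-level import names exposed by the packages.
REQUIRED_MODULES = {
    "tensorflow": "TensorFlow",
    "intel_extension_for_tensorflow": "Intel Extension for TensorFlow",
    "horovod": "Intel Optimization for Horovod",
    "notebook": "Jupyter Notebook",
}

def find_missing(modules=REQUIRED_MODULES):
    """Return the display names of modules that Python cannot locate."""
    return [label for name, label in modules.items()
            if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    missing = find_missing()
    print("Missing:", ", ".join(missing) if missing else "none")
```

`importlib.util.find_spec` only locates a module without importing it, so the check stays fast and does not initialize any GPU runtime.
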
### For Intel® Developer Cloud (Beta)

The necessary tools and components are already installed in the environment, except for the *intel-optimization-for-horovod* package. See [Intel® Developer Cloud for oneAPI](https://github.com/bjodom/idc) for information.

## Key Implementation Details

### Jupyter Notebook

| Notebook | Description
|:--- |:---
|`tensorflow_distributed_inference_with_horovod.ipynb` | Enabling Multi-Card Inference/Training with Intel® Optimization for Horovod

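In a multi-card run, each process typically handles its own slice of the input. Horovod exposes the process index and the process count through `hvd.rank()` and `hvd.size()`; the sketch below uses plain integers in their place so it runs without Horovod. The helper `shard_for_rank` is a hypothetical name used only for illustration and is not taken from the notebook.

```python
def shard_for_rank(items, rank, size):
    """Return the strided slice of items that worker `rank` of `size` handles."""
    if not 0 <= rank < size:
        raise ValueError("rank must be in [0, size)")
    return items[rank::size]

batches = list(range(10))
for rank in range(2):  # simulate two cards
    print(rank, shard_for_rank(batches, rank, size=2))
```

A strided split keeps the shards within one item of each other in length, so no card sits idle for long at the end of a pass.
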
## Run the Distributed Inference Sample Using Intel® Optimization for Horovod and TensorFlow*

### On Linux*

1. Set up the oneAPI environment by running the `setvars.sh` script.
   Default installation: `source /opt/intel/oneapi/setvars.sh`

   or `source /path/to/oneapi/setvars.sh`

2. Set up the conda environment.
   ```
   conda create --name tensorflow_xpu --clone tensorflow-gpu
   conda activate tensorflow_xpu
   ```
3. Install dependencies.
   If you haven't already done so, you will need to install *Jupyter Notebook* and *Intel® Optimization for Horovod*.

   ```
   pip install intel-optimization-for-horovod
   ```

   ```
   pip install notebook
   ```

4. Launch Jupyter Notebook.
   ```
   jupyter notebook --ip=0.0.0.0
   ```
5. Follow the instructions to open the URL with the token in your browser.
6. Locate and select the Notebook.
   ```
   tensorflow_distributed_inference_with_horovod.ipynb
   ```
7. Change your Jupyter Notebook kernel to **tensorflow_xpu**.
8. Run every cell in the Notebook in sequence.


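Before launching Jupyter, it can help to confirm that the shell was prepared as described: `setvars.sh` sourced and the `tensorflow_xpu` conda environment active. The sketch below inspects environment variables for that purpose. It assumes that `setvars.sh` exports `ONEAPI_ROOT`, which you should verify for your installation; `check_environment` is a hypothetical helper, not part of the sample.

```python
import os

def check_environment(env=None, expected_conda_env="tensorflow_xpu"):
    """Return a list of human-readable problems; an empty list means OK."""
    env = os.environ if env is None else env
    problems = []
    # Assumption: sourcing setvars.sh exports ONEAPI_ROOT.
    if not env.get("ONEAPI_ROOT"):
        problems.append("ONEAPI_ROOT is not set; did you source setvars.sh?")
    # conda sets CONDA_DEFAULT_ENV to the name of the active environment.
    active = env.get("CONDA_DEFAULT_ENV")
    if active != expected_conda_env:
        problems.append(
            f"active conda env is {active!r}, expected {expected_conda_env!r}"
        )
    return problems

if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

Passing a plain dictionary as `env` makes the check easy to exercise without touching the real shell environment.
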
### Run the Sample on Intel® Developer Cloud (Optional)

1. If you do not already have an account, follow the readme to request an Intel® Developer Cloud account at [*Setup an Intel® Developer Cloud Account*](https://github.com/bjodom/idc).
2. On a Linux* system, open a terminal.
3. SSH into Intel® Developer Cloud.
   ```
   ssh idc
   ```
4. Run the oneAPI `setvars.sh` script.
   `source /opt/intel/oneapi/setvars.sh`

5. Activate the prepared `tensorflow_xpu` environment.
   ```
   conda activate tensorflow_xpu
   ```
6. Install Intel® Optimization for Horovod.
   ```
   pip install intel-optimization-for-horovod
   ```

7. Follow the instructions [here](https://github.com/bjodom/idc#jupyter) to launch a Jupyter Notebook on Intel® Developer Cloud.
8. Locate and select the Notebook.
   ```
   tensorflow_distributed_inference_with_horovod.ipynb
   ```
9. Change the kernel to **tensorflow_xpu**.
10. Run every cell in the Notebook in sequence.


#### Troubleshooting

If you receive an error message, troubleshoot the problem using the **Diagnostics Utility for Intel® oneAPI Toolkits**. The diagnostic utility provides configuration and system checks to help find missing dependencies, permissions errors, and other issues. See the [Diagnostics Utility for Intel® oneAPI Toolkits User Guide](https://www.intel.com/content/www/us/en/develop/documentation/diagnostic-utility-user-guide/top.html) for more information on using the utility.


## License

Code samples are licensed under the MIT license. See
[License.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/License.txt) for details.

Third-party program licenses can be found here: [third-party-programs.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/third-party-programs.txt).
@@ -0,0 +1,30 @@
{
  "guid": "1C4791A0-4189-43D0-8B42-A4318E771DEA",
  "name": "Run distributed deep learning workloads with Intel® Optimization for Horovod",
  "categories": ["Toolkit/oneAPI AI And Analytics/Features And Functionality"],
  "description": "This sample demonstrates how to run multi-card inference and training on Intel GPUs using Intel Optimization for Horovod and TensorFlow",
  "builder": ["cli"],
  "toolchain": ["jupyter"],
  "languages": [{"python":{}}],
  "os":["linux"],
  "targetDevice": ["GPU"],
  "ciTests": {
    "linux": [
      {
        "env": [
          "source /opt/intel/oneapi/setvars.sh --force",
          "conda create --name tensorflow_horovod --clone tensorflow-gpu",
          "conda activate tensorflow_horovod",
          "pip install intel-optimization-for-horovod",
          "~/.conda/envs/tensorflow_horovod/bin/python -m ipykernel install --user --name=tensorflow_horovod"
        ],
        "id": "distributed_learning_tensorflow_horovod_py",
        "steps": [
          "source activate tensorflow_horovod",
          "python scripts/ci_test.py"
        ]
      }
    ]
  },
  "expertise": "Getting Started"
}
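The `ciTests` section above lists environment setup commands followed by test steps. As a rough illustration of how such a file can be inspected, the sketch below parses a JSON document of the same shape and returns the commands for each test entry. The helper `list_ci_commands` and the inline `example` document are hypothetical; this is not the tooling that the samples repository actually uses to run CI.

```python
import json

def list_ci_commands(sample_text, os_name="linux"):
    """Return (test id, env commands, step commands) for each CI test entry."""
    sample = json.loads(sample_text)
    return [(test["id"], test.get("env", []), test.get("steps", []))
            for test in sample.get("ciTests", {}).get(os_name, [])]

# A trimmed-down document with the same structure as the sample.json above.
example = """
{"ciTests": {"linux": [{
    "id": "demo",
    "env": ["source /opt/intel/oneapi/setvars.sh --force"],
    "steps": ["python scripts/ci_test.py"]
}]}}
"""

for test_id, env, steps in list_ci_commands(example):
    print(test_id, len(env), len(steps))
```
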
@@ -0,0 +1,20 @@
import os

import nbformat
from nbconvert.preprocessors import CellExecutionError, ExecutePreprocessor


def runJupyterNotebook(input_notebook_filename, output_notebook_filename, conda_env, fdpath='./'):
    # Execute every cell with the given Jupyter kernel and save the executed notebook.
    if not os.path.isfile(input_notebook_filename):
        print("No Jupyter notebook found : ", input_notebook_filename)
        return -1
    try:
        with open(input_notebook_filename) as f:
            nb = nbformat.read(f, as_version=4)
        # allow_errors=True keeps executing after a failing cell and records the error in its output.
        ep = ExecutePreprocessor(timeout=6000, kernel_name=conda_env, allow_errors=True)
        ep.preprocess(nb, {'metadata': {'path': fdpath}})
        with open(output_notebook_filename, 'w', encoding='utf-8') as f:
            nbformat.write(nb, f)
        return 0
    except CellExecutionError:
        print("Exception!")
        return -1


# The original call passed only two arguments; the output filename below is an assumed name.
# Note: the README refers to the notebook as tensorflow_distributed_inference_with_horovod.ipynb.
runJupyterNotebook(os.path.join(os.path.dirname(os.path.realpath(__file__)), 'tensorflow_with_horovod.ipynb'),
                   'tensorflow_with_horovod_output.ipynb', 'tensorflow_horovod')
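Because the script above passes `allow_errors=True`, a failing cell does not raise `CellExecutionError`; the error is stored in that cell's outputs in the written notebook instead. A CI check may therefore want to scan the output notebook afterwards. The following is a minimal sketch that relies only on the version 4 notebook JSON layout (`cells`, `outputs`, `output_type`); the helpers `find_cell_errors` and `notebook_has_errors` are hypothetical and not part of the commit.

```python
import json

def find_cell_errors(notebook):
    """Return (cell index, error name, error value) for each error output."""
    errors = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        for output in cell.get("outputs", []):
            if output.get("output_type") == "error":
                errors.append((index, output.get("ename"), output.get("evalue")))
    return errors

def notebook_has_errors(path):
    """Load an executed .ipynb file and report whether any cell failed."""
    with open(path, encoding="utf-8") as f:
        return bool(find_cell_errors(json.load(f)))
```

A caller could, for example, exit with a non-zero status when `notebook_has_errors` returns `True` so that the CI step fails visibly.
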
