# `Enable distributed deep learning using Intel® Optimization for Horovod and TensorFlow*` Sample

The `Enable distributed deep learning using Intel® Optimization for Horovod and TensorFlow*` sample guides you through running inference and training workloads across multiple cards using Intel® Optimization for Horovod and TensorFlow* on Intel® discrete GPUs (dGPUs).

| Area | Description
|:--- |:---
| What you will learn | How to enable distributed deep learning using Intel® Optimization for Horovod and TensorFlow*
| Time to complete | 10 minutes
| Category | Code Optimization

## Purpose

Through an end-to-end deep learning example, this sample demonstrates an important concept:
- The performance benefit of distributing a deep learning workload among multiple dGPUs

## Prerequisites

| Optimized for | Description
|:--- |:---
| OS | Linux; Ubuntu* 18.04 or newer
| Hardware | Intel® Data Center GPU Max/Flex Series
| Software | Intel® AI Analytics Toolkit (AI Kit)

### For Local Development Environments

You will need to download and install the following toolkits, tools, and components to use the sample.

- **Intel® AI Analytics Toolkit (AI Kit)**

  You can get the AI Kit from [Intel® oneAPI Toolkits](https://www.intel.com/content/www/us/en/developer/tools/oneapi/toolkits.html#analytics-kit). <br> See [*Get Started with the Intel® AI Analytics Toolkit for Linux**](https://www.intel.com/content/www/us/en/develop/documentation/get-started-with-ai-linux) for AI Kit installation information and post-installation steps and scripts.

- **Jupyter Notebook**

  Install using PIP: `$pip install notebook`. <br> Alternatively, see [*Installing Jupyter*](https://jupyter.org/install) for detailed installation instructions.
- **Intel® Extension for TensorFlow***

  See the *Intel® Extension for TensorFlow** [installation guide](https://github.com/intel/intel-extension-for-tensorflow) for detailed installation options.

### For Intel® Developer Cloud (Beta)

The necessary tools and components are already installed in the environment, except for the *intel-optimization-for-horovod* package. See [Intel® Developer Cloud for oneAPI](https://github.com/bjodom/idc) for information.

## Key Implementation Details

### Jupyter Notebook

| Notebook | Description
|:--- |:---
|`tensorflow_distributed_inference_with_horovod.ipynb` | Enabling Multi-Card Inference/Training with Intel® Optimization for Horovod
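### Horovod Multi-Card Pattern (Sketch)

Multi-card execution with Horovod follows a data-parallel pattern: one process is launched per card (or tile), each process pins its own device, gradients are averaged across processes, and rank 0 broadcasts the initial weights so all workers start identically. The sketch below is a minimal, hypothetical illustration of that pattern rather than the notebook's exact code; it assumes `intel-extension-for-tensorflow` and `intel-optimization-for-horovod` are installed so that TensorFlow exposes the Intel GPUs as `XPU` devices.

```python
# Minimal sketch of Horovod data parallelism on Intel GPUs (illustration only,
# not the notebook's exact code). Assumes intel-extension-for-tensorflow and
# intel-optimization-for-horovod are installed.
import tensorflow as tf
import horovod.tensorflow.keras as hvd

hvd.init()  # one process per card/tile

# Pin each process to its own XPU device so processes do not contend for cards.
xpus = tf.config.list_physical_devices("XPU")
if xpus:
    tf.config.set_visible_devices(xpus[hvd.local_rank()], "XPU")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(10),
])

# Scale the learning rate with the number of workers and let Horovod
# average gradients across all cards.
opt = hvd.DistributedOptimizer(tf.keras.optimizers.SGD(0.01 * hvd.size()))
model.compile(
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=opt,
)

# Broadcast the initial variables from rank 0 so every worker starts identically.
callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]

(x, y), _ = tf.keras.datasets.mnist.load_data()
x = x.reshape(-1, 784).astype("float32") / 255.0
model.fit(x, y, batch_size=64, epochs=1, callbacks=callbacks,
          verbose=1 if hvd.rank() == 0 else 0)
```

A script like this is typically launched with one process per card, for example `mpirun -np 2 python train.py` or `horovodrun -np 2 python train.py`. For distributed inference, the same device pinning typically applies, with each rank running prediction on its own shard of the data.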
## Run the distributed inference sample using Intel® Optimization for Horovod and TensorFlow*

### On Linux*

1. Set up the oneAPI environment by running the `setvars.sh` script.

   Default installation: `source /opt/intel/oneapi/setvars.sh`

   or `source /path/to/oneapi/setvars.sh`

2. Set up the conda environment.
   ```
   conda create --name tensorflow_xpu --clone tensorflow-gpu
   conda activate tensorflow_xpu
   ```
3. Install dependencies.

   If you haven't already done so, install *Jupyter Notebook* and *Intel® Optimization for Horovod* (an optional verification sketch follows these steps):

   ```
   pip install intel-optimization-for-horovod
   ```

   ```
   pip install notebook
   ```

4. Launch Jupyter Notebook.
   ```
   jupyter notebook --ip=0.0.0.0
   ```
5. Follow the instructions to open the URL with the token in your browser.
6. Locate and select the Notebook.
   ```
   tensorflow_distributed_inference_with_horovod.ipynb
   ```
7. Change your Jupyter Notebook kernel to **tensorflow_xpu**.
8. Run every cell in the Notebook in sequence.
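#### Optional: Verify the Environment

Before running the notebook, you can optionally confirm from the `tensorflow_xpu` environment that TensorFlow sees the Intel GPUs as `XPU` devices and that Horovod initializes. This quick check is a hypothetical sketch and is not part of the sample itself:

```python
# Quick sanity check (not part of the sample): confirm that TensorFlow lists
# the Intel GPUs as XPU devices and that Horovod imports and initializes.
import tensorflow as tf
import horovod.tensorflow as hvd

print("XPU devices:", tf.config.list_physical_devices("XPU"))

hvd.init()
print("Horovod size:", hvd.size(), "rank:", hvd.rank())
```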
### Run the Sample on Intel® Developer Cloud (Optional)

1. If you do not already have an account, follow the readme to request an Intel® Developer Cloud account at [*Setup an Intel® Developer Cloud Account*](https://github.com/bjodom/idc).
2. On a Linux* system, open a terminal.
3. SSH into Intel® Developer Cloud.
   ```
   ssh idc
   ```
4. Run the oneAPI setvars script.

   `source /opt/intel/oneapi/setvars.sh`

5. Activate the prepared `tensorflow_xpu` environment.
   ```
   conda activate tensorflow_xpu
   ```
6. Install Intel® Optimization for Horovod.
   ```
   pip install intel-optimization-for-horovod
   ```
7. Follow the instructions [here](https://github.com/bjodom/idc#jupyter) to launch a Jupyter Notebook on Intel® Developer Cloud.
8. Locate and select the Notebook.
   ```
   tensorflow_distributed_inference_with_horovod.ipynb
   ```
9. Change the kernel to **tensorflow_xpu**.
10. Run every cell in the Notebook in sequence.

#### Troubleshooting

If you receive an error message, troubleshoot the problem using the **Diagnostics Utility for Intel® oneAPI Toolkits**. The diagnostic utility provides configuration and system checks to help find missing dependencies, permissions errors, and other issues. See the [Diagnostics Utility for Intel® oneAPI Toolkits User Guide](https://www.intel.com/content/www/us/en/develop/documentation/diagnostic-utility-user-guide/top.html) for more information on using the utility.

## License

Code samples are licensed under the MIT license. See
[License.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/License.txt) for details.

Third party program Licenses can be found here: [third-party-programs.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/third-party-programs.txt).