
Commit 370f62e

min-jean-cho committed
Merge branch 'launcher' of https://github.com/min-jean-cho/serve into launcher
2 parents 6d1e315 + 9fee5b4

File tree

1 file changed: examples/intel_extension_for_pytorch/README.md (60 additions, 14 deletions)
@@ -7,6 +7,7 @@ Here we show how to use TorchServe with IPEX.

* [Install Intel Extension for PyTorch](#install-intel-extension-for-pytorch)
* [Serving model with Intel Extension for PyTorch](#serving-model-with-intel-extension-for-pytorch)
* [Creating and Exporting INT8 model for IPEX](#creating-and-exporting-int8-model-for-ipex)
* [Torchserve with Launcher](#torchserve-with-launcher)
* [Benchmarking with Launcher](#benchmarking-with-launcher)
@@ -98,36 +99,81 @@ Once the serialized file (`.pt`) is created, it can be used with `torch-model-archiver`

```
torch-model-archiver --model-name rn50_ipex_int8 --version 1.0 --serialized-file rn50_int8_jit.pt --handler image_classifier
```

### 3. Start Torchserve to serve the model
Make sure to set `ipex_enable=true` in `config.properties`. Use the following command to start Torchserve with IPEX.
```
torchserve --start --ncs --model-store model_store --ts-config config.properties
```

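For reference, a minimal `config.properties` for this step could contain just the line below; the launcher sections that follow extend this same file. This is a sketch, not the full set of options Torchserve accepts in `config.properties`.

```
ipex_enable=true
```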

### 4. Registering and Deploying model
Registering and deploying the model follows the same steps shown [here](https://pytorch.org/serve/use_cases.html).
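
As a quick sketch, assuming the default management port 8081 and inference port 8080, and that the `rn50_ipex_int8.mar` archive from the previous step was placed in `model_store`:

```
# Register the archive and start one worker through the management API.
curl -X POST "http://localhost:8081/models?url=rn50_ipex_int8.mar&initial_workers=1"

# Confirm the model and its worker are up.
curl http://localhost:8081/models/rn50_ipex_int8

# Send a test image (hypothetical local file kitten.jpg) to the inference API.
curl http://localhost:8080/predictions/rn50_ipex_int8 -T kitten.jpg
```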

## Torchserve with Launcher
Launcher is a script that automates tuning configuration settings on Intel hardware to boost performance. Settings such as `OMP_NUM_THREADS`, thread affinity, and the memory allocator can have a dramatic effect on performance. Please refer to [here](https://github.com/intel/intel-extension-for-pytorch/blob/master/docs/tutorials/performance_tuning/tuning_guide.md) for details on performance tuning with launcher.

All that needs to be done to use Torchserve with launcher is to set its configuration in `config.properties`.

Add the following lines in `config.properties` to use launcher with its default configuration.
```
ipex_enable=true
cpu_launcher_enable=true
```

By default, launcher uses `numactl` if it is installed, to ensure the socket is pinned and memory is thus allocated from the local NUMA node. To use launcher without `numactl`, add the following lines in `config.properties`.
```
ipex_enable=true
cpu_launcher_enable=true
cpu_launcher_args=--disable_numactl
```

By default, launcher uses only non-hyperthreaded cores if hyperthreading is present, to avoid sharing core compute resources. To use launcher with all cores, both physical and logical, add the following lines in `config.properties`.
```
ipex_enable=true
cpu_launcher_enable=true
cpu_launcher_args=--use_logical_core
```

Please refer to [here](https://github.com/intel/intel-extension-for-pytorch/blob/master/docs/tutorials/performance_tuning/launch_script.md) for the full list of tunable launcher configurations.
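
Launcher flags can be combined in `cpu_launcher_args`; for instance, a hypothetical configuration that uses all logical cores and also skips `numactl` (assuming the two flags are compatible on your machine):

```
ipex_enable=true
cpu_launcher_enable=true
cpu_launcher_args=--use_logical_core --disable_numactl
```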

## Benchmarking with Launcher
Launcher can be used with the Torchserve official [benchmark](https://github.com/pytorch/serve/tree/master/benchmarks) to launch the server and benchmark requests with an optimal configuration on Intel hardware.

In this section we provide examples of benchmarking with launcher in its default configuration.

Add the following lines to `config.properties` in the benchmark directory to use launcher with its default settings.
```
ipex_enable=true
cpu_launcher_enable=true
```

The rest of the benchmarking steps follow the same process shown [here](https://github.com/pytorch/serve/tree/master/benchmarks).
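
A minimal sketch of those steps, assuming a `pytorch/serve` checkout with the `config.properties` above in the `benchmarks` directory; the `benchmark-ab.py` flags may differ across versions, so check the benchmark README for the authoritative options:

```
cd serve/benchmarks
pip install -r requirements-ab.txt
python benchmark-ab.py --config_properties config.properties
```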

`model_log.log` contains the information and the command that were used for this launcher run.

CPU usage on a machine with an Intel(R) Xeon(R) Platinum 8180 CPU, 2 sockets, 28 cores per socket, and 2 threads per core is shown below:
![launcher_default_2sockets](https://user-images.githubusercontent.com/93151422/144373537-07787510-039d-44c4-8cfd-6afeeb64ac78.gif)

```
$ cat logs/model_log.log
2021-12-01 21:22:40,096 - __main__ - WARNING - Both TCMalloc and JeMalloc are not found in $CONDA_PREFIX/lib or $VIRTUAL_ENV/lib or /.local/lib/ or /usr/local/lib/ or /usr/local/lib64/ or /usr/lib or /usr/lib64 or /home/<user>/.local/lib/ so the LD_PRELOAD environment variable will not be set. This may drop the performance
2021-12-01 21:22:40,096 - __main__ - INFO - OMP_NUM_THREADS=56
2021-12-01 21:22:40,096 - __main__ - INFO - Using Intel OpenMP
2021-12-01 21:22:40,096 - __main__ - INFO - KMP_AFFINITY=granularity=fine,compact,1,0
2021-12-01 21:22:40,096 - __main__ - INFO - KMP_BLOCKTIME=1
2021-12-01 21:22:40,096 - __main__ - INFO - LD_PRELOAD=<VIRTUAL_ENV>/lib/libiomp5.so
2021-12-01 21:22:40,096 - __main__ - WARNING - Numa Aware: cores:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55] in different NUMA node
```

CPU usage on a machine with an Intel(R) Xeon(R) Platinum 8375C CPU, 1 socket, 2 cores per socket, and 2 threads per core is shown below:
![launcher_default_1socket](https://user-images.githubusercontent.com/93151422/144372993-92b2ca96-f309-41e2-a5c8-bf2143815c93.gif)

```
$ cat logs/model_log.log
2021-12-02 06:15:03,981 - __main__ - WARNING - Both TCMalloc and JeMalloc are not found in $CONDA_PREFIX/lib or $VIRTUAL_ENV/lib or /.local/lib/ or /usr/local/lib/ or /usr/local/lib64/ or /usr/lib or /usr/lib64 or /home/<user>/.local/lib/ so the LD_PRELOAD environment variable will not be set. This may drop the performance
2021-12-02 06:15:03,981 - __main__ - INFO - OMP_NUM_THREADS=2
2021-12-02 06:15:03,982 - __main__ - INFO - Using Intel OpenMP
2021-12-02 06:15:03,982 - __main__ - INFO - KMP_AFFINITY=granularity=fine,compact,1,0
2021-12-02 06:15:03,982 - __main__ - INFO - KMP_BLOCKTIME=1
2021-12-02 06:15:03,982 - __main__ - INFO - LD_PRELOAD=<VIRTUAL_ENV>/lib/libiomp5.so
```
