Commit 4cc8a76
Author: min-jean-cho

Merge branch 'launcher' of https://github.com/min-jean-cho/serve into launcher

2 parents 506fc08 + d1e7104

3 files changed: 110 additions, 57 deletions

binaries/conda/build_packages.py (7 additions, 0 deletions)

```diff
@@ -8,6 +8,10 @@
 MINICONDA_DOWNLOAD_URL = "https://repo.anaconda.com/miniconda/Miniconda3-py39_4.9.2-Linux-x86_64.sh"
 CONDA_BINARY = os.popen("which conda").read().strip() if os.system(f"conda --version") == 0 else f"$HOME/miniconda/condabin/conda"
 
+if os.name == "nt":
+    # Assumes miniconda is installed on Windows
+    CONDA_BINARY = "conda"
+
 def install_conda_build():
     """
     Install conda-build, required to create conda packages
@@ -24,6 +28,9 @@ def install_miniconda():
     if exit_code == 0:
         print(f"'conda' already present on the system. Proceeding without a fresh miniconda installation.")
         return
+    if os.name == "nt":
+        print("Identified as Windows system. Please install miniconda using this URL: https://repo.anaconda.com/miniconda/Miniconda3-latest-Windows-x86_64.exe")
+        return
 
     os.system(f"rm -rf $HOME/miniconda")
     exit_code = os.system(f"wget {MINICONDA_DOWNLOAD_URL} -O ~/miniconda.sh")
```
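For orientation, here is a minimal sketch (not part of the commit) of the same platform-dependent lookup using the standard library's `shutil.which` instead of shelling out; under the stated assumption that Windows has conda on PATH, its behavior should match the module-level logic above:

```python
import os
import shutil

def resolve_conda_binary() -> str:
    """Sketch of the lookup above: prefer a conda on PATH, else fall back."""
    if os.name == "nt":
        # Windows: the build script assumes miniconda is installed
        # and that `conda` is already on PATH.
        return "conda"
    found = shutil.which("conda")  # stdlib equivalent of `which conda`
    return found or os.path.expandvars("$HOME/miniconda/condabin/conda")
```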

binaries/upload.py (1 addition, 1 deletion)

```diff
@@ -46,7 +46,7 @@ def upload_conda_packages():
     """
 
     # Identify *.tar.bz2 files to upload
-    anaconda_token = os.environ[CONDA_TOKEN_ENV_VARIABLE]
+    anaconda_token = os.getenv(CONDA_TOKEN_ENV_VARIABLE)
 
     for root, _, files in os.walk(CONDA_PACKAGES_PATH):
         for name in files:
```
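One note on this one-line change: `os.environ[KEY]` raises `KeyError` when the variable is unset, whereas `os.getenv(KEY)` returns `None` (or a supplied default), letting the caller decide how to fail. A minimal sketch of the distinction (the variable name here is illustrative, not the one defined in `upload.py`):

```python
import os

# Dict-style access raises KeyError if the variable is missing:
#   token = os.environ["EXAMPLE_TOKEN"]   # KeyError when unset
# os.getenv returns None (or a default) instead:
token = os.getenv("EXAMPLE_TOKEN")
if token is None:
    raise SystemExit("EXAMPLE_TOKEN is not set; cannot upload packages.")
```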

examples/intel_extension_for_pytorch/README.md (102 additions, 56 deletions)
````diff
@@ -1,13 +1,15 @@
 # TorchServe with Intel® Extension for PyTorch*
 
-TorchServe can be used with Intel® Extension for PyTorch* (IPEX) to give performance boost on Intel hardware.
+TorchServe can be used with Intel® Extension for PyTorch* (IPEX) to give performance boost on Intel hardware<sup>1</sup>.
 Here we show how to use TorchServe with IPEX.
 
+<sup>1. While IPEX benefits all platforms, platforms with AVX512 benefit the most.</sup>
+
 ## Contents of this Document
 * [Install Intel Extension for PyTorch](#install-intel-extension-for-pytorch)
 * [Serving model with Intel Extension for PyTorch](#serving-model-with-intel-extension-for-pytorch)
+* [TorchServe with Launcher](#torchserve-with-launcher)
 * [Creating and Exporting INT8 model for IPEX](#creating-and-exporting-int8-model-for-ipex)
-* [Torchserve with Launcher](#torchserve-with-launcher)
 * [Benchmarking with Launcher](#benchmarking-with-launcher)
 
 
````
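As an aside on the new footnote: on Linux one can check whether the CPU reports AVX512 by inspecting the flags in `/proc/cpuinfo`. A rough, Linux-only sketch (not part of this commit):

```python
# Linux-only: CPU feature flags are listed in /proc/cpuinfo.
with open("/proc/cpuinfo") as f:
    has_avx512 = "avx512" in f.read()
print("AVX512 supported:", has_avx512)
```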

````diff
@@ -19,7 +21,50 @@ After installation, all it needs to be done to use TorchServe with IPEX is to en
 ```
 ipex_enable=true
 ```
-Once IPEX is enabled, deploying PyTorch model follows the same procedure shown [here](https://pytorch.org/serve/use_cases.html). Torchserve with IPEX can deploy any model and do inference.
+Once IPEX is enabled, deploying PyTorch model follows the same procedure shown [here](https://pytorch.org/serve/use_cases.html). TorchServe with IPEX can deploy any model and do inference.
+
+## TorchServe with Launcher
+Launcher is a script to automate the process of tuning configuration settings on Intel hardware to boost performance. Tuning configurations such as OMP_NUM_THREADS, thread affinity, and the memory allocator can have a dramatic effect on performance. Please refer to [here](https://github.com/intel/intel-extension-for-pytorch/blob/master/docs/tutorials/performance_tuning/tuning_guide.md) and [here](https://github.com/intel/intel-extension-for-pytorch/blob/master/docs/tutorials/performance_tuning/launch_script.md) for details on performance tuning with launcher.
+
+All that needs to be done to use TorchServe with launcher is to set its configuration in `config.properties`.
+
+Add the following lines in `config.properties` to use launcher with its default configuration.
+```
+ipex_enable=true
+cpu_launcher_enable=true
+```
+
+Launcher by default uses `numactl` if it is installed, to ensure the socket is pinned and thus memory is allocated from the local NUMA node. To use launcher without numactl, add the following lines in `config.properties`.
+```
+ipex_enable=true
+cpu_launcher_enable=true
+cpu_launcher_args=--disable_numactl
+```
+
+Launcher by default uses only non-hyperthreaded cores if hyperthreading is present, to avoid sharing of core compute resources. To use launcher with all cores, both physical and logical, add the following lines in `config.properties`.
+```
+ipex_enable=true
+cpu_launcher_enable=true
+cpu_launcher_args=--use_logical_core
+```
+
+Below is an example of passing multiple args to `cpu_launcher_args`.
+```
+ipex_enable=true
+cpu_launcher_enable=true
+cpu_launcher_args=--use_logical_core --disable_numactl
+```
+
+Some useful `cpu_launcher_args` to note are:
+1. Memory Allocator: [ PTMalloc `--use_default_allocator` | *TCMalloc `--enable_tcmalloc`* | JeMalloc `--enable_jemalloc`]
+   * PyTorch by default uses PTMalloc. TCMalloc/JeMalloc generally give better performance.
+2. OpenMP library: [GNU OpenMP `--disable_iomp` | *Intel OpenMP*]
+   * PyTorch by default uses GNU OpenMP. Launcher by default uses Intel OpenMP, which generally gives better performance.
+3. Socket id: [`--socket_id`]
+   * Launcher by default uses all physical cores. Use `--socket_id` to limit memory access to the local memory of the Nth socket and avoid Non-Uniform Memory Access (NUMA) penalties.
+
+Please refer to [here](https://github.com/intel/intel-extension-for-pytorch/blob/master/docs/tutorials/performance_tuning/launch_script.md) for a full list of tunable configurations of launcher.
+
 
 ## Creating and Exporting INT8 model for IPEX
 Intel Extension for PyTorch supports both eager and torchscript mode. In this section, we show how to deploy INT8 model for IPEX.
````
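The launcher flags added in the hunk above can be combined. For example, a plausible `config.properties` that pins the workload to socket 0 and switches to TCMalloc (illustrative only; it assumes TCMalloc is installed, and exact flag syntax is documented in the launch-script guide linked above) might look like:

```
ipex_enable=true
cpu_launcher_enable=true
cpu_launcher_args=--enable_tcmalloc --socket_id 0
```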
````diff
@@ -30,11 +75,10 @@ First create `.pt` serialized file using IPEX INT8 inference. Here we show two e
 #### BERT
 
 ```
+import torch
 import intel_extension_for_pytorch as ipex
-from transformers import AutoModelForSequenceClassification, AutoConfig
 import transformers
-from datasets import load_dataset
-import torch
+from transformers import AutoModelForSequenceClassification, AutoConfig
 
 # load the model
 config = AutoConfig.from_pretrained(
````
````diff
@@ -43,99 +87,101 @@ model = AutoModelForSequenceClassification.from_pretrained(
     "bert-base-uncased", config=config)
 model = model.eval()
 
-max_length = 384
-dummy_tensor = torch.ones((1, max_length), dtype=torch.long)
-jit_inputs = (dummy_tensor, dummy_tensor, dummy_tensor)
-conf = ipex.quantization.QuantConf(qscheme=torch.per_tensor_affine)
-
+# define dummy input tensor to use for the model's forward call to record operations in the model for tracing
+N, max_length = 1, 384
+dummy_tensor = torch.ones((N, max_length), dtype=torch.long)
 
 # calibration
-n_iter = 100
+# ipex supports two quantization schemes to be used for activation: torch.per_tensor_affine and torch.per_tensor_symmetric
+# default qscheme is torch.per_tensor_affine
+conf = ipex.quantization.QuantConf(qscheme=torch.per_tensor_affine)
+n_iter = 100
 with torch.no_grad():
     for i in range(n_iter):
         with ipex.quantization.calibrate(conf):
             model(dummy_tensor, dummy_tensor, dummy_tensor)
 
-# optionally save the configuraiton for later use
-conf.save(‘model_conf.json’, default_recipe=True)
+# optionally save the configuration for later use
+# save:
+# conf.save("model_conf.json")
+# load:
+# conf = ipex.quantization.QuantConf("model_conf.json")
 
 # conversion
+jit_inputs = (dummy_tensor, dummy_tensor, dummy_tensor)
 model = ipex.quantization.convert(model, conf, jit_inputs)
 
+# enable fusion path work (need to run forward propagation twice)
+with torch.no_grad():
+    y = model(dummy_tensor, dummy_tensor, dummy_tensor)
+    y = model(dummy_tensor, dummy_tensor, dummy_tensor)
+
 # save to .pt
 torch.jit.save(model, 'bert_int8_jit.pt')
 ```
 
 #### ResNet50
 
 ```
-import intel_extension_for_pytorch as ipex
-import torchvision.models as models
 import torch
 import torch.fx.experimental.optimization as optimization
-from copy import deepcopy
-
+import intel_extension_for_pytorch as ipex
+import torchvision.models as models
 
+# load the model
 model = models.resnet50(pretrained=True)
 model = model.eval()
+model = optimization.fuse(model)
 
-C, H, W = 3, 224, 224
-dummy_tensor = torch.randn(1, C, H, W).contiguous(memory_format=torch.channels_last)
-jit_inputs = (dummy_tensor)
-conf = ipex.quantization.QuantConf(qscheme=torch.per_tensor_symmetric)
+# define dummy input tensor to use for the model's forward call to record operations in the model for tracing
+N, C, H, W = 1, 3, 224, 224
+dummy_tensor = torch.randn(N, C, H, W).contiguous(memory_format=torch.channels_last)
 
-n_iter = 100
+# calibration
+# ipex supports two quantization schemes to be used for activation: torch.per_tensor_affine and torch.per_tensor_symmetric
+# default qscheme is torch.per_tensor_affine
+conf = ipex.quantization.QuantConf(qscheme=torch.per_tensor_symmetric)
+n_iter = 100
 with torch.no_grad():
-    for i in range(n_iter):
-        with ipex.quantization.calibrate(conf):
-            model(dummy_tensor)
+    for i in range(n_iter):
+        with ipex.quantization.calibrate(conf):
+            model(dummy_tensor)
+
+# optionally save the configuration for later use
+# save:
+# conf.save("model_conf.json")
+# load:
+# conf = ipex.quantization.QuantConf("model_conf.json")
 
+# conversion
+jit_inputs = (dummy_tensor)
 model = ipex.quantization.convert(model, conf, jit_inputs)
+
+# enable fusion path work (need to run two iterations)
+with torch.no_grad():
+    y = model(dummy_tensor)
+    y = model(dummy_tensor)
+
+# save to .pt
 torch.jit.save(model, 'rn50_int8_jit.pt')
 ```
+
 ### 2. Creating a Model Archive
 Once the serialized file (`.pt`) is created, it can be used with `torch-model-archiver` as usual. Use the following command to package the model.
 ```
 torch-model-archiver --model-name rn50_ipex_int8 --version 1.0 --serialized-file rn50_int8_jit.pt --handler image_classifier
 ```
-### 3. Start Torchserve to serve the model
-Make sure to set `ipex_enable=true` in `config.properties`. Use the following command to start Torchserve with IPEX.
+### 3. Start TorchServe to serve the model
+Make sure to set `ipex_enable=true` in `config.properties`. Use the following command to start TorchServe with IPEX.
 ```
 torchserve --start --ncs --model-store model_store --ts-config config.properties
 ```
 
 ### 4. Registering and Deploying model
 Registering and deploying the model follows the same steps shown [here](https://pytorch.org/serve/use_cases.html).
 
-## Torchserve with Launcher
-Launcher is a script to automate the process of tunining configuration setting on intel hardware to boost performance. Tuning configurations such as OMP_NUM_THREADS, thread affininty, memory allocator can have a dramatic effect on performance. Please refer to [here](https://github.com/intel/intel-extension-for-pytorch/blob/master/docs/tutorials/performance_tuning/tuning_guide.md) and [here](https://github.com/intel/intel-extension-for-pytorch/blob/master/docs/tutorials/performance_tuning/tuning_guide.md) for details on performance tuning with launcher.
-
-All it needs to be done to use Torchserve with launcher is to set its configuration in `config.properties`.
-
-
-Add the following lines in `config.properties` to use launcher with its default configuration.
-```
-ipex_enable=true
-cpu_launcher_enable=true
-```
-
-Launcher by default uses `numactl` if its installed to ensure socket is pinned and thus memory is allocated from local numa node. To use launcher without numactl, add the following lines in `config.properties`.
-```
-ipex_enable=true
-cpu_launcher_enable=true
-cpu_launcher_args=--disable_numactl
-```
-
-Launcher by default uses only non-hyperthreaded cores if hyperthreading is present to avoid core compute resource sharing. To use launcher with all cores, both physical and logical, add the following lines in `config.properties`.
-```
-ipex_enable=true
-cpu_launcher_enable=true
-cpu_launcher_args=--use_logical_core
-```
-Please refer to [here](https://github.com/intel/intel-extension-for-pytorch/blob/master/docs/tutorials/performance_tuning/launch_script.md) for a full list of tunable configuration of launcher.
-
 ## Benchmarking with Launcher
-Launcher can be used with Torchserve official [benchmark](https://github.com/pytorch/serve/tree/master/benchmarks) to launch server and benchmark requests with optimal configuration on Intel hardware.
+Launcher can be used with TorchServe official [benchmark](https://github.com/pytorch/serve/tree/master/benchmarks) to launch server and benchmark requests with optimal configuration on Intel hardware.
 
 In this section we provide examples of benchmarking with launcher with its default configuration.
 
````
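As a quick sanity check before archiving, the saved TorchScript module can be reloaded and run on a dummy input of the same shape used for tracing. A minimal sketch following the ResNet50 example above (importing `intel_extension_for_pytorch` is assumed to be needed so that the IPEX ops referenced by the traced graph are registered):

```python
import torch
import intel_extension_for_pytorch as ipex  # assumed: registers IPEX ops used by the traced graph

# Reload the serialized INT8 TorchScript module produced above.
model = torch.jit.load('rn50_int8_jit.pt')
model.eval()

# Same shape and memory format as the tracing input.
dummy_tensor = torch.randn(1, 3, 224, 224).contiguous(memory_format=torch.channels_last)
with torch.no_grad():
    output = model(dummy_tensor)
print(output.shape)  # expected: torch.Size([1, 1000]), ImageNet logits
```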