Commit 9acd4a0 (1 parent: 9fceb32)

Intel Modin Getting Started Sample. Readme edits. (oneapi-src#892)

- Used a new template.
- Edited the text.
- Added information on how to create a kernel to run the notebook in the DevCloud.

File tree: AI-and-Analytics/Getting-Started-Samples/IntelModin_GettingStarted

1 file changed: 145 additions, 98 deletions
# Intel® Modin* Get Started Sample

This get started sample code shows how to use distributed Pandas with the Intel® Distribution of Modin* package. It demonstrates how to use software products that can be found in the [Intel® oneAPI AI Analytics Toolkit](https://software.intel.com/content/www/us/en/develop/tools/oneapi/ai-analytics-toolkit.html).

| Property | Description
| :--- | :---
| Category | Get started sample
| What you will learn | Basic Intel® Distribution of Modin* programming model for Intel processors
| Time to complete | 5-8 minutes

## Purpose

Intel® Distribution of Modin* uses Ray or Dask to provide an effortless way to speed up your Pandas notebooks, scripts, and libraries. Unlike other distributed DataFrame libraries, Intel® Distribution of Modin* provides seamless integration and compatibility with existing Pandas code.

In this sample, you will run Pandas functions accelerated by Intel® Distribution of Modin* and note the performance gain compared to stock (standard) Pandas functions.
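The comparison described above can be sketched as follows. This is a hypothetical illustration, not the sample's own code: the array size is arbitrary, timings vary by machine, and the sketch falls back to timing only stock Pandas when Modin is not installed.

```python
# Hypothetical sketch of the stock-vs-Modin comparison; not the sample's code.
import time

import numpy as np
import pandas as pd  # stock Pandas

try:
    import modin.pandas as mpd  # accelerated drop-in replacement, if installed
except ImportError:
    mpd = None

# An arbitrary workload: column means over a random integer table.
data = np.random.randint(0, 100, size=(100_000, 8))

start = time.perf_counter()
stock_mean = pd.DataFrame(data).mean()
stock_secs = time.perf_counter() - start
print(f"stock Pandas mean(): {stock_secs:.4f}s")

if mpd is not None:
    start = time.perf_counter()
    modin_mean = mpd.DataFrame(data).mean()
    modin_secs = time.perf_counter() - start
    print(f"Modin mean(): {modin_secs:.4f}s")
```

The notebook in this sample makes the same kind of measurement over larger datasets, where the distributed backend has more room to pay off.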

| Optimized for | Description
| :--- | :---
| OS | 64-bit Linux: Ubuntu 18.04 or higher
| Hardware | Intel® Atom® processors; Intel® Core™ processor family; Intel® Xeon® processor family; Intel® Xeon® Scalable Performance processor family
| Software | Intel® Distribution of Modin*, Intel® oneAPI AI Analytics Toolkit

## Key Implementation Details

This get started sample code is implemented for CPU using the Python language. The example assumes you have Pandas and Modin installed inside a conda environment.
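The "drop-in" programming model amounts to changing a single import line. A minimal sketch (the DataFrame contents are arbitrary, and the code falls back to stock Pandas when Modin is absent so it runs in either environment):

```python
# Minimal sketch of the drop-in model: only the import line differs.
import numpy as np

try:
    import modin.pandas as pd  # accelerated drop-in replacement
except ImportError:
    import pandas as pd  # fall back to stock Pandas if Modin is absent

# The same Pandas code runs unchanged on either backend.
df = pd.DataFrame(np.arange(12).reshape(4, 3), columns=["a", "b", "c"])
col_sums = df.sum()
print(int(col_sums["a"]), int(col_sums["b"]), int(col_sums["c"]))  # prints: 18 22 26
```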

## Environment Setup

1. Install Intel® Distribution of Modin* in a new conda environment.

   <!-- As of right now, you can install Intel Distribution of Modin only via Anaconda. -->

   ``` bash
   conda create -n aikit-modin
   conda activate aikit-modin
   conda install modin-all -c intel -y
   ```

   <!-- You can refer to the oneAPI [main page](https://software.intel.com/en-us/oneapi) for toolkit installation and the Toolkit [Getting Started Guide for Linux](https://software.intel.com/en-us/get-started-with-intel-oneapi-linux-get-started-with-the-intel-ai-analytics-toolkit) for post-installation steps and scripts. -->

2. Install matplotlib.

   ``` bash
   conda install -c intel matplotlib -y
   ```

3. Install Jupyter Notebook. Skip this step if you are working on the Intel DevCloud.

   ``` bash
   conda install jupyter nb_conda_kernels -y
   ```

4. Create a new kernel for Jupyter Notebook based on your activated conda environment. This step is optional if you plan to open the notebook on your local server.

   ``` bash
   conda install ipykernel
   python -m ipykernel install --user --name usr_modin
   ```

## Run the Sample<a name="running-the-sample"></a>

You can run the Jupyter notebook with the sample code on your local server, or download the sample code from the notebook as a Python file and run it locally or on the Intel DevCloud.

**Note:** You can run this sample on the Intel DevCloud using the Dask and OmniSci engine backends for Modin. To learn how to set the engine backend for Intel® Distribution of Modin*, see the [Intel® Distribution of Modin Getting Started Guide](https://www.intel.com/content/www/us/en/developer/articles/technical/intel-distribution-of-modin-getting-started-guide.html). The Ray backend cannot be used on the Intel DevCloud at this time.

### Run the Sample in Jupyter Notebook<a name="run-as-jupyter-notebook"></a>

To open the Jupyter notebook on your local server:

1. Activate the conda environment.

   ``` bash
   conda activate aikit-modin
   ```

2. Start the Jupyter notebook server.

   ``` bash
   jupyter notebook
   ```

3. Open the ``IntelModin_GettingStarted.ipynb`` file in the Notebook Dashboard.

4. Run the cells in the Jupyter notebook sequentially by clicking the **Run** button.

   ![Click the Run button in Jupyter Notebook](Jupyter_Run.jpg "Run button in Jupyter Notebook")

### Run the Sample in the Intel® DevCloud for oneAPI JupyterLab

1. Open the following link in your browser: https://jupyter.oneapi.devcloud.intel.com/

2. In the Notebook Dashboard, navigate to the ``IntelModin_GettingStarted.ipynb`` file and open it.

   **Important:** You must edit the cell that imports modin to enable the Dask or OmniSci backend engine. The Ray backend cannot be used on the Intel DevCloud at this time. For more information, see the [Intel® Distribution of Modin Getting Started Guide](https://www.intel.com/content/www/us/en/developer/articles/technical/intel-distribution-of-modin-getting-started-guide.html).

3. To change the kernel, click **Kernel** > **Change kernel** > **usr_modin**.

4. Run the sample code and read the explanations in the notebook.
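One documented way to select Modin's execution engine is the ``MODIN_ENGINE`` environment variable. A minimal sketch of editing the import cell for the Dask backend (the variable must be set before ``modin.pandas`` is first imported, so the Modin import is shown commented out for context):

```python
import os

# Select the Dask execution engine; "ray" is the alternative where supported.
# This must be set before modin.pandas is imported for the first time.
os.environ["MODIN_ENGINE"] = "dask"

# import modin.pandas as pd  # later imports pick up the engine setting
```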

### Run the Python Script Locally

1. Convert ``IntelModin_GettingStarted.ipynb`` to a Python file in one of the following ways:

   - Open the notebook in Jupyter and download it as a Python file. See the image from the daal4py Hello World sample:

     ![Download as a Python script in Jupyter Notebook](Jupyter_Save_Py.jpg "Download as Python script in the Jupyter Notebook")

   - Run the following command to convert the notebook file to a Python script:

     ``` bash
     jupyter nbconvert --to python IntelModin_GettingStarted.ipynb
     ```

2. Run the Python script.

   ``` bash
   ipython IntelModin_GettingStarted.py
   ```

### Run the Sample on the Intel&reg; DevCloud in Batch Mode<a name="run-samples-on-devcloud"></a>

This sample runs in batch mode, so you must have a script for batch processing.

1. Convert ``IntelModin_GettingStarted.ipynb`` to a Python file.

   ``` bash
   jupyter nbconvert --to python IntelModin_GettingStarted.ipynb
   ```

2. Create a shell script file ``run-modin-sample.sh`` that activates the conda environment and runs the sample.

   ``` bash
   source activate aikit-modin
   ipython IntelModin_GettingStarted.py
   ```

3. Submit a job that requests a compute node to run the sample code.

   ``` bash
   qsub -l nodes=1:xeon:ppn=2 -d . run-modin-sample.sh -o output.txt
   ```

   The ``-o output.txt`` option redirects the output of the script to the ``output.txt`` file.

<details>
<summary>Click here for additional information about requesting a compute node in the Intel DevCloud.</summary>

In order to run a script on the DevCloud, you need to request a compute node using node properties such as `gpu`, `xeon`, `fpga_compile`, and `fpga_runtime`. For more information about the node properties, execute the `pbsnodes` command.

This node information must be provided when submitting a job to run your sample in batch mode using the qsub command. When you see the qsub command in the Run section of the [Hello World instructions](https://devcloud.intel.com/oneapi/get_started/aiAnalyticsToolkitSamples/), change the command to fit the node you are using. Nodes shown in bold are compatible with this sample:

<!---Mark each compatible Node in BOLD-->
| Node              | Command                                                 |
|-------------------|---------------------------------------------------------|
| GPU               | qsub -l nodes=1:gpu:ppn=2 -d . hello-world.sh           |
| **CPU**           | **qsub -l nodes=1:xeon:ppn=2 -d . hello-world.sh**      |
| FPGA Compile Time | qsub -l nodes=1:fpga\_compile:ppn=2 -d . hello-world.sh |
| FPGA Runtime      | qsub -l nodes=1:fpga\_runtime:ppn=2 -d . hello-world.sh |
</details>

### Run the Sample in Visual Studio Code*

You can use Visual Studio Code (VS Code) extensions to set your environment, create launch configurations, and browse and download samples.

The basic steps to build and run a sample using VS Code include:

1. Download a sample using the extension **Code Sample Browser for Intel&reg; oneAPI Toolkits**.

2. Configure the oneAPI environment with the extension **Environment Configurator for Intel&reg; oneAPI Toolkits**.

3. Open a Terminal in VS Code by clicking **Terminal** > **New Terminal**.

4. Run the sample in the VS Code terminal using the instructions in this readme.

On Linux, you can debug your GPU application with GDB for Intel® oneAPI toolkits using the **Generate Launch Configurations** extension.

To learn more about the extensions, see [Using Visual Studio Code with Intel® oneAPI Toolkits](https://software.intel.com/content/www/us/en/develop/documentation/using-vs-code-with-intel-oneapi/top.html).

After learning how to use the extensions for Intel oneAPI Toolkits, return to this readme for instructions on how to build and run a sample.
### Expected Printed Output

Expected cell output is shown in ``IntelModin_GettingStarted.ipynb``.

## Related Samples

Several sample programs are available for you to try, many of which can be compiled and run in a similar fashion. Experiment with running the various samples on different kinds of compute nodes, or adjust their source code to experiment with different workloads.

## License

Code samples are licensed under the MIT license. See [License.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/License.txt) for details.

Third-party program licenses can be found here: [third-party-programs.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/third-party-programs.txt).
