Commit 9acd4a0 (1 parent: 9fceb32)

Intel Modin Getting Started Sample. Readme edits. (oneapi-src#892)

- Used a new template.
- Edited the text.
- Added information on how to create a kernel to run the notebook in the DevCloud.

File tree: AI-and-Analytics/Getting-Started-Samples/IntelModin_GettingStarted

1 file changed: 145 additions, 98 deletions
# Intel® Modin* Get Started Sample

This get started sample code shows how to use distributed Pandas with the Intel® Distribution of Modin* package. It demonstrates how to use software products that can be found in the [Intel® oneAPI AI Analytics Toolkit](https://software.intel.com/content/www/us/en/develop/tools/oneapi/ai-analytics-toolkit.html).

| Property | Description
| :--- | :---
| Category | Get started sample
| What you will learn | Basic Intel® Distribution of Modin* programming model for Intel processors
| Time to complete | 5-8 minutes

## Purpose

Intel® Distribution of Modin* uses Ray or Dask to provide an effortless way to speed up your Pandas notebooks, scripts, and libraries. Unlike other distributed DataFrame libraries, Intel® Distribution of Modin* provides seamless integration and compatibility with existing Pandas code.

In this sample, you will run Pandas functions accelerated by Intel® Distribution of Modin* and note the performance gain compared to stock (standard) Pandas functions.
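The comparison described above can be sketched as follows. This is a hypothetical illustration, not the sample's own code: the array size is arbitrary, timings vary by machine, and the sketch falls back to timing only stock Pandas when Modin is not installed.

```python
# Hypothetical sketch of the stock-vs-Modin comparison; not the sample's code.
import time

import numpy as np
import pandas as pd  # stock Pandas

try:
    import modin.pandas as mpd  # accelerated drop-in replacement, if installed
except ImportError:
    mpd = None

# An arbitrary workload: column means over a random integer table.
data = np.random.randint(0, 100, size=(100_000, 8))

start = time.perf_counter()
stock_mean = pd.DataFrame(data).mean()
stock_secs = time.perf_counter() - start
print(f"stock Pandas mean(): {stock_secs:.4f}s")

if mpd is not None:
    start = time.perf_counter()
    modin_mean = mpd.DataFrame(data).mean()
    modin_secs = time.perf_counter() - start
    print(f"Modin mean(): {modin_secs:.4f}s")
```

The notebook in this sample makes the same kind of measurement over larger datasets, where the distributed backend has more room to pay off.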

| Optimized for | Description
| :--- | :---
| OS | 64-bit Linux: Ubuntu 18.04 or higher
| Hardware | Intel® Atom® processors; Intel® Core™ processor family; Intel® Xeon® processor family; Intel® Xeon® Scalable Performance processor family
| Software | Intel® Distribution of Modin*, Intel® oneAPI AI Analytics Toolkit

## Key Implementation Details

This get started sample code is implemented for CPU using the Python language. The example assumes you have Pandas and Modin installed inside a conda environment.
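The "drop-in" programming model amounts to changing a single import line. A minimal sketch (the DataFrame contents are arbitrary, and the code falls back to stock Pandas when Modin is absent so it runs in either environment):

```python
# Minimal sketch of the drop-in model: only the import line differs.
import numpy as np

try:
    import modin.pandas as pd  # accelerated drop-in replacement
except ImportError:
    import pandas as pd  # fall back to stock Pandas if Modin is absent

# The same Pandas code runs unchanged on either backend.
df = pd.DataFrame(np.arange(12).reshape(4, 3), columns=["a", "b", "c"])
col_sums = df.sum()
print(int(col_sums["a"]), int(col_sums["b"]), int(col_sums["c"]))  # prints: 18 22 26
```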

## Environment Setup

1. Install Intel® Distribution of Modin* in a new conda environment.

   <!-- As of right now, you can install Intel Distribution of Modin only via Anaconda. -->

   ``` bash
   conda create -n aikit-modin
   conda activate aikit-modin
   conda install modin-all -c intel -y
   ```

   <!-- You can refer to the oneAPI [main page](https://software.intel.com/en-us/oneapi) for toolkit installation and the Toolkit [Getting Started Guide for Linux](https://software.intel.com/en-us/get-started-with-intel-oneapi-linux-get-started-with-the-intel-ai-analytics-toolkit) for post-installation steps and scripts. -->

2. Install matplotlib.

   ``` bash
   conda install -c intel matplotlib -y
   ```

3. Install Jupyter Notebook. Skip this step if you are working on the Intel DevCloud.

   ``` bash
   conda install jupyter nb_conda_kernels -y
   ```

4. Create a new kernel for Jupyter Notebook based on your activated conda environment. This step is optional if you plan to open the notebook on your local server.

   ``` bash
   conda install ipykernel
   python -m ipykernel install --user --name usr_modin
   ```

## Run the Sample<a name="running-the-sample"></a>

You can run the Jupyter notebook with the sample code on your local server, or download the sample code from the notebook as a Python file and run it locally or on the Intel DevCloud.

**Note:** You can run this sample on the Intel DevCloud using the Dask and OmniSci engine backends for Modin. To learn how to set the engine backend for Intel® Distribution of Modin*, see the [Intel® Distribution of Modin Getting Started Guide](https://www.intel.com/content/www/us/en/developer/articles/technical/intel-distribution-of-modin-getting-started-guide.html). The Ray backend cannot be used on the Intel DevCloud at this time.

### Run the Sample in Jupyter Notebook<a name="run-as-jupyter-notebook"></a>

To open the Jupyter notebook on your local server:

1. Activate the conda environment.

   ``` bash
   conda activate aikit-modin
   ```

2. Start the Jupyter notebook server.

   ``` bash
   jupyter notebook
   ```

3. Open the ``IntelModin_GettingStarted.ipynb`` file in the Notebook Dashboard.

4. Run the cells in the Jupyter notebook sequentially by clicking the **Run** button.

   ![Click the Run button in Jupyter Notebook](Jupyter_Run.jpg "Run button in Jupyter Notebook")

### Run the Sample in the Intel® DevCloud for oneAPI JupyterLab

1. Open the following link in your browser: https://jupyter.oneapi.devcloud.intel.com/

2. In the Notebook Dashboard, navigate to the ``IntelModin_GettingStarted.ipynb`` file and open it.

   **Important:** You must edit the cell that imports modin to enable the Dask or OmniSci backend engine. The Ray backend cannot be used on the Intel DevCloud at this time. For more information, see the [Intel® Distribution of Modin Getting Started Guide](https://www.intel.com/content/www/us/en/developer/articles/technical/intel-distribution-of-modin-getting-started-guide.html).

3. To change the kernel, click **Kernel** > **Change kernel** > **usr_modin**.

4. Run the sample code and read the explanations in the notebook.
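One documented way to select Modin's execution engine is the ``MODIN_ENGINE`` environment variable. A minimal sketch of editing the import cell for the Dask backend (the variable must be set before ``modin.pandas`` is first imported, so the Modin import is shown commented out for context):

```python
import os

# Select the Dask execution engine; "ray" is the alternative where supported.
# This must be set before modin.pandas is imported for the first time.
os.environ["MODIN_ENGINE"] = "dask"

# import modin.pandas as pd  # later imports pick up the engine setting
```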

### Run the Python Script Locally

1. Convert ``IntelModin_GettingStarted.ipynb`` to a Python file in one of the following ways:

   - Open the notebook in Jupyter and download it as a Python file. See the image from the daal4py Hello World sample:

     ![Download as a Python script in Jupyter Notebook](Jupyter_Save_Py.jpg "Download as Python script in the Jupyter Notebook")

   - Run the following command to convert the notebook file to a Python script:

     ``` bash
     jupyter nbconvert --to python IntelModin_GettingStarted.ipynb
     ```

2. Run the Python script.

   ``` bash
   ipython IntelModin_GettingStarted.py
   ```

### Run the Sample on the Intel&reg; DevCloud in Batch Mode<a name="run-samples-on-devcloud"></a>

This sample runs in batch mode, so you must have a script for batch processing.

1. Convert ``IntelModin_GettingStarted.ipynb`` to a Python file.

   ``` bash
   jupyter nbconvert --to python IntelModin_GettingStarted.ipynb
   ```

2. Create a shell script file ``run-modin-sample.sh`` that activates the conda environment and runs the sample.

   ``` bash
   source activate aikit-modin
   ipython IntelModin_GettingStarted.py
   ```

3. Submit a job that requests a compute node to run the sample code.

   ``` bash
   qsub -l nodes=1:xeon:ppn=2 -d . run-modin-sample.sh -o output.txt
   ```

   The ``-o output.txt`` option redirects the output of the script to the ``output.txt`` file.

<details>
<summary>Click here for additional information about requesting a compute node in the Intel DevCloud.</summary>

In order to run a script on the DevCloud, you need to request a compute node using node properties such as `gpu`, `xeon`, `fpga_compile`, and `fpga_runtime`. For more information about the node properties, execute the `pbsnodes` command.

This node information must be provided when submitting a job to run your sample in batch mode using the qsub command. When you see the qsub command in the Run section of the [Hello World instructions](https://devcloud.intel.com/oneapi/get_started/aiAnalyticsToolkitSamples/), change the command to fit the node you are using. Nodes shown in bold are compatible with this sample:

<!---Mark each compatible Node in BOLD-->
| Node              | Command                                                 |
|-------------------|---------------------------------------------------------|
| GPU               | qsub -l nodes=1:gpu:ppn=2 -d . hello-world.sh           |
| **CPU**           | **qsub -l nodes=1:xeon:ppn=2 -d . hello-world.sh**      |
| FPGA Compile Time | qsub -l nodes=1:fpga\_compile:ppn=2 -d . hello-world.sh |
| FPGA Runtime      | qsub -l nodes=1:fpga\_runtime:ppn=2 -d . hello-world.sh |
</details>

### Run the Sample in Visual Studio Code*

You can use Visual Studio Code (VS Code) extensions to set your environment, create launch configurations, and browse and download samples.

The basic steps to build and run a sample using VS Code include:

1. Download a sample using the extension **Code Sample Browser for Intel&reg; oneAPI Toolkits**.

2. Configure the oneAPI environment with the extension **Environment Configurator for Intel&reg; oneAPI Toolkits**.

3. Open a Terminal in VS Code by clicking **Terminal** > **New Terminal**.

4. Run the sample in the VS Code terminal using the instructions in this readme.

On Linux, you can debug your GPU application with GDB for Intel® oneAPI toolkits using the **Generate Launch Configurations** extension.

To learn more about the extensions, see [Using Visual Studio Code with Intel® oneAPI Toolkits](https://software.intel.com/content/www/us/en/develop/documentation/using-vs-code-with-intel-oneapi/top.html).

After learning how to use the extensions for Intel oneAPI Toolkits, return to this readme for instructions on how to build and run a sample.
### Expected Printed Output

Expected cell output is shown in ``IntelModin_GettingStarted.ipynb``.

## Related Samples

Several sample programs are available for you to try, many of which can be compiled and run in a similar fashion. Experiment with running the various samples on different kinds of compute nodes, or adjust their source code to experiment with different workloads.

## License

Code samples are licensed under the MIT license. See [License.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/License.txt) for details.

Third-party program licenses can be found here: [third-party-programs.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/third-party-programs.txt).
