# AzureML and Data Science Overview Workshop

The purpose of this workshop is to work through a basic end-to-end flow for a data scientist who is new to AzureML: train a model, test it with an endpoint, then experiment to improve the outcome. Work through the scenario with the Studio, the CLI v2, and the SDK where appropriate, and use our docs and samples to figure out the syntax. The scenario has been tested, and you'll see that the product is pretty good overall, though some things are inconsistent and some experiences aren't good. With the exception of some internal private preview features, this is what our customers see.

## 0. Getting Started

You need access to an Azure subscription. If it's a shared subscription, the owner should provide you with a resource group and assign you as its owner, so you can wear an IT hat and create a workspace and the required resources.

This repo [repo link] has the Python scripts and data files you need to get started.

Hint: you need AzureML compute quota in the workspace's region for the VM family you want to use. 4 cores of DSv2 should be fine.

## 1. Train and Test Locally

The first step is to train a model locally to make sure it works. You can do this with a Compute Instance, a DSVM, or your local machine. Linux is easier to set up than Windows. You can work from Visual Studio Code, a terminal window, or a notebook.

Once you've trained a model, try the score.py script to test it using a subset of the training data.

Hints: for local training, the train.py script is set up to use the training-data CSV file in the same directory. It writes the model file to a new folder, deleting an existing folder with the same name if found. The script uses MLflow logging and will log to an AzureML workspace without your having to submit a run. You may need to configure your Python environment with the packages the training script needs.
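The flow described above can be sketched roughly like this. File and column names are assumptions, and the trivial majority-class "model" is only a stand-in; the real train.py in the repo trains an actual model:

```python
# Sketch of a local training flow like the one train.py reportedly follows.
# "flight_delays.csv", "outputs", and the "delayed" column are assumed names.
import csv
import os
import pickle
import shutil

def load_rows(path):
    """Read the training CSV (in the same directory) into a list of dicts."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def train(rows):
    """Stand-in 'model': remember the majority class of the label column."""
    labels = [row["delayed"] for row in rows]
    majority = max(set(labels), key=labels.count)
    return {"majority_class": majority}

def save_model(model, out_dir="outputs"):
    """Write the model file, replacing any existing folder of the same name."""
    if os.path.isdir(out_dir):
        shutil.rmtree(out_dir)  # delete an existing folder if found
    os.makedirs(out_dir)
    with open(os.path.join(out_dir, "model.pkl"), "wb") as f:
        pickle.dump(model, f)
```

The real script additionally logs metrics with MLflow; the folder-replacing behavior above mirrors what the hints describe.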

## 2. Train in the Cloud

Now that you've confirmed the training code works, train the model in the cloud using an AzureML job. Try this from both the Studio and the v2 CLI. Use the train.py and training-data CSV from the repo (but imagine that you're training on petabytes of data).
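With the v2 CLI, a job is described in YAML and submitted with `az ml job create --file job.yml`. A minimal sketch might look like the following; the environment and compute names here are placeholders, so substitute ones that exist in your workspace:

```yaml
# job.yml — minimal command-job sketch (environment/compute names are placeholders)
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: python train.py
code: .
environment: azureml:AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest
compute: azureml:cpu-cluster
experiment_name: flight-delay-training
```

Check the docs for the full command-job schema; the Studio's job-creation wizard produces an equivalent definition.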

## 3. Create a Managed Real-Time Endpoint

After training a model, create a managed real-time endpoint and test it using the sample JSON file in the repo. Try this with both the Studio and the v2 CLI.

You could also write a script or app to use the endpoint.
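For example, here is a small stdlib-only Python sketch that POSTs JSON to a managed online endpoint. The URL, key, and payload shape are placeholders; copy the real values from the endpoint's Consume tab in the Studio:

```python
# Minimal client sketch for a managed online endpoint (URL/key are placeholders).
import json
import urllib.request

def build_request(url, key, payload):
    """Build an authenticated POST request for the endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + key,  # key from the Consume tab
        },
        method="POST",
    )

def invoke(url, key, payload):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(url, key, payload)) as resp:
        return json.loads(resp.read())
```

Call `invoke(scoring_url, api_key, payload)` with the sample JSON from the repo as the payload.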

## 4. Create a Managed Batch Endpoint

Next, create a batch endpoint. There is a scoring CSV in the repo. If you do this in the Studio, you will see that a pipeline was created. Look at the results and see how many flights are predicted to be on-time vs. delayed.
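Counting the predicted classes in the scoring output is a quick stdlib exercise; the column name here is an assumption about the output format, so adjust it to match what your batch run actually writes:

```python
# Tally predicted classes in a batch-scoring output CSV (column name assumed).
import csv
import io
from collections import Counter

def tally_predictions(csv_text, label_col="prediction"):
    """Count how often each predicted class appears."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[label_col] for row in rows)

sample = "prediction\non-time\ndelayed\non-time\n"
print(tally_predictions(sample))  # Counter({'on-time': 2, 'delayed': 1})
```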

## 5. Experiment

Now we get into the science part and experiment with how we could improve the model. Edit the train.py file and change [param] to True.

Train the model again and compare its metrics with the first model's. What do you see?

Test the new model by creating a new batch endpoint or scoring locally. Are the results different? What metrics do you think are important for measuring the quality of this prediction?
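One thing to notice while comparing: for a delay predictor, plain accuracy can look decent even for a model that always says "on-time", so recall on the delayed class is often more telling. A quick sketch (label values assumed):

```python
# Accuracy vs. recall for an imbalanced delay-prediction task (labels assumed).
def accuracy(y_true, y_pred):
    """Share of all predictions that are correct."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive="delayed"):
    """Share of actual delays the model caught."""
    caught = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual = sum(t == positive for t in y_true)
    return caught / actual if actual else 0.0

y_true = ["delayed", "on-time", "delayed", "on-time"]
always_on_time = ["on-time"] * 4
print(accuracy(y_true, always_on_time))  # 0.5 — looks mediocre, but...
print(recall(y_true, always_on_time))    # 0.0 — it catches no delays at all
```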

## 6. Explore the Data

The data-imbalance hyperparameter in LightGBM helps when the training data has more on-time flights than delayed ones, which biases the model toward predicting on-time. This is something a data scientist needs to be aware of.

Try different ways to explore the data: open it in Excel, import it as a tabular dataset, or use Python tools. Think about how we can help with this.
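As one starting point, you can check the label balance directly with the stdlib (the label column name is an assumption about the workshop CSV):

```python
# Measure class imbalance in the training CSV (label column name assumed).
import csv
import io
from collections import Counter

def class_balance(csv_text, label_col="delayed"):
    """Return each label's share of the rows."""
    counts = Counter(row[label_col] for row in csv.DictReader(io.StringIO(csv_text)))
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

sample = "delayed\n0\n0\n0\n1\n"
print(class_balance(sample))  # {'0': 0.75, '1': 0.25}
```

A heavily skewed result here is exactly the situation the imbalance hyperparameter is meant to address.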

## 7. Reflect and Discuss

This was a simple exercise, but it used the breadth of AzureML for ML pros working with Python scripts through the Studio, the CLI, VS Code, and other tools. You had to create a basic workspace (no VNet today), create and use computes, train models, look at metrics, create and test endpoints, and experiment with the training code and data.

Was it easy to get started and figure out what to do? Did you need to look at documentation or samples? Were things consistent across the Studio, the v2 CLI, and the SDK? Did you have to troubleshoot any error messages? Was the Studio experience intuitive? What are you personally going to improve based on what you experienced in this workshop?

## 8. Bonus: Improve the Model

Now think about this as a Kaggle competition. Have some fun and try different ways to improve the model quality. You could look at AutoML, FLAML, or other approaches. Share your best model with the team.
