|
239 | 239 | "metadata": {},
|
240 | 240 | "source": [
|
241 | 241 | "## Training and Evaluating Models\n",
|
242 | | - "In this section, you will choose 3 supervised learning models that are appropriate for this problem and available in `scikit-learn`. You will first discuss the reasoning behind choosing these three models by considering what you know about the data and each model's strengths and weaknesses. You will then fit the model to varying sizes of training data (100 data points, 200 data points, and 300 data points) and measure the F<sub>1</sub> score. You will need to produce three tables (one for each model) that shows the training set size, training time, prediction time, F<sub>1</sub> score on the training set, and F<sub>1</sub> score on the testing set." |
| 242 | + "In this section, you will choose 3 supervised learning models that are appropriate for this problem and available in `scikit-learn`. You will first discuss the reasoning behind choosing these three models by considering what you know about the data and each model's strengths and weaknesses. You will then fit the model to varying sizes of training data (100 data points, 200 data points, and 300 data points) and measure the F<sub>1</sub> score. You will need to produce three tables (one for each model) that shows the training set size, training time, prediction time, F<sub>1</sub> score on the training set, and F<sub>1</sub> score on the testing set.\n", |
| 243 | + "\n", |
| 244 | + "**The following supervised learning models are currently available in** [`scikit-learn`](http://scikit-learn.org/stable/supervised_learning.html) **that you may choose from:**\n", |
| 245 | + "- Gaussian Naive Bayes (GaussianNB)\n", |
| 246 | + "- Decision Trees\n", |
| 247 | + "- Ensemble Methods (Bagging, AdaBoost, Random Forest, Gradient Boosting)\n", |
| 248 | + "- K-Nearest Neighbors (KNeighbors)\n", |
| 249 | + "- Stochastic Gradient Descent (SGDC)\n", |
| 250 | + "- Support Vector Machines (SVM)\n", |
| 251 | + "- Logistic Regression" |
243 | 252 | ]
|
244 | 253 | },
|
245 | 254 | {
|
246 | 255 | "cell_type": "markdown",
|
247 | 256 | "metadata": {},
|
248 | 257 | "source": [
|
249 | 258 | "### Question 2 - Model Application\n",
|
250 | | - "*List three supervised learning models that are appropriate for this problem. What are the general applications of each model? What are their strengths and weaknesses? Given what you know about the data, why did you choose these models to be applied?*" |
| 259 | + "*List three supervised learning models that are appropriate for this problem. For each model chosen*\n", |
| 260 | + "- Describe one real-world application in industry where the model can be applied. *(You may need to do a small bit of research for this — give references!)* \n", |
| 261 | + "- What are the strengths of the model; when does it perform well? \n", |
| 262 | + "- What are the weaknesses of the model; when does it perform poorly?\n", |
| 263 | + "- What makes this model a good candidate for the problem, given what you know about the data?" |
251 | 264 | ]
|
252 | 265 | },
|
253 | 266 | {
|
|
413 | 426 | "cell_type": "markdown",
|
414 | 427 | "metadata": {},
|
415 | 428 | "source": [
|
416 | | - "### Question 3 - Chosing the Best Model\n", |
| 429 | + "### Question 3 - Choosing the Best Model\n", |
417 | 430 | "*Based on the experiments you performed earlier, in one to two paragraphs, explain to the board of supervisors what single model you chose as the best model. Which model is generally the most appropriate based on the available data, limited resources, cost, and performance?*"
|
418 | 431 | ]
|
419 | 432 | },
|
|
429 | 442 | "metadata": {},
|
430 | 443 | "source": [
|
431 | 444 | "### Question 4 - Model in Layman's Terms\n",
|
432 | | - "*In one to two paragraphs, explain to the board of directors in layman's terms how the final model chosen is supposed to work. For example if you've chosen to use a decision tree or a support vector machine, how does the model go about making a prediction?*" |
| 445 | + "*In one to two paragraphs, explain to the board of directors in layman's terms how the final model chosen is supposed to work. Be sure that you are describing the major qualities of the model, such as how the model is trained and how the model makes a prediction. Avoid using advanced mathematical or technical jargon, such as describing equations or discussing the algorithm implementation.*" |
433 | 446 | ]
|
434 | 447 | },
|
435 | 448 | {
|
|
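The evaluation procedure described in the updated notebook text (fit each model on 100, 200, and 300 training points, then record training time, prediction time, and F<sub>1</sub> on the training and testing sets) could be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the notebook's actual code: `f1_score_binary`, `MajorityClassifier`, `evaluate`, the `"yes"`/`"no"` labels, and the synthetic data are all assumptions made for the example. Any estimator exposing `fit` and `predict`, as scikit-learn classifiers do, could be passed in place of the stand-in model.

```python
import time


def f1_score_binary(y_true, y_pred, pos_label="yes"):
    """F1 = 2 * precision * recall / (precision + recall) for one positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == pos_label and p == pos_label)
    fp = sum(1 for t, p in pairs if t != pos_label and p == pos_label)
    fn = sum(1 for t, p in pairs if t == pos_label and p != pos_label)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


class MajorityClassifier:
    """Hypothetical stand-in model: always predicts the most common training label."""

    def fit(self, X, y):
        self.label_ = max(set(y), key=y.count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


def evaluate(clf, X_train, y_train, X_test, y_test, sizes=(100, 200, 300)):
    """Return one result row per training-set size for a single model."""
    rows = []
    for n in sizes:
        X_sub, y_sub = X_train[:n], y_train[:n]

        start = time.perf_counter()
        clf.fit(X_sub, y_sub)
        train_time = time.perf_counter() - start

        start = time.perf_counter()
        test_pred = clf.predict(X_test)
        predict_time = time.perf_counter() - start

        rows.append({
            "size": n,
            "train_time": train_time,
            "predict_time": predict_time,
            "f1_train": f1_score_binary(y_sub, clf.predict(X_sub)),
            "f1_test": f1_score_binary(y_test, test_pred),
        })
    return rows


# Synthetic data (an assumption): 395 samples, roughly two thirds labelled "yes".
X = [[i] for i in range(395)]
y = ["yes" if i % 3 else "no" for i in range(395)]
X_train, y_train = X[:300], y[:300]
X_test, y_test = X[300:], y[300:]

rows = evaluate(MajorityClassifier(), X_train, y_train, X_test, y_test)

# One table per model: print the rows collected for this model.
print("size  train_time  predict_time  f1_train  f1_test")
for r in rows:
    print(f"{r['size']:>4}  {r['train_time']:.6f}    {r['predict_time']:.6f}      "
          f"{r['f1_train']:.4f}    {r['f1_test']:.4f}")
```

Calling `evaluate` once per candidate model yields the three tables the section asks for. Timing the `fit` and `predict` calls separately keeps training cost and prediction cost distinguishable when comparing models.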