|
326 | 326 | },
|
327 | 327 | "source": [
|
328 | 328 | "### Question 4\n",
|
329 |
| - "*Are there any data points considered outliers for more than one feature? Should these data points be removed from the dataset? If any data points were added to the `outliers` list to be removed, explain why.* " |
| 329 | + "*Are there any data points considered outliers for more than one feature based on the definition above? Should these data points be removed from the dataset? If any data points were added to the `outliers` list to be removed, explain why.* " |
330 | 330 | ]
|
331 | 331 | },
|
332 | 332 | {
|
|
667 | 667 | },
|
668 | 668 | "source": [
|
669 | 669 | "### Question 10\n",
|
670 |
| - "*Companies often run [A/B tests](https://en.wikipedia.org/wiki/A/B_testing) when making small changes to their products or services. If the wholesale distributor wanted to change its delivery service from 5 days a week to 3 days a week, how would you use the structure of the data to help them decide on a group of customers to test?* \n", |
671 |
| - "**Hint:** Would such a change in the delivery service affect all customers equally? How could the distributor identify who it affects the most?" |
| 670 | + "Companies often run [A/B tests](https://en.wikipedia.org/wiki/A/B_testing) when making small changes to their products or services to determine whether that change affects its customers positively or negatively. The wholesale distributor wants to consider changing its delivery service from 5 days a week to 3 days a week, but will only do so if it affects their customers positively. *How would you use the customer segments you found above to perform an A/B Test for this change?* \n", |
| 671 | + "**Hint:** Can we assume the change affects all customers equally? How can we determine which group of customers it affects the most?" |
672 | 672 | ]
|
673 | 673 | },
|
674 | 674 | {
|
|
683 | 683 | "metadata": {},
|
684 | 684 | "source": [
|
685 | 685 | "### Question 11\n",
|
686 |
| - "*Assume the wholesale distributor wanted to predict a new feature for each customer based on the purchasing information available. How could the wholesale distributor use the structure of the clustering data you've found to assist a supervised learning analysis?* \n", |
| 686 | + "Additional structure is derived from originally unlabelled data when using clustering techniques. Since each customer has a segment it best identifies with (depending on the clustering algorithm applied), we can consider *'customer segment'* as an **engineered feature** for the data. Assume the wholesale distributor recently acquired ten new customers and has made estimates for each customer's annual spending of the six product categories. Knowing these estimates, the wholesale distributor wants to classify each new customer to one of the customer segments to determine the most appropriate delivery service. \n", |
| 687 | + "*Describe a supervised learning strategy you could use to make classification predictions for the ten new customers.* \n", |
687 | 688 | "**Hint:** What other input feature could the supervised learner use besides the six product features to help make a prediction?"
|
688 | 689 | ]
|
689 | 690 | },
|
|
700 | 701 | "source": [
|
701 | 702 | "### Visualizing Underlying Distributions\n",
|
702 | 703 | "\n",
|
703 |
| - "At the beginning of this project, it was discussed that the `'Channel'` and `'Region'` features would be excluded from the dataset so that the customer product categories were emphasized in the analysis. By reintroducing the `'Channel'` feature to the dataset, an interesting structure emerges when considering the same PCA dimensionality reduction applied earlier on to the original dataset.\n", |
| 704 | + "At the beginning of this project, it was discussed that the `'Channel'` and `'Region'` features would be excluded from the dataset so that the customer product categories were emphasized in the analysis. By reintroducing the `'Channel'` feature to the dataset, an interesting structure emerges when considering the same PCA dimensionality reduction applied earlier to the original dataset.\n", |
704 | 705 | "\n",
|
705 | 706 | "Run the code block below to see how each data point is labeled either `'HoReCa'` (Hotel/Restaurant/Cafe) or `'Retail'` in the reduced space. In addition, you will find the sample points are circled in the plot, which will identify their labeling."
|
706 | 707 | ]
|
|