GitHub - thomasnield/anaconda_intro_to


{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "f378a2a9-fe33-46d2-9b2d-afa90de41663",
   "metadata": {},
   "source": [
    "# Slicing, Reshaping, and Copying"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "876d3274-6813-42f3-95a9-aeccacec06b7",
   "metadata": {},
   "source": [
    "In this section we will cover how to slice, reshape, and copy arrays, along with a few practical patterns that put these operations to work. "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "94ff8a78-498d-4b2c-8bc3-21b22b84d3c6",
   "metadata": {},
   "source": [
    "## Slicing"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b559f935-d1dc-4157-a7c3-38d712045bf6",
   "metadata": {},
   "source": [
    "Just as with Python lists and other sequences, you can slice parts of an array. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "dff91b76-9255-4bb3-9d86-395cf0747687",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np \n",
    "\n",
    "a = np.array([0,10,20,30,40,50])\n",
    "\n",
    "a[2:5]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c89d4721-1e81-44af-b25c-9661d6a71138",
   "metadata": {},
   "source": [
    "The range `2:5` grabs the elements from index 2 up to, but not including, index 5. Another way to think of it is that the indices sit between the elements, as shown below. "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f301ac73-4f41-4826-bfdc-bfd073414c35",
   "metadata": {},
   "source": [
    "![](media/npgmtJPW.svg)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e5036c55-fb5c-4b4d-86d1-3e58629b75de",
   "metadata": {},
   "source": [
    "You can also use negative indices to count from the end. Below I grab everything except the last two elements. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "42d63e44-2c28-4a6d-9b35-7e5f1c8a58be",
   "metadata": {},
   "outputs": [],
   "source": [
    "a[0:-2]"
   ]
  },
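  {
   "cell_type": "markdown",
   "id": "added-slice-bounds-note",
   "metadata": {},
   "source": [
    "As a small illustrative sketch (not part of the original walkthrough), the cell below shows that either bound of a slice can be omitted, and that a negative start counts from the end. The array is redeclared with the same values so the cell runs on its own. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "added-slice-bounds-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "a = np.array([0,10,20,30,40,50])\n",
    "\n",
    "print(a[:3])   # omitted start: indices 0, 1, 2 -> [ 0 10 20]\n",
    "print(a[3:])   # omitted stop: index 3 to the end -> [30 40 50]\n",
    "print(a[-2:])  # negative start: the last two elements -> [40 50]"
   ]
  },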
  {
   "cell_type": "markdown",
   "id": "c22d519f-b1e5-45ec-b5b1-82a68ca2100a",
   "metadata": {},
   "source": [
    "### Slicing in Higher Dimensions"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b9e5111f-c22a-48fd-aa8a-a6d3a81f0b88",
   "metadata": {},
   "source": [
    "A common task in machine learning is separating the input variables from the output variable. Below, we load the wine quality dataset and extract the input variables `X` and the output variable `Y`, knowing that the output is the last column. \n",
    "\n",
    "We can specify a slicing range for each axis, separated by commas. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "198a3745-37e0-4482-a1d1-365f60e5099e",
   "metadata": {},
   "outputs": [],
   "source": [
    "url = r\"https://raw.githubusercontent.com/thomasnield/machine-learning-demo-data/master/regression/winequality-red.csv\" \n",
    "\n",
    "data = np.loadtxt(fname=url, delimiter=\",\", skiprows=1)\n",
    "X = data[:,0:-1]\n",
    "Y = data[:,-1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e8bfbcb5-db83-41cd-a2e8-bace5902210f",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "X"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7bf016f0-6a8a-45d5-b3a4-06ee98b45198",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "Y"
   ]
  },
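  {
   "cell_type": "markdown",
   "id": "added-toy-split-note",
   "metadata": {},
   "source": [
    "To see what the two slices produce without downloading anything, here is a minimal sketch on a made-up three-row array. The values and the `toy` names are hypothetical and are not taken from the wine dataset. Notice that the inputs keep two dimensions, while the single output column collapses to one dimension. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "added-toy-split-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Hypothetical data: two input columns followed by one output column\n",
    "toy = np.array([\n",
    "    [7.4, 0.70, 5],\n",
    "    [7.8, 0.88, 5],\n",
    "    [11.2, 0.28, 6]\n",
    "])\n",
    "\n",
    "toy_X = toy[:, 0:-1]  # all rows, every column except the last\n",
    "toy_Y = toy[:, -1]    # all rows, only the last column\n",
    "\n",
    "print(toy_X.shape)  # (3, 2)\n",
    "print(toy_Y.shape)  # (3,)"
   ]
  },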
  {
   "cell_type": "markdown",
   "id": "5078b043-544c-409a-b5d1-2ab19518b2fd",
   "metadata": {},
   "source": [
    "We can also slice in three or more dimensions. Below we build a small image as three 3x3 layers (red, green, and blue channels), then grab only the top-left 2x2 corner of the first two layers. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a48185c3-aaa9-4395-a78e-8dcffc5359c9",
   "metadata": {},
   "outputs": [],
   "source": [
    "my_image = np.array([\n",
    "    [[0, 1, 3],\n",
    "     [6, 2, 6], \n",
    "     [1, 5, 4]], \n",
    "    \n",
    "    [[8, 3, 19],\n",
    "     [33, 34, 11], \n",
    "     [13, 14, 89]], \n",
    "    \n",
    "    [[14, 68, 17],\n",
    "     [66, 84, 92], \n",
    "     [4, 2, 58]]\n",
    "])\n",
    "\n",
    "my_image[0:2, 0:2, 0:2]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "06fedc0f-8049-4760-adeb-e7b7bb853378",
   "metadata": {},
   "source": [
    "If you wanted to get the top-left four values on all three layers, use a `:` to include all layers on the first axis. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "659b87a2-57f3-41b6-89bc-253c4ba247e5",
   "metadata": {},
   "outputs": [],
   "source": [
    "my_image[:, 0:2, 0:2]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6f51ac3c-4924-47be-97d1-bf5a70c345fc",
   "metadata": {},
   "source": [
    "Let's bring back the image of the dog. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a6476f8c-7ed5-4237-97dc-2fc259f9dd4a",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import urllib.request\n",
    "from PIL import Image \n",
    "\n",
    "# Download the image file \n",
    "url = r\"https://img.freepik.com/free-photo/isolated-happy-smiling-dog-white-background-portrait-4_1562-693.jpg?w=1800&t=st=1690429756~exp=1690430356~hmac=62b3f269479c67fe46d3fa008d9a999d8c6979b69054b84cfd4304d5b41e41a7\"\n",
    "urllib.request.urlretrieve(url, \"happy_dog.png\")\n",
    "\n",
    "# Load the image file\n",
    "image = Image.open(\"happy_dog.png\")\n",
    "\n",
    "# Convert the image to a NumPy array\n",
    "image_array = np.array(image)\n",
    "\n",
    "# Print the shape of the array\n",
    "print(image_array.shape)\n",
    "\n",
    "# Display the array as an image\n",
    "plt.imshow(image_array)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "67adb50f-44df-4a4f-9435-27f334c639fd",
   "metadata": {},
   "source": [
    "The axes are ordered differently here: the array is laid out as (height, width, channel), so the color channel is the last axis rather than the first. To grab only the green layer and display it, we index the last axis with `1` and include everything from the other two axes. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6dc649f9-d40c-499e-8341-6679068090d0",
   "metadata": {},
   "outputs": [],
   "source": [
    "green_layer = image_array[:, :, 1]\n",
    "\n",
    "plt.imshow(green_layer, cmap='Greens')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bb9c9377-e3c6-43af-b633-146c0f078287",
   "metadata": {},
   "source": [
    "## Reshaping Arrays"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "12ed5452-3b30-499a-ac6c-39759e68ce6c",
   "metadata": {},
   "source": [
    "We can reshape arrays so they conform to a different shape. Let's say we have a simple array of the integers 1 through 9. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b51c0b34-0470-442a-b4f2-4f965c9d75bb",
   "metadata": {},
   "outputs": [],
   "source": [
    "a = np.arange(1,10)\n",
    "a"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ca08dac2-1d99-4de0-9378-2c0d6e4a395e",
   "metadata": {},
   "source": [
    "We can reshape this to a 3x3 array."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4eaed482-fe45-4de1-8797-4c1d928a087d",
   "metadata": {},
   "outputs": [],
   "source": [
    "reshaped_a = a.reshape([3,3])\n",
    "reshaped_a"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "38af15bb-0879-43c7-afd8-80afc04ed6ef",
   "metadata": {},
   "source": [
    "We can also reshape the numbers 1 through 27 into a 3x3x3 array. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a27bbf81-e8d8-430e-a5ee-3083628770c6",
   "metadata": {},
   "outputs": [],
   "source": [
    "b = np.arange(1,28)\n",
    "b.reshape([3,3,3])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "427587f6-5ef2-48d7-ad08-01b913adb0ef",
   "metadata": {},
   "source": [
    "Or reshape it into a 3x9 matrix. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a80e7194-1903-4982-98f3-c0a5477ec237",
   "metadata": {},
   "outputs": [],
   "source": [
    "b.reshape([3,9])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bf75de11-09e5-4a11-967c-dc4e93f41ad5",
   "metadata": {},
   "source": [
    "Now let's go back to `a`. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "81d7351b-aacb-456f-aa81-51b4b55341ba",
   "metadata": {},
   "outputs": [],
   "source": [
    "a"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "97a94226-2083-4cdf-8f74-de5d6c75114a",
   "metadata": {},
   "source": [
    "What happens if we call `reshape(-1,1)`? Something very interesting happens. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "54df026c-a99b-49e5-ae6c-f60740645a4a",
   "metadata": {},
   "outputs": [],
   "source": [
    "a.reshape(-1,1)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3187d8e7-06fa-4446-beb7-4ae3019e1e86",
   "metadata": {},
   "source": [
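    "We effectively turned the one-dimensional array into a single column with 9 rows. The `1` asks for exactly one column, and the `-1` tells NumPy to infer the number of rows from the array's size. Let's declare another array with a 2x3 shape. "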
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ec683574-1149-406d-8ca3-f0301b849ce2",
   "metadata": {},
   "outputs": [],
   "source": [
    "c = np.arange(1,7).reshape(2,3)\n",
    "c"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1fb53298-727d-410f-b75e-798aae22c101",
   "metadata": {},
   "source": [
    "It is common to call `transpose()` or use `.T` on a matrix to swap its rows and columns. For example, certain neural network operations need this so that the number of columns in one matrix matches the number of rows in the other before matrix multiplication. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b5b11ee1-c0ae-46a8-84e6-a21b75db0b24",
   "metadata": {},
   "outputs": [],
   "source": [
    "c.T"
   ]
  },
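  {
   "cell_type": "markdown",
   "id": "added-transpose-matmul-note",
   "metadata": {},
   "source": [
    "As a rough sketch of why the shapes matter, the cell below multiplies the 2x3 array by its own transpose using the `@` operator. Multiplying it by itself fails because the inner dimensions (3 and 2) do not match. The array is redeclared here so the cell runs on its own. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "added-transpose-matmul-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "c = np.arange(1,7).reshape(2,3)\n",
    "\n",
    "print(c.shape, c.T.shape)  # (2, 3) (3, 2)\n",
    "\n",
    "# 2x3 times 3x2 works and gives a 2x2 result\n",
    "print(c @ c.T)\n",
    "\n",
    "# 2x3 times 2x3 fails because 3 columns do not match 2 rows\n",
    "try:\n",
    "    c @ c\n",
    "except ValueError as e:\n",
    "    print(\"shape mismatch:\", e)"
   ]
  },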
  {
   "cell_type": "markdown",
   "id": "ad01562a-12de-4904-abb5-a71f3cbe827a",
   "metadata": {},
   "source": [
    "You can also call `flatten()` to collapse a multidimensional array back into one dimension. It always returns a copy of the data. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ee3d03c9-4db6-4c2a-b34c-d0c38794f74b",
   "metadata": {},
   "outputs": [],
   "source": [
    "c.flatten()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "288c86b5-eca4-42c3-b64d-3b1bd1abfa04",
   "metadata": {},
   "source": [
    "## Copy versus View "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a81ebc36-d063-4d73-a555-fb49c93b176b",
   "metadata": {},
   "source": [
    "Let's say we have this 3x3 array. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7151297e-cabd-46ca-81cc-fb250f1aaa96",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "a = np.arange(1,10).reshape([3,3])\n",
    "a"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f3d2ace4-d6ff-4912-9e2a-f3551f53b75b",
   "metadata": {},
   "source": [
    "Now let's say I grab the first row and save it to a variable. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "239fa154-82ee-4210-aa67-e69ee6ae37a4",
   "metadata": {},
   "outputs": [],
   "source": [
    "z = a[0]\n",
    "z"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "745aa5ec-4e96-40b5-994e-4875135dda35",
   "metadata": {},
   "source": [
    "I then proceed to re-assign the last value in that row to `100`. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7fc35cc6-e08e-402e-99b9-fcbed270089d",
   "metadata": {},
   "outputs": [],
   "source": [
    "z[2] = 100\n",
    "z"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cf8ee971-a1ed-4227-834e-b8f533bf2882",
   "metadata": {},
   "source": [
    "But uh-oh... ask yourself this. Did we change only `z`? Or has the original `a` matrix changed too? "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e7f6f803-88f3-4491-8d78-44845cdb17de",
   "metadata": {},
   "outputs": [],
   "source": [
    "a"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4f62918c-4243-415a-8096-331129f454fd",
   "metadata": {},
   "source": [
    "This is one of the reasons you have to be really careful when you start reassigning values in arrays. By default, basic indexing and slicing return \"views\" that share memory with the parent array, so changes made through a view also appear in the parent. \n",
    "\n",
    "For this reason, you may want to consider using `copy()` to create a new array. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6dacace6-7a64-4f0b-aa8c-4f1a1dea924a",
   "metadata": {},
   "outputs": [],
   "source": [
    "a = np.arange(1,10).reshape([3,3])\n",
    "z = a[0].copy()\n",
    "z[2] = 100 "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4ab7b44b-01e6-4fbb-9d8a-354ddcac05d3",
   "metadata": {},
   "source": [
    "Okay, that's better. Now we only modified the copied data in `z` and did not affect the original array `a`. Sometimes you may want a change to flow back through a view, but most of the time you likely do not. Unintended mutation of shared data is a common source of bugs. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "acd2c407-23b2-473f-85e8-9d453c6b46ce",
   "metadata": {},
   "outputs": [],
   "source": [
    "z"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "612f9588-66ea-473d-b4db-9501a172bf43",
   "metadata": {},
   "outputs": [],
   "source": [
    "a"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "476ad7ec-a654-473e-bd07-642f8225e5e7",
   "metadata": {},
   "source": [
    "That being said, mathematical operations such as addition or `np.sqrt()` return a new array and do not modify the original. In-place operators such as `+=` are the exception. "
   ]
  },
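  {
   "cell_type": "markdown",
   "id": "added-shares-memory-note",
   "metadata": {},
   "source": [
    "If you are ever unsure whether you are holding a view or a copy, one way to check is `np.shares_memory()`. The cell below is a minimal sketch with illustrative variable names (`view_row`, `copy_row`, `summed`) that compares a sliced row, a copied row, and the result of a math operation against the original array. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "added-shares-memory-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "a = np.arange(1,10).reshape([3,3])\n",
    "\n",
    "view_row = a[0]         # basic indexing returns a view\n",
    "copy_row = a[0].copy()  # explicit copy owns its own data\n",
    "summed = a + 1          # math operation builds a new array\n",
    "\n",
    "print(np.shares_memory(a, view_row))  # True\n",
    "print(np.shares_memory(a, copy_row))  # False\n",
    "print(np.shares_memory(a, summed))    # False"
   ]
  },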
  {
   "cell_type": "markdown",
   "id": "e06d3527-eeaa-4e65-9729-063e6e04673f",
   "metadata": {},
   "source": [
    "## Exercise"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ecccbf7d-318d-4fdb-9afb-b2c639eed938",
   "metadata": {},
   "source": [
    "Reshape the numeric array below so it has shape 3x3x3, resembling an image with 9 pixels, each containing red, green, and blue values. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "45d53b39-3506-4c0c-a4c0-965c3c256352",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np \n",
    "\n",
    "my_image = np.array([ 0,  1,  3,  6,  2,  6,  1,  5,  4,  8,  3, 19, 33, 34, 11, 13, 14,\n",
    "       89, 14, 68, 17, 66, 84, 92,  4,  2, 58])\n",
    "\n",
    "# PUT YOUR CODE BELOW\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d161f622-0af0-478a-b79b-7853cee98582",
   "metadata": {},
   "source": [
    "### SCROLL DOWN FOR ANSWER\n",
    "|<br>\n",
    "|<br>\n",
    "|<br>\n",
    "|<br>\n",
    "|<br>\n",
    "|<br>\n",
    "|<br>\n",
    "|<br>\n",
    "|<br>\n",
    "|<br>\n",
    "|<br>\n",
    "|<br>\n",
    "|<br>\n",
    "|<br>\n",
    "|<br>\n",
    "|<br>\n",
    "|<br>\n",
    "|<br>\n",
    "|<br>\n",
    "|<br>\n",
    "|<br>\n",
    "|<br>\n",
    "|<br>\n",
    "v "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b6b68f5d-ad6f-4b3b-b155-dd69605edf1c",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np \n",
    "\n",
    "my_image = np.array([ 0,  1,  3,  6,  2,  6,  1,  5,  4,  8,  3, 19, 33, 34, 11, 13, 14,\n",
    "       89, 14, 68, 17, 66, 84, 92,  4,  2, 58])\n",
    "\n",
    "# PUT YOUR CODE BELOW\n",
    "\n",
    "my_image.reshape([3,3,3])"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}