Ccs346 Eda Unit 1
COURSE OBJECTIVES:
To outline an overview of exploratory data analysis.
To implement data visualization using Matplotlib.
To perform univariate data exploration and analysis.
To apply bivariate data exploration and analysis.
To use data exploration and visualization techniques for multivariate and time series data.
COURSE OUTCOMES:
At the end of this course, the students will be able to:
CO1: Understand the fundamentals of exploratory data analysis.
CO2: Implement data visualization using Matplotlib.
CO3: Perform univariate data exploration and analysis.
CO4: Apply bivariate data exploration and analysis.
CO5: Use data exploration and visualization techniques for multivariate and time series data.
TOTAL: 60 PERIODS
TEXT BOOKS:
1. Suresh Kumar Mukhiya, Usman Ahmed, “Hands-On Exploratory Data Analysis with Python”,
Packt Publishing, 2020. (Unit 1)
2. Jake VanderPlas, "Python Data Science Handbook: Essential Tools for Working with Data",
First Edition, O'Reilly, 2017. (Unit 2)
3. Catherine Marsh, Jane Elliott, “Exploring Data: An Introduction to Data Analysis for Social
Scientists”, Wiley Publications, 2nd Edition, 2008. (Unit 3,4,5)
REFERENCES:
1. Eric Pimpler, “Data Visualization and Exploration with R”, GeoSpatial Training Services, 2017.
2. Claus O. Wilke, “Fundamentals of Data Visualization”, O’Reilly, 2019.
3. Matthew O. Ward, Georges Grinstein, Daniel Keim, “Interactive Data Visualization:
Foundations, Techniques, and Applications”, 2nd Edition, CRC press, 2015.
Visual Aids for EDA
As data scientists, we have two important goals in our work: to extract knowledge from the
data and to present that knowledge to stakeholders. Presenting results is difficult because
the audience may not have the technical background to follow programming jargon and other
technicalities. Visual aids are therefore very useful tools. In this chapter, we focus on the
different types of visual aids and visualization techniques that can be used with our
datasets.
In this chapter, we will cover the following topics:
Line chart
Bar chart
Scatter plot
Area plot and stacked plot
Pie chart
Table chart
Polar chart
Histogram
Lollipop chart
Choosing the best chart
Other libraries to explore
Line chart
Line plots or line graphs are a fundamental type of chart used to represent data points
connected by straight lines. They are widely used to illustrate trends or changes in
data over time or across categories. Line plots are easy to understand, versatile, and
can be used to visualize different types of data, making them useful tools in data
analysis and communication.
When it comes to creating line plots in Python, you have two primary libraries to choose
from: `Matplotlib` and `Seaborn`.
Using “Matplotlib”:
`Matplotlib` is a highly customizable library that can produce a wide range of plots, including
line plots. With Matplotlib, you can specify the appearance of your line plots using a variety
of options such as line style, color, marker, and label.
A single-line plot is used to display the relationship between two variables, where one
variable is plotted on the x-axis and the other on the y-axis. This type of plot is best used for
displaying trends over time, as it allows you to see how one variable changes in response to
the other over a continuous period.
Steps involved
Let's look at the process of creating the line chart:
1. Load and prepare the dataset.
2. Import the matplotlib library. It can be done with this command:
import matplotlib.pyplot as plt
3. Plot the graph:
plt.plot(df)
4. Display it on the screen:
plt.show()
#Loading Library
import matplotlib.pyplot as plt
#Assigning labels
plt.xlabel('X-axis label')
plt.ylabel('Y-axis label')
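The steps above can be put together into one small script. The following is a minimal sketch, not the course's reference solution: the monthly temperature values are made up for illustration, and the variable names are assumptions. It shows the line style, color, marker, and label options mentioned earlier.

```python
import matplotlib.pyplot as plt

# Illustrative data (assumed, not from the text): one value per month
months = list(range(1, 13))
temperature = [2, 3, 7, 12, 17, 21, 23, 22, 18, 12, 7, 3]

fig, ax = plt.subplots()
# Line style, color, marker and label are the options named in the text
(line,) = ax.plot(months, temperature, linestyle="--", color="tab:blue",
                  marker="o", label="Average temperature")

# Assigning labels and a title
ax.set_xlabel("Month")
ax.set_ylabel("Temperature")
ax.set_title("Single-line plot")
ax.legend()
# plt.show()  # uncomment to display the figure
```

One variable goes on the x-axis and the other on the y-axis, so the plot reads as a trend over the twelve months.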
Line plots have some limitations that need to be considered when using them for data
visualization. These include:
1. Limited data types: Line plots are not suitable for all types of data. For example,
they may not work well with data that has multiple categories or data with nonlinear
relationships.
2. Can be misleading: If the scale of the y-axis is not carefully chosen, line plots can be
misleading. It is important to choose appropriate scales to avoid misinterpretation of
the data.
3. Lack of context: Line plots only show the relationship between two variables, and do
not provide context about other factors that may be influencing the data.
4. Limited visual impact: Line plots may not be as visually impactful as other types of
data visualizations, such as bar charts or scatter plots.
5. Difficulty comparing multiple datasets: When using multiple line plots to compare
different datasets, it can be difficult to visually compare the lines if they are not
plotted on the same scale or with the same y-axis limits.
Bar charts
Bar charts are one of the most common types of visualization, and almost everyone has
encountered them. Bars can be drawn horizontally or vertically to represent categorical
variables.
Bar charts are frequently used to compare values across distinct categories or to track
variations over time. They are most convenient when the changes are large. To learn about
bar charts, let's assume a pharmacy in Norway keeps track of the amount of Zoloft sold
every month. Zoloft is a medicine prescribed to patients suffering from depression. We can
use the Python calendar library to map the months of the year (1 to 12) to the names
January to December:
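A minimal sketch of the pharmacy example follows. The sales figures are hypothetical values chosen only for illustration, and the variable names are assumptions. Only the use of the calendar library for the month names comes from the description above.

```python
import calendar
import matplotlib.pyplot as plt

# Month numbers 1..12 and their names, January to December
months = list(range(1, 13))
month_names = [calendar.month_name[m] for m in months]

# Hypothetical units sold per month (illustrative values, not real data)
sold_quantity = [20, 35, 30, 45, 50, 40, 38, 42, 55, 60, 48, 52]

fig, ax = plt.subplots()
bars = ax.bar(months, sold_quantity)

# Replace the numeric ticks with the month names
ax.set_xticks(months)
ax.set_xticklabels(month_names, rotation=45)
ax.set_xlabel("Month")
ax.set_ylabel("Units sold")
ax.set_title("Monthly Zoloft sales (illustrative)")
# plt.show()  # uncomment to display the figure
```

Switching `ax.bar` to `ax.barh` (and swapping the axis labels) would draw the same data as horizontal bars.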
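#Horizontal Bar
import matplotlib.pyplot as plt
# Placeholder data for illustration only (not real figures)
country = ['A', 'B', 'C', 'D', 'E']
gdp_per_capita = [45000, 42000, 52000, 49000, 47000]
colors = ['green', 'blue', 'purple', 'brown', 'teal']
plt.barh(country, gdp_per_capita, color=colors)
plt.title('Country Vs GDP Per Capita', fontsize=14)
# barh puts the values on the x-axis and the categories on the y-axis
plt.xlabel('GDP Per Capita', fontsize=14)
plt.ylabel('Country', fontsize=14)
plt.grid(True)
plt.show()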
Scatter plot
Scatter plots are also called scatter graphs, scatter charts, scattergrams, and scatter diagrams.
They use a Cartesian coordinate system to display the values of, typically, two variables for
a set of data.
Scatter plots are typically used in two situations:
When one continuous variable is dependent on another variable, which is under the control of
the observer
When both continuous variables are independent
There are two important concepts: the independent variable and the dependent variable. In
statistical or mathematical modelling, the values of dependent variables rely on the
values of independent variables. The dependent variable is the outcome variable being
studied. The independent variables are also referred to as regressors. The takeaway message
here is that scatter plots are used when we need to show the relationship between two
variables, and hence are sometimes referred to as correlation plots.
plt.colorbar()
plt.show()
You can combine a colormap with different sizes of the dots. This is best visualized if the
dots are transparent:
Example
Create random arrays with 100 values for x-points, y-points, colors and sizes:
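import numpy as np
import matplotlib.pyplot as plt
x = np.random.randint(100, size=(100))
y = np.random.randint(100, size=(100))
colors = np.random.randint(100, size=(100))
sizes = 10 * np.random.randint(100, size=(100))
# c maps the colors through the colormap, s sets the dot sizes, alpha makes the dots transparent
plt.scatter(x, y, c=colors, s=sizes, alpha=0.5, cmap='viridis')
plt.colorbar()
plt.show()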
# import required modules
import matplotlib.pyplot as plt
# adjust coordinates
x = [1,2,3,4,5]
y1 = [2,4,6,8,10]
y2 = [3,6,9,12,15]
# depict illustration
plt.scatter(x, y1)
plt.scatter(x,y2)
# apply legend()
plt.legend(["x*2" , "x*3"])
plt.show()
5. Create the labels for the axes:
plt.xlabel('Sepal Length')
plt.ylabel('Petal length')
6. Display the plot on the screen:
plt.show()
When using matplotlib, the stackplot function allows you to create a stacked area plot in
Python. The function accepts data in two ways. The first is stackplot(x, y), where x is an
array of values for the X-axis and y is a multidimensional array holding the unstacked
values of the series. The second is stackplot(x, y1, y2, ..., yn), where y1, y2, ..., yn
are the individual unstacked arrays for each series and n is the number of series or
areas. See the example below for clarification.
import numpy as np
import matplotlib.pyplot as plt
# Data
x = np.arange(2015, 2021, 1)
series1 = [2, 3, 5, 3, 5, 6]
series2 = [1, 3, 5, 2, 5, 3]
series3 = [4, 1, 2, 4, 6, 1]
y = np.vstack([series1, series2, series3])
# Stacked area plot
fig, ax = plt.subplots()
ax.stackplot(x, y)
plt.show()
Axis limits
You might have noticed that there is a gap between the areas and the vertical lines of the box
of the plot. If you want, you can set the axis limits with the following line to remove the gaps.
import numpy as np
import matplotlib.pyplot as plt
# Data
x = np.arange(2015, 2021, 1)
series1 = [2, 3, 5, 3, 5, 6]
series2 = [1, 3, 5, 2, 5, 3]
series3 = [4, 1, 2, 4, 6, 1]
y = np.vstack([series1, series2, series3])
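# Stacked area plot
fig, ax = plt.subplots()
ax.stackplot(x, y)
# Axis limits
ax.set(xlim = (min(x), max(x)), xticks = x)
plt.show()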
Adding a legend
Note that the stackplot function provides an argument named labels. You can pass an array of
labels for each area to this argument in case you want to add a legend to the chart with
ax.legend.
import numpy as np
import matplotlib.pyplot as plt
# Data
x = np.arange(2015, 2021, 1)
series1 = [2, 3, 5, 3, 5, 6]
series2 = [1, 3, 5, 2, 5, 3]
series3 = [4, 1, 2, 4, 6, 1]
y = np.vstack([series1, series2, series3])
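# Stacked area plot with one label per area
fig, ax = plt.subplots()
ax.stackplot(x, y, labels = ['G1', 'G2', 'G3'])
# Legend
ax.legend(loc = 'upper left')
# Axis limits
ax.set(xlim = (min(x), max(x)), xticks = x)
# plt.show()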
Color customization
The colors argument can be used to modify the default color palette of the area chart. You
can pass as many colors as areas to this argument, as in the example below. Recall that the
transparency of the areas can be set with alpha.
import numpy as np
import matplotlib.pyplot as plt
# Data
x = np.arange(2015, 2021, 1)
series1 = [2, 3, 5, 3, 5, 6]
series2 = [1, 3, 5, 2, 5, 3]
series3 = [4, 1, 2, 4, 6, 1]
y = np.vstack([series1, series2, series3])
# Array of colors
cols = ['#FDF5E6', '#FFEBCD', '#DEB887']
# Stacked area plot
fig, ax = plt.subplots()
ax.stackplot(x, y, labels = ['G1', 'G2', 'G3'],
             colors = cols, alpha = 0.9)
# Legend
ax.legend(loc = 'upper left')
# Axis limits
ax.set(xlim = (min(x), max(x)), xticks = x)
plt.show()
Baseline methods
The stackplot function provides several methods to customize the baseline. By default, the
baseline is zero, i.e. baseline = 'zero'.
Setting baseline = 'sym' will create a symmetric stacked area chart around zero. This is
sometimes called “ThemeRiver”.
import numpy as np
import matplotlib.pyplot as plt
# Data
x = np.arange(2015, 2021, 1)
series1 = [2, 3, 5, 3, 5, 6]
series2 = [1, 3, 5, 2, 5, 3]
series3 = [4, 1, 2, 4, 6, 1]
y = np.vstack([series1, series2, series3])
# Symmetric stacked area plot
fig, ax = plt.subplots()
ax.stackplot(x, y, baseline = 'sym')
# Axis limits
ax.set(xlim = (min(x), max(x)), xticks = x)
# plt.show()
Another plot we're going to familiarize ourselves with is the area chart. It is based on the
line chart. The main difference is that, in an area chart, the region between the X-axis
and the line is filled with color. Area charts and line charts are good for visualizing data that
changes over time. In this topic, we will learn to create area charts with matplotlib.
As you remember from the introductory topic, the first step is always to import matplotlib into
your code with import matplotlib.pyplot as plt.
You are ready to plot your first area chart! We'll do it with the help of plt.fill_between(x, y).
This function takes two arguments, both arrays of numeric values. Let's say
you want to plot the number of carrots your hamster Bonnie chomps each month of the year.
We create a variable called months that stores numbers from 1 to 12 and a carrots variable
that contains a list of 12 values: the carrots consumed in each month.
The X-axis values come first: first months, then carrots, not the other way around.
months = range(1, 13)
carrots = [14, 13, 10, 15, 17, 15, 15, 13, 12, 10, 14, 11]
plt.fill_between(months, carrots)
The plot is already cool. But as usual, we can work on the clarity. What do the values on the
X- and Y-axes represent? Couldn't it be better to have all numbers from 1 to 12 on the X-axis
rather than 2, 4, 6, 8, 10, and 12? Let's address these issues.
Color is specified in the plt.fill_between() function. Use the color argument with a str as its
value. Matplotlib's list of named colors can help you. "Named" means you can type the color
name as a value; for other colors, you need the RGB code.
The plot title and the labels for both axes are set with separate functions:
plt.xlabel(), plt.ylabel(), and plt.title(), each taking a str as its argument.
plt.xlabel("Months")
plt.ylabel("Number of carrots")
plt.title("Bonnie's monthly carrot intake")
Having taken a look at the graph, we don't require additional explanations on what all these
numbers mean. As we've mentioned, it'd be nice to change the numbers on the axes. Let's
move on to that!
To change the numbers on the axes, we need to use separate functions, just like for titles.
These functions are plt.xticks() and plt.yticks() that take arrays of numeric values. We want
to display the numbers from 1 to 12 on the X-axis (you may want to create a list or use
range(1, 13) to save time) and numbers from 0 to 20 with a step of 5 on the Y-axis (range(0,
21, 5)):
plt.xticks(range(1, 13))
plt.yticks(range(0, 21, 5))
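Putting the pieces together, the whole chart can be sketched as one script. The data and labels are those given above. The color "orange" and the use of the Axes interface (fig, ax) instead of the plt functions are choices made for this sketch only.

```python
import matplotlib.pyplot as plt

months = range(1, 13)
carrots = [14, 13, 10, 15, 17, 15, 15, 13, 12, 10, 14, 11]

fig, ax = plt.subplots()
# X-axis values first, then the values that bound the filled area
# "orange" is an arbitrary named color chosen for this sketch
area = ax.fill_between(months, carrots, color="orange")

ax.set_xlabel("Months")
ax.set_ylabel("Number of carrots")
ax.set_title("Bonnie's monthly carrot intake")

# Show every month on the X-axis and steps of 5 on the Y-axis
ax.set_xticks(range(1, 13))
ax.set_yticks(range(0, 21, 5))
# plt.show()  # uncomment to display the figure
```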
Much better, isn't it? You can see that your hamster had at least 10 carrots per month, and the
average value was somewhere between 10 and 15. It's a good thing to know when you're
planning your shopping list!
Imagine that you have two hamsters. Bonnie has a friend named Clyde. That gives you two carrot
datasets that you want to plot on the same graph to compare the data. When you plot several
datasets on one area chart, it becomes a stacked area chart. You can use it to display big
stacks of data and see how much each stacked group (in our case, each hamster) contributes
to the total.
To create this type of area chart, let's refer to another function: plt.stackplot(x, y1, y2). Note
that the X-axis data still comes first and is followed by your datasets (two or more). Take a
look at the data plot of carrot consumption for Bonnie and Clyde:
We have changed the values for plt.yticks() to accommodate the new data. Let's have a look
at the result:
You can see the difference between the two datasets, but which is Bonnie and which is
Clyde? To clarify that, we need to add a legend. You can do it in two steps: first, add the
labels argument to the plt.stackplot() function, and then add the plt.legend() function without
arguments. You can change the colors in this kind of area chart too — just pass a list of str to
the colors argument (note that it's colors, not color):
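The stacked chart with a legend and custom colors can be sketched as follows. Bonnie's values are the ones used earlier. Clyde's values are made up for this sketch, and the two color names are arbitrary choices.

```python
import matplotlib.pyplot as plt

months = range(1, 13)
bonnie = [14, 13, 10, 15, 17, 15, 15, 13, 12, 10, 14, 11]
# Clyde's numbers are hypothetical, invented for this sketch
clyde = [12, 14, 11, 12, 13, 12, 15, 14, 13, 11, 12, 13]

fig, ax = plt.subplots()
# X-axis data first, then one dataset per stacked layer
layers = ax.stackplot(months, bonnie, clyde,
                      labels=["Bonnie", "Clyde"],
                      colors=["orange", "tan"])

ax.set_xlabel("Months")
ax.set_ylabel("Number of carrots")
ax.set_title("Monthly carrot intake")
ax.set_xticks(range(1, 13))
# The stacked total reaches 30, so the Y-axis ticks go up to 35
ax.set_yticks(range(0, 36, 5))
legend = ax.legend()
# plt.show()  # uncomment to display the figure
```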
Sometimes, you want to represent only the difference between two datasets, not their
cumulative total. To do it, you need to plot two separate lines using plt.plot(x, y) and add
plt.fill_between(). In this example, we also add plt.grid() to add a grid in the background that
facilitates interpretation:
plt.xlabel('Months')
plt.ylabel('Number of carrots')
plt.title("The difference in monthly carrot intake")
plt.xticks(range(1, 13))
plt.yticks(range(0, 21, 5))
plt.legend()
plt.grid()
We can see that Bonnie and Clyde had the same number of carrots in July, but Bonnie
chomped more in April, May, and June.
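A complete version of the difference chart might look like the sketch below. Clyde's values are hypothetical, chosen so that they agree with the observations just stated (equal intake in July, Bonnie ahead in April, May, and June). The fill color is an arbitrary choice.

```python
import matplotlib.pyplot as plt

months = list(range(1, 13))
bonnie = [14, 13, 10, 15, 17, 15, 15, 13, 12, 10, 14, 11]
# Hypothetical values for Clyde, invented for this sketch
clyde = [12, 14, 11, 12, 13, 12, 15, 14, 13, 11, 12, 13]

fig, ax = plt.subplots()
# Two separate lines, each with a label for the legend
ax.plot(months, bonnie, label="Bonnie")
ax.plot(months, clyde, label="Clyde")
# Passing both datasets shades only the gap between the two lines
ax.fill_between(months, bonnie, clyde, color="lightgray")

ax.set_xlabel("Months")
ax.set_ylabel("Number of carrots")
ax.set_title("The difference in monthly carrot intake")
ax.set_xticks(range(1, 13))
ax.set_yticks(range(0, 21, 5))
ax.legend()
ax.grid()
# plt.show()  # uncomment to display the figure
```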
Pie Chart
Given a set of categories or groups with their corresponding values you can make use of the
pie function from matplotlib to create a pie chart in Python. Pass the labels and the values as
input to the function to create a pie chart counterclockwise, as in the example below. Note
that by default the area of the slices will be calculated as each value divided by the sum of
values.
import matplotlib.pyplot as plt
# Data
labels = ["G1", "G2", "G3", "G4", "G5"]
value = [12, 22, 16, 38, 12]
# Pie chart
fig, ax = plt.subplots()
ax.pie(value, labels = labels)
# plt.show()
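To make the default normalization concrete, the following sketch computes each slice's share by hand and compares it with the angle matplotlib actually draws. Reading the wedges from ax.patches and the names fractions and spans are choices made for this sketch.

```python
import matplotlib.pyplot as plt

labels = ["G1", "G2", "G3", "G4", "G5"]
value = [12, 22, 16, 38, 12]

# Each slice's share is its value divided by the sum of all values
total = sum(value)
fractions = [v / total for v in value]

fig, ax = plt.subplots()
ax.pie(value, labels=labels)

# The wedges drawn by pie() are stored on the Axes as patches;
# theta1 and theta2 are the start and end angles in degrees
wedges = list(ax.patches)
spans = [w.theta2 - w.theta1 for w in wedges]

for label, frac, span in zip(labels, fractions, spans):
    print(label, round(frac, 2), round(span, 1))
```

Each printed angle should be the corresponding fraction of 360 degrees, up to small floating-point rounding.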
Partial pie
If your data sums to less than one and you don’t want it to be normalized, you can set
normalize = False, so a partial pie chart will be created.
import matplotlib.pyplot as plt
# Data
labels = ["G1", "G2", "G3", "G4", "G5"]
value = [0.1, 0.2, 0.1, 0.2, 0.1]
# Pie chart
fig, ax = plt.subplots()
ax.pie(value, labels = labels, normalize = False)
# plt.show()
As stated before, the pie chart will be created by default counterclockwise. To set a clockwise
direction set the argument counterclock as False.
# Data
labels = ["G1", "G2", "G3", "G4", "G5"]
value = [12, 22, 16, 38, 12]
# Pie chart
fig, ax = plt.subplots()
ax.pie(value, labels = labels, counterclock = False)
# plt.show()
Start angle
By default, the first slice starts at the positive X-axis and the slices are laid out
counterclockwise. You can change the start angle with startangle. For example, if you set
this argument to 90, the first slice will start perpendicular to the X-axis.
# Data
labels = ["G1", "G2", "G3", "G4", "G5"]
value = [12, 22, 16, 38, 12]
# Pie chart
fig, ax = plt.subplots()
ax.pie(value, labels = labels, startangle = 90)
# plt.show()
Size (radius)
The size of the pie can be controlled with the radius argument, which defaults to 1.
# Data
labels = ["G1", "G2", "G3", "G4", "G5"]
value = [12, 22, 16, 38, 12]
# Pie chart
fig, ax = plt.subplots()
ax.pie(value, labels = labels, radius = 0.5)
# plt.show()
Explode
Note that you can also explode (offset) one or some slices of the pie passing an array of the
length of the data to explode.
import matplotlib.pyplot as plt
# Data
labels = ["G1", "G2", "G3", "G4", "G5"]
value = [12, 22, 16, 38, 12]
explode = [0, 0, 0, 0.1, 0]
# Pie chart
fig, ax = plt.subplots()
ax.pie(value, labels = labels, explode = explode)
# plt.show()
Add a shadow
The pie function also allows adding a shadow to the pie setting the shadow argument to True.
# Data
labels = ["G1", "G2", "G3", "G4", "G5"]
value = [12, 22, 16, 38, 12]
# Pie chart
fig, ax = plt.subplots()
ax.pie(value, labels = labels, shadow = True)
# plt.show()
You might have noticed that the default pie doesn’t display the typical frame of the charts
created with matplotlib. In case you want to add it you can set frame = True.
# Data
labels = ["G1", "G2", "G3", "G4", "G5"]
value = [12, 22, 16, 38, 12]
# Pie chart
fig, ax = plt.subplots()
ax.pie(value, labels = labels, frame = True)
# plt.show()
In addition to the group labels you can also display the count or the percentages for each slice
with the autopct argument, as shown below.
# Data
labels = ["G1", "G2", "G3", "G4", "G5"]
value = [12, 22, 16, 38, 12]
# Pie chart
fig, ax = plt.subplots()
ax.pie(value, labels = labels, autopct = '%1.1f%%')
# plt.show()
Note that you can customize the distance of these labels from the origin with the pctdistance
argument and display them instead of the group labels. The default value is 0.6.
# Data
labels = ["G1", "G2", "G3", "G4", "G5"]
value = [12, 22, 16, 38, 12]
# Pie chart
fig, ax = plt.subplots()
ax.pie(value, autopct = '%1.1f%%', pctdistance = 1.1)
# plt.show()
The colors argument allows customizing the fill color for each slice. You can input an array
of ordered colors to change the color for each category.
# Data
labels = ["G1", "G2", "G3", "G4", "G5"]
value = [12, 22, 16, 38, 12]
colors = ["#B9DDF1", "#9FCAE6", "#73A4CA", "#497AA7", "#2E5B88"]
# Pie chart
fig, ax = plt.subplots()
ax.pie(value, labels = labels, colors = colors)
# plt.show()
Border color
In case you want to add a border you can use the wedgeprops argument and set a line width
and a border color with a dict, as in the example below.
# Data
labels = ["G1", "G2", "G3", "G4", "G5"]
value = [12, 22, 16, 38, 12]
colors = ["#B9DDF1", "#9FCAE6", "#73A4CA", "#497AA7", "#2E5B88"]
# Pie chart
fig, ax = plt.subplots()
ax.pie(value, labels = labels, colors = colors,
wedgeprops = {"linewidth": 1, "edgecolor": "white"})
# plt.show()
Table chart
In this example, we create a dataset of average scores of subjects for 5 consecutive years.
We import the packages and plot a line for each consecutive year. A table can be added to
the Axes using matplotlib.pyplot.table(). The table's columns are aligned along the x-axis,
and the y-axis is used for the values.
Syntax
[97, 92, 95, 94, 96],
[98, 95, 93, 95, 94],
[96, 94, 94, 92, 95],
[95, 90, 91, 94, 98]]
plt.ylabel("marks")
plt.xticks([])
plt.title('average marks in each consecutive year')
plt.show()
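Because parts of the listing above are incomplete, here is a self-contained sketch of the same idea. It is an illustration under assumptions, not the original program: the year and subject names are hypothetical, and the three rows of marks are borrowed from the values shown above without knowing what each row originally stood for.

```python
import matplotlib.pyplot as plt

# Hypothetical labels; the rows of marks reuse values shown above
years = ["2016", "2017", "2018", "2019", "2020"]
subjects = ["Maths", "Physics", "Chemistry"]
marks = [[97, 92, 95, 94, 96],
         [98, 95, 93, 95, 94],
         [96, 94, 94, 92, 95]]

fig, ax = plt.subplots()
# One line per subject across the five years
for subject, row in zip(subjects, marks):
    ax.plot(range(len(years)), row, marker="o", label=subject)

# The table below the axes takes the place of the x tick labels
cell_text = [[str(v) for v in row] for row in marks]
table = ax.table(cellText=cell_text, rowLabels=subjects,
                 colLabels=years, loc="bottom")

ax.set_xticks([])
ax.set_ylabel("marks")
ax.set_title("average marks in each consecutive year")
# Leave room for the row labels and the table
fig.subplots_adjust(left=0.2, bottom=0.25)
# plt.show()  # uncomment to display the figure
```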
NumPy – Introduction
NumPy (Numerical Python) is an open source Python library that’s used in almost every field of
science and engineering. It’s the universal standard for working with numerical data in Python, and
it’s at the core of the scientific Python and PyData ecosystems. NumPy users include everyone from
beginning coders to experienced researchers doing state-of-the-art scientific and industrial research
and development. The NumPy API is used extensively in Pandas, SciPy, Matplotlib, scikit-learn, scikit-
image and most other data science and scientific Python packages.
The NumPy library contains multidimensional array and matrix data structures (you’ll find more
information about this in later sections). It provides ndarray, a homogeneous n-dimensional array
object, with methods to efficiently operate on it. NumPy can be used to perform a wide variety of
mathematical operations on arrays. It adds powerful data structures to Python that guarantee
efficient calculations with arrays and matrices and it supplies an enormous library of high-level
mathematical functions that operate on these arrays and matrices.
The best way to enable NumPy is to use an installable binary package specific to your operating
system. These binaries contain the full SciPy stack (including NumPy, SciPy, matplotlib, IPython,
SymPy and nose, along with core Python).
NumPy is a Python package. It stands for 'Numerical Python'. It is a library consisting
of multidimensional array objects and a collection of routines for processing
arrays.
To test whether the NumPy module is properly installed, try to import it from the Python prompt.
import numpy
Alternatively, NumPy package is imported using the following syntax:
import numpy as np
numpy.array
It creates an ndarray from any object exposing the array interface, or from any method that returns an
array.
Name | Description
object | Any object exposing the array interface method returns an array, or any (nested) sequence
subok | By default, the returned array is forced to be a base class array. If true, sub-classes are passed through
ndmin | Specifies the minimum dimensions of the resultant array
import array
L = list(range(10))
A = array.array('i', L)
A
Output:
array('i', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Example
Input: [1, 7, 0, 6, 2, 5, 6]
Output: [1 7 0 6 2 5 6]
Explanation: Given Python List is converted into NumPy Array
Convert Python List to Numpy Arrays
In Python, lists can be converted into arrays by using two methods from the NumPy library:
Using numpy.array()
Using numpy.asarray()
# importing library
import numpy
# initializing list
lst = [1, 7, 0, 6, 2, 5, 6]
# displaying list
print ("List: ", lst)
# converting list to array
arr = numpy.array(lst)
# displaying array
print ("Array: ", arr)
Output
List: [1, 7, 0, 6, 2, 5, 6]
Array: [1 7 0 6 2 5 6]
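The two conversion functions behave the same on a plain list. They differ when the input is already a NumPy array: numpy.array() copies by default, while numpy.asarray() returns the input unchanged when no conversion is needed. The sketch below illustrates this; the variable names are chosen for the example.

```python
import numpy as np

lst = [1, 7, 0, 6, 2, 5, 6]

# From a plain list, both functions build a new array with the same contents
arr_from_array = np.array(lst)
arr_from_asarray = np.asarray(lst)

# From an existing ndarray, np.array copies by default,
# while np.asarray hands back the very same object
copied = np.array(arr_from_array)
same = np.asarray(arr_from_array)

copied[0] = 99  # changes only the copy
print(arr_from_array[0])
print(same is arr_from_array)
```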
NumPy stands for Numerical Python. It is a Python library used for working with arrays. In Python,
lists are commonly used in place of arrays, but they are slow to process. The NumPy array is a
powerful N-dimensional array object, and the library also offers linear algebra, Fourier transform,
and random number capabilities. It provides an array object that is much faster than traditional
Python lists.
Types of Array:
1. One Dimensional Array
2. Multi-Dimensional Array
Example:
# creating list
lst = [1, 2, 3, 4]
# creating a one-dimensional NumPy array (assumes import numpy as np)
sample_array = np.array(lst)
Two Dimensional Array
Example:
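import numpy as np
# creating list
list_1 = [1, 2, 3, 4]
list_2 = [5, 6, 7, 8]
list_3 = [9, 10, 11, 12]
sample_array = np.array([list_1, list_2, list_3])
print("Numpy multi dimensional array in python")
print(sample_array)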
Output
Numpy multi dimensional array in python
[[ 1 2 3 4]
[ 5 6 7 8]
[ 9 10 11 12]]
numpy.empty() function
The numpy.empty() function is used to create a new array of given shape and type, without
initializing entries. It is typically used for large arrays when performance is critical, and the values will
be filled in later.
Syntax:
numpy.empty(shape, dtype=float, order='C')
Parameters:
Name | Description | Required / Optional
shape | Shape of the empty array, e.g., (2, 3) or 2. | Required
dtype | Desired output data-type for the array, e.g., numpy.int8. Default is numpy.float64. | Optional
order | Whether to store multi-dimensional data in row-major ('C' for C-style) or column-major ('F' for Fortran-style) order in memory. | Optional
Return value:
[ndarray] Array of uninitialized (arbitrary) data of the given shape, dtype, and order. Object arrays
will be initialized to None.
import numpy as np
np.empty(2)
Output
array([ 6.95033087e-310, 1.69970835e-316])
np.empty(32)
Output
array([ 6.95033087e-310, 1.65350412e-316, 6.95032869e-310,
6.95032869e-310, 6.95033051e-310, 6.95033014e-310,
6.95033165e-310, 6.95033167e-310, 6.95033163e-310,
6.95032955e-310, 6.95033162e-310, 6.95033166e-310,
6.95033160e-310, 6.95033163e-310, 6.95033162e-310,
6.95033167e-310, 6.95033167e-310, 6.95033167e-310,
6.95033167e-310, 6.95033158e-310, 6.95033160e-310,
6.95033164e-310, 6.95033162e-310, 6.95033051e-310,
6.95033161e-310, 6.95033051e-310, 6.95033013e-310,
6.95033166e-310, 6.95033161e-310, 2.97403466e+289,
7.55774284e+091, 1.31611495e+294])
np.empty([2, 3])
Output
array([[ 6.95033087e-310, 1.68240973e-316, 6.95032825e-310],
[ 6.95032825e-310, 6.95032825e-310, 6.95032825e-310]])
The above code demonstrates the use of the np.empty() function in NumPy to create empty arrays of
different sizes. The np.empty() function creates an array without initializing its
values, which means that the values of the array are undefined and may vary each time the function
is called.
In the last example, np.empty([2, 3]) creates a 2D array of shape (2, 3) with the default data type
float. The resulting array contains six undefined floating-point values. The values shown in the
output are machine-dependent and may vary each time the function is called.
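Since the initial contents are arbitrary, np.empty() is only safe when every element is written before it is read. The following sketch shows that allocate-then-fill pattern; the array name and the values stored are illustrative.

```python
import numpy as np

# Allocate without initializing; the contents are arbitrary at this point
squares = np.empty(5, dtype=np.int64)

# Fill every slot before reading from the array
for i in range(squares.size):
    squares[i] = i * i

print(squares)
```

If some elements might be left unwritten, np.zeros() is the safer choice at the cost of the extra initialization pass.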
numpy.eye() function
The numpy.eye() function is used to create a 2-D array with ones on a chosen diagonal and zeros
elsewhere.
Syntax:
numpy.eye(N, M=None, k=0, dtype=float, order='C')
Parameters:
Name | Description | Required / Optional
N | Number of rows in the output. | Required
M | Number of columns in the output. If None, defaults to N. | Optional
k | Index of the diagonal: 0 (the default) refers to the main diagonal, a positive value refers to an upper diagonal, and a negative value to a lower diagonal. | Optional
dtype | Data-type of the returned array. | Optional
order | Whether the output should be stored in row-major (C-style) or column-major (Fortran-style) order in memory. | Optional
Return value:
[ndarray of shape (N,M)] An array where all elements are equal to zero, except for the k-th diagonal,
whose values are equal to one.
import numpy as np
np.eye(2)
Output
array([[ 1., 0.],
[ 0., 1.]])
np.eye(2,3)
Output
array([[ 1., 0., 0.],
[ 0., 1., 0.]])
np.eye(3, 3)
Output
array([[ 1., 0., 0.],
[ 0., 1., 0.],
[ 0., 0., 1.]])
In linear algebra, an identity matrix is a square matrix with ones on the main diagonal and zeros
elsewhere.
In the first example, np.eye(2) creates a 2x2 identity matrix: only the number of rows is given, so
the number of columns defaults to the same value.
In the second example, np.eye(2,3) creates a 2x3 matrix with ones on the main diagonal, where the
first argument specifies the number of rows and the second argument specifies the number of columns.
Because it is not square, it is not an identity matrix in the strict sense.
In the third example, np.eye(3,3) creates a 3x3 identity matrix where both the rows and columns are
equal to 3.
import numpy as np
print(np.eye(5, k=1, dtype=int))
Output:
[[0 1 0 0 0]
[0 0 1 0 0]
[0 0 0 1 0]
[0 0 0 0 1]
[0 0 0 0 0]]
In the above example, we create a matrix with dimensions (5, 5) and a diagonal offset of 1 (k=1).
This means that the ones are shifted one position to the right of the main diagonal, resulting in a
matrix with 1's on the first upper diagonal and 0's elsewhere. Note that np.eye() returns an
ordinary dense array, not a sparse data structure.
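The effect of the k parameter can be checked directly. The sketch below builds one matrix with a positive offset and one with a negative offset, then reads the shifted diagonals back with np.diag(), a helper not covered above and used here only for verification. It also confirms that multiplying by an identity matrix leaves a matrix unchanged.

```python
import numpy as np

upper = np.eye(4, k=1, dtype=int)    # ones on the first upper diagonal
lower = np.eye(4, k=-1, dtype=int)   # ones on the first lower diagonal

# np.diag(matrix, k) extracts the k-th diagonal, which holds the ones
print(np.diag(upper, k=1))
print(np.diag(lower, k=-1))

# Multiplying by the identity matrix leaves a matrix unchanged
a = np.arange(6).reshape(2, 3)
unchanged = a @ np.eye(3, dtype=int)
print(unchanged)
```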
numpy.ones() function
The numpy.ones() function is used to create a new array of given shape and type, filled with ones.
The ones() function is useful in situations where we need to create an array of ones with a specific
shape and data type, for example in matrix operations or in initializing an array with default values.
Syntax:
numpy.ones(shape, dtype=None, order='C')
Parameters:
Name | Description | Required / Optional
shape | Shape of the new array, e.g., (2, 3) or 2. | Required
dtype | The desired data-type for the array, e.g., numpy.int8. Default is numpy.float64. | Optional
order | Whether to store multi-dimensional data in row-major (C-style) or column-major (Fortran-style) order in memory. | Optional
Return value:
[ndarray] Array of ones with the given shape, dtype, and order.
import numpy as np
np.ones(7)
Output
array([ 1., 1., 1., 1., 1., 1., 1.])
np.ones((2, 1))
Output
array([[ 1.],
[ 1.]])
np.ones(7,)
Output
array([ 1., 1., 1., 1., 1., 1., 1.])
x = (2, 3)
np.ones(x)
Output
array([[ 1., 1., 1.],
[ 1., 1., 1.]])
In the above code:
np.ones(7): This creates a 1-dimensional array of length 7 with all elements set to 1.
np.ones((2, 1)): This creates a 2-dimensional array with 2 rows and 1 column, with all elements set to
1.
np.ones(7,): This is equivalent to np.ones(7) and creates a 1-dimensional array of length 7 with all
elements set to 1.
x = (2, 3) and np.ones(x): This creates a 2-dimensional array with 2 rows and 3 columns, with all
elements set to 1.
numpy.zeros() function
The numpy.zeros() function is used to create an array of specified shape and data type, filled with
zeros. The function is commonly used to initialize an array of a specific size and type, before filling it
with actual values obtained from some calculations or data sources. It is also used as a placeholder
to allocate memory for later use.
Syntax:
numpy.zeros(shape, dtype=float, order='C')
Parameters:
Name | Description | Required / Optional
shape | Shape of the new array, e.g., (2, 3) or 2. | Required
dtype | The desired data-type for the array, e.g., numpy.int8. Default is numpy.float64. | Optional
order | Whether to store multi-dimensional data in row-major (C-style) or column-major (Fortran-style) order in memory. | Optional
Return value:
[ndarray] Array of zeros with the given shape, dtype, and order.
import numpy as np
a = (3,2)
np.zeros(a)
Output
array([[ 0., 0.],
[ 0., 0.],
[ 0., 0.]])
In the above code a tuple (3, 2) is created and assigned to variable 'a'. The np.zeros()
function is called with 'a' as its argument, which creates a numpy array of zeros with a shape
of (3, 2).
np.zeros(6)
Output
array([ 0., 0., 0., 0., 0., 0.])
np.zeros((6,), dtype=int)
Output
array([0, 0, 0, 0, 0, 0])
np.zeros((3, 1))
Output
array([[ 0.],
[ 0.],
[ 0.]])
In the above code the first line, np.zeros(6) creates a one-dimensional array of size 6 with all
elements set to 0, and its data type is float.
In the second line, np.zeros((6,), dtype=int) creates a one-dimensional array of size 6 with all
elements set to 0, and its data type is integer.
In the third line, np.zeros((3, 1)) creates a two-dimensional array of size 3x1 with all elements set to
0, and its data type is float.
numpy.full() function
The numpy.full() function is used to create a new array of the specified shape and type, filled with a
specified value.
Syntax:
numpy.full(shape, fill_value, dtype=None, order='C')
Parameters:
Name | Description | Required / Optional
shape | Shape of the new array, e.g., (2, 3) or 2. | Required
fill_value | Fill value. | Required
dtype | The desired data-type for the array. The default, None, means np.array(fill_value).dtype. | Optional
order | Whether to store multidimensional data in C- or Fortran-contiguous (row- or column-wise) order in memory. | Optional
Return value:
[ndarray] Array of fill_value with the given shape, dtype, and order.
import numpy as np
np.full((3, 3), np.inf)
Output
array([[ inf, inf, inf],
[ inf, inf, inf],
[ inf, inf, inf]])
np.full((3, 3), 10.1)
Output
array([[ 10.1, 10.1, 10.1],
[ 10.1, 10.1, 10.1],
[ 10.1, 10.1, 10.1]])
The above code creates arrays filled with a constant value using the numpy.full() function. In the first
example, np.full((3, 3), np.inf) creates a 3x3 numpy array filled with np.inf (infinity). np.inf is a special
floating-point value that represents infinity, and is often used in calculations involving limits and
asymptotes.
In the second example, np.full((3, 3), 10.1) creates a 3x3 numpy array filled with the value 10.1.
Here, the dtype parameter is omitted, so numpy infers the data type of the array from the given
value.
Example: Create an array filled with a single value using np.full()
import numpy as np
np.full((3,3), 55, dtype=int)
Output
array([[55, 55, 55],
[55, 55, 55],
[55, 55, 55]])
In the above code, np.full((3,3), 55, dtype=int) creates a 3x3 numpy array filled with the integer
value 55. The dtype parameter is explicitly set to int, so the resulting array has integer data type.
The ndarray object consists of a contiguous one-dimensional segment of computer memory,
combined with an indexing scheme that maps each item to a location in the memory block. The
memory block holds the elements in row-major order (C style) or column-major order (Fortran
or MATLAB style).
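The difference between the two memory orders can be seen by flattening one small array both ways. This is a minimal sketch; the array values are chosen only for illustration.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # [[0 1 2], [3 4 5]]

# Flatten the same logical array in the two memory orders.
row_major = a.ravel(order="C")   # rows are laid out one after another
col_major = a.ravel(order="F")   # columns are laid out one after another

print(row_major)  # [0 1 2 3 4 5]
print(col_major)  # [0 3 1 4 2 5]

# An array can also be created with Fortran-contiguous storage.
f = np.full((2, 3), 7, order="F")
print(f.flags["F_CONTIGUOUS"])  # True
```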
NumPy – Data Types
NumPy supports a much greater variety of numerical types than Python does.
The following table shows different scalar data types defined in NumPy.
uint8 Unsigned integer (0 to 255)
uint16 Unsigned integer (0 to 65535)
uint32 Unsigned integer (0 to 4294967295)
uint64 Unsigned integer (0 to 18446744073709551615)
float_ Shorthand for float64 (removed in NumPy 2.0; use float64)
float16 Half precision float: sign bit, 5 bits exponent, 10 bits mantissa
float32 Single precision float: sign bit, 8 bits exponent, 23 bits mantissa
float64 Double precision float: sign bit, 11 bits exponent, 52 bits mantissa
complex_ Shorthand for complex128 (removed in NumPy 2.0; use complex128)
complex64 Complex number, represented by two 32-bit floats (real and imaginary components)
complex128 Complex number, represented by two 64-bit floats (real and imaginary components)
Data Type Objects (dtype)
A data type object describes how the fixed block of memory corresponding to an array is
interpreted, depending on the following aspects:
Type of data (integer, float or Python object)
Size of data
In case of a structured type, the names of the fields, the data type of each field, and the part of the
memory block taken by each field
Example 1
# using array-scalar type
import numpy as np
dt = np.dtype(np.int32)
print(dt)
int32
Example 2
int32
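The int32 output above and the >i4 output of Example 3 are consistent with dtypes built from string codes. The following is a sketch under that assumption, not necessarily the code used in the original examples; the printed byte-order prefix assumes a little-endian machine.

```python
import numpy as np

# 'i4' is the string code for a 4-byte (32-bit) signed integer.
dt = np.dtype('i4')
print(dt)                # int32

# A '>' prefix requests big-endian byte order, '<' little-endian.
dt_big = np.dtype('>i4')
print(dt_big)            # >i4 on a little-endian machine
print(dt_big.itemsize)   # 4
```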
Example 3
>i4
The following examples show the use of structured data type. Here, the field name
and the corresponding scalar data type is to be declared.
Example 4
[('age', 'i1')]
Example 5
Example 6
[10 20 30]
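A minimal sketch of a structured dtype with a single 'age' field, consistent with the outputs shown above. The array values are assumed from the printed [10 20 30].

```python
import numpy as np

# A structured dtype with one field named 'age' stored as a 1-byte integer.
dt = np.dtype([('age', np.int8)])
print(dt)            # [('age', 'i1')]

# Apply the dtype to an array; each element is a one-field record.
a = np.array([(10,), (20,), (30,)], dtype=dt)
print(a)             # [(10,) (20,) (30,)]

# The field name selects the whole column of values.
print(a['age'])      # [10 20 30]
```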
Example 7
The following examples define a structured data type called student with a string
field 'name', an integer field 'age' and a float field 'marks'. This dtype is applied to
ndarray object.
import numpy as np
student=np.dtype([('name','S20'), ('age', 'i1'), ('marks', 'f4')])
print(student)
Example 8
import numpy as np
student=np.dtype([('name','S20'), ('age', 'i1'), ('marks', 'f4')])
a = np.array([('abc', 21, 50),('xyz', 18, 75)], dtype=student)
print(a)
Each built-in data type has a character code that uniquely identifies it.
'b': boolean
'i': (signed) integer
'u': unsigned integer
'f': floating-point
'c': complex-floating point
'm': timedelta
'M': datetime
'O': (Python) objects
'S', 'a': (byte-)string
'U': Unicode
'V': raw data (void)
Concatenation of arrays
Concatenation, or joining of two arrays in NumPy, is primarily accomplished using the routines
np.concatenate, np.vstack, and np.hstack. np.concatenate takes a tuple or list of arrays as its first
argument, as we can see here:
import numpy as np
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
np.concatenate([x, y])
output
array([1, 2, 3, 3, 2, 1])
You can also concatenate more than two arrays at once:
z = [99, 99, 99]
print(np.concatenate([x, y, z]))
Output
[ 1 2 3 3 2 1 99 99 99]
It can also be used for two-dimensional arrays:
grid = np.array([[1, 2, 3],
[4, 5, 6]])
# concatenate along the first axis
np.concatenate([grid, grid])
Output
array([[1, 2, 3],
[4, 5, 6],
[1, 2, 3],
[4, 5, 6]])
For working with arrays of mixed dimensions, it can be clearer to use the np.vstack (vertical stack)
and np.hstack (horizontal stack) functions:
numpy.vstack() function is used to stack the sequence of input arrays vertically to make a single
array.
x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
[6, 5, 4]])
numpy.hstack() function is used to stack the sequence of input arrays horizontally (i.e. column wise)
to make a single array.
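The two stacking functions can be sketched with the x and grid arrays defined above. The column array y is a hypothetical value added only for illustration.

```python
import numpy as np

x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])

# Vertical stack: x becomes a new first row above grid.
v = np.vstack([x, grid])
print(v)
# [[1 2 3]
#  [9 8 7]
#  [6 5 4]]

# Horizontal stack: y (hypothetical column) is appended on the right.
y = np.array([[99],
              [99]])
h = np.hstack([grid, y])
print(h)
# [[ 9  8  7 99]
#  [ 6  5  4 99]]
```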
numpy.dstack
numpy.dstack(tup)
Stack arrays in sequence depth wise (along third axis).
This is equivalent to concatenation along the third axis after 2-D arrays of shape (M,N) have been
reshaped to (M,N,1) and 1-D arrays of shape (N,) have been reshaped to (1,N,1). Rebuilds arrays
divided by dsplit.
This function makes most sense for arrays with up to 3 dimensions. For instance, for pixel-data with
a height (first axis), width (second axis), and r/g/b channels (third axis). The functions concatenate,
stack and block provide more general stacking and concatenation operations.
Parameters:
tup : sequence of arrays
The arrays must have the same shape along all but the third axis. 1-D or 2-D arrays must
have the same shape.
Returns:
stacked : ndarray
The array formed by stacking the given arrays, will be at least 3-D.
import numpy as np
a = np.array((1,2,3))
b = np.array((2,3,4))
np.dstack((a,b))
Output
array([[[1, 2],
[2, 3],
[3, 4]]])
a = np.array([[1],[2],[3]])
b = np.array([[2],[3],[4]])
np.dstack((a,b))
Output
array([[[1, 2]],
[[2, 3]],
[[3, 4]]])
numpy.column_stack
numpy.column_stack(tup)
Stack 1-D arrays as columns into a 2-D array.
Take a sequence of 1-D arrays and stack them as columns to make a single 2-D array. 2-D arrays are
stacked as-is, just like with hstack. 1-D arrays are turned into 2-D columns first.
Parameters:
tup : sequence of 1-D or 2-D arrays
Arrays to stack. All of them must have the same first dimension.
Returns:
stacked : 2-D array
The array formed by stacking the given arrays.
a = np.array((1,2,3))
print("Array'a':",a)
b = np.array((2,3,4))
print("Array'b':",b)
np.column_stack((a,b))
Output
Array'a': [1 2 3]
Array'b': [2 3 4]
array([[1, 2],
[2, 3],
[3, 4]])
Splitting of arrays
The opposite of concatenation is splitting, which is implemented by the functions np.split, np.hsplit,
and np.vsplit. For each of these, we can pass a list of indices giving the split points:
import numpy as np
[1 2 3] [99 99] [3 2 1]
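The output above is consistent with splitting an eight-element array at indices 3 and 5. A minimal sketch, assuming that input:

```python
import numpy as np

x = [1, 2, 3, 99, 99, 3, 2, 1]   # assumed input

# Split points 3 and 5 cut the array into x[:3], x[3:5] and x[5:].
x1, x2, x3 = np.split(x, [3, 5])
print(x1, x2, x3)   # [1 2 3] [99 99] [3 2 1]
```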
Notice that N split-points, leads to N + 1 subarrays. The related functions np.hsplit and np.vsplit are
similar:
grid
Output
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
import numpy as np
print("upper:",upper)
print("lower:",lower)
Output
upper: [[0 1 2 3]
[4 5 6 7]]
lower: [[ 8 9 10 11]
[12 13 14 15]]
print(left)
print(right)
[[ 0 1]
[ 4 5]
[ 8 9]
[12 13]]
[[ 2 3]
[ 6 7]
[10 11]
[14 15]]
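The upper/lower and left/right outputs above are consistent with splitting the 4x4 grid at index 2 along each axis. A sketch under that assumption:

```python
import numpy as np

grid = np.arange(16).reshape((4, 4))

# vsplit cuts along rows: rows 0-1 go to upper, rows 2-3 to lower.
upper, lower = np.vsplit(grid, [2])
print("upper:", upper)
print("lower:", lower)

# hsplit cuts along columns: columns 0-1 go to left, columns 2-3 to right.
left, right = np.hsplit(grid, [2])
print(left)
print(right)
```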
import numpy as np
a=np.array([1,2,3])
print(a)
[1 2 3]
Example 2
How to create a two-dimensional array using NumPy?
# more than one dimensions
import numpy as np
a = np.array([[1, 2], [3, 4]])
print(a)
[[1 2]
[3 4]]
Example 3
# minimum dimensions
import numpy as np
a=np.array([1, 2, 3,4,5], ndmin=2)
print(a)
[[1 2 3 4 5]]
Example 4
How to use dtype attribute in Numpy?
# dtype parameter
import numpy as np
a = np.array([1, 2, 3], dtype=complex)
print(a)
[1.+0.j 2.+0.j 3.+0.j]
import numpy as np
np.random.seed(0) # seed for reproducibility
x1 = np.random.randint(10, size=6) # One-dimensional array
x2 = np.random.randint(10, size=(3, 4)) # Two-dimensional array
x3 = np.random.randint(10, size=(3, 4, 5)) # Three-dimensional array
print("x3 ndim: ", x3.ndim)
print("x3 shape:", x3.shape)
print("x3 size: ", x3.size)
print("dtype:", x3.dtype)
Output
x3 ndim: 3
x3 shape: (3, 4, 5)
x3 size: 60
dtype: int32
(2, 3)
Example 2
# this resizes the ndarray
import numpy as np
a=np.array([[1,2,3],[4,5,6]])
a.shape=(3,2)
print(a)
[[1 2]
[3 4]
[5 6]]
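Reading and assigning the shape attribute can be sketched as follows. The (2, 3) result matches the output shown before Example 2; the reshape() comparison at the end is an added illustration.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
print(a.shape)        # (2, 3)

# Assigning to .shape rearranges the same six elements in place.
a.shape = (3, 2)
print(a)
# [[1 2]
#  [3 4]
#  [5 6]]

# reshape() returns a new array object and leaves `a` as it is.
b = a.reshape(2, 3)
print(b.shape)        # (2, 3)
print(a.shape)        # (3, 2)
```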
Indexing is used to access individual elements. It is also possible to extract entire rows, columns, or
planes from multi-dimensional arrays with numpy indexing. Indexing starts from 0. Let's see an
array example below to understand the concept of indexing:
Element of array 2 3 11 9 6 4 10 12
Index 0 1 2 3 4 5 6 7
Indexing in 1 dimension
import numpy as np
arr1=np.arange(4)
print("Array arr1:",arr1)
print("Element at index 0 of arr1 is:",arr1[0])
print("Element at index 1 of arr1 is:",arr1[1])
Output
Array arr1: [0 1 2 3]
Element at index 0 of arr1 is: 0
Element at index 1 of arr1 is: 1
Explanation: In the above code example, an array of shape 4 is created using the np.arange function.
The elements at index 0 and 1 of the array are printed as output.
numpy.arange() function
Syntax:
numpy.arange([start, ]stop, [step, ]dtype=None)
Parameters:
Name | Description | Required / Optional
start | Start of interval. The interval includes this value. The default start value is 0. | Optional
stop | End of interval. The interval does not include this value, except in some cases where step is not an integer and floating point round-off affects the length of out. | Required
step | Spacing between values. For any output out, this is the distance between two adjacent values, out[i+1] - out[i]. The default step size is 1. If step is specified as a position argument, start must also be given. | Optional
dtype | The type of the output array. If dtype is not given, infer the data type from the other input arguments. | Optional
Return value:
[ndarray] Array of evenly spaced values.
import numpy as np
np.arange(5)
Output
array([0, 1, 2, 3, 4])
np.arange(5.0)
Output
array([ 0., 1., 2., 3., 4.])
np.arange(5,9)
Output
array([5, 6, 7, 8])
np.arange(5,9,3)
Output
array([5, 8])
In the above example the first line of the code creates an array of integers from 0 to 4 using
np.arange(5). The arange() function takes only one argument, which is the stop value, and defaults
to start value 0 and step size of 1.
The second line of the code creates an array of floating-point numbers from 0.0 to 4.0 using
np.arange(5.0). Here, 5.0 is provided as the stop value, indicating that the range should go up to (but
not include) 5.0. Since floating-point numbers are used, the resulting array contains floating-point
values.
Both arrays have the same length and contain evenly spaced values.
Indexing in 2 Dimensions
Let's look at the example below to understand how numpy indexing is done in a 2-D array:
import numpy as np
arr=np.arange(12)
arr1=arr.reshape(3,4)
print("Array arr1:\n",arr1)
print("Element at 0th row and 0th column of arr1 is:",arr1[0,0])
print("Element at 1st row and 2nd column of arr1 is:",arr1[1,2])
Output
Array arr1:
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
Element at 0th row and 0th column of arr1 is: 0
Element at 1st row and 2nd column of arr1 is: 6
import numpy as np
arr=np.arange(12)
arr1=arr.reshape(3,4)
print("Array arr1:\n",arr1)
print("\n")
print("1st row :\n",arr1[1])
Output
Array arr1:
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
1st row :
[4 5 6 7]
Explanation: As discussed above, both rows and columns are used for indexing in two dimensions.
In the above code example, a 2-D array is created using the np.arange function, which creates
the 1-D array, and the reshape method, which transforms the 1-D array into 3 rows and 4 columns.
Here, arr1[1] stands for the row at index 1 of the array, i.e., [4 5 6 7]. As a result, the row at
index 1 is printed as output.
Indexing in 3 Dimensions
There are three dimensions in a 3-D array, suppose we have three dimensions as (i, j, k), where i
stands for the 1st dimension, j stands for the 2nd dimension and, k stands for the 3rd dimension.
Let's look at the given examples for a better understanding. Remember: Indexing starts from zero.
import numpy as np
arr=np.arange(12)
arr1=arr.reshape(2,2,3)
print("Array arr1:\n",arr1)
print("Element:",arr1[1,0,2])
Output
Array arr1:
[[[ 0 1 2]
[ 3 4 5]]
[[ 6 7 8]
[ 9 10 11]]]
Element: 8
import numpy as np
Output
After modifying first element: [12 4 6 8 10]
After modifying third element: [12 4 14 8 10]
In the above example, we have modified elements of the numbers array using array indexing.
numbers[0] = 12 - modifies the first element of numbers and sets its value to 12
numbers[2] = 14 - modifies the third element of numbers and sets its value to 14
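A minimal sketch that reproduces the output above. The first element of the starting array is not visible in the output, so the value 2 is an assumption.

```python
import numpy as np

numbers = np.array([2, 4, 6, 8, 10])   # first element assumed to be 2

numbers[0] = 12
print("After modifying first element:", numbers)   # [12  4  6  8 10]

numbers[2] = 14
print("After modifying third element:", numbers)   # [12  4 14  8 10]
```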
NumPy Negative Array Indexing
NumPy allows negative indexing for its arrays. The index -1 refers to the last item, -2 to the
second-to-last item, and so on.
import numpy as np
Output
9
7
import numpy as np
Output
[ 2 3 5 7 13]
[ 2 3 5 17 13]
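The two outputs above (reading, then modifying, through negative indexes) can be sketched like this. The starting arrays are assumptions chosen to match the printed values.

```python
import numpy as np

# Reading with negative indexes.
numbers = np.array([1, 3, 5, 7, 9])    # assumed values
print(numbers[-1])   # 9
print(numbers[-2])   # 7

# Modifying with negative indexes.
primes = np.array([2, 3, 5, 7, 11])    # assumed values
primes[-1] = 13
print(primes)        # [ 2  3  5  7 13]
primes[-2] = 17
print(primes)        # [ 2  3  5 17 13]
```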
Slicing is performed on NumPy arrays in the same way as on Python lists.
Syntax: arr_name[start:stop:step]
Start: Starting index (included)
Stop: Ending index (excluded)
Step: Difference between consecutive indexes
import numpy as np
arr = np.arange(6)
print("array arr:",arr)
print("sliced element of the array:",arr[1:5])
Output
array arr: [0 1 2 3 4 5]
sliced element of the array: [1 2 3 4]
Explanation: As indexing starts from 0, we want elements that start from the 1st index and stop
before index 5.
Element of array 0 1 2 3 4 5
Index 0 1 2 3 4 5
Slicing a 2D Array
In a 2-D array, we have to specify start:stop 2 times. One for the row and 2nd one for the column.
Code:
import numpy as np
arr=np.arange(12)
arr1=arr.reshape(3,4)
print("Array arr1:\n",arr1)
print("\n")
print("elements of 1st row and 1st column upto last column :\n",arr1[1:,1:4])
Output
Array arr1:
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
elements of 1st row and 1st column upto last column :
[[ 5 6 7]
[ 9 10 11]]
Explanation: The 1st slice is for the rows, so slicing starts from the row at index 1 and goes to the
last row, as no ending index is mentioned. Then the columns from index 1 to index 3 are
sliced and printed as output.
Here, rows and columns index are mentioned for better understanding.
Rows ↓, cols → 0 1 2 3
0 0 1 2 3
1 4 5 6 7
2 8 9 10 11
Slicing a 3D Array
We have to use start:stop:step 3 times. The 1st one is for the planes or layers, 2nd one is for the
rows and the last one is for columns.
Code:
import numpy as np
[[ 4 5 6]
[15 16 17]
[24 25 26]]
[[ 7 8 9]
[18 19 20]
[27 28 29]]]
sliced array:
[[[11 13]
[21 22]]
[[15 16]
[24 25]]]
The 1st set is for the planes or layers. As no start value is mentioned, slicing begins from
the start, so it selects the first two planes.
A 3-D array contains more than one 2-D array stacked in layers, which are called planes.
Note: the end value is always excluded. For example, [1:4] starts at index 1 and stops at index 3.
The 2nd set is for rows. Rows from index 1 to the last index are sliced, as no stopping value is
mentioned (the last 2 rows of each selected plane).
The 3rd set is for columns. As no starting value is mentioned, slicing begins from the start
and stops before the column at index 2 (the first 2 columns).
There are 3 planes in the array from the above example. Let's take a quick look at them.
1st plane:
1 2 3
11 13 14
21 22 23
2nd plane:
4 5 6
15 16 17
24 25 26
3rd plane:
7 8 9
18 19 20
27 28 29
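The explanation above can be sketched in code using the three planes just listed. The slice expression arr[:2, 1:, :2] is inferred from the explanation and the printed result; it is an assumption rather than the original listing.

```python
import numpy as np

arr = np.array([[[1, 2, 3], [11, 13, 14], [21, 22, 23]],
                [[4, 5, 6], [15, 16, 17], [24, 25, 26]],
                [[7, 8, 9], [18, 19, 20], [27, 28, 29]]])

# planes: first two; rows: index 1 to the end; columns: first two
sliced = arr[:2, 1:, :2]
print("sliced array:\n", sliced)
# [[[11 13]
#   [21 22]]
#  [[15 16]
#   [24 25]]]
```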
Full Slices
It is used to select all the planes, columns, or rows. Let's look at the examples below:
Code:
import numpy as np
arr = np.arange(12)
arr1=arr.reshape(3,4)
print(arr1)
print("\n")
print("2-D array sliced from first row to last:\n",arr1[1:3])
Output
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
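Explanation: In [1:3], 1 is the starting row index (the 2nd row) and 3 is the stopping row index,
which is excluded. The slice therefore returns the rows at index 1 and 2 with all of their columns.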
Rows ↓, cols → 0 1 2 3
0 0 1 2 3
1 4 5 6 7
2 8 9 10 11
import numpy as np
arr = np.array([[[1, 2, 3], [11, 13, 14], [21, 22, 23]],
[[4, 5, 6], [15, 16,17], [24, 25, 26]],
[[7, 8, 9], [18, 19, 20], [27, 28, 29]]])
print("3-D array:\n",arr)
print("\n")
print("3-D array sliced from first row to last:\n",arr[1:,:,:])
Output
3-D array:
[[[ 1 2 3]
[11 13 14]
[21 22 23]]
[[ 4 5 6]
[15 16 17]
[24 25 26]]
[[ 7 8 9]
[18 19 20]
[27 28 29]]]
3-D array sliced from the first row to the last:
[[[ 4 5 6]
[15 16 17]
[24 25 26]]
[[ 7 8 9]
[18 19 20]
[27 28 29]]]
Explanation: [1:,:,:] = This selects from the 2nd plane to the end of the 3-D array because no
stopping value is given. There are three planes in the 3-D array: 1st plane:
1 2 3
11 13 14
21 22 23
2nd plane:
4 5 6
15 16 17
24 25 26
3rd plane:
7 8 9
18 19 20
27 28 29
Negative Slicing and Indexing
Negative indexing counts from the end of the array: the last element has index -1, the one
before it has index -2, and so on. Slicing can use these negative indexes as well.
import numpy as np
arr = np.array([10,20,30,40,50,60,70,80,90])
print("Element at index 2 or -7 of an array arr:",arr[-7])
print("Sliced elements from index -8 (or 1) up to, but not including, index -3 (or 6) of an array arr:",arr[-8:-3])
Output
Element at index 2 or -7 of an array arr: 30
Sliced elements from index -8 (or 1) up to, but not including, index -3 (or 6) of an array arr: [20 30 40 50 60]
Slices vs Indexing
Slicing in Python refers to extracting a subset or specific part of a sequence (list, tuple, or string) in a
specific range, while indexing refers to accessing a single element of the sequence.
Let's see in the following table how slicing is different from indexing for Python lists.
Slicing | Indexing
Accesses a sub-part of a sequence and returns a new tuple/list. | Returns a single item from the sequence.
Out-of-range indices are handled smoothly when used for slicing. | IndexError will be thrown if you try to use an index that is too large.
Assigning a single element to a slice raises TypeError; only iterables are accepted. | A single element or an iterable can be assigned.
The length of the list can be changed, or the list can even be cleared, by slice assignment. | The length of the list cannot be changed by item assignment.
import numpy as np
arr=np.arange(12)
arr1=arr.reshape(3,4)
print("Array arr1:\n",arr1)
Output
Array arr1:
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
Output
[[0 1]
[4 5]]
Now if we modify this subarray, we'll see that the original array is changed! Observe:
arr1_sub[0, 0] = 99
print(arr1_sub)
Output
[[99 1]
[ 4 5]]
print(arr1)
Output
[[99 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
Some users may find this surprising, but it can be advantageous: for example, when working with
large datasets, we can access and process pieces of these datasets without the need to copy the
underlying data buffer.
Output
[[99 1]
[ 4 5]]
If we now modify this subarray, the original array is not affected:
arr1_sub_copy[0, 0] = 42
print(arr1_sub_copy)
[[42 1]
[ 4 5]]
print(arr1)
[[99 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
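The view and copy behaviour described above can be sketched end to end. The slice arr1[:2, :2] is assumed from the 2x2 outputs shown; np.shares_memory is used here only to make the difference visible.

```python
import numpy as np

arr1 = np.arange(12).reshape(3, 4)

# Slicing returns a view: it shares memory with arr1.
arr1_sub = arr1[:2, :2]
arr1_sub[0, 0] = 99
print(arr1[0, 0])                          # 99
print(np.shares_memory(arr1, arr1_sub))    # True

# .copy() returns an independent array.
arr1_sub_copy = arr1[:2, :2].copy()
arr1_sub_copy[0, 0] = 42
print(arr1[0, 0])                          # 99
print(np.shares_memory(arr1, arr1_sub_copy))   # False
```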
Reshaping of Arrays
Another useful type of operation is reshaping of arrays, which can be done with the reshape
method. For example, if you want to put the numbers 1 through 9 in a 3x3 grid, you can do the
following:
import numpy as np
grid = np.arange(1, 10).reshape(3, 3)
print(grid)
Output
[[1 2 3]
[4 5 6]
[7 8 9]]
Note that for this to work, the size of the initial array must match the size of the reshaped array, and
in most cases the reshape method will return a no-copy view of the initial array.
Another common reshaping pattern is the conversion of a one-dimensional array into a two-
dimensional row or column matrix. You can do this with the reshape method, or more easily by
making use of the newaxis keyword within a slice operation:
# column vector via newaxis
x[:, np.newaxis]
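Both routes to a row or column matrix can be compared side by side. This is a small sketch; the array x is an assumed three-element example.

```python
import numpy as np

x = np.array([1, 2, 3])   # assumed example array

# Row vector: shape (1, 3)
row_a = x.reshape((1, 3))
row_b = x[np.newaxis, :]

# Column vector: shape (3, 1)
col_a = x.reshape((3, 1))
col_b = x[:, np.newaxis]

print(row_b.shape)   # (1, 3)
print(col_b.shape)   # (3, 1)
```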
Computations on arrays
NumPy ufuncs
What is Vectorization?
Converting iterative statements into a vector based operation is called vectorization.
It is faster as modern CPUs are optimized for such operations.
Add the Elements of Two Lists
list 1: [1, 2, 3, 4]
list 2: [4, 5, 6, 7]
One way of doing it is to iterate over both of the lists and then sum each pair of elements.
Example
Without ufunc, we can use Python's built-in zip() function:
x = [1, 2, 3, 4]
y = [4, 5, 6, 7]
z = []
for i, j in zip(x, y):
    z.append(i + j)
print(z)
[5, 7, 9, 11]
Definition and Usage
The zip() function returns a zip object, which is an iterator of tuples where the first items of the
passed iterables are paired together, then the second items are paired together, and so on.
If the passed iterables have different lengths, the iterable with the fewest items decides the length of
the new iterator.
Syntax
zip(iterator1, iterator2, iterator3 ...)
Parameter Values
Parameter | Description
iterator1, iterator2, iterator3 ... | Iterator objects that will be joined together
Example
Join two tuples together:
a = ("John", "Charles", "Mike")
b = ("Jenny", "Christy", "Monica")
x = zip(a, b)
# use tuple() to display a readable version of the result
print(tuple(x))
(('John', 'Jenny'), ('Charles', 'Christy'), ('Mike', 'Monica'))
NumPy has a ufunc for this, called add(x, y) that will produce the same result.
Example
With ufunc, we can use the add() function:
import numpy as np
x = [1, 2, 3, 4]
y = [4, 5, 6, 7]
z = np.add(x, y)
print(z)
[ 5 7 9 11]
How To Create Your Own ufunc
To create your own ufunc, you have to define a function, like you do with normal functions in
Python, then you add it to your NumPy ufunc library with the frompyfunc() method.
Example
Create your own ufunc for addition:
import numpy as np
def myadd(x, y):
    return x+y
myadd = np.frompyfunc(myadd, 2, 1)
print(myadd([1, 2, 3, 4], [5, 6, 7, 8]))
[6 8 10 12]
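In frompyfunc(myadd, 2, 1), the 2 is the number of input arguments and the 1 is the number of outputs. A short sketch of what the call returns follows; the name myadd_ufunc is hypothetical, and the final astype(int) step is an added illustration.

```python
import numpy as np

def myadd(x, y):
    return x + y

# 2 inputs, 1 output
myadd_ufunc = np.frompyfunc(myadd, 2, 1)
print(type(myadd_ufunc))   # <class 'numpy.ufunc'>

result = myadd_ufunc([1, 2, 3, 4], [5, 6, 7, 8])
print(result.dtype)        # object

# frompyfunc always produces object arrays; convert if a numeric dtype is needed.
as_int = result.astype(int)
print(as_int)              # [ 6  8 10 12]
```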
Check if a Function is a ufunc
Check the type of a function to check if it is a ufunc or not.
Example
Check if a function is a ufunc:
import numpy as np
print(type(np.add))
<class 'numpy.ufunc'>
If it is not a ufunc, it will return another type, like this built-in NumPy function for joining two or
more arrays (the exact type name printed depends on the NumPy version):
Example
Check the type of another function: concatenate():
import numpy as np
print(type(np.concatenate))
<class 'builtin_function_or_method'>
If the function does not exist at all, an error is raised:
Example
Check the type of something that does not exist. This will produce an error:
import numpy as np
print(type(np.blahblah))
Traceback (most recent call last):
File "./prog.py", line 3, in <module>
AttributeError: module 'numpy' has no attribute 'blahblah'
Simple Arithmetic
You could use arithmetic operators + - * / directly between NumPy arrays, but this section discusses
an extension of the same where we have functions that can take any array-like objects e.g. lists,
tuples etc. and perform arithmetic conditionally.
Arithmetic condition: It means that we can define conditions where the arithmetic operation should
happen.
All of the discussed arithmetic functions take a where parameter in which we can specify that
condition.
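A minimal sketch of conditional arithmetic with the where parameter. The array values and the condition are assumptions for illustration. An out array is supplied because positions where the condition is False are left untouched.

```python
import numpy as np

arr1 = np.array([10, 20, 30, 40])
arr2 = np.array([1, 2, 3, 4])

# Add only where arr1 is greater than 15; other positions keep the
# values already present in `out` (zeros here).
out = np.zeros_like(arr1)
np.add(arr1, arr2, out=out, where=arr1 > 15)
print(out)   # [ 0 22 33 44]

# The same functions accept array-like objects such as lists and tuples.
print(np.add([1, 2], (3, 4)))   # [4 6]
```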
Addition
The add() function sums the contents of two arrays and returns the results in a new array.
Example
Add the values in arr1 to the values in arr2:
import numpy as np
arr1 = np.array([10, 11, 12, 13, 14, 15])
arr2 = np.array([20, 21, 22, 23, 24, 25])
newarr = np.add(arr1, arr2)
print(newarr)
[30 32 34 36 38 40]
Subtraction
The subtract() function subtracts the values of one array from the values of another array and
returns the results in a new array.
Example
Subtract the values in arr2 from the values in arr1:
import numpy as np
arr1 = np.array([10, 20, 30, 40, 50, 60])
arr2 = np.array([20, 21, 22, 23, 24, 25])
newarr = np.subtract(arr1, arr2)
print(newarr)
[-10 -1 8 17 26 35]
Multiplication
The multiply() function multiplies the values of one array with the values of another array and
returns the results in a new array.
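A sketch of multiply() in the same style as the other arithmetic examples. The input arrays are assumed to be the ones used in the subtraction example.

```python
import numpy as np

arr1 = np.array([10, 20, 30, 40, 50, 60])
arr2 = np.array([20, 21, 22, 23, 24, 25])

newarr = np.multiply(arr1, arr2)
print(newarr)   # [ 200  420  660  920 1200 1500]
```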
Division
The divide() function divides the values of one array by the values of another array and
returns the results in a new array.
Example
Divide the values in arr1 with the values in arr2:
import numpy as np
arr1 = np.array([10, 20, 30, 40, 50, 60])
arr2 = np.array([3, 5, 10, 8, 2, 33])
newarr = np.divide(arr1, arr2)
print(newarr)
[ 3.33333333 4. 3. 5. 25. 1.81818182]
The example above will return [3.33333333 4. 3. 5. 25. 1.81818182] which is the result of 10/3, 20/5,
30/10 etc.
Power
The power() function raises the values of the first array to the power of the values of the second
array and returns the results in a new array.
Example
Raise the values in arr1 to the power of values in arr2:
import numpy as np
arr1 = np.array([10, 20, 30, 40, 50, 60])
arr2 = np.array([3, 5, 6, 8, 2, 33])
newarr = np.power(arr1, arr2)
print(newarr)
[ 1000 3200000 729000000 6553600000000 2500 0]
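The example above will return [1000 3200000 729000000 6553600000000 2500 0] which is the
result of 10*10*10, 20*20*20*20*20, 30*30*30*30*30*30 etc. When the arrays hold 64-bit integers,
the last value is 0 because 60 raised to the power 33 is too large to be stored and the result overflows silently.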
Remainder
Both the mod() and the remainder() functions return the remainders of the values in the first array
divided by the corresponding values in the second array, in a new array.
Example
Return the remainders:
import numpy as np
arr1 = np.array([10, 20, 30, 40, 50, 60])
arr2 = np.array([3, 7, 9, 8, 2, 33])
newarr = np.mod(arr1, arr2)
print(newarr)
[ 1 6 3 0 0 27]
The example above will return [1 6 3 0 0 27] which is the remainders when you divide 10 with 3
(10%3), 20 with 7 (20%7) 30 with 9 (30%9) etc.
You get the same result when using the remainder () function:
Example
Return the remainders:
import numpy as np
arr1 = np.array([10, 20, 30, 40, 50, 60])
arr2 = np.array([3, 7, 9, 8, 2, 33])
newarr = np.remainder(arr1, arr2)
print(newarr)
[ 1 6 3 0 0 27]
Quotient and Mod
The divmod() function returns both the quotient and the mod. The return value is two arrays: the
first array contains the quotient and the second array contains the mod.
Example
Return the quotient and mod:
import numpy as np
arr1 = np.array([10, 20, 30, 40, 50, 60])
arr2 = np.array([3, 7, 9, 8, 2, 33])
newarr = np.divmod(arr1, arr2)
print(newarr)
(array([ 3, 2, 3, 5, 25, 1]), array([ 1, 6, 3, 0, 0, 27]))
The example above will return:
(array([3, 2, 3, 5, 25, 1]), array([1, 6, 3, 0, 0, 27]))
The first array represents the quotients (the integer value when you divide 10 by 3, 20 by 7, 30
by 9, etc.).
The second array represents the remainders of the same divisions.
Absolute Values
Both the absolute() and the abs() functions perform the same element-wise absolute-value operation,
but we should use absolute() to avoid confusion with Python's built-in abs() function.
import numpy as np
arr = np.array([-1, -2, 1, 2, 3, -4])
newarr = np.absolute(arr)
print(newarr)
[1 2 1 2 3 4]
The example above will return [1 2 1 2 3 4].
Rounding Decimals
There are primarily five ways of rounding off decimals in NumPy:
truncation
fix
rounding
floor
ceil
Truncation
Remove the decimals, and return the float number closest to zero. Use the trunc() and fix()
functions.
Example
Truncate elements of following array:
import numpy as np
arr = np.trunc([-3.1666, 3.6667])
print(arr)
Output
[-3. 3.]
Example
Same example, using fix():
import numpy as np
arr = np.fix([-3.1666, 3.6667])
print(arr)
Output
[-3. 3.]
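Rounding
The around() function rounds to the given number of decimals: the preceding digit is incremented
by 1 if the remaining digits are more than half, and left unchanged if they are less than half. Values
exactly halfway are rounded to the nearest even value.
E.g. rounded off to 1 decimal place, 3.16666 is 3.2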
Example
Round off 3.1666 to 2 decimal places:
import numpy as np
arr = np.around(3.1666, 2)
print(arr)
Output
3.17
Floor
The floor() function rounds off decimal to nearest lower integer.
E.g. floor of 3.166 is 3.
Example
Floor the elements of following array:
import numpy as np
arr = np.floor([-3.1666, 3.6667])
print(arr)
Output
[-4. 3.]
Ceil
The ceil() function rounds off decimal to nearest upper integer.
E.g. ceil of 3.166 is 4.
Example
Ceil the elements of following array:
import numpy as np
arr = np.ceil([-3.1666, 3.6667])
print(arr)
Output
[-3. 4.]
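The five rounding functions can be compared on one input. This is a sketch; the values 2.5 and 3.5 are added as assumptions to show how around() treats exact halves.

```python
import numpy as np

values = np.array([-3.1666, 3.6667, 2.5, 3.5])

print(np.trunc(values))    # [-3.  3.  2.  3.]  drop the decimals
print(np.fix(values))      # [-3.  3.  2.  3.]  round toward zero
print(np.around(values))   # [-3.  4.  2.  4.]  halves go to the even value
print(np.floor(values))    # [-4.  3.  2.  3.]  round down
print(np.ceil(values))     # [-3.  4.  3.  4.]  round up
```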
[1. 0.8660254 0.70710678 0.58778525]
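The output above is consistent with taking the sine of π/2, π/3, π/4 and π/5 in one call. A sketch under that assumption:

```python
import numpy as np

# Angles in radians (assumed input).
arr = np.array([np.pi/2, np.pi/3, np.pi/4, np.pi/5])

# np.sin is a ufunc, so it is applied to every element of the array.
x = np.sin(arr)
print(x)
```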
import numpy as np
x = np.cos(np.pi/2)
print(x)
6.123233995736766e-17
import numpy as np
x = np.tan(np.pi/2)
print(x)
1.633123935319537e+16
import numpy as np
x = np.arcsin(1.0)
print(x)
1.5707963267948966
Finding Angles
Finding angles from values of sine, cos, tan, e.g. sin, cos and tan inverse (arcsin, arccos, arctan).
NumPy provides ufuncs arcsin(), arccos() and arctan() that produce radian values for the corresponding
sin, cos and tan values given.
Angles of Each Value in Arrays
Example: Find the angle for all of the sine values in the array.
import numpy as np
arr = np.array([1, -1, 0.1])
x = np.arcsin(arr)
print(x)
[ 1.57079633 -1.57079633 0.10016742]
numpy.exp() in Python
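numpy.exp(array, out = None, where = True, casting = 'same_kind', order = 'K', dtype = None) :
This mathematical function calculates the exponential of every element in the input array.
Parameters :
array : input array or object whose elements are exponentiated.
out : optional array in which to store the result.
where : optional boolean condition selecting the elements for which the exponential is computed.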
import numpy as np
in_array = [1, 3, 5]
print ("Input array : ", in_array)
out_array = np.exp(in_array)
print ("Output array : ", out_array)
Input array : [1, 3, 5]
Output array : [ 2.71828183 20.08553692 148.4131591 ]
NumPy Hyperbolic Functions
NumPy provides hyperbolic functions, which are the analogs of the trigonometric functions:
hyperbolic and inverse hyperbolic sine, cosine, and tangent.
1. np.sinh()- This function returns the hyperbolic sine of the array elements.
import numpy as np
arr = np.array([30,60,90])
#hyperbolic sine function
print(np.sinh(arr * np.pi / 180))
Output
[0.54785347 1.24936705 2.3012989 ]
2. np.cosh()- This function returns the hyperbolic cosine of the array elements.
import numpy as np
arr = np.array([30,60,90])
#hyperbolic cosine function
print(np.cosh(arr * np.pi / 180))
Output
3. np.tanh()- This function returns the hyperbolic tangent of the array elements.
import numpy as np
arr = np.array([30,60,90])
#hyperbolic tangent function
print(np.tanh(arr * np.pi / 180))
Output
4. np.arcsinh()- This function returns the hyperbolic inverse sine of the array elements.
import numpy as np
arr = np.array([150,60,90])
#hyperbolic inverse sine function
print(np.arcsinh(arr * np.pi / 180))
Output
5. np.arccosh()- This function returns the hyperbolic inverse cosine of the array elements.
import numpy as np
arr = np.array([150,60,90])
#hyperbolic inverse cosine function
print(np.arccosh(arr * np.pi / 180))
Output
[1.61690509 0.30604211 1.02322748]
6. np.arctanh()- This function returns the hyperbolic inverse tan of the array elements.
import numpy as np
arr = np.array([1,2,3])
#hyperbolic inverse tangent function
print(np.arctanh(arr * np.pi / 180))
Output
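The six hyperbolic functions can be checked against each other instead of against printed values. The sketch below uses the same 30, 60 and 90 degree inputs and relies only on standard identities.

```python
import numpy as np

angles = np.array([30, 60, 90]) * np.pi / 180   # degrees to radians

s = np.sinh(angles)
c = np.cosh(angles)
t = np.tanh(angles)

# tanh is the ratio sinh / cosh, and cosh^2 - sinh^2 equals 1.
print(np.allclose(t, s / c))            # True
print(np.allclose(c**2 - s**2, 1.0))    # True

# Each inverse function undoes its forward function (for these positive inputs).
print(np.allclose(np.arcsinh(s), angles))   # True
print(np.allclose(np.arccosh(c), angles))   # True
print(np.allclose(np.arctanh(t), angles))   # True
```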
1. np.around()- This function is used to round off a decimal number to desired number of positions.
The function takes two parameters: the input number and the precision of decimal places.
import numpy as np
arr = np.array([20.8999,67.89899,54.63409])
print(np.around(arr,1))
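Output
[20.9 67.9 54.6]
2. np.floor()- This function returns the floor value of the input decimal value. Floor value is the
largest integer number less than the input value.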
import numpy as np
arr = np.array([20.8,67.99,54.09])
print(np.floor(arr))
Output
[20. 67. 54.]
3. np.ceil()- This function returns the ceiling value of the input decimal value. Ceiling value is the
smallest integer number greater than the input value.
import numpy as np
arr = np.array([20.8,67.99,54.09])
print(np.ceil(arr))
Output
[21. 68. 55.]
import numpy as np
arr = np.array([1,8,4])
#exponential function
print(np.exp(arr))
Output
import numpy as np
arr = np.array([6,8,4])
#logarithmic function
print(np.log(arr))
Output
import numpy as np
arr = np.array([1+2j])
#real test function
print(np.isreal(arr))
Output
[False]
2. np.conj()- This function is useful for calculation of conjugate of complex numbers.
import numpy as np
arr = np.array([1+2j])
#conjugate function
print(np.conj(arr))
Output
[1.-2.j]
NumPy Logs
NumPy provides functions to compute logarithms at base 2, e and 10.
We will also explore how we can take the log at any base by creating a custom ufunc.
Where the log cannot be computed, the log functions return -inf for an input of 0 and nan for negative real inputs.
Log at Base 2
Use the log2() function to perform log at the base 2.
Log at Base 10
Use the log10() function to perform log at the base 10.
1.94591015 2.07944154 2.19722458]
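The three built-in log functions can be sketched on the integers 1 to 9. That input is an assumption, chosen because its natural logs end with the values shown above. The change-of-base line at the end is an added alternative for arbitrary bases.

```python
import numpy as np

arr = np.arange(1, 10)       # assumed input: the integers 1 to 9

log_2 = np.log2(arr)         # base 2
log_e = np.log(arr)          # base e (natural log)
log_10 = np.log10(arr)       # base 10

print(log_2[[0, 1, 3, 7]])   # [0. 1. 2. 3.]  (log2 of 1, 2, 4, 8)
print(log_e[-3:])            # [1.94591015 2.07944154 2.19722458]

# Change of base: log_b(x) = log(x) / log(b); shown here for base 15.
log_15 = np.log(100) / np.log(15)
print(log_15)
```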
Log at Any Base
NumPy does not provide any function to take log at any base, so we can use the frompyfunc()
function along with inbuilt function math.log() with two input parameters and one output
parameter:
Example
from math import log
import numpy as np
nplog = np.frompyfunc(log, 2, 1)
print(nplog(100, 15))
1.7005483074552052
Specialized ufuncs
NumPy has many more ufuncs available, including hyperbolic trig functions,
bitwise arithmetic, comparison operators, conversions from radians to degrees, rounding and
remainders, and much more. A look through the NumPy documentation reveals a lot of interesting
functionality.
Advanced Ufunc Features
Many NumPy users make use of ufuncs without ever learning their full set
of features. We'll outline a few specialized features of ufuncs here.
Specifying output
For large calculations, it is sometimes useful to be able to specify the array where
the result of the calculation will be stored. Rather than creating a temporary array, you can use this
to write computation results directly to the memory location where you'd like them to be. For all
ufuncs, you can do this using the out argument of the function:
x = np.arange(5)
y = np.empty(5)
np.multiply(x, 10, out=y)
print(y)
[ 0. 10. 20. 30. 40.]
y = np.zeros(10)
np.power(2, x, out=y[::2])
print(y)
[ 1. 0. 2. 0. 4. 0. 8. 0. 16. 0.]
If we had instead written y[::2] = 2 ** x, this would have resulted in the creation of a temporary
array to hold the results of 2 ** x, followed by a second operation copying those values into the y
array. This doesn’t make much of a difference for such a small computation, but for very large arrays
the memory savings from careful use of the out argument can be significant.
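A small check, offered as a sketch, shows that the ufunc really writes into the existing array instead of allocating a new one. The names `view` and `result` are introduced here only for illustration.

```python
import numpy as np

x = np.arange(5)
y = np.zeros(10)

# y[::2] is a view on y, so writing into it modifies y itself
view = y[::2]
result = np.power(2, x, out=view)

print(result is view)               # the ufunc returns the array passed as out
print(np.shares_memory(result, y))  # the result lives in y's memory
print(y)
```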
Aggregate and Statistical Functions in Numpy – Python
First, import NumPy with import numpy as np. To create a NumPy array, use
the np.array() function. The aggregate and statistical functions are given below:
1. np.sum(m): Used to find out the sum of the values of m.
2. np.prod(m): Used to find out the product(multiplication) of the values of m.
3. np.mean(m): It returns the mean of the input array m.
4. np.std(m): It returns the standard deviation of the given input array m.
5. np.var(m): Used to find out the variance of the data given in the form of array m.
6. np.min(m): It returns the minimum value among the elements of the given array m.
7. np.max(m): It returns the maximum value among the elements of the given array m.
8. np.argmin(m): It returns the index of the minimum value among the elements of the array m.
9. np.argmax(m): It returns the index of the maximum value among the elements of the array m.
10. np.median(m): It returns the median of the elements of the array m.
Code that uses all of the above functions is given below:
import numpy as np
a=np.array([1,2,3,4,5])
print("a :",a)
total=np.sum(a)  # named total so the built-in sum() is not shadowed
print("sum :",total)
product=np.prod(a)
print("product :",product)
mean=np.mean(a)
print("mean :",mean)
standard_deviation=np.std(a)
print("standard_deviation :",standard_deviation)
variance=np.var(a)
print("variance :",variance)
minimum=np.min(a)
print("minimum value :",minimum)
maximum=np.max(a)
print("maximum value :",maximum)
minimum_index=np.argmin(a)
print("minimum index :",minimum_index)
maximum_index=np.argmax(a)
print("maximum-index :",maximum_index)
median=np.median(a)
print("median :",median)
Output is:
a : [1 2 3 4 5]
sum : 15
product : 120
mean : 3.0
standard_deviation : 1.4142135623730951
variance : 2.0
minimum value : 1
maximum value : 5
minimum index : 0
maximum-index : 4
median : 3.0
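The standard deviation and variance printed above are the population versions, because np.std and np.var divide by N by default. The sketch below, using the same array, shows how the ddof argument switches to the sample versions, which divide by N - 1.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])

population_std = np.std(a)       # default ddof=0: divides by N
sample_std = np.std(a, ddof=1)   # ddof=1: divides by N - 1
sample_var = np.var(a, ddof=1)

print(population_std, sample_std, sample_var)
```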
Python numpy sum
Python numpy sum function calculates the sum of values in an array.
arr1.sum()
arr2.sum()
arr3.sum()
The Python numpy sum function accepts an optional argument called axis, which
restricts the sum to a given axis.
For example, axis = 0 returns the sum of each column in a NumPy array.
arr2.sum(axis = 0)
arr3.sum(axis = 0)
axis = 1 returns the sum of each row in an array
arr2.sum(axis = 1)
arr3.sum(axis = 1)
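The array definitions are not shown in these notes, so the sketch below uses a hypothetical two-dimensional stand-in for arr2 to make the effect of the axis argument concrete.

```python
import numpy as np

# Hypothetical stand-in for arr2; the original definition is not shown
arr2 = np.array([[1, 2, 3],
                 [4, 5, 6]])

total = arr2.sum()              # sum of all elements
column_sums = arr2.sum(axis=0)  # one sum per column
row_sums = arr2.sum(axis=1)     # one sum per row

print(total)
print(column_sums)
print(row_sums)
```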
You don’t have to write the axis keyword inside the sum parentheses. In other words,
arr2.sum(axis = 1) is the same as arr2.sum(1).
arr2.sum(0)
arr2.sum(1)
arr3.sum(0)
arr3.sum(1)
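Python numpy average
The Python numpy average function returns the average of the values in an array or along a given axis.
np.average(arr1)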
np.average(arr2)
np.average(arr3)
Average along the X and Y axes
np.average(arr2, axis = 0)
np.average(arr2, axis = 1)
Calculate numpy array Average without using the axis name.
np.average(arr3, 0)
np.average(arr3, 1)
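As a sketch of the calls above, the example below uses a hypothetical two-dimensional array in place of arr2, since the original definitions are not shown.

```python
import numpy as np

# Hypothetical stand-in for arr2; the original definition is not shown
arr2 = np.array([[1, 2, 3],
                 [4, 5, 6]])

overall = np.average(arr2)              # average of all elements
per_column = np.average(arr2, axis=0)   # average of each column
per_row = np.average(arr2, 1)           # axis passed positionally

print(overall)
print(per_column)
print(per_row)
```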
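Python numpy prod
The Python numpy prod function returns the product of the values in an array or along a given axis.
np.prod([]) # the product of an empty array is 1.0
np.prod(arr1)
np.prod(arr2) # any number multiplied by zero gives zero
This time we are using two-dimensional arrays.
Next, we calculate the product of all the numbers along the X-axis and the Y-axis separately.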
np.prod(x, axis = 0)
np.prod(x, axis = 1)
np.prod(y, axis = 0)
np.prod(y, axis = 1)
Find the numpy array product without using the axis name.
np.prod(x, 1)
np.prod(y, 1)
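The arrays x and y are not defined in these notes. The sketch below assumes a small two-dimensional x to show what the calls return.

```python
import numpy as np

# Hypothetical two-dimensional array; the original x is not shown
x = np.array([[1, 2, 3],
              [4, 5, 6]])

empty_product = np.prod([])          # empty input gives the identity, 1.0
column_products = np.prod(x, axis=0) # product down each column
row_products = np.prod(x, 1)         # axis passed positionally

print(empty_product)
print(column_products)
print(row_products)
```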
Python numpy min
The Python numpy min function returns the minimum value in an array or along a given axis.
arr1.min()
arr2.min()
arr3.min()
Next, we find the minimum value along the X-axis and the Y-axis.
arr2.min(axis = 0)
arr2.min(axis = 1)
arr3.min(0)
arr3.min(1)
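To make the axis behaviour concrete, here is a sketch with a hypothetical stand-in for arr2, since the original arrays are not shown.

```python
import numpy as np

# Hypothetical stand-in for arr2; the original definition is not shown
arr2 = np.array([[7, 2, 9],
                 [4, 8, 1]])

smallest = arr2.min()           # minimum of all elements
column_min = arr2.min(axis=0)   # minimum of each column
row_min = arr2.min(1)           # axis passed positionally

print(smallest)
print(column_min)
print(row_min)
```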
Python Array minimum
Unlike the min function, the numpy minimum function accepts two arrays. It compares
the arrays element by element and returns an array that holds the smaller value at each
position.
This time we apply the minimum function to two randomly generated 5 * 5
matrices.
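import numpy as np
# Two random 5 x 5 integer matrices with values from 1 to 9
x = np.random.randint(1, 10, size = (5, 5))
y = np.random.randint(1, 10, size = (5, 5))
print(x)
print()
print(y)
print('\n-----Minimum Array----')
# Element-wise minimum of x and y
print(np.minimum(x, y))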
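Because random input makes the result hard to verify by eye, a small fixed example may help. The values below are illustrative assumptions.

```python
import numpy as np

# Fixed illustrative values so the result can be checked by hand
x = np.array([3, 7, 5])
y = np.array([4, 2, 5])

elementwise_min = np.minimum(x, y)  # smaller value at each position
print(elementwise_min)
```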
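Python numpy max
The max and maximum functions are used in the same way as min and minimum, but they return the largest values instead.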
Broadcasting is a mechanism that permits NumPy to operate with arrays of different shapes when
performing arithmetic operations:
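The first rule is not shown in these notes. Presumably it is the basic case, in which two dimensions are compatible when they are equal. A minimal sketch under that assumption:

```python
import numpy as np

# Assumed Rule 1: dimensions are compatible when they are equal
x = np.ones((3, 4))
y = np.random.random((3, 4))

result = x + y   # shapes match exactly, so no stretching is needed
print(result.shape)
```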
Secondly, two dimensions are also compatible when one of the dimensions of the array is 1. Check
the example given here:
# Rule 2: Two dimensions are also compatible when one of them is 1
# Initialize `x`
x = np.ones((3,4))
print(x)
# Check shape of `x`
print(x.shape)
# Initialize `y`
y = np.arange(4)
print(y)
# Check shape of `y`
print(y.shape)
# Subtract `x` and `y`
print(x - y)
Lastly, there is a third rule that says two arrays can be broadcast together if they are compatible in
all of the dimensions. Check the example given here:
# Rule 3: Arrays can be broadcast together if they are compatible in all dimensions
x = np.ones((6,8))
y = np.random.random((10, 1, 8))
print(x + y)
The shapes of x, (6, 8), and y, (10, 1, 8), are different. However, it is possible to add them. Why is
that? Also, change the shape of y to (10, 2, 8) or (10, 1, 4) and the addition will raise a ValueError. Can you find out why? (Hint:
check rule 1).
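One way to explore the question is to try each shape and record what happens. NumPy compares shapes from the last dimension backwards, and each pair must be equal or contain a 1. The sketch below uses a hypothetical `results` dictionary to collect the outcomes.

```python
import numpy as np

x = np.ones((6, 8))
results = {}  # hypothetical container: y shape -> result shape, or None on failure

for shape in [(10, 1, 8), (10, 2, 8), (10, 1, 4)]:
    y = np.random.random(shape)
    try:
        results[shape] = (x + y).shape
    except ValueError:
        # Raised when a pair of dimensions is neither equal nor 1
        results[shape] = None

for shape, outcome in results.items():
    print(shape, "->", outcome)
```

For (10, 2, 8) the pair 6 and 2 fails, and for (10, 1, 4) the pair 8 and 4 fails. In both cases the dimensions are unequal and neither one is 1.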