Mdad - Numpy ML
Google---NumPy--getting started
Installation
Python Anaconda distributions
Numpy
Jupyter Notebook
0-D Arrays
0-D arrays, or Scalars, are the elements in an array. Each value in an array is a 0-D array.
Example
Create a 0-D array with value 42
import numpy as np
arr = np.array(42)
print(arr)
1-D Arrays
An array that has 0-D arrays as its elements is called uni-dimensional or 1-D array.
Example
import numpy as np
arr = np.array([1, 2, 3, 4, 5])  # example values
print(arr)
2-D Arrays
An array that has 1-D arrays as its elements is called a 2-D array.
NumPy has a whole submodule dedicated to matrix (linear algebra) operations, called numpy.linalg.
Example
Create a 2-D array containing two arrays with the values 1,2,3 and 4,5,6:
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr)
3-D arrays
An array that has 2-D arrays (matrices) as its elements is called a 3-D array.
Example
Create a 3-D array with two 2-D arrays, both containing two arrays with the values 1,2,3 and 4,5,6:
import numpy as np
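A minimal sketch of the 3-D array described above, assuming the nested-list form of np.array(). The ndim and shape checks are additions for illustration.

```python
import numpy as np

# Two 2-D arrays, each holding the rows [1, 2, 3] and [4, 5, 6]
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])

print(arr)
print(arr.ndim)   # number of dimensions
print(arr.shape)  # (2-D arrays, rows, columns)
```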
The indexes in NumPy arrays start with 0, meaning that the first element has index 0, the
second has index 1, and so on.
import numpy as np
arr = np.array([1, 2, 3, 4])  # example values
print(arr[0])
PANDAS
The name "Pandas" refers to both "Panel Data" and "Python Data
Analysis". The library was created by Wes McKinney in 2008.
Pandas can clean messy data sets, and make them readable and relevant.
Pandas can also delete rows that are not relevant, or that contain wrong
values, like empty or NULL values. This is called cleaning the data.
Installation of Pandas
If you have Python and PIP already installed on a system, then installation of
Pandas is very easy. Install it using this command:
pip install pandas
If this command fails, then use a Python distribution that already has Pandas
installed, such as Anaconda or Spyder.
Pandas as pd
Pandas is usually imported under the pd alias.
Alias: in Python, an alias is an alternate name for referring to the same thing.
import pandas as pd
Example
import pandas as pd
print(pd.__version__)
What is a Series?
A Pandas Series is like a column in a table.
import pandas as pd
a = [1, 7, 2]
myvar = pd.Series(a)
print(myvar)
Create Labels
With the index argument, you can name your own labels.
Example
Create your own labels:
import pandas as pd
a = [1, 7, 2]
myvar = pd.Series(a, index = ["x", "y", "z"])
print(myvar)
When you have created labels, you can access an item by referring to the label.
Example
Return the value of "y":
print(myvar["y"])
Example
Create a simple Pandas Series from a dictionary:
import pandas as pd
calories = {"day1": 420, "day2": 380, "day3": 390}  # example values
myvar = pd.Series(calories)
print(myvar)
To select only some of the items in the dictionary, use the index argument and
specify only the items you want to include in the Series.
Example
Create a Series using only data from "day1" and "day2":
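import pandas as pd
calories = {"day1": 420, "day2": 380, "day3": 390}  # example values
myvar = pd.Series(calories, index = ["day1", "day2"])
print(myvar)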
DataFrames
Data sets in Pandas are usually multi-dimensional tables, called DataFrames.
Example
Create a DataFrame from two Series:
import pandas as pd
data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}
myvar = pd.DataFrame(data)
print(myvar)
Pandas DataFrames
What is a DataFrame?
A Pandas DataFrame is a 2 dimensional data structure, like a 2 dimensional array,
or a table with rows and columns.
import pandas as pd
data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}
df = pd.DataFrame(data)
print(df)
Result
   calories  duration
0       420        50
1       380        40
2       390        45
Locate Row
As you can see from the result above, the DataFrame is like a table with rows and
columns.
Pandas uses the loc attribute to return one or more specified rows.
Example
Return row 0:
Result
calories    420
duration     50
Name: 0, dtype: int64
Example
Return row 0 and 1:
Result
   calories  duration
0       420        50
1       380        40
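The two results above can be reproduced with a short sketch like the following, which reuses the calories/duration data from the earlier example. The variable names row0 and rows01 are illustrative only.

```python
import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}
df = pd.DataFrame(data)

row0 = df.loc[0]         # a single label returns a Series
rows01 = df.loc[[0, 1]]  # a list of labels returns a DataFrame

print(row0)
print(rows01)
```

Note the double brackets in df.loc[[0, 1]]: the inner list is what makes the result a DataFrame rather than a Series.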
import pandas as pd
df = pd.read_csv('data.csv')
print(df.to_string())
If you have a large DataFrame with many rows, Pandas will only return the first 5 rows, and the last
5 rows:
Example
import pandas as pd
df = pd.read_csv('data.csv')
print(df)
max_rows
The number of rows returned is defined in Pandas option settings.
You can check your system's maximum rows with the pd.options.display.max_rows statement.
Example
import pandas as pd
print(pd.options.display.max_rows)
In my system the number is 60, which means that if the DataFrame contains more than 60 rows,
the print(df) statement will return only the headers and the first and last 5 rows.
You can change the maximum rows number with the same statement.
Example
import pandas as pd
pd.options.display.max_rows = 9999
df = pd.read_csv('data.csv')
print(df)
The head() method returns the headers and a specified number of rows, starting
from the top.
Example
Get a quick overview by printing the first 10 rows of the DataFrame:
import pandas as pd
df = pd.read_csv('data.csv')
print(df.head(10))
Note: if the number of rows is not specified, the head() method will return the top
5 rows.
Example
Print the first 5 rows of the DataFrame:
import pandas as pd
df = pd.read_csv('data.csv')
print(df.head())
There is also a tail() method for viewing the last rows of the DataFrame.
The tail() method returns the headers and a specified number of rows, starting
from the bottom.
Example
Print the last 5 rows of the DataFrame:
print(df.tail())
Cleaning Data
Pandas - Cleaning Empty Cells
Empty Cells
Empty cells can potentially give you a wrong result when you analyze data.
Remove Rows
One way to deal with empty cells is to remove rows that contain empty cells.
This is usually OK, since data sets can be very big, and removing a few rows will
not have a big impact on the result.
Example
Return a new DataFrame with no empty cells:
import pandas as pd
df = pd.read_csv('data.csv')
new_df = df.dropna()
print(new_df.to_string())
By default, the dropna() method returns a new DataFrame, and will not change
the original.
Example
Remove all rows with NULL values:
import pandas as pd
df = pd.read_csv('data.csv')
df.dropna(inplace = True)
print(df.to_string())
Note: Now, the dropna(inplace = True) will NOT return a new DataFrame, but
it will remove all rows containing NULL values from the original DataFrame.
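The difference between the two calls can be checked on a small in-memory DataFrame. This is a sketch only: the column values are hypothetical stand-ins for data.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data.csv with one empty cell
df = pd.DataFrame({
    "Duration": [60, 45, 30],
    "Calories": [409.1, np.nan, 300.0]
})

new_df = df.dropna()       # returns a new DataFrame, df is unchanged
print(len(df), len(new_df))

df.dropna(inplace = True)  # modifies df itself and returns None
print(len(df))
```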
Replace Empty Values
Another way of dealing with empty cells is to insert a new value instead.
This way you do not have to delete entire rows just because of some empty cells.
Example
Replace NULL values with the number 130:
import pandas as pd
df = pd.read_csv('data.csv')
df.fillna(130, inplace = True)
To only replace empty values for one column, specify the column name for the
DataFrame:
Example
Replace NULL values in the "Calories" column with the number 130:
import pandas as pd
df = pd.read_csv('data.csv')
# Assign the result back to the column. In recent pandas versions,
# df["Calories"].fillna(130, inplace = True) may raise a warning and leave df unchanged.
df["Calories"] = df["Calories"].fillna(130)
print(df.to_string())
Pandas uses the mean(), median(), and mode() methods to calculate the
respective values for a specified column:
Example
Calculate the MEAN, and replace any empty values with it:
import pandas as pd
df = pd.read_csv('data.csv')
x = df["Calories"].mean()
df["Calories"] = df["Calories"].fillna(x)
Mean = the average value (the sum of all values divided by number of values).
Example
Calculate the MEDIAN, and replace any empty values with it:
import pandas as pd
df = pd.read_csv('data.csv')
x = df["Calories"].median()
df["Calories"] = df["Calories"].fillna(x)
Median = the value in the middle, after you have sorted all values ascending.
Example
Calculate the MODE, and replace any empty values with it:
import pandas as pd
df = pd.read_csv('data.csv')
x = df["Calories"].mode()[0]
df["Calories"] = df["Calories"].fillna(x)
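To see how the three statistics differ, here is a small sketch on an in-memory Series. The numbers are hypothetical and only stand in for the "Calories" column of data.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical values standing in for the "Calories" column
calories = pd.Series([300.0, 300.0, 330.0, np.nan, 450.0])

mean_value = calories.mean()      # NaN is skipped: (300 + 300 + 330 + 450) / 4
median_value = calories.median()  # middle of the sorted values 300, 300, 330, 450
mode_value = calories.mode()[0]   # most frequent value

# Replace the empty value with the mean
filled = calories.fillna(mean_value)

print(mean_value, median_value, mode_value)
print(filled)
```

mode() returns a Series because several values can be tied for most frequent, which is why the examples above take element [0].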
Wrong Data
"Wrong data" does not have to be "empty cells" or "wrong format", it can just be
wrong, like if someone registered "199" instead of "1.99".
Sometimes you can spot wrong data by looking at the data set, because you have
an expectation of what it should be.
If you take a look at our data set, you can see that in row 7, the duration is 450,
but for all the other rows the duration is between 30 and 60. It doesn't have to be
wrong, but considering that this is the data set of someone's workout
sessions, we can conclude that this person did not work out for 450
minutes.
How can we fix wrong values, like the one for "Duration" in row 7?
Replacing Values
One way to fix wrong values is to replace them with something else.
In our example, it is most likely a typo, and the value should be "45" instead of
"450", and we could just insert "45" in row 7:
Example
Set "Duration" = 45 in row 7:
df.loc[7, 'Duration'] = 45
For small data sets you might be able to replace the wrong data one by one, but
not for big data sets.
To replace wrong data for larger data sets you can create some rules, e.g. set
some boundaries for legal values, and replace any values that are outside of the
boundaries.
Example
Loop through all values in the "Duration" column. If the value is higher than 120, set it to 120:
for x in df.index:
    if df.loc[x, "Duration"] > 120:
        df.loc[x, "Duration"] = 120
Removing Rows
Another way of handling wrong data is to remove the rows that contain wrong
data.
This way you do not have to find out what to replace them with, and there is a
good chance you do not need them to do your analyses.
Example
Delete rows where "Duration" is higher than 120:
for x in df.index:
    if df.loc[x, "Duration"] > 120:
        df.drop(x, inplace = True)
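The same two rules can also be written without an explicit loop by using a boolean condition on the column. This is an alternative sketch, not the approach used in the notes above, and the data is hypothetical.

```python
import pandas as pd

# Hypothetical workout data with one implausible duration
df = pd.DataFrame({"Duration": [60, 45, 450, 30]})

# Option 1: cap every value above 120 at 120
capped = df.copy()
capped.loc[capped["Duration"] > 120, "Duration"] = 120

# Option 2: keep only the rows whose duration is at most 120
filtered = df[df["Duration"] <= 120]

print(capped)
print(filtered)
```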
Exercise2
# importing pandas module
import pandas as pd
df = pd.read_csv("C:\\Users\\user\\Desktop\\nba.csv")
df.head(10)
sort pandas
# importing pandas module
import pandas as pd
df = pd.read_csv("C:\\Users\\user\\Desktop\\nba.csv")
print(df.to_string())
Pandas - Removing Duplicates
Discovering Duplicates
Duplicate rows are rows that have been registered more than one time.
By taking a look at our test data set, we can assume that row 11 and 12 are
duplicates.
Example
Returns True for every row that is a duplicate, otherwise False:
print(df.duplicated())
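A minimal sketch of discovering and then removing duplicates, assuming a small hypothetical DataFrame in place of the test data set. drop_duplicates() is the pandas method for dropping the rows that duplicated() flags.

```python
import pandas as pd

# Hypothetical data in which the last row repeats the second one
df = pd.DataFrame({
    "Duration": [60, 45, 45],
    "Pulse": [110, 109, 109]
})

print(df.duplicated())          # True only for the repeated row

deduped = df.drop_duplicates()  # returns a new DataFrame without the repeated row
print(deduped)
```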
What is Matplotlib?
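Matplotlib is a low-level graph plotting library in Python that serves as a
visualization utility.
If you have Python and PIP already installed on a system, install Matplotlib using this command:
pip install matplotlib
If this command fails, then use a Python distribution that already has Matplotlib
installed, such as Anaconda or Spyder.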
Import Matplotlib
Once Matplotlib is installed, import it in your applications by adding
the import module statement:
import matplotlib
print(matplotlib.__version__)
Matplotlib Pyplot
Pyplot
Most of the Matplotlib utilities lie under the pyplot submodule, and are usually
imported under the plt alias:
import matplotlib.pyplot as plt
Example
Draw a line in a diagram from position (0,0) to position (6,250):
xpoints = [0, 6]
ypoints = [0, 250]
plt.plot(xpoints, ypoints)
plt.show()
Matplotlib Plotting
Plotting x and y points
The plot() function is used to draw points (markers) in a diagram.
If we need to plot a line from (1, 3) to (8, 10), we have to pass two arrays [1, 8]
and [3, 10] to the plot function.
Example
Draw a line in a diagram from position (1, 3) to position (8, 10):
xpoints = np.array([1, 8])
ypoints = np.array([3, 10])
plt.plot(xpoints, ypoints)
plt.show()
Result:
Example
Draw two points in the diagram, one at position (1, 3) and one in position (8, 10):
Result:
You will learn more about markers in the next chapter.
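A minimal sketch of the two-point example above, assuming the shortcut format string 'o', which draws a circle marker at each point without a connecting line.

```python
import matplotlib.pyplot as plt
import numpy as np

xpoints = np.array([1, 8])
ypoints = np.array([3, 10])

# 'o' means: circle markers only, no line between the points
lines = plt.plot(xpoints, ypoints, 'o')
# plt.show()  # uncomment to display the figure
```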
Multiple Points
You can plot as many points as you like, just make sure you have the same
number of points on both axes.
Example
Draw a line in a diagram from position (1, 3) to (2, 8) then to (6, 1) and finally to
position (8, 10):
xpoints = np.array([1, 2, 6, 8])
ypoints = np.array([3, 8, 1, 10])
plt.plot(xpoints, ypoints)
plt.show()
Result:
Default X-Points
If we do not specify the points on the x-axis, they will get the default values 0, 1,
2, 3 etc., depending on the length of the y-points.
So, if we take the same example as above, and leave out the x-points, the
diagram will look like this:
Example
Plotting without x-points:
ypoints = np.array([3, 8, 1, 10])
plt.plot(ypoints)
plt.show()
Result:
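Matplotlib Markers
You can use the keyword argument marker to emphasize each point with a specified marker:
Example
Mark each point with a circle:
...
plt.plot(ypoints, marker = 'o')
...
Result: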
Example
Mark each point with a star:
...
plt.plot(ypoints, marker = '*')
...
Result:
Marker Reference
You can choose any of these markers:
Marker Description
'o' Circle
'*' Star
'.' Point
',' Pixel
'x' X
'X' X (filled)
'+' Plus
'D' Diamond
'p' Pentagon
'H' Hexagon
'h' Hexagon
'^' Triangle Up
'2' Tri Up
'_' Hline
You can also use the shortcut string notation parameter to specify the marker.
This parameter is also called fmt, and is written with this syntax:
marker|line|color
Example
Mark each point with a circle:
plt.plot(ypoints, 'o:r')
plt.show()
Result:
The marker value can be anything from the Marker Reference above.
Line Reference
Line Syntax Description
'-' Solid line
Note: If you leave out the line value in the fmt parameter, no line will be plotted.
Color Reference
Color Syntax Description
'r' Red
'g' Green
'b' Blue
'c' Cyan
'm' Magenta
'y' Yellow
'k' Black
'w' White
Marker Size
You can use the keyword argument markersize or the shorter version, ms to set
the size of the markers:
Example
Set the size of the markers to 20:
Result:
Marker Color
You can use the keyword argument markeredgecolor or the shorter mec to set
the color of the edge of the markers:
Example
Set the EDGE color to red:
Result:
You can use the keyword argument markerfacecolor or the shorter mfc to set
the color inside the edge of the markers:
Example
Set the FACE color to red:
Result:
Use both the mec and mfc arguments to color the entire marker:
Example
Set the color of both the edge and the face to red:
Result:
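The marker size and color arguments from this section can be combined in one call. A minimal sketch, assuming the same y-points used in the earlier plotting examples:

```python
import matplotlib.pyplot as plt
import numpy as np

ypoints = np.array([3, 8, 1, 10])

# Circle markers of size 20 with a red edge (mec) and a red face (mfc)
lines = plt.plot(ypoints, marker = 'o', ms = 20, mec = 'r', mfc = 'r')
# plt.show()  # uncomment to display the figure
```

Passing only mec or only mfc reproduces the single-color examples above.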
Example
Mark each point with a beautiful green color:
...
plt.plot(ypoints, marker = 'o', ms = 20, mec = '#4CAF50', mfc = '#4CAF50')
...
Result:
Example
Mark each point with the color named "hotpink":
...
plt.plot(ypoints, marker = 'o', ms = 20, mec = 'hotpink', mfc = 'hotpink')
...
Result:
Matplotlib Line
Linestyle
You can use the keyword argument linestyle, or shorter ls, to change the style of
the plotted line:
Result:
Example
Use a dashed line:
Result:
Shorter Syntax
The line style can be written in a shorter syntax:
Example
Shorter syntax:
plt.plot(ypoints, ls = ':')
Result:
Line Styles
You can choose any of these styles:
Style Or
'dotted' ':'
'dashed' '--'
'dashdot' '-.'
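A short sketch of the long and short style names from the table above. The second line is shifted up by 2 only so the two lines do not overlap. That offset is an illustrative choice.

```python
import matplotlib.pyplot as plt
import numpy as np

ypoints = np.array([3, 8, 1, 10])

# Long keyword and long style name
dotted = plt.plot(ypoints, linestyle = 'dotted')[0]

# Short keyword and short style name
dashed = plt.plot(ypoints + 2, ls = '--')[0]
# plt.show()  # uncomment to display the figure
```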
Example
Set the line color to red:
Result:
Example
Plot with a beautiful green line:
...
plt.plot(ypoints, c = '#4CAF50')
...
Result:
Example
Plot with the color named "hotpink":
...
plt.plot(ypoints, c = 'hotpink')
...
Result:
Line Width
You can use the keyword argument linewidth or the shorter lw to change the width
of the line.
Example
Plot with a 20.5pt wide line:
Result:
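Line color and line width can be set together. A minimal sketch using the color keyword (or its shorter form c, as in the examples above) and the linewidth keyword:

```python
import matplotlib.pyplot as plt
import numpy as np

ypoints = np.array([3, 8, 1, 10])

# A red line that is 20.5 points wide
line = plt.plot(ypoints, color = 'r', linewidth = 20.5)[0]
# plt.show()  # uncomment to display the figure
```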
Multiple Lines
You can plot as many lines as you like by simply adding more plt.plot() functions:
Example
Draw two lines by specifying a plt.plot() function for each line:
y1 = np.array([3, 8, 1, 10])
y2 = np.array([6, 2, 7, 11])
plt.plot(y1)
plt.plot(y2)
plt.show()
Result:
You can also plot many lines by adding the points for the x- and y-axis for each
line in the same plt.plot() function.
(In the examples above we only specified the points on the y-axis, meaning that
the points on the x-axis got the default values (0, 1, 2, 3).)
Example
Draw two lines by specifying the x- and y-point values for both lines:
x1 = np.array([0, 1, 2, 3])
y1 = np.array([3, 8, 1, 10])
x2 = np.array([0, 1, 2, 3])
y2 = np.array([6, 2, 7, 11])
plt.plot(x1, y1, x2, y2)
plt.show()
Result:
Matplotlib Labels and Title
Create Labels for a Plot
With Pyplot, you can use the xlabel() and ylabel() functions to set a label for the
x- and y-axis.
Example
Add labels to the x- and y-axis:
import numpy as np
import matplotlib.pyplot as plt
x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])
plt.plot(x, y)
plt.xlabel("Average Pulse")
plt.ylabel("Calorie Burnage")
plt.show()
Result:
Create a Title for a Plot
With Pyplot, you can use the title() function to set a title for the plot.
Example
Add a plot title and labels for the x- and y-axis:
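import numpy as np
import matplotlib.pyplot as plt
x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])
plt.plot(x, y)
plt.title("Sports Watch Data")  # example title
plt.xlabel("Average Pulse")
plt.ylabel("Calorie Burnage")
plt.show()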
Result:
Set Font Properties for Title and Labels
You can use the fontdict parameter in xlabel(), ylabel(), and title() to set font
properties for the title and labels.
Example
Set font properties for the title and labels:
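import numpy as np
import matplotlib.pyplot as plt
x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])
font1 = {'family':'serif','color':'blue','size':20}
font2 = {'family':'serif','color':'darkred','size':15}
plt.title("Sports Watch Data", fontdict = font1)  # example title
plt.xlabel("Average Pulse", fontdict = font2)
plt.ylabel("Calorie Burnage", fontdict = font2)
plt.plot(x, y)
plt.show()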
Result:
Position the Title
You can use the loc parameter in title() to position the title.
Legal values are: 'left', 'right', and 'center'. Default value is 'center'.
Example
Position the title to the left:
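import numpy as np
import matplotlib.pyplot as plt
x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])
plt.title("Sports Watch Data", loc = 'left')  # example title
plt.xlabel("Average Pulse")
plt.ylabel("Calorie Burnage")
plt.plot(x, y)
plt.show()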
Result:
Matplotlib Adding Grid Lines
Add Grid Lines to a Plot
With Pyplot, you can use the grid() function to add grid lines to the plot.
plt.plot(x, y)
plt.grid()
plt.show()
Specify Which Grid Lines to Display
You can use the axis parameter in the grid() function to specify which grid lines to
display. Legal values are: 'x', 'y', and 'both'. Default value is 'both'.
x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])
plt.plot(x, y)
plt.grid(axis = 'x')
plt.show()
Result:
Example
Display only grid lines for the y-axis:
import numpy as np
import matplotlib.pyplot as plt
x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])
plt.plot(x, y)
plt.grid(axis = 'y')
plt.show()
Result:
Set Line Properties for the Grid
You can also set the line properties of the grid, like this: grid(color = 'color',
linestyle = 'linestyle', linewidth = number).
Example
Set the line properties of the grid:
import numpy as np
import matplotlib.pyplot as plt
x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])
plt.plot(x, y)
plt.grid(color = 'green', linestyle = '--', linewidth = 0.5)  # example values
plt.show()
Result:
Matplotlib Subplot
#plot 1:
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])
plt.subplot(1, 2, 1)
plt.plot(x,y)
#plot 2:
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])
plt.subplot(1, 2, 2)
plt.plot(x,y)
plt.show()
Result:
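The subplot() function takes three arguments that describe the layout of the figure.
The layout is organized in rows and columns, which are represented by the first and second arguments.
The third argument is the index of the current plot.
plt.subplot(1, 2, 1)
#the figure has 1 row, 2 columns, and this plot is the first plot.
plt.subplot(1, 2, 2)
#the figure has 1 row, 2 columns, and this plot is the second plot.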
So, if we want a figure with 2 rows and 1 column (meaning that the two plots will be
displayed on top of each other instead of side-by-side), we can write the syntax
like this:
Example
Draw 2 plots on top of each other:
#plot 1:
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])
plt.subplot(2, 1, 1)
plt.plot(x,y)
#plot 2:
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])
plt.subplot(2, 1, 2)
plt.plot(x,y)
plt.show()
Result:
You can draw as many plots as you like on one figure, just describe the number of
rows, columns, and the index of the plot.
Example
Draw 6 plots:
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])
plt.subplot(2, 3, 1)
plt.plot(x,y)
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])
plt.subplot(2, 3, 2)
plt.plot(x,y)
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])
plt.subplot(2, 3, 3)
plt.plot(x,y)
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])
plt.subplot(2, 3, 4)
plt.plot(x,y)
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])
plt.subplot(2, 3, 5)
plt.plot(x,y)
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])
plt.subplot(2, 3, 6)
plt.plot(x,y)
plt.show()
Result:
Title
You can add a title to each plot with the title() function:
#plot 1:
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])
plt.subplot(1, 2, 1)
plt.plot(x,y)
plt.title("SALES")
#plot 2:
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])
plt.subplot(1, 2, 2)
plt.plot(x,y)
plt.title("INCOME")
plt.show()
Result:
Super Title
You can add a title to the entire figure with the suptitle() function:
Example
Add a title for the entire figure:
#plot 1:
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])
plt.subplot(1, 2, 1)
plt.plot(x,y)
plt.title("SALES")
#plot 2:
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])
plt.subplot(1, 2, 2)
plt.plot(x,y)
plt.title("INCOME")
plt.suptitle("MY SHOP")
plt.show()
Result:
Matplotlib Scatter
Creating Scatter Plots
With Pyplot, you can use the scatter() function to draw a scatter plot.
The scatter() function plots one dot for each observation. It needs two arrays of
the same length, one for the values of the x-axis, and one for values on the y-axis:
x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
plt.scatter(x, y)
plt.show()
Result:
The observations in the example above are the result of 13 cars passing by.
The x-axis shows the age of each car, and the y-axis shows its speed when it passes.
It seems that the newer the car, the faster it drives, but that could be a
coincidence; after all, we only registered 13 cars.
Compare Plots
In the example above, there seems to be a relationship between speed and age,
but what if we plot the observations from another day as well? Will the scatter plot
tell us something else?
Example
Draw two plots on the same figure:
plt.show()
Result:
Note: The two plots are plotted with two different colors, by default blue and
orange. You will learn how to change colors later in this chapter.
By comparing the two plots, I think it is safe to say that they both give us the
same conclusion: the newer the car, the faster it drives.
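A sketch of the comparison described above, calling scatter() once per day on the same figure. The arrays are the same two data sets used in the Colors example of this chapter. The names day1 and day2 are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

# Day one: age and speed of 13 cars
x1 = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y1 = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
day1 = plt.scatter(x1, y1)

# Day two: age and speed of 15 cars
x2 = np.array([2,2,8,1,15,8,12,9,7,3,11,4,7,14,12])
y2 = np.array([100,105,84,105,90,99,90,95,94,100,79,112,91,80,85])
day2 = plt.scatter(x2, y2)
# plt.show()  # uncomment to display the figure
```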
Colors
You can set your own color for each scatter plot with the color or the c argument:
Example
Set your own color of the markers:
x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
plt.scatter(x, y, color = 'hotpink')
x = np.array([2,2,8,1,15,8,12,9,7,3,11,4,7,14,12])
y = np.array([100,105,84,105,90,99,90,95,94,100,79,112,91,80,85])
plt.scatter(x, y, color = '#88c999')
plt.show()
Result:
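Color Each Dot
You can even set a specific color for each dot by using an array of colors as the value for the c argument.
Note: You cannot use the color argument for this, only the c argument.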
Example
Set your own color of the markers:
x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
colors = np.array(["red","green","blue","yellow","pink","black","orange","purple","beige","brown","gray","cyan","magenta"])
plt.scatter(x, y, c=colors)
plt.show()
Result:
ColorMap
The Matplotlib module has a number of available colormaps.
A colormap is like a list of colors, where each color has a value that ranges from 0
to 100.
The colormap used here is called 'viridis'. It ranges from 0, which is a
purple color, up to 100, which is a yellow color.
You can specify the colormap with the keyword argument cmap, in this case cmap = 'viridis'.
In addition, you have to create an array with values (from 0 to 100), one value for
each point in the scatter plot:
Example
Create a color array, and specify a colormap in the scatter plot:
x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
colors = np.array([0, 10, 20, 30, 40, 45, 50, 55, 60, 70, 80, 90, 100])
plt.scatter(x, y, c=colors, cmap='viridis')
plt.show()
Result:
Example
Include the actual colormap:
import matplotlib.pyplot as plt
import numpy as np
x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
colors = np.array([0, 10, 20, 30, 40, 45, 50, 55, 60, 70, 80, 90, 100])
plt.scatter(x, y, c=colors, cmap='viridis')
plt.colorbar()
plt.show()
Result:
Available ColorMaps
You can choose any of the built-in colormaps:
Name Reverse
Accent Accent_r
Blues Blues_r
BrBG BrBG_r
BuGn BuGn_r
BuPu BuPu_r
CMRmap CMRmap_r
Dark2 Dark2_r
GnBu GnBu_r
Greens Greens_r
Greys Greys_r
OrRd OrRd_r
Oranges Oranges_r
PRGn PRGn_r
Paired Paired_r
Pastel1 Pastel1_r
Pastel2 Pastel2_r
PiYG PiYG_r
PuBu PuBu_r
PuBuGn PuBuGn_r
PuOr PuOr_r
PuRd PuRd_r
Purples Purples_r
RdBu RdBu_r
RdGy RdGy_r
RdPu RdPu_r
RdYlBu RdYlBu_r
RdYlGn RdYlGn_r
Reds Reds_r
Set1 Set1_r
Set2 Set2_r
Set3 Set3_r
Spectral Spectral_r
Wistia Wistia_r
YlGn YlGn_r
YlGnBu YlGnBu_r
YlOrBr YlOrBr_r
YlOrRd YlOrRd_r
afmhot afmhot_r
autumn autumn_r
binary binary_r
bone bone_r
brg brg_r
bwr bwr_r
cividis cividis_r
cool cool_r
coolwarm coolwarm_r
copper copper_r
cubehelix cubehelix_r
flag flag_r
gist_earth gist_earth_r
gist_gray gist_gray_r
gist_heat gist_heat_r
gist_ncar gist_ncar_r
gist_rainbow gist_rainbow_r
gist_stern gist_stern_r
gist_yarg gist_yarg_r
gnuplot gnuplot_r
gnuplot2 gnuplot2_r
gray gray_r
hot hot_r
hsv hsv_r
inferno inferno_r
jet jet_r
magma magma_r
nipy_spectral nipy_spectral_r
ocean ocean_r
pink pink_r
plasma plasma_r
prism prism_r
rainbow rainbow_r
seismic seismic_r
spring spring_r
summer summer_r
tab10 tab10_r
tab20 tab20_r
tab20b tab20b_r
tab20c tab20c_r
terrain terrain_r
twilight twilight_r
twilight_shifted twilight_shifted_r
viridis viridis_r
winter winter_r
Size
You can change the size of the dots with the s argument.
Just like colors, make sure the array for sizes has the same length as the arrays for
the x- and y-axis:
Example
Set your own size for the markers:
x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
sizes = np.array([20,50,100,200,500,1000,60,90,10,300,600,800,75])
plt.scatter(x, y, s=sizes)
plt.show()
Result:
Alpha
You can adjust the transparency of the dots with the alpha argument.
Just like colors, make sure the array for sizes has the same length as the arrays for
the x- and y-axis:
Example
Set the transparency of the markers:
x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
sizes = np.array([20,50,100,200,500,1000,60,90,10,300,600,800,75])
plt.scatter(x, y, s=sizes, alpha=0.5)  # example alpha value
plt.show()
Result:
Combine Color Size and Alpha
You can combine a colormap with different sizes of the dots. This is best visualized
if the dots are transparent:
Example
Create random arrays with 100 values for x-points, y-points, colors and sizes:
x = np.random.randint(100, size=(100))
y = np.random.randint(100, size=(100))
colors = np.random.randint(100, size=(100))
sizes = 10 * np.random.randint(100, size=(100))
plt.scatter(x, y, c=colors, s=sizes, alpha=0.5, cmap='nipy_spectral')  # example alpha and colormap
plt.colorbar()
plt.show()
Result:
Matplotlib Bars
Creating Bars
With Pyplot, you can use the bar() function to draw bar graphs:
Example
Draw 4 bars:
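x = np.array(["A", "B", "C", "D"])  # example categories
y = np.array([3, 8, 1, 10])  # example values
plt.bar(x,y)
plt.show()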
Result:
The bar() function takes arguments that describe the layout of the bars.
The categories and their values are represented by the first and second arguments, as
arrays.
Example
x = ["APPLES", "BANANAS"]
y = [400, 350]
plt.bar(x, y)
Horizontal Bars
If you want the bars to be displayed horizontally instead of vertically, use
the barh() function:
Example
Draw 4 horizontal bars:
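x = np.array(["A", "B", "C", "D"])  # example categories
y = np.array([3, 8, 1, 10])  # example values
plt.barh(x, y)
plt.show()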
Result:
Bar Color
The bar() and barh() functions take the keyword argument color to set the color of the bars:
Example
Draw 4 red bars:
Result:
Color Names
You can use any of the 140 supported color names.
Example
Draw 4 "hot pink" bars:
Result:
Color Hex
Or you can use Hexadecimal color values:
Example
Draw 4 bars with a beautiful green color:
Result:
Bar Width
The bar() function takes the keyword argument width to set the width of the bars:
Example
Draw 4 very thin bars:
Result:
The default width value is 0.8
Bar Height
The barh() function takes the keyword argument height to set the height of the bars:
Example
Draw 4 very thin bars:
Result:
The default height value is 0.8
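The color, width, and height arguments described above can be tried side by side. In this sketch the category names and values are hypothetical, and the two charts are placed in one figure with subplot() purely for comparison.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.array(["A", "B", "C", "D"])  # hypothetical category names
y = np.array([3, 8, 1, 10])         # hypothetical values

# Left: very thin vertical bars in a named color
plt.subplot(1, 2, 1)
vertical = plt.bar(x, y, color = "hotpink", width = 0.1)

# Right: very thin horizontal bars in a hexadecimal color
plt.subplot(1, 2, 2)
horizontal = plt.barh(x, y, color = "#4CAF50", height = 0.1)
# plt.show()  # uncomment to display the figure
```

Note that bar() uses width while barh() uses height for the same visual effect, because the bars are rotated.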
Matplotlib Histograms
Histogram
A histogram is a graph showing frequency distributions.
Example: Say you ask for the height of 250 people. A histogram of the answers
shows how many people fall within each height interval.
Create Histogram
In Matplotlib, we use the hist() function to create histograms.
The hist() function will use an array of numbers to create a histogram, the array
is sent into the function as an argument.
For simplicity we use NumPy to randomly generate an array with 250 values,
where the values will concentrate around 170, and the standard deviation is 10.
Example
A Normal Data Distribution by NumPy:
import numpy as np
x = np.random.normal(170, 10, 250)
print(x)
Result:
This will generate a random result, and could look like this:
The hist() function will read the array and produce a histogram:
Example
A simple histogram:
plt.hist(x)
plt.show()
Result:
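Putting the two histogram steps together gives the sketch below. Capturing the return value of hist() is an addition for illustration: it holds the bin counts, the bin edges, and the drawn bars.

```python
import matplotlib.pyplot as plt
import numpy as np

# 250 values drawn from a normal distribution with mean 170 and standard deviation 10
x = np.random.normal(170, 10, 250)

# hist() returns the bin counts, the bin edges, and the drawn bars
counts, edges, bars = plt.hist(x)
print(counts)
# plt.show()  # uncomment to display the figure
```

Because the data is random, the printed counts differ from run to run, but they always add up to 250.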
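Matplotlib Pie Charts
Creating Pie Charts
With Pyplot, you can use the pie() function to draw pie charts:
y = np.array([35, 25, 25, 15])
plt.pie(y)
plt.show()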
Result:
As you can see the pie chart draws one piece (called a wedge) for each value in
the array (in this case [35, 25, 25, 15]).
By default the plotting of the first wedge starts from the x-axis and
moves counterclockwise:
Note: The size of each wedge is determined by comparing the value with all the
other values, by using this formula:
The value divided by the sum of all values: x/sum(x)
Labels
Add labels to the pie chart with the labels parameter.
The labels parameter must be an array with one label for each wedge:
Example
A simple pie chart:
Result:
Start Angle
As mentioned the default start angle is at the x-axis, but you can change the start
angle by specifying a startangle parameter.
Result:
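A minimal sketch combining the labels and startangle parameters. Only "Apples" appears in these notes. The other label names, and the start angle of 90 degrees, are assumptions chosen for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

y = np.array([35, 25, 25, 15])
mylabels = ["Apples", "Bananas", "Cherries", "Dates"]  # hypothetical labels

# Start the first wedge at 90 degrees instead of at the x-axis (0 degrees)
wedges, texts = plt.pie(y, labels = mylabels, startangle = 90)
# plt.show()  # uncomment to display the figure
```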
Explode
Maybe you want one of the wedges to stand out? The explode parameter allows
you to do that.
The explode parameter, if specified, and not None, must be an array with one value
for each wedge.
Each value represents how far from the center each wedge is displayed:
Example
Pull the "Apples" wedge 0.2 from the center of the pie:
Shadow
Add a shadow to the pie chart by setting the shadow parameter to True:
Example
Add a shadow:
Result:
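The explode and shadow parameters can be sketched together as follows. The label names other than "Apples" are hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np

y = np.array([35, 25, 25, 15])
mylabels = ["Apples", "Bananas", "Cherries", "Dates"]  # hypothetical labels
myexplode = [0.2, 0, 0, 0]  # pull only the first wedge 0.2 away from the center

wedges, texts = plt.pie(y, labels = mylabels, explode = myexplode, shadow = True)
# plt.show()  # uncomment to display the figure
```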
Colors
You can set the color of each wedge with the colors parameter.
The colors parameter, if specified, must be an array with one value for each
wedge:
Example
Specify a new color for each wedge:
Result:
You can use Hexadecimal color values, any of the 140 supported color names, or
one of these shortcuts:
'r' - Red
'g' - Green
'b' - Blue
'c' - Cyan
'm' - Magenta
'y' - Yellow
'k' - Black
'w' - White
Legend
To add a list of explanations for each wedge, use the legend() function:
Example
Add a legend:
Result:
Example
Add a legend with a header:
Result:
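A sketch of a pie chart with custom wedge colors and a legend with a header. The labels, the colors, and the header text "Four Fruits:" are hypothetical choices for illustration. The title parameter of legend() sets the header.

```python
import matplotlib.pyplot as plt
import numpy as np

y = np.array([35, 25, 25, 15])
mylabels = ["Apples", "Bananas", "Cherries", "Dates"]  # hypothetical labels
mycolors = ["black", "hotpink", "b", "#4CAF50"]        # name, name, shortcut, hex

plt.pie(y, labels = mylabels, colors = mycolors)

# The legend picks up one entry per labeled wedge
legend = plt.legend(title = "Four Fruits:")
# plt.show()  # uncomment to display the figure
```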