Datascienece

Data visualization is an important part of analyzing large amounts of collected data. Matplotlib is a Python library that can be used to create various types of visualizations, including histograms, pie charts, line plots, boxplots, and violin plots. These visualizations help identify patterns, relationships, and outliers in data to gain insights. Key steps in creating visualizations with Matplotlib include importing libraries, preparing data, using plotting functions, and customizing aspects like labels, titles, and legends.

Uploaded by

ajus ady

Data Visualization using Matplotlib

Badreesh Shetty

Nov 12, 2018 · 11 min read

Data visualization is an important part of business activities, as organizations nowadays
collect huge amounts of data. Sensors all over the world collect climate data, websites
collect user data through clicks, and cars collect data for steering prediction. All of this
data holds key insights for businesses, and visualizations make these insights easy to
interpret.

Data is only as good as it’s presented.

Why are visualizations important?

Visualizations are the easiest way to analyze and absorb information. Visuals make a
complex problem easier to understand. They help in identifying patterns, relationships,
and outliers in data, and in understanding business problems better and more quickly.
They help to build a compelling story, and the insights gathered from them help in
building strategies for businesses. Visualization is also a precursor to higher-level data
analysis such as Exploratory Data Analysis (EDA) and Machine Learning (ML).

Human beings are visual creatures. Countless studies show how our brain is wired for the
visual, and processes everything faster when it is through the eye.

. . .

“Even if your role does not directly involve the nuts and bolts of data science, it is
useful to know what data visualization can do and how it is realized in the real world.”
- Ramie Jacobson

Data visualization in Python can be done with many packages. Here we discuss the
matplotlib package, which can be used in Python scripts, Jupyter notebooks, and web
application servers.

Matplotlib
Matplotlib is a 2-D plotting library for visualizing data as figures. It emulates
Matlab-like graphs and visualizations. Matlab is not free, is difficult to scale, and is
tedious as a programming language, so matplotlib in Python is used as a robust, free,
and easy-to-use library for data visualization.
Anatomy of Matplotlib Figure

Figure: Anatomy of Matplotlib

The figure is the overall window where plotting happens. Contained within the figure are
the axes, where the actual graphs are plotted. Every axes has an x-axis and a y-axis for
plotting, and contained within the axes are the titles, ticks, and labels associated with
each axis. An essential feature of matplotlib is that a figure can hold more than one
axes, which helps in building multiple plots in one figure. In matplotlib, pyplot is used
to create figures and change their characteristics.
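
The figure/axes relationship described above can be sketched as follows. This is a minimal illustration, not the article's original code; the variable names and the plotted numbers are placeholders chosen for this example.

```python
import matplotlib.pyplot as plt

# One Figure holding two Axes side by side (1 row, 2 columns).
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(8, 3))

# Each Axes carries its own title, axis labels and ticks.
ax_left.plot([1, 2, 3], [1, 4, 9])
ax_left.set_title("Left axes")
ax_left.set_xlabel("x")
ax_left.set_ylabel("x squared")

ax_right.plot([1, 2, 3], [3, 2, 1])
ax_right.set_title("Right axes")
ax_right.set_xticks([1, 2, 3])

# The figure-level title sits above both axes.
fig.suptitle("One figure, two axes")
plt.show()
```

fig.axes lists every axes the figure contains, which is a quick way to check how many plots a figure holds.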

Installing Matplotlib
Type !pip install matplotlib in a Jupyter Notebook cell. If that doesn’t work, type
conda install -c conda-forge matplotlib in cmd. This should work in most cases.

Things to follow
Plotting with Matplotlib is quite easy, and almost every plot follows the same steps.
Matplotlib has a module called pyplot which aids in plotting figures. The Jupyter
notebook is used for running the plots. We import matplotlib.pyplot as plt so that the
module can be called by that short name.

Importing the required libraries and loading the dataset to plot using Pandas pd.read_csv()

Extracting the important parts for plots using conditions on Pandas DataFrames.

plt.plot() for plotting a line chart. Other plotting functions are used in place of plot
for other chart types. All plotting functions require data, which is passed to the
function through its parameters.

plt.xlabel , plt.ylabel for labeling the x and y-axis respectively.

plt.xticks , plt.yticks for labeling the x and y-axis tick points respectively.

plt.legend() for identifying the plotted variables.

plt.title() for setting the title of the plot.

plt.show() for displaying the plot.
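
The steps above can be sketched in one short script. This is only an illustration under stated assumptions: the small DataFrame stands in for a file loaded with pd.read_csv(), and its column names and values are invented for this example.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder data standing in for pd.read_csv("some_file.csv").
df = pd.DataFrame({
    "year": [2014, 2015, 2016, 2017, 2018],
    "sales": [10, 12, 9, 15, 18],
})

# Extract the part of the data to plot using a condition.
recent = df[df["year"] >= 2015]

plt.plot(recent["year"], recent["sales"], label="sales")  # line chart
plt.xlabel("Year")                       # x-axis label
plt.ylabel("Units sold")                 # y-axis label
plt.xticks(recent["year"].tolist())      # tick positions on the x-axis
plt.yticks([0, 5, 10, 15, 20])           # tick positions on the y-axis
plt.legend()                             # legend built from label=
plt.title("Units sold per year")         # plot title
plt.show()                               # display the plot
```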

Histogram
A histogram takes in a series of data and divides the data into a number of bins. It then
plots the frequency of data points in each bin (i.e. the interval of points). It is useful
for seeing how many observations fall in each range of values.

When to use: We should use a histogram when we need the count of a variable's values in
a plot.

eg: Number of particular games sold in a store.

In the histogram of GrandCanyon visitors per year, plt.hist() takes numeric data as its
first argument and places it on the horizontal axis, i.e. GrandCanyon visitors. bins=10
creates 10 bins across the range of visitor values in GrandCanyon.

plt.hist() also returns the components that make up the histogram: n holds the count of
observations in each bin, i.e. 5, 9, and so on.
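
A minimal sketch of a histogram call and its return values is shown below. The visitor numbers are randomly generated stand-ins, not the article's GrandCanyon dataset, and the variable names are assumptions made for this example.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic visitor counts (illustrative only, not the article's data).
rng = np.random.default_rng(0)
grand_canyon_visitors = rng.normal(loc=4.0e6, scale=5.0e5, size=60)

fig, ax = plt.subplots()
# hist returns the per-bin counts, the bin edges, and the drawn bars.
n, bins, patches = ax.hist(grand_canyon_visitors, bins=10)
ax.set_xlabel("Visitors per year")
ax.set_ylabel("Number of years")
ax.set_title("Histogram with 10 bins")
plt.show()

print(n)     # count of observations in each of the 10 bins
print(bins)  # the 11 edges that delimit the 10 bins
```

Note that 10 bins are delimited by 11 edges, and the counts in n add up to the number of observations.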

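The cumulative property makes each bin show the running total of all bins up to and
including it, which helps us understand the increase in value at each bin.

The range parameter restricts the histogram to a specified interval, which helps in
understanding the distribution of values between those limits.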
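
The cumulative and range options can be tried side by side, as in the sketch below. The data is synthetic and the interval limits are arbitrary values picked for this example.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
visitors = rng.uniform(low=2.0e6, high=6.0e6, size=50)  # synthetic values

fig, (ax_cum, ax_rng) = plt.subplots(1, 2, figsize=(9, 3))

# cumulative=True: each bar is the running total up to and including that bin.
n_cum, _, _ = ax_cum.hist(visitors, bins=10, cumulative=True)
ax_cum.set_title("Cumulative")

# range=(low, high): only values inside the interval are binned.
n_rng, edges, _ = ax_rng.hist(visitors, bins=10, range=(3.0e6, 5.0e6))
ax_rng.set_title("Restricted range")
plt.show()
```

The last cumulative bar equals the total number of observations, while the restricted histogram counts only the values that fall inside the chosen interval.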

Multiple histograms are useful for comparing the distributions of two variables. We can
see that GrandCanyon has comparably more visitors than BryceCanyon.
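
Two histograms can be drawn on the same axes by passing a list of datasets, as in the sketch below. Both series are randomly generated placeholders, so the numbers do not reflect real park attendance.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
grand_canyon = rng.normal(4.5e6, 4.0e5, size=50)  # synthetic
bryce_canyon = rng.normal(1.5e6, 3.0e5, size=50)  # synthetic

fig, ax = plt.subplots()
# A list of datasets produces one set of bars per dataset over shared bins.
counts, edges, _ = ax.hist(
    [grand_canyon, bryce_canyon],
    bins=10,
    label=["GrandCanyon", "BryceCanyon"],
)
ax.set_xlabel("Visitors per year")
ax.legend()
plt.show()
```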

Implementation: Histogram

Pie Chart
A pie chart is a circular plot divided into slices to illustrate numerical proportion.
Each slice shows the proportion of one part out of the whole.

When to use: Pie charts should be used seldom, as it is difficult to compare sections
of the chart. A bar plot is often used instead, as comparing sections is easier.

eg: Market share in Films.

Note: A pie chart is rarely a good chart for illustrating information.

plt.pie() takes the numeric data as its 1st argument, i.e. Percentage, and the labels to
display as its 2nd argument, i.e. Sector. Ultimately, it shows each value as its
proportion of the pie.

plt.pie() returns the components that make up the pie chart: the wedge objects, the
label texts, and so on.

A pie chart can be easily customized; for example, the colors and the format of the
label values can be set.
explode is used to separate slices from the pie, similar to a piece being cut from a
pizza.
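
The pie chart options described above can be combined as in this sketch. The sector names and percentages are hypothetical and serve only to show the call; they are not the article's film market data.

```python
import matplotlib.pyplot as plt

# Hypothetical market-share figures, for illustration only.
sectors = ["Studio A", "Studio B", "Studio C", "Others"]
percentage = [40, 30, 20, 10]
explode = [0.1, 0, 0, 0]  # pull the first wedge away from the center

fig, ax = plt.subplots()
# With autopct set, pie returns the wedges, the label texts, and the
# percentage texts drawn inside the wedges.
wedges, texts, autotexts = ax.pie(
    percentage,
    labels=sectors,
    colors=["tab:blue", "tab:orange", "tab:green", "tab:gray"],
    autopct="%1.1f%%",  # format of the value printed inside each wedge
    explode=explode,
)
ax.set_title("Market share")
plt.show()
```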

Implementation: Pie Chart

Time Series by line plot

A time series is shown as a line plot, which basically connects data points with straight
lines. It is useful in understanding the trend over time. The trend indicates how the
variable moves with time: an upward trend means a positive correlation with time and a
downward trend means a negative correlation. It is mostly used in forecasting and in
monitoring models.

When to use: Time Series should be used when single or multiple variables are to be
plotted over time.

eg: Stock Market Analysis of Companies, Weather Forecasting.

First, convert Date to pandas DateTime for easier plotting of data.
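
The date conversion can be sketched as follows. The DataFrame below is a small stand-in with placeholder prices; only the Date column name is taken from the text.

```python
import pandas as pd

stocks = pd.DataFrame({
    "Date": ["2018-01-02", "2018-01-03", "2018-01-04"],
    "AAPL": [100.0, 101.5, 102.0],  # placeholder prices, not market data
})

# Strings become datetime64 values that matplotlib can place on a time axis.
stocks["Date"] = pd.to_datetime(stocks["Date"])
print(stocks.dtypes)
```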

fig.add_axes is used to add axes to the figure canvas. See "What are the differences
between add_axes and add_subplot?" to understand axes and subplots. plt.plot() takes the
Date as its 1st argument and the numeric stock data as its 2nd argument. AAPL Stock is
plotted on ax1, which is the outer axes, and IBM Stock is plotted on ax2, which is the
inset axes.
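
An outer plot with an inset can be sketched with fig.add_axes as below. The price series are invented placeholders and the rectangle coordinates are arbitrary choices for this example.

```python
import pandas as pd
import matplotlib.pyplot as plt

dates = pd.date_range("2018-01-01", periods=6, freq="D")
aapl = [100, 102, 101, 104, 106, 107]  # placeholder values
ibm = [90, 91, 93, 92, 94, 95]         # placeholder values

fig = plt.figure(figsize=(7, 4))
# add_axes takes [left, bottom, width, height] as fractions of the figure.
ax1 = fig.add_axes([0.1, 0.1, 0.85, 0.8])   # outer axes
ax2 = fig.add_axes([0.15, 0.55, 0.3, 0.3])  # inset axes drawn on top

ax1.plot(dates, aapl, color="tab:green")
ax1.set_title("AAPL (outer)")
ax2.plot(dates, ibm, color="tab:blue")
ax2.set_title("IBM (inset)")
plt.show()
```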
In the earlier figure, add_axes is used to add an axes to a figure, whereas add_subplot
adds multiple subplots to a figure. fig.add_subplot(237) cannot be done, as a grid of
2 rows and 3 columns has only 6 subplot positions.
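
The subplot numbering can be checked with a short sketch: positions 231 to 236 fill a grid of 2 rows and 3 columns, and asking for a 7th position raises an error. The flag name seventh_ok is just a helper for this example.

```python
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(9, 5))
# 231 ... 236: a grid of 2 rows and 3 columns, positions 1 to 6.
axes = [fig.add_subplot(2, 3, i) for i in range(1, 7)]
for i, ax in enumerate(axes, start=1):
    ax.set_title(f"subplot 23{i}")

# Position 7 does not exist in a 2 x 3 grid, so matplotlib raises ValueError.
try:
    fig.add_subplot(237)
    seventh_ok = True
except ValueError as err:
    seventh_ok = False
    print("add_subplot(237) failed:", err)

fig.tight_layout()
plt.show()
```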

We can see that the tech company stocks are following an upward trend showing positive
results for traders to invest in stocks.

Implementation: Time Series

Boxplot and Violinplot

Boxplot
Boxplot gives a nice summary of the data. It helps in understanding our distribution
better.

When to use: It should be used when we need the overall statistical information on the
distribution of the data. It can be used to detect outliers in the data.

eg: Credit Score of Customer. We can get the max, min and much more information
about the scores.

Understanding Boxplot

Source: How to Read and Use a Box-and-Whisker Plot

In a box-and-whisker diagram, the line that divides the box into 2 parts represents the
median of the data. The end of the box shows the upper quartile (75%) and the start of
the box represents the lower quartile (25%). The upper quartile is also called the 3rd
quartile and, similarly, the lower quartile is also called the 1st quartile. The region
between the lower quartile and the upper quartile is called the Inter Quartile Range
(IQR), and it approximates the spread of the middle 50% of the data (75 - 25 = 50%).
The maximum is the highest value in the data and the minimum is the lowest value,
excluding outliers; the short lines that mark them are called caps. The lines that
extend from the box to the minimum and maximum are called whiskers, and they show the
range of values in the data. The extreme points beyond the whiskers are outliers. A
commonly used rule is that a value is an outlier if it is less than lower quartile - 1.5
* IQR or higher than upper quartile + 1.5 * IQR.
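
The 1.5 * IQR rule can be worked through numerically. The sketch below uses ten made-up values and NumPy's default (linear interpolation) percentile method; other quartile conventions can give slightly different fences.

```python
import numpy as np

scores = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])  # made-up values

# Lower quartile, median and upper quartile.
q1, median, q3 = np.percentile(scores, [25, 50, 75])
iqr = q3 - q1

# Fences of the 1.5 * IQR rule.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Values outside the fences are flagged as outliers.
outliers = scores[(scores < lower_fence) | (scores > upper_fence)]

print(q1, median, q3, iqr)       # 3.25 5.5 7.75 4.5
print(lower_fence, upper_fence)  # -3.5 14.5
print(outliers)                  # [50]
```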

bp contains the boxplot components such as boxes, whiskers, medians, and caps. Seaborn,
another plotting library, makes it easier to build custom plots than matplotlib.
patch_artist makes the customization possible. notch makes the median look more
prominent.
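
A minimal sketch of a customized boxplot follows. The three score series are randomly generated and the subject names and colors are placeholders; only bp, patch_artist and notch come from the text.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
# Synthetic exam scores for three subjects.
scores = [
    rng.normal(70, 10, 100),
    rng.normal(60, 15, 100),
    rng.normal(80, 5, 100),
]

fig, ax = plt.subplots()
# bp is a dictionary of the drawn components, keyed by component name.
bp = ax.boxplot(scores, patch_artist=True, notch=True)
ax.set_xticklabels(["Math", "Reading", "Writing"])

# patch_artist=True draws the boxes as patches, so they can be filled.
for box, color in zip(bp["boxes"], ["lightblue", "lightgreen", "wheat"]):
    box.set_facecolor(color)

plt.show()
print(sorted(bp.keys()))
```

Each dataset contributes one box and one median line, plus two whiskers and two caps.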

A caveat of using a boxplot is that it does not show the number of observations at each
value. A Jitter Plot in Seaborn can overcome this caveat, and a Violinplot is also useful.
Violin plot
Violin plot is a better chart than boxplot as it gives a much broader understanding of the
distribution. It resembles a violin, and the wider areas indicate where the data is
denser, which is otherwise hidden by box plots.

When to use: It's an extension to the boxplot. It should be used when we need a more
intuitive understanding of the data.

The density of points in the middle seems higher, as students tend to score around the
average in most subjects.
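
A violin plot can be drawn with the same kind of input as a boxplot, as sketched below. The scores are randomly generated and the subject names are placeholders for this example.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
# Synthetic exam scores for three subjects.
scores = [
    rng.normal(70, 10, 200),
    rng.normal(60, 15, 200),
    rng.normal(80, 5, 200),
]

fig, ax = plt.subplots()
# The width of each violin follows the estimated density of the data.
vp = ax.violinplot(scores, showmedians=True)
ax.set_xticks([1, 2, 3])
ax.set_xticklabels(["Math", "Reading", "Writing"])
ax.set_ylabel("Score")
plt.show()
```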

Implementation: Boxplot & Violinplot

TwinAxis
TwinAxis helps in visualizing 2 plots with different y-axes on the same x-axis.

When to use: It should be used when we require 2 plots or grouped data along the same
x-axis.

Eg: Population, GDP data on the same x-axis (Date).

Plotting 2 Plots w.r.t the y-axis and same x-axis

Extracting the important details, i.e. Date for the x-axis, and TempAvgF and WindAvgMPH
for the two different y-axes.

As we can see, there is only 1 axes, so twinx() is used to create a twin that shares the
x-axis: the left y-axis is used for Temp and the right y-axis is used for WindMPH.
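
The twinx() approach can be sketched as follows. The readings are invented placeholders; the column-style names TempAvgF and WindAvgMPH follow the text.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder weather readings.
date = pd.date_range("2018-06-01", periods=5, freq="D")
temp_avg_f = [70, 74, 78, 75, 72]
wind_avg_mph = [5, 9, 4, 7, 6]

fig, ax_temp = plt.subplots()
ax_temp.plot(date, temp_avg_f, color="tab:red")
ax_temp.set_xlabel("Date")
ax_temp.set_ylabel("TempAvgF", color="tab:red")      # left y-axis

ax_wind = ax_temp.twinx()  # second y-axis sharing the same x-axis
ax_wind.plot(date, wind_avg_mph, color="tab:blue")
ax_wind.set_ylabel("WindAvgMPH", color="tab:blue")   # right y-axis
plt.show()
```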

Plotting the same data in different units and the same x-axis
A function is defined for converting the data to a different unit, i.e. from Fahrenheit
to Celsius.

We can see that Temp in Fahrenheit is plotted on the left y-axis and Temp in Celsius is
plotted on the right y-axis.
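
One way to show the same data in two units is to give the twin axis limits converted from the first axis, as in this sketch. The helper name fahrenheit_to_celsius and the readings are assumptions made for this example; the article's own implementation may differ.

```python
import matplotlib.pyplot as plt


def fahrenheit_to_celsius(temp_f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32) * 5 / 9


days = [1, 2, 3, 4, 5]
temp_f = [50, 59, 68, 77, 86]  # placeholder readings

fig, ax_f = plt.subplots()
ax_f.plot(days, temp_f)
ax_f.set_xlabel("Day")
ax_f.set_ylabel("Temp (°F)")

# The twin axis draws no data of its own; its limits are the Fahrenheit
# limits converted to Celsius, so both scales describe the same line.
ax_c = ax_f.twinx()
low_f, high_f = ax_f.get_ylim()
ax_c.set_ylim(fahrenheit_to_celsius(low_f), fahrenheit_to_celsius(high_f))
ax_c.set_ylabel("Temp (°C)")
plt.show()
```

Because the conversion is linear, converting only the two axis limits is enough to keep the tick scales aligned.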

Implementation: TwinAxis

Stack Plot and Stem Plot

Stack Plot
Stack plot visualizes data in stacks and shows the distribution of data over time.

When to use: It is used for viewing area plots of multiple variables in a single plot.

Eg: It is useful in understanding the change of distribution in multiple variables over an
interval.

As a stack plot requires stacked data, the stacking is done using np.vstack()

plt.stackplot takes numeric data as its 1st argument, i.e. year, and the vertically
stacked data as its 2nd argument, i.e. the Nationalparks.
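
The stacking step and the stackplot call can be sketched as below. The park names and visitor counts are placeholders, not the article's national park data.

```python
import numpy as np
import matplotlib.pyplot as plt

year = np.array([2014, 2015, 2016, 2017, 2018])
# Placeholder visitor counts in millions.
park_a = np.array([1.0, 1.2, 1.1, 1.4, 1.5])
park_b = np.array([2.0, 2.1, 2.3, 2.2, 2.6])
park_c = np.array([0.5, 0.6, 0.8, 0.9, 1.0])

# np.vstack puts the series on top of each other: one row per park.
stacked = np.vstack([park_a, park_b, park_c])

fig, ax = plt.subplots()
ax.stackplot(year, stacked, labels=["Park A", "Park B", "Park C"])
ax.set_xticks(year)
ax.set_ylabel("Visitors (millions)")
ax.legend(loc="upper left")
plt.show()
```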

Percentage Stacked plot

Similar to a stack plot, but each value is converted into the percentage of the total it
holds.

data_prec holds each value as a percentage of its row total. s= np_data.sum(axis=1)
calculates the sum across the columns of each row, and np_data.divide(s,axis=0) divides
each row by its sum.
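
The row-wise percentage calculation can be sketched with a tiny DataFrame. This assumes np_data is a pandas DataFrame with one row per year and one column per park; the values and the multiplication by 100 are choices made for this example.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder data: rows are years, columns are parks.
np_data = pd.DataFrame(
    {"Park A": [1.0, 2.0, 3.0], "Park B": [3.0, 2.0, 1.0]},
    index=[2016, 2017, 2018],
)

s = np_data.sum(axis=1)                      # one total per row (per year)
data_prec = np_data.divide(s, axis=0) * 100  # each row now sums to 100

fig, ax = plt.subplots()
ax.stackplot(
    data_prec.index.to_numpy(),
    data_prec.T.to_numpy(),          # stackplot expects one row per series
    labels=list(data_prec.columns),
)
ax.set_ylabel("Share of visitors (%)")
ax.legend(loc="upper left")
plt.show()
```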

Stem Plot
A stem plot can also show negative values, so the difference between consecutive data
points is taken and plotted over time.

When to use: It is similar to a stack plot, but plotting the difference helps in
comparing the data points.

diff() is used to find the difference from the previous data point, and the result is
stored in another copy of the data. The first data point is NaN (Not a Number), as there
is no previous data point for calculating the difference.

Subplots (31n) are created to accommodate 3 rows and 1 column of subplots in the figure.

plt.stem() takes numeric data as its 1st argument, i.e. year, and the numeric data of
the National Park visitors as its 2nd argument.
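
The diff(), subplot and stem steps can be put together as in this sketch. The park names and counts are placeholders, and dropping the NaN row before plotting is a choice made for this example.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder visitor counts, one row per year.
visitors = pd.DataFrame(
    {
        "Park A": [10, 12, 11, 15, 14],
        "Park B": [20, 18, 25, 24, 27],
        "Park C": [5, 7, 6, 6, 9],
    },
    index=[2014, 2015, 2016, 2017, 2018],
)

# Year-to-year change; the first row is NaN because no earlier year exists.
change = visitors.diff()

fig = plt.figure(figsize=(6, 7))
for n, park in enumerate(change.columns, start=1):
    ax = fig.add_subplot(3, 1, n)  # same as 31n: 3 rows, 1 column
    yearly = change[park].dropna()
    ax.stem(yearly.index.to_numpy(), yearly.to_numpy())
    ax.set_title(park)
fig.tight_layout()
plt.show()
```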
Implementation: Stack Plot & Stem Plot

Bar Plot
Bar Plot shows the distribution of data over several groups. It is commonly confused with
a histogram which only takes numerical data for plotting. It helps in comparing multiple
numeric values.

When to use: It is used when comparing several groups.

Eg: Student marks in an exam.

plt.bar() takes the labels in numeric format as its 1st argument and the values they
represent as its 2nd argument.
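
A bar plot with numeric positions and text tick labels can be sketched as follows. The student names and marks are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt

students = ["Asha", "Ben", "Chen", "Dara"]  # hypothetical names
marks = [72, 85, 64, 90]                    # hypothetical marks

positions = np.arange(len(students))  # numeric x positions 0, 1, 2, 3

fig, ax = plt.subplots()
bars = ax.bar(positions, marks)
ax.set_xticks(positions)
ax.set_xticklabels(students)  # show the group names at the numeric positions
ax.set_ylabel("Marks")
ax.set_title("Student marks in an exam")
plt.show()
```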

Implementation: Bar Plot

Scatter Plot
Scatter plot helps in visualizing 2 numeric variables. It helps in identifying the
relationship between the variables, i.e. correlation or trend patterns. It also helps
in detecting outliers in the plot.

When to use: It is used in Machine Learning concepts like regression, where x and y are
continuous variables. It is also used for examining clusters and for outlier detection.

plt.scatter() takes 2 numeric arguments for scattering data points in the plot. It is
similar to a line plot, but without the connecting straight lines. corr is the
correlation, i.e. how correlated GDP is with life expectancy. As it is positive, life
expectancy increases as the GDP of a country increases.

By taking the log of GDP, we can see a much better correlation, as the points can be
fitted better. It converts GDP to a log scale, i.e. log10($1000) = 3.
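
The effect of the log transform on the correlation can be sketched with a handful of invented points. The GDP and life expectancy values below are hypothetical, and np.corrcoef is used here as one way to compute corr; the article's own code may compute it differently.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical values: GDP per capita in USD and life expectancy in years.
gdp = np.array([1000, 2000, 5000, 10000, 20000, 40000])
life_exp = np.array([55, 60, 66, 70, 75, 80])

log_gdp = np.log10(gdp)  # log10(1000) = 3, log10(10000) = 4, ...

# Pearson correlation before and after the transform.
corr_raw = np.corrcoef(gdp, life_exp)[0, 1]
corr_log = np.corrcoef(log_gdp, life_exp)[0, 1]

fig, (ax_raw, ax_log) = plt.subplots(1, 2, figsize=(9, 3))
ax_raw.scatter(gdp, life_exp)
ax_raw.set_xlabel("GDP")
ax_raw.set_ylabel("Life expectancy")
ax_log.scatter(log_gdp, life_exp)
ax_log.set_xlabel("log10(GDP)")
plt.show()

print(corr_raw, corr_log)
```

For these made-up points, both correlations are positive and the one computed on the log scale is the stronger of the two.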

3D Scatterplot
3D Scatterplot helps in visualizing 3 numerical variables in a three-dimensional plot.

It is similar to scatter except we add 3 numerical variables this time. By looking at the
plot we can make an inference that as the year and GDP increase, life expectancy too
increases.
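
A 3D scatter plot can be sketched by creating axes with the 3d projection. The year, GDP and life expectancy values are hypothetical, and the Axes3D import is only needed to register the projection on older matplotlib versions.

```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers "3d" on older versions)

# Hypothetical values for illustration.
year = np.array([1990, 1995, 2000, 2005, 2010, 2015])
gdp = np.array([1000, 2000, 5000, 10000, 20000, 40000])
life_exp = np.array([55, 60, 66, 70, 75, 80])

fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")  # three-dimensional axes
ax.scatter(year, gdp, life_exp)
ax.set_xlabel("Year")
ax.set_ylabel("GDP")
ax.set_zlabel("Life expectancy")
plt.show()
```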

Implementation: Scatter Plot

Find the above code in this Github Repo.

Conclusion
In summary, we learned how to build data visualization plots using one numeric variable
and multiple variables. We can now easily build plots for understanding our data
intuitively through visualizations.
