Distribution Visualization
Contents
• Quantitative Data Graphs
• Histograms
• Frequency Polygons
• Ogives
• Dot Plots
• Stem-and-Leaf Plots
• Qualitative Data Graphs
• Pie Charts
• Bar Graphs
• Pareto Charts
• Graphical Depiction of Two-Variable Numerical Data
• Scatter Plots
Visualization
• Data organization
• Raw Data
• Classification
• Tabulation
• Frequency Distribution
• Summary Statistics
• Visualization
• Convey the data to the viewers in pictorial form.
• It is easier for most people to comprehend the meaning of the data presented
graphically than data presented numerically in tables or frequency distributions.
• This is especially true if the users have little or no statistical knowledge.
• Informative data visualizations may reveal novel insights.
Quantitative Data Graphs
• Quantitative data describes a subject in terms of numbers (it can be
quantified), so it can be analyzed arithmetically. There are two types of
quantitative data: continuous and discrete.
• Visualization tools:
• Histograms
• Frequency Polygons
• Ogives
• Dot Plots
• Stem-and-Leaf Plots
Histograms
• A histogram is a graph that shows the frequency of numerical data
using rectangles.
• The height of each rectangle (measured on the vertical axis) represents the
frequency of a class, that is, how often values in that class appear.
• Both histograms and bar charts provide a visual display using
columns, and people often use the terms interchangeably.
• Technically, however, a histogram represents the frequency distribution of a
numerical variable, so its bars touch along a continuous scale. A bar graph typically
compares discrete or categorical variables, and its bars are usually separated by gaps.
• Creating a histogram:
• Draw and label the x and y axes. The x-axis is always the horizontal axis, and
the y-axis is always the vertical axis.
• Represent the frequency on the y-axis and the class boundaries on the x-axis.
• Using the frequencies as the heights, draw vertical bars for each class.
The following frequency distribution shows the number of miles that 20 randomly
selected runners ran during a given week.
Class Boundaries (miles)   Frequency
5.5-10.5                   1
10.5-15.5                  2
15.5-20.5                  3
20.5-25.5                  5
25.5-30.5                  4
30.5-35.5                  3
35.5-40.5                  2
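The histogram steps above can be sketched in plain Python as a minimal text-only illustration, assuming a terminal instead of graph paper. The name `text_histogram` and the `###` bar glyph are hypothetical choices for this sketch; frequency runs up the y-axis and each class is one column drawn with no gap, because adjacent classes share a boundary.

```python
# Runner mileage distribution from the example:
# (lower class boundary, upper class boundary, frequency)
CLASSES = [
    (5.5, 10.5, 1),
    (10.5, 15.5, 2),
    (15.5, 20.5, 3),
    (20.5, 25.5, 5),
    (25.5, 30.5, 4),
    (30.5, 35.5, 3),
    (35.5, 40.5, 2),
]


def text_histogram(classes, bar="###"):
    """Return the rows of a vertical text histogram.

    Frequency is on the y-axis; column i (left to right) is the i-th class,
    and its height equals that class's frequency. Columns are printed with
    no gap between them because adjacent classes share a boundary.
    """
    tallest = max(freq for _, _, freq in classes)
    rows = []
    for level in range(tallest, 0, -1):
        cells = [bar if freq >= level else " " * len(bar) for _, _, freq in classes]
        rows.append(f"{level:>2} |" + "".join(cells))
    # x-axis line under the bars
    rows.append("   +" + "-" * (len(bar) * len(classes)))
    return rows


assert sum(freq for _, _, freq in CLASSES) == 20  # n = 20 runners
for row in text_histogram(CLASSES):
    print(row)
print("Class boundaries:", [CLASSES[0][0]] + [upper for _, upper, _ in CLASSES])
```

The tallest column is the 20.5-25.5 class (frequency 5), which matches the peak in the table.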
Frequency Polygons
• The frequency polygon is a graph that displays the data by using lines that
connect points plotted for the frequencies at the midpoints of the classes.
• The frequencies are represented by the heights of the points.
• Creating a frequency polygon:
• Find the midpoints of each class. Recall that midpoints are found by adding the
upper and lower boundaries and dividing by 2.
• Draw the x and y axes. Label the x-axis with the midpoint of each class, and then use
a suitable scale on the y-axis for the frequencies.
• Using the midpoints for the x values and the frequencies as the y values, plot the
points.
• Connect adjacent points with line segments. Then draw a line back to the x-axis at the
beginning and end of the graph, one class width below the first midpoint and one class
width above the last midpoint (where the previous and next midpoints would be located).
The following frequency distribution shows the number of miles that 20 randomly
selected runners ran during a given week.
Class Boundaries (miles)   Frequency
5.5-10.5                   1
10.5-15.5                  2
15.5-20.5                  3
20.5-25.5                  5
25.5-30.5                  4
30.5-35.5                  3
35.5-40.5                  2
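As a small sketch of the frequency-polygon steps, the code below computes the points to plot for the runner data: each class midpoint paired with its frequency, plus a zero-frequency point one class width beyond each end so the polygon returns to the x-axis. The function name `frequency_polygon_points` is hypothetical, and the sketch assumes all classes have the same width.

```python
# Runner mileage distribution from the example:
# (lower class boundary, upper class boundary, frequency)
CLASSES = [
    (5.5, 10.5, 1),
    (10.5, 15.5, 2),
    (15.5, 20.5, 3),
    (20.5, 25.5, 5),
    (25.5, 30.5, 4),
    (30.5, 35.5, 3),
    (35.5, 40.5, 2),
]


def frequency_polygon_points(classes):
    """Return the (x, y) points of a frequency polygon.

    Interior points are (class midpoint, frequency). One point with
    frequency 0 is added one class width before the first midpoint and
    one class width after the last. Assumes equal class widths.
    """
    # Midpoint = (lower boundary + upper boundary) / 2
    midpoints = [(lower + upper) / 2 for lower, upper, _ in classes]
    width = classes[0][1] - classes[0][0]
    points = [(midpoints[0] - width, 0)]
    points += [(m, freq) for m, (_, _, freq) in zip(midpoints, classes)]
    points.append((midpoints[-1] + width, 0))
    return points


print(frequency_polygon_points(CLASSES))
```

Connecting these points in order with line segments gives the polygon, starting and ending on the x-axis at 3 and 43 miles.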
Ogives
• Another type of graph represents the cumulative frequencies for the
classes.
• This type of graph is called the cumulative frequency graph or ogive.
• The cumulative frequency is the sum of the frequencies accumulated up to
the upper boundary of a class in the distribution.
• Creating an ogive:
• Find the cumulative frequency for each class.
• Draw the x and y axes. Label the x-axis with the class boundaries. Use an appropriate
scale for the y-axis to represent the cumulative frequencies.
• Plot the cumulative frequency at each upper-class boundary.
• Starting with the first upper class boundary, connect adjacent points with line
segments. Then extend the graph down to the first lower class boundary on the x-axis,
where the cumulative frequency is 0.
The following frequency distribution shows the number of miles that 20 randomly
selected runners ran during a given week.
Class Boundaries (miles)   Frequency
5.5-10.5                   1
10.5-15.5                  2
15.5-20.5                  3
20.5-25.5                  5
25.5-30.5                  4
30.5-35.5                  3
35.5-40.5                  2
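The ogive steps can be sketched as follows for the runner data: running totals of the frequencies are paired with each upper class boundary, and the curve starts at the first lower class boundary with a cumulative frequency of 0. The function name `ogive_points` is hypothetical; `itertools.accumulate` is the standard-library running-sum helper.

```python
from itertools import accumulate

# Runner mileage distribution from the example:
# (lower class boundary, upper class boundary, frequency)
CLASSES = [
    (5.5, 10.5, 1),
    (10.5, 15.5, 2),
    (15.5, 20.5, 3),
    (20.5, 25.5, 5),
    (25.5, 30.5, 4),
    (30.5, 35.5, 3),
    (35.5, 40.5, 2),
]


def ogive_points(classes):
    """Return (boundary, cumulative frequency) points for an ogive.

    The first point is (first lower boundary, 0); every other point is
    (upper boundary of a class, sum of frequencies up to that class).
    """
    cumulative = accumulate(freq for _, _, freq in classes)
    points = [(classes[0][0], 0)]
    points += [(upper, cf) for (_, upper, _), cf in zip(classes, cumulative)]
    return points


print(ogive_points(CLASSES))
```

The last cumulative frequency equals the total number of runners (20), which is a quick check that no class was skipped.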
• Draw a histogram, a frequency polygon, and an ogive for the following data.
Dot Plots
• A dot plot, also known as a strip plot or dot chart, is a simple form of
data visualization that consists of data points plotted as dots on a
graph with an x- and y-axis.
• These types of charts are used to graphically depict certain data
trends or groupings.
• It is similar to a simplified histogram or bar graph: the height of each
stack of dots shows how many times the corresponding value occurs.
• Dot plots are best suited to small data sets.
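• There are two types of dot plot:
• Wilkinson Dot Plot: one dot is stacked above the number line for each
observation, showing the distribution of a single variable.
• Cleveland Dot Plot: one dot per category is placed at that category's value
along a scale, as a compact alternative to a bar graph.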
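A Wilkinson-style dot plot can be sketched in text form for small integer data, as below. This is an illustration only: the function name `dot_plot` is hypothetical, the `pets` values are made-up example data (not from these notes), and each integer between the minimum and maximum gets its own column so empty values stay visible on the number line.

```python
from collections import Counter


def dot_plot(values):
    """Return the lines of a text dot plot for a small set of integers.

    One column per integer from the minimum to the maximum value; each
    occurrence of a value adds one dot to that value's stack, so the
    height of a stack is the value's frequency (Wilkinson-style stacking).
    """
    counts = Counter(values)
    axis = range(min(values), max(values) + 1)
    tallest = max(counts.values())
    lines = []
    for level in range(tallest, 0, -1):
        lines.append("".join(" o " if counts[x] >= level else "   " for x in axis))
    # Number-line labels, centred under each column
    lines.append("".join(f"{x:^3}" for x in axis))
    return lines


# Hypothetical data: number of pets owned by nine students (illustration only)
pets = [0, 1, 1, 2, 2, 2, 3, 3, 5]
for line in dot_plot(pets):
    print(line)
```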
Stem-and-Leaf Plots
• The stem and leaf plot is a method of organizing data and is a combination
of sorting and graphing.
• It has the advantage over a grouped frequency distribution of retaining the
actual data while showing them in graphical form.
• A stem and leaf plot is a data plot that uses part of the data value as the
stem and part of the data value as the leaf to form groups or classes.
• At an outpatient testing centre, the number of cardiograms performed
each day for 20 days is shown. Construct a stem and leaf plot for the data.
25 31 20 32 13
14 43 02 57 23
36 32 33 32 44
32 52 44 51 45
• Arrange the data in order:
02, 13, 14, 20, 23, 25, 31, 32, 32, 32,
32, 33, 36, 43, 44, 44, 45, 51, 52, 57
• Separate the data according to the first digit, as shown.
02
13, 14
20, 23, 25
31, 32, 32, 32, 32, 33, 36
43, 44, 44, 45
51, 52, 57
• A display can be made by using the leading digit as the stem and the
trailing digit as the leaf.
Stem | Leaves
0 | 2
1 | 3 4
2 | 0 3 5
3 | 1 2 2 2 2 3 6
4 | 3 4 4 5
5 | 1 2 7
• A stem and leaf plot is useful for displaying the distribution of data, identifying
patterns, and finding key statistics such as the minimum, maximum, and mode.
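The stem-and-leaf procedure for the cardiogram data can be sketched as follows: sort the data, split each value into a tens digit (stem) and a units digit (leaf), and print one row per stem. The function name `stem_and_leaf` is hypothetical, and the sketch assumes non-negative integers below 100, as in this example.

```python
from collections import defaultdict

# Cardiograms per day for 20 days; the value recorded as 02 is written 2,
# since Python does not allow leading zeros in integer literals.
CARDIOGRAMS = [25, 31, 20, 32, 13, 14, 43, 2, 57, 23,
               36, 32, 33, 32, 44, 32, 52, 44, 51, 45]


def stem_and_leaf(data):
    """Return the rows of a stem-and-leaf display.

    The stem is the tens digit and the leaf is the units digit, so this
    assumes non-negative integers below 100.
    """
    stems = defaultdict(list)
    for value in sorted(data):                  # arrange the data in order
        stems[value // 10].append(value % 10)   # separate by leading digit
    # Include empty stems so gaps in the data stay visible
    return [
        f"{stem} | {' '.join(str(leaf) for leaf in stems[stem])}"
        for stem in range(min(stems), max(stems) + 1)
    ]


for row in stem_and_leaf(CARDIOGRAMS):
    print(row)
```

Because every original value can be read back from its stem and leaf, the display keeps the actual data, unlike a grouped frequency distribution.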
Qualitative Data Graphs
• Qualitative data describes a subject by its qualities or categories and
cannot be expressed as a measured number.
• Visualization tools:
• Pie Charts
• Bar Graphs
• Pareto Charts
Pie Charts
• Pie graphs are used extensively in statistics. The purpose of the pie
graph is to show the relationship of the parts to the whole by visually
comparing the sizes of the sections.
• Percentages or proportions can be used. The variable is nominal or
categorical.
• A pie graph is a circle that is divided into sections or wedges
according to the percentage of frequencies in each category of the
distribution.
• Since there are 360 degrees in a circle, the frequency for each class must
be converted into a proportional part of the circle.
• This conversion is done by using the formula
Degrees = f*360/n
• where f = frequency for each class and n = sum of the frequencies.
• Each frequency must also be converted to a percentage. This conversion is
done by using the formula
% = f*100/n
• Draw the graph using the degree measures found in the first step, and
label each section with its category name and percentage.
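The two pie-chart conversions can be checked with a short sketch that applies Degrees = f*360/n and % = f*100/n to every category. The function name `pie_chart_table` is hypothetical, and the frequencies below are made-up illustration values, not the Super Bowl snack data.

```python
def pie_chart_table(freqs):
    """Map each category to (degrees, percent).

    degrees = f * 360 / n and percent = f * 100 / n,
    where n is the sum of all frequencies.
    """
    n = sum(freqs.values())
    return {cat: (f * 360 / n, f * 100 / n) for cat, f in freqs.items()}


# Hypothetical frequencies (illustration only)
example = {"Category A": 5, "Category B": 3, "Category C": 2}
for cat, (deg, pct) in pie_chart_table(example).items():
    print(f"{cat}: {deg:.1f} degrees, {pct:.1f}%")
```

A useful check is that the degrees add up to 360 and the percentages to 100, apart from rounding.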
• This frequency distribution shows the number of pounds of each
snack food eaten during the Super Bowl. Construct a pie graph for the
data.
Bar Graphs
• A bar chart is a statistical approach to representing data using vertical
or horizontal rectangular bars.
• A bar chart is a representation of numerical data in the pictorial form of
rectangles (or bars) having uniform width and varying heights.
• The length of each bar is proportional to the value it represents.
• Bar charts have three major characteristics:
• They are used to compare data among different groups.
• They show the relationship with the help of two axes: one axis represents
the categories and the other represents the values.
• When the categories are time periods, bar charts show major changes in the data over time.
• Bar Charts are mainly classified into two types:
• Horizontal Bar Charts: When the data are represented by horizontal bars, the
chart is called a horizontal bar chart. The categories are marked on the y-axis
and the values on the x-axis.
• Vertical Bar Charts: When the data are represented by vertical bars, the chart
is called a vertical bar chart. The categories are marked on the x-axis and the
values on the y-axis.
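A horizontal bar chart can be sketched in text, as below, with categories down the vertical axis and bar lengths proportional to the values. The function name `horizontal_bar_chart`, the 20-character maximum bar length, and the day-by-day values are all hypothetical choices for this illustration.

```python
def horizontal_bar_chart(data, width=20):
    """Return text rows with categories on the vertical axis.

    Each bar's length is proportional to its value; the largest value
    gets a bar `width` characters long.
    """
    largest = max(data.values())
    label_width = max(len(cat) for cat in data)
    rows = []
    for cat, value in data.items():
        length = round(value / largest * width)
        rows.append(f"{cat:<{label_width}} | {'#' * length} {value}")
    return rows


# Hypothetical values (illustration only)
visits = {"Mon": 10, "Tue": 5, "Wed": 8}
for row in horizontal_bar_chart(visits):
    print(row)
```

Rounding to whole characters means bar lengths are only approximately proportional, which is why each row also prints the exact value.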
Pareto Charts
• Graphs such as the histogram, frequency polygon, and ogive show
how data can be represented when the variable displayed on the
horizontal axis is quantitative.
• On the other hand, when the variable displayed on the horizontal axis
is qualitative or categorical, a Pareto chart can be used.
• A Pareto chart is used to represent a frequency distribution for a
categorical variable, and the frequencies are displayed by the heights
of vertical bars, which are arranged in order from highest to lowest.
• Creating a Pareto chart:
• Arrange the data from largest to smallest according to frequency.
• Draw and label the x and y axes.
• Draw the bars corresponding to the frequencies.
• Suggestions for Drawing Pareto Charts:
• Make the bars the same width.
• Arrange the data from largest to smallest according to frequency.
• Make the units that are used for the frequency equal in size.
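The Pareto-chart steps can be sketched by sorting the categories by frequency before drawing, so bars run from highest to lowest. The function names and the complaint data below are hypothetical. Bars are printed sideways here (top to bottom stands in for left to right), with one `#` per unit so every bar uses the same units.

```python
def pareto_order(freqs):
    """Sort (category, frequency) pairs from largest to smallest frequency."""
    return sorted(freqs.items(), key=lambda item: item[1], reverse=True)


def text_pareto(freqs, bar="#"):
    """Return text rows in Pareto order, one symbol per unit of frequency."""
    ordered = pareto_order(freqs)
    label_width = max(len(cat) for cat, _ in ordered)
    return [f"{cat:<{label_width}} | {bar * f}" for cat, f in ordered]


# Hypothetical complaint counts (illustration only)
complaints = {"Late delivery": 6, "Damaged item": 9, "Wrong item": 2, "Billing error": 4}
for row in text_pareto(complaints):
    print(row)
```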
• The table shown here gives the average cost per mile for passenger
vehicles on state turnpikes. Construct and analyze a Pareto chart for
the data.
Scatter Plots
• Scatter plots are graphs that show the relationship between
two variables in a data set.
• A scatter plot represents data points on a two-dimensional Cartesian
plane. The independent variable is plotted on the x-axis, while the
dependent variable is plotted on the y-axis.
• These plots are often called scatter graphs or scatter diagrams.
• The scatter diagram graphs numerical data pairs, with one variable on
each axis, showing their relationship.
• Scatter plots are used in any of the following situations:
• When we have paired numerical data.
• When there are multiple values of the dependent variable for a single value
of the independent variable.
• When determining whether variables are related, for example when
identifying potential root causes of problems, or when checking whether two
effects that appear to be related both occur with the same cause.
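A scatter plot can be sketched in text for small integer pairs: the independent variable runs along the x-axis, the dependent variable up the y-axis, and each pair is marked with `*`. The function name `text_scatter` and the (games played, score) pairs are hypothetical illustration values, not the data for the exercise below.

```python
def text_scatter(points):
    """Return the rows of a text scatter plot for integer (x, y) pairs.

    Rows run from the largest y down to the smallest; a '*' marks each
    data point and '.' marks an empty grid position.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    occupied = set(points)
    x_axis = range(min(xs), max(xs) + 1)
    rows = []
    for y in range(max(ys), min(ys) - 1, -1):
        cells = "".join("*" if (x, y) in occupied else "." for x in x_axis)
        rows.append(f"{y:>3} | {cells}")
    # x-axis labels (last digit of each x value), aligned under the grid
    rows.append("      " + "".join(str(x % 10) for x in x_axis))
    return rows


# Hypothetical (games played, score) pairs (illustration only)
pairs = [(1, 2), (2, 4), (3, 5), (4, 4), (5, 7)]
for row in text_scatter(pairs):
    print(row)
```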
• Draw a scatter plot for the given data that shows the number of
games played and scores obtained in each instance.