UNIT-5 Important Q-A

The document covers various aspects of data visualization using Python, particularly focusing on the Matplotlib library. It explains key concepts such as data visualization, subplots, Pearson's correlation, and different types of plots like line plots, scatter plots, and histograms. Additionally, it discusses the features and challenges of Matplotlib, along with code examples for creating visualizations.

Uploaded by

lokeshajw2005
Copyright
© © All Rights Reserved

UNIT- V -PYTHON FOR DATA VISUALIZATION

PART-A
1. What is Data Visualisation?

Data Visualization is the graphical representation of information and data. It helps in
communicating data clearly and effectively using visual elements like charts, graphs, and
maps.

2. Define Matplotlib.
Matplotlib is a popular Python library used for creating static, animated, and interactive
visualizations.
Key Features of Matplotlib:

Produces high-quality 2D plots like line graphs, bar charts, histograms, scatter plots, pie
charts, and more.

Highly customizable – you can control figure size, colors, labels, axes, and styles.

Works well with NumPy, Pandas, and other scientific libraries.

Often used in data science, machine learning, engineering, and academic research.

3. Identify the argument used in plt.plot() to change the line style?
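
The source leaves this question unanswered. In Matplotlib, the keyword argument is linestyle (short alias ls), which accepts values such as '-', '--', '-.' and ':'. A minimal sketch with made-up data follows; the Agg backend line is an assumption so the code runs without a display, and it can be dropped in an interactive session.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, remove for interactive use
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

fig, ax = plt.subplots()
# The same data drawn three times, shifted upward, each with a different line style
(solid,) = ax.plot(x, y, linestyle="-", label="solid")
(dashed,) = ax.plot(x, [v + 2 for v in y], linestyle="--", label="dashed")
(dotted,) = ax.plot(x, [v + 4 for v in y], ls=":", label="dotted")
ax.legend()
```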

4. What are Subplots?

Subplots in Matplotlib refer to the technique of displaying multiple plots in a single figure
window. It's useful when you want to compare multiple graphs side by side or stack them
in rows/columns.
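
As an illustration, the sketch below places two plots side by side in one figure with plt.subplots(). The data and titles are made up, and the Agg backend line is only an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, remove for interactive use
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]

# One figure with one row and two columns; axes holds two Axes objects
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(8, 3))

axes[0].plot(x, [v ** 2 for v in x], color="blue")
axes[0].set_title("Squares")

axes[1].bar(x, [v * 3 for v in x], color="orange")
axes[1].set_title("Multiples of 3")

fig.tight_layout()  # keep titles and tick labels from overlapping
```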

5. Define Pearson’s correlation.


Pearson’s correlation (also called the Pearson correlation coefficient or Pearson’s r) is a
statistical measure of the strength and direction of the linear relationship between
two continuous variables.

✅Definition:
Pearson’s r measures how closely two variables move together in a straight-line pattern.

Range: −1 to +1

+1: Perfect positive linear correlation

0: No linear correlation

−1: Perfect negative linear correlation
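
To make the definition concrete, here is a small sketch that computes Pearson's r from its formula: the sum of products of deviations from the means, divided by the product of the square roots of the summed squared deviations. The function name pearson_r and the sample values are illustrative only.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: co-variation of xs and ys divided by the product of their spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return co_variation / (spread_x * spread_y)

hours = [1, 2, 3, 4, 5]
r_up = pearson_r(hours, [2, 4, 6, 8, 10])    # points on a rising straight line
r_down = pearson_r(hours, [10, 8, 6, 4, 2])  # points on a falling straight line
print(r_up, r_down)
```

Up to floating-point rounding, the rising line gives r = +1 and the falling line gives r = −1, matching the two ends of the range above.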

6. Differentiate between one-sided test and two-sided test. (Write any four.)
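
The source leaves this question unanswered. In general terms, the two tests differ in four ways. A one-sided test looks for an effect in one stated direction, while a two-sided test looks for a difference in either direction. A one-sided test places the whole rejection region in one tail, while a two-sided test splits it between both tails. For the same test statistic from a symmetric distribution, the two-sided p-value is twice the one-sided p-value. A one-sided test is chosen only when the direction is fixed before seeing the data. The sketch below illustrates the p-value difference for a z statistic; the value 1.8 and the 0.05 level are made-up inputs.

```python
import math

def normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

z = 1.8  # hypothetical test statistic

p_one_sided = 1.0 - normal_cdf(z)               # H1: the mean is greater
p_two_sided = 2.0 * (1.0 - normal_cdf(abs(z)))  # H1: the mean is different

alpha = 0.05
print(f"one-sided p = {p_one_sided:.4f}, reject H0: {p_one_sided < alpha}")
print(f"two-sided p = {p_two_sided:.4f}, reject H0: {p_two_sided < alpha}")
```

With these inputs the one-sided test rejects the null hypothesis at the 5% level while the two-sided test does not, which is why the choice between them should be made before looking at the data.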

7. Write a python code snippet that draws a histogram for the following list of positive numbers.
7 18 9 44 2 5 89 91 11 6 77 85 91 6 55

import matplotlib.pyplot as plt

# Your data
data = [7, 18, 9, 44, 2, 5, 89, 91, 11, 6, 77, 85, 91, 6, 55]

# Create the histogram
plt.hist(data, bins=7, color='cornflowerblue', edgecolor='black')

# Add labels and title
plt.title('Histogram of Given Numbers')
plt.xlabel('Value Range')
plt.ylabel('Frequency')

# Add a grid and show the plot
plt.grid(True)
plt.show()

8. Make use of the yerr parameter in plt.errorbar() to explain its role in plotting error
bars.
The yerr parameter in the plt.errorbar() function in Matplotlib is used to add vertical error
bars to your plot. These error bars visually represent the uncertainty or variability in the data
points along the y-axis.
📌 Role of yerr Parameter:

yerr allows you to specify the error (or uncertainty) values for the y-axis data points.

It can be a single value (applies the same error bar to all points) or an array (specific error for
each point).

✅Syntax:
plt.errorbar(x, y, yerr=error_values, fmt='o', capsize=5)

x: x-axis data points
y: y-axis data points
yerr: error values for the y-axis
fmt: format of the points (e.g., 'o' for circles)
capsize: size of the caps at the ends of the error bars

Example code:
import matplotlib.pyplot as plt
import numpy as np

# Example data
x = np.linspace(0, 10, 5)
y = np.sin(x)
errors = np.random.rand(5) * 0.5 # Random error values for y

# Create the error bar plot


plt.errorbar(x, y, yerr=errors, fmt='o', color='blue', capsize=5)

# Add labels and title


plt.title('Error Bar Plot with yerr')
plt.xlabel('X')
plt.ylabel('Y')

# Display the plot


plt.show()

The error bars represent the potential uncertainty in each data point.

9. Discover which library is used alongside Matplotlib for visualizing geographic data

When working with geographic data in conjunction with Matplotlib, the most commonly
used libraries are:
1. Basemap:

Basemap is a powerful library that extends Matplotlib by adding support for visualizing
geographic data. It allows you to plot maps, represent data points on the map, and apply
various map projections.

It provides functions to overlay geographic data such as coastlines, countries, and other
features onto a map.

Key Features of Basemap:

Support for various map projections (e.g., Mercator, Orthographic).

Ability to plot shapefiles and geographic data (e.g., rivers, countries).

Tools to draw map boundaries, coastlines, and grids.

2. Cartopy:

Cartopy is an alternative to Basemap and is more actively maintained. It integrates
seamlessly with Matplotlib and provides tools for plotting geographic data, including various
map projections, shapefiles, and data overlays like weather patterns, ocean currents, and
more.

Cartopy is designed to work with Matplotlib, making it easier to create publication-quality
geographic visualizations.

10. What method is used to display a Plotly graph in a Jupyter Notebook?


In a Jupyter Notebook, a Plotly graph is displayed by calling the .show() method on a figure
created with plotly.express or plotly.graph_objects. The .show() method renders the plot
directly within the notebook. Here's how you can do it:
1. Using plotly.express:
plotly.express is a high-level interface for creating Plotly visualizations. It is simple to use
and provides quick ways to create plots.
Example:
import plotly.express as px
import pandas as pd

# Example dataset
df = pd.DataFrame({
'x': [1, 2, 3, 4, 5],
'y': [10, 11, 12, 13, 14]
})

# Create a scatter plot


fig = px.scatter(df, x='x', y='y')

# Display the plot


fig.show()

2. Using plotly.graph_objects:
plotly.graph_objects is a lower-level interface that gives you more control over your plots.
You can define every aspect of the graph.
Example:
import plotly.graph_objects as go

# Create a scatter plot using graph_objects


fig = go.Figure(data=go.Scatter(x=[1, 2, 3, 4, 5], y=[10, 11, 12, 13, 14], mode='markers'))

# Display the plot


fig.show()

PART-B
11. Explain Various features of Matplotlib platform used for data visualization and
illustrate its challenges.

Matplotlib: Features and Challenges in Data Visualization

Matplotlib is a widely used data visualization library in Python. It provides powerful tools
for creating static, animated, and interactive plots. It is especially popular in scientific
computing, data analysis, and machine learning workflows.

Features of Matplotlib
1. Versatile Plot Types
Matplotlib supports a wide variety of plots:

Line plots
Bar charts
Histograms
Pie charts
Scatter plots
Box plots
Heatmaps

3D plots (via mpl_toolkits.mplot3d)

2. Fine-Grained Customization

Control over figure size, DPI, labels, ticks, legends, grids, colors, line styles, fonts, and more.

Can annotate plots with text, arrows, and shapes.

3. Integration with Other Libraries


Works well with NumPy, Pandas, and Seaborn.

Can be embedded in Tkinter, PyQt, Jupyter Notebooks, and web applications.

4. Object-Oriented and Pyplot Interfaces

pyplot: quick and easy for beginners.

OO-style: for more control and complex plots (using Figure and Axes objects).
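
The two interfaces can be compared with a short sketch that draws the same made-up data both ways. The Agg backend line is an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, remove for interactive use
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [10, 20, 15, 25]

# pyplot style: functions act on an implicit "current" figure and axes
plt.figure()
plt.plot(x, y)
plt.title("pyplot style")
pyplot_axes = plt.gca()  # fetch the implicit axes only to inspect it later

# Object-oriented style: keep explicit references to the Figure and Axes
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_title("OO style")
```

The OO style tends to pay off once a figure contains several axes, because each call names the axes it modifies.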

5. Subplots and Layouts

Allows complex layouts using plt.subplots() and GridSpec for multiple plots in one figure.
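
A layout where plots have different sizes can be sketched with GridSpec. Here one wide plot sits above two smaller ones; the data are made up and the Agg backend line is an assumption for running without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, remove for interactive use
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec

fig = plt.figure(figsize=(8, 5))
grid = GridSpec(nrows=2, ncols=2, figure=fig)

ax_top = fig.add_subplot(grid[0, :])    # top row spans both columns
ax_left = fig.add_subplot(grid[1, 0])   # bottom-left cell
ax_right = fig.add_subplot(grid[1, 1])  # bottom-right cell

ax_top.plot([1, 2, 3], [1, 4, 9])
ax_left.bar(["A", "B"], [3, 5])
ax_right.scatter([1, 2, 3], [3, 1, 2])
fig.tight_layout()
```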

6. Interactive Plots

Using widgets and toolkits, Matplotlib can produce interactive plots, especially in Jupyter
notebooks.

7. Export Options

Can save plots to various formats: PNG, JPG, SVG, PDF, EPS, etc.
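
Exporting is done with savefig(), where the format argument (or the file extension) selects the output type. The sketch below writes one figure in three formats; in-memory buffers stand in for files, and the Agg backend line is an assumption for running without a display.

```python
import io
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, remove for interactive use
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [2, 4, 8])

# Save the same figure in several formats
outputs = {}
for fmt in ("png", "svg", "pdf"):
    buffer = io.BytesIO()
    fig.savefig(buffer, format=fmt, dpi=150)
    outputs[fmt] = buffer.getvalue()
```

With a file path instead of a buffer, a call such as fig.savefig("plot.pdf") picks the format from the extension.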

Challenges of Using Matplotlib


1. Steep Learning Curve for Complex Visualizations

While simple plots are easy, creating complex or publication-quality visuals often requires
in-depth knowledge.

2. Verbose Syntax

Compared to newer libraries like Seaborn or Plotly, Matplotlib can be more verbose and less
intuitive.

3. Limited Interactivity

Basic Matplotlib output is static; interactive visualizations require additional tools (e.g., an
interactive backend such as %matplotlib widget from ipympl, or a separate library such as Plotly).

4. Performance on Large Datasets

Matplotlib can struggle with very large datasets compared to libraries optimized for
performance (e.g., Datashader, Bokeh).

5. Inconsistent Defaults

The default color schemes and styles used to look outdated (Matplotlib 2.0 introduced
new defaults that improved this).

Example: Basic Line Plot


import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

plt.plot(x, y, label="Linear Growth", color="green", marker="o")


plt.title("Simple Line Plot")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.legend()
plt.grid(True)
plt.show()


12. Explain various visualization charts like line plots and scatter plots using Matplotlib
with an example.

1. Line Plot
A line plot is used to display data points connected by straight lines. It’s commonly used to
show trends over time.
Use Case: Plotting monthly sales
import matplotlib.pyplot as plt

# Data
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May']
sales = [100, 120, 90, 150, 130]

# Create line plot


plt.plot(months, sales, marker='o', linestyle='-', color='blue', label="Sales")
plt.title("Monthly Sales")
plt.xlabel("Month")
plt.ylabel("Sales (in units)")
plt.legend()
plt.grid(True)
plt.show()

2. Scatter Plot
A scatter plot displays individual data points based on two variables. It's useful for
identifying relationships or correlations.

Use Case: Relationship between study hours and exam scores

# Data
study_hours = [1, 2, 3, 4, 5, 6, 7]
exam_scores = [50, 55, 65, 70, 75, 85, 90]

# Create scatter plot


plt.scatter(study_hours, exam_scores, color='green')
plt.title("Study Hours vs Exam Scores")
plt.xlabel("Hours Studied")
plt.ylabel("Exam Score")
plt.grid(True)
plt.show()

3. Bar Chart
A bar chart is used to represent categorical data with rectangular bars. Great for comparing
quantities.
📌 Use Case: Number of students in different departments
# Data
departments = ['CS', 'ECE', 'ME', 'CE']
students = [120, 100, 80, 60]

# Create bar chart


plt.bar(departments, students, color='purple')
plt.title("Students in Departments")
plt.xlabel("Department")
plt.ylabel("Number of Students")
plt.show()

4. Histogram
A histogram shows the distribution of a numeric variable by splitting it into intervals (bins).
📌 Use Case: Distribution of marks in a class

import numpy as np

# Random marks data


marks = np.random.normal(70, 10, 100)

# Create histogram
plt.hist(marks, bins=10, color='orange', edgecolor='black')
plt.title("Distribution of Marks")
plt.xlabel("Marks")
plt.ylabel("Frequency")
plt.show()
13. Write a Python program to create a line plot with multiple lines, each with a unique
style and color.

import matplotlib.pyplot as plt

# Sample data
x = [1, 2, 3, 4, 5]

# Multiple y-values for different lines


y1 = [1, 4, 9, 16, 25]
y2 = [2, 3, 5, 7, 11]
y3 = [5, 3, 2, 4, 1]

# Plotting each line with a unique style and color


plt.plot(x, y1, color='blue', linestyle='-', marker='o', label='y = x²')
plt.plot(x, y2, color='green', linestyle='--', marker='s', label='Prime Numbers')
plt.plot(x, y3, color='red', linestyle='-.', marker='^', label='Random Trend')

# Adding title and labels


plt.title('Multiple Line Plot with Different Styles')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')

# Show legend
plt.legend()

# Show grid
plt.grid(True)

# Display the plot


plt.tight_layout()
plt.show()
14. Using data from the NSFG, make a scatter plot of birth weight versus mother’s age.
Plot percentiles of birth weight versus mother’s age. Compute Pearson’s and
Spearman’s correlations. How would you characterize the relationship between these
variables?(Important Question)

To create a scatter plot of birth weight versus mother’s age and compute the Pearson and
Spearman correlations using data from the National Survey of Family Growth (NSFG),
you'll follow these general steps in Python using pandas, matplotlib, seaborn, and scipy.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import pearsonr, spearmanr

# Load the dataset (replace 'nsfg_data.csv' with your actual file path)
df = pd.read_csv('nsfg_data.csv')

# Preview data
print(df[['agepreg', 'birthwgt_lb', 'birthwgt_oz']].head())

# Combine birth weight in pounds and ounces


df['birth_weight_oz'] = df['birthwgt_lb'] * 16 + df['birthwgt_oz']

# Drop NaN values for plotting and correlation


data = df[['agepreg', 'birth_weight_oz']].dropna()

# Scatter Plot: Birth weight vs Mother's Age


plt.figure(figsize=(10, 6))
sns.scatterplot(data=data, x='agepreg', y='birth_weight_oz', alpha=0.3)
plt.title("Scatter Plot: Birth Weight vs Mother's Age")
plt.xlabel("Mother's Age at Pregnancy")
plt.ylabel("Birth Weight (oz)")
plt.grid(True)
plt.show()

# Plot percentiles (e.g., 25th, 50th, 75th percentiles of birth weight per age bin)
age_bins = pd.cut(data['agepreg'], bins=10)
percentiles = data.groupby(age_bins)['birth_weight_oz'].quantile([0.25, 0.5, 0.75]).unstack()

# Plot percentiles
percentiles.plot(kind='line', marker='o', figsize=(10, 6))
plt.title("Birth Weight Percentiles by Mother's Age Group")
plt.xlabel("Mother's Age Group")
plt.ylabel("Birth Weight (oz)")
plt.grid(True)
plt.show()

# Pearson correlation
pearson_corr, _ = pearsonr(data['agepreg'], data['birth_weight_oz'])
print(f"Pearson Correlation: {pearson_corr:.3f}")

# Spearman correlation
spearman_corr, _ = spearmanr(data['agepreg'], data['birth_weight_oz'])
print(f"Spearman Correlation: {spearman_corr:.3f}")
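
The script above prints the two coefficients but does not interpret them, and the actual values depend on the NSFG file used. As a general guide, coefficients near 0 indicate a weak relationship, and a Spearman value noticeably larger than the Pearson value suggests a relationship that is monotonic but not linear, or one affected by outliers. The sketch below shows that second case on synthetic data (not NSFG data); spearman_rho is an illustrative helper that assumes no tied values.

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman's rho computed as Pearson's r of the ranks (assumes no ties)."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return np.corrcoef(rank_x, rank_y)[0, 1]

# Synthetic data: y always rises with x, but along a curve rather than a line
x = np.arange(1, 21, dtype=float)
y = np.exp(x / 3.0)

pearson = np.corrcoef(x, y)[0, 1]
spearman = spearman_rho(x, y)
print(f"Pearson: {pearson:.3f}, Spearman: {spearman:.3f}")
```

Because the ranks of x and y agree exactly, Spearman's coefficient is 1 here, while Pearson's r is lower since the points do not lie on a straight line.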

15. Write a Python script to create a plot with both vertical and horizontal error bars.
Python Script: Plot with Vertical and Horizontal Error Bars

import matplotlib.pyplot as plt


import numpy as np

# Sample data
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 1, 3, 5])

# Define errors
x_error = np.array([0.1, 0.2, 0.15, 0.1, 0.3]) # horizontal error bars
y_error = np.array([0.2, 0.4, 0.3, 0.2, 0.5]) # vertical error bars

# Create the plot


plt.figure(figsize=(8, 6))
plt.errorbar(x, y, xerr=x_error, yerr=y_error, fmt='o', ecolor='red', capsize=5,
             label='Data with Error Bars')

# Add labels and title


plt.title("Plot with Vertical and Horizontal Error Bars")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.grid(True)
plt.legend()

# Show the plot


plt.show()

16. Explain how density plots can help in understanding the distribution of data points

Density plots, also known as kernel density estimation (KDE) plots, are a powerful tool
for visualizing the distribution of a continuous variable. They provide a smooth,
continuous curve that represents the probability density of the data.

How Density Plots Help:

1. Smooth Representation of Distribution

Unlike histograms, which display data in discrete bins, density plots provide a smoothed
version of the data distribution.

This makes it easier to see the overall shape (e.g., unimodal, bimodal, skewed).

2. Comparison of Multiple Distributions

You can overlay multiple density plots to compare distributions between groups (e.g., males
vs females).

3. Better Insight into Skewness and Kurtosis

Density plots clearly show if the data is symmetrical or skewed, and whether it has long tails
or is tightly packed.
4. Estimates Probability Densities

The area under the curve equals 1, giving a sense of the likelihood of values in specific
ranges.

5. Cleaner Visualization for Large Datasets

With large datasets, histograms may look noisy or cluttered. Density plots offer a cleaner
view.
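
Point 4 above (the area under the curve equals 1) can be checked numerically. The sketch below builds a Gaussian kernel density estimate by hand with NumPy and sums the area under it; the bandwidth rule, the grid size, and the synthetic data are assumptions of this sketch, not how seaborn necessarily computes its curve.

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(loc=50, scale=10, size=500)

# Bandwidth from a normal-reference rule of thumb (an assumption of this sketch)
bandwidth = 1.06 * samples.std(ddof=1) * len(samples) ** (-1 / 5)

# Evaluate the density on a grid that extends well past the data
grid = np.linspace(samples.min() - 50, samples.max() + 50, 2000)
z = (grid[:, None] - samples[None, :]) / bandwidth
kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
density = kernels.sum(axis=1) / (len(samples) * bandwidth)

# Riemann-sum approximation of the area under the curve
step = grid[1] - grid[0]
area = density.sum() * step
print(f"Area under the density curve: {area:.4f}")
```

The area comes out very close to 1, so the height of the curve is a density and only the area over a range of values can be read as a probability.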

Example:

import seaborn as sns


import matplotlib.pyplot as plt
import numpy as np

# Create synthetic data


data = np.random.normal(loc=50, scale=10, size=1000)

# Plot density
sns.kdeplot(data, fill=True, color="skyblue")  # fill replaces the deprecated shade argument
plt.title("Density Plot of Normally Distributed Data")
plt.xlabel("Value")
plt.ylabel("Density")
plt.grid(True)
plt.show()


17. Describe a scenario where histograms are insufficient for data analysis, and density
plots are preferable.
Scenario: Comparing the Test Scores of Two Classes
Imagine you have test scores from two different classes, and you want to compare their
distributions to understand which class performed better and how their scores are spread.

Problem with Histograms:

1.Histograms require setting a bin size (bin width), and different bin sizes can drastically
change the appearance of the data.

2.When you overlay histograms from both classes, it becomes cluttered and hard to interpret,
especially if the bins overlap in confusing ways.

3.Histograms are discrete, showing frequency within intervals — which can hide subtle
differences in the shape of the distributions.
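
The bin-size problem can be seen without drawing anything. The sketch below counts the same hypothetical scores with three different bin settings using np.histogram; the two-group data are synthetic and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical scores: two groups of students centred near 55 and 80
scores = np.concatenate([rng.normal(55, 4, 100), rng.normal(80, 4, 100)])

counts_by_bins = {}
for bins in (3, 10, 30):
    counts, edges = np.histogram(scores, bins=bins)
    counts_by_bins[bins] = counts
    print(f"{bins:>2} bins -> counts {counts.tolist()}")
```

Every setting accounts for all 200 scores, yet the printed shapes differ: with few bins the two groups may blur together, and with many bins the counts tend to look jagged.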

Why Density Plots Are Better:

1.Smooth curves help you clearly see if one class has a higher peak (more consistent scores)
or a wider spread (more variability).

2.You can easily overlay multiple density plots without clutter, making comparison more
intuitive.

3.Density plots help you detect patterns like:

3.1 Multimodal distributions (e.g., if a class has two groups of students performing at
different levels)

3.2 Skewness (e.g., one class may have a longer tail of low scores)

import seaborn as sns


import matplotlib.pyplot as plt
import numpy as np

# Generate synthetic data for two classes


class_A = np.random.normal(loc=70, scale=5, size=100)
class_B = np.random.normal(loc=75, scale=10, size=100)

# Plot density plots for both classes


sns.kdeplot(class_A, label="Class A", fill=True)  # fill replaces the deprecated shade argument
sns.kdeplot(class_B, label="Class B", fill=True)

plt.title("Test Score Distribution: Class A vs Class B")


plt.xlabel("Score")
plt.ylabel("Density")
plt.legend()
plt.grid(True)
plt.show()

Conclusion:
Use density plots when you need to compare distributions, uncover underlying patterns, or
present a clean, interpretable visualization — especially when histograms become too rigid
or cluttered.

18. Outline any two three-dimensional plot types in Matplotlib with an example.


1. 3D Line Plot
✏️ Description:
A 3D line plot displays a line in 3D space, showing how values change across three
dimensions (X, Y, Z).
✏️ Example
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import numpy as np

# Data
t = np.linspace(0, 10, 100)
x = np.sin(t)
y = np.cos(t)
z = t

# Plot
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot(x, y, z, color='blue', label='3D Line')
ax.set_title("3D Line Plot")
ax.set_xlabel("X")
ax.set_ylabel("Y")
ax.set_zlabel("Z")
ax.legend()
plt.show()

2. 3D Surface Plot
✏️ Description:
A surface plot shows a 3D surface that connects data points over a 2D grid, often used to
visualize mathematical functions or terrain data.
✏️ Example:
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import numpy as np

# Create meshgrid
x = np.linspace(-5, 5, 50)
y = np.linspace(-5, 5, 50)
x, y = np.meshgrid(x, y)
z = np.sin(np.sqrt(x**2 + y**2))

# Plot
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
surf = ax.plot_surface(x, y, z, cmap='viridis')
ax.set_title("3D Surface Plot")
ax.set_xlabel("X")
ax.set_ylabel("Y")
ax.set_zlabel("Z")
plt.colorbar(surf) # adds color legend
plt.show()

19. Discuss the differences between Basemap and Cartopy for geographic visualization.
1. Basemap
✏️ Description:

An older library for plotting 2D data on maps using Matplotlib.

Part of the mpl_toolkits toolkit.

✏️ Key Features:

Supports various map projections.

Can draw coastlines, countries, rivers, states, etc.

Allows overlay of data on geospatial maps (e.g., weather, earthquakes).


📌 Limitations:
Deprecated: No longer actively developed (officially discontinued).

Relies on older map projection libraries (like PROJ.4).

Less compatible with modern data formats and standards.

Difficult to customize and extend.

2. Cartopy
📌 Description:

A modern replacement for Basemap.


Built on top of Matplotlib and Shapely.

Uses modern map projection libraries (like PROJ and pyproj).

📌 Key Features:

Actively maintained and recommended for new projects.

Better support for geographic projections and vector data.

Can handle shapefiles, GeoTIFFs, and satellite data more effectively.

Integrates well with other libraries like xarray and netCDF4.

📌 Advantages:

Cleaner API, easier customization.

Works well with high-resolution data and supports tile-based maps.

More extensible and supports interactive plotting with libraries like Plotly.

20. Explain the advantages of using Plotly over Matplotlib for web-based visualizations.

Both Plotly and Matplotlib are powerful visualization libraries in Python, but when it comes
to web-based visualizations, Plotly clearly offers some key advantages.
Advantages of Using Plotly Over Matplotlib for Web-Based Visualizations
1. 📌 Interactivity

Plotly creates interactive charts by default.

Zoom, pan, hover tooltips, and data selection.

In Matplotlib, interactivity is very limited or requires external tools (e.g., mpld3 or Plotly
conversion).

2. 📌 Web Integration

Plotly can easily embed plots into web applications (HTML, Dash apps, Jupyter Notebooks).

Plots are rendered in the browser by the plotly.js JavaScript library, which is built on D3.js and
uses WebGL for some chart types.

Matplotlib is primarily static and designed for offline, print-style plots.

3. 📌 Responsive and Mobile-Friendly

Plotly charts auto-resize to fit different screen sizes (mobile, tablet, desktop).

Matplotlib plots require manual adjustments for different resolutions or device contexts.

4. 📌 High-Performance for Large Datasets

Plotly supports WebGL rendering, which is optimized for large datasets.

Useful for real-time dashboards or complex scientific plots online.

5. 📌 Dash Integration

Plotly integrates seamlessly with Dash, a powerful web app framework built for Python.

Makes it easy to build data dashboards without JavaScript knowledge.

Matplotlib lacks this kind of built-in web framework support.

6. 📌 Custom Styling and Themes

Plotly has modern, visually appealing themes and templates.

It's easier to match the look and feel of a web app or brand.

Example: Plotly vs Matplotlib


Plotly (Interactive Line Plot)

import plotly.express as px
import pandas as pd

df = pd.DataFrame({
"Year": [2018, 2019, 2020, 2021],
"Sales": [250, 300, 400, 370]
})

fig = px.line(df, x="Year", y="Sales", title="Interactive Sales Over Time")


fig.show()


Matplotlib (Static Line Plot)

import matplotlib.pyplot as plt

years = [2018, 2019, 2020, 2021]


sales = [250, 300, 400, 370]

plt.plot(years, sales)
plt.title("Sales Over Time")
plt.xlabel("Year")
plt.ylabel("Sales")
plt.grid(True)
plt.show()

Reference Link:

https://colab.research.google.com/drive/1_uPjXVJay47GeFTT-vqnSYEgPh0ekssS#scrollTo=dWpaY8b-vKPE
