UNIT-5 Important Q-A
PART-A
1. What is Data Visualisation?
2. Define Matplotlib.
Matplotlib is a popular Python library for creating static, animated, and interactive
visualizations.
Key Features of Matplotlib:
Produces high-quality 2D plots like line graphs, bar charts, histograms, scatter plots, pie
charts, and more.
Highly customizable – you can control figure size, colors, labels, axes, and styles.
Often used in data science, machine learning, engineering, and academic research.
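A minimal sketch of the customization points listed above (figure size, colors, line style, labels), written in Matplotlib's object-oriented style; the data and styling values are arbitrary choices for illustration:

```python
import matplotlib.pyplot as plt

# Illustrative data (not from the notes): squares of 0..5
x = [0, 1, 2, 3, 4, 5]
y = [v ** 2 for v in x]

# figsize is given in inches; color, linestyle and marker style the line
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, y, color="purple", linestyle="--", marker="o", label="y = x^2")

# Title, axis labels and legend are each controlled separately
ax.set_title("A Minimal Matplotlib Plot")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()

plt.show()
```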
4. What are Subplots?
Subplots in Matplotlib refer to the technique of displaying multiple plots in a single figure
window. It's useful when you want to compare multiple graphs side by side or stack them
in rows/columns.
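As an illustrative sketch (the sine and cosine curves are assumed data), plt.subplots() returns one Figure together with a grid of Axes, and each Axes holds its own plot:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

# 2 rows x 2 columns of Axes in one Figure; sharex keeps the x-limits aligned
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(8, 6), sharex=True)

axes[0, 0].plot(x, np.sin(x))
axes[0, 0].set_title("sin(x)")
axes[0, 1].plot(x, np.cos(x), color="orange")
axes[0, 1].set_title("cos(x)")
axes[1, 0].plot(x, np.sin(2 * x), color="green")
axes[1, 0].set_title("sin(2x)")
axes[1, 1].plot(x, np.cos(2 * x), color="red")
axes[1, 1].set_title("cos(2x)")

fig.tight_layout()  # reduce overlap between subplot titles and labels
plt.show()
```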
✅Definition:
Pearson’s r measures how closely two variables move together in a straight-line pattern.
Range: −1 to +1
+1: Perfect positive linear correlation
0: No linear correlation
−1: Perfect negative linear correlation
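A small sketch of how r is computed: the covariance of the two variables divided by the product of their spreads, r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √(Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)²). The helper name pearson_r and the data are hypothetical; np.corrcoef is used only as a cross-check.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's r: sum of co-deviations divided by the root of the product of squared deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))  # 1.0  (perfect positive straight line)
print(pearson_r(x, [10, 8, 6, 4, 2]))  # -1.0 (perfect negative straight line)
print(pearson_r(x, [4, 1, 0, 1, 4]))   # 0.0  (U-shaped: related, but not linearly)

# Cross-check against NumPy's correlation matrix
print(np.corrcoef(x, [3, 1, 4, 1, 5])[0, 1], pearson_r(x, [3, 1, 4, 1, 5]))
```

The U-shaped case is a reminder that r = 0 means no linear relationship, not no relationship at all.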
7. Write a python code snippet that draws a histogram for the following list of positive numbers.
7 18 9 44 2 5 89 91 11 6 77 85 91 6 55
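One possible answer, assuming bins of width 10 from 0 to 100 (the bin edges are a choice; bins=10 or another value would also work):

```python
import matplotlib.pyplot as plt

# The positive numbers given in the question
numbers = [7, 18, 9, 44, 2, 5, 89, 91, 11, 6, 77, 85, 91, 6, 55]

fig, ax = plt.subplots()
# Bin edges 0, 10, ..., 100 give ten bins of width 10
counts, bin_edges, patches = ax.hist(numbers, bins=list(range(0, 101, 10)),
                                     color="steelblue", edgecolor="black")
ax.set_title("Histogram of the Given Numbers")
ax.set_xlabel("Value")
ax.set_ylabel("Frequency")
plt.show()
```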
yerr allows you to specify the error (or uncertainty) values for the y-axis data points.
It can be a single value (applies the same error bar to all points) or an array (specific error for
each point).
✅Syntax:
plt.errorbar(x, y, yerr=error_values, fmt='o', capsize=5)
Example code:
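import matplotlib.pyplot as plt
import numpy as np
# Example data
x = np.linspace(0, 10, 5)
y = np.sin(x)
errors = np.random.rand(5) * 0.5  # Random error values for y
# Plot the points with vertical error bars
plt.errorbar(x, y, yerr=errors, fmt='o', capsize=5)
plt.title("Error Bars Using yerr")
plt.xlabel("x")
plt.ylabel("sin(x)")
plt.show()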
The error bars represent the potential uncertainty in each data point.
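To illustrate the two forms of yerr described above, a sketch with hypothetical measurements that plots a single scalar error next to a per-point error array:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(1, 6)
y = np.array([2.0, 3.5, 3.0, 4.5, 5.0])  # hypothetical measurements

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)

# Scalar yerr: the same +/-0.5 bar on every point
c1 = ax1.errorbar(x, y, yerr=0.5, fmt='o', capsize=5)
ax1.set_title("yerr=0.5 (same for all points)")

# Array yerr: one error value per point (length must match y)
per_point = np.array([0.2, 0.6, 0.3, 0.8, 0.4])
c2 = ax2.errorbar(x, y, yerr=per_point, fmt='s', color='darkred', capsize=5)
ax2.set_title("yerr=array (one value per point)")

plt.show()
```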
9. Discover which library is used alongside Matplotlib for visualizing geographic data
When working with geographic data in conjunction with Matplotlib, the most commonly
used libraries are:
1. Basemap:
Basemap is a powerful library that extends Matplotlib by adding support for visualizing
geographic data. It allows you to plot maps, represent data points on the map, and apply
various map projections.
It provides functions to overlay geographic data such as coastlines, countries, and other
features onto a map.
2. Cartopy:
import pandas as pd
# Example dataset
df = pd.DataFrame({
'x': [1, 2, 3, 4, 5],
'y': [10, 11, 12, 13, 14]
})
2. Using plotly.graph_objects:
plotly.graph_objects is a lower-level interface that gives you more control over your plots.
You can define every aspect of the graph.
Example:
import plotly.graph_objects as go
PART-B
11. Explain various features of the Matplotlib platform used for data visualization and
illustrate its challenges.
Matplotlib is a widely used data visualization library in Python. It provides powerful tools
for creating static, animated, and interactive plots. It is especially popular in scientific
computing, data analysis, and machine learning workflows.
Features of Matplotlib
1. Versatile Plot Types
Matplotlib supports a wide variety of plots:
Line plots
Bar charts
Histograms
Pie charts
Scatter plots
Box plots
Heatmaps
2. Fine-Grained Customization
Control over figure size, DPI, labels, ticks, legends, grids, colors, line styles, fonts, and more.
Two interfaces: pyplot-style (MATLAB-like plt.* functions) for quick plots, and OO-style for more control and complex plots (using Figure and Axes objects).
Allows complex layouts using subplot() and gridspec for multiple plots in one figure.
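A sketch of the OO style combined with GridSpec: unlike pyplot-style calls that act on the "current" plot, here the Figure and each Axes are created explicitly, and one panel spans two grid cells. The data and layout are made up for illustration.

```python
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec
import numpy as np

x = np.linspace(0, 10, 200)

# OO style: create the Figure explicitly, then place Axes on a 2x2 grid
fig = plt.figure(figsize=(8, 6))
gs = GridSpec(2, 2, figure=fig)

ax_top = fig.add_subplot(gs[0, :])    # spans both columns of row 0
ax_left = fig.add_subplot(gs[1, 0])
ax_right = fig.add_subplot(gs[1, 1])

ax_top.plot(x, np.sin(x))
ax_top.set_title("Wide panel: sin(x)")
ax_left.hist(np.random.default_rng(0).normal(size=500), bins=20)
ax_left.set_title("Histogram")
ax_right.scatter(x[::10], np.cos(x[::10]))
ax_right.set_title("Scatter: cos(x)")

fig.tight_layout()
plt.show()
```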
6. Interactive Plots
Using widgets and toolkits, Matplotlib can produce interactive plots, especially in Jupyter
notebooks.
7. Export Options
Can save plots to various formats: PNG, JPG, SVG, PDF, EPS, etc.
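A minimal sketch of exporting one figure to several formats with savefig; the file names and the temporary output directory are illustrative assumptions:

```python
import os
import tempfile
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [4, 1, 5], marker="o")
ax.set_title("Export Example")

# The output format is inferred from the file extension
out_dir = tempfile.mkdtemp()
for ext in ("png", "pdf", "svg"):
    path = os.path.join(out_dir, f"export_example.{ext}")
    # dpi mainly matters for raster formats such as PNG; bbox_inches trims extra whitespace
    fig.savefig(path, dpi=300, bbox_inches="tight")
    print("Saved", path)
```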
Challenges of Matplotlib
1. Steep Learning Curve
While simple plots are easy, creating complex or publication-quality visuals often requires
in-depth knowledge.
2. Verbose Syntax
Compared to newer libraries like Seaborn or Plotly, Matplotlib can be more verbose and less
intuitive.
3. Limited Interactivity
Matplotlib output is largely static: saved images and inline notebook plots cannot be explored, and
desktop windows offer only basic pan and zoom. Richer interaction requires additional tools
(e.g., %matplotlib notebook or Plotly).
4. Performance with Large Datasets
Matplotlib can struggle with very large datasets compared to libraries optimized for
performance (e.g., Datashader, Bokeh).
5. Inconsistent Defaults
The default color schemes and styles used to look outdated (Matplotlib 2.0 introduced
modernized defaults, such as the viridis colormap).
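If the defaults still don't suit, style sheets change many settings at once. A sketch using the built-in 'ggplot' style, applied only inside a context manager so the global settings are left alone:

```python
import matplotlib.pyplot as plt

# Built-in style sheets shipped with Matplotlib (the exact list varies by version)
print(plt.style.available[:5])

# Default colormap: 'viridis' unless your matplotlibrc overrides it
print(plt.rcParams["image.cmap"])

# Apply a style only inside this block; figures created here keep the style
with plt.style.context("ggplot"):
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3, 4], [1, 4, 2, 3])
    ax.set_title("Same data, 'ggplot' style")
plt.show()
```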
Example:
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
plt.plot(x, y)
plt.show()
12. Explain various visualization charts, such as line plots and scatter plots, using Matplotlib,
with an example.
1. Line Plot
A line plot is used to display data points connected by straight lines. It’s commonly used to
show trends over time.
Use Case: Plotting monthly sales
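import matplotlib.pyplot as plt
# Data
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May']
sales = [100, 120, 90, 150, 130]
# Plot sales per month with a marker at each data point
plt.plot(months, sales, marker='o')
plt.title("Monthly Sales")
plt.show()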
2. Scatter Plot
A scatter plot displays individual data points based on two variables. It's useful for
identifying relationships or correlations.
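# Data
study_hours = [1, 2, 3, 4, 5, 6, 7]
exam_scores = [50, 55, 65, 70, 75, 85, 90]
plt.scatter(study_hours, exam_scores, color='green')
plt.title("Study Hours vs Exam Scores")
plt.show()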
3. Bar Chart
A bar chart is used to represent categorical data with rectangular bars. Great for comparing
quantities.
📌 Use Case: Number of students in different departments
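# Data
departments = ['CS', 'ECE', 'ME', 'CE']
students = [120, 100, 80, 60]
plt.bar(departments, students, color='skyblue')
plt.title("Number of Students per Department")
plt.show()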
4. Histogram
A histogram shows the distribution of a numeric variable by splitting it into intervals (bins).
📌 Use Case: Distribution of marks in a class
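import matplotlib.pyplot as plt
import numpy as np
# Hypothetical data: marks of 100 students, generated randomly for illustration
marks = np.random.normal(loc=65, scale=12, size=100).clip(0, 100)
# Create histogram
plt.hist(marks, bins=10, color='orange', edgecolor='black')
plt.title("Distribution of Marks")
plt.xlabel("Marks")
plt.ylabel("Frequency")
plt.show()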
13. Write a Python program to create a line plot with multiple lines, each with a unique
style and color.
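import matplotlib.pyplot as plt
# Sample data
x = [1, 2, 3, 4, 5]
# Each line gets its own color, line style and marker
plt.plot(x, [1, 4, 9, 16, 25], color='red', linestyle='-', marker='o', label='x^2')
plt.plot(x, [1, 2, 3, 4, 5], color='blue', linestyle='--', marker='s', label='x')
plt.plot(x, [25, 16, 9, 4, 1], color='green', linestyle=':', marker='^', label='reverse')
# Show legend
plt.legend()
# Show grid
plt.grid(True)
plt.show()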
To create a scatter plot of birth weight versus mother’s age and compute the Pearson and
Spearman correlations using data from the National Survey of Family Growth (NSFG),
you'll follow these general steps in Python using pandas, matplotlib, seaborn, and scipy.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import pearsonr, spearmanr
# Load the dataset (replace 'nsfg_data.csv' with your actual file path)
df = pd.read_csv('nsfg_data.csv')
# Preview data
print(df[['agepreg', 'birthwgt_lb', 'birthwgt_oz']].head())
# Keep rows with complete values and combine pounds + ounces into total ounces
data = df.dropna(subset=['agepreg', 'birthwgt_lb', 'birthwgt_oz']).copy()
data['birth_weight_oz'] = data['birthwgt_lb'] * 16 + data['birthwgt_oz']
# Scatter plot of birth weight versus mother's age
sns.scatterplot(x='agepreg', y='birth_weight_oz', data=data, alpha=0.3)
plt.title("Birth Weight vs Mother's Age")
plt.xlabel("Mother's Age (years)")
plt.ylabel("Birth Weight (oz)")
plt.show()
# 25th, 50th and 75th percentiles of birth weight within 10 age bins
age_bins = pd.cut(data['agepreg'], bins=10)
percentiles = data.groupby(age_bins, observed=True)['birth_weight_oz'].quantile([0.25, 0.5, 0.75]).unstack()
# Plot percentiles
percentiles.plot(kind='line', marker='o', figsize=(10, 6))
plt.title("Birth Weight Percentiles by Mother's Age Group")
plt.xlabel("Mother's Age Group")
plt.ylabel("Birth Weight (oz)")
plt.grid(True)
plt.show()
# Pearson correlation
pearson_corr, _ = pearsonr(data['agepreg'], data['birth_weight_oz'])
print(f"Pearson Correlation: {pearson_corr:.3f}")
# Spearman correlation
spearman_corr, _ = spearmanr(data['agepreg'], data['birth_weight_oz'])
print(f"Spearman Correlation: {spearman_corr:.3f}")
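Because the NSFG file is not bundled here, the following sketch uses synthetic data to show why the two coefficients can differ: Spearman's correlation works on ranks and captures any monotonic relationship, while Pearson's r only measures linear association.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Synthetic data (not NSFG): y grows monotonically but non-linearly with x
x = np.arange(1, 21)
y = x ** 3

pearson_corr, _ = pearsonr(x, y)
spearman_corr, _ = spearmanr(x, y)

# Spearman uses ranks, so any strictly increasing relationship gives 1;
# Pearson measures the straight-line fit, so it comes out below 1 here.
print(f"Pearson:  {pearson_corr:.3f}")
print(f"Spearman: {spearman_corr:.3f}")
```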
15. Write a Python script to create a plot with both vertical and horizontal error bars.
Python Script: Plot with Vertical and Horizontal Error Bars
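import matplotlib.pyplot as plt
import numpy as np
# Sample data
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 1, 3, 5])
# Define errors
x_error = np.array([0.1, 0.2, 0.15, 0.1, 0.3])  # horizontal error bars
y_error = np.array([0.2, 0.4, 0.3, 0.2, 0.5])  # vertical error bars
# xerr draws horizontal bars, yerr draws vertical bars
plt.errorbar(x, y, xerr=x_error, yerr=y_error, fmt='o', capsize=4)
plt.title("Plot with Vertical and Horizontal Error Bars")
plt.show()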
16. Explain how density plots can help in understanding the distribution of data points
Density plots, also known as kernel density estimation (KDE) plots, are a powerful tool
for visualizing the distribution of a continuous variable. They provide a smooth,
continuous curve that represents the probability density of the data.
1. Smooth View of the Distribution
Unlike histograms, which display data in discrete bins, density plots provide a smoothed
version of the data distribution.
This makes it easier to see the overall shape (e.g., unimodal, bimodal, skewed).
2. Easy Comparison Between Groups
You can overlay multiple density plots to compare distributions between groups (e.g., males
vs. females).
3. Reveals Skewness and Spread
Density plots clearly show whether the data is symmetric or skewed, and whether it has long tails
or is tightly concentrated.
4. Estimates Probability Densities
The total area under the curve equals 1, so the area over any interval estimates the probability
that a value falls in that range.
5. Cleaner View of Large Datasets
With large datasets, histograms with many bins may look noisy or cluttered. Density plots offer a cleaner
view.
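To see where the smooth curve and the "area equals 1" property come from, here is a from-scratch sketch of a Gaussian KDE using NumPy only: one Gaussian bump is centred on every sample and the bumps are averaged. The function name, sample size and grid are illustrative; the bandwidth uses the common normal-reference rule of thumb 1.06·σ·n^(-1/5), which is one heuristic among several.

```python
import numpy as np

def gaussian_kde_manual(samples, grid, bandwidth):
    """Average of Gaussian bumps centred on each sample (a basic KDE)."""
    samples = np.asarray(samples, dtype=float)
    # Shape (len(grid), len(samples)): scaled distance from every grid point to every sample
    u = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
    return kernels.sum(axis=1) / (len(samples) * bandwidth)

rng = np.random.default_rng(42)
samples = rng.normal(loc=0, scale=1, size=500)

# Normal-reference rule of thumb for the bandwidth
bw = 1.06 * samples.std(ddof=1) * len(samples) ** (-1 / 5)

grid = np.linspace(-6, 6, 1201)
density = gaussian_kde_manual(samples, grid, bw)

# Riemann-sum check that the area under the curve is approximately 1
area = density.sum() * (grid[1] - grid[0])
print(f"bandwidth = {bw:.3f}, area = {area:.4f}")
```

A smaller bandwidth gives a bumpier curve and a larger one a smoother curve, much like choosing narrow or wide histogram bins; libraries such as seaborn automate this choice.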
Example:
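import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Sample data: 1,000 values from a standard normal distribution
data = np.random.normal(loc=0, scale=1, size=1000)
# Plot density (fill=True replaces the older, deprecated shade=True)
sns.kdeplot(data, fill=True, color="skyblue")
plt.title("Density Plot of Normally Distributed Data")
plt.xlabel("Value")
plt.ylabel("Density")
plt.grid(True)
plt.show()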
17. Describe a scenario where histograms are insufficient for data analysis, and density
plots are preferable.
Scenario: Comparing the Test Scores of Two Classes
Imagine you have test scores from two different classes, and you want to compare their
distributions to understand which class performed better and how their scores are spread.
Why histograms fall short here:
1. Histograms require choosing a bin width, and different bin widths can drastically
change the appearance of the data.
2. When you overlay histograms from both classes, the plot becomes cluttered and hard to interpret,
especially where the bars overlap.
3. Histograms are discrete, showing frequencies within intervals, which can hide subtle
differences in the shape of the distributions.
Why density plots are preferable:
1. Smooth curves help you clearly see whether one class has a higher peak (more consistent scores)
or a wider spread (more variability).
2. You can easily overlay multiple density plots without clutter, making comparison more
intuitive.
3. They make it easier to spot:
3.1 Multimodal distributions (e.g., a class with two groups of students performing at
different levels)
3.2 Skewness (e.g., one class may have a longer tail of low scores)
Conclusion:
Use density plots when you need to compare distributions, uncover underlying patterns, or
present a clean, interpretable visualization — especially when histograms become too rigid
or cluttered.
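1. 3D Line Plot
from mpl_toolkits.mplot3d import Axes3D  # registers the 3D projection on older Matplotlib versions
import matplotlib.pyplot as plt
import numpy as np
# Data: a helix (x and y trace a circle while z rises linearly)
t = np.linspace(0, 10, 100)
x = np.sin(t)
y = np.cos(t)
z = t
# Plot
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot(x, y, z, color='blue', label='3D Line')
ax.set_title("3D Line Plot")
ax.set_xlabel("X")
ax.set_ylabel("Y")
ax.set_zlabel("Z")
ax.legend()
plt.show()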
2. 3D Surface Plot
✏️ Description:
A surface plot shows a 3D surface that connects data points over a 2D grid, often used to
visualize mathematical functions or terrain data.
✏️ Example:
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import numpy as np
# Create meshgrid
x = np.linspace(-5, 5, 50)
y = np.linspace(-5, 5, 50)
x, y = np.meshgrid(x, y)
z = np.sin(np.sqrt(x**2 + y**2))
# Plot
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
surf = ax.plot_surface(x, y, z, cmap='viridis')
ax.set_title("3D Surface Plot")
ax.set_xlabel("X")
ax.set_ylabel("Y")
ax.set_zlabel("Z")
plt.colorbar(surf) # adds color legend
plt.show()
19. Discuss the differences between Basemap and Cartopy for geographic visualization.
1. Basemap
✏️ Description:
✏️ Key Features:
2. Cartopy
📌 Description:
📌 Key Features:
📌 Advantages:
More extensible, with a cleaner projection API (cartopy.crs) built on top of Matplotlib.
20. Explain the advantages of using Plotly over Matplotlib for web-based visualizations.
Both Plotly and Matplotlib are powerful visualization libraries in Python, but when it comes
to web-based visualizations, Plotly clearly offers some key advantages.
Advantages of Using Plotly Over Matplotlib for Web-Based Visualizations
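1. 📌 Interactivity
Plotly charts are interactive by default: hovering shows data values, and users can zoom, pan,
and toggle traces from the legend.
In Matplotlib, interactivity is very limited or requires external tools (e.g., mpld3 or Plotly
conversion).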
2. 📌 Web Integration
Plotly can easily embed plots into web applications (HTML, Dash apps, Jupyter Notebooks).
3. 📌 Responsive Design
Plotly charts auto-resize to fit different screen sizes (mobile, tablet, desktop).
Matplotlib plots require manual adjustments for different resolutions or device contexts.
5. 📌 Dash Integration
Plotly integrates seamlessly with Dash, a powerful web app framework built for Python.
It's easier to match the look and feel of a web app or brand.
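import plotly.express as px
import pandas as pd
df = pd.DataFrame({
"Year": [2018, 2019, 2020, 2021],
"Sales": [250, 300, 400, 370]
})
# One call builds an interactive line chart (hover, zoom and pan work out of the box)
fig = px.line(df, x="Year", y="Sales", title="Sales Over Time")
fig.show()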
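Equivalent static plot in Matplotlib:
import matplotlib.pyplot as plt
years = [2018, 2019, 2020, 2021]
sales = [250, 300, 400, 370]
plt.plot(years, sales)
plt.title("Sales Over Time")
plt.xlabel("Year")
plt.ylabel("Sales")
plt.grid(True)
plt.show()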
Reference Link:
https://colab.research.google.com/drive/1_uPjXVJay47GeFTT-vqnSYEgPh0ekssS#scrollTo=dWpaY8b-vKPE