MGNM801 Ca2 Final

Uploaded by

astitvaawasthi33
Copyright
© © All Rights Reserved

Course Code: MGNM801 Course Title: Business Analytics 1

Course Instructor: Vikas Budhani Academic Task No.:2

Academic Task Title: Online Assignment Date of submission: 25th December,2023

Student Name: Astitva Awasthi Section: Q2345

Student’s Roll No: RQ2345A27 Student’s Reg. No: 12319511

Evaluation Parameters: (Parameters on which student is to be evaluated. To be
mentioned by students as specified at the time of assigning the task by the instructor)

Declaration:
I declare that this Assignment is my individual work. I have not copied it
from any other student’s work or from any other source except where due
acknowledgement is made explicitly in the text, nor has any part been written
for me by any other person.
Evaluator’s comments (For Instructor’s use only)

General observation Scope of improvement Best part

Evaluator’s Signature and Date:


Marks Obtained: _________ Max. Marks: ________
Unit: -4
Part1: Pandas
1. List at least three real-world scenarios where Pandas can be used for
data analysis. Explain the specific use cases in each scenario.
Ans)

Scenario 1: Unmasking Your Customer Personas

Use case: Pandas helps uncover hidden patterns and segment your customer
base into distinct groups with shared characteristics. This lets you craft targeted
campaigns, personalize messaging, and deliver experiences that resonate,
ultimately driving engagement and sales.
Example: -
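A minimal sketch of this use case, assuming a hypothetical customer table (the column names, age bins, and values are illustrative and not taken from the assignment). It buckets customers by age and summarises each segment:

```python
import pandas as pd

# Hypothetical customer data; column names and values are illustrative only.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5, 6],
    "age": [22, 35, 47, 29, 61, 38],
    "annual_spend": [300, 1200, 2500, 450, 1800, 900],
})

# Bucket customers into age groups; bins are right-inclusive:
# (17, 30], (30, 45], (45, 100].
customers["age_group"] = pd.cut(
    customers["age"], bins=[17, 30, 45, 100], labels=["18-30", "31-45", "46+"]
)

# Summarise each segment: number of customers and average spend.
segments = customers.groupby("age_group", observed=True).agg(
    n_customers=("customer_id", "count"),
    avg_spend=("annual_spend", "mean"),
)
print(segments)
```

In practice the segmentation variables would come from the business question (for example purchase frequency or region) rather than from age alone.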

Scenario 2: Campaign Performance Under the Microscope

Use case: Pandas empowers you to dissect the effectiveness of your marketing
campaigns across different channels. Analyse key metrics like impressions,
clicks, and conversion rates to identify top performers, optimize budget
allocation, and maximize ROI.

Example:
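A minimal sketch of this use case, assuming a hypothetical campaign log (channels, column names, and numbers are made up for illustration). It aggregates per channel and derives click-through rate, conversion rate, and cost per conversion:

```python
import pandas as pd

# Hypothetical campaign log; channels and numbers are illustrative only.
campaigns = pd.DataFrame({
    "channel": ["Email", "Social", "Search", "Email", "Social", "Search"],
    "impressions": [10000, 20000, 15000, 10000, 30000, 5000],
    "clicks": [500, 400, 900, 300, 600, 300],
    "conversions": [50, 20, 90, 30, 40, 30],
    "spend": [200.0, 500.0, 600.0, 200.0, 700.0, 200.0],
})

# Total every numeric metric per channel.
totals = campaigns.groupby("channel").sum(numeric_only=True)

# Derived metrics used to compare channels.
totals["ctr"] = totals["clicks"] / totals["impressions"]
totals["conversion_rate"] = totals["conversions"] / totals["clicks"]
totals["cost_per_conversion"] = totals["spend"] / totals["conversions"]

# Cheapest conversions first.
ranked = totals.sort_values("cost_per_conversion")
print(ranked)
```

Sorting by cost per conversion is only one possible ranking; a different metric could be used depending on the campaign goal.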
Scenario 3: Predicting Customer Churn Before It's Too Late
Use case: Pandas helps you identify customers at risk of churning (leaving your
brand) based on their past behaviour and purchase patterns. This allows you to
take proactive measures like offering personalized incentives or resolving
potential issues, ultimately decreasing customer loss and boosting lifetime
value.
Example:
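A minimal sketch of this use case, assuming a hypothetical purchase history. It uses a simple rule (no purchase in the last 90 days) to flag customers at risk; the threshold and the reference date are assumptions, and this is a rule-based flag rather than a trained prediction model:

```python
import pandas as pd

# Hypothetical purchase history; values are illustrative only.
purchases = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "purchase_date": pd.to_datetime([
        "2023-01-10", "2023-11-20", "2023-06-05",
        "2023-03-01", "2023-08-15", "2023-12-01",
    ]),
})

# Reference date for measuring inactivity (an assumption for this sketch).
as_of = pd.Timestamp("2023-12-25")

# Most recent purchase per customer, then days since that purchase.
last_purchase = purchases.groupby("customer_id")["purchase_date"].max()
days_inactive = (as_of - last_purchase).dt.days

# Flag customers inactive for more than 90 days (assumed threshold).
at_risk = days_inactive[days_inactive > 90]
print(at_risk)
```

The flagged customers could then feed a retention campaign, or the inactivity figure could serve as one feature for a proper churn model.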
2. Describe the primary data structures in Pandas, namely Series and
DataFrame. Explain the differences and use cases for each.
Ans)
Series:
Structure:
▪ One-dimensional labelled array, like a single column in a spreadsheet.
▪ Holds an array of data values and an associated array of labels,
called an index.
Key characteristics:
▪ Can hold any data type, including numbers, strings, dates, and
Booleans.
▪ Has a single dtype for all elements; values of mixed types are
stored under the generic object dtype.
▪ Labelled with an index that can be used for selection and
alignment.
Use cases:
▪ Representing a single column of data in a dataset.
▪ Storing time series data or sequences of values.
▪ Creating simple statistical summaries of data.
DataFrame:
Structure:
▪ Two-dimensional labelled data structure with rows and
columns, resembling a spreadsheet or SQL table.
▪ Can be thought of as a collection of Series objects, each
representing a column.
Key characteristics:
▪ Columns can hold different data types.
▪ Labelled with both row and column indices for flexible
access and manipulation.
Use cases:
▪ Representing tabular datasets with multiple columns and
rows.
▪ Loading and storing data from various file formats (CSV,
Excel, databases).
▪ Performing complex data cleaning, transformation, and analysis tasks.
Feature          Series                         DataFrame
Dimensionality   One-dimensional                Two-dimensional
Data types       Homogeneous                    Heterogeneous (different types per column)
Structure        Array of values + index        Collection of Series objects (columns)
Use cases        Single-column data, sequences  Tabular datasets, multiple columns and rows

Example:
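A minimal sketch contrasting the two structures, using made-up fruit data (all names and values are illustrative). It shows label-based selection on a Series, per-column dtypes on a DataFrame, and that selecting one DataFrame column returns a Series:

```python
import pandas as pd

# A Series: one-dimensional values plus an index of labels.
prices = pd.Series([2.5, 4.0, 1.2], index=["apple", "mango", "banana"], name="price")
print(prices["mango"])   # label-based selection
print(prices.dtype)      # one dtype for the whole Series

# A DataFrame: several columns, each of which may have its own dtype.
fruit = pd.DataFrame(
    {
        "price": prices,
        "in_stock": [True, False, True],
        "origin": ["IN", "IN", "EC"],
    },
    index=prices.index,
)
print(fruit.dtypes)

# Selecting a single column of a DataFrame gives back a Series.
print(type(fruit["price"]))
```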

Part2: NumPy

1. Write a brief description of what NumPy is and why it is important for
scientific computing and data analysis in Python.
Ans)

NumPy, short for Numerical Python, is a powerful open-source library in Python
designed for numerical computing. It provides support for large,
multidimensional arrays and matrices, along with a collection of high-level
mathematical functions to operate on these arrays. NumPy is a fundamental
building block for scientific computing and data analysis in Python, and its
importance stems from several key factors:
Efficient multidimensional arrays:
▪ It introduces the ndarray object, a fundamental data structure
for representing and manipulating large arrays of numbers in
Python. These arrays are more efficient in terms of memory
usage and operations compared to Python's built-in lists.

Foundation for scientific computing:
▪ It serves as the cornerstone for many other scientific and data
analysis libraries in Python, including Pandas, SciPy,
Matplotlib, and scikit-learn.

Comprehensive mathematical functions:
▪ It offers a vast collection of mathematical functions for linear
algebra, Fourier transforms, random number generation, and
more.

Key reasons for its importance in scientific computing and data analysis:
Performance:
▪ Vectorized operations: NumPy arrays enable you to perform
operations on entire arrays at once, rather than element by
element in a Python loop, leading to significant speed gains.
▪ Optimized for numerical computations: NumPy's arrays are
stored in compact, typed buffers and processed by compiled code,
making them much faster than Python lists for large datasets.

Foundation for other libraries:
▪ Interoperability: NumPy arrays seamlessly integrate with
other scientific Python libraries, providing a cohesive
ecosystem for data analysis and scientific computing.

Mathematical capabilities:
▪ Comprehensive toolkit: NumPy offers a rich set of
mathematical functions for common tasks in scientific
computing, eliminating the need to write custom code for
many operations.
In essence, NumPy's efficient array structures, fast computations, and extensive
mathematical functions make it an indispensable tool for anyone working with
numerical data in Python, especially in the fields of scientific computing, data
analysis, machine learning, and engineering.
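The points above can be illustrated with a minimal sketch (the array values are arbitrary and chosen only for illustration). It compares a Python loop with the equivalent vectorized expression and uses one function from NumPy's linear algebra toolkit:

```python
import numpy as np

values = np.arange(1, 6)  # the integers 1 through 5

# Element by element with a Python loop.
squares_loop = [int(v) * int(v) for v in values]

# The same computation as a single vectorized expression.
squares_vec = values ** 2
print(squares_vec, squares_vec.sum(), squares_vec.mean())

# One call from the linear algebra toolkit: solve A @ x = b.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)
print(x)
```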
2. Explain the significance of NumPy in terms of performance and
efficiency when working with large datasets and numerical computations.
Ans)
NumPy (Numerical Python) is a powerful library in the Python programming
language that provides support for large, multi-dimensional arrays and matrices,
along with a collection of mathematical functions to operate on these elements.
It is a fundamental package for scientific computing in Python and is widely
used in various domains such as data science, machine learning, signal
processing, and more. The significance of NumPy, particularly in terms of
performance and efficiency when working with large datasets and numerical
computations, can be explained through several key aspects:

1. Array Representation:
• NumPy introduces the ndarray (N-dimensional array) data
structure, which allows for efficient representation of large
datasets. A newly created array is stored as a contiguous block of
memory containing elements of the same type, enabling fast and
memory-efficient operations.
2. Vectorized Operations:
• NumPy provides a set of highly optimized functions that
operate on entire arrays at once, eliminating the need for
explicit looping in Python. This vectorized approach takes
advantage of low-level optimizations in the underlying C and
Fortran code, resulting in significantly faster computations.
3. Broadcasting:
• NumPy allows implicit element-wise operations on arrays of
different but compatible shapes through a feature called
broadcasting: dimensions of size 1 are stretched to match the
other operand. This enables more concise and readable code,
without the need to explicitly replicate arrays.
4. Memory Efficiency:
• NumPy arrays are more memory-efficient than Python
lists, especially for large datasets. Because every element has
the same fixed-size data type, the values are stored directly in
one contiguous block rather than as separate Python objects,
which reduces memory overhead and allows better cache
utilization.

5. Integration with Low-Level Languages:
• NumPy's core is written in C, and its linear algebra routines
delegate to efficient low-level libraries such as BLAS (Basic
Linear Algebra Subprograms) and LAPACK (Linear Algebra
Package). These libraries are written in languages like Fortran
and C and are highly optimized for numerical computations.
NumPy provides a high-level interface to them.
6. Parallelization and Multithreading:
• Most NumPy operations run on a single thread. Linear algebra
operations such as matrix multiplication can use several cores
when NumPy is linked against a multithreaded BLAS
implementation such as OpenBLAS or Intel MKL, which can lead
to significant performance improvements on modern multicore
processors.
7. Extensive Mathematical Functions:
• NumPy includes a comprehensive set of mathematical functions
for linear algebra, Fourier analysis, random number generation,
and more. These functions are implemented in highly efficient
C and Fortran code, contributing to the overall performance of
numerical computations.
8. Interoperability:
• NumPy provides seamless interoperability with other libraries
and tools in the scientific computing ecosystem, such as SciPy,
pandas, and scikit-learn. This interoperability allows users to
leverage the strengths of each library for different aspects of
their work.
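Two of the aspects above, memory efficiency and broadcasting, can be checked with a minimal sketch (the array size and values are arbitrary assumptions). Exact list sizes vary by Python version, so the list figure is printed rather than stated:

```python
import sys
import numpy as np

# Memory: one million integers as an ndarray versus a Python list.
n = 1_000_000
arr = np.arange(n, dtype=np.int64)
lst = list(range(n))
print(arr.nbytes)          # 8 bytes per int64 element
print(sys.getsizeof(lst))  # size of the list's pointer array only;
                           # the int objects themselves take extra memory

# Broadcasting: a (3, 1) column combined with a (4,) row gives a (3, 4) grid.
col = np.array([[0], [10], [20]])
row = np.array([1, 2, 3, 4])
grid = col + row
print(grid.shape)
print(grid)
```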
Unit: -5 Data Visualization:
1. Create a Matplotlib bar plot showing the sales of products in a store for a
given month. Label the axes, add a title, and customize the appearance
(e.g., colour, width).
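One possible answer, sketched with made-up data (the product names and sales figures are hypothetical, as are the chosen colour and bar width):

```python
import matplotlib.pyplot as plt

# Hypothetical monthly sales; product names and values are illustrative only.
products = ["Rice", "Milk", "Bread", "Eggs", "Tea"]
units_sold = [120, 95, 150, 80, 60]

fig, ax = plt.subplots(figsize=(8, 5))

# Customise the appearance: bar colour, bar width, and edge colour.
bars = ax.bar(products, units_sold, color="teal", width=0.6, edgecolor="black")

# Label the axes and add a title.
ax.set_xlabel("Product")
ax.set_ylabel("Units sold")
ax.set_title("Product sales for the month")

# Light horizontal gridlines make the bar heights easier to read.
ax.grid(axis="y", linestyle="--", alpha=0.5)
fig.tight_layout()
```

Calling plt.show() displays the figure in an interactive session, and fig.savefig("sales.png") writes it to a file (the file name is only an example).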

Output:-
2. Provide at least three examples of data visualization scenarios where
Seaborn is the preferred library over Matplotlib. Describe the type of plots
or charts involved and why Seaborn is a better choice.
Ans)
1. Statistical Relationships
• Plot Type: lmplot, jointplot, pairplot
• Scenario: When exploring relationships between variables or performing
regression analysis, Seaborn's specialized functions make it simpler to
create scatter plots with fitted regression lines and distribution plots.
lmplot draws a scatter plot with a regression line and its confidence
interval, and jointplot combines a bivariate plot with the marginal
distribution of each variable, optionally as kernel density estimates.
• Why Seaborn: Seaborn streamlines the process of creating complex
statistical visualizations by providing convenient high-level functions that
directly handle these tasks, making it easier to visualize relationships in
data without the need for extensive customization.
2. Categorical Data Analysis
• Plot Type: catplot, boxplot, violinplot
• Scenario: Analysing categorical variables involves visualizing distributions,
relationships, or comparisons across categories. Seaborn's catplot, boxplot,
and violinplot functions offer a concise way to display categorical data
distributions, especially when dealing with multiple categories or
comparing distributions across different groups.
• Why Seaborn: Seaborn provides specialized functions specifically designed
for categorical data visualization, offering better aesthetics, flexibility, and
ease of use compared to manually customizing Matplotlib plots for
categorical data analysis.
3. Distribution Visualization
• Plot Type: histplot, kdeplot, rugplot (histplot and displot replace distplot,
which is deprecated since seaborn 0.11)
• Scenario: Visualizing distributions of variables is crucial in understanding
the underlying data patterns. Seaborn's histplot, kdeplot, and rugplot allow
easy plotting of univariate distributions, kernel density estimates, and rug
marks that represent individual data points along an axis.
• Why Seaborn: Seaborn simplifies the creation of distribution plots: a
single call such as histplot with kde=True draws the histogram and overlays
an estimate of the underlying probability density function (PDF), offering
a more streamlined approach compared to Matplotlib.
Additional advantages of Seaborn:
• Aesthetically pleasing defaults: Seaborn's default styles and colour palettes
create visually appealing and informative plots.
• Close integration with Pandas: Seaborn works directly with Pandas
DataFrames, making it convenient for data analysis workflows.
• Focus on statistical visualization: Seaborn is designed to create informative
statistical graphics, making it a valuable tool for data exploration and
communication.

Unit: -6
1. Describe the three key structures in Plotly: Figure, Data, and Layout.
Explain the purpose of each structure in creating visualizations.
Ans)
The key structures in Plotly and their purposes in creating visualizations:
Figure -
Overall container: It houses all of the visualization's components, including
the data and layout.
Serves as a canvas: This is where the visual components are put together and
coordinated.
Crucial to interaction: It makes functions like panning, zooming, and hovering
over data points possible.
Data -
Core of the visualization: It contains the actual data that you wish to
visualize.
Several traces: A figure can include multiple traces (data sets), and each
one is drawn as a separate visual entity (e.g., lines, bars, scatter points).
Trace-specific properties: Each trace's appearance is defined by its own
attributes, such as type, name, mode, marker style, line style, etc.
Layout -
Manages visual presentation: It controls the visualization's non-data components,
including:
o Titles, labels, and annotations
o Gridlines and axes
o Legend and colour bar
o Margins and spacing
o Colour and style of the background
Working together:
Figure orchestrates: It combines layout and data to produce the complete
visualization.
Data provides content: The visual elements are built from this raw
material.
Layout creates context: It sets the general look and feel, provides labels and
annotations, and arranges the visual elements.

2. Load a sales dataset with columns 'Sales,' create a Plotly line chart to
visualize the total sales trend. Include axis labels, a title, and customize the
appearance.
