BD-Topic 4-Big Data

The document outlines the fundamentals of data analysis, including its importance for informed decision-making, cost reduction, and better customer targeting. It details the data analysis process, which consists of five key stages: identifying, collecting, cleaning, analyzing, and interpreting data. Additionally, it describes various data analysis methods and techniques, such as descriptive, exploratory, diagnostic, predictive, and prescriptive analysis, along with specific analytical methods like regression and cluster analysis.

Uploaded by

khanhnhn.bod

04/10/2022

NATIONAL ECONOMICS UNIVERSITY


BUSINESS SCHOOL
MARKETING & OPERATION DEPARTMENT

TEACHING SLIDE
SUBJECT

BIG DATA & DATA ANALYSIS


CODE:EBDBOR102
CREDIT: 3
INSTRUCTOR: VANNAM LE

PART 2 DATA ANALYSIS

Topic

4
Data Analysis Methods And Techniques


Learning objectives
Theory lesson Practice lesson

1. What Is Data Analysis?

2. Why Is Data Analysis Important?

3. What Is The Data Analysis Process?

4. Types Of Data Analysis Methods

5. Top Data Analysis Techniques To Apply

What is Data Analysis

Data analysis is the process of collecting, modeling, and analyzing
data to extract insights that support decision-making. There are
several methods and techniques to perform analysis depending on the
industry and the aim of the investigation.


Why Is Data Analysis Important? Informed decision-making

From a management perspective, you can benefit from analyzing your
data as it helps you make decisions based on facts and not simple
intuition. For instance, you can understand where to invest your
capital, detect growth opportunities, predict your income, or tackle
uncommon situations before they become problems. In this way, you can
extract relevant insights from all areas of your organization and,
with the help of dashboard software, present the information in a
professional and interactive way to different stakeholders.

Why Is Data Analysis Important? Reduce costs

Another great benefit is reduced costs. With the help of advanced
technologies such as predictive analytics, businesses can spot
improvement opportunities, trends, and patterns in their data and
plan their strategies accordingly. In time, this will help you avoid
spending money and resources on the wrong strategies. In addition, by
predicting different scenarios such as sales and demand, you can
anticipate production and supply.


Why Is Data Analysis Important? Target customers better

Customers are arguably the most crucial element in any business. By
using analytics to get a 360° view of all aspects related to your
customers, you can understand which channels they use to communicate
with you, their demographics, interests, habits, purchasing
behaviors, and more. In the long run, this will make your marketing
strategies more successful, allow you to identify new potential
customers, and help you avoid wasting resources on targeting the
wrong people or sending the wrong message. You can also track
customer satisfaction by analyzing your clients' reviews or your
customer service department's performance.

The Data Analysis Process

There is an order to follow in order to extract the needed
conclusions. The analysis process consists of 5 key stages:

1. Identify
2. Collect
3. Clean
4. Analyze
5. Interpret


The Data Analysis Process

1 - Identify
Before you get your hands dirty with data, you first need to identify
why you need it in the first place. Identification is the stage in
which you establish the questions you will need to answer. For
example, what is the customer's perception of our brand? Or what type
of packaging is more engaging to our potential customers? Once the
questions are outlined, you are ready for the next step.

The Data Analysis Process

2 - Collect
As its name suggests, this is the stage where you start collecting
the needed data. Here, you define which sources of information you
will use and how you will use them. The data can come in different
forms, such as internal or external sources, surveys, interviews,
questionnaires, and focus groups, among others. An important note
here is that the way you collect the information will be different in
a quantitative and a qualitative scenario.


The Data Analysis Process

3 - Clean
Once you have the necessary data, it is time to clean it and leave it
ready for analysis. Not all the data you collect will be useful. When
collecting big amounts of information in different formats, it is very
likely that you will find yourself with duplicate or badly formatted
data. To avoid this, before you start working with your data you need
to make sure to erase any white spaces, duplicate records, or
formatting errors. This way you avoid hurting your analysis with
incorrect data.
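
As a rough illustration of the cleaning stage, the sketch below trims stray white space, normalizes one field, and drops duplicate records. The record layout (name, email), the sample values, and the helper name clean_records are assumptions made for this example only.

```python
def clean_records(rows):
    """Trim white space, lower-case the email field, and drop duplicate records."""
    seen = set()
    cleaned = []
    for name, email in rows:
        # Collapse repeated inner spaces and strip leading/trailing ones.
        name = " ".join(name.split())
        email = email.strip().lower()
        key = (name, email)
        if key not in seen:  # keep only the first copy of each record
            seen.add(key)
            cleaned.append(key)
    return cleaned


# Hypothetical raw data with extra spaces, mixed case, and a duplicate.
raw = [
    ("  Ana  Tran ", "ANA@example.com "),
    ("Ana Tran", "ana@example.com"),
    ("Binh Le", " binh@example.com"),
]
cleaned = clean_records(raw)
print(cleaned)
```

The first two raw rows describe the same person once formatting is normalized, so only one of them survives.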


The Data Analysis Process

4 - Analyze
With the help of various techniques such as statistical analysis,
regressions, neural networks, text analysis, and more, you can start
analyzing and manipulating your data to extract relevant conclusions.
At this stage, you find trends, correlations, variations, and patterns
that can help you answer the questions you first thought of in the
identify stage. Various technologies on the market assist researchers
and average business users with the management of their data. Some of
them include business intelligence and visualization software,
predictive analytics, and data mining, among others.


The Data Analysis Process

5 - Interpret
Last but not least, you have one of the most important steps: it is
time to interpret your results. This stage is where the researcher
comes up with courses of action based on the findings. For example,
here you would understand if your clients prefer packaging that is
red or green, plastic or paper, etc. Additionally, at this stage, you
can also find some limitations and work on them.


Types Of Data Analysis Methods

Descriptive analysis: What happened
Exploratory analysis: How to explore data relationships
Diagnostic analysis: Why it happened
Predictive analysis: What will happen
Prescriptive analysis: How will it happen


Types Of Data Analysis Methods

The descriptive analysis method is the starting point for any analytic
reflection, and it aims to answer the question: what happened? It does
this by ordering, manipulating, and interpreting raw data from various
sources to turn it into valuable insights for your organization.
Performing descriptive analysis is essential, as it allows us to
present our insights in a meaningful way. This analysis on its own
will not allow you to predict future outcomes or answer questions like
why something happened, but it will leave your data organized and
ready for further investigation.
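
A minimal sketch of descriptive analysis, assuming a made-up list of monthly sales figures: it only summarizes what happened (count, center, spread) using Python's standard statistics module and makes no prediction.

```python
import statistics

# Hypothetical monthly units sold; not taken from the slides.
monthly_sales = [120, 135, 125, 150, 140, 170]

summary = {
    "count": len(monthly_sales),
    "mean": statistics.mean(monthly_sales),
    "median": statistics.median(monthly_sales),
    "min": min(monthly_sales),
    "max": max(monthly_sales),
    "stdev": round(statistics.stdev(monthly_sales), 2),  # sample standard deviation
}
for name, value in summary.items():
    print(f"{name}: {value}")
```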


Types Of Data Analysis Methods

As its name suggests, the main aim of exploratory analysis is to
explore. Before it is carried out, there is still no notion of the
relationship between the data and the variables. Once the data is
investigated, exploratory analysis enables you to find connections and
generate hypotheses and solutions for specific problems. A typical
area of application for it is data mining.


Types Of Data Analysis Methods

Diagnostic data analytics empowers analysts and executives by helping
them gain a firm contextual understanding of why something happened.
If you know why something happened as well as how it happened, you
will be able to pinpoint the exact ways of tackling the issue or
challenge.
Designed to provide direct and actionable answers to specific
questions, this is one of the most important methods in research, and
it also serves key organizational functions such as retail analytics.


Types Of Data Analysis Methods

The predictive method allows you to look into the future to answer the
question: what will happen? In order to do this, it uses the results of the
previously mentioned descriptive, exploratory, and diagnostic analysis, in
addition to machine learning (ML) and artificial intelligence (AI). In this way,
you can uncover future trends, potential problems or inefficiencies,
connections, and causal relationships in your data.
With predictive analysis, you can unfold and develop initiatives that will
not only enhance your various operational processes but also help you
gain an all-important edge on the competition. If you understand why a
trend, pattern, or event happened through data, you will be able to
develop an informed projection of how things may unfold in particular
areas of the business.


Types Of Data Analysis Methods

Prescriptive analysis is another of the most effective types of
analysis methods in research. Prescriptive data techniques cross over
from predictive analysis in that they revolve around using patterns or
trends to develop responsive, practical business strategies.
By drilling down into prescriptive analysis, you will play an active
role in the data consumption process by taking well-arranged sets of
visual data and using them as a powerful fix to emerging issues in a
number of key areas, including marketing, sales, customer experience,
HR, fulfillment, finance, logistics analytics, and others.


Top 10 Data Analysis Methods

1. Cluster analysis
2. Cohort analysis
3. Regression analysis
4. Neural networks
5. Factor analysis
6. Data mining
7. Text analysis
8. Time series analysis
9. Decision trees
10. Conjoint analysis

Cluster analysis
Cluster analysis is the action of grouping a set of data elements in a
way that said elements are more similar (in a particular sense) to
each other than to those in other groups, hence the term 'cluster.'
Since there is no target variable when clustering, the method is often
used to find hidden patterns in the data. The approach is also used to
provide additional context to a trend or dataset.
Let's look at it from a business perspective. In a perfect world,
marketers would be able to analyze each customer separately and give
them the best personalized service, but with a large customer base it
is practically impossible to do that. That's where clustering comes
in. By grouping customers into clusters based on demographics,
purchasing behaviors, monetary value, or any other factor that might
be relevant for your company, you will be able to optimize your
efforts and give your customers the best experience based on their
needs.
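
One common clustering technique is k-means, which the slides do not name; the sketch below uses it only as an illustration. The customer data (purchases per year, yearly spend) and the hand-picked starting centroids are assumptions; real tools usually choose starting points randomly and scale the features first.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move the centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Squared Euclidean distance from this point to every centroid.
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # New centroid = mean of the cluster's points (keep the old one if the cluster is empty).
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical customers: (purchases per year, yearly spend).
customers = [(2, 200), (3, 250), (2, 220), (20, 2000), (22, 2100), (19, 1900)]
centroids, clusters = kmeans(customers, centroids=[customers[0], customers[3]])
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

There is no target variable here: the two groups (occasional and frequent buyers) emerge only from how close the points are to each other.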


Top 10 Data Analysis Methods

Cohort analysis
Cohort analysis uses historical data to examine and compare the
behavior of a determined segment of users, which can then be grouped
with others with similar characteristics. By using this methodology,
it's possible to gain a wealth of insight into consumer needs or a
firm understanding of a broader target group.
Cohort analysis can be really useful in marketing, as it will allow
you to understand the impact of your campaigns on specific groups of
customers. To exemplify, imagine you send an email campaign
encouraging customers to sign up to your site. For this, you create
two versions of the campaign with different designs, CTAs, and ad
content. Later on, you can use cohort analysis to track the
performance of the campaign over a longer period of time and
understand which type of content is driving your customers to sign up,
repurchase, or engage in other ways.
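
The email campaign example above could be tracked along these lines. This is a sketch under assumptions: the event log, the version labels "A" and "B", and the single sign-up flag are invented, and a fuller cohort analysis would also follow each group over several time periods.

```python
from collections import defaultdict

# Hypothetical campaign log: (customer_id, campaign_version, signed_up)
events = [
    (1, "A", True), (2, "A", False), (3, "A", True), (4, "A", False),
    (5, "B", True), (6, "B", True), (7, "B", True), (8, "B", False),
]

totals = defaultdict(int)
signups = defaultdict(int)
for _customer, version, signed_up in events:
    totals[version] += 1               # size of the cohort
    signups[version] += int(signed_up)  # conversions within the cohort

signup_rate = {version: signups[version] / totals[version] for version in totals}
print(signup_rate)
```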


Top 10 Data Analysis Methods

Regression analysis
Regression uses historical data to understand how a dependent
variable's value is affected when one (simple linear regression) or
more (multiple regression) independent variables change or stay the
same. By understanding each variable's relationship and how they
developed in the past, you can anticipate possible outcomes and make
better decisions in the future.
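
A minimal sketch of simple linear regression with one independent variable, fitted by ordinary least squares. The advertising and sales figures are invented and deliberately tidy, and fit_line is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


ad_spend = [1, 2, 3, 4, 5]      # independent variable (hypothetical, thousands of USD)
sales = [12, 14, 16, 18, 20]    # dependent variable (hypothetical, thousands of units)

slope, intercept = fit_line(ad_spend, sales)
forecast = intercept + slope * 6  # anticipated sales if ad spend rises to 6
print(slope, intercept, forecast)
```

The slope estimates how much the dependent variable changes per unit change in the independent variable, which is what lets past data inform a future decision.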


Top 10 Data Analysis Methods

Regression analysis: Example
Imagine you did a regression analysis of your sales in 2019 and
discovered that variables like product quality, store design, customer
service, marketing campaigns, and sales channels affected the overall
result. Now you want to use regression to analyze which of these
variables changed or whether any new ones appeared during 2020. For
example, you couldn't sell as much in your physical store due to COVID
lockdowns. Therefore, your sales could have either dropped in general
or increased in your online channels. In this way, you can understand
which independent variables affected the overall performance of your
dependent variable, annual sales.


Top 10 Data Analysis Methods

Neural networks
The neural network forms the basis for the intelligent algorithms of
machine learning. It is a form of analytics that attempts, with
minimal intervention, to understand how the human brain would generate
insights and predict values. Neural networks learn from each and every
data transaction, meaning that they evolve and advance over time.
A typical area of application for neural networks is predictive
analytics. There are BI reporting tools that have this feature
implemented within them, such as the Predictive Analytics Tool from
SPSS, R, Stata…
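
To make "learning from each data transaction" concrete, the sketch below trains a single artificial neuron (a perceptron), the simplest building block of a neural network, and adjusts its weights after every sample. The task (a customer converts only if they opened the email and visited the site) and all names are assumptions for illustration; real networks stack many such units.

```python
def train_perceptron(samples, epochs=20, learning_rate=1):
    """Train one artificial neuron, updating the weights after every sample."""
    weights = [0, 0]
    bias = 0
    for _ in range(epochs):
        for inputs, target in samples:
            activation = sum(w * x for w, x in zip(weights, inputs)) + bias
            prediction = 1 if activation > 0 else 0
            error = target - prediction  # 0 when the guess was right
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


def predict(weights, bias, inputs):
    """Return 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0


# Hypothetical data: (opened_email, visited_site) -> converted
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(samples)
predictions = [predict(weights, bias, inputs) for inputs, _ in samples]
print(predictions)
```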


Top 10 Data Analysis Methods

Factor analysis
Factor analysis, also called “dimension reduction”, is a type of data
analysis used to describe variability among observed, correlated
variables in terms of a potentially lower number of unobserved
variables called factors. The aim here is to uncover independent
latent variables, which makes it an ideal method for streamlining
specific segments.


Top 10 Data Analysis Methods

Factor analysis: Example
A good way to understand this data analysis method is a customer
evaluation of a product. The initial assessment is based on different
variables like color, shape, wearability, current trends, materials,
comfort, the place where they bought the product, and frequency of
usage. The list can be endless, depending on what you want to track.
In this case, factor analysis comes into the picture by summarizing
all of these variables into homogeneous groups, for example, by
grouping the variables color, materials, quality, and trends into a
broader latent variable of design.


Top 10 Data Analysis Methods

Data mining
Data mining is a method of data analysis that serves as the umbrella
term for engineering metrics and insights for additional value,
direction, and context. By using exploratory statistical evaluation,
data mining aims to identify dependencies, relations, patterns, and
trends to generate advanced knowledge. When considering how to analyze
data, adopting a data mining mindset is essential to success; as such,
it's an area that is worth exploring in greater detail.


Top 10 Data Analysis Methods

Text analysis
Text analysis, also known in the industry as text mining, works by
taking large sets of textual data and arranging them in a way that
makes them easier to manage. By working through this cleansing process
in stringent detail, you will be able to extract the data that is
truly relevant to your organization and use it to develop actionable
insights that will propel you forward.
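
A very small text analysis sketch, assuming three invented customer reviews and a tiny hand-made stop-word list: it lower-cases the text, splits it into words, drops filler words, and counts what remains.

```python
import re
from collections import Counter

# Hypothetical customer reviews.
reviews = [
    "Great product, fast delivery!",
    "Delivery was slow but the product is great.",
    "Great value. Would buy the product again.",
]
# Tiny illustrative stop-word list; real projects use much longer ones.
stop_words = {"the", "was", "but", "is", "would", "a", "and"}

tokens = [
    word
    for review in reviews
    for word in re.findall(r"[a-z']+", review.lower())
    if word not in stop_words
]
top_terms = Counter(tokens).most_common(3)
print(top_terms)
```

Even this crude count surfaces the recurring themes (product quality and delivery) that a reader would otherwise have to find by reading every review.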


Top 10 Data Analysis Methods

Time series analysis
As its name suggests, time series analysis is used to analyze a set of
data points collected over a specified period of time. Although
analysts use this method to monitor data points at regular intervals
rather than just intermittently, time series analysis is not used
solely to collect data over time. Instead, it allows researchers to
understand whether variables changed during the study, how the
different variables depend on each other, and how the end result was
reached.
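
One basic time series technique is the moving average, which smooths short-term noise so that the underlying trend is easier to see. The slides do not prescribe it; the weekly order counts and the window of 3 are assumptions for this sketch.

```python
def moving_average(values, window):
    """Trailing moving average; returns one value per full window."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


weekly_orders = [10, 12, 11, 15, 18, 17, 21]  # hypothetical, one value per week
trend = moving_average(weekly_orders, window=3)
print(trend)
```

The raw series dips in weeks 3 and 6, while the smoothed series rises steadily, which is the kind of change over time this method is meant to reveal.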


Top 10 Data Analysis Methods

Decision trees
Decision tree analysis aims to act as a support tool for making smart
and strategic decisions. By visually displaying potential outcomes,
consequences, and costs in a tree-like model, researchers and business
users can easily evaluate all factors involved and choose the best
course of action. Decision trees are helpful for analyzing
quantitative data, and they allow for an improved decision-making
process by helping you spot improvement opportunities, reduce costs,
and enhance operational efficiency and production.
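
A decision tree of this kind can be evaluated by computing the expected value of each branch. In the sketch below, every option, probability, payoff, and cost is invented, and expected_value is a hypothetical helper; it illustrates the idea rather than any specific tool.

```python
# Each option has an upfront cost and a list of (probability, payoff) outcomes.
# All figures are hypothetical.
options = {
    "launch online store": {"cost": 50, "outcomes": [(0.6, 200), (0.4, 20)]},
    "open second shop": {"cost": 120, "outcomes": [(0.5, 300), (0.5, 40)]},
    "do nothing": {"cost": 0, "outcomes": [(1.0, 0)]},
}


def expected_value(option):
    """Probability-weighted payoff of the branch, minus its upfront cost."""
    return sum(p * payoff for p, payoff in option["outcomes"]) - option["cost"]


values = {name: expected_value(option) for name, option in options.items()}
best = max(values, key=values.get)
print(values)
print("best option:", best)
```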


Top 10 Data Analysis Methods

Conjoint analysis
Last but not least, we have conjoint analysis. This approach is
usually used in surveys to understand how individuals value different
attributes of a product or service, and it is one of the most
effective methods for extracting consumer preferences. When it comes
to purchasing, some clients might be more price-focused, others more
features-focused, and others might have a sustainability focus.
Whatever your customers' preferences are, you can find them with
conjoint analysis. In this way, companies can define pricing
strategies, packaging options, subscription packages, and more.
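
A simplified sketch of how conjoint analysis turns survey ratings into preferences. It assumes one respondent rating all four combinations of two attributes (a balanced design), in which case the "part-worth" of a level can be estimated as the average rating of profiles containing that level minus the overall average. The attributes, levels, and ratings are invented; commercial tools use more elaborate models.

```python
from statistics import mean

# Hypothetical survey: one respondent rates four product profiles from 1 to 10.
profiles = [
    ({"price": "low", "packaging": "paper"}, 9),
    ({"price": "low", "packaging": "plastic"}, 7),
    ({"price": "high", "packaging": "paper"}, 5),
    ({"price": "high", "packaging": "plastic"}, 3),
]
levels = {"price": ["low", "high"], "packaging": ["paper", "plastic"]}

grand_mean = mean(rating for _, rating in profiles)


def part_worth(attribute, level):
    """Average rating of profiles with this level, relative to the overall average."""
    ratings = [rating for attrs, rating in profiles if attrs[attribute] == level]
    return mean(ratings) - grand_mean


part_worths = {
    (attribute, level): part_worth(attribute, level)
    for attribute, attribute_levels in levels.items()
    for level in attribute_levels
}
# Importance of an attribute = spread between its best and worst level.
importance = {
    attribute: max(part_worths[(attribute, lv)] for lv in attribute_levels)
    - min(part_worths[(attribute, lv)] for lv in attribute_levels)
    for attribute, attribute_levels in levels.items()
}
print(part_worths)
print(importance)
```

In this made-up data the price attribute has twice the spread of packaging, so this respondent would be described as price-focused.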


Data analysis tools

Business Intelligence
Statistical analysis
SQL Consoles
Data Visualization


Data analysis tools

Business Intelligence
BI tools allow you to process significant amounts of data from a
number of sources in any format. As such, you can not only analyze and
monitor your data to extract relevant insights, but also create
interactive reports and dashboards to visualize KPIs and use them for
the benefit of your company. There are many BI tools; Power BI is one
of the best known.


Data analysis tools

Statistical analysis
These tools are usually designed for scientists, statisticians, market
researchers, and mathematicians, as they allow them to perform complex
statistical analyses with methods like regression analysis, predictive
analysis, and statistical modeling. A good tool for this type of
analysis is RStudio, as it offers powerful data modeling and
hypothesis testing features that can cover both academic and general
data analysis. It is one of the industry's favorites, due to its
capability for data cleaning, data reduction, and advanced analysis
with several statistical methods. Another relevant tool is SPSS from
IBM. The software offers advanced statistical analysis for users of
all skill levels. Thanks to a vast library of machine learning
algorithms, text analysis, and a hypothesis testing approach, it can
help your company find relevant insights to drive better decisions.
SPSS also works as a cloud service that enables you to run it
anywhere.


Data analysis tools

SQL Consoles
SQL is a programming language often used to handle structured data in
relational databases. Tools like these are popular among data
scientists, as they are extremely effective at unlocking the value of
these databases. One of the most used SQL tools on the market is MySQL
Workbench. This tool offers several features, such as a visual tool
for database modeling and monitoring, complete SQL optimization,
administration tools, and visual performance dashboards to keep track
of KPIs.
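
To show what a typical SQL query looks like, the sketch below uses SQLite through Python's built-in sqlite3 module instead of MySQL Workbench, simply because it runs without any server. The orders table, its columns, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("North", 120.0), ("North", 80.0), ("South", 200.0), ("South", 50.0), ("East", 90.0)],
)

# Aggregate the structured data: number of orders and revenue per region.
rows = conn.execute(
    """
    SELECT region, COUNT(*) AS n_orders, SUM(amount) AS revenue
    FROM orders
    GROUP BY region
    ORDER BY revenue DESC
    """
).fetchall()
conn.close()

for region, n_orders, revenue in rows:
    print(region, n_orders, revenue)
```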


Data analysis tools

Data Visualization
These tools are used to represent your data through charts, graphs,
and maps that allow you to find patterns and trends in the data.
datapine's BI platform, for example, offers a wealth of powerful
online data visualization tools with several benefits. Some of them
include: delivering compelling data-driven presentations to share with
your entire company, the ability to see your data online with any
device wherever you are, an interactive dashboard design feature that
enables you to showcase your results in an interactive and
understandable way, and online self-service reports that several
people can work on simultaneously to enhance team productivity.


Column/bar chart
Select this chart to compare data
categorized into separate groups,
for example your sales for each
quarter or the scores of different
teams.


Stacked bar/column chart


This graph shows the contribution of
the individual segments that make up
each column. For example, quarterly
sales (represented as segments) for
each region.


Combination chart
This is a combination of bar and line
charts. Choose this chart when you have
a mix of data series types, or to
represent an additional data series along
with the main data. For example, if the
total sales across regions are
represented by bars, the average sales
and ROI can be represented as individual
lines in the same graph.


Pie Chart
Use this chart to show how much each
part contributes to the whole. For
example, which expenses account for
the largest share of this month's
income, or how business is
distributed between regions.
Variation: 3D pie.


Ring Chart
The ring (donut) chart closely resembles
the pie chart but is drawn as a ring. This
graph also shows the contribution of
parts to the whole set.


Web Charts
Select this chart for a comparative
study of different data series. Web
(radar) charts compare the values of
several data series, represented by
data markers, relative to a central
point.
Variation: filled web.


Funnel Chart
Select this chart to show the
progressive flow or decline of a
business metric across stages. For
example, visualize the conversion of
your leads into actual sales across
stages like marketing-qualified leads,
sales-qualified leads, and leads won.


Line Chart
Select this chart to visualize trends
of all data series over any time
period, such as the roller coaster
ride your favorite stock has taken
over the past quarter.
Variations: step lines, smooth lines.


Scatter Chart
Scatter plots are often used to plot
discrete data with unequal intervals.
This chart compares two numeric axes,
unlike a line chart, where one axis is
typically a category or time axis.


Area Chart
Area charts fill the area below
the lines, thus making it easy to
compare data levels. This chart is
mainly useful for highlighting the
change in metrics over time, for
example the change in business
metrics over a specific period.
Variations: area with points, simple
area, simple area with points.


Stacked Area Chart


A stacked area chart shows the relationship of
the parts to the whole. With this chart, you can
see how much individual stacks or elements have
contributed to the total value over time.
Variations: Stacked Area with points, Stacked
Simple Area, Stacked Simple Area with points.


Bullet chart
The bullet chart highlights a key
measure and compares it to a target
value, relative to qualitative
performance ranges such as poor,
satisfactory, and good. Bullet charts
are often used as widgets in
dashboards because they pack a lot of
information into a small space.


Dial Chart
Select this chart to show the current
value within a range. It is similar to a
bullet chart, but the value is displayed
on a dial. The dial chart is well suited
to business and executive dashboards.


Bubble Chart
Select this chart if you want to add
another dimension to your
visualization. Bubble charts are
extremely useful for highlighting the
weight of a data metric.


Packed bubble chart


As the name suggests, packed bubble charts
represent data as a cluster of circles or
bubbles. This chart is used to display
values without the need for axes. You can
use packed bubble charts to visualize large
amounts of data in a small space.


To be continued
