BD-Topic 4-Big Data
TEACHING SLIDE
SUBJECT
Topic 4
Data Analysis Methods And Techniques
04/10/2022
Learning objectives
Theory lesson Practice lesson
Why Is Data Analysis Important?
Informed decision-making
Reduce costs
Target customers better
There is an order to follow to extract the conclusions you need: 1 - Identify, 2 - Collect, 3 - Clean, 4 - Analyze, 5 - Interpret.
1 - Identify
Before you get your hands dirty with data, you first need to identify why you need it in the first place. Identification is the stage in which you establish the questions you will need to answer. For example: what is the customer's perception of our brand? Or: what type of packaging is more engaging to our potential customers? Once the questions are outlined, you are ready for the next step.
2 - Collect
As its name suggests, this is the stage where you start collecting the needed data. Here, you define which sources of information you will use and how you will use them. Data can be collected in different forms, such as internal or external sources, surveys, interviews, questionnaires, and focus groups, among others. An important note here is that the way you collect the information will differ between a quantitative and a qualitative scenario.
3 - Clean
Once you have the necessary data, it is time to clean it and leave it ready for analysis. Not all the data you collect will be useful: when collecting large amounts of information in different formats, you are very likely to end up with duplicate or badly formatted data. To avoid this, before you start working with your data, make sure to remove any extra white space, duplicate records, or formatting errors. This way you avoid hurting your analysis with incorrect data.
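As a minimal sketch of the cleaning step, assuming records arrive as Python dictionaries with hypothetical `name`, `email`, and `amount` fields and that amounts may use a decimal comma, the function below collapses extra white space, normalizes formatting, skips values it cannot parse, and drops exact duplicates. In practice a library such as pandas is often used; this version sticks to the standard library.

```python
def clean_records(records):
    """Return cleaned records: trimmed text, lower-case e-mails, numeric
    amounts, and exact duplicates removed (first occurrence kept).
    Field names are hypothetical examples."""
    seen = set()
    cleaned = []
    for rec in records:
        name = " ".join(rec["name"].split())          # collapse extra white space
        email = rec["email"].strip().lower()          # normalize formatting
        # Assumption: a comma is a decimal separator, not a thousands separator.
        amount_text = rec["amount"].strip().replace(",", ".")
        try:
            amount = float(amount_text)
        except ValueError:
            continue                                  # skip badly formatted values
        key = (name, email, amount)
        if key in seen:                               # skip duplicate records
            continue
        seen.add(key)
        cleaned.append({"name": name, "email": email, "amount": amount})
    return cleaned


raw = [
    {"name": "  Ana   Lopez ", "email": "ANA@example.com ", "amount": "10,5"},
    {"name": "Ana Lopez", "email": "ana@example.com", "amount": "10.5"},
    {"name": "Ben Tran", "email": "ben@example.com", "amount": "n/a"},
]
print(clean_records(raw))
```

Here the first two rows turn out to be the same record once formatting is normalized, and the third is dropped because its amount cannot be parsed.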
4 - Analyze
With the help of various techniques such as statistical analysis, regressions, neural networks, text analysis, and more, you can start analyzing and manipulating your data to extract relevant conclusions. At this stage, you find trends, correlations, variations, and patterns that can help you answer the questions you first thought of in the Identify stage. Various technologies on the market assist researchers and average business users with the management of their data. Some of them include business intelligence and visualization software, predictive analytics, and data mining, among others.
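Correlation is one of the relationships this stage looks for. The sketch below, using hypothetical advertising-spend and sales figures, computes the Pearson correlation coefficient: values near 1 or -1 indicate a strong linear relationship, values near 0 a weak one. Correlation on its own does not prove cause and effect.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances; the common 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical monthly figures.
ad_spend = [1, 2, 3, 4, 5]
sales = [12, 15, 19, 22, 26]
print(round(pearson(ad_spend, sales), 3))
```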
5 - Interpret
Last but not least comes one of the most important steps: it is time to interpret your results. This stage is where the researcher comes up with courses of action based on the findings. For example, here you would understand whether your clients prefer packaging that is red or green, plastic or paper, etc. Additionally, at this stage you may also find some limitations and work on them.
The predictive method allows you to look into the future to answer the question: what will happen? To do this, it uses the results of the previously mentioned descriptive, exploratory, and diagnostic analysis, in addition to machine learning (ML) and artificial intelligence (AI). This way, you can uncover future trends, potential problems or inefficiencies, connections, and causal relationships in your data.
With predictive analysis, you can develop initiatives that will not only enhance your various operational processes but also help you gain an all-important edge on the competition. If you understand through data why a trend, pattern, or event happened, you will be able to develop an informed projection of how things may unfold in particular areas of the business.
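Predictive models range from simple statistical fits to ML models. As an illustration only, the sketch below uses the simplest option: a least-squares straight-line trend fitted to hypothetical monthly sales and extrapolated forward. It is a stand-in technique, not the ML/AI approach the slide describes, and it assumes the past linear trend continues.

```python
def fit_linear_trend(values):
    """Ordinary least-squares line through (period index, value) pairs.
    Returns (intercept, slope); periods are numbered 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(range(n), values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def forecast(values, periods_ahead):
    """Extrapolate the fitted trend for the next `periods_ahead` periods."""
    intercept, slope = fit_linear_trend(values)
    n = len(values)
    return [intercept + slope * (n + h) for h in range(periods_ahead)]


monthly_sales = [100, 110, 120, 130]  # hypothetical data
print(forecast(monthly_sales, 2))     # → [140.0, 150.0]
```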
Top 10 Data Analysis Methods
Cluster Analysis
Cohort analysis
Regression analysis
Neural networks
Factor analysis
Data mining
Text analysis
Time series analysis
Decision Trees
Conjoint analysis

Cluster Analysis
Cluster analysis is the action of grouping a set of data elements in a way that said elements are more similar (in a particular sense) to each other than to those in other groups (hence the term ‘cluster’). Since there is no target variable when clustering, the method is often used to find hidden patterns in the data. The approach is also used to provide additional context to a trend or dataset.
Let's look at it from a business perspective. In a perfect world, marketers would be able to analyze each customer separately and give them the best personalized service, but let's face it, with a large customer base it is practically impossible to do that in the time available. That's where clustering comes in. By grouping customers into clusters based on demographics, purchasing behaviors, monetary value, or any other factor that might be relevant for your company, you will be able to immediately optimize your efforts and give your customers the best experience based on their needs.
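The slide does not name a specific clustering algorithm. One widely used option is k-means, sketched below under these assumptions: customers are described by two hypothetical numeric features (annual spend in thousands and orders per year), the number of clusters k is chosen in advance, and the first k points serve as initial centroids so the example is deterministic. Real tools usually scale features first and use smarter seeding such as k-means++.

```python
from math import dist


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: dist(point, centroids[c]))


def kmeans(points, k, max_iterations=100):
    """Lloyd's k-means algorithm. Returns (labels, centroids)."""
    centroids = [tuple(p) for p in points[:k]]   # simplistic deterministic seeding
    labels = [0] * len(points)
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Hypothetical customers: (annual spend in thousands, orders per year).
customers = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 9), (8.5, 9.5)]
labels, centroids = kmeans(customers, k=2)
print(labels)  # → [0, 0, 0, 1, 1, 1]
```

The cluster numbers themselves are arbitrary; what matters is which customers end up together (here, low spenders versus high spenders).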
Cohort analysis
This type of data analysis method uses historical data to examine and compare the behavior of a given segment of users, which can then be grouped with others with similar characteristics. By using this methodology, it's possible to gain a wealth of insight into consumer needs or a firm understanding of a broader target group.
Cohort analysis can be really useful in marketing, as it allows you to understand the impact of your campaigns on specific groups of customers. To illustrate, imagine you send an email campaign encouraging customers to sign up to your site. For this, you create two versions of the campaign with different designs, CTAs, and ad content. Later on, you can use cohort analysis to track the performance of the campaign over a longer period and understand which type of content is driving your customers to sign up, repurchase, or engage in other ways.
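A common output of cohort analysis is a retention table. The sketch below assumes hypothetical data: cohorts are defined by signup month (integers, 0 = first month) and activity is a list of (user, month) events; the table gives the share of each cohort active N months after signing up. Cohorts could equally be keyed by campaign version, as in the email example above.

```python
from collections import defaultdict


def retention_table(signups, activity):
    """Share of each cohort that is active N months after signing up.

    signups:  {user_id: signup_month}, months as integers
    activity: iterable of (user_id, month) events
    Returns {cohort_month: {months_since_signup: share_of_cohort_active}}.
    """
    cohort_sizes = defaultdict(int)
    for month in signups.values():
        cohort_sizes[month] += 1

    active_users = defaultdict(set)  # (cohort, offset) -> distinct active users
    for user, month in activity:
        cohort = signups.get(user)
        if cohort is None or month < cohort:
            continue  # ignore unknown users and events before signup
        active_users[(cohort, month - cohort)].add(user)

    table = defaultdict(dict)
    for (cohort, offset), users in sorted(active_users.items()):
        table[cohort][offset] = len(users) / cohort_sizes[cohort]
    return dict(table)


# Hypothetical sign-ups and monthly activity events.
signups = {"ana": 0, "ben": 0, "cai": 1, "dee": 1}
activity = [("ana", 0), ("ben", 0), ("ana", 1), ("cai", 1), ("dee", 1), ("cai", 2)]
print(retention_table(signups, activity))
```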
Neural networks
The neural network forms the basis of many intelligent machine learning algorithms. It is a form of analytics that attempts, with minimal intervention, to mimic how the human brain would generate insights and predict values. Neural networks learn from each and every data transaction, meaning that they evolve and improve over time.
A typical area of application for neural networks is predictive analytics. Several analytics and BI reporting tools implement this feature, such as the Predictive Analytics Tool from SPSS, R, Stata…
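The idea of learning "from each and every data transaction" can be seen in the smallest possible network: a single artificial neuron (a perceptron) whose weights are adjusted after every example. This is a sketch on hypothetical data, where a customer with features (opened_email, visited_site) either purchased (1) or not (0); real neural networks stack many such units and use gradient-based training.

```python
def step(z):
    """Threshold activation: fire (1) when the weighted sum is non-negative."""
    return 1 if z >= 0 else 0


def predict(weights, bias, features):
    return step(sum(w * x for w, x in zip(weights, features)) + bias)


def train_perceptron(samples, epochs=20, learning_rate=1.0):
    """Online perceptron training: weights change after every single example."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, target in samples:
            error = target - predict(weights, bias, features)  # -1, 0 or +1
            weights = [w + learning_rate * error * x for w, x in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias


# Hypothetical history: (opened_email, visited_site) -> purchased.
history = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(history)
print([predict(weights, bias, x) for x, _ in history])
```

Because this toy data is linearly separable, the perceptron rule is guaranteed to converge; here it reproduces all four labels well within 20 passes.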
Data mining
Data mining is an umbrella term for methods of data analysis that engineer metrics and insights for additional value, direction, and context. By using exploratory statistical evaluation, data mining aims to identify dependencies, relations, patterns, and trends to generate advanced knowledge. When considering how to analyze data, adopting a data mining mindset is essential to success. As such, it is an area worth exploring in greater detail.
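One classic data-mining technique for finding dependencies is market-basket association rules. The sketch below is a simplified, pairs-only version on hypothetical shopping baskets: a rule "a → b" is kept when the pair appears in at least `min_support` of all baskets and when b appears in at least `min_confidence` of the baskets that contain a. The threshold values are assumptions chosen for the example.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) rules between item pairs."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))          # ignore repeats within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n                  # share of baskets containing both
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Hypothetical baskets.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "milk", "eggs"],
]
for rule in pair_rules(baskets):
    print(rule)
```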
Text analysis
Text analysis, also known in the industry as text mining, works by taking large sets of textual data and arranging them in a way that makes them easier to manage. By working through this cleansing process in stringent detail, you will be able to extract the data that is truly relevant to your organization and use it to develop actionable insights that will propel you forward.
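A first text-mining pass often consists of lower-casing, tokenizing, removing stop words, and counting terms. The sketch below does exactly that on hypothetical customer reviews; the tiny stop-word list is an assumption for illustration, and real projects use larger lists plus steps such as stemming.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"the", "a", "an", "and", "is", "was", "but", "to", "of", "it"}


def term_counts(documents):
    """Lower-case, tokenize, drop stop words, and count the remaining terms."""
    counts = Counter()
    for doc in documents:
        tokens = re.findall(r"[a-z']+", doc.lower())
        counts.update(token for token in tokens if token not in STOP_WORDS)
    return counts


# Hypothetical customer reviews.
reviews = [
    "The packaging is great and the delivery was fast",
    "Great price, but the packaging was damaged",
    "Fast delivery. Great packaging!",
]
print(term_counts(reviews).most_common(3))
```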
Time series analysis
As its name suggests, time series analysis is used to analyze a set of data points collected over a specified period of time. Analysts use this method to monitor data points at regular intervals over a set period rather than just recording them intermittently, but time series analysis is not used only to collect data over time. It also allows researchers to understand whether variables changed during the study, how the different variables depend on one another, and how the end result was reached.
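Two basic time series operations are smoothing and measuring period-over-period change. The sketch below computes a trailing moving average (which evens out short-term noise so the trend is easier to see) and the change between consecutive observations, on hypothetical weekly website visits.

```python
def moving_average(values, window):
    """Trailing moving average; returns len(values) - window + 1 points."""
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def period_changes(values):
    """Difference between each observation and the previous one."""
    return [b - a for a, b in zip(values, values[1:])]


weekly_visits = [10, 12, 14, 13, 15, 17]  # hypothetical data
print(moving_average(weekly_visits, 3))   # → [12.0, 13.0, 14.0, 15.0]
print(period_changes(weekly_visits))      # → [2, 2, -1, 2, 2]
```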
Decision Trees
Decision tree analysis aims to act as a support tool for making smart, strategic decisions. By visually displaying potential outcomes, consequences, and costs in a tree-like model, researchers and business users can easily evaluate all the factors involved and choose the best course of action. Decision trees are helpful for analyzing quantitative data, and they improve the decision-making process by helping you spot improvement opportunities, reduce costs, and enhance operational efficiency and production.
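The tree described here (outcomes, consequences, and costs) is the decision-analysis kind, which is evaluated by "rolling back" expected values: chance nodes average their outcomes by probability and decision nodes pick the best option. The sketch below assumes a simple tuple-based node format and hypothetical payoffs; machine-learning decision-tree classifiers are a related but different technique.

```python
def expected_value(node):
    """Roll back a decision tree.

    A node is one of:
      a number                                 -> payoff at a leaf
      ("chance", [(probability, child), ...])  -> probability-weighted average
      ("decision", {option_name: child, ...})  -> best (maximum) option
    """
    if isinstance(node, (int, float)):
        return node
    kind, branches = node
    if kind == "chance":
        return sum(p * expected_value(child) for p, child in branches)
    if kind == "decision":
        return max(expected_value(child) for child in branches.values())
    raise ValueError(f"unknown node type: {kind!r}")


def best_option(decision_node):
    """Name of the option with the highest expected value."""
    _, options = decision_node
    return max(options, key=lambda name: expected_value(options[name]))


# Hypothetical payoffs in thousands, net of a 20k redesign cost.
packaging_decision = ("decision", {
    "new packaging": ("chance", [(0.6, 100 - 20), (0.4, 30 - 20)]),
    "keep current packaging": 50,
})
print(best_option(packaging_decision), expected_value(packaging_decision))
```

With these assumed numbers the new packaging has an expected value of about 52 versus 50 for keeping the current one, so it is the preferred branch.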
Conjoint analysis
Last but not least, we have conjoint analysis. This approach is usually used in surveys to understand how individuals value different attributes of a product or service, and it is one of the most effective methods to extract consumer preferences. When it comes to purchasing, some clients might be more price-focused, others more features-focused, and others might have a sustainability focus. Whatever your customers' preferences are, you can find them with conjoint analysis. This way, companies can define pricing strategies, packaging options, subscription packages, and more.
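A minimal sketch of the core conjoint calculation, assuming one hypothetical respondent who rated every combination of two attributes (a balanced full-factorial design). Each level's part-worth is the mean rating of profiles with that level minus the overall mean (for this kind of design it matches a main-effects linear model), and an attribute's relative importance is its share of the total part-worth range. Survey tools usually estimate this by regression across many respondents.

```python
from collections import defaultdict


def part_worths(attributes, ratings):
    """Part-worth utility of each (attribute, level) in a full-factorial design.

    attributes: attribute names, in the order used inside each profile tuple
    ratings:    {profile_tuple: preference score}
    """
    grand_mean = sum(ratings.values()) / len(ratings)
    totals = defaultdict(float)
    counts = defaultdict(int)
    for profile, score in ratings.items():
        for attribute, level in zip(attributes, profile):
            totals[(attribute, level)] += score
            counts[(attribute, level)] += 1
    return {key: totals[key] / counts[key] - grand_mean for key in totals}


def relative_importance(attributes, worths):
    """Share of the total part-worth range attributable to each attribute."""
    ranges = {}
    for attribute in attributes:
        levels = [w for (a, _), w in worths.items() if a == attribute]
        ranges[attribute] = max(levels) - min(levels)
    total = sum(ranges.values())
    if total == 0:
        return {attribute: 0.0 for attribute in attributes}
    return {attribute: r / total for attribute, r in ranges.items()}


# One hypothetical respondent rating four price/material profiles (0-10).
attributes = ("price", "material")
ratings = {
    ("low", "paper"): 9,
    ("low", "plastic"): 7,
    ("high", "paper"): 5,
    ("high", "plastic"): 3,
}
worths = part_worths(attributes, ratings)
print(worths)
print(relative_importance(attributes, worths))
```

For this respondent, price matters about twice as much as material, and paper is preferred over plastic.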
Business Intelligence
Statistical analysis
SQL Consoles
Data Visualization
Business Intelligence
BI tools allow you to process significant amounts of data from a number of sources in many formats. As such, you can not only analyze and monitor your data to extract relevant insights, but also create interactive reports and dashboards to visualize KPIs and use them to the benefit of your company. There are many BI tools; Power BI is one of the best known.
Statistical analysis
These tools are usually designed for scientists, statisticians, market researchers, and mathematicians, as they allow them to perform complex statistical analyses with methods like regression analysis, predictive analysis, and statistical modeling. A good tool to perform this type of analysis is RStudio, as it offers powerful data modeling and hypothesis testing features that can cover both academic and general data analysis. This tool is one of the favorites in the industry, due to its capabilities for data cleaning, data reduction, and performing advanced analysis with several statistical methods. Another relevant tool to mention is SPSS from IBM. The software offers advanced statistical analysis for users of all skill levels. Thanks to a vast library of machine learning algorithms, text analysis, and a hypothesis testing approach, it can help your company find relevant insights to drive better decisions. SPSS is also available as a cloud service, which lets you run it anywhere.
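To show what a hypothesis test computes under the hood, the sketch below calculates Welch's two-sample t statistic and its approximate (Welch-Satterthwaite) degrees of freedom, for example to compare hypothetical daily sign-ups from two campaign versions. Turning t into a p-value requires the t distribution, which the Python standard library does not provide; tools such as R (`t.test`) or SPSS report it directly.

```python
from math import sqrt
from statistics import mean, variance


def welch_t_test(sample_a, sample_b):
    """Welch's t statistic and approximate degrees of freedom.

    Does not assume the two groups have equal variances.
    """
    se_a = variance(sample_a) / len(sample_a)  # squared standard error of mean A
    se_b = variance(sample_b) / len(sample_b)  # squared standard error of mean B
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(sample_a) - 1) + se_b ** 2 / (len(sample_b) - 1)
    )
    return t, df


# Hypothetical daily sign-ups for two versions of a campaign.
version_a = [12, 15, 14, 16, 13]
version_b = [10, 11, 9, 12, 10]
t, df = welch_t_test(version_a, version_b)
print(f"t = {t:.2f}, df = {df:.1f}")
```

A large absolute t for the given degrees of freedom suggests the difference between the two versions is unlikely to be due to chance alone.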
Data Visualization
These tools are used to represent your data through charts, graphs, and maps that allow you to find patterns and trends in the data. BI platforms such as datapine's also offer a wealth of powerful online data visualization tools with several benefits. Some of them include: delivering compelling data-driven presentations to share with your entire company; the ability to see your data online on any device wherever you are; an interactive dashboard design feature that enables you to showcase your results in an interactive and understandable way; and online self-service reports that several people can use simultaneously to enhance team productivity.
Column/bar chart
Select this chart to compare data categorized into separate groups, whether it's your sales for each quarter or scores compared between teams.
Combination chart
This is a combination of bar and line charts. Choose this chart when you have a mix of data series types, or to represent an additional data series alongside the main data. For example, if the total sales across regions are represented by bars, the average sales and ROI can be represented as individual lines in the same graph.
Pie Chart
Use this chart to show the contribution of each part to the whole set. For example, which of this month's expenses take up the largest share of your income, or how business is distributed between regions.
Variation: 3D pie.
Ring Chart
The ring (donut) chart closely resembles the pie chart, but is drawn as a ring. Like the pie chart, it shows the contribution of parts to the whole set.
Web Charts
Select this chart for a comparative study of different data series. Web (radar) charts compare the values of several data series, represented by data markers, relative to a central point.
Variation: filled web (filled radar).
Funnel Chart
Select this chart to show the progressive flow or decline of a business metric across successive stages. For example, visualize the conversion of your leads into actual sales across stages like marketing-qualified leads, sales-qualified leads, and acquired customers.
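The numbers behind a funnel chart are simple ratios. The sketch below, using hypothetical stage names and lead counts, computes the conversion rate from each stage to the next and the overall conversion from the first stage to the last.

```python
def funnel_conversion(stages):
    """Step-by-step and overall conversion rates for an ordered funnel.

    stages: list of (stage_name, count), widest stage first.
    """
    steps = []
    for (name_a, count_a), (name_b, count_b) in zip(stages, stages[1:]):
        rate = count_b / count_a if count_a else 0.0
        steps.append((name_a, name_b, rate))
    first, last = stages[0][1], stages[-1][1]
    overall = last / first if first else 0.0
    return steps, overall


# Hypothetical lead counts per stage.
pipeline = [("all leads", 1000), ("qualified leads", 400), ("won deals", 100)]
steps, overall = funnel_conversion(pipeline)
for name_a, name_b, rate in steps:
    print(f"{name_a} -> {name_b}: {rate:.0%}")
print(f"overall: {overall:.0%}")
```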
Line Chart
Select this chart to visualize trends in your data series over any time period, such as the roller coaster ride your favorite stock has taken over the past quarter.
Variations: stepped lines, smooth lines.
Scatter Chart
Scatter plots are often used to plot discrete data points at unequal intervals. This chart compares values along two numeric axes, unlike a line chart, whose horizontal axis usually holds categories or evenly spaced time periods.
Area Chart
Area charts fill the area below the lines, making it easy to compare data levels. This chart is mainly useful for highlighting the change in metrics over time, for example, the change in business metrics over a specific period.
Variations: Area with point, region
with solid, region with point.
Bullet chart
The bullet chart highlights a key measure and compares it with a target value, relative to qualitative performance ranges such as poor, satisfactory, and good. Bullet charts are often used as widgets in dashboards because they pack a lot of information into a small space.
Dial Chart
Select this chart to show the current value within a range. It is similar to a bullet chart, but the values are displayed on a dial. The dial chart fits well in business and executive dashboards.
Bubble Chart
Select this chart if you want to add another dimension to your visualization: besides its position on the two axes, each bubble's size represents a third value. Bubble charts are extremely useful for highlighting the weight of a data metric.
To be continued