Big Data Analytics Introduction-lect 1

Big data analytics involves examining large datasets to uncover insights that aid in informed business decisions, such as identifying market trends and customer preferences. It is crucial for data-driven decision-making, leading to improved marketing, operational efficiency, and new revenue opportunities. The process includes steps like data collection, cleansing, analysis, and the application of various analytical techniques such as machine learning and predictive analytics.

Big data analytics

• The complex process of examining big data to uncover information, such as:
– hidden patterns and correlations
– market trends
– customer preferences
• This information can help organizations make informed business decisions.
• Data analytics technologies and techniques give organizations a way to analyze data sets and gather new information.
Why is big data analytics important?
• It lets organizations make data-driven decisions that can improve business-related outcomes.
• Benefits include more effective marketing, new revenue opportunities, customer personalization and improved operational efficiency.
How does big data analytics work?
• Collect the data.
• Process: organize, configure and partition the data properly for analytical queries.
• Clean: fix any errors or inconsistencies, such as duplications or formatting mistakes, and organize and declutter the data.
• Analyze the data.
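The process and clean steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the records, the field names (`customer`, `city`, `amount`) and the helper functions are all hypothetical, and partitioning is reduced to grouping records by one field.

```python
# Hypothetical raw records; the field names are invented for illustration.
raw_records = [
    {"customer": " Alice ", "city": "lahore", "amount": "120.50"},
    {"customer": "alice", "city": "Lahore", "amount": "120.5"},
    {"customer": "Bob", "city": "KARACHI", "amount": "80"},
    {"customer": "Carol", "city": "Karachi", "amount": "n/a"},
]


def normalize(record):
    """Fix formatting mistakes: trim whitespace, unify case, parse numbers."""
    try:
        amount = float(record["amount"])
    except ValueError:
        amount = None  # mark unparseable values instead of guessing
    return {
        "customer": record["customer"].strip().title(),
        "city": record["city"].strip().title(),
        "amount": amount,
    }


def clean(records):
    """Normalize every record, drop invalid ones, then remove duplicates."""
    seen = set()
    cleaned_records = []
    for record in map(normalize, records):
        if record["amount"] is None:
            continue
        key = (record["customer"], record["city"], record["amount"])
        if key not in seen:
            seen.add(key)
            cleaned_records.append(record)
    return cleaned_records


def partition_by(records, field):
    """Group records by a field so a later query can read one partition."""
    partitions = {}
    for record in records:
        partitions.setdefault(record[field], []).append(record)
    return partitions


cleaned = clean(raw_records)
by_city = partition_by(cleaned, "city")
print(by_city)
```

After normalization the two "Alice" rows become identical, so only one survives, and the row with an unparseable amount is dropped rather than repaired.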
Analyze
• data mining, which sifts through data sets in search of patterns and relationships
• predictive analytics, which builds models to forecast customer
behavior and other future actions, scenarios and trends
• machine learning, which taps various algorithms to analyze large
data sets
• deep learning, which is a more advanced offshoot of machine
learning
• text mining and statistical analysis software
• artificial intelligence (AI)
• mainstream business intelligence software
• data visualization tools
Steps of Data Analytics
• Step 1. Goal setting: vital, understandable, simple, short and measurable goals.
• Step 2. Setting priorities for measurements: decide what to measure and what methods to use to measure it.
• Step 3. Data gathering: available datasets; recording or generating data.
• Step 4. Data cleansing: outlier rejection, missing-value interpolation, data structuring.
• Step 5. Data analysis: data mining, business intelligence, data visualization, exploratory data analysis.
• Step 6. Interpretation of results: precise results; checking whether they are helpful in meeting the initial objectives, limiting, or inconclusive.
1. Goal Setting
• The business unit has to decide on objectives for the
data analytics.
• These objectives might be set out in question format
• For example, if a business is struggling to sell its products,
some relevant questions may be:
– Are we overpricing our goods?
– How is the competition’s product different to ours?
• To answer the question “Are we overpricing our goods?”, the company has to gather data on:
– Production costs
– Details about the price of similar goods on the market.
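As a small worked example, the overpricing question can be turned into two numbers: the profit margin on our own product and the price premium over similar goods. All figures below are hypothetical and exist only to show the arithmetic.

```python
from statistics import mean

# Hypothetical figures, invented for illustration only.
production_cost = 40.0
our_price = 75.0
competitor_prices = [58.0, 62.0, 60.0, 64.0]

# Average price of similar goods on the market.
market_average = mean(competitor_prices)

# Share of our selling price that is profit.
margin = (our_price - production_cost) / our_price

# How far our price sits above the market average.
premium = (our_price - market_average) / market_average

print(f"Market average price: {market_average:.2f}")
print(f"Our margin: {margin:.1%}")
print(f"Our premium over the market: {premium:.1%}")
```

A large premium combined with a comfortable margin would suggest there is room to lower the price; whether that is the right decision still depends on the other questions raised in the goal-setting step.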
2. Setting Priorities for Measurements
• Determine what type of data is needed to answer the questions regarding the objectives.
• Decide how much time to allow for the analysis.
• Decide which units of measurement to use.
3. Data Gathering
• Data can come from already available datasets.
• Data can be generated by:
– The direct or interview method
• A company might interview shoppers regarding their favorite brand of toothpaste.
– The indirect or questionnaire method
• Questionnaires are distributed to the respondents either by personal delivery or by mail/email.
– The registration method
• Registration records kept by government organizations, e.g., NADRA.
– The experimental method
• Experimentation, simulation.
4. Data Cleansing
• Data cleansing is the process of identifying parts of the data that are:
– Incomplete
– Incorrect
– Inaccurate
– Irrelevant
• The dirty or coarse data is then:
– Replaced
– Modified
– Or deleted.
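The three outcomes (replace, modify, delete) can be illustrated with a short sketch. The data and the rules are assumptions made for this example: the values are product weights in kilograms, a missing value is replaced by the median of the plausible ones, a value of 100 or more is assumed to have been entered in grams and is converted, and a negative value is treated as a test entry and deleted.

```python
from statistics import median

# Hypothetical product weights in kg; None marks an incomplete record.
raw_weights_kg = [1.2, None, 1.4, 1300.0, 1.3, -1.0]


def is_plausible(value):
    """A weight counts as plausible if it is present and between 0 and 100 kg."""
    return value is not None and 0 < value < 100


def cleanse(values):
    """Replace missing values, modify wrong units, delete invalid entries."""
    fill_value = median([v for v in values if is_plausible(v)])
    cleaned = []
    for value in values:
        if value is None:
            cleaned.append(fill_value)    # incomplete: replaced
        elif value < 0:
            continue                      # irrelevant test entry: deleted
        elif value >= 100:
            cleaned.append(value / 1000)  # assumed grams: modified to kg
        else:
            cleaned.append(value)         # already clean: kept as-is
    return cleaned


clean_weights_kg = cleanse(raw_weights_kg)
print(clean_weights_kg)
```

Which rule applies to which kind of dirty value is a business decision; the thresholds here are placeholders, not recommendations.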
Data Cleansing Cycle
5. Data Analysis
• Data analysis is the process of evaluating data using:
– Analytical reasoning
– Logical reasoning
• The aim is to examine each component of the data provided.
Steps of Data Analysis
Data Analytics Capabilities
Feature Engineering (FE)
• “Feature engineering is the process of transforming
raw data into features that better represent the
underlying problem to the predictive models,
resulting in improved accuracy on unseen data.”
Jason Brownlee, Machine Learning Mastery.
• As models get better and better, the focus shifts to what is put into them.
• Feature engineering transforms data to create the model’s inputs.
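A minimal sketch of feature engineering: raw order records (a timestamp string and an amount) are transformed into numeric inputs a model could use. The records, field names and chosen features are hypothetical; they only illustrate the idea of deriving inputs from raw data.

```python
import math
from datetime import datetime

# Hypothetical raw orders; field names are invented for illustration.
raw_orders = [
    {"timestamp": "2024-03-02 14:05", "amount": 250.0},
    {"timestamp": "2024-03-04 09:30", "amount": 40.0},
]


def engineer_features(order):
    """Turn one raw order into model-ready features."""
    when = datetime.strptime(order["timestamp"], "%Y-%m-%d %H:%M")
    return {
        "hour": when.hour,
        "day_of_week": when.weekday(),       # Monday is 0, Sunday is 6
        "is_weekend": when.weekday() >= 5,
        # log(1 + amount) compresses large amounts into a smaller range
        "log_amount": math.log1p(order["amount"]),
    }


features = [engineer_features(order) for order in raw_orders]
for row in features:
    print(row)
```

The raw timestamp string carries the weekend signal only implicitly; making it an explicit feature is the kind of transformation the quote describes.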
Feature Extraction
• Dimension reduction
– Principal component analysis (PCA)
– Non-negative matrix factorization (NMF)
– Kernel PCA
– Graph-based kernel PCA
– Generalized discriminant analysis (GDA)
• Data smoothing
– Wavelet transform
– Ramer–Douglas–Peucker algorithm
– Kernel smoother
– Laplacian smoothing
– Local regression, …
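To make one of the listed techniques concrete, here is a minimal pure-Python sketch of the Ramer–Douglas–Peucker algorithm. It reduces a curve to fewer points by keeping only those that lie farther than a tolerance `epsilon` from the straight line between the endpoints of each segment. The sample points and the tolerance are arbitrary illustration values.

```python
import math


def perpendicular_distance(point, start, end):
    """Distance from point to the line through start and end."""
    (x, y), (x1, y1), (x2, y2) = point, start, end
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    if length == 0:
        return math.hypot(x - x1, y - y1)
    return abs(dy * (x - x1) - dx * (y - y1)) / length


def rdp(points, epsilon):
    """Simplify a polyline, keeping points farther than epsilon from it."""
    if len(points) < 3:
        return list(points)
    start, end = points[0], points[-1]
    distances = [perpendicular_distance(p, start, end) for p in points[1:-1]]
    max_distance = max(distances)
    if max_distance <= epsilon:
        return [start, end]  # every interior point is close enough to drop
    # Split at the farthest point and simplify both halves recursively.
    index = distances.index(max_distance) + 1
    left = rdp(points[: index + 1], epsilon)
    right = rdp(points[index:], epsilon)
    return left[:-1] + right  # drop the duplicated split point


curve = [(0, 0), (1, 0.1), (2, 0), (3, 4), (4, 0), (5, 0.1), (6, 0)]
simplified = rdp(curve, epsilon=0.5)
print(simplified)
```

The small wiggles at height 0.1 are removed, while the peak at (3, 4) and the points that define its flanks are kept.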
Feature Selection
• Identifying features that are redundant or
irrelevant
• Improved model interpretability.
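One simple way to identify such features, shown here as a sketch rather than a recommended method, is to drop columns that never vary (irrelevant) and columns that are almost perfectly correlated with a column already kept (redundant). The feature names, values and the 0.95 threshold are assumptions made for the example.

```python
from statistics import mean, pstdev

# Hypothetical feature columns; names and values are invented.
feature_columns = {
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],  # same information as height_cm
    "store_id": [7, 7, 7, 7, 7],                  # constant, carries no signal
    "weight_kg": [52, 70, 61, 85, 74],
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = mean(xs), mean(ys)
    covariance = mean([(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)])
    return covariance / (pstdev(xs) * pstdev(ys))


def select_features(columns, threshold=0.95):
    """Keep features that vary and are not near-duplicates of kept ones."""
    kept = []
    for name, values in columns.items():
        if pstdev(values) == 0:
            continue  # irrelevant: the value never changes
        if any(abs(pearson(values, columns[other])) > threshold for other in kept):
            continue  # redundant: almost identical to a kept feature
        kept.append(name)
    return kept


selected = select_features(feature_columns)
print(selected)
```

With fewer, non-overlapping inputs it is easier to see which feature drives a prediction, which is the interpretability benefit noted above.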
Models for Analysis
• Approaches
– Classification
– Regression
• Techniques
– Data mining
– Machine learning
– Artificial Intelligence (AI)
Introduction to Computational Data Analytics
Computational data analytics
• Computational data analytics is an interdisciplinary field that provides depth and specialization in data science, ML, deep learning, natural language, AI, visualization, databases, high-performance computing, etc.
• Some examples of computational thinking
include developing a chess strategy, making and
reading maps, and organizing a long to-do list into
manageable daily tasks
Computational Data Analytics
• Steps of computational thinking:
– Abstraction: problem formulation
– Automation: solution expression
– Analysis: solution execution and evaluation
• Principles of computational thinking:
– This broad problem-solving technique includes four elements: decomposition, pattern recognition, abstraction and algorithms.
• There are a variety of ways that students can practice their computational thinking, well before they try computer programming.
Computational Data Analytics
• Computational skills are defined as the abilities to
calculate basic addition, subtraction, multiplication,
and division problems quickly and accurately using
mental methods, paper-and-pencil, and other tools,
such as a calculator.
• The biggest benefit of computational thinking is how
it enables real-world problem solving. For kids,
knowing how to take large problems and break them
into simpler steps can help with everything from
solving math problems to writing a book report.
Computational Data Analytics
• Types of computation: models of computation can be classified into three categories: sequential models, functional models, and concurrent models.
• Purpose of computational models: they intelligently gather, filter, analyze and present information.
• For example, they can present health information to guide doctors in treating disease based on the detailed characteristics of each patient.
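The three categories can be loosely illustrated at the programming level by computing the same result in three styles. This is an analogy only, not a formal definition of the models: the sequential version updates state step by step, the functional version composes functions without mutable state, and the concurrent version hands independent pieces of work to a thread pool. The readings are arbitrary sample values.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

readings = [3, 1, 4, 1, 5, 9, 2, 6]


def square(value):
    """The unit of work shared by all three styles."""
    return value * value


# Sequential style: one step after another, updating a running total.
sequential_total = 0
for reading in readings:
    sequential_total += square(reading)

# Functional style: map and reduce, with no variable being reassigned.
functional_total = reduce(lambda acc, v: acc + v, map(square, readings), 0)

# Concurrent style: the squares are computed by several worker threads.
with ThreadPoolExecutor(max_workers=4) as pool:
    concurrent_total = sum(pool.map(square, readings))

print(sequential_total, functional_total, concurrent_total)
```

All three produce the same sum of squares; what differs is how the computation is organized.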
Classification vs. Prediction

Classification
• Definition: a classification is a division or category in a system which divides things into groups or types.
• Model: predicts categorical class labels (discrete or nominal).
• Methods:
– Linear classifier (LDA)
– SVM
– Decision trees
– Bayesian classifier
– Artificial neural network
– Kernel estimation (k-nearest neighbor)
• Applications:
– Email spam filtering
– Cancer diagnosis
– Voice classification (for Siri-type applications)
– Video classification (for uploaded videos on YouTube, etc.)

Prediction
• Definition: prediction is a statement made about the future, forecasting unknown or future figures.
• Model: models continuous-valued functions, i.e., predicts unknown or missing values.
• Methods:
– Linear regression
– Nonlinear regression
– Poisson regression
– Generalized linear model
– Log-linear models
– Regression trees
• Applications:
– Credit approval
– Target marketing
– Fault avoidance
– Medical diagnosis
– Fraud detection
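The difference between the two columns can be seen in a small sketch that uses one method from each: a k-nearest neighbor classifier that returns a discrete label, and a simple linear regression that returns a continuous value. The training data, the two email features (number of links, number of exclamation marks) and the spend/sales figures are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical emails: (links, exclamation marks) with a class label.
labeled_emails = [
    ((0, 1), "ham"), ((1, 0), "ham"), ((1, 1), "ham"),
    ((8, 6), "spam"), ((7, 9), "spam"), ((9, 7), "spam"),
]


def knn_classify(train, query, k=3):
    """Classification: majority label among the k closest training points."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def fit_line(xs, ys):
    """Prediction: least-squares slope and intercept for y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    slope = numerator / denominator
    return slope, mean_y - slope * mean_x


# Discrete output: a class label.
label = knn_classify(labeled_emails, (6, 7))

# Continuous output: a forecast value for an unseen input.
ad_spend = [1, 2, 3, 4]
sales = [3, 5, 7, 9]
slope, intercept = fit_line(ad_spend, sales)
forecast = slope * 5 + intercept

print(label, forecast)
```

The classifier can only ever answer with one of the labels it was trained on, whereas the regression line can produce any number, including values for inputs outside the training data.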
