lecture1
Basic Concepts
Data Mining Methods
Prof. Dr. C. Andersson
Contents
Basic Concepts
1 Basic Concepts
What is data mining?
Basic Concepts
What is Data Mining?
Applications
The Data Mining Process
Data Mining Methods
Data mining is not ...
Definitions of data mining (1)
Definitions of data mining (2)
Definitions of data mining (3)
Concepts occurring in most definitions
Broad and narrow definitions
Broad definition:
Traditional statistical methods included
Narrow definition:
Focusing on automated/heuristic methods
Knowledge discovery in databases:
A step in the KDD process
What is typical for data mining?
Multidisciplinarity
Confusing terminology
Case = row = observation
Target = response variable = output = dependent variable
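The synonyms above can be made concrete with a minimal sketch. The table, its column names (age, monthly_bill, churned) and its values are hypothetical and only serve to show which part of a data set each term refers to.

```python
# Hypothetical customer table: each dict is one case (= row = observation).
cases = [
    {"age": 34, "monthly_bill": 42.0, "churned": False},
    {"age": 51, "monthly_bill": 18.5, "churned": True},
]

# Target = response variable = output = dependent variable.
TARGET = "churned"  # assumed column name, for illustration only

# Everything except the target column is treated as an input here.
inputs = [{k: v for k, v in case.items() if k != TARGET} for case in cases]
targets = [case[TARGET] for case in cases]

print(inputs)
print(targets)
```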
Some of many application branches
Some of many application areas
Example: Banking
Example: Insurance companies
Example: Telecommunication
Data:
Customer records (personal information, products bought, billings)
Transactional data (concerning each phone call)
Other data (network load, breakdowns, ...)
Questions:
Which customer group is more profitable than other groups?
Is the behavior of the customer changing over time?
To which customers should we present a special offer?
Fraud detection
Churn identification and prevention
...
Example: Retail companies
Knowledge discovery in databases
The data mining process
Examples of data mining methods
How to know when to use which method(s)?
Basic guidance:
Predictive modeling (supervised)
Classification
(Point) Estimation
Pattern discovery (unsupervised)
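The distinction in the guidance above can be sketched in a few lines. This is only an illustration under stated assumptions: the toy values and labels are invented, a 1-nearest-neighbour rule stands in for "classification", and a one-dimensional two-means split stands in for "pattern discovery". Neither is claimed to be the method used in the course. Point estimation would look like the classifier, but with a numeric target instead of a class label.

```python
# Toy data: (monthly_minutes, usage_label). Values and labels are made up.
labeled = [(120.0, "low"), (150.0, "low"), (480.0, "high"), (520.0, "high")]


def classify_1nn(x, training):
    """Supervised: predict the label of the closest labeled training case."""
    nearest = min(training, key=lambda case: abs(case[0] - x))
    return nearest[1]


def two_means(values, iterations=10):
    """Unsupervised: split unlabeled values into two groups (1-D k-means, k=2).

    Assumes the values contain at least two distinct numbers.
    """
    c1, c2 = min(values), max(values)  # start from the two extremes
    for _ in range(iterations):
        g1 = [v for v in values if abs(v - c1) <= abs(v - c2)]
        g2 = [v for v in values if abs(v - c1) > abs(v - c2)]
        c1, c2 = sum(g1) / len(g1), sum(g2) / len(g2)
    return sorted(g1), sorted(g2)


# Supervised: the target labels are used to make a prediction for a new case.
predicted = classify_1nn(200.0, labeled)

# Unsupervised: the labels are dropped; only structure in the inputs is used.
groups = two_means([x for x, _ in labeled])

print(predicted)
print(groups)
```

The classifier needs the target column to learn from, while the grouping step never sees it. That is the practical test for choosing between the two families.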
Supervised vs. unsupervised learning
Supervised learning: The data
Multiple linear regression
Logistic regression
Decision trees
Ensemble decision trees
Neural networks
Support vector machines
Discriminant analysis
Cluster analysis
Association analysis
Typical features of data in data mining
Data size
Data sources
Recall: Types and scales of data
Information scale:
Nominal
Ordinal
Interval
Ratio
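The four scales differ in which summaries are meaningful. The sketch below illustrates the usual textbook convention: mode for nominal data, additionally the median for ordinal, additionally the mean for interval, and additionally ratios for ratio-scaled data. The function name `summarize` and the sample values are assumptions made for this example.

```python
from statistics import mean, median_low, mode


def summarize(values, scale):
    """Return only the summaries that are meaningful for the given scale."""
    summary = {"mode": mode(values)}  # counting categories works on every scale
    if scale in ("ordinal", "interval", "ratio"):
        # median_low always returns an observed value, so no averaging of ranks
        summary["median"] = median_low(values)
    if scale in ("interval", "ratio"):
        summary["mean"] = mean(values)  # differences are meaningful
    if scale == "ratio":
        # a true zero point makes statements like "4 times as much" valid
        summary["max_to_min_ratio"] = max(values) / min(values)
    return summary


colors = summarize(["red", "blue", "red"], "nominal")
temperatures_celsius = summarize([10, 20, 20, 30], "interval")
incomes = summarize([1000, 2000, 2000, 4000], "ratio")

print(colors)
print(temperatures_celsius)
print(incomes)
```

Note that the Celsius example gets no ratio: 30 °C is not "three times as warm" as 10 °C, because the zero point of the scale is arbitrary.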
Recall: Quantitative variables
Ordering of data
Data organization
Meta-data
Data warehousing
Problems: Target variable
Oversampling required
No modeling of rejected customers (What should have happened?)
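One common reading of "oversampling required" is that the target event (for example fraud or default) is rare, so the rare class is over-represented in the modeling sample. The sketch below assumes that reading and shows simple random oversampling with replacement. The function name, the field name `fraud` and the data are hypothetical. In practice, model outputs usually need to be adjusted back to the true class proportions afterwards.

```python
import random


def oversample_minority(cases, target_key, seed=0):
    """Duplicate cases of the smaller classes at random until all classes
    are as large as the largest one."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    by_class = {}
    for case in cases:
        by_class.setdefault(case[target_key], []).append(case)
    largest = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # draw the missing number of cases with replacement (k may be 0)
        balanced.extend(rng.choices(group, k=largest - len(group)))
    return balanced


# Hypothetical sample: 9 regular cases and 1 rare target event.
sample = [{"id": i, "fraud": False} for i in range(9)] + [{"id": 9, "fraud": True}]
balanced = oversample_minority(sample, "fraud")

fraud_count = sum(1 for case in balanced if case["fraud"])
print(len(balanced), fraud_count)
```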
Problems: Dirty data
Inconsistent data
Coding, impossible values, out-of-range values, ...
Noisy data
Data with errors, outliers, random fluctuations
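Checks for inconsistent coding, impossible values and out-of-range values can be sketched as below. The field names, the accepted gender codes and the age limit of 120 are assumptions chosen for this example, not rules taken from the lecture. Real projects would take such rules from the meta-data.

```python
# Assumed coding scheme: several spellings map to one canonical code.
GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}
MAX_PLAUSIBLE_AGE = 120  # assumed upper limit for this illustration


def clean_record(record):
    """Return (cleaned_record, problems) for one customer record."""
    problems = []
    cleaned = dict(record)

    # Inconsistent coding: normalize known spellings, flag unknown ones.
    code = str(record.get("gender", "")).strip().lower()
    if code in GENDER_CODES:
        cleaned["gender"] = GENDER_CODES[code]
    else:
        cleaned["gender"] = None
        problems.append("unknown gender code")

    # Impossible and out-of-range values: set to missing and report.
    age = record.get("age")
    if not isinstance(age, (int, float)) or age < 0:
        cleaned["age"] = None
        problems.append("impossible age")
    elif age > MAX_PLAUSIBLE_AGE:
        cleaned["age"] = None
        problems.append("age out of range")

    return cleaned, problems


ok = clean_record({"gender": "Male", "age": 34})
bad = clean_record({"gender": "x", "age": -5})
old = clean_record({"gender": "F", "age": 130})

print(ok)
print(bad)
print(old)
```

Flagged values are set to missing rather than silently corrected, so a later step can decide whether to impute, drop or investigate them.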
The curse of dimensionality
Input variable reduction