06_Banerjee and Banerjee_Business Analytics_Ch06
Non-parametric Data
Contents
Opening example:
• Every year, about 2 lakh (200,000) students in India take the CAT to seek admission
to the postgraduate programmes of reputed business schools in India.
• This is a computer-based examination that tests the students’ verbal ability (VA),
quantitative ability (QA), data interpretation (DI), logical reasoning (LR) and reading
comprehension (RC).
• Students receive their score as a percentile which ranks them vis-a-vis the
performance of all those who appeared for the examination.
• But does a high score in CAT guarantee a seat in big business schools?
• Several variables and metrics are considered together by the business schools’
admission team for finalizing the list of admitted students.
Significance of Sampling in Business Research
• There are primarily two types of sampling techniques: probability and non-
probability sampling.
• In probability sampling, every item in the population has a known, non-zero chance
of being selected; in simple random sampling, that chance is equal for all items.
• In non-probability sampling, selection depends on factors such as likelihood of
reach, human judgement and grouping methods.
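The contrast between the two families can be sketched in a few lines of Python. This is only an illustration: the customer IDs, the function names and the choice of "first n units" as a stand-in for a convenience sample are all assumptions, not part of the chapter.

```python
import random


def simple_random_sample(population, n, seed=0):
    """Probability sampling: each unit has the same chance (n / N) of selection."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return rng.sample(population, n)  # draws n distinct units without replacement


def convenience_sample(population, n):
    """Non-probability sampling: take whichever n units are easiest to reach
    (assumed here to be the first n in the list)."""
    return population[:n]


# Hypothetical sampling frame of 100 customer IDs.
customers = list(range(1, 101))

srs = simple_random_sample(customers, 10)
conv = convenience_sample(customers, 10)

print("Simple random sample:", sorted(srs))
print("Convenience sample:  ", conv)
```

The convenience sample always returns the same easily reached units, which is why results from it cannot be generalized to the population with a known margin of error.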
Types of Sampling
Confidence Interval and Hypothesis Testing
• When the variables are numerical, the strength of the relationship between them
is expressed by correlation.
• The variables need to be numerical and continuous in nature, for example, age,
height, weight, sales volume and number of units sold per day.
• Correlation is expressed by r which has a value from +1 to –1. Positive values
indicate positive correlation (as one variable increases so does the other and vice
versa).
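As a rough illustration of how r is obtained, the sketch below computes the Pearson correlation coefficient from its definition (covariance divided by the product of the standard deviations). The advertising and sales figures are made-up numbers used only to exercise the function.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: advertising spend and units sold per day.
ad_spend = [10, 20, 30, 40, 50]
units_sold = [12, 25, 31, 48, 55]

r_pos = pearson_r(ad_spend, units_sold)                 # close to +1
r_neg = pearson_r(ad_spend, [-u for u in units_sold])   # same strength, opposite sign

print(f"r (positive relationship): {r_pos:.3f}")
print(f"r (negative relationship): {r_neg:.3f}")
```

Flipping the sign of one variable leaves the strength of the relationship unchanged and only reverses its direction.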
Negative Correlation
[Figure: scatter plot of January temperature (vertical axis, 0 to 80) against latitude (horizontal axis, 15 to 65), with fitted line f(x) = −1.758x + 104.98 and R² = 0.723.]
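The fitted line reported in the January temperature chart, f(x) = −1.758x + 104.98 with R² = 0.723, can be read in two ways, sketched below. For a straight-line fit with a single predictor, r is the square root of R² carrying the sign of the slope. The function name and the latitude of 40 are illustrative assumptions, and the temperature unit is not stated in the chart.

```python
import math

# Values taken from the chart's trendline label.
SLOPE = -1.75828304061843
INTERCEPT = 104.98203702448
R_SQUARED = 0.722976628031957


def predicted_temperature(latitude):
    """January temperature implied by the fitted line at a given latitude."""
    return SLOPE * latitude + INTERCEPT


# With one predictor, r = sign(slope) * sqrt(R^2).
r = math.copysign(math.sqrt(R_SQUARED), SLOPE)

print(f"Correlation r: {r:.3f}")
print(f"Predicted temperature at latitude 40: {predicted_temperature(40):.2f}")
```

The negative r confirms what the downward slope shows: higher latitudes go with lower January temperatures.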
Factor Analysis
• OLS regression models, another type of multivariate analysis, are widely used as
prediction models.
• The basic requirement for developing these models is an outcome variable that is
measured on a continuous scale (interval or ratio), together with some relevant
predictor (explanatory) variables that are also usually measured on a continuous
scale.
• This modelling technique is essentially another form of correlation analysis across
multiple variables.
• The theme of these models is the association of one target variable (outcome) with
other explanatory (predictor) variables, which may also be termed ‘antecedent’
variables. Antecedence is established only by knowledge of the domain in which the
model is being built, not by the correlation among the variables.
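A minimal sketch of the simplest case, one outcome and one predictor, shows what "ordinary least squares" computes: the slope and intercept that minimize the sum of squared differences between actual and fitted outcomes. The data values and function names are hypothetical.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x_new):
    """Fitted outcome for a new value of the predictor."""
    return intercept + slope * x_new


# Hypothetical data: predictor x (e.g. sales calls) and outcome y (e.g. orders).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_simple_ols(x, y)
print(f"y = {intercept:.2f} + {slope:.2f} * x")
print(f"Predicted y at x = 6: {predict(intercept, slope, 6):.2f}")
```

The fitted slope only describes association in this sample; whether x genuinely precedes or drives y has to come from domain knowledge.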
Regression (OLS) Models
• The closer the fit is to the actual data (higher R-square), the better the chance
that the equation will be able to predict outcomes based on values of the input
(explanatory) variables.
• However, there is no guarantee that the models will continue to predict well across
other data samples, unless the nature of the data remains largely the same.
• The magnitude of the standardized coefficient signifies the importance of the
variable in determining the value of the outcome. The sign (+ve or –ve) indicates
the direction of the relationship between the outcome and the variable.
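The sketch below, on made-up data with two predictors measured on very different scales, illustrates both ideas: R-square as the share of variation in the outcome explained by the fit, and standardized coefficients (raw coefficient multiplied by the predictor's standard deviation and divided by the outcome's standard deviation) as a scale-free measure of importance. All names and numbers are assumptions for illustration.

```python
import math


def mean(values):
    return sum(values) / len(values)


def cross(u, v):
    """Sum of cross-products of deviations from the means."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def fit_ols_two_predictors(x1, x2, y):
    """Solve the normal equations for y = b0 + b1*x1 + b2*x2."""
    s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
    s1y, s2y = cross(x1, y), cross(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    b0 = mean(y) - b1 * mean(x1) - b2 * mean(x2)
    return b0, b1, b2


# Hypothetical data: x1 on a small scale, x2 on a large scale.
x1 = [1, 2, 3, 4, 5]
x2 = [200, 100, 400, 300, 600]
y = [13.4, 11.7, 22.8, 22.3, 32.8]

b0, b1, b2 = fit_ols_two_predictors(x1, x2, y)
fitted = [b0 + b1 * a + b2 * b for a, b in zip(x1, x2)]

# R-square = 1 - (residual sum of squares / total sum of squares).
ss_res = sum((obs - fit) ** 2 for obs, fit in zip(y, fitted))
r_squared = 1 - ss_res / cross(y, y)

# Standardized coefficient = raw coefficient * sd(x) / sd(y).
# The (n - 1) divisors cancel, so sums of squares can be used directly.
beta1 = b1 * math.sqrt(cross(x1, x1) / cross(y, y))
beta2 = b2 * math.sqrt(cross(x2, x2) / cross(y, y))

print(f"Raw coefficients:          b1 = {b1:.3f}, b2 = {b2:.3f}")
print(f"Standardized coefficients: beta1 = {beta1:.3f}, beta2 = {beta2:.3f}")
print(f"R-square: {r_squared:.4f}")
```

In this toy data the raw coefficient of x1 is far larger than that of x2, yet the standardized coefficient of x2 is the larger one, which is why importance is judged on the standardized scale.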
Regression (OLS) Models
• There are many applications where the primary role of the model is to find a
relationship between the explanatory variables and the outcome, in order to
predict/forecast outcomes in the future. Such models are termed forecasting
models.
• The narrower the band of uncertainty, the higher the confidence in the estimated
outcome (or forecast).
• The R-square of these models is usually expected to be very high (although this is
just one of the many necessary conditions).
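One way to picture the band of uncertainty is a linear trend forecast with an approximate prediction interval, sketched below. Two points are assumptions of this sketch rather than the chapter's method: the trend is a straight line over time, and a normal critical value is used in place of the t critical value, which makes the band slightly too narrow for short series. The sales figures are invented.

```python
import math
from statistics import NormalDist


def forecast_with_band(t, y, t_new, level=0.95):
    """Linear-trend forecast at t_new with an approximate prediction interval."""
    n = len(t)
    mean_t = sum(t) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_t) ** 2 for a in t)
    slope = sum((a - mean_t) * (b - mean_y) for a, b in zip(t, y)) / sxx
    intercept = mean_y - slope * mean_t

    # Residual standard error with n - 2 degrees of freedom.
    residuals = [b - (intercept + slope * a) for a, b in zip(t, y)]
    s = math.sqrt(sum(e * e for e in residuals) / (n - 2))

    # Normal approximation to the critical value (assumption of this sketch).
    z = NormalDist().inv_cdf(0.5 + level / 2)

    point = intercept + slope * t_new
    half_width = z * s * math.sqrt(1 + 1 / n + (t_new - mean_t) ** 2 / sxx)
    return point, point - half_width, point + half_width


# Hypothetical monthly sales: one steady series and one noisy series.
months = [1, 2, 3, 4, 5, 6, 7, 8]
steady_sales = [102, 104, 107, 109, 112, 114, 117, 119]
noisy_sales = [95, 110, 100, 118, 105, 122, 110, 125]

steady = forecast_with_band(months, steady_sales, 9)
noisy = forecast_with_band(months, noisy_sales, 9)

for label, (point, low, high) in (("steady", steady), ("noisy", noisy)):
    print(f"{label:>6}: forecast {point:.1f}, band [{low:.1f}, {high:.1f}]")
```

The steady series fits the trend closely and yields a narrow band, whereas the noisy series gives a much wider band around its month-9 forecast and hence less confidence in it.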
Forecasting and Time Series Analysis