
06_Banerjee and Banerjee_Business Analytics_Ch06

Chapter 6 discusses various analytical methods for both parametric and non-parametric data in business research, emphasizing the significance of sampling, confidence intervals, hypothesis testing, and correlation. It covers techniques such as cross-tabulation, factor analysis, regression models, and forecasting, while addressing issues like multicollinearity and heteroscedasticity in time series analysis. The chapter aims to equip researchers with the necessary tools to analyze data effectively and make informed decisions based on statistical findings.

Chapter 6: Analytical Methods for Parametric and Non-parametric Data
Contents

1. Significance of sampling in business research.
2. Confidence interval and hypothesis testing.
3. Cross-tabulation.
4. Correlation.
5. Factor analysis.
6. Regression (OLS) models.
7. Multicollinearity.
8. Forecasting and time series analysis.
9. Heteroscedasticity in time series models.
Significance of Sampling in Business Research

Opening example:
• Every year, about 2 lakh (200,000) students in India appear for the CAT to secure
admission to postgraduate programmes at reputed business schools.
• This is a computer-based examination that tests the students’ verbal ability (VA),
quantitative ability (QA), data interpretation (DI), logical reasoning (LR) and reading
comprehension (RC).
• Students receive their score as a percentile, which ranks them vis-à-vis the
performance of all those who appeared for the examination.
• But does a high score in the CAT guarantee a seat in a big business school?
• The business schools’ admission teams consider several variables and metrics
together when finalizing the list of admitted students.
Significance of Sampling in Business Research

• Population is the entire collection of items under study.
• The characteristics of the individuals (variables/attributes) who form the
population help define the target group.
• A small group that represents the population under study and has all the
characteristics of the population is called a sample.
Types of Sampling

• There are primarily two types of sampling techniques: probability and non-
probability sampling.
• In probability sampling, every item in the population has a known, non-zero
chance of being selected (in simple random sampling, an equal chance).
• In non-probability sampling, selection relies on interventions such as likelihood
of reach, human judgement and grouping methods.
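A minimal sketch of the contrast, using only Python's standard library; the population of customer IDs and the sample size are hypothetical:

```python
import random

population = list(range(1, 1001))   # hypothetical population: 1,000 customer IDs

# Probability sampling: simple random sampling, every ID has an equal chance
random.seed(42)                     # fixed seed so the draw is reproducible
probability_sample = random.sample(population, k=50)

# Non-probability sampling: convenience sampling, whoever is easiest to reach
convenience_sample = population[:50]   # just the first 50 IDs, not representative
```

The convenience sample systematically excludes most of the population, which is exactly the representativeness risk that probability sampling avoids.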
Types of Sampling
Confidence Interval and Hypothesis Testing

• Analysts test several hypotheses based on the objectives of a research problem or
opportunity. In most cases, the investment of time, money and resources is made
because the analyst intends to test the possibility of accepting an alternate
hypothesis.
• As you have studied in your statistics and business research methods (BRM)
courses, the confidence interval is used to reject (or fail to reject) the null
hypothesis based on statistical tests conducted on the sample data.
• For a 95 per cent confidence level, the null hypothesis gets rejected if the p-value
in the outcome is less than 0.05, indicating that the value lies in the 5 per cent
rejection region.
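As an illustration, a one-sample t-test is one common such statistical test; a sketch with scipy, where the sample values and the hypothesized mean of 100 are made up for the example:

```python
from scipy import stats

sample = [104, 98, 101, 107, 96, 103, 99, 110, 95, 102]   # hypothetical data
t_stat, p_value = stats.ttest_1samp(sample, popmean=100)  # H0: population mean = 100

if p_value < 0.05:   # 95 per cent confidence level
    print(f"p = {p_value:.3f}: reject the null hypothesis")
else:
    print(f"p = {p_value:.3f}: fail to reject the null hypothesis")
```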
Bell-Shaped Curve
Cross-Tabulation

• Cross-tabulations, or contingency tables, are ways of grouping variables to
analyse the relationships between them.
• They are used for categorical data, that is, data grouped into mutually exclusive
categories.
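A sketch of a contingency table in pandas, followed by a chi-square test of independence; the gender/product-category data are hypothetical:

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical survey data: gender versus preferred product category
df = pd.DataFrame({
    "gender":   ["M", "F", "F", "M", "F", "M", "F", "M"],
    "category": ["A", "B", "B", "A", "A", "B", "B", "A"],
})

table = pd.crosstab(df["gender"], df["category"])   # the cross-tabulation
chi2, p, dof, expected = chi2_contingency(table)    # test for an association
print(table)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```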
Correlation

• When the variables are numerical, the degree of strength of the relationship
between them is expressed by correlation.
• The variables need to be numerical and continuous in nature, for example, age,
height, weight, sales volume and number of units sold per day.
• Correlation is expressed by r, which ranges from −1 to +1. Positive values indicate
positive correlation (as one variable increases, so does the other); negative values
indicate the opposite.
Negative Correlation

[Scatter plot: January temperature versus latitude, with fitted line
f(x) = −1.758x + 104.982 and R² = 0.723. Temperature falls as latitude rises,
illustrating a negative correlation.]
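A sketch of computing Pearson's r with scipy; the latitude/temperature pairs below are illustrative, not the chart's underlying data:

```python
from scipy.stats import pearsonr

latitude    = [20, 25, 30, 35, 40, 45, 50, 55]   # hypothetical latitudes
temperature = [70, 62, 55, 44, 35, 28, 18, 10]   # hypothetical January temperatures

r, p_value = pearsonr(latitude, temperature)
print(f"r = {r:.3f}")   # close to -1: a strong negative correlation
```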
Factor Analysis

• A correlation matrix is a useful input into conducting various types of factor
analysis.
• A commonly used variant of factor analysis is PCA, also referred to as exploratory
factor analysis.
• In PCA, all associations among the variables of interest are identified (in numeric
terms, through a correlation analysis).
• Factor analysis (PCA) provides a platform to reduce data without a commensurate
reduction in the information content of the data. Using correlation as the basis of
commonality across variables, it groups variables with similar ‘information’ and
bunches them together (so that one variable can represent the others in its group).
• In applications, this is a useful way to summarize the information into tighter
dimensions, helpful for understanding and interpreting the information.
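A minimal PCA sketch with scikit-learn; the 100-by-6 data matrix is randomly generated for illustration, and standardizing the variables first is a common (not mandatory) choice:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))              # hypothetical: 100 respondents, 6 variables

X_std = StandardScaler().fit_transform(X)  # put all variables on a common scale
pca = PCA(n_components=2)                  # reduce 6 variables to 2 components
scores = pca.fit_transform(X_std)
print(pca.explained_variance_ratio_)       # share of information each component keeps
```

The explained-variance ratios show how much of the original information survives the reduction, which is the trade-off described above.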
Regression (OLS) Models

• OLS regression models, another type of multivariate analysis, are widely used as
prediction models.
• The basic requirement for developing these models is an outcome variable
measured on a continuous scale (interval or ratio), along with some relevant
predictor (explanatory) variables, usually also measured on a continuous scale.
• This modelling technique is, in effect, another form of correlation analysis across
multiple variables.
• The theme of these models is the association of one target (outcome) variable with
other explanatory (predictor) variables, which may also be termed ‘antecedent’
variables. Antecedence is established only by the domain in which the model is
being built, not by the correlation among the variables.
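A minimal OLS sketch with statsmodels; the variable names (ad_spend, sales) and the data are hypothetical:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
ad_spend = rng.uniform(10, 100, size=50)                 # hypothetical predictor
sales = 5 + 0.8 * ad_spend + rng.normal(0, 5, size=50)   # hypothetical outcome

X = sm.add_constant(ad_spend)    # add the intercept term to the design matrix
model = sm.OLS(sales, X).fit()
print(model.summary())           # coefficients, std. errors, t-values, R-square
```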
Regression (OLS) Models

• The closer the fit is to the actual data (the higher the R-square), the better the
chance that the equation will be able to predict outcomes based on values of the
input (explanatory) variables.
• However, there is no guarantee that the models will continue to predict well across
other data samples, unless the nature of the data remains largely the same.
• The standardized coefficient (the magnitude) signifies the importance of the
variable in determining the value of the outcome. The sign (+ve or −ve) determines
the nature of the relationship between the outcome and the variable.
Regression (OLS) Models

• The (un)standardized coefficients indicate the relationship between each of the
explanatory variables and the outcome variable (referred to as the dependent
variable).
• The ‘std. error’ is a measure of the ‘fuzziness’ of the relationship (the higher the
number, the larger the fuzziness about the existence of the relationship).
• The unstandardized coefficient and the ‘std. error’ together determine the strength
of the relationship, which is indicated by the ‘t’ value.
• The higher the absolute ‘t’ value, the stronger the relationship between the specific
predictor variable and the outcome variable.
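To make the relationship concrete, the t-value reported for each coefficient is simply the unstandardized coefficient divided by its standard error; the numbers below are hypothetical:

```python
coef, std_err = 0.8, 0.05   # hypothetical coefficient and its std. error
t_value = coef / std_err    # t = 16.0: a strong, precisely estimated relationship
print(t_value)
```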
Multicollinearity

• Multicollinearity is a common problem in a diagnostic regression model meant to
identify the strength of the relationships of predictor variables with the outcome
variable.
• If the true strength of the relationship is ‘hidden’ due to the correlation of an
explanatory variable with another, the objective of identifying relationships among
variables (predictor and outcome variables) becomes difficult.
• Usually, when two explanatory variables are correlated, the regression model is
unable to separate out the independent associations of each of the explanatory
variables with the outcome variable, since the former have an association among
themselves as well.
• In reality, even when correlations are moderate, the effect of multicollinearity is
noticed as higher ‘fuzziness’ (larger standard errors) in the coefficients, weakening
the apparent association with the outcome variable.
• Manoeuvring around a multicollinearity problem that complicates diagnosis
requires some creativity and prior experience of handling such problems; there is
usually no standard solution.
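One common diagnostic (standard practice, though not named above) is the variance inflation factor; a sketch with statsmodels on hypothetical data where x2 is nearly a copy of x1:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)   # nearly collinear with x1
x3 = rng.normal(size=200)                   # an independent predictor
X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2, "x3": x3}))

for i, col in enumerate(X.columns):
    if col != "const":
        print(col, variance_inflation_factor(X.values, i))  # VIF >> 10 flags trouble
```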
Forecasting and Time Series Analysis

• There are many applications where the primary role of the model is to find a
relationship between the explanatory variables and the outcome, in order to
predict/forecast outcomes in the future. Such models are termed forecasting
models.
• The smaller the band of uncertainty, the higher the confidence in the estimated
outcome (or forecast).
• The R-square of these models is usually expected to be very high (although a high
R-square is just one of many necessary conditions).
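A sketch of such an uncertainty band around a forecast, using an OLS model in statsmodels; the history and the two forecast points are hypothetical:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = rng.uniform(0, 10, size=60)
y = 2 + 1.5 * x + rng.normal(size=60)        # hypothetical historical data

model = sm.OLS(y, sm.add_constant(x)).fit()
new_x = sm.add_constant(np.array([4.0, 8.0]), has_constant="add")
pred = model.get_prediction(new_x)
print(pred.summary_frame(alpha=0.05))        # point forecasts with 95% bands
```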
Forecasting and Time Series Analysis

• Time series analysis is a form of forecasting model estimated from historical
(time-indexed) data on outcomes, which is used to project the expected outcome
for the future.
• The time series model represents a situation where the forecasted value of the
future is assumed to be significantly driven by past outcomes (although there may
be a part that is determined by explanatory variables).
• In forecasting models using time series data, the objective is not so much to
explain the reasons for a certain expected outcome as to predict the future
accurately.
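A sketch of a simple autoregressive forecast with statsmodels; the series is simulated, and the AR(1) structure is chosen only for illustration:

```python
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

rng = np.random.default_rng(3)
y = [50.0]
for _ in range(99):                   # simulate a hypothetical AR(1) series:
    y.append(15 + 0.7 * y[-1] + rng.normal(scale=2))   # today depends on yesterday

model = AutoReg(np.array(y), lags=1).fit()
forecast = model.predict(start=len(y), end=len(y) + 4)  # project the next 5 periods
print(forecast)
```

The fitted lag coefficient captures exactly the idea above: the forecast is driven mainly by past outcomes.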
Heteroscedasticity in Time Series Models

• In time series analysis, it is worthwhile to discuss the properties of the
unexplained part (εt+1) of the model.
• This is normally assumed to be random (normally distributed) and uncorrelated
across successive observations, although in most practitioner settings, analysts
rarely check the actual distribution of the errors to validate the model (not that we
would like to ratify such practice).
• When a positive error is likely to be followed by another positive error, and a
negative error by another negative one, the errors are serially correlated (termed
heteroscedastic here), which is a violation of the assumptions of the regression
model.
• Technically, this distortion needs to be addressed to build a technically sound
model. The statistical properties of the coefficient estimates in an OLS model are
invalid when the errors are correlated.
• A Durbin–Watson test is employed to detect serial correlation in the errors and to
ascertain whether it has been resolved after correction.
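A sketch of the test on the residuals of a fitted OLS model; the data are simulated with independent errors, so the statistic should come out near 2 (values near 0 or 4 would indicate positive or negative serial correlation):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(4)
x = np.arange(100, dtype=float)
y = 3 + 0.5 * x + rng.normal(size=100)   # hypothetical series, independent errors

model = sm.OLS(y, sm.add_constant(x)).fit()
print(durbin_watson(model.resid))        # near 2.0 here: little serial correlation
```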
Attempt the review questions and case studies at
the end of the chapter
****************
