
ANOVA (One Way Vs. Two Way)

6 min read · Sep 3, 2018

What | Why | Types | Algorithm | Pros & Cons | Application

Introduction

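Analysis of variance, commonly abbreviated ANOVA, is an extremely important tool for data analysis, and both one-way and two-way ANOVA are widely used. It is a statistical method for comparing the population means of two or more groups by analyzing variance: it compares the variation between the group means with the variation within the groups. If the between-group variation is large relative to the within-group variation, the means are judged to be significantly different.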

Table of Contents

  • Introduction
  • Why ANOVA instead of multiple t-tests?
  • Types of ANOVA
  • Assumptions
  • Algorithm
  • One-way ANOVA: the procedural overview
  • What is the use of ANOVA table?
  • ANOVA test using Excel
  • Comparison between one-way and two-way ANOVA

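ANOVA generalizes the t-test to more than two groups. Because all groups are compared in a single F-test, it is more conservative than running many separate t-tests (it keeps the overall Type I error rate at the chosen significance level), and hence it is suited to a wide range of practical applications.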

“Classical” ANOVA for balanced data does three things at once:

  1. As exploratory data analysis, an ANOVA employs an additive data decomposition, and its sums of squares indicate the variance of each component of the decomposition (or, equivalently, each set of terms of a linear model).
  2. Comparisons of mean squares, along with an F-test… allow testing of a nested sequence of models.
  3. Closely related to the ANOVA is a linear model fit with coefficient estimates and standard errors.

In short, ANOVA is a statistical tool used in several ways to develop and confirm an explanation for the observed data.

Additionally:

  1. It is computationally elegant and relatively robust against violations of its assumptions.
  2. ANOVA provides strong (multiple sample comparison) statistical analysis.
  3. It has been adapted to the analysis of a variety of experimental designs.

Why ANOVA instead of multiple t-tests?

  • Before ANOVA, multiple t-tests were the only option available for comparing the population means of more than two groups.
  • As the number of groups increases, the number of two-sample t-tests grows quickly: k groups require k(k - 1)/2 pairwise tests.
  • As the number of t-tests increases, so does the probability of making at least one Type I error (the familywise error rate).
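The growth in tests and in the familywise error rate can be made concrete with a short calculation. This is a minimal sketch that assumes the pairwise tests are independent (in practice they share data, so the figure is an approximation); the helper names are hypothetical.

```python
from math import comb


def num_pairwise_tests(k: int) -> int:
    """Number of two-sample t-tests needed to compare every pair of k groups."""
    return comb(k, 2)  # k(k - 1) / 2


def familywise_error(m: int, alpha: float = 0.05) -> float:
    """Probability of at least one Type I error in m tests, assuming independent tests."""
    return 1 - (1 - alpha) ** m


for k in (3, 4, 5, 6):
    m = num_pairwise_tests(k)
    print(f"{k} groups: {m} t-tests, familywise error ~ {familywise_error(m):.3f}")
```

With five groups, ten tests at α = 0.05 give a familywise error of about 0.40, whereas a single ANOVA F-test keeps it at 0.05.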

Types of ANOVA

  1. One-way ANOVA
  2. Two-way ANOVA

One-way ANOVA is a hypothesis test in which only one categorical independent variable (a single factor) is taken into consideration. Using the F-distribution, it compares the means of three or more samples. The null hypothesis (H0) is that all population means are equal; the alternative hypothesis (H1) is that at least one mean differs.

Two-way ANOVA examines the effect of two independent factors on a dependent variable. It also tests for an interaction between the two factors, i.e. whether the effect of one factor on the dependent variable depends on the level of the other.

For example, consider analyzing the test scores of a class by gender and age group. Here the test score is the dependent variable, and gender and age group are the independent variables (factors). Two-way ANOVA tests whether each factor, and their interaction, is associated with differences in mean test score.

Assumptions

One-way ANOVA

  • Normal distribution of the population from which the samples are drawn.
  • Measurement of the dependent variable at the interval or ratio level.
  • An independent variable consisting of two or more categorical, independent groups.
  • Independence of samples.
  • Homogeneity of variance (equal variances across the populations).
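The homogeneity-of-variance assumption is often checked with Levene's test, which is not covered in this article. It is simply a one-way ANOVA applied to the absolute deviations of each observation from its group mean. Below is a minimal pure-Python sketch with hypothetical function names and made-up data; the resulting W is compared against the F distribution with r - 1 and n - r degrees of freedom, and a large W suggests unequal variances.

```python
from statistics import mean


def one_way_f(groups):
    """F = MSC / MSE for one-way ANOVA; groups is a list of samples."""
    r = len(groups)
    n = sum(len(g) for g in groups)
    means = [mean(g) for g in groups]
    grand = mean(x for g in groups for x in g)
    ssc = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ssc / (r - 1)) / (sse / (n - r))


def levene_w(groups):
    """Levene's statistic: one-way ANOVA F on absolute deviations from each group mean."""
    deviations = [[abs(x - mean(g)) for x in g] for g in groups]
    return one_way_f(deviations)


same_spread = [[1, 2, 3, 4, 5], [2, 3, 4, 5, 6], [3, 4, 5, 6, 7]]
different_spread = [[4, 5, 6, 5, 5], [0, 10, 2, 8, 5], [5, 5, 5, 5, 5]]
print(levene_w(same_spread), levene_w(different_spread))
```

The first set of groups has identical spread, so W is 0; the second mixes a tight group, a widely spread group and a constant group, so W is large.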

Two-way ANOVA

  • Normal distribution of the population from which the samples are drawn.
  • Measurement of the dependent variable at the continuous (interval or ratio) level.
  • Two factors, each with two or more categorical, independent groups.
  • Categorical independent groups should have the same size (a balanced design).
  • Independence of observations.
  • Homogeneity of variance across the populations.

Algorithm

  1. State the null and alternative hypotheses.
  2. State the significance level (alpha).
  3. Calculate the degrees of freedom.
  4. State the decision rule.
  5. Calculate the test statistic:
     • Calculate the variance between samples.
     • Calculate the variance within samples.
     • Calculate the F-statistic: if the calculated F value is greater than the F table (critical) value, reject H0.
     • If F is significant, perform a post hoc test.
  6. The test statistic of analysis of variance is:

$$F = \frac{MSC}{MSE} = \frac{SSC/(r-1)}{SSE/(n-r)}$$

where r - 1 and n - r are the degrees of freedom of the numerator and denominator, respectively (r is the number of groups and n is the total number of observations).
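The steps above can be sketched in plain Python. This is a minimal illustration, not the article's own implementation: the data are hypothetical, and the critical value 3.89 is the F table value for alpha = 0.05 with (2, 12) degrees of freedom, so it only applies to this example and should be looked up again for other degrees of freedom.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for one-way ANOVA on a list of samples."""
    r = len(groups)                    # number of groups (treatments)
    n = sum(len(g) for g in groups)    # total number of observations
    means = [mean(g) for g in groups]
    grand_mean = mean(x for g in groups for x in g)
    # Step 5: variance between samples, MSC = SSC / (r - 1)
    ssc = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    msc = ssc / (r - 1)
    # Step 5: variance within samples, MSE = SSE / (n - r)
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    mse = sse / (n - r)
    # Step 5: F-statistic
    return msc / mse, r - 1, n - r


# Hypothetical data: three groups of five observations each
groups = [[4, 5, 6, 5, 5], [6, 7, 8, 7, 7], [8, 9, 10, 9, 9]]
f_stat, df1, df2 = one_way_anova(groups)

# Step 4 decision rule. 3.89 is the F table value for alpha = 0.05 with
# df = (2, 12); it only applies to this example's degrees of freedom.
F_CRIT = 3.89
decision = "reject H0" if f_stat > F_CRIT else "fail to reject H0"
print(f"F({df1}, {df2}) = {f_stat:.2f}: {decision}")
```

Here F(2, 12) = 40.00, far above the critical value, so H0 is rejected and a post hoc test would follow.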

One-way ANOVA: the procedural overview

Partitioning the total sum of squares

The total sum of squares (SST) is the sum of the treatment sum of squares (SSC) and the error sum of squares (SSE): SST = SSC + SSE.

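Sum of squares definitions

With r groups, $x_{ij}$ the i-th observation in group j, $n_j$ the size of group j, $\bar{x}_j$ the mean of group j, $\bar{x}$ the grand mean and $n = n_1 + \dots + n_r$ the total number of observations:

$$SST = \sum_{j=1}^{r}\sum_{i=1}^{n_j}(x_{ij}-\bar{x})^2, \qquad SSC = \sum_{j=1}^{r} n_j(\bar{x}_j-\bar{x})^2, \qquad SSE = \sum_{j=1}^{r}\sum_{i=1}^{n_j}(x_{ij}-\bar{x}_j)^2$$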

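Computational formulae: One-way ANOVA

With $x_{ij}$ the i-th observation in group j (of size $n_j$, for j = 1, ..., r), $T_j$ the total of group j and $T$ the grand total of all n observations:

$$SST = \sum_{j=1}^{r}\sum_{i=1}^{n_j} x_{ij}^2 - \frac{T^2}{n}, \qquad SSC = \sum_{j=1}^{r}\frac{T_j^2}{n_j} - \frac{T^2}{n}, \qquad SSE = SST - SSC$$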

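Analysis of variance (ANOVA) table, for r groups and n observations in total:

Source of variation | Degrees of freedom | Sum of squares | Mean square | F
Between groups (treatments) | r - 1 | SSC | MSC = SSC/(r - 1) | F = MSC/MSE
Within groups (error) | n - r | SSE | MSE = SSE/(n - r) |
Total | n - 1 | SST | |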
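A minimal pure-Python sketch that computes the one-way sums of squares both from their definitions (squared deviations from the group and grand means) and from the computational formulae (group totals and the correction factor T²/n), checks that SST = SSC + SSE, and prints the resulting ANOVA table. The data and function names are hypothetical.

```python
from statistics import mean


def sums_of_squares(groups):
    """Definitional formulae: returns (SST, SSC, SSE)."""
    means = [mean(g) for g in groups]
    grand = mean(x for g in groups for x in g)
    sst = sum((x - grand) ** 2 for g in groups for x in g)
    ssc = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return sst, ssc, sse


def sums_of_squares_computational(groups):
    """Computational formulae using group totals T_j and the correction factor T^2 / n."""
    n = sum(len(g) for g in groups)
    correction = sum(sum(g) for g in groups) ** 2 / n
    sst = sum(x * x for g in groups for x in g) - correction
    ssc = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    return sst, ssc, sst - ssc


def print_anova_table(groups):
    """Print a one-way ANOVA table and return F = MSC / MSE."""
    r = len(groups)
    n = sum(len(g) for g in groups)
    sst, ssc, sse = sums_of_squares(groups)
    msc, mse = ssc / (r - 1), sse / (n - r)
    print(f"{'Source':<16}{'df':>4}{'SS':>10}{'MS':>10}{'F':>8}")
    print(f"{'Between groups':<16}{r - 1:>4}{ssc:>10.3f}{msc:>10.3f}{msc / mse:>8.3f}")
    print(f"{'Within groups':<16}{n - r:>4}{sse:>10.3f}{mse:>10.3f}")
    print(f"{'Total':<16}{n - 1:>4}{sst:>10.3f}")
    return msc / mse


groups = [[12, 15, 11], [18, 20, 17, 19], [14, 13, 16]]  # hypothetical data
sst, ssc, sse = sums_of_squares(groups)
assert abs(sst - (ssc + sse)) < 1e-9  # SST = SSC + SSE
f_stat = print_anova_table(groups)
```

Both routes give the same three sums of squares; the computational form is the one usually used by hand because it avoids subtracting means from every observation.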

Two-way ANOVA

Where SSR/SST: sum of squares of treatments in rows

SSC: sum of squares between columns

MSV: mean sum of squares (the variance estimate) for each source of variation
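A minimal sketch of a balanced two-way ANOVA, assuming several replicate observations per cell so that the row-by-column interaction can be separated from the error term (with only one observation per cell, the interaction cannot be estimated separately). The scores, the row/column labels and the function name are hypothetical. Each F is compared with the F table value for (df of that source, df of error).

```python
from statistics import mean


def two_way_anova(data):
    """Balanced two-way ANOVA with replication.

    data[i][j] is the list of replicate observations for row level i and
    column level j; every cell must hold the same number (>= 2) of observations.
    Returns a dict mapping each source to (SS, df, MS, F); F is None for error.
    """
    a, b, m = len(data), len(data[0]), len(data[0][0])
    grand = mean(x for row in data for cell in row for x in cell)
    row_means = [mean(x for cell in row for x in cell) for row in data]
    col_means = [mean(x for row in data for x in row[j]) for j in range(b)]
    cell_means = [[mean(cell) for cell in row] for row in data]

    ss = {
        "rows": b * m * sum((rm - grand) ** 2 for rm in row_means),
        "columns": a * m * sum((cm - grand) ** 2 for cm in col_means),
        "interaction": m * sum(
            (cell_means[i][j] - row_means[i] - col_means[j] + grand) ** 2
            for i in range(a) for j in range(b)
        ),
        "error": sum((x - cell_means[i][j]) ** 2
                     for i in range(a) for j in range(b) for x in data[i][j]),
    }
    df = {"rows": a - 1, "columns": b - 1,
          "interaction": (a - 1) * (b - 1), "error": a * b * (m - 1)}
    mse = ss["error"] / df["error"]
    table = {}
    for source in ("rows", "columns", "interaction"):
        ms = ss[source] / df[source]
        table[source] = (ss[source], df[source], ms, ms / mse)
    table["error"] = (ss["error"], df["error"], mse, None)
    return table


# Hypothetical test scores: rows = gender, columns = age group, 3 students per cell
scores = [
    [[70, 72, 74], [78, 80, 82]],
    [[74, 76, 78], [80, 82, 84]],
]
t = two_way_anova(scores)
for source, (ss_value, dof, ms, f) in t.items():
    f_text = f"{f:.2f}" if f is not None else ""
    print(f"{source:<12} SS={ss_value:7.2f} df={dof} MS={ms:6.2f} F={f_text}")
```

In this made-up example the column factor has the largest F, while the interaction F is below 1, i.e. no sign that the effect of one factor depends on the other.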

Use of ANOVA table

The ANOVA table shows the statistics used to test hypotheses about the population means. When the null hypothesis of equal means is true, the two mean squares (MSC and MSE) estimate the same quantity, the error variance, and should be of roughly equal magnitude; in other words, their ratio F = MSC/MSE should be close to 1. If the null hypothesis is false, MSC tends to be larger than MSE.
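The "ratio close to 1" behaviour can be checked by simulation. This sketch assumes normally distributed data with equal variances and uses hypothetical function names; for three groups of ten, F has (2, 27) degrees of freedom and its expected value under H0 is 27/25 = 1.08, slightly above 1, while shifting one population mean pushes the average F well above 1.

```python
import random


def f_ratio(groups):
    """F = MSC / MSE for one-way ANOVA."""
    r = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand = sum(sum(g) for g in groups) / n
    msc = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means)) / (r - 1)
    mse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (n - r)
    return msc / mse


def average_f(group_means, n_per_group=10, sigma=1.0, trials=2000, seed=1):
    """Average F over simulated normal samples with the given population means."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        groups = [[rng.gauss(mu, sigma) for _ in range(n_per_group)]
                  for mu in group_means]
        total += f_ratio(groups)
    return total / trials


print("H0 true: ", average_f([0, 0, 0]))
print("H0 false:", average_f([0, 0, 1]))
```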

The objective of the F-test is to determine whether two independent estimates of the population variance differ significantly, or whether the two samples may be regarded as drawn from normal populations with the same variance. That is why it is also known as the variance ratio test. The technique is widely used in research because drawing conclusions often requires interpreting more than two variables at once.
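A minimal sketch of the two-sample variance ratio statistic, using the common convention of placing the larger sample variance in the numerator; the function name and samples are hypothetical. Under that convention, a two-sided test at level alpha compares F with the upper alpha/2 critical value from the F table.

```python
from statistics import variance


def variance_ratio(sample1, sample2):
    """Variance ratio (F) statistic with the larger sample variance in the numerator.

    Returns (F, df_numerator, df_denominator), where each df is the sample size minus 1.
    """
    v1, v2 = variance(sample1), variance(sample2)  # unbiased sample variances (n - 1 divisor)
    if v1 >= v2:
        return v1 / v2, len(sample1) - 1, len(sample2) - 1
    return v2 / v1, len(sample2) - 1, len(sample1) - 1


a = [10, 12, 14, 16, 18]  # hypothetical samples
b = [13, 14, 15, 14, 14]
f_stat, df1, df2 = variance_ratio(a, b)
print(f"F({df1}, {df2}) = {f_stat:.2f}")
```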

Uses of ANOVA

  • To test the significance of the difference between the variances of two samples.
  • To test the significance of correlation and regression.
  • To study homogeneity in the case of two-way classification.
  • To test the significance of the multiple correlation coefficient.
  • To test the linearity of regression.
  • To interpret the significance of differences among means and of their interactions.
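The correlation and regression uses rely on the same partition of variation. For simple linear regression, the total sum of squares of y splits into a part explained by the fitted line and a residual part, and F = MS_regression / MS_residual with (1, n - 2) degrees of freedom; this equals r²(n - 2)/(1 - r²), so it also tests the correlation coefficient. A minimal sketch with hypothetical data and function name:

```python
from statistics import mean


def regression_anova(x, y):
    """ANOVA for simple linear regression of y on x.

    Returns (F, r_squared), where F = MS_regression / MS_residual with df (1, n - 2).
    """
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)    # total sum of squares of y
    ss_regression = sxy ** 2 / sxx           # variation explained by the fitted line
    ss_residual = syy - ss_regression        # unexplained (error) variation
    f_stat = (ss_regression / 1) / (ss_residual / (n - 2))
    return f_stat, ss_regression / syy


x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]
f_stat, r_squared = regression_anova(x, y)
n = len(x)
# The same F follows from the correlation coefficient: F = r^2 (n - 2) / (1 - r^2)
print(f_stat, r_squared * (n - 2) / (1 - r_squared))
```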

Advantages and Disadvantages

Advantages:

  • It is an improved technique over the t-test and z-test when more than two groups are compared.
  • Suitable for multidimensional variables.
  • Several factors can be analyzed at the same time.
  • An economical method of parametric testing.
  • Can be used with three or more groups.

Disadvantages:

  • It relies on strict assumptions about the nature of the data (normality, independence, equal variances), which can be difficult to satisfy.
  • Unlike a t-test, a significant F does not tell you which pair of means differs.
  • A post-ANOVA test (for example, pairwise t-tests) is required for further testing.
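One simple form of post-ANOVA t-test compares every pair of groups using the pooled error variance MSE from the ANOVA and a Bonferroni-adjusted significance level (alpha divided by the number of pairs). This is only one of several post hoc procedures (Tukey's HSD is another common choice), not necessarily the one the article has in mind; the function name and data are hypothetical, and the critical t value still has to be taken from a t table.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pairwise_t_bonferroni(groups, alpha=0.05):
    """Pairwise t statistics using the pooled ANOVA error variance (MSE).

    Returns (results, adjusted_alpha, df_error); each |t| should be compared with
    the two-sided t table value at adjusted_alpha and df_error degrees of freedom.
    """
    r = len(groups)
    n = sum(len(g) for g in groups)
    means = [mean(g) for g in groups]
    mse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (n - r)
    results = {}
    for i, j in combinations(range(r), 2):
        se = sqrt(mse * (1 / len(groups[i]) + 1 / len(groups[j])))
        results[(i, j)] = (means[i] - means[j]) / se
    pairs = r * (r - 1) // 2
    return results, alpha / pairs, n - r


groups = [[4, 5, 6, 5, 5], [6, 7, 8, 7, 7], [8, 9, 10, 9, 9]]  # hypothetical data
t_values, adj_alpha, df_error = pairwise_t_bonferroni(groups)
for (i, j), t in t_values.items():
    print(f"group {i} vs group {j}: t = {t:.3f}")
print(f"compare |t| with the t table value at alpha = {adj_alpha:.4f}, df = {df_error}")
```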

Applications

  • Recommending one fertilizer over others for improving crop yield.
  • ANOVA has immensely useful practical applications in business, particularly in Lean Six Sigma and operational efficiency.
  • Comparing the gas mileage of different vehicles, or of the same vehicle under different fuel types or road types.
  • Understanding the impact of temperature, pressure or chemical concentration on a chemical reaction (power reactors, chemical plants, etc.).
  • Understanding the impact of different catalysts on chemical reaction rates.
  • Studying whether advertisements of different kinds solicit different numbers of customer responses.
  • Understanding the performance, quality or speed of manufacturing processes based on the number of cells or steps they’re divided into.



Written by StepUp Analytics

StepUp Analytics is a community of creative, highly energetic data science and analytics professionals and data enthusiasts.
