Analysis of variance (ANOVA) is a statistical method that investigates data by comparing the means of subsets of the data. One-way ANOVA is the most basic type, and it is the one I explain in this post.

A few statistical terms are required for understanding ANOVA:

- Variance: a numerical value indicating how widely individuals in a group vary. If individual observations **vary greatly** from the group mean, the **variance is large**, and vice versa.
- Null hypothesis: a hypothesis stating that there is no **statistically** significant relationship between two observed cases **or** between two sets of populations.
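
To make the idea of variance concrete, here is a minimal R sketch. The two vectors `tight` and `wide` are made-up numbers for illustration only; both have the same mean, but one is spread much more widely around it.

```r
# Two hypothetical groups with the same mean (5) but different spread
tight <- c(4, 5, 5, 5, 6)
wide  <- c(1, 3, 5, 7, 9)

mean(tight)                # 5
mean(wide)                 # 5

# var() computes the sample variance (sum of squared deviations / (n - 1))
var_tight <- var(tight)    # 0.5
var_wide  <- var(wide)     # 10

var_tight < var_wide       # TRUE: the widely spread group has the larger variance
```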

Now, when and why should you use ANOVA? It compares the means of the groups you are interested in and determines whether any of those means differ significantly from each other. Specifically, it tests the null hypothesis

*H₀: µ₁ = µ₂ = … = µₖ*

where *µ* is a group mean and *k* is the number of groups.

For example, a researcher wishes to know whether different pacing strategies affect the **time to complete** a marathon. Volunteers are randomly assigned to one of three groups, in which the runner:

- starts slow and then increases speed,
- starts fast and then slows down, or
- runs at a steady pace throughout.

The time to complete the marathon is the outcome (**dependent**) variable, and ANOVA tests whether its mean differs among the three pacing groups.

ANOVA can also be viewed as a prediction model, which raises the question of how **ANOVA differs from regression analysis**.

The difference is that regression is the statistical model used to predict a continuous outcome on the basis of one or more **continuous** predictor variables, whereas ANOVA is the statistical model used to predict a continuous outcome on the basis of one or more **categorical** predictor variables.
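
The distinction can be sketched in R. The data and the names `outcome`, `group` and `dose` below are made up for illustration and are not part of the example used later in this post. The sketch also shows that a one-way ANOVA can be fitted as a linear model with a categorical predictor, in which case both routes give the same F statistic.

```r
# Hypothetical data: one continuous outcome and two kinds of predictor
outcome <- c(2, 3, 7, 2, 6, 10, 8, 7, 5, 10)
group   <- factor(rep(c("A", "B"), each = 5))   # categorical predictor
dose    <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)     # continuous predictor

anova_fit <- aov(outcome ~ group)   # ANOVA: categorical predictor
reg_fit   <- lm(outcome ~ dose)     # regression: continuous predictor

# The same ANOVA expressed as a linear model with a factor predictor
f_aov <- summary(anova_fit)[[1]][["F value"]][1]
f_lm  <- anova(lm(outcome ~ group))[["F value"]][1]

all.equal(f_aov, f_lm)              # the two F statistics agree
```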

Implementing ANOVA using R.

**STEP1:** Create the data. I made up some numbers depicting stress levels among employees around the announcement of layoffs.

Group1 <- c(2,3,7,2,6) # stress level during regular time

Group2 <- c(10,8,7,5,10) # stress level when layoffs are announced

Group3 <- c(10,13,14,13,15) # stress level after the announcement of layoffs

I am going to use ANOVA to check whether the mean stress level differs significantly among these three groups.

Next, create a data frame and put the three groups into it:

Combined_Groups <- data.frame(cbind(Group1, Group2, Group3))

**STEP2**: Stack the data and then run the test

Stacked_Groups <- stack(Combined_Groups)

Anova_Results <- aov(values ~ ind, data = Stacked_Groups)

The **aov** function is used to conduct the ANOVA test in R.
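
It may help to look at what stacking produces, since the formula `values ~ ind` depends on it. The sketch below rebuilds the data from STEP1 and inspects the stacked data frame; the column names `values` and `ind` are the ones `stack()` creates by default.

```r
Group1 <- c(2, 3, 7, 2, 6)
Group2 <- c(10, 8, 7, 5, 10)
Group3 <- c(10, 13, 14, 13, 15)

Combined_Groups <- data.frame(cbind(Group1, Group2, Group3))
Stacked_Groups  <- stack(Combined_Groups)

str(Stacked_Groups)          # 15 rows: numeric 'values' and factor 'ind'
levels(Stacked_Groups$ind)   # "Group1" "Group2" "Group3"
table(Stacked_Groups$ind)    # 5 observations in each group
```

The wide data frame has one column per group; after stacking, every observation is one row with its measurement in `values` and its group label in `ind`, which is the layout the model formula expects.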

The summary of the test is obtained with:

summary(Anova_Results)
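
If you want the numbers rather than the printed table, they can be pulled out of the summary object. This is a small sketch rather than a required step; the variable names `anova_table`, `f_value` and `p_value` are my own.

```r
Group1 <- c(2, 3, 7, 2, 6)
Group2 <- c(10, 8, 7, 5, 10)
Group3 <- c(10, 13, 14, 13, 15)

Stacked_Groups <- stack(data.frame(Group1, Group2, Group3))
Anova_Results  <- aov(values ~ ind, data = Stacked_Groups)

# summary() on an aov fit returns a list; its first element is the ANOVA table
anova_table <- summary(Anova_Results)[[1]]

f_value <- anova_table[["F value"]][1]   # F statistic for the 'ind' factor
p_value <- anova_table[["Pr(>F)"]][1]    # corresponding p-value

f_value          # about 22.59
p_value < 0.05   # TRUE: significant at the 5% level
```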

Interpreting the output of aov:

- P-value: the probability of seeing an F statistic at least as large as the observed one if the null hypothesis were true. If it falls below the chosen significance level (commonly 0.05), the null hypothesis is rejected in favor of the alternative. In this case we reject the null hypothesis, since the p-value is very small.
- The F statistic is the ratio of two different measures of variance for the data. If the null hypothesis is true, both are estimates of the same quantity and the ratio will be around 1.
- The numerator is computed from the variance of the group means. If the true means of the groups are identical, this is a function of the overall variance of the data; if the null hypothesis is false and the means are not all equal, this measure of variance will be larger.
- The denominator is an average of the sample variances of the groups, which is an estimate of the overall population variance (assuming all groups have equal variances).
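
The numerator and denominator described above can be computed by hand for the stress data. This is a rough sketch for checking the logic, not a replacement for `aov`; the names `groups`, `ms_between`, `ms_within` and `f_stat` are my own.

```r
groups <- list(
  Group1 = c(2, 3, 7, 2, 6),
  Group2 = c(10, 8, 7, 5, 10),
  Group3 = c(10, 13, 14, 13, 15)
)

k           <- length(groups)          # number of groups (3)
n_total     <- sum(lengths(groups))    # total number of observations (15)
group_sizes <- lengths(groups)
group_means <- sapply(groups, mean)    # 4, 8, 13
grand_mean  <- mean(unlist(groups))

# Numerator: variability of the group means around the grand mean
ms_between <- sum(group_sizes * (group_means - grand_mean)^2) / (k - 1)

# Denominator: pooled variability of observations around their own group mean
ss_within <- sum(sapply(groups, function(g) sum((g - mean(g))^2)))
ms_within <- ss_within / (n_total - k)   # 4.5

# With equal group sizes this equals the plain average of the group variances
mean(sapply(groups, var))                # 4.5

f_stat  <- ms_between / ms_within        # about 22.59
p_value <- pf(f_stat, df1 = k - 1, df2 = n_total - k, lower.tail = FALSE)
```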

**STEP3**: Analyzing the output. The means of each group can be found using

model.tables(Anova_Results, "means")

ANOVA determines whether there are significant differences between groups, whereas a follow-up test named Tukey's HSD determines **WHERE** those significant differences lie, letting you pinpoint exactly which groups differ significantly from each other.

As noted above, the purpose of this test is to determine where the significant differences lie.
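
In R, one way to run this test is the `TukeyHSD` function from the built-in stats package, applied to the fitted `aov` model. The sketch below rebuilds the model so that it runs on its own; the names `Tukey_Results` and `pairwise` are my own.

```r
Group1 <- c(2, 3, 7, 2, 6)
Group2 <- c(10, 8, 7, 5, 10)
Group3 <- c(10, 13, 14, 13, 15)

Stacked_Groups <- stack(data.frame(Group1, Group2, Group3))
Anova_Results  <- aov(values ~ ind, data = Stacked_Groups)

# Pairwise comparisons with the default 95% family-wise confidence level
Tukey_Results <- TukeyHSD(Anova_Results)
Tukey_Results

# One matrix per factor; columns are diff, lwr, upr and p adj
pairwise <- Tukey_Results$ind
pairwise[, "diff"]           # differences in group means for each pair
pairwise[, "p adj"] < 0.05   # which pairs differ at the 5% level
```

Each row is one pair of groups, named like `Group2-Group1`, with the difference in means, its confidence interval and the adjusted p-value.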

- Group2 and Group1 have a low p-value, from which it can be inferred that the difference between them is statistically significant.
- In general, **the lower the p-value, the stronger the evidence** of a difference between the groups.

**Conclusion:** The test suggests that there is quite a relationship between layoffs and the stress levels of employees.

References: https://www.youtube.com/watch?v=6-4mWkOgDtg