In the parametric scenario, we’re dealing with multiple independent samples or groups of data and aiming to detect significant differences among them. For instance, let’s consider comparing four different types of maize varieties, assuming their yields follow a normal distribution. However, it’s worth noting that maize yields often have a skewed distribution, typical of many biological phenomena, which is why we opted for a non-parametric test in our previous small dataset analysis.

Initially, one might consider conducting t-tests for every possible pair of comparisons. Yet this approach is risky because it involves multiple testing, essentially reusing the same data. If, hypothetically, we performed 100 tests and the null hypothesis (no difference) held true each time, we would still expect about 5 significant results by sheer chance, given a significance level of 5%. Loosely speaking, this means that some of the results in the literature reported as significant at the 5% level are false positives rather than genuine differences.
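
The inflation of false positives under repeated testing is easy to check by simulation. The sketch below (a minimal illustration, not part of the original example) draws many pairs of samples from the same normal population, so the null hypothesis is true for every test, and counts how often a two-sample t-test still reports p < 0.05:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_tests = 10_000

# Both samples come from the SAME population, so every "significant"
# result is a false positive.
a = rng.normal(0.0, 1.0, size=(n_tests, 20))
b = rng.normal(0.0, 1.0, size=(n_tests, 20))

p_values = stats.ttest_ind(a, b, axis=1).pvalue
false_positive_rate = (p_values < 0.05).mean()
print(false_positive_rate)  # close to 0.05, i.e. about 5% by chance alone
```

The observed rate hovers around the nominal 5%, which is exactly the problem: with many tests, "significant" findings appear even when nothing is going on.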

To mitigate this issue, similar to our previous approach, we adopt a strategy of conducting an overall test at a predetermined significance level, such as 5%. This overall test assesses whether all the samples originate from the same population or if two or more varieties differ. It primarily focuses on differences in means, assuming comparable variances among populations. If the overall test indicates significance, suggesting that at least two groups differ, we then proceed with follow-up tests (multiple comparison tests) at a stricter significance level. These follow-up tests help pinpoint where the differences among groups lie.

Comparison of several groups on one factor

In terms of terminology, when we’re analyzing data involving multiple groups, we typically aim to compare responses across different levels of a factor, treating each group as receiving a distinct treatment.

Here are some examples:

  • Evaluating respondents’ ratings for five distinct packagings of a product (e.g., varying sizes or colors of boxes for detergent, different shapes or hues of spray deodorant cans, various options on financial products like bank accounts or medical plans). Here, the response is the rating, the factor is packaging, and the levels are the five packaging types.

  • Comparing the return on investments for four types of shares (e.g., mining, industrial) at a specific point in time. Each data point consists of the return on investment and the type of share.

  • Investigating the effect of three strengths of hormones plus a control mixture on salt excretion from ducks’ eye ducts. The factor is hormone, with four levels (three strengths plus control), and the response is the amount of salt excreted.

Further examples include:

  • Analyzing percentage changes in share prices based on market conditions.
  • Comparing ownership of durable goods (like refrigerators or cars) among different age groups or life stages.
  • Assessing attitudes toward brand equity among different managerial roles.
  • Examining digestive efficiencies of various foods in a species.
  • Testing DDT resistance in mosquito colonies.
  • Investigating oxygen consumption in bats at different temperatures.
  • Comparing hospitalization duration for children with pneumonia based on HIV status.
  • Analyzing hemoglobin levels in pregnant women under different anaemia treatments.

In parametric analysis, the overall test for these scenarios is known as Analysis of Variance (ANOVA). ANOVA assesses variability between groups (or levels of the factor) compared to variability within groups (or samples) to determine if there’s a significant difference. Essentially, it checks if the differences between treatment group means exceed the overall random variability within those groups, using a pooled estimate of variance derived from individual group variability estimates.

 

To conduct the test, we first need to establish the mean sum of squares for the factor (denoted as MSA) and for the error (MSE). Let's designate the factor as A, considering that we might later deal with multiple factors. The formula for MSA is:

MSA = \frac{\sum_{i=1}^{k} n_i (\bar{y}_i - \bar{y})^2}{k - 1}

Here, k represents the number of groups, n_i signifies the number of cases in the i-th group, \bar{y}_i denotes the mean of the i-th group, and \bar{y} is the overall mean. Essentially, MSA reflects the variance of the differences between each factor level's mean and the overall mean. The extra weighting by n_i accounts for potential differences in sample sizes among groups. Next, the mean sum of squares for the error, MSE, is defined as:

MSE = \frac{\sum_{i=1}^{k} (n_i - 1) s_i^2}{\sum_{i=1}^{k} (n_i - 1)}

Here, s_i^2 represents the sample variance for the i-th group. Similar to MSA, MSE is the pooled variance, accounting for varying sample sizes. If you ever compute this manually, note that the denominator simplifies to N - k, since the (n_i - 1) terms sum to the total sample size N minus the number of groups k.
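
As a sketch of these two formulas, the snippet below computes MSA and MSE for three hypothetical groups of unequal size (the data are invented purely for illustration):

```python
import numpy as np

# Hypothetical measurements for k = 3 factor levels, unequal sample sizes.
groups = [np.array([5.1, 4.8, 5.6, 5.0]),
          np.array([6.2, 5.9, 6.4]),
          np.array([4.5, 4.9, 4.2, 4.7, 4.6])]

k = len(groups)
N = sum(len(g) for g in groups)
grand_mean = np.concatenate(groups).mean()

# MSA: between-group mean square, each group weighted by its size n_i.
msa = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups) / (k - 1)

# MSE: pooled within-group variance; the denominator reduces to N - k.
mse = sum((len(g) - 1) * g.var(ddof=1) for g in groups) / (N - k)

print(msa, mse)
```

The ratio MSA/MSE is the F statistic used in the overall test described next.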

The null hypothesis asserts no difference between the groups, while the alternative hypothesis suggests otherwise. In the context of comparing multiple factor levels, the alternative hypothesis is always two-sided, as it’s impractical to predict where differences might exist beforehand.

It's established that the ratio F = MSA/MSE follows an F distribution under the null hypothesis. This distribution is more intricate than those encountered previously, with two sets of degrees of freedom (one for MSA and one for MSE), equal to k - 1 and N - k, respectively. The null hypothesis is rejected if the p-value falls below the specified significance level (equivalently, if the test statistic F exceeds the critical value). In practice we usually rely on the p-value alone to decide whether to reject the null hypothesis, but knowing the degrees of freedom helps when interpreting research literature, where F values and their associated degrees of freedom are routinely reported.
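
In practice the whole overall test is a single call to scipy.stats.f_oneway; the sketch below (using invented data) also recomputes the p-value directly from the F distribution with k - 1 and N - k degrees of freedom to show where it comes from:

```python
import numpy as np
from scipy import stats

# Invented measurements for k = 3 groups.
groups = [np.array([5.1, 4.8, 5.6, 5.0]),
          np.array([6.2, 5.9, 6.4]),
          np.array([4.5, 4.9, 4.2, 4.7, 4.6])]

f_stat, p_value = stats.f_oneway(*groups)

# The same p-value from first principles: upper-tail probability of the
# F distribution with (k - 1, N - k) degrees of freedom.
k = len(groups)
N = sum(len(g) for g in groups)
p_manual = stats.f.sf(f_stat, k - 1, N - k)

print(f_stat, p_value)
```

A small p-value indicates that at least two group means differ, which is the cue to move on to multiple comparison tests.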

Multiple Comparisons

After conducting the overall test to determine if there’s a difference between the means of different groups, the next step is to identify where these differences lie, if they exist. However, the overall test alone doesn’t provide this information. In cases where the overall test indicates a significant difference, multiple comparison tests can be employed to pinpoint specific group differences. These tests adjust the significance level to account for the increased likelihood of making a type I error due to conducting multiple tests.

There are several multiple comparison procedures available, each suitable for specific scenarios. Here, we'll discuss three widely used procedures. In terms of confidence intervals, these procedures all produce intervals of the same form: the difference between two group means, plus and minus a constant times the estimate of its standard error:

(\bar{y}_i - \bar{y}_j) \pm G \, s \sqrt{\frac{1}{n_i} + \frac{1}{n_j}}, \quad \text{where } s = \sqrt{MSE}

The difference between the procedures lies in the choice of the constant G, which is determined to ensure that the overall chance of a type I error does not exceed a specified level \alpha. For a factor with only two levels, the constant simplifies to the familiar t critical value:

G = t_{1 - \alpha/2, \, N - 2}

This formula aligns with the two-sample t-test discussed in section 2.8, as the two-sample t-test is a special case of ANOVA with only two independent samples.
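
That special-case relationship is easy to verify numerically: with two groups, the ANOVA F statistic equals the square of the pooled-variance two-sample t statistic, and the two p-values coincide. A quick sketch with simulated data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a = rng.normal(10.0, 2.0, 12)
b = rng.normal(11.0, 2.0, 15)

t_stat, p_t = stats.ttest_ind(a, b)   # pooled-variance two-sample t-test
f_stat, p_f = stats.f_oneway(a, b)    # one-way ANOVA with two groups

print(t_stat ** 2, f_stat)  # identical up to floating-point rounding
```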

For more than two groups, various methods of choosing G exist, depending on different assumptions. These assumptions primarily concern the type of comparisons to be made. Typically, all pair-wise comparisons are tested. For instance, with a factor having three levels, one tests whether level 1 differs from level 2, whether level 2 differs from level 3, and whether level 1 differs from level 3. The more comparisons made, the more stringent the testing level needs to be to keep the overall significance level at no more than \alpha.

The three most common multiple comparison tests are:

SCHEFFE procedure:

G = \sqrt{(k-1) \, F_{k-1, \, N-k; \, 1-\alpha}}

where F_{k-1, \, N-k; \, 1-\alpha} is the critical value of the F distribution with k - 1 and N - k degrees of freedom.

BONFERRONI procedure:

G = t_{1 - \frac{\alpha}{2m}, \, N-k}

which uses a critical value from the t distribution, but with \alpha/2 divided by m, the number of comparisons to be made (m = k(k-1)/2 when all pairs of factor levels are compared).

TUKEY procedure:

G = \frac{q_{k, \, N-k; \, 1-\alpha}}{\sqrt{2}}

where q is the studentized range for k means based on N - k degrees of freedom for error.

These procedures differ in the assumptions they make about the type of comparisons to be performed and may be conservative (over-correcting to some extent) in many cases. SCHEFFE is the most conservative, followed by BONFERRONI, with TUKEY the least conservative.
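
Under these definitions, the three constants can be computed directly from scipy's distribution objects (studentized_range requires scipy >= 1.7). The sketch below uses the dimensions of Example 7.1, k = 4 groups and N = 24 observations, at the 5% level:

```python
import numpy as np
from scipy import stats

k, N, alpha = 4, 24, 0.05           # 4 diets, 24 animals (Example 7.1)
df1, df2 = k - 1, N - k

# Scheffe: sqrt((k - 1) times the F critical value)
g_scheffe = np.sqrt(df1 * stats.f.ppf(1 - alpha, df1, df2))

# Bonferroni: alpha/2 shared over all m = k(k-1)/2 pairwise comparisons
m = k * (k - 1) // 2
g_bonferroni = stats.t.ppf(1 - alpha / (2 * m), df2)

# Tukey: studentized range, divided by sqrt(2) for a difference of means
g_tukey = stats.studentized_range.ppf(1 - alpha, k, df2) / np.sqrt(2)

print(round(g_scheffe, 3), round(g_bonferroni, 3), round(g_tukey, 3))
```

The ordering of the three values illustrates the conservatism ranking: Scheffe largest, Tukey smallest.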

For Example 7.1, at the 5% significance level:

  • For SCHEFFE:
    G = \sqrt{3 \, F_{3, 20; \, 0.95}} = \sqrt{3 \times 3.098} = 3.049

  • For BONFERRONI (\alpha/2 shared over m = 6 pairwise comparisons):
    G = t_{1 - 0.05/12, \, 20} = t_{0.99583, \, 20} = 2.927

  • For TUKEY:
    G = q_{4, 20; \, 0.95} / \sqrt{2} = 3.958 / \sqrt{2} = 2.799

Using these values in the confidence interval formulas, we obtain:

 

  i  j   Scheffe               Bonferroni            Tukey
  1  2   (-9.657, -0.343) *    (-9.471, -0.529) *    (-9.275, -0.725) *
  1  3   (-11.657, -2.343) *   (-11.471, -2.529) *   (-11.275, -2.725) *
  1  4   (-4.418, 4.418)       (-4.242, 4.242)       (-4.056, 4.056)
  2  3   (-6.165, 2.165)       (-5.999, 1.999)       (-5.824, 1.824)
  2  4   (1.104, 8.896) *      (1.259, 8.741) *      (1.423, 8.577) *
  3  4   (3.104, 10.896) *     (3.259, 10.741) *     (3.423, 10.577) *

Interpretation: If the confidence interval includes zero, the two treatments do not differ significantly. Treatments for which the confidence intervals do not include zero are marked by stars. Thus, it’s observed that diets 1 and 4 differ from diets 2 and 3, but diets 1 and 4 do not differ from each other, nor do diets 2 and 3 differ from each other.
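
The table above can be reproduced from summary statistics alone. The sketch below assumes group means of 61, 66, 68 and 61, sample sizes 4, 6, 6 and 8, and a pooled MSE of 5.6 (values consistent with the intervals shown, but assumed here rather than taken from raw data); it prints the Scheffe intervals and stars those that exclude zero:

```python
import numpy as np
from scipy import stats

# Summary statistics consistent with Example 7.1 (assumed, not raw data).
means = np.array([61.0, 66.0, 68.0, 61.0])
ns = np.array([4, 6, 6, 8])
mse, df2 = 5.6, 20                    # pooled error variance, df = N - k

g_scheffe = np.sqrt(3 * stats.f.ppf(0.95, 3, df2))

for i in range(4):
    for j in range(i + 1, 4):
        se = np.sqrt(mse * (1 / ns[i] + 1 / ns[j]))   # SE of the difference
        diff = means[i] - means[j]
        lo, hi = diff - g_scheffe * se, diff + g_scheffe * se
        star = "" if lo <= 0 <= hi else " *"
        print(f"{i + 1} vs {j + 1}: ({lo:7.3f}, {hi:7.3f}){star}")
```

Swapping g_scheffe for the Bonferroni or Tukey constant reproduces the other two columns of the table.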

Assumptions of the ANOVA test

The assumptions of the ANOVA test are based on the formal definition of the ANOVA model. The general one-way ANOVA model can be represented as:

y_{ij} = \mu_i + \varepsilon_{ij}

Where:

  • y_{ij} represents the value of the response variable for the j-th replicate of the i-th group or factor level.
  • \mu_i is the population mean for the i-th level of the factor.
  • \varepsilon_{ij} are the random error terms for the individual observations.

For example, in the diet example (blood coagulation times), where k = 4 (the number of diets) and n_i represents the number of replicates for each diet, y_{2,5} would denote the fifth observation for the second diet, which is 65.

The objective of the ANOVA test is to compare the means of the different groups or factor levels. It tests the null hypothesis H_0: \mu_1 = \mu_2 = \cdots = \mu_k against the alternative hypothesis that at least two of the \mu_i's are different from each other.

The assumptions underlying the ANOVA test include:

  1. Independence: The data for each factor level are independent random samples from the relevant population.
  2. Normality: The data are normally distributed around the mean for each factor level. This assumption implies that the residuals (differences between observed and expected values) follow a normal distribution.

It's worth noting that the ANOVA model can sometimes be written differently in other textbooks, but the underlying concept remains the same. For instance, some textbooks may write the model as y_{ij} = \mu + \alpha_i + \varepsilon_{ij}, where \alpha_i represents the deviation of the i-th group mean from the overall mean \mu. Both representations are mathematically equivalent.

Independence of observations

The independence assumption in ANOVA means that each sample within a factor level is independent of the others, and observations within each sample are also independent.

For example, let’s consider the maize example from section 6.7. We had samples of four different varieties of maize. Each variety was represented by a separate sample, with the number of plants varying (4, 5, 2, and 3 plants, respectively). These samples were independent of each other, and within each sample, the observations (such as yield measurements) were also independent.

Similarly, in the diet example from section 7.1, blood coagulation times were measured for animals fed different diets. Each diet represented a separate sample, with varying numbers of animals (4, 6, 6, and 8 animals, respectively). Again, these samples were independent of each other, and within each sample, the measurements of blood coagulation times were independent.

In terms of the ANOVA model, this independence assumption ensures that the random error terms (\varepsilon_{ij}) are independent, allowing for valid statistical inference.

Normality of the error term

The assumption of normality in ANOVA refers to the distribution of the random error terms (\varepsilon_{ij}) rather than the data itself (i.e., the response variable y_{ij}). This assumption implies that the errors are normally distributed around the population means (\mu_i).

It's important to clarify that ANOVA does not assume the response variable (y_{ij}) as a whole to be normally distributed. Instead, it assumes normality in the deviations of the observed values from their population means.

This assumption is often valid for many datasets. However, if the assumption of normality is not met, transformations of the data can sometimes be applied to approximate normality. For example, a log transformation is often used for biological data, while a square root transformation is suitable for count data like the number of insects. These transformations are discussed further in section 13.5.
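
The variance-stabilizing effect of the square root transformation on count data can be seen with simulated Poisson counts (a purely illustrative sketch, not data from the text):

```python
import numpy as np

rng = np.random.default_rng(2)

# Poisson counts: the variance equals the mean, so groups with larger
# means are automatically more spread out.
groups = [rng.poisson(lam, 200) for lam in (4, 16, 64)]

raw_sd = [g.std(ddof=1) for g in groups]
sqrt_sd = [np.sqrt(g).std(ddof=1) for g in groups]

print([round(s, 2) for s in raw_sd])   # grows roughly like sqrt(mean)
print([round(s, 2) for s in sqrt_sd])  # roughly constant across groups
```

After the transformation the three groups have very similar spread, which is exactly what the equal-variance assumption of ANOVA needs.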

Error variance the same for all groups (homoscedasticity)

The ANOVA test also assumes that the error variance (\sigma^2) is constant across all groups. While it's possible to relax this assumption, doing so can introduce complications. One common approach is to transform the data, aiming to make the error variances more similar across groups. However, it's important to note that the variances don't need to be identical; rather, they shouldn't differ too much, ideally not by more than a factor of 2.

The F-test in ANOVA tends to be robust to unequal variances when the sample sizes across groups are roughly equal. However, unequal variances can significantly impact multiple comparison tests.

There are various methods to check this assumption. One is to calculate the standard deviations for each group and compare them. Another approach is to plot the standard deviations against the means, observing if there’s a pattern. Some statistical packages offer tests for the equality of variances, such as the Hartley and modified Levene tests.
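
The modified Levene test mentioned above corresponds to scipy.stats.levene with median centring (the Brown-Forsythe variant). A quick illustration with simulated groups, one set with equal and one with clearly unequal spreads:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
same_var = [rng.normal(0, 2, 30) for _ in range(3)]
diff_var = [rng.normal(0, s, 30) for s in (1, 3, 9)]

# center='median' gives the robust "modified Levene" (Brown-Forsythe) test.
p_same = stats.levene(*same_var, center='median').pvalue
p_diff = stats.levene(*diff_var, center='median').pvalue

print(p_same)  # typically large: no evidence against equal variances
print(p_diff)  # typically tiny: variances clearly differ
```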

Examining residuals is another way to assess equal error variances. Residuals are the deviations of the data from the model predictions. Under the null hypothesis and model assumptions, residuals should be approximately normally distributed and independent. Checking for random distribution in residuals can indicate whether the assumption of equal variances holds.

In Example 7.1, the residuals are ordered and plotted against the factor levels to assess the spread across groups. The plot shows the scatter of residuals for the different diets, aiming to identify whether the spread is similar across groups.

The residual plot in Figure 7.1 provides a visual representation of the residuals for each factor level. It allows for the identification of outliers, which are points that deviate significantly from the others. In the example provided, a residual of 5 for treatment 2 stands out slightly, but it’s not a major concern given the presence of a similarly sized residual of -5 for treatment 4. However, it’s essential to re-check the correctness of the data points corresponding to these residuals. Keep in mind that some level of variability in residuals is expected, and minor differences should not be over-interpreted.

Outliers can heavily influence the analysis and lead to incorrect conclusions. Therefore, it’s crucial to investigate suspicious points, checking for data entry errors or unusual circumstances that may have caused them. Deleting outliers from the analysis can help assess their impact on conclusions. If the conclusions remain unchanged without the outliers, it suggests that they may not significantly affect the analysis. However, if the conclusions are substantially altered, further investigation is necessary to understand why.

It’s important to note that deleting points should not be done arbitrarily. Each point represents data that may have valuable information. Deleting points because they don’t fit the desired conclusions is inappropriate. Instead, focus on understanding why they differ and whether adjustments to the model or data are needed.

Analyzing data with fewer points may weaken conclusions, so caution is required. However, maintaining a healthy skepticism towards data quality is crucial to avoid drawing incorrect conclusions.

Transformations

Transformations play a crucial role in statistical analysis, especially when dealing with data that do not meet the assumptions required by statistical tests. One common scenario where transformations are employed is in the analysis of data using techniques like ANOVA (Analysis of Variance). ANOVA assumes that the error terms (εij) in the model are normally distributed and have constant variance across factor levels. However, when these assumptions are violated, transformations can be applied to the data to mitigate the issue.

A key tool in assessing whether transformation is necessary is the residual plot. Residual plots display the differences between observed data points and the values predicted by the statistical model. An ideal residual plot should exhibit random scatter around zero, indicating that the model adequately captures the variation in the data. However, patterns in the residual plot, such as non-constant variance or non-linearity, suggest that the model assumptions are not met and transformation may be warranted.

One common pattern observed in residual plots is a systematic relationship between the spread of the residuals and the mean response variable, for example residuals that fan out as the fitted means increase. This pattern is indicative of heteroscedasticity, where the spread of residuals varies with the level of the response rather than staying constant across factor levels. To address this issue, transformations can be applied to stabilize the variance and meet the assumption of constant variance across factor levels.

Several types of transformations can be utilized depending on the specific characteristics of the data. Some common transformations and their applications include:

  1. Logarithmic Transformation: This transformation involves taking the logarithm of the response variable. It is particularly useful for data with a fan-shaped pattern on residual plots, where the variance increases or decreases exponentially as the mean response variable changes. Logarithmic transformations help stabilize the variance, making it more constant across factor levels.

  2. Square Root Transformation: The square root transformation is applied by taking the square root of the response variable. It is suitable for count data, such as the number of occurrences of an event, where the variance tends to increase with the mean. Square root transformations help address the fan-shaped pattern on residual plots by stabilizing the variance.

  3. Reciprocal Transformation: The reciprocal transformation involves taking the reciprocal (or inverse) of the response variable. It is often used for data with a lens-shaped pattern on residual plots, where the variance decreases as the mean response variable increases. Reciprocal transformations help stabilize the variance and make it more constant across factor levels.

  4. Logit Transformation: The logit transformation is applied to proportions or probabilities by taking the natural logarithm of the odds ratio. It is useful for data with proportions that exhibit a very sharp fan pattern on residual plots. Logit transformations help spread out extreme values and stabilize the variance across factor levels.
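
All four transformations are one-liners in numpy. The logit in particular stretches proportions near 0 and 1 much more than those near 0.5, as this small sketch with illustrative values shows:

```python
import numpy as np

y = np.array([0.02, 0.10, 0.50, 0.90, 0.98])  # illustrative proportions

log_y = np.log(y)               # logarithmic transformation
sqrt_y = np.sqrt(y)             # square root (for counts)
recip_y = 1 / y                 # reciprocal
logit_y = np.log(y / (1 - y))   # logit: log of the odds

# The logit is 0 at 0.5 and symmetric: extreme proportions are spread out
# far more than central ones.
print(np.round(logit_y, 3))
```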

No Change over time

Checking for changes over time is another critical aspect of data analysis, especially in ANOVA. Changes over time can introduce systematic biases into the data, potentially invalidating the conclusions drawn from the analysis. There are several factors that can lead to changes over time, including environmental factors, variations in experimental conditions, or external events. Some common sources of change over time include:

  • Seasonal Variations: Environmental factors such as weather or seasons can impact biological data significantly. For example, plant growth rates may vary with the changing seasons.

  • Observer or Operator Effects: Changes in personnel conducting experiments or observations can introduce variability into the data.

  • Changes in Materials or Equipment: Differences in the materials or equipment used in experiments, such as chemicals from different suppliers or new equipment, can affect the results.

  • Economic or Social Factors: Changes in economic conditions or social trends may influence consumer behavior or market research outcomes.

  • External Events: Events in the media or society, such as public figures’ deaths or major news events, can influence people’s behaviors or responses.

To assess whether changes over time are present in the data, residual plots can be constructed with the residuals plotted against time or the order of data collection. The index representing the order of observations serves as a proxy for time, as data are typically recorded sequentially. An ideal residual plot should exhibit random scatter around zero, indicating that there are no systematic changes over time.
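
A simple numerical companion to such a plot is to fit a straight line to the residuals against collection order: a slope near zero supports "no change over time". The sketch below injects an artificial drift into invented data so the slope shows up:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 60
order = np.arange(n)              # proxy for time: order of data collection
drift = 0.05 * order              # hypothetical slow drift over the study
y = rng.normal(10.0, 1.0, n) + drift

residuals = y - y.mean()          # residuals after removing the overall mean

# Slope of residuals vs order: near zero means "no change over time";
# here the injected drift is recovered instead.
slope = np.polyfit(order, residuals, 1)[0]
print(round(slope, 3))
```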

Random scatter means that there is no discernible pattern in the plot. While points may not be evenly spread across the plot, there should be no clear trend or specific pattern. Any deviations from random scatter may indicate the presence of changes over time.

It’s essential to ensure that the order of observations for different treatments aligns with the order of data collection. If observations for certain treatments are collected at different times, any observed effects may be attributed to time rather than treatment effects. Residuals, obtained by removing estimated factor effects from the data, help identify changes over time independent of treatment effects.

While checking model assumptions in simple ANOVA models can often be done by examining the data, more complex models may require residual plots. Ensuring that model assumptions are met is crucial, as failure to do so can lead to erroneous conclusions or the overlooking of significant differences in the data. Proper validation of model assumptions enhances the reliability and validity of statistical analyses.

Summary of Key Points discussed regarding ANOVA

  1. ANOVA Model: ANOVA compares the means of multiple groups to determine if they are statistically different from each other. The general one-way ANOVA model can be expressed as y_{ij} = \mu_i + \varepsilon_{ij}, where y_{ij} represents the observed value for the j-th replicate of the i-th group, \mu_i is the population mean for the i-th group, and \varepsilon_{ij} represents the random error term.

  2. Assumptions of ANOVA:

    • Independence: Data for each group are independent, and observations within each group are independent.
    • Normality: The error terms (\varepsilon_{ij}) are normally distributed around the group means.
    • Homogeneity of Variance: The variance of the error terms is the same across all groups.
  3. Checking Assumptions:

    • Independence: Ensure each group has a separate sample, and observations within each group are independent.
    • Normality: Assess normality of error terms by examining residuals or using statistical tests.
    • Homogeneity of Variance: Check for constant variance across groups using residual plots or statistical tests.
  4. Transformation: If data violate assumptions, transformations (e.g., log, square root) can be applied to approximate normality or stabilize variance.

  5. Residual Analysis: Residual plots help assess model assumptions, including normality, constant variance, and absence of patterns over time. Random scatter of residuals around zero is desired.

  6. Detecting Changes Over Time: Plotting residuals against time or the order of data collection helps identify systematic changes over time, such as seasonal variations or observer effects.

  7. Interpreting Residual Plots: Look for random scatter of residuals without any discernible patterns or trends. Deviations from random scatter may indicate violations of assumptions or changes over time.

  8. Impact of Violated Assumptions: Failure to meet assumptions can lead to erroneous conclusions or missed significant differences in the data. Proper validation of assumptions is critical for reliable statistical analyses.

In summary, ANOVA is a powerful tool for comparing means across multiple groups, but it’s essential to validate the assumptions of independence, normality, and homogeneity of variance. Residual analysis and checks for changes over time help ensure the reliability and validity of ANOVA results.
