Analysis of covariance






From Wikipedia, the free encyclopedia
 


Analysis of covariance (ANCOVA) is a general linear model that blends ANOVA and regression. ANCOVA evaluates whether the means of a dependent variable (DV) are equal across levels of one or more categorical independent variables (IVs), while controlling for one or more continuous variables, the covariates (CVs). For example, the categorical variable(s) might describe treatment and the continuous variable(s) might be covariates or nuisance variables; or vice versa. Mathematically, ANCOVA decomposes the variance in the DV into variance explained by the CV(s), variance explained by the categorical IV, and residual variance. Intuitively, ANCOVA can be thought of as 'adjusting' the DV by the group means of the CV(s).[1]

The ANCOVA model assumes a linear relationship between the response (DV) and covariate (CV):

$$y_{ij} = \mu + \tau_i + \beta(x_{ij} - \bar{x}) + \varepsilon_{ij}.$$

In this equation, the DV, $y_{ij}$, is the jth observation under the ith categorical group; the CV, $x_{ij}$, is the jth observation of the covariate under the ith group. Variables in the model that are derived from the observed data are $\mu$ (the grand mean) and $\bar{x}$ (the global mean for covariate $x$). The variables to be fitted are $\tau_i$ (the effect of the ith level of the categorical IV), $\beta$ (the slope of the line) and $\varepsilon_{ij}$ (the associated unobserved error term for the jth observation in the ith group).

Under this specification, the categorical treatment effects sum to zero: $\sum_{i=1}^{a} \tau_i = 0$. The standard assumptions of the linear regression model are also assumed to hold, as discussed below.[2]
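As a purely illustrative sketch, data following this model can be simulated; all parameter values and group sizes below are hypothetical:

```python
import numpy as np

# Simulate data from the ANCOVA model above; all parameter values are hypothetical.
rng = np.random.default_rng(1)
mu, beta, sigma = 10.0, 0.5, 1.0
tau = np.array([-1.0, 0.0, 1.0])        # treatment effects, constrained to sum to zero

group = np.repeat([0, 1, 2], 20)        # i: group label for each observation
x = rng.normal(50.0, 5.0, group.size)   # x_ij: covariate observations
eps = rng.normal(0.0, sigma, group.size)

# y_ij = mu + tau_i + beta * (x_ij - x_bar) + eps_ij
y = mu + tau[group] + beta * (x - x.mean()) + eps
print(y.shape)  # (60,)
```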

Uses

Increase power

ANCOVA can be used to increase statistical power (the probability of finding a significant difference between groups when one exists) by reducing the within-group error variance.[3] To understand this, it is necessary to understand the test used to evaluate differences between groups, the F-test. The F-test is computed by dividing the explained variance between groups (e.g., differences in medical recovery) by the unexplained variance within the groups. Thus,

$$F = \frac{MS_{\text{between}}}{MS_{\text{within}}}.$$

If this value is larger than a critical value, we conclude that there is a significant difference between groups. Unexplained variance includes error variance (e.g., individual differences), as well as the influence of other factors. Therefore, the influence of CVs is grouped into the denominator. When we control for the effect of CVs on the DV, we remove it from the denominator, making F larger and thereby increasing our power to find a significant effect if one exists.
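For a one-way layout, the F ratio above can be computed by hand; the recovery scores below are made up for illustration:

```python
import numpy as np

# Hypothetical recovery scores for three treatment groups.
groups = [np.array([4.0, 5.0, 6.0]),
          np.array([7.0, 8.0, 9.0]),
          np.array([10.0, 11.0, 12.0])]

grand_mean = np.mean(np.concatenate(groups))
k = len(groups)                          # number of groups
n = sum(len(g) for g in groups)          # total number of observations

# Explained (between-group) and unexplained (within-group) sums of squares.
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

F = (ss_between / (k - 1)) / (ss_within / (n - k))
print(F)  # 27.0
```

Removing covariate-related variance from `ss_within` would shrink the denominator and enlarge F, which is exactly the power gain described above.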

[Figure: Partitioning variance]

Adjusting preexisting differences

Another use of ANCOVA is to adjust for preexisting differences in nonequivalent (intact) groups. This controversial application aims to correct for initial group differences (prior to group assignment) that exist on the DV among several intact groups. In this situation, participants cannot be made equal through random assignment, so CVs are used to adjust scores and make participants more similar than they would be without the CV. However, even with the use of covariates, there are no statistical techniques that can equate unequal groups. Furthermore, the CV may be so intimately related to the categorical IV that removing the variance on the DV associated with the CV would remove considerable variance on the DV, rendering the results meaningless.[4]

Assumptions

There are several key assumptions that underlie the use of ANCOVA and affect interpretation of the results.[2] The standard linear regression assumptions hold; further, we assume that the slope of the covariate is equal across all treatment groups (homogeneity of regression slopes).

Assumption 1: linearity of regression

The regression relationship between the dependent variable and concomitant variables must be linear.

Assumption 2: homogeneity of error variances

The error is a random variable with conditional zero mean and equal variances for different treatment classes and observations.

Assumption 3: independence of error terms

The errors are uncorrelated. That is, the error covariance matrix is diagonal.

Assumption 4: normality of error terms

The residuals (error terms) should be normally distributed: $\varepsilon_{ij} \sim N(0, \sigma^2)$.

Assumption 5: homogeneity of regression slopes

The slopes of the different regression lines should be equivalent, i.e., regression lines should be parallel among groups.

The fifth issue, concerning the homogeneity of the different treatment regression slopes, is particularly important in evaluating the appropriateness of the ANCOVA model. Also note that only the error terms need to be normally distributed; in fact, neither the independent variable nor the concomitant variables will be normally distributed in most cases.

Conducting an ANCOVA

Test multicollinearity

If a CV is highly related to another CV (at a correlation of 0.5 or more), then it will not adjust the DV over and above the other CV. One or the other should be removed since they are statistically redundant.
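As a minimal sketch of this check (the covariate values below are hypothetical), the pairwise correlation between two CVs can be inspected before fitting:

```python
import numpy as np

# Hypothetical covariates: a baseline score and a closely related second measure.
cv1 = np.array([10.0, 12.0, 11.0, 14.0, 13.0, 15.0])
cv2 = np.array([20.0, 24.0, 23.0, 27.0, 26.0, 31.0])

# Pearson correlation between the two covariates.
r = np.corrcoef(cv1, cv2)[0, 1]
if abs(r) >= 0.5:
    print(f"CVs are highly correlated (r = {r:.2f}); consider dropping one.")
```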

Test the homogeneity of variance assumption

This assumption can be tested with Levene's test of equality of error variances. It is most important after adjustments have been made; however, if the variances are homogeneous before adjustment, they are likely to remain so afterwards.
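Assuming SciPy is available, Levene's test can be run on hypothetical group scores as follows (by default `scipy.stats.levene` centers on the median, the Brown-Forsythe variant):

```python
from scipy.stats import levene

# Hypothetical scores from three treatment groups.
g1 = [2.1, 2.5, 2.3, 2.2, 2.6]
g2 = [2.0, 2.4, 2.2, 2.5, 2.3]
g3 = [2.2, 2.6, 2.1, 2.4, 2.5]

stat, p = levene(g1, g2, g3)  # default center='median'
# A p-value above alpha (e.g., .05) gives no evidence against equal variances.
print(stat, p)
```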

Test the homogeneity of regression slopes assumption

To see if the CV significantly interacts with the categorical IV, run an ANCOVA model including both the IV and the CV×IV interaction term. If the CV×IV interaction is significant, ANCOVA should not be performed. Instead, Green & Salkind[5] suggest assessing group differences on the DV at particular levels of the CV. Also consider using a moderated regression analysis, treating the CV and its interaction as another IV. Alternatively, one could use mediation analyses to determine if the CV accounts for the IV's effect on the DV.[citation needed]
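One way to sketch this check with only NumPy is a partial F-test comparing models with and without the CV×IV interaction; the data below are simulated with equal slopes by construction, and all names are hypothetical:

```python
import numpy as np

# Hypothetical data: DV y, CV x, and a two-level IV coded 0/1.
rng = np.random.default_rng(0)
n = 40
group = np.repeat([0, 1], n // 2)
x = rng.normal(10.0, 2.0, n)
y = 5.0 + 2.0 * group + 0.8 * x + rng.normal(0.0, 1.0, n)  # equal slopes by construction

def sse(X, y):
    """Residual sum of squares from an ordinary least squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

ones = np.ones(n)
X_reduced = np.column_stack([ones, group, x])          # IV + CV only
X_full = np.column_stack([ones, group, x, group * x])  # adds the CV x IV interaction

# Partial F statistic for the single interaction parameter; a large value
# (relative to an F(1, n - 4) critical value) would indicate unequal slopes.
f_stat = (sse(X_reduced, y) - sse(X_full, y)) / (sse(X_full, y) / (n - 4))
print(f_stat)
```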

Run ANCOVA analysis

If the CV×IV interaction is not significant, rerun the ANCOVA without the CV×IV interaction term. In this analysis, you need to use the adjusted means and the adjusted MS_error. The adjusted means (also referred to as least squares means, LS means, estimated marginal means, or EMM) refer to the group means after controlling for the influence of the CV on the DV.
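The adjusted means can be computed by hand as the group means shifted to the grand covariate mean along the pooled within-group slope; the two-group data below are made up for illustration:

```python
import numpy as np

# Hypothetical two-group data with a single covariate.
y1, x1 = np.array([6.0, 7.0, 8.0]), np.array([3.0, 4.0, 5.0])
y2, x2 = np.array([9.0, 10.0, 11.0]), np.array([5.0, 6.0, 7.0])

grand_x = np.concatenate([x1, x2]).mean()

# Pooled within-group regression slope of y on x.
b = (np.sum((x1 - x1.mean()) * (y1 - y1.mean())) +
     np.sum((x2 - x2.mean()) * (y2 - y2.mean()))) / (
     np.sum((x1 - x1.mean()) ** 2) + np.sum((x2 - x2.mean()) ** 2))

# Adjusted (least squares) mean: group mean evaluated at the grand covariate mean.
adj1 = y1.mean() - b * (x1.mean() - grand_x)
adj2 = y2.mean() - b * (x2.mean() - grand_x)
print(adj1, adj2)  # 8.0 9.0
```

Here group 2's raw advantage (10 vs. 7) shrinks after adjustment (9 vs. 8) because part of it reflected its higher covariate values.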

[Figure: Simple main effects plot showing a small interaction between the two levels of the independent variable.]

Follow-up analyses

If there was a significant main effect, it means that there is a significant difference between the levels of one categorical IV, ignoring all other factors.[6] To find exactly which levels are significantly different from one another, one can use the same follow-up tests as for the ANOVA. If there are two or more IVs, there may be a significant interaction, which means that the effect of one IV on the DV changes depending on the level of another factor. One can investigate the simple main effects using the same methods as in a factorial ANOVA.

Power considerations

While the inclusion of a covariate in an ANOVA generally increases statistical power, by accounting for some of the variance in the dependent variable and thus increasing the proportion of variance explained by the independent variables, adding a covariate also reduces the degrees of freedom. Accordingly, adding a covariate that accounts for very little variance in the dependent variable may actually reduce power.


References

  1. ^ Keppel, G. (1991). Design and Analysis: A Researcher's Handbook (3rd ed.). Englewood Cliffs, NJ: Prentice-Hall.
  2. ^ a b Montgomery, Douglas C. (2012). Design and Analysis of Experiments (8th ed.). John Wiley & Sons.
  3. ^ Tabachnick, B. G.; Fidell, L. S. (2007). Using Multivariate Statistics (5th ed.). Boston: Pearson Education.
  4. ^ Miller, G. A.; Chapman, J. P. (2001). "Misunderstanding Analysis of Covariance". Journal of Abnormal Psychology. 110 (1): 40–48. doi:10.1037/0021-843X.110.1.40. PMID 11261398.
  5. ^ Green, S. B.; Salkind, N. J. (2011). Using SPSS for Windows and Macintosh: Analyzing and Understanding Data (6th ed.). Upper Saddle River, NJ: Prentice Hall.
  6. ^ Howell, D. C. (2009). Statistical Methods for Psychology (7th ed.). Belmont: Cengage Wadsworth.

  • Retrieved from "https://en.wikipedia.org/w/index.php?title=Analysis_of_covariance&oldid=1190854507"

     



    This page was last edited on 20 December 2023, at 06:43 (UTC).
