Anderson–Darling test






From Wikipedia, the free encyclopedia
 


The Anderson–Darling test is a statistical test of whether a given sample of data is drawn from a given probability distribution. In its basic form, the test assumes that there are no parameters to be estimated in the distribution being tested, in which case the test and its set of critical values are distribution-free. However, the test is most often used in contexts where a family of distributions is being tested, in which case the parameters of that family need to be estimated, and this must be taken into account by adjusting either the test statistic or its critical values. When applied to testing whether a normal distribution adequately describes a set of data, it is one of the most powerful statistical tools for detecting most departures from normality.[1][2] K-sample Anderson–Darling tests are available for testing whether several collections of observations can be modelled as coming from a single population, where the distribution function does not have to be specified.

In addition to its use as a test of fit for distributions, it can be used in parameter estimation as the basis for a form of minimum distance estimation procedure.

The test is named after Theodore Wilbur Anderson (1918–2016) and Donald A. Darling (1915–2014), who invented it in 1952.[3]

The single-sample test

The Anderson–Darling and Cramér–von Mises statistics belong to the class of quadratic EDF statistics (tests based on the empirical distribution function).[2] If the hypothesized distribution is $F$, and the empirical (sample) cumulative distribution function is $F_n$, then the quadratic EDF statistics measure the distance between $F$ and $F_n$ by

$$n \int_{-\infty}^{\infty} \left(F_n(x) - F(x)\right)^2 \, w(x) \, dF(x),$$

where $n$ is the number of elements in the sample, and $w(x)$ is a weighting function. When the weighting function is $w(x) = 1$, the statistic is the Cramér–von Mises statistic. The Anderson–Darling (1954) test[4] is based on the distance

$$A^2 = n \int_{-\infty}^{\infty} \frac{\left(F_n(x) - F(x)\right)^2}{F(x)\,\left(1 - F(x)\right)} \, dF(x),$$

which is obtained when the weight function is $w(x) = \left[F(x)\,\left(1 - F(x)\right)\right]^{-1}$. Thus, compared with the Cramér–von Mises distance, the Anderson–Darling distance places more weight on observations in the tails of the distribution.
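As a rough illustration of the tail weighting, the sketch below evaluates the two weight functions as functions of u = F(x): the constant weight 1 of the Cramér–von Mises statistic and the weight 1 / (u (1 − u)) of the Anderson–Darling statistic. The function names are hypothetical and chosen only for this example.

```python
def cvm_weight(u: float) -> float:
    """Cramér–von Mises weight: constant, independent of u = F(x)."""
    return 1.0


def ad_weight(u: float) -> float:
    """Anderson–Darling weight 1 / (u (1 - u)) for u = F(x) strictly inside (0, 1)."""
    if not 0.0 < u < 1.0:
        raise ValueError("u must lie strictly between 0 and 1")
    return 1.0 / (u * (1.0 - u))


if __name__ == "__main__":
    # The Anderson–Darling weight grows without bound as u approaches 0 or 1.
    for u in (0.5, 0.1, 0.01, 0.001):
        print(f"u={u}: CvM weight={cvm_weight(u)}, AD weight={ad_weight(u):.2f}")
```

The Anderson–Darling weight is smallest at the median (u = 0.5) and symmetric about it, so squared discrepancies between the empirical and hypothesized CDFs count for more the further out in either tail they occur.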

Basic test statistic

The Anderson–Darling test assesses whether a sample comes from a specified distribution. It makes use of the fact that, when given a hypothesized underlying distribution and assuming the data does arise from this distribution, the values obtained by applying the hypothesized cumulative distribution function (CDF) to the data can be assumed to follow a uniform distribution. The data can then be tested for uniformity with a distance test (Shapiro 1980). The formula for the test statistic $A$ to assess if data $\{Y_1 < \cdots < Y_n\}$ (note that the data must be put in order) comes from a CDF $F$ is

$$A^2 = -n - S,$$

where

$$S = \sum_{i=1}^{n} \frac{2i - 1}{n} \left[\ln\left(F(Y_i)\right) + \ln\left(1 - F(Y_{n+1-i})\right)\right].$$

The test statistic can then be compared against the critical values of the theoretical distribution. In this case, no parameters are estimated in relation to the cumulative distribution function $F$.
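The computation can be sketched in a few lines of Python. This is a minimal illustration under the assumptions stated in this section: a fully specified CDF is passed in as a callable, and the statistic is A² = −n − S, with S the weighted sum of log terms over the ordered sample. The function name `anderson_darling` and its signature are choices made for this example, not part of any library.

```python
import math
from typing import Callable, Sequence


def anderson_darling(sample: Sequence[float], cdf: Callable[[float], float]) -> float:
    """Return A^2 = -n - S for a fully specified hypothesized CDF."""
    y = sorted(sample)  # the formula requires ordered observations
    n = len(y)
    if n == 0:
        raise ValueError("sample must not be empty")
    u = [cdf(v) for v in y]  # F(Y_1) <= ... <= F(Y_n)
    if u[0] <= 0.0 or u[-1] >= 1.0:
        raise ValueError("F(Y_i) must lie strictly between 0 and 1")
    # With 0-based lists, Y_i is y[i - 1] and Y_{n+1-i} is y[n - i].
    s = sum(
        (2 * i - 1) / n * (math.log(u[i - 1]) + math.log(1.0 - u[n - i]))
        for i in range(1, n + 1)
    )
    return -n - s


if __name__ == "__main__":
    # Hypothesized distribution: uniform on (0, 1), whose CDF is F(x) = x.
    print(anderson_darling([0.9, 0.1, 0.5], lambda x: x))
```

Because no parameters are estimated here, the result would be compared with the critical values for the fully specified case.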

Tests for families of distributions

Essentially the same test statistic can be used in the test of fit of a family of distributions, but then it must be compared against the critical values appropriate to that family of theoretical distributions and dependent also on the method used for parameter estimation.

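Test for normality

Empirical testing has found[5] that the Anderson–Darling test is not quite as good as the Shapiro–Wilk test, but is better than other tests. Stephens[1] found $A^2$ to be one of the best empirical distribution function statistics for detecting most departures from normality.

The computation differs based on what is known about the distribution:[6]

Case 0: The mean $\mu$ and the variance $\sigma^2$ are both known.
Case 1: The variance $\sigma^2$ is known, but the mean $\mu$ is unknown.
Case 2: The mean $\mu$ is known, but the variance $\sigma^2$ is unknown.
Case 3: Both the mean $\mu$ and the variance $\sigma^2$ are unknown.

The $n$ observations, $X_i$, for $i = 1, \ldots, n$, of the variable $X$ must be sorted such that $X_1 \le X_2 \le \cdots \le X_n$, and the notation in the following assumes that the $X_i$ represent the ordered observations. Let

$$\hat{\mu} = \begin{cases} \mu, & \text{if the mean is known,} \\ \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i, & \text{otherwise,} \end{cases}$$

$$\hat{\sigma}^2 = \begin{cases} \sigma^2, & \text{if the variance is known,} \\ \frac{1}{n} \sum_{i=1}^{n} (X_i - \mu)^2, & \text{if the variance is not known, but the mean is,} \\ \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2, & \text{otherwise.} \end{cases}$$

The values $X_i$ are standardized to create new values $Y_i$, given by

$$Y_i = \frac{X_i - \hat{\mu}}{\hat{\sigma}}.$$

With the standard normal CDF $\Phi$, $A^2$ is calculated using

$$A^2 = -n - \frac{1}{n} \sum_{i=1}^{n} (2i - 1) \left(\ln \Phi(Y_i) + \ln\left(1 - \Phi(Y_{n+1-i})\right)\right).$$

An alternative expression in which only a single observation is dealt with at each step of the summation is:

$$A^2 = -n - \frac{1}{n} \sum_{i=1}^{n} \left[(2i - 1) \ln \Phi(Y_i) + \left(2(n - i) + 1\right) \ln\left(1 - \Phi(Y_i)\right)\right].$$

A modified statistic can be calculated using

$$A^{*2} = \begin{cases} A^2 \left(1 + \frac{4}{n} - \frac{25}{n^2}\right), & \text{if the variance and the mean are both unknown,} \\ A^2, & \text{otherwise.} \end{cases}$$

If $A^2$ or $A^{*2}$ exceeds a given critical value, then the hypothesis of normality is rejected with some significance level. The critical values are given in the table below for values of $A^{*2}$.[1][7]

Note 1: If $\hat{\sigma} = 0$ or any $\Phi(Y_i)$ equals 0 or 1, then $A^2$ cannot be calculated and is undefined.

Note 2: The above adjustment formula is taken from Shorack & Wellner (1986, p. 239). Care is required in comparisons across different sources, as often the specific adjustment formula is not stated.

Note 3: Stephens[1] notes that the test becomes better when the parameters are computed from the data, even if they are known.

Note 4: Marsaglia & Marsaglia[7] provide a more accurate result for Case 0 at 85% and 99%.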

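| Case | n | 15% | 10% | 5% | 2.5% | 1% |
|---|---|---|---|---|---|---|
| 0 | ≥ 5 | 1.621 | 1.933 | 2.492 | 3.070 | 3.878 |
| 1 | | | 0.908 | 1.105 | 1.304 | 1.573 |
| 2 | ≥ 5 | | 1.760 | 2.323 | 2.904 | 3.690 |
| 3 | 10 | 0.514 | 0.578 | 0.683 | 0.779 | 0.926 |
| | 20 | 0.528 | 0.591 | 0.704 | 0.815 | 0.969 |
| | 50 | 0.546 | 0.616 | 0.735 | 0.861 | 1.021 |
| | 100 | 0.559 | 0.631 | 0.754 | 0.884 | 1.047 |
| | ∞ | 0.576 | 0.656 | 0.787 | 0.918 | 1.092 |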
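The Case 3 procedure (mean and variance both estimated from the data) can be sketched as follows. This is an illustrative implementation using only the Python standard library, not a reference one. The function name `ad_normal_case3` is hypothetical, and the modification factor 1 + 4/n − 25/n² is the one attributed to Shorack & Wellner in Note 2. Both expressions for A² are computed so that their agreement can be checked.

```python
import math
from statistics import NormalDist, mean, stdev
from typing import Sequence, Tuple


def ad_normal_case3(sample: Sequence[float]) -> Tuple[float, float, float]:
    """Return (A2, A2_alt, A2_star) for Case 3: mean and variance both unknown."""
    x = sorted(sample)
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    mu_hat = mean(x)
    sigma_hat = stdev(x)  # square root of the 1/(n-1) variance estimate
    if sigma_hat == 0:
        raise ValueError("zero sample variance: A^2 is undefined (Note 1)")
    phi = NormalDist().cdf  # standard normal CDF
    p = [phi((v - mu_hat) / sigma_hat) for v in x]  # Phi(Y_i), non-decreasing
    if p[0] <= 0.0 or p[-1] >= 1.0:
        raise ValueError("Phi(Y_i) equals 0 or 1: A^2 is undefined (Note 1)")
    # Paired form: Y_i together with Y_{n+1-i} (0-based index n - i).
    a2 = -n - sum(
        (2 * i - 1) * (math.log(p[i - 1]) + math.log(1.0 - p[n - i]))
        for i in range(1, n + 1)
    ) / n
    # Alternative form: one observation per summation step.
    a2_alt = -n - sum(
        (2 * i - 1) * math.log(p[i - 1])
        + (2 * (n - i) + 1) * math.log(1.0 - p[i - 1])
        for i in range(1, n + 1)
    ) / n
    a2_star = a2 * (1.0 + 4.0 / n - 25.0 / n**2)
    return a2, a2_alt, a2_star


if __name__ == "__main__":
    data = [4.9, 5.1, 5.6, 4.4, 5.0, 5.3, 4.7, 5.2, 4.8, 5.5]  # made-up values
    a2, a2_alt, a2_star = ad_normal_case3(data)
    print(a2, a2_alt, a2_star)
```

The returned modified statistic would then be compared with the Case 3 row of the table for the nearest sample size, for example 0.683 at the 5% level when n = 10.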

Alternatively, for Case 3 above (both mean and variance unknown), D'Agostino (1986)[6] in Table 4.7 on p. 123 and on pages 372–373 gives the adjusted statistic:

$$A^{*2} = A^2 \left(1 + \frac{0.75}{n} + \frac{2.25}{n^2}\right),$$

and normality is rejected if $A^{*2}$ exceeds 0.631, 0.754, 0.884, 1.047, or 1.159 at 10%, 5%, 2.5%, 1%, and 0.5% significance levels, respectively; the procedure is valid for sample sizes of at least $n = 8$. The formulas for computing the p-values for other values of $A^{*2}$ are given in Table 4.9 on p. 127 in the same book.
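A small sketch of this alternative adjustment is given below. It assumes that A² has already been computed for Case 3 and applies the factor 1 + 0.75/n + 2.25/n². The thresholds are copied from the sentence above rather than checked against the cited tables, and the names `dagostino_adjusted` and `smallest_rejecting_level` are hypothetical.

```python
from typing import Optional

# (critical value, significance level) pairs as quoted in the text above.
DAGOSTINO_CRITICAL = [
    (0.631, 0.10),
    (0.754, 0.05),
    (0.884, 0.025),
    (1.047, 0.01),
    (1.159, 0.005),
]


def dagostino_adjusted(a2: float, n: int) -> float:
    """Adjusted statistic A*^2 = A^2 (1 + 0.75/n + 2.25/n^2), stated for n >= 8."""
    if n < 8:
        raise ValueError("the procedure is stated to be valid for n >= 8")
    return a2 * (1.0 + 0.75 / n + 2.25 / n**2)


def smallest_rejecting_level(a2_star: float) -> Optional[float]:
    """Smallest listed significance level at which normality is rejected, else None."""
    rejected = [level for crit, level in DAGOSTINO_CRITICAL if a2_star > crit]
    return min(rejected) if rejected else None


if __name__ == "__main__":
    a2_star = dagostino_adjusted(1.0, 10)
    print(a2_star, smallest_rejecting_level(a2_star))
```

Returning `None` when no threshold is exceeded keeps "not rejected at any listed level" distinct from a numeric level.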

Tests for other distributions

Above, it was assumed that the variable $X_i$ was being tested for normal distribution. Any other family of distributions can be tested, but the test for each family is implemented by using a different modification of the basic test statistic, and this is referred to critical values specific to that family of distributions. The modifications of the statistic and tables of critical values are given by Stephens (1986)[2] for the exponential, extreme-value, Weibull, gamma, logistic, Cauchy, and von Mises distributions. Tests for the (two-parameter) log-normal distribution can be implemented by transforming the data using a logarithm and using the above test for normality. Details for the required modifications to the test statistic and for the critical values for the normal distribution and the exponential distribution have been published by Pearson & Hartley (1972, Table 54). Details for these distributions, with the addition of the Gumbel distribution, are also given by Shorack & Wellner (1986, p. 239). Details for the logistic distribution are given by Stephens (1979). A test for the (two-parameter) Weibull distribution can be obtained by making use of the fact that the logarithm of a Weibull variate has a Gumbel distribution.
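The log-normal case lends itself to a short sketch: take logarithms, then apply the normality test with both parameters estimated (Case 3). The code below is an illustration only. It covers the log-normal transformation and not the Weibull or other families, the name `ad_lognormal_case3` is hypothetical, and the factor 1 + 4/n − 25/n² is the Case 3 modification used in the normality section.

```python
import math
from statistics import NormalDist, mean, stdev
from typing import Sequence


def ad_lognormal_case3(sample: Sequence[float]) -> float:
    """Modified statistic A*^2 for log-normality: a normality test on log(data)."""
    if any(v <= 0 for v in sample):
        raise ValueError("log-normal data must be strictly positive")
    logs = sorted(math.log(v) for v in sample)  # transformed, ordered observations
    n = len(logs)
    if n < 2:
        raise ValueError("need at least two observations")
    mu_hat, sigma_hat = mean(logs), stdev(logs)  # both estimated from the data
    if sigma_hat == 0:
        raise ValueError("zero variance after the log transform")
    phi = NormalDist().cdf
    p = [phi((v - mu_hat) / sigma_hat) for v in logs]
    if p[0] <= 0.0 or p[-1] >= 1.0:
        raise ValueError("Phi(Y_i) equals 0 or 1: statistic undefined")
    a2 = -n - sum(
        (2 * i - 1) * (math.log(p[i - 1]) + math.log(1.0 - p[n - i]))
        for i in range(1, n + 1)
    ) / n
    return a2 * (1.0 + 4.0 / n - 25.0 / n**2)


if __name__ == "__main__":
    positive_data = [0.8, 1.3, 2.1, 0.6, 1.0, 3.4, 1.7, 0.9, 2.6, 1.2]  # made-up values
    print(ad_lognormal_case3(positive_data))
```

The result is read against the same Case 3 critical values as for the normal distribution, since the transformed data are tested for normality.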

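Non-parametric k-sample tests

Fritz Scholz and Michael A. Stephens (1987) discuss a test, based on the Anderson–Darling measure of agreement between distributions, for whether a number of random samples with possibly different sample sizes may have arisen from the same distribution, where this distribution is unspecified.[8] The R package kSamples and the Python package SciPy implement this rank test for comparing k samples, among several other such rank tests.[9][10]

For $k$ samples the statistic can be computed as follows, under the assumption that the distribution function $F_i$ of the $i$-th sample is continuous:

$$A_{kN}^2 = \frac{1}{N} \sum_{i=1}^{k} \frac{1}{n_i} \sum_{j=1}^{N-1} \frac{\left(N M_{ij} - j n_i\right)^2}{j\,(N - j)},$$

where

$n_i$ is the number of observations in the $i$-th sample,
$N$ is the total number of observations in all samples,
$Z_1 < \cdots < Z_N$ is the pooled ordered sample,
$M_{ij}$ is the number of observations in the $i$-th sample that are not greater than $Z_j$.[8]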
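The k-sample statistic can be sketched directly from its definition: for each sample i with n_i observations and each j from 1 to N − 1, count the observations M_ij of that sample not exceeding the j-th pooled order statistic Z_j, then accumulate (N M_ij − j n_i)² / (j (N − j)), divide by n_i, sum over samples, and divide by N. The function name `k_sample_ad` is hypothetical, and ties are rejected because the formula assumes continuous distributions. Library implementations may report a standardized version of the statistic, so their values need not coincide with this raw quantity.

```python
from bisect import bisect_right
from typing import Sequence


def k_sample_ad(samples: Sequence[Sequence[float]]) -> float:
    """Raw k-sample Anderson–Darling statistic A^2_{kN}, assuming no ties."""
    if len(samples) < 2 or any(len(s) == 0 for s in samples):
        raise ValueError("need at least two non-empty samples")
    pooled = sorted(v for s in samples for v in s)  # Z_1 < ... < Z_N
    big_n = len(pooled)
    if len(set(pooled)) != big_n:
        raise ValueError("ties found; this sketch assumes continuous data")
    total = 0.0
    for s in samples:
        n_i = len(s)
        s_sorted = sorted(s)
        inner = 0.0
        for j in range(1, big_n):  # j = 1, ..., N - 1
            z_j = pooled[j - 1]
            m_ij = bisect_right(s_sorted, z_j)  # observations in sample i <= Z_j
            inner += (big_n * m_ij - j * n_i) ** 2 / (j * (big_n - j))
        total += inner / n_i
    return total / big_n


if __name__ == "__main__":
    print(k_sample_ad([[1.0, 3.0], [2.0, 4.0]]))  # interleaved samples
    print(k_sample_ad([[1.0, 2.0], [3.0, 4.0]]))  # well-separated samples
```

Well-separated samples give a larger value than interleaved ones, which is the direction in which the test rejects the hypothesis of a common distribution.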


References

  1. Stephens, M. A. (1974). "EDF Statistics for Goodness of Fit and Some Comparisons". Journal of the American Statistical Association. 69 (347): 730–737. doi:10.2307/2286009. JSTOR 2286009.
  2. Stephens, M. A. (1986). "Tests Based on EDF Statistics". In D'Agostino, R. B.; Stephens, M. A. (eds.). Goodness-of-Fit Techniques. New York: Marcel Dekker. ISBN 0-8247-7487-6.
  3. Anderson, T. W.; Darling, D. A. (1952). "Asymptotic theory of certain "goodness-of-fit" criteria based on stochastic processes". Annals of Mathematical Statistics. 23 (2): 193–212. doi:10.1214/aoms/1177729437.
  4. Anderson, T. W.; Darling, D. A. (1954). "A Test of Goodness-of-Fit". Journal of the American Statistical Association. 49 (268): 765–769. doi:10.2307/2281537. JSTOR 2281537.
  5. Razali, Nornadiah; Wah, Yap Bee (2011). "Power comparisons of Shapiro–Wilk, Kolmogorov–Smirnov, Lilliefors and Anderson–Darling tests". Journal of Statistical Modeling and Analytics. 2 (1): 21–33.
  6. D'Agostino, Ralph B. (1986). "Tests for the Normal Distribution". In D'Agostino, R. B.; Stephens, M. A. (eds.). Goodness-of-Fit Techniques. New York: Marcel Dekker. ISBN 0-8247-7487-6.
  7. Marsaglia, G. (2004). "Evaluating the Anderson-Darling Distribution". Journal of Statistical Software. 9 (2): 730–737. CiteSeerX 10.1.1.686.1363. doi:10.18637/jss.v009.i02.
  8. Scholz, F. W.; Stephens, M. A. (1987). "K-sample Anderson–Darling Tests". Journal of the American Statistical Association. 82 (399): 918–924. doi:10.1080/01621459.1987.10478517.
  9. "kSamples: K-Sample Rank Tests and their Combinations". R Project.
  10. "The Anderson-Darling test for k-samples. Scipy package".


    Retrieved from "https://en.wikipedia.org/w/index.php?title=Anderson–Darling_test&oldid=1210085142"

     



    This page was last edited on 24 February 2024, at 23:04 (UTC).

    Text is available under the Creative Commons Attribution-ShareAlike License 4.0; additional terms may apply. By using this site, you agree to the Terms of Use and Privacy Policy. Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc., a non-profit organization.


