In statistics and probability theory, the nonparametric skew is a statistic occasionally used with random variables that take real values.[1] Its calculation does not require any knowledge of the form of the underlying distribution, hence the name nonparametric.
Although its use has been recommended in older textbooks,[2][3] it appears to have gone out of fashion. It has been shown to be less powerful[clarification needed] than the usual measures of skewness.[4]
It is defined as
: <math> S = \frac{ \mu - \nu }{ \sigma } </math>
where the mean (μ), median (ν) and standard deviation (σ) of the sample or population have their usual meanings.
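As a quick illustration (not taken from the cited sources; the function name is illustrative), the statistic can be computed directly from a sample:

```python
from statistics import mean, median, pstdev

def nonparametric_skew(data):
    # S = (mean - median) / standard deviation; lies between -1 and +1.
    return (mean(data) - median(data)) / pstdev(data)

# A right-skewed sample: the outlier pulls the mean above the median, so S > 0.
print(nonparametric_skew([1, 1, 2, 2, 3, 10]))
```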
It is one third of the Pearson 2 skewness coefficient and lies between −1 and +1 for any distribution.[5][6] This can also be derived from the fact that the mean lies within one standard deviation of any median.[7] These bounds were improved upon by Majindar,[8] who showed that the absolute value of this statistic is bounded by
: <math> 2 ( p q )^{ 1/2 } </math>
with
: <math> p = \Pr( X > \operatorname{E}(X) ) </math> and <math> q = \Pr( X < \operatorname{E}(X) ) </math>
where X is a random variable with finite variance, E is the expectation operator and Pr is the probability of the event occurring. When p = q = 0.5 the absolute value of this statistic is bounded by 1. With p = 0.1 and p = 0.01, the statistic's absolute value is bounded by 0.6 and 0.199 respectively.
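The three quoted bounds (1, 0.6 and 0.199) can be checked numerically; this sketch takes q = 1 − p and uses the form 2√(pq), which is inferred from those quoted values rather than quoted from Majindar directly:

```python
from math import sqrt

def majindar_bound(p, q):
    # Bound on |S| in terms of p = Pr(X > E(X)) and q = Pr(X < E(X)).
    return 2 * sqrt(p * q)

for p in (0.5, 0.1, 0.01):
    print(p, round(majindar_bound(p, 1 - p), 3))
```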
It has been shown that
: <math> \frac{ | \mu - x_q | }{ \sigma } \le \max\left( \sqrt{ \frac{ 1 - q }{ q } }, \sqrt{ \frac{ q }{ 1 - q } } \right) </math>
where x_q is the qth quantile.[7] This statistic has also been extended to distributions with infinite means.[9]
For a symmetric distribution, such as the normal distribution, its value is 0.
It is positive for right-skewed distributions and negative for left-skewed distributions. Absolute values ≥ 0.2 indicate marked skewness.
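For example, an exponential distribution with rate 1 has mean 1, median ln 2 and standard deviation 1, so its nonparametric skew is 1 − ln 2 ≈ 0.31, positive and above the 0.2 threshold, consistent with its pronounced right skew:

```python
from math import log

# Exponential(1): mean = 1, median = ln 2, standard deviation = 1.
S_exponential = (1 - log(2)) / 1
print(round(S_exponential, 3))
```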
A table of the values of the Pearson 2 skewness coefficient is available, but it gives only the 90% limits for sample sizes between 10 and 100 from a normal distribution.[4]
==Related statistics==
Assuming that the underlying distribution is symmetric, Cabilio and Masaro showed that the distribution of S is asymptotically normal.[10] The asymptotic variance depends on the underlying distribution: for the normal distribution the asymptotic variance of S√n is 0.5708.
Considering the distribution of values above and below the median, Zheng and Gastwirth have argued that[11]
: <math> \sqrt{2 n} \frac{ ( Mean - Median ) }{ \sigma } </math>
where σ is the standard deviation, is distributed as a two sample t distribution.
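A sample version of the Zheng–Gastwirth quantity √(2n)(mean − median)/σ can be sketched as follows (using the sample standard deviation to estimate σ is an assumption of this sketch):

```python
from math import sqrt
from statistics import mean, median, stdev

def zheng_gastwirth(data):
    # sqrt(2n) * (mean - median) / sigma, with sigma estimated
    # by the sample standard deviation.
    n = len(data)
    return sqrt(2 * n) * (mean(data) - median(data)) / stdev(data)

# Positive for a right-skewed sample (mean above median).
print(zheng_gastwirth([1, 2, 2, 3, 3, 3, 4, 9]))
```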
Mira studied the distribution of the difference between the mean and the median:[12]
: <math> \gamma_1 = 2 ( \mu - \nu ) </math>
a statistic that had been suggested earlier by Bonferroni.[13] If the underlying distribution is normal, γ1 itself is asymptotically normal.
Assuming a symmetric underlying distribution, a modification of S was studied by Miao, Gel and Gastwirth, who used a modification of the standard deviation to create their statistic:[14]
: <math> J = \sqrt{ \frac{ \pi }{ 2 } } \frac{ 1 }{ n } \sum | X_i - \nu | </math>
where X_i are the sample values, | | is the absolute value, ν is the sample median and the sum is taken over all n sample values. The test statistic was
: <math> T = \frac{ \mu - \nu }{ J } </math>
The scaled statistic T√n is asymptotically normal with a mean of zero. The asymptotic variance depends on the underlying distribution: for the normal distribution the variance of T√n is 0.5708 and for the t distribution with three degrees of freedom it is 0.9689.
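A sketch of the Miao–Gel–Gastwirth construction; the exact form of the modified scale J used below, √(π/2) times the mean absolute deviation about the sample median, is our reading of their statistic and should be checked against the original paper:

```python
from math import pi, sqrt
from statistics import mean, median

def mgg_statistic(data):
    # J: the standard deviation replaced by a multiple of the mean
    # absolute deviation about the sample median (assumed form).
    med = median(data)
    J = sqrt(pi / 2) * sum(abs(x - med) for x in data) / len(data)
    # T: the analogue of S with J as the scale estimate.
    return (mean(data) - med) / J

print(mgg_statistic([1, 2, 2, 3, 3, 3, 4, 9]))
```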
In 1895 Pearson first suggested measuring skewness by standardizing the difference between the mean and the mode:
: <math> \frac{ \mu - \text{mode} }{ \sigma } </math>
Estimating the population mode from sample data may be difficult, but the difference between the mean and the mode for many distributions is approximately three times the difference between the mean and the median,[15] which suggested to Pearson a second skewness coefficient:
: <math> \frac{ 3 ( \mu - \nu ) }{ \sigma } </math>
Bowley dropped the factor of 3 from this formula in 1901, leading to the nonparametric skew statistic.
The rule that the difference between the mean and the mode is three times that between the mean and the median is due to Pearson, who discovered it while investigating his Type 3 distributions. It is often applied to slightly non-symmetric distributions that resemble a normal distribution, but it is not always true: in general, the mode, median and mean may appear in any order.[16][17]
It is, however, known that for unimodal distributions the mode always lies within √3 standard deviations of the mean.
A simple example illustrating this is the binomial distribution with n = 10 and p = 0.09.[18] The mean (0.9) is to the left of the median (1) but the skew (0.906) as defined by the third standardized moment is positive. The distribution when plotted has a long right tail.
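The binomial figures quoted above can be reproduced directly (a sketch using the standard binomial formulas for the mean, the skewness and the cumulative distribution):

```python
from math import comb, sqrt

n, p = 10, 0.09
mean_ = n * p                               # mean of a binomial = np
skew = (1 - 2 * p) / sqrt(n * p * (1 - p))  # third standardized moment

# Median: the smallest k with Pr(X <= k) >= 0.5.
cdf, median_ = 0.0, None
for k in range(n + 1):
    cdf += comb(n, k) * p**k * (1 - p)**(n - k)
    if cdf >= 0.5:
        median_ = k
        break

print(round(mean_, 3), median_, round(skew, 3))
```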
Van Zwet derived a sufficient condition for an ordering of the mean, median and mode to hold. The inequality
: <math> \mu \le \nu \le \text{mode} </math>
holds if
: <math> F( \nu - x ) + F( \nu + x ) \ge 1 </math>
for all x, where F is the cumulative distribution function of the distribution and ν is the median.[19]
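As an illustration of the sufficient condition (the specific form F(ν − x) + F(ν + x) ≥ 1 used here is our reconstruction of van Zwet's condition), take the negative of an exponential(1) variable, a left-skewed distribution with mean −1, median −ln 2 and mode 0:

```python
from math import exp, log

def F(t):
    # CDF of -Exp(1): F(t) = e^t for t <= 0, and 1 for t > 0.
    return exp(t) if t <= 0 else 1.0

med = -log(2)
# Check F(med - x) + F(med + x) >= 1 on a grid of x >= 0 (with a small
# floating point tolerance); the condition holds here, consistent with
# the ordering mean <= median <= mode for this distribution.
holds = all(F(med - x) + F(med + x) >= 1 - 1e-12
            for x in [i / 100 for i in range(1000)])
print(holds)
```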
In 1964 van Zwet introduced a series of axioms for ordering measures of skewness.[20] The nonparametric skew does not satisfy these axioms.