In statistics and probability theory, the nonparametric skew is a statistic occasionally used with random variables that take real values.[1] Its calculation does not require any knowledge of the form of the underlying distribution, hence the name nonparametric.
Although its use has been recommended in older textbooks,[2][3] it appears to have gone out of fashion. It has been shown to be less powerful[clarification needed] than the usual measures of skewness.[4]
It is defined as
: <math> S = \frac{ \mu - \nu }{ \sigma } </math>
where the mean (μ), median (ν) and standard deviation (σ) of the sample or population have their usual meanings.
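As a quick illustration (not taken from the cited sources; the function name is illustrative), the statistic can be computed directly from a sample:

```python
from statistics import mean, median, pstdev

def nonparametric_skew(data):
    # S = (mean - median) / standard deviation; lies between -1 and +1.
    return (mean(data) - median(data)) / pstdev(data)

# A right-skewed sample: the outlier pulls the mean above the median, so S > 0.
print(nonparametric_skew([1, 1, 2, 2, 3, 10]))
```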
It is one third of the Pearson 2 skewness coefficient and lies between −1 and +1 for any distribution.[5][6] This can also be derived from the fact that the mean lies within one standard deviation of any median.[7] These bounds were improved upon by Majindar,[8] who showed that the absolute value of this statistic is bounded by
: <math> 2 ( p q )^{ 1/2 } </math>
with
: <math> p = \Pr( X > \operatorname{E}(X) ) </math> and <math> q = \Pr( X < \operatorname{E}(X) ) </math>
where X is a random variable with finite variance, E is the expectation operator and Pr is the probability of the event occurring. When p = q = 0.5 the absolute value of this statistic is bounded by 1. With p = 0.1 and p = 0.01, the statistic's absolute value is bounded by 0.6 and 0.199 respectively.
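The three quoted bounds (1, 0.6 and 0.199) can be checked numerically; this sketch takes q = 1 − p and uses the form 2√(pq), which is inferred from those quoted values rather than quoted from Majindar directly:

```python
from math import sqrt

def majindar_bound(p, q):
    # Bound on |S| in terms of p = Pr(X > E(X)) and q = Pr(X < E(X)).
    return 2 * sqrt(p * q)

for p in (0.5, 0.1, 0.01):
    print(p, round(majindar_bound(p, 1 - p), 3))
```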
It has been shown that
: <math> \frac{ | \mu - x_q | }{ \sigma } \le \max\left( \sqrt{ \frac{ 1 - q }{ q } }, \sqrt{ \frac{ q }{ 1 - q } } \right) </math>
where x_q is the qth quantile.[7] This statistic has also been extended to distributions with infinite means.[9]
For a symmetric distribution, such as the normal distribution, its value is 0.
It is positive for right-skewed distributions and negative for left-skewed distributions. Absolute values ≥ 0.2 indicate marked skewness.
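For example, an exponential distribution with rate 1 has mean 1, median ln 2 and standard deviation 1, so its nonparametric skew is 1 − ln 2 ≈ 0.31, positive and above the 0.2 threshold, consistent with its pronounced right skew:

```python
from math import log

# Exponential(1): mean = 1, median = ln 2, standard deviation = 1.
S_exponential = (1 - log(2)) / 1
print(round(S_exponential, 3))
```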
A table of the values of the Pearson 2 skewness coefficient is available, but it gives only the 90% limits for sample sizes between 10 and 100 from a normal distribution.[4]
==Related statistics==
Assuming that the underlying distribution is symmetric, Cabilio and Masaro showed that the distribution of S is asymptotically normal.[10] The asymptotic variance depends on the underlying distribution: for the normal distribution the asymptotic variance of S√n is 0.5708.
Considering the distribution of values above and below the median, Zheng and Gastwirth have argued that[11]
: <math> \sqrt{2 n} \frac{ ( Mean - Median ) }{ \sigma } </math>
where σ is the standard deviation, is distributed as a two sample t distribution.
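A sample version of the Zheng–Gastwirth quantity √(2n)(mean − median)/σ can be sketched as follows (using the sample standard deviation to estimate σ is an assumption of this sketch):

```python
from math import sqrt
from statistics import mean, median, stdev

def zheng_gastwirth(data):
    # sqrt(2n) * (mean - median) / sigma, with sigma estimated
    # by the sample standard deviation.
    n = len(data)
    return sqrt(2 * n) * (mean(data) - median(data)) / stdev(data)

# Positive for a right-skewed sample (mean above median).
print(zheng_gastwirth([1, 2, 2, 3, 3, 3, 4, 9]))
```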
Mira studied the distribution of the difference between the mean and the median:[12]
: <math> \gamma_1 = 2 ( \mu - \nu ) </math>
a statistic that had been suggested earlier by Bonferroni.[13] If the underlying distribution is normal, γ1 itself is asymptotically normal.
Assuming a symmetric underlying distribution, a modification of S was studied by Miao, Gel and Gastwirth, who used a modification of the standard deviation to create their statistic:[14]
: <math> J = \sqrt{ \frac{ \pi }{ 2 } } \frac{ 1 }{ n } \sum | X_i - \nu | </math>
where X_i are the sample values, | | is the absolute value, ν is the sample median and the sum is taken over all n sample values. The test statistic was
: <math> T = \frac{ \mu - \nu }{ J } </math>
The scaled statistic T√n is asymptotically normal with a mean of zero. The asymptotic variance depends on the underlying distribution: for the normal distribution the variance of T√n is 0.5708 and for the t distribution with three degrees of freedom it is 0.9689.
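A sketch of the Miao–Gel–Gastwirth construction; the exact form of the modified scale J used below, √(π/2) times the mean absolute deviation about the sample median, is our reading of their statistic and should be checked against the original paper:

```python
from math import pi, sqrt
from statistics import mean, median

def mgg_statistic(data):
    # J: the standard deviation replaced by a multiple of the mean
    # absolute deviation about the sample median (assumed form).
    med = median(data)
    J = sqrt(pi / 2) * sum(abs(x - med) for x in data) / len(data)
    # T: the analogue of S with J as the scale estimate.
    return (mean(data) - med) / J

print(mgg_statistic([1, 2, 2, 3, 3, 3, 4, 9]))
```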
In 1895 Pearson first suggested measuring skewness by standardizing the difference between the mean and the mode:
: <math> \frac{ \mu - \text{mode} }{ \sigma } </math>
Estimating the population mode from sample data may be difficult, but the difference between the mean and the mode for many distributions is approximately three times the difference between the mean and the median,[15] which suggested to Pearson a second skewness coefficient:
: <math> \frac{ 3 ( \mu - \nu ) }{ \sigma } </math>
Bowley dropped the factor of 3 from this formula in 1901, leading to the nonparametric skew statistic.
The rule that the difference between the mean and the mode is three times that between the mean and the median is due to Pearson, who discovered it while investigating his Type 3 distributions. It is often applied to slightly non-symmetric distributions that resemble a normal distribution, but it is not always true: in general, the mode, median and mean may appear in any order.[16][17]
It is, however, known that for unimodal distributions the mode always lies within √3 standard deviations of the mean.
A simple example illustrating this is the binomial distribution with n = 10 and p = 0.09.[18] The mean (0.9) is to the left of the median (1) but the skew (0.906) as defined by the third standardized moment is positive. The distribution when plotted has a long right tail.
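The binomial figures quoted above can be reproduced directly (a sketch using the standard binomial formulas for the mean, the skewness and the cumulative distribution):

```python
from math import comb, sqrt

n, p = 10, 0.09
mean_ = n * p                               # mean of a binomial = np
skew = (1 - 2 * p) / sqrt(n * p * (1 - p))  # third standardized moment

# Median: the smallest k with Pr(X <= k) >= 0.5.
cdf, median_ = 0.0, None
for k in range(n + 1):
    cdf += comb(n, k) * p**k * (1 - p)**(n - k)
    if cdf >= 0.5:
        median_ = k
        break

print(round(mean_, 3), median_, round(skew, 3))
```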
Van Zwet derived a sufficient condition for an ordering of the mean, median and mode to hold. The inequality
: <math> \mu \le \nu \le \text{mode} </math>
holds if
: <math> F( \nu - x ) + F( \nu + x ) \ge 1 </math>
for all x, where F is the cumulative distribution function of the distribution and ν is the median.[19]
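As an illustration of the sufficient condition (the specific form F(ν − x) + F(ν + x) ≥ 1 used here is our reconstruction of van Zwet's condition), take the negative of an exponential(1) variable, a left-skewed distribution with mean −1, median −ln 2 and mode 0:

```python
from math import exp, log

def F(t):
    # CDF of -Exp(1): F(t) = e^t for t <= 0, and 1 for t > 0.
    return exp(t) if t <= 0 else 1.0

med = -log(2)
# Check F(med - x) + F(med + x) >= 1 on a grid of x >= 0 (with a small
# floating point tolerance); the condition holds here, consistent with
# the ordering mean <= median <= mode for this distribution.
holds = all(F(med - x) + F(med + x) >= 1 - 1e-12
            for x in [i / 100 for i in range(1000)])
print(holds)
```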
In 1964 van Zwet introduced a series of axioms for ordering measures of skewness.[20] The nonparametric skew does not satisfy these axioms.