U-statistic

From Wikipedia, the free encyclopedia

In statistical theory, a U-statistic is a member of a class of statistics, each defined as the average of a given function applied to all tuples of a fixed size drawn from the sample. The letter "U" stands for unbiased. In elementary statistics, U-statistics arise naturally in producing minimum-variance unbiased estimators.

The theory of U-statistics allows a minimum-variance unbiased estimator to be derived from each unbiased estimator of an estimable parameter (alternatively, statistical functional) for large classes of probability distributions.^[1]^[2] An estimable parameter is a measurable function of the population's cumulative probability distribution; for example, for every probability distribution, the population median is an estimable parameter. The theory of U-statistics applies to general classes of probability distributions.

History

Many statistics originally derived for particular parametric families have been recognized as U-statistics for general distributions. In non-parametric statistics, the theory of U-statistics is used to establish results for statistical procedures (such as estimators and tests) relating to the asymptotic normality and to the variance (in finite samples) of such quantities.^[3] The theory has been used to study more general statistics as well as stochastic processes, such as random graphs.^[4]^[5]^[6]

Suppose that a problem involves independent and identically-distributed random variables and that estimation of a certain parameter is required. Suppose that a simple unbiased estimate can be constructed based on only a few observations: this defines the basic estimator based on a given number of observations. For example, a single observation is itself an unbiased estimate of the mean and a pair of observations can be used to derive an unbiased estimate of the variance. The U-statistic based on this estimator is defined as the average (across all combinatorial selections of the given size from the full set of observations) of the basic estimator applied to the sub-samples.
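The construction described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the helper name `u_statistic` and the data values are assumptions made for the example. It averages a basic estimator over every sub-sample of the given size and compares the result with the standard library's sample mean and sample variance.

```python
from itertools import combinations
from statistics import mean, variance


def u_statistic(kernel, r, data):
    """Average a symmetric kernel of r arguments over all size-r sub-samples.

    `u_statistic` is a hypothetical helper name used only for this sketch.
    """
    values = [kernel(*subset) for subset in combinations(data, r)]
    return sum(values) / len(values)


# Illustrative data (made up for the example).
data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]

# A single observation is an unbiased estimate of the mean (r = 1).
u_mean = u_statistic(lambda x: x, 1, data)

# Half the squared difference of a pair is an unbiased estimate
# of the variance (r = 2).
u_var = u_statistic(lambda x, y: (x - y) ** 2 / 2, 2, data)

print(u_mean, mean(data))
print(u_var, variance(data))  # statistics.variance uses the divisor n - 1
```

Enumerating all sub-samples costs on the order of $\binom{n}{r}$ kernel evaluations, so this direct approach is only practical for small samples or small $r$.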

Pranab K. Sen (1992) provides a review of the paper by Wassily Hoeffding (1948), which introduced U-statistics and set out the theory relating to them, and in doing so Sen outlines the importance U-statistics have in statistical theory. Sen says,^[7] “The impact of Hoeffding (1948) is overwhelming at the present time and is very likely to continue in the years to come.” Note that the theory of U-statistics is not limited to^[8] the case of independent and identically-distributed random variables or to scalar random variables.^[9]

Definition

The term U-statistic, due to Hoeffding (1948), is defined as follows.

Let $K$ be either the real or complex numbers, and let $f\colon (K^{d})^{r}\to K$ be a $K$-valued function of $r$ variables, each $d$-dimensional. For each $n\geq r$ the associated U-statistic $f_{n}\colon (K^{d})^{n}\to K$ is defined to be the average of the values $f(x_{i_{1}},\dotsc ,x_{i_{r}})$ over the set $I_{r,n}$ of $r$-tuples of indices from $\{1,2,\dotsc ,n\}$ with distinct entries. Formally,

$$f_{n}(x_{1},\dotsc ,x_{n})={\frac {1}{\prod _{i=0}^{r-1}(n-i)}}\sum _{(i_{1},\dotsc ,i_{r})\in I_{r,n}}f(x_{i_{1}},\dotsc ,x_{i_{r}})$$

In particular, if $f$ is symmetric the above is simplified to

$$f_{n}(x_{1},\dotsc ,x_{n})={\frac {1}{\binom {n}{r}}}\sum _{(i_{1},\dotsc ,i_{r})\in J_{r,n}}f(x_{i_{1}},\dotsc ,x_{i_{r}})$$

where now $J_{r,n}$ denotes the subset of $I_{r,n}$ of increasing tuples.
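The two forms of the definition can be compared numerically. The sketch below is illustrative only: the function names `u_stat_ordered` and `u_stat_increasing`, the two kernels, and the data are assumptions introduced for the example. It averages over all tuples with distinct indices ($I_{r,n}$) and over increasing tuples only ($J_{r,n}$), using exact rational arithmetic so that equality can be checked without rounding error.

```python
from fractions import Fraction
from itertools import combinations, permutations


def u_stat_ordered(f, r, xs):
    """Average f over all r-tuples with distinct indices (the set I_{r,n}).

    itertools.permutations works by position, so repeated values in xs
    are still treated as distinct observations.
    """
    terms = [f(*t) for t in permutations(xs, r)]
    return sum(terms) / len(terms)  # len(terms) == n (n-1) ... (n-r+1)


def u_stat_increasing(f, r, xs):
    """Average f over r-tuples with increasing indices (the set J_{r,n})."""
    terms = [f(*t) for t in combinations(xs, r)]
    return sum(terms) / len(terms)  # len(terms) == binomial(n, r)


def symmetric_kernel(x, y):
    return abs(x - y)


def asymmetric_kernel(x, y):
    return x * x * y  # changes value when its arguments are swapped


xs = [Fraction(v) for v in (1, 3, 4, 8)]

# For a symmetric kernel the two averages coincide.
same = u_stat_ordered(symmetric_kernel, 2, xs) == u_stat_increasing(symmetric_kernel, 2, xs)

# For an asymmetric kernel only the ordered form is the U-statistic,
# and it generally differs from the average over increasing tuples.
differs = u_stat_ordered(asymmetric_kernel, 2, xs) != u_stat_increasing(asymmetric_kernel, 2, xs)

# The ordered form does not depend on the order of the observations.
shuffled = [xs[2], xs[0], xs[3], xs[1]]
invariant = u_stat_ordered(asymmetric_kernel, 2, xs) == u_stat_ordered(asymmetric_kernel, 2, shuffled)

print(same, differs, invariant)
```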

Each U-statistic $f_{n}$ is necessarily a symmetric function.

U-statistics are very natural in statistical work, particularly in Hoeffding's context of independent and identically distributed random variables, or more generally for exchangeable sequences, such as in simple random sampling from a finite population, where the defining property is termed ‘inheritance on the average’.

Fisher's k-statistics and Tukey's polykays are examples of homogeneous polynomial U-statistics (Fisher, 1929; Tukey, 1950).

For a simple random sample $\varphi$ of size $n$ taken from a population of size $N$, the U-statistic has the property that the average of the sample values $f_{n}(x_{\varphi})$ over all possible samples is exactly equal to the population value $f_{N}(x)$.
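This averaging property can be checked by brute force on a small finite population. The following is a sketch under stated assumptions: the population values, the sample size, the choice of the variance kernel, and the helper name `u_statistic` are all made up for illustration. Exact fractions are used so the comparison is an equality rather than an approximation.

```python
from fractions import Fraction
from itertools import combinations


def u_statistic(kernel, r, xs):
    """Average a symmetric kernel over all size-r subsets of xs (hypothetical helper)."""
    terms = [kernel(*t) for t in combinations(xs, r)]
    return sum(terms) / len(terms)


def variance_kernel(x, y):
    return (x - y) ** 2 / 2


# Illustrative finite population of size N = 7 and sample size n = 4.
population = [Fraction(v) for v in (1, 2, 2, 5, 7, 11, 13)]
n = 4

# Value of the U-statistic on every possible simple random sample of size n.
sample_values = [u_statistic(variance_kernel, 2, s) for s in combinations(population, n)]

# Averaging over all samples recovers the population value exactly.
average_over_samples = sum(sample_values) / len(sample_values)
population_value = u_statistic(variance_kernel, 2, population)

print(average_over_samples, population_value)
```

The equality holds because every pair of population members appears in the same number of samples, so each kernel value receives equal weight in the overall average.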

Examples

Some examples: If $f(x)=x$, the U-statistic $f_{n}(x)={\bar {x}}_{n}=(x_{1}+\cdots +x_{n})/n$ is the sample mean.

If $f(x_{1},x_{2})=|x_{1}-x_{2}|$ , the U-statistic is the mean pairwise deviation $f_{n}(x_{1},\ldots ,x_{n})=2/(n(n-1))\sum _{i>j}|x_{i}-x_{j}|$ , defined for $n\geq 2$ .

If $f(x_{1},x_{2})=(x_{1}-x_{2})^{2}/2$, the U-statistic is the sample variance $f_{n}(x)=\sum (x_{i}-{\bar {x}}_{n})^{2}/(n-1)$ with divisor $n-1$, defined for $n\geq 2$.

The third $k$-statistic $k_{3,n}(x)=\sum (x_{i}-{\bar {x}}_{n})^{3}n/((n-1)(n-2))$, the sample skewness defined for $n\geq 3$, is a U-statistic.

The following case highlights an important point. If $f(x_{1},x_{2},x_{3})$ is the median of three values, $f_{n}(x_{1},\ldots ,x_{n})$ is not the median of $n$ values. However, it is a minimum variance unbiased estimate of the expected value of the median of three values, not the median of the population. Similar estimates play a central role where the parameters of a family of probability distributions are being estimated by probability weighted moments or L-moments.
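The distinction drawn in the median-of-three example can be seen on a small data set. This is an illustrative sketch: the data values and the names `u_statistic` and `median_of_three` are assumptions chosen for the example, with the data deliberately skewed so that the two quantities differ.

```python
from itertools import combinations
from statistics import median


def u_statistic(kernel, r, xs):
    """Average a symmetric kernel over all size-r subsets of xs (hypothetical helper)."""
    terms = [kernel(*t) for t in combinations(xs, r)]
    return sum(terms) / len(terms)


def median_of_three(a, b, c):
    return sorted((a, b, c))[1]


# Illustrative, deliberately skewed data.
data = [1.0, 2.0, 3.0, 10.0, 100.0]

# U-statistic with the median-of-three kernel: the average of the
# medians of all 10 triples drawn from the 5 observations.
u_med3 = u_statistic(median_of_three, 3, data)

# Ordinary sample median of all 5 observations.
sample_median = median(data)

print(u_med3, sample_median)
```

Here the U-statistic targets the expected median of three observations, which for a skewed distribution need not coincide with the population median that the sample median targets.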

Notes

^ Cox & Hinkley (1974), p. 200, p. 258

^ Hoeffding (1948), between Eqs. (4.3) and (4.4)

^ Sen (1992)

^ Page 508 in Koroljuk, V. S.; Borovskich, Yu. V. (1994). Theory of U-statistics. Mathematics and its Applications. Vol. 273 (Translated by P. V. Malyshev and D. V. Malyshev from the 1989 Russian original ed.). Dordrecht: Kluwer Academic Publishers Group. pp. x+552. ISBN 0-7923-2608-3. MR 1472486.

^ Pages 381–382 in Borovskikh, Yu. V. (1996). U-statistics in Banach spaces. Utrecht: VSP. pp. xii+420. ISBN 90-6764-200-2. MR 1419498.

^ Page xii in Kwapień, Stanisław; Woyczyński, Wojbor A. (1992). Random series and stochastic integrals: Single and multiple. Probability and its Applications. Boston, MA: Birkhäuser Boston, Inc. pp. xvi+360. ISBN 0-8176-3572-6. MR 1167198.

^ Sen (1992), p. 307

^ Sen (1992), p. 306

^ Borovskikh's last chapter discusses U-statistics for exchangeable random elements taking values in a vector space (separable Banach space).

References

Borovskikh, Yu. V. (1996). U-statistics in Banach spaces. Utrecht: VSP. pp. xii+420. ISBN 90-6764-200-2. MR 1419498.
Cox, D. R., Hinkley, D. V. (1974) Theoretical statistics. Chapman and Hall. ISBN 0-412-12420-3
Fisher, R. A. (1929) Moments and product moments of sampling distributions. Proceedings of the London Mathematical Society, Series 2, 30:199–238.
Hoeffding, W. (1948) A class of statistics with asymptotically normal distributions. Annals of Mathematical Statistics, 19:293–325. (Partially reprinted in: Kotz, S., Johnson, N. L. (1992) Breakthroughs in Statistics, Vol I, pp 308–334. Springer-Verlag. ISBN 0-387-94037-5)
Koroljuk, V. S.; Borovskich, Yu. V. (1994). Theory of U-statistics. Mathematics and its Applications. Vol. 273 (Translated by P. V. Malyshev and D. V. Malyshev from the 1989 Russian original ed.). Dordrecht: Kluwer Academic Publishers Group. pp. x+552. ISBN 0-7923-2608-3. MR 1472486.
Lee, A. J. (1990) U-Statistics: Theory and Practice. Marcel Dekker, New York. pp. 320. ISBN 0-8247-8253-4
Sen, P. K. (1992) Introduction to Hoeffding (1948) A Class of Statistics with Asymptotically Normal Distribution. In: Kotz, S., Johnson, N. L. Breakthroughs in Statistics, Vol I, pp 299–307. Springer-Verlag. ISBN 0-387-94037-5.
Serfling, Robert J. (1980). Approximation theorems of mathematical statistics. New York: John Wiley and Sons. ISBN 0-471-02403-1.
Tukey, J. W. (1950). "Some Sampling Simplified". Journal of the American Statistical Association. 45 (252): 501–519. doi:10.1080/01621459.1950.10501142.
Halmos, P. (1946). "The Theory of Unbiased Estimation". Annals of Mathematical Statistics. 17 (1): 34–43. doi:10.1214/aoms/1177731020.

Retrieved from "https://en.wikipedia.org/w/index.php?title=U-statistic&oldid=1194863243". This page was last edited on 11 January 2024, at 03:20 (UTC). Text is available under the Creative Commons Attribution-ShareAlike License 4.0; additional terms may apply.