Empirical measure
From Wikipedia, the free encyclopedia
The motivation for studying empirical measures is that it is often impossible to know the true underlying probability measure $P$. We collect observations $X_1, X_2, \dots, X_n$ and compute relative frequencies. We can estimate $P$, or a related distribution function $F$, by means of the empirical measure or the empirical distribution function, respectively. These are uniformly good estimates under certain conditions. Theorems in the area of empirical processes provide rates of this convergence.
Definition
Let $X_1, X_2, \dots$ be a sequence of independent identically distributed random variables with values in the state space $S$ with probability distribution $P$.
Definition

The empirical measure $P_n$ is defined for measurable subsets of $S$ and given by
$$P_n(A) = \frac{1}{n} \sum_{i=1}^{n} I_A(X_i) = \frac{1}{n} \sum_{i=1}^{n} \delta_{X_i}(A)$$
where $I_A$ is the indicator function and $\delta_X$ is the Dirac measure.
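As a concrete sketch, the definition above amounts to counting the fraction of samples that land in $A$. The following Python snippet is illustrative only; NumPy, the uniform distribution, and the choice $A = [0, 0.3]$ are assumptions, not part of the article:

```python
import numpy as np

def empirical_measure(samples, indicator):
    """P_n(A) = (1/n) * sum of I_A(X_i): the fraction of samples lying in A."""
    return float(np.mean([indicator(x) for x in samples]))

# Illustrative example: X_i ~ Uniform(0, 1) and A = [0, 0.3], so P(A) = 0.3.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=10_000)
p_n = empirical_measure(x, lambda t: 0.0 <= t <= 0.3)
```

With $n = 10{,}000$ samples, $P_n(A)$ is close to $P(A) = 0.3$.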
Properties

For a fixed measurable set $A$, $nP_n(A)$ is a binomial random variable with mean $nP(A)$ and variance $nP(A)(1 - P(A))$.

For a fixed partition $A_i$ of $S$, the random variables $Y_i = nP_n(A_i)$ form a multinomial distribution with event probabilities $P(A_i)$. The covariance matrix of this multinomial distribution is
$$\operatorname{Cov}(Y_i, Y_j) = nP(A_i)(\delta_{ij} - P(A_j)).$$
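A quick numerical check of the multinomial property is possible by binning samples over a partition. The partition, sample size, and uniform distribution below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 20_000
x = rng.uniform(0.0, 1.0, size=n)

# Partition of S = [0, 1) into A_1 = [0, 0.2), A_2 = [0.2, 0.7), A_3 = [0.7, 1).
edges = [0.0, 0.2, 0.7, 1.0]
counts = np.histogram(x, bins=edges)[0]  # Y_i = n * P_n(A_i)
p = np.diff(edges)                       # P(A_i) = (0.2, 0.5, 0.3)

# Each Y_i is Binomial(n, P(A_i)), so counts / n should be close to p.
deviations = np.abs(counts / n - p)
```

The counts always sum to $n$ (the partition is exhaustive), and each relative frequency is close to its event probability.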
Definition

$\bigl(P_n(c)\bigr)_{c \in \mathcal{C}}$ is the empirical measure indexed by $\mathcal{C}$, a collection of measurable subsets of $S$.
To generalize this notion further, observe that the empirical measure $P_n$ maps measurable functions $f : S \to \mathbb{R}$ to their empirical mean,
$$f \mapsto P_n f = \int_S f \, dP_n = \frac{1}{n} \sum_{i=1}^{n} f(X_i).$$
In particular, the empirical measure of $A$ is simply the empirical mean of the indicator function, $P_n(A) = P_n I_A$.
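The mapping $f \mapsto P_n f$ is just the sample mean of $f(X_i)$, and taking $f = I_A$ recovers $P_n(A)$. A minimal sketch, where the distribution and the test functions are chosen purely for illustration:

```python
import numpy as np

def empirical_mean(samples, f):
    """P_n f = integral of f dP_n = (1/n) * sum of f(X_i)."""
    return float(np.mean(f(np.asarray(samples))))

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, size=50_000)

# For X ~ Uniform(0, 1) and f(t) = t^2, the true mean is E f = 1/3.
pn_f = empirical_mean(x, lambda t: t ** 2)

# The empirical measure of A = [0, 0.5] is the empirical mean of I_A.
pn_A = empirical_mean(x, lambda t: (t <= 0.5).astype(float))
```

Both quantities land close to their population values, $1/3$ and $1/2$ respectively.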
For a fixed measurable function $f$, $P_n f$ is a random variable with mean $\mathbb{E}f$ and variance $\frac{1}{n} \mathbb{E}(f - \mathbb{E}f)^2$.
By the strong law of large numbers, $P_n(A)$ converges to $P(A)$ almost surely for fixed $A$. Similarly, $P_n f$ converges to $\mathbb{E}f$ almost surely for a fixed measurable function $f$. The problem of uniform convergence of $P_n$ to $P$ was open until Vapnik and Chervonenkis solved it in 1968.[1]
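The almost-sure convergence for a fixed set can be seen numerically by letting $n$ grow along a single sample path; the setup below (uniform samples, $A = [0, 0.5]$) is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.uniform(0.0, 1.0, size=100_000)

# Running values of P_n(A) for A = [0, 0.5] along one sample path.
inside = (x <= 0.5).astype(float)
running = np.cumsum(inside) / np.arange(1, len(x) + 1)

# |P_n(A) - P(A)| at a few increasing values of n.
errors = {n: abs(running[n - 1] - 0.5) for n in (100, 10_000, 100_000)}
```

For this path the error at $n = 100{,}000$ is tiny, consistent with $P_n(A) \to P(A)$.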
If the class $\mathcal{C}$ (or $\mathcal{F}$) is Glivenko–Cantelli with respect to $P$, then $P_n$ converges to $P$ uniformly over $c \in \mathcal{C}$ (or $f \in \mathcal{F}$). In other words, with probability 1 we have
$$\|P_n - P\|_{\mathcal{C}} = \sup_{c \in \mathcal{C}} |P_n(c) - P(c)| \to 0,$$
$$\|P_n - P\|_{\mathcal{F}} = \sup_{f \in \mathcal{F}} |P_n f - \mathbb{E}f| \to 0.$$
Empirical distribution function
The empirical distribution function provides an example of empirical measures. For real-valued i.i.d. random variables $X_1, \dots, X_n$ it is given by
$$F_n(x) = P_n((-\infty, x]) = P_n I_{(-\infty, x]}.$$
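The empirical distribution function can be sketched directly from sorted samples: $F_n(x)$ is the fraction of sample points at most $x$. Python/NumPy, the uniform example, and the evaluation grid are assumptions for illustration:

```python
import numpy as np

def edf(samples):
    """Return F_n as a callable: F_n(x) = P_n((-inf, x])."""
    xs = np.sort(np.asarray(samples))
    n = len(xs)
    def F_n(x):
        # Count of sample points <= x (vectorized), divided by n.
        return np.searchsorted(xs, x, side="right") / n
    return F_n

rng = np.random.default_rng(3)
samples = rng.uniform(0.0, 1.0, size=10_000)
F_n = edf(samples)

# For Uniform(0, 1) the true CDF is F(x) = x; approximate the sup distance
# sup_x |F_n(x) - F(x)| on a fine grid (a grid approximation, not the exact sup).
grid = np.linspace(0.0, 1.0, 1001)
ks = float(np.max(np.abs(F_n(grid) - grid)))
```

The grid-approximated Kolmogorov–Smirnov distance is small for $n = 10{,}000$, in line with the uniform convergence stated below.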
In this case, empirical measures are indexed by the class $\mathcal{C} = \{(-\infty, x] : x \in \mathbb{R}\}$. It has been shown that $\mathcal{C}$ is a uniform Glivenko–Cantelli class; in particular,
$$\sup_F \|F_n - F\|_{\infty} \to 0$$
with probability 1.
References
^ Vapnik, V.; Chervonenkis, A. (1968). "Uniform convergence of frequencies of occurrence of events to their probabilities". Dokl. Akad. Nauk SSSR. 181.
Further reading [ edit ]
Retrieved from "https://en.wikipedia.org/w/index.php?title=Empirical_measure&oldid=1204999024"
This page was last edited on 8 February 2024, at 15:56 (UTC).