
Empirical Bayes method


Empirical Bayes methods are procedures for statistical inference in which the prior probability distribution is estimated from the data. This approach stands in contrast to standard Bayesian methods, for which the prior distribution is fixed before any data are observed. Despite this difference in perspective, empirical Bayes may be viewed as an approximation to a fully Bayesian treatment of a hierarchical model wherein the parameters at the highest level of the hierarchy are set to their most likely values, instead of being integrated out.^[1] Empirical Bayes, also known as maximum marginal likelihood,^[2] represents a convenient approach for setting hyperparameters, but has been mostly supplanted by fully Bayesian hierarchical analyses since the 2000s with the increasing availability of well-performing computation techniques. It is still commonly used, however, for variational methods in deep learning, such as variational autoencoders, where latent variable spaces are high-dimensional.

Introduction

Empirical Bayes methods can be seen as an approximation to a fully Bayesian treatment of a hierarchical Bayes model.

For example, in a two-stage hierarchical Bayes model, observed data $y=\{y_{1},y_{2},\dots ,y_{n}\}$ are assumed to be generated from an unobserved set of parameters $\theta =\{\theta _{1},\theta _{2},\dots ,\theta _{n}\}$ according to a probability distribution $p(y\mid \theta )$. In turn, the parameters $\theta$ can be considered samples drawn from a population characterised by hyperparameters $\eta$ according to a probability distribution $p(\theta \mid \eta )$. In the hierarchical Bayes model, though not in the empirical Bayes approximation, the hyperparameters $\eta$ are themselves considered to be drawn from an unparameterized distribution $p(\eta )$.

Information about a particular quantity of interest $\theta _{i}\;$ therefore comes not only from the properties of those data $y$ that directly depend on it, but also from the properties of the population of parameters $\theta \;$ as a whole, inferred from the data as a whole, summarised by the hyperparameters $\eta \;$ .

Using Bayes' theorem,

p(\theta \mid y)={\frac {p(y\mid \theta )p(\theta )}{p(y)}}={\frac {p(y\mid \theta )}{p(y)}}\int p(\theta \mid \eta )p(\eta )\,d\eta \,.

In general, this integral will not be tractable analytically or symbolically and must be evaluated by numerical methods. Stochastic (random) or deterministic approximations may be used. Example stochastic methods are Markov chain Monte Carlo and Monte Carlo sampling. Deterministic approximations include numerical quadrature.
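As a minimal sketch of the stochastic route, the integral $\int p(\theta \mid \eta )p(\eta )\,d\eta$ can be approximated by averaging $p(\theta \mid \eta )$ over draws of $\eta$ from $p(\eta )$. The two distributions below are hypothetical and are not taken from the article; they were chosen only because the integral then has the closed form $1/(1+\theta )^{2}$ to compare against. The function names are likewise illustrative.

```python
import math
import random


def mc_prior_density(theta, n_draws=100_000, seed=0):
    """Monte Carlo estimate of p(theta) = integral of p(theta | eta) p(eta) d eta.

    Hypothetical model, for illustration only:
      eta ~ Exponential(rate 1), theta | eta ~ Exponential(rate eta).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        eta = rng.expovariate(1.0)             # draw eta from p(eta)
        total += eta * math.exp(-eta * theta)  # evaluate p(theta | eta)
    return total / n_draws


def exact_prior_density(theta):
    """Closed form of the same integral: 1 / (1 + theta)^2."""
    return 1.0 / (1.0 + theta) ** 2


if __name__ == "__main__":
    for theta in (0.5, 1.0, 2.0):
        print(theta, mc_prior_density(theta), exact_prior_density(theta))
```

In realistic models no closed form is available, which is why the sampling estimate (or a quadrature rule) is needed in the first place.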

Alternatively, the expression can be written as

p(\theta \mid y)=\int p(\theta \mid \eta ,y)p(\eta \mid y)\;d\eta =\int {\frac {p(y\mid \theta )p(\theta \mid \eta )}{p(y\mid \eta )}}p(\eta \mid y)\;d\eta \,,

and the final factor in the integral can in turn be expressed as

p(\eta \mid y)=\int p(\eta \mid \theta )p(\theta \mid y)\;d\theta .

These suggest an iterative scheme, qualitatively similar in structure to a Gibbs sampler, to evolve successively improved approximations to $p(\theta \mid y)\;$ and $p(\eta \mid y)\;$ . First, calculate an initial approximation to $p(\theta \mid y)\;$ ignoring the $\eta$ dependence completely; then calculate an approximation to $p(\eta \mid y)\;$ based upon the initial approximate distribution of $p(\theta \mid y)\;$ ; then use this $p(\eta \mid y)\;$ to update the approximation for $p(\theta \mid y)\;$ ; then update $p(\eta \mid y)\;$ ; and so on.

When the true distribution $p(\eta \mid y)$ is sharply peaked, the integral determining $p(\theta \mid y)$ may change little when the probability distribution over $\eta$ is replaced with a point estimate $\eta ^{*}$ representing the distribution's peak (or, alternatively, its mean),

p(\theta \mid y)\simeq {\frac {p(y\mid \theta )\;p(\theta \mid \eta ^{*})}{p(y\mid \eta ^{*})}}\,.

With this approximation, the above iterative scheme becomes the EM algorithm.

The term "Empirical Bayes" can cover a wide variety of methods, but most can be regarded as an early truncation of either the above scheme or something quite like it. Point estimates, rather than the whole distribution, are typically used for the parameter(s) $\eta \;$ . The estimates for $\eta ^{*}\;$ are typically made from the first approximation to $p(\theta \mid y)\;$ without subsequent refinement. These estimates for $\eta ^{*}\;$ are usually made without considering an appropriate prior distribution for $\eta$ .

Point estimation

Robbins' method: non-parametric empirical Bayes (NPEB)

Robbins^[3] considered a case of sampling from a mixed distribution, where probability for each $y_{i}$ (conditional on $\theta _{i}$ ) is specified by a Poisson distribution,

p(y_{i}\mid \theta _{i})={{\theta _{i}}^{y_{i}}e^{-\theta _{i}} \over {y_{i}}!}

while the prior on $\theta$ is unspecified except that the $\theta _{i}$ are also i.i.d. from an unknown distribution, with cumulative distribution function $G(\theta )$. Compound sampling of this kind arises in a variety of statistical estimation problems, such as accident rates and clinical trials. We simply seek a point prediction of $\theta _{i}$ given all the observed data. Because the prior is unspecified, we seek to do this without knowledge of $G$.^[4]

Under squared error loss (SEL), the conditional expectation E(θ_i | Y_i = y_i) is a reasonable quantity to use for prediction. For the Poisson compound sampling model, this quantity is

\operatorname {E} (\theta _{i}\mid y_{i})={\int (\theta ^{y_{i}+1}e^{-\theta }/{y_{i}}!)\,dG(\theta ) \over {\int (\theta ^{y_{i}}e^{-\theta }/{y_{i}}!)\,dG(\theta )}}.

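This can be simplified by multiplying and dividing the numerator by $y_{i}+1$, so that $\theta ^{y_{i}+1}e^{-\theta }/{y_{i}}!=(y_{i}+1)\,\theta ^{y_{i}+1}e^{-\theta }/(y_{i}+1)!$ and the numerator becomes $(y_{i}+1)\,p_{G}(y_{i}+1)$, yielding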

\operatorname {E} (\theta _{i}\mid y_{i})={{(y_{i}+1)p_{G}(y_{i}+1)} \over {p_{G}(y_{i})}},

where p_G is the marginal probability mass function obtained by integrating out θ over G.

To take advantage of this, Robbins^[3] suggested estimating the marginals with their empirical frequencies ( $\#\{Y_{j}\}$ ), yielding the fully non-parametric estimate as:

\operatorname {E} (\theta _{i}\mid y_{i})\approx (y_{i}+1){{\#\{Y_{j}=y_{i}+1\}} \over {\#\{Y_{j}=y_{i}\}}},

where $\#$ denotes "number of". (See also Good–Turing frequency estimation.)
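The counting formula above translates almost directly into code. The following is a minimal sketch, not a reference implementation; the function name `robbins_estimates` and the sample data are hypothetical.

```python
from collections import Counter


def robbins_estimates(counts):
    """Robbins' estimate (y + 1) * #{Y_j = y + 1} / #{Y_j = y} for each observed y.

    `counts` is a list of non-negative integer observations Y_1, ..., Y_n.
    Returns a dict mapping each observed value y to the estimate of E(theta | y).
    """
    freq = Counter(counts)  # freq[k] = number of observations equal to k (0 if absent)
    return {y: (y + 1) * freq[y + 1] / freq[y] for y in sorted(freq)}


if __name__ == "__main__":
    # Hypothetical sample: five zeros, three ones, two twos.
    sample = [0, 0, 0, 0, 0, 1, 1, 1, 2, 2]
    print(robbins_estimates(sample))  # estimates 0.6, 1.333..., 0.0 for y = 0, 1, 2
```

Note that the estimate for the largest observed value is always zero, because no observation equals $y+1$ there. In practice the raw frequencies are often smoothed for this reason, which this sketch does not attempt.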

Example – Accident rates

Suppose each customer of an insurance company has an "accident rate" Θ and is insured against accidents; the probability distribution of Θ is the underlying distribution, and is unknown. The number of accidents suffered by each customer in a specified time period has a Poisson distribution with expected value equal to the particular customer's accident rate. The actual number of accidents experienced by a customer is the observable quantity. A crude way to estimate the underlying probability distribution of the accident rate Θ is to estimate the proportion of members of the whole population suffering 0, 1, 2, 3, ... accidents during the specified time period as the corresponding proportion in the observed random sample. Having done so, it is then desired to predict the accident rate of each customer in the sample. As above, one may use the conditional expected value of the accident rate Θ given the observed number of accidents during the baseline period. Thus, if a customer suffers six accidents during the baseline period, that customer's estimated accident rate is 7 × [the proportion of the sample who suffered 7 accidents] / [the proportion of the sample who suffered 6 accidents]. Note that if the proportion of people suffering k accidents is a decreasing function of k, the customer's predicted accident rate will often be lower than their observed number of accidents.

This shrinkage effect is typical of empirical Bayes analyses.
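The arithmetic of the accident example can be sketched with a frequency table. The table below is invented for illustration and is not real insurance data; `predicted_rate` and `customers_with` are hypothetical names.

```python
def predicted_rate(y, customers_with):
    """Predicted accident rate for a customer who had y accidents.

    `customers_with[k]` is the number of customers observed with k accidents.
    Proportions and raw counts give the same ratio, so counts are used directly.
    """
    return (y + 1) * customers_with.get(y + 1, 0) / customers_with[y]


# Hypothetical frequency table (number of customers with k accidents).
customers_with = {0: 6000, 1: 2500, 2: 900, 3: 350, 4: 150, 5: 60, 6: 30, 7: 10}

if __name__ == "__main__":
    for y in (0, 2, 6):
        print(y, round(predicted_rate(y, customers_with), 3))
    # prints:
    # 0 0.417
    # 2 1.167
    # 6 2.333
```

With these made-up counts, a customer with six accidents is assigned a rate of 7 × 10 / 30 ≈ 2.33, well below six, while a customer with no accidents is assigned a rate above zero. Both predictions are pulled toward the bulk of the population.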

Parametric empirical Bayes

If the likelihood and its prior take on simple parametric forms (such as 1- or 2-dimensional likelihood functions with simple conjugate priors), then the empirical Bayes problem is only to estimate the marginal $m(y\mid \eta )$ and the hyperparameters $\eta$ using the complete set of empirical measurements. For example, one common approach, called parametric empirical Bayes point estimation, is to approximate the marginal using the maximum likelihood estimate (MLE), or a moments expansion, which allows one to express the hyperparameters $\eta$ in terms of the empirical mean and variance. This simplified marginal allows one to plug the empirical averages into a point estimate for the parameter $\theta$. The resulting equation for $\theta$ is greatly simplified, as shown below.

There are several common parametric empirical Bayes models, including the Poisson–gamma model (below), the beta-binomial model, the Gaussian–Gaussian model, the Dirichlet-multinomial model, as well as specific models for Bayesian linear regression and Bayesian multivariate linear regression. More advanced approaches include hierarchical Bayes models and Bayesian mixture models.

Gaussian–Gaussian model

For an example of empirical Bayes estimation using a Gaussian-Gaussian model, see Empirical Bayes estimators.
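As a rough illustration, one common form of the Gaussian–Gaussian model can be sketched as follows. The specific setup is an assumption made here, not something stated in this article: $y_{i}\mid \theta _{i}\sim N(\theta _{i},\sigma ^{2})$ with known $\sigma ^{2}$, and $\theta _{i}\sim N(\mu ,\tau ^{2})$. The marginal is then $y_{i}\sim N(\mu ,\sigma ^{2}+\tau ^{2})$, so $\mu$ and $\tau ^{2}$ can be estimated from the sample mean and variance and plugged into the posterior mean. The function name is hypothetical.

```python
def gaussian_eb_posterior_means(y, sigma2):
    """Empirical Bayes posterior means for the assumed model
    y_i | theta_i ~ N(theta_i, sigma2), theta_i ~ N(mu, tau2),
    with sigma2 known (and positive) and (mu, tau2) estimated from y.
    """
    n = len(y)
    mu_hat = sum(y) / n
    # Marginally y_i ~ N(mu, sigma2 + tau2), so the marginal MLE of the total
    # variance is the (biased) sample variance; tau2 is clipped at zero.
    total_var = sum((v - mu_hat) ** 2 for v in y) / n
    tau2_hat = max(0.0, total_var - sigma2)
    weight = tau2_hat / (tau2_hat + sigma2)  # weight placed on each observation
    return [mu_hat + weight * (v - mu_hat) for v in y]


if __name__ == "__main__":
    # Hypothetical data: sample mean 4, sample variance 8, known sigma2 = 4.
    print(gaussian_eb_posterior_means([0, 2, 4, 6, 8], sigma2=4.0))
    # -> [2.0, 3.0, 4.0, 5.0, 6.0]
```

Each estimate moves halfway toward the overall mean in this example, because the estimated prior variance happens to equal the noise variance.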

Poisson–gamma model

Returning to the accident-rate example above, let the likelihood be a Poisson distribution, and let the prior now be specified by its conjugate prior, a gamma distribution $G(\alpha ,\beta )$ (where $\eta =(\alpha ,\beta )$):

\rho (\theta \mid \alpha ,\beta )\,d\theta ={\frac {(\theta /\beta )^{\alpha -1}\,e^{-\theta /\beta }}{\Gamma (\alpha )}}\,(d\theta /\beta ){\text{ for }}\theta >0,\alpha >0,\beta >0\,\!.

It is straightforward to show the posterior is also a gamma distribution. Write

\rho (\theta \mid y)\propto \rho (y\mid \theta )\rho (\theta \mid \alpha ,\beta ),

where the marginal distribution has been omitted since it does not depend explicitly on $\theta$ . Expanding terms which do depend on $\theta$ gives the posterior as:

\rho (\theta \mid y)\propto (\theta ^{y}\,e^{-\theta })(\theta ^{\alpha -1}\,e^{-\theta /\beta })=\theta ^{y+\alpha -1}\,e^{-\theta (1+1/\beta )}.

So the posterior density is also a gamma distribution $G(\alpha ',\beta ')$, where $\alpha '=y+\alpha$ and $\beta '=(1+1/\beta )^{-1}$. Also notice that the marginal is simply the integral of the product of likelihood and prior (the unnormalized posterior) over all $\theta$, which turns out to be a negative binomial distribution.
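Both claims can be checked numerically. The sketch below integrates likelihood times prior over $\theta$ with a midpoint rule, compares the result with the negative binomial probability, and compares the normalized product with the gamma density $G(y+\alpha ,(1+1/\beta )^{-1})$ at one point. The parameter values and function names are hypothetical, and the simple quadrature assumes $\alpha \geq 1$ so that the integrand is bounded near zero.

```python
import math


def gamma_pdf(theta, shape, scale):
    """Gamma density in the shape/scale parameterisation used in the text."""
    return math.exp((shape - 1) * math.log(theta) - theta / scale
                    - math.lgamma(shape) - shape * math.log(scale))


def poisson_pmf(y, theta):
    """Poisson probability of count y given rate theta."""
    return math.exp(y * math.log(theta) - theta - math.lgamma(y + 1))


def neg_binomial_pmf(y, alpha, beta):
    """Closed-form marginal of y under a Poisson likelihood and gamma(alpha, beta) prior."""
    return math.exp(math.lgamma(y + alpha) - math.lgamma(y + 1) - math.lgamma(alpha)
                    + y * math.log(beta / (1 + beta)) - alpha * math.log(1 + beta))


def marginal_by_quadrature(y, alpha, beta, upper=60.0, steps=60_000):
    """Midpoint-rule approximation of the integral of likelihood * prior over theta."""
    h = upper / steps
    return h * sum(poisson_pmf(y, (k + 0.5) * h) * gamma_pdf((k + 0.5) * h, alpha, beta)
                   for k in range(steps))


if __name__ == "__main__":
    y, alpha, beta = 3, 2.0, 1.5  # hypothetical values
    m = marginal_by_quadrature(y, alpha, beta)
    print(m, neg_binomial_pmf(y, alpha, beta))
    # Posterior at theta = 2: likelihood * prior / marginal vs gamma(y + alpha, beta').
    theta = 2.0
    print(poisson_pmf(y, theta) * gamma_pdf(theta, alpha, beta) / m,
          gamma_pdf(theta, y + alpha, 1 / (1 + 1 / beta)))
```

The two numbers on each printed line should agree to several decimal places.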

To apply empirical Bayes, we will approximate the marginal using the maximum likelihood estimate (MLE). But since the posterior is a gamma distribution, the MLE of the marginal turns out to be just the mean of the posterior, which is the point estimate $\operatorname {E} (\theta \mid y)$ we need. Recalling that the mean $\mu$ of a gamma distribution $G(\alpha ',\beta ')$ is simply $\alpha '\beta '$ , we have

\operatorname {E} (\theta \mid y)=\alpha '\beta '={\frac {{\bar {y}}+\alpha }{1+1/\beta }}={\frac {\beta }{1+\beta }}{\bar {y}}+{\frac {1}{1+\beta }}(\alpha \beta ).

To obtain the values of $\alpha$ and $\beta$ , empirical Bayes prescribes estimating mean $\alpha \beta$ and variance $\alpha \beta ^{2}$ using the complete set of empirical data.

The resulting point estimate $\operatorname {E} (\theta \mid y)$ is therefore a weighted average of the sample mean ${\bar {y}}$ and the prior mean $\mu =\alpha \beta$. This turns out to be a general feature of empirical Bayes: the point estimate of a quantity (here the mean) looks like a weighted average of the sample estimate and the prior estimate (likewise for estimates of the variance).
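A minimal sketch of the whole procedure follows, under two assumptions not spelled out in the text. First, each unit contributes a single count, so ${\bar {y}}$ in the formula is just that unit's $y$. Second, the hyperparameters are fitted by matching moments of the negative binomial marginal, whose mean is $\alpha \beta$ and whose variance is $\alpha \beta (1+\beta )$; equivalently, the prior variance $\alpha \beta ^{2}$ is estimated as the sample variance minus the sample mean. The function name and data are hypothetical.

```python
def poisson_gamma_eb(counts):
    """Empirical Bayes point estimates under a Poisson likelihood and gamma prior.

    Hyperparameters are fitted by moment matching on the marginal distribution:
    mean = alpha * beta, variance = alpha * beta * (1 + beta).
    Returns (alpha, beta, estimates), one estimate per input count.
    """
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    if var <= mean:
        raise ValueError("no overdispersion: moment estimate of beta is not positive")
    beta = var / mean - 1.0
    alpha = mean / beta
    weight = beta / (1.0 + beta)  # weight on the observed count
    estimates = [weight * c + (1.0 - weight) * alpha * beta for c in counts]
    return alpha, beta, estimates


if __name__ == "__main__":
    # Hypothetical counts with sample mean 2 and sample variance 4.
    alpha, beta, est = poisson_gamma_eb([0, 0, 0, 2, 2, 2, 2, 4, 6])
    print(alpha, beta)  # 2.0 1.0
    print(est)          # [1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 3.0, 4.0]
```

Here the weight is 1/2, so every count is moved halfway toward the estimated prior mean of 2. A count of 6 becomes 4 and a count of 0 becomes 1.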

References

[1] Carlin, Bradley P.; Louis, Thomas A. (2002). "Empirical Bayes: Past, Present, and Future". In Raftery, Adrian E.; Tanner, Martin A.; Wells, Martin T. (eds.). Statistics in the 21st Century. Chapman & Hall. pp. 312–318. ISBN 1-58488-272-7.

[2] Bishop, C. M. (2005). Neural Networks for Pattern Recognition. Oxford University Press. ISBN 0-19-853864-2.

[3] Robbins, Herbert (1956). "An Empirical Bayes Approach to Statistics". Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics. Springer Series in Statistics: 157–163. doi:10.1007/978-1-4612-0919-5_26. ISBN 978-0-387-94037-3. MR 0084919.

[4] Carlin, Bradley P.; Louis, Thomas A. (2000). Bayes and Empirical Bayes Methods for Data Analysis (2nd ed.). Chapman & Hall/CRC. Sec. 3.2 and Appendix B. ISBN 978-1-58488-170-4.
