Laplace's approximation
Laplace's approximation provides an analytical expression for a posterior probability distribution by fitting a Gaussian distribution with a mean equal to the MAP solution and precision equal to the observed Fisher information.[1][2] The approximation is justified by the Bernstein–von Mises theorem, which states that, under regularity conditions, the error of the approximation tends to 0 as the number of data points tends to infinity.[3][4]

For example, consider a regression or classification model with data set {x_n, y_n}, n = 1, …, N, comprising inputs x and outputs y, with (unknown) parameter vector θ of length D. The likelihood is denoted p(y | x, θ) and the parameter prior p(θ). Suppose one wants to approximate the joint density of outputs and parameters p(y, θ | x). Bayes' formula reads:

p(y, θ | x) = p(y | x, θ) p(θ) = p(y | x) p(θ | y, x).

The joint is equal to the product of the likelihood p(y | x, θ) and the prior p(θ), and, by Bayes' rule, equal to the product of the marginal likelihood p(y | x) and the posterior p(θ | y, x). Seen as a function of θ, the joint is an un-normalised density.

In Laplace's approximation, we approximate the joint by an un-normalised Gaussian q̃(θ) = Z q(θ), where we use q to denote the approximate (normalised) density, q̃ the un-normalised density, and Z the normalisation constant of q̃ (independent of θ). Since the marginal likelihood p(y | x) doesn't depend on the parameter θ, and the posterior p(θ | y, x) normalises over θ, we can immediately identify them with Z and q(θ) of our approximation, respectively.

Laplace's approximation is

p(y, θ | x) ≈ p(y, θ̂ | x) exp(−½ (θ − θ̂)ᵀ S⁻¹ (θ − θ̂)) = q̃(θ),

where we have defined

θ̂ = argmax_θ log p(y, θ | x),
S⁻¹ = −∇∇ log p(y, θ | x) |_{θ = θ̂},

where θ̂ is the location of a mode of the joint target density, also known as the maximum a posteriori (MAP) point, and S⁻¹ is the D × D positive definite matrix of second derivatives of the negative log joint target density at the mode θ = θ̂. Thus, the Gaussian approximation matches the value and the log-curvature of the un-normalised target density at the mode. The value of θ̂ is usually found using a gradient-based method.
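As a minimal sketch of this recipe in Python, consider a hypothetical one-parameter logistic-regression model (the data, the standard-normal prior, the BFGS optimiser, and the finite-difference step are all illustrative choices, not part of the article): the MAP point θ̂ is found by minimising the negative log joint, and S⁻¹ is estimated from its curvature at the mode.

    import numpy as np
    from scipy.optimize import minimize

    # Illustrative data for a hypothetical one-parameter logistic model:
    # inputs x, binary outputs y generated with a "true" weight of 1.5.
    rng = np.random.default_rng(0)
    x = rng.normal(size=50)
    y = (rng.uniform(size=50) < 1.0 / (1.0 + np.exp(-1.5 * x))).astype(float)

    def neg_log_joint(theta):
        # -log p(y, theta | x): Bernoulli log-likelihood with a logistic link
        # plus a standard-normal log-prior on theta (normalisation constant
        # included, so the log marginal likelihood below is properly scaled).
        logits = theta[0] * x
        log_lik = np.sum(y * logits - np.log1p(np.exp(logits)))
        log_prior = -0.5 * theta[0] ** 2 - 0.5 * np.log(2.0 * np.pi)
        return -(log_lik + log_prior)

    # MAP point theta_hat: minimise the negative log joint with a
    # gradient-based method (BFGS).
    res = minimize(neg_log_joint, x0=np.zeros(1), method="BFGS")
    theta_hat = res.x

    # S^-1: second derivative of the negative log joint at the mode,
    # estimated here with a central finite difference.
    eps = 1e-4
    s_inv = (neg_log_joint(theta_hat + eps) - 2.0 * neg_log_joint(theta_hat)
             + neg_log_joint(theta_hat - eps)) / eps ** 2

    print("theta_hat =", theta_hat[0], " posterior variance S =", 1.0 / s_inv)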

In summary, we have

q(θ) = N(θ | μ = θ̂, Σ = S),
log Z = log p(y, θ̂ | x) + (D/2) log 2π + ½ log |S|,

for the approximate posterior over θ and the approximate log marginal likelihood, respectively.
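Continuing the sketch above, the approximate log marginal likelihood follows directly from the second formula; here D = 1, res.fun is the negative log joint at the mode, and |S| = 1/s_inv for a scalar parameter:

    D = 1
    log_Z = -res.fun + 0.5 * D * np.log(2.0 * np.pi) + 0.5 * np.log(1.0 / s_inv)
    print("approximate log marginal likelihood:", log_Z)

For a toy model like this, the result can be checked against numerical integration of the un-normalised joint over θ.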

The main weaknesses of Laplace's approximation are that it is symmetric around the mode and that it is very local: the entire approximation is derived from properties at a single point of the target density. Laplace's method is widely used and was pioneered in the context of neural networks by David MacKay,[5] and for Gaussian processes by Williams and Barber.[6]

References

  1. Kass, Robert E.; Tierney, Luke; Kadane, Joseph B. (1991). "Laplace's method in Bayesian analysis". Statistical Multiple Integration. Contemporary Mathematics. Vol. 115. pp. 89–100. doi:10.1090/conm/115/07. ISBN 0-8218-5122-5.
  2. MacKay, David J. C. (2003). "Laplace's method". Information Theory, Inference and Learning Algorithms, chapter 27.
  3. Hartigan, J. A. (1983). "Asymptotic Normality of Posterior Distributions". Bayes Theory. Springer Series in Statistics. New York: Springer. pp. 107–118. doi:10.1007/978-1-4613-8242-3_11. ISBN 978-1-4613-8244-7.
  4. Kass, Robert E.; Tierney, Luke; Kadane, Joseph B. (1990). "The Validity of Posterior Expansions Based on Laplace's Method". In Geisser, S.; Hodges, J. S.; Press, S. J.; Zellner, A. (eds.). Bayesian and Likelihood Methods in Statistics and Econometrics. Elsevier. pp. 473–488. ISBN 0-444-88376-2.
  5. MacKay, David J. C. (1992). "Bayesian Interpolation". Neural Computation. 4 (3). MIT Press: 415–447. doi:10.1162/neco.1992.4.3.415. S2CID 1762283.
  6. Williams, Christopher K. I.; Barber, David (1998). "Bayesian classification with Gaussian Processes". IEEE Transactions on Pattern Analysis and Machine Intelligence. 20 (12). IEEE: 1342–1351. doi:10.1109/34.735807.
