Jeffreys prior






From Wikipedia, the free encyclopedia
 


In Bayesian probability, the Jeffreys prior, named after Sir Harold Jeffreys,[1] is a non-informative prior distribution for a parameter space; its density function is proportional to the square root of the determinant of the Fisher information matrix:

    p(θ) ∝ √det I(θ).
It has the key feature that it is invariant under a change of coordinates for the parameter vector θ. That is, the relative probability assigned to a volume of a probability space using a Jeffreys prior will be the same regardless of the parameterization used to define the Jeffreys prior. This makes it of special interest for use with scale parameters.[2] As a concrete example, a Bernoulli distribution can be parametrized by the probability of occurrence p, or by the odds ratio. A naive uniform prior in this case is not invariant to this reparametrization, but the Jeffreys prior is.
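This invariance can be checked numerically for the Bernoulli example. The sketch below (illustrative code, not from the article; it uses the log-odds φ = log(p/(1 − p)) rather than the raw odds for convenience) pushes the Jeffreys density for p through the change of variables and confirms it matches the Jeffreys density computed directly in φ, using the standard fact that the Bernoulli Fisher information in the log-odds parametrization is p(1 − p):

```python
import math

def jeffreys_p(p):
    # Jeffreys density in the probability parametrization, 1/sqrt(p(1-p))
    return 1.0 / math.sqrt(p * (1.0 - p))

def jeffreys_logodds(phi):
    # Jeffreys density computed directly in log-odds: sqrt of I(phi) = p(1-p)
    p = 1.0 / (1.0 + math.exp(-phi))
    return math.sqrt(p * (1.0 - p))

# Change of variables: pushing jeffreys_p through phi = log(p/(1-p)) must
# reproduce jeffreys_logodds, since dp/dphi = p(1-p).
for p in [0.1, 0.3, 0.5, 0.9]:
    phi = math.log(p / (1.0 - p))
    pushed = jeffreys_p(p) * p * (1.0 - p)   # f_p(p) * |dp/dphi|
    assert abs(pushed - jeffreys_logodds(phi)) < 1e-12
```

By contrast, a uniform prior on p pushes forward to p(1 − p) on the log-odds scale, which is not uniform: the naive prior depends on the parametrization while the Jeffreys prior does not.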

In maximum likelihood estimation of exponential family models, penalty terms based on the Jeffreys prior were shown to reduce asymptotic bias in point estimates.[3][4]

Reparameterization[edit]

One-parameter case[edit]

If θ and φ are two possible parametrizations of a statistical model, and θ is a continuously differentiable function of φ, we say that the prior p_θ(θ) is "invariant" under a reparametrization if

    p_φ(φ) = p_θ(θ) |dθ/dφ|,

that is, if the priors p_φ(φ) and p_θ(θ) are related by the usual change of variables theorem.

Since the Fisher information transforms under reparametrization as

    I_φ(φ) = I_θ(θ) (dθ/dφ)²,

defining the priors as p_φ(φ) ∝ √I_φ(φ) and p_θ(θ) ∝ √I_θ(θ) gives us the desired "invariance".[5]
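The transformation rule for the Fisher information can be illustrated numerically (a sketch with an illustrative model choice, not from the article). The code below estimates the Poisson Fisher information, which is I(λ) = 1/λ, by summing E[(d log P/dλ)²] over the pmf, then verifies that reparametrizing by φ = √λ gives the constant I_φ(φ) = (1/λ)(2√λ)² = 4:

```python
import math

def fisher_info_lambda(lam, nmax=100):
    # Poisson Fisher information E[(d/dlam log P(n|lam))^2],
    # with d/dlam log P = n/lam - 1; pmf computed iteratively to avoid overflow.
    pmf = math.exp(-lam)                     # P(0 | lam)
    total = pmf * (0.0 / lam - 1.0) ** 2
    for n in range(1, nmax):
        pmf *= lam / n                       # P(n) = P(n-1) * lam / n
        total += pmf * (n / lam - 1.0) ** 2
    return total

lam = 4.0
assert abs(fisher_info_lambda(lam) - 1.0 / lam) < 1e-9   # I(lam) = 1/lam

# Under phi = sqrt(lam): I_phi = I_lam * (dlam/dphi)^2 = (1/lam) * (2*sqrt(lam))^2 = 4
phi = math.sqrt(lam)
assert abs(fisher_info_lambda(phi ** 2) * (2 * phi) ** 2 - 4.0) < 1e-8
```

A constant I_φ is exactly why the Jeffreys prior for the Poisson rate is uniform in √λ, as the Poisson example below states.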

Multiple-parameter case[edit]

Analogous to the one-parameter case, let θ and φ be two possible (vector) parametrizations of a statistical model, with θ a continuously differentiable function of φ. We call the prior p_θ(θ) "invariant" under reparametrization if

    p_φ(φ) = p_θ(θ) |det J|,

where J is the Jacobian matrix with entries

    J_ij = ∂θ_i / ∂φ_j.

Since the Fisher information matrix transforms under reparametrization as

    I_φ(φ) = Jᵀ I_θ(θ) J,

we have that

    det I_φ(φ) = det I_θ(θ) (det J)²,

and thus defining the priors as p_φ(φ) ∝ √det I_φ(φ) and p_θ(θ) ∝ √det I_θ(θ) gives us the desired "invariance".
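The determinant identity can be checked with a small worked example (illustrative code, not from the article): the Gaussian family parametrized by (μ, σ) versus (μ, v) with v = σ². The standard Fisher information in (μ, σ) is diag(1/σ², 2/σ²), and the Jacobian of the map (μ, v) ↦ (μ, σ) is diag(1, 1/(2√v)):

```python
import math

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def matmul2(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose2(m):
    return [[m[j][i] for j in range(2)] for i in range(2)]

sigma = 1.7
v = sigma ** 2
I_theta = [[1 / sigma ** 2, 0], [0, 2 / sigma ** 2]]   # Fisher info in (mu, sigma)
J = [[1, 0], [0, 1 / (2 * math.sqrt(v))]]              # d(mu, sigma)/d(mu, v)

I_phi = matmul2(transpose2(J), matmul2(I_theta, J))    # Fisher info in (mu, v)
assert abs(det2(I_phi) - det2(I_theta) * det2(J) ** 2) < 1e-12
# Jeffreys density sqrt(det I) in (mu, v) is proportional to v**-1.5:
assert abs(math.sqrt(det2(I_phi)) - v ** -1.5 / math.sqrt(2)) < 1e-12
```

The last assertion recovers the known joint Jeffreys prior p(μ, σ²) ∝ (σ²)^(−3/2) directly from the transformation rule.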

Attributes[edit]

From a practical and mathematical standpoint, a valid reason to use this non-informative prior instead of others, such as those obtained through a limit in conjugate families of distributions, is that the relative probability of a volume of the probability space does not depend on which set of parameter variables is chosen to describe the parameter space.

Sometimes the Jeffreys prior cannot be normalized, and is thus an improper prior. For example, the Jeffreys prior for the distribution mean is uniform over the entire real line in the case of a Gaussian distribution of known variance.

Use of the Jeffreys prior violates the strong version of the likelihood principle, which is accepted by many, but by no means all, statisticians. When using the Jeffreys prior, inferences about θ depend not just on the probability of the observed data as a function of θ, but also on the universe of all possible experimental outcomes, as determined by the experimental design, because the Fisher information is computed from an expectation over the chosen universe. Accordingly, the Jeffreys prior, and hence the inferences made using it, may be different for two experiments involving the same parameter θ even when the likelihood functions for the two experiments are the same, a violation of the strong likelihood principle.

Minimum description length[edit]

In the minimum description length approach to statistics the goal is to describe data as compactly as possible, where the length of a description is measured in bits of the code used. For a parametric family of distributions one compares a code with the best code based on one of the distributions in the parameterized family. The main result is that in exponential families, asymptotically for large sample size, the code based on the distribution that is a mixture of the elements in the exponential family with the Jeffreys prior is optimal. This result holds if one restricts the parameter set to a compact subset in the interior of the full parameter space[citation needed]. If the full parameter space is used, a modified version of the result applies.

Examples[edit]

The Jeffreys prior for a parameter (or a set of parameters) depends upon the statistical model.

Gaussian distribution with mean parameter[edit]

For the Gaussian distribution of the real value x,

    f(x ∣ μ) = exp(−(x − μ)²/(2σ²)) / √(2πσ²),

with σ fixed, the Jeffreys prior for the mean μ is

    p(μ) ∝ √I(μ) = √(1/σ²) ∝ 1.

That is, the Jeffreys prior for μ does not depend upon μ; it is the unnormalized uniform distribution on the real line, the distribution that is 1 (or some other fixed constant) for all points. This is an improper prior, and is, up to the choice of constant, the unique translation-invariant distribution on the reals (the Haar measure with respect to addition of reals), corresponding to the mean being a measure of location and translation-invariance corresponding to no information about location.
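The constancy of I(μ) can be verified by direct numerical integration (an illustrative sketch, not part of the article): E[(∂ log f/∂μ)²] comes out as 1/σ² regardless of μ, so √I(μ) is the same at every location.

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def fisher_info_mu(mu, sigma):
    # I(mu) = E[(d/dmu log f)^2], with d/dmu log f = (x - mu)/sigma^2.
    # Trapezoid rule on a wide grid; the Gaussian tails make this very accurate.
    lo, hi, n = mu - 10 * sigma, mu + 10 * sigma, 20001
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * h
        w = 0.5 if i in (0, n - 1) else 1.0
        total += w * normal_pdf(x, mu, sigma) * ((x - mu) / sigma ** 2) ** 2
    return total * h

sigma = 2.0
for mu in (-3.0, 0.0, 5.0):
    # Same value 1/sigma^2 at every mu: the Jeffreys prior for mu is flat.
    assert abs(fisher_info_mu(mu, sigma) - 1.0 / sigma ** 2) < 1e-6
```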

Gaussian distribution with standard deviation parameter[edit]

For the Gaussian distribution of the real value x,

    f(x ∣ σ) = exp(−(x − μ)²/(2σ²)) / √(2πσ²),

with μ fixed, the Jeffreys prior for the standard deviation σ > 0 is

    p(σ) ∝ √I(σ) = √(2/σ²) ∝ 1/σ.

Equivalently, the Jeffreys prior for log σ is the unnormalized uniform distribution on the real line, and thus this distribution is also known as the logarithmic prior. Similarly, the Jeffreys prior for log σ² = 2 log σ is also uniform. It is the unique (up to a multiple) prior (on the positive reals) that is scale-invariant (the Haar measure with respect to multiplication of positive reals), corresponding to the standard deviation being a measure of scale and scale-invariance corresponding to no information about scale. As with the uniform distribution on the reals, it is an improper prior.
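The scale invariance of the 1/σ prior can be made concrete (a small illustrative sketch): the unnormalized mass it assigns to an interval is log(b/a), which depends only on the ratio of the endpoints, so rescaling every length by the same factor leaves all relative probabilities unchanged.

```python
import math

def jeffreys_mass(a, b):
    # Unnormalized mass of [a, b] under p(sigma) = 1/sigma: integral of dsigma/sigma
    return math.log(b) - math.log(a)

# Scale invariance: rescaling the interval by any c > 0 leaves the mass unchanged.
for c in [0.1, 2.0, 37.0]:
    assert abs(jeffreys_mass(1.0, 5.0) - jeffreys_mass(c * 1.0, c * 5.0)) < 1e-12
```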

Poisson distribution with rate parameter[edit]

For the Poisson distribution of the non-negative integer n,

    P(n ∣ λ) = e^(−λ) λⁿ / n!,

the Jeffreys prior for the rate parameter λ ≥ 0 is

    p(λ) ∝ √I(λ) = √(1/λ) = 1/√λ.

Equivalently, the Jeffreys prior for √λ is the unnormalized uniform distribution on the non-negative real line.
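One practical consequence (an illustrative sketch with made-up counts, not from the article): combined with a Poisson likelihood for observations x₁, …, x_n, the prior λ^(−1/2) yields a Gamma posterior with shape Σx + 1/2 and rate n, under the standard Gamma–Poisson conjugacy. The code below confirms the posterior mean by brute-force numerical integration of the unnormalized posterior:

```python
import math

data = [3, 0, 2, 4, 1]          # hypothetical Poisson counts
n, s = len(data), sum(data)

def unnorm_post(lam):
    # likelihood prod_i e^-lam lam^{x_i} (constants dropped) times prior lam^-0.5
    return math.exp(-n * lam) * lam ** (s - 0.5)

# Riemann sum over a grid wide enough to capture essentially all posterior mass.
h = 0.001
grid = [i * h for i in range(1, 40001)]        # lambda from 0.001 to 40.0
z = sum(unnorm_post(l) for l in grid) * h      # normalizing constant
mean = sum(l * unnorm_post(l) for l in grid) * h / z

# Closed-form Gamma(s + 1/2, n) posterior mean is (s + 1/2)/n.
assert abs(mean - (s + 0.5) / n) < 1e-3
```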

Bernoulli trial[edit]

For a coin that is "heads" with probability γ ∈ [0, 1] and is "tails" with probability 1 − γ, for a given (H, T) ∈ {(0, 1), (1, 0)} the probability is γ^H (1 − γ)^T. The Jeffreys prior for the parameter γ is

    p(γ) ∝ √I(γ) = 1 / √(γ(1 − γ)).

This is the arcsine distribution and is a beta distribution with α = β = 1/2. Furthermore, if γ = sin²(θ) then

    p(θ) = p(γ) |dγ/dθ| ∝ (2 sin θ cos θ) / √(sin²θ cos²θ) = 2.

That is, the Jeffreys prior for θ is uniform in the interval [0, π/2]. Equivalently, θ is uniform on the whole circle [0, 2π].
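Both claims can be sketched in a few lines (illustrative numbers, not from the article): the Beta(1/2, 1/2) prior is conjugate to the Bernoulli likelihood, so h heads and t tails give a Beta(h + 1/2, t + 1/2) posterior, and the substitution γ = sin²(θ) flattens the arcsine density to a constant.

```python
import math

# Conjugate update: Beta(1/2, 1/2) prior + h heads, t tails -> Beta(h+1/2, t+1/2).
h, t = 7, 3                      # hypothetical coin-flip data
a, b = h + 0.5, t + 0.5
post_mean = a / (a + b)          # posterior mean of gamma
assert abs(post_mean - 7.5 / 11.0) < 1e-12

# Substituting gamma = sin(theta)**2 turns the arcsine density
# 1/sqrt(gamma*(1-gamma)) into a constant (= 2) on (0, pi/2),
# i.e. the Jeffreys prior for theta is uniform there.
for theta in [0.3, 0.8, 1.2]:
    g = math.sin(theta) ** 2
    dens = (1.0 / math.sqrt(g * (1.0 - g))) * abs(2.0 * math.sin(theta) * math.cos(theta))
    assert abs(dens - 2.0) < 1e-12
```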

N-sided die with biased probabilities[edit]

Similarly, for a throw of an N-sided die with outcome probabilities γ = (γ₁, …, γ_N), each non-negative and satisfying Σᵢ γᵢ = 1, the Jeffreys prior for γ is the Dirichlet distribution with all (alpha) parameters set to one half. This amounts to using a pseudocount of one half for each possible outcome.

Equivalently, if we write γᵢ = φᵢ² for each i, then the Jeffreys prior for φ is uniform on the (N − 1)-dimensional unit sphere (i.e., it is uniform on the surface of an N-dimensional unit ball).
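The pseudocount reading can be sketched as follows (illustrative counts, assuming standard Dirichlet–multinomial conjugacy): each observed count cᵢ is augmented by 1/2, so the posterior mean of γᵢ is (cᵢ + 1/2) / (total + N/2).

```python
counts = [10, 4, 6]                  # hypothetical observed rolls of a 3-sided die
N, total = len(counts), sum(counts)

# Dirichlet(1/2, ..., 1/2) prior + multinomial counts -> Dirichlet(c_i + 1/2);
# the posterior mean adds the half pseudocount to every category.
post_mean = [(c + 0.5) / (total + 0.5 * N) for c in counts]

assert abs(sum(post_mean) - 1.0) < 1e-12       # still a probability vector
assert abs(post_mean[0] - 10.5 / 21.5) < 1e-12
```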

Generalizations[edit]

Probability-matching prior[edit]

In 1963, Welch and Peers showed that for a scalar parameter θ the Jeffreys prior is "probability-matching" in the sense that posterior predictive probabilities agree with frequentist probabilities and credible intervals of a chosen width coincide with frequentist confidence intervals.[6] In a follow-up, Peers showed that this was not true for the multi-parameter case,[7] instead leading to the notion of probability-matching priors which are only implicitly defined as the probability distribution solving a certain partial differential equation involving the Fisher information.[8]

α-parallel prior[edit]

Using tools from information geometry, the Jeffreys prior can be generalized in pursuit of obtaining priors that encode geometric information of the statistical model, so as to be invariant under a change of the parameter coordinates.[9] A special case, the so-called Weyl prior, is defined as a volume form on a Weyl manifold.[10]

References[edit]

  1. Jeffreys H (1946). "An invariant form for the prior probability in estimation problems". Proceedings of the Royal Society of London. Series A, Mathematical and Physical Sciences. 186 (1007): 453–461. Bibcode:1946RSPSA.186..453J. doi:10.1098/rspa.1946.0056. JSTOR 97883. PMID 20998741.
  2. Jaynes ET (September 1968). "Prior probabilities" (PDF). IEEE Transactions on Systems Science and Cybernetics. 4 (3): 227–241. doi:10.1109/TSSC.1968.300117.
  3. Firth, David (1992). "Bias reduction, the Jeffreys prior and GLIM". In Fahrmeir, Ludwig; Francis, Brian; Gilchrist, Robert; Tutz, Gerhard (eds.). Advances in GLIM and Statistical Modelling. New York: Springer. pp. 91–100. doi:10.1007/978-1-4612-2952-0_15. ISBN 0-387-97873-9.
  4. Magis, David (2015). "A Note on Weighted Likelihood and Jeffreys Modal Estimation of Proficiency Levels in Polytomous Item Response Models". Psychometrika. 80: 200–204. doi:10.1007/s11336-013-9378-5.
  5. Robert CP, Chopin N, Rousseau J (2009). "Harold Jeffreys's Theory of Probability Revisited". Statistical Science. 24 (2). arXiv:0804.3173. doi:10.1214/09-STS284.
  6. Welch, B. L.; Peers, H. W. (1963). "On Formulae for Confidence Points Based on Integrals of Weighted Likelihoods". Journal of the Royal Statistical Society. Series B (Methodological). 25 (2): 318–329. doi:10.1111/j.2517-6161.1963.tb00512.x.
  7. Peers, H. W. (1965). "On Confidence Points and Bayesian Probability Points in the Case of Several Parameters". Journal of the Royal Statistical Society. Series B (Methodological). 27 (1): 9–16. doi:10.1111/j.2517-6161.1965.tb00581.x.
  8. Scricciolo, Catia (1999). "Probability matching priors: a review". Journal of the Italian Statistical Society. 8. 83. doi:10.1007/BF03178943.
  9. Takeuchi, J.; Amari, S. (2005). "α-parallel prior and its properties". IEEE Transactions on Information Theory. 51 (3): 1011–1023. doi:10.1109/TIT.2004.842703.
  10. Jiang, Ruichao; Tavakoli, Javad; Zhao, Yiqiang (2020). "Weyl Prior and Bayesian Statistics". Entropy. 22 (4). 467. doi:10.3390/e22040467. PMC 7516948.
Further reading[edit]


    Retrieved from "https://en.wikipedia.org/w/index.php?title=Jeffreys_prior&oldid=1193498534"




    This page was last edited on 4 January 2024, at 03:29 (UTC).
