Chain rule (probability)
Not to be confused with the chain rule in calculus.
In probability theory, the chain rule[1] (also called the general product rule[2][3]) describes how to calculate the probability of the intersection of events that are not necessarily independent, or, correspondingly, the joint distribution of random variables, using conditional probabilities. The rule allows a joint probability to be expressed in terms of conditional probabilities alone.[4] It is notably used in the context of discrete stochastic processes and in applications such as the study of Bayesian networks, which describe a probability distribution in terms of conditional probabilities.
Chain rule for events
Two events
For two events A and B, the chain rule states that

{\displaystyle \mathbb {P} (A\cap B)=\mathbb {P} (B\mid A)\mathbb {P} (A)},

where {\displaystyle \mathbb {P} (B\mid A)} denotes the conditional probability of B given A.
Example
Urn A contains 1 black ball and 2 white balls, and Urn B contains 1 black ball and 3 white balls. Suppose we pick an urn at random and then select a ball from that urn. Let event A be choosing the first urn, i.e. {\displaystyle \mathbb {P} (A)=\mathbb {P} ({\overline {A}})=1/2}, where {\displaystyle {\overline {A}}} is the complementary event of A. Let event B be choosing a white ball. The probability of choosing a white ball, given that we have chosen the first urn, is {\displaystyle \mathbb {P} (B\mid A)=2/3}. The intersection {\displaystyle A\cap B} then describes choosing the first urn and a white ball from it. The probability can be calculated by the chain rule as follows:
{\displaystyle \mathbb {P} (A\cap B)=\mathbb {P} (B\mid A)\mathbb {P} (A)={\frac {2}{3}}\cdot {\frac {1}{2}}={\frac {1}{3}}.}
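This can be checked by direct enumeration. The following is a minimal sketch, assuming a uniform choice of urn followed by a uniform choice of ball from that urn; the variable names are illustrative and not from the source:

```python
from fractions import Fraction

# Urn A: 1 black, 2 white; Urn B: 1 black, 3 white.
urns = {"A": ["black", "white", "white"],
        "B": ["black", "white", "white", "white"]}

# Chain rule: P(urn, ball) = P(ball | urn) * P(urn), with P(urn) = 1/2.
p = {}
for urn, balls in urns.items():
    for colour in set(balls):
        p[(urn, colour)] = Fraction(balls.count(colour), len(balls)) * Fraction(1, 2)

# P(A ∩ B): first urn and a white ball.
assert p[("A", "white")] == Fraction(1, 3)
```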
Finitely many events
For events {\displaystyle A_{1},\ldots ,A_{n}} whose intersection does not have probability zero, the chain rule states
{\displaystyle {\begin{aligned}\mathbb {P} \left(A_{1}\cap A_{2}\cap \ldots \cap A_{n}\right)&=\mathbb {P} \left(A_{n}\mid A_{1}\cap \ldots \cap A_{n-1}\right)\mathbb {P} \left(A_{1}\cap \ldots \cap A_{n-1}\right)\\&=\mathbb {P} \left(A_{n}\mid A_{1}\cap \ldots \cap A_{n-1}\right)\mathbb {P} \left(A_{n-1}\mid A_{1}\cap \ldots \cap A_{n-2}\right)\mathbb {P} \left(A_{1}\cap \ldots \cap A_{n-2}\right)\\&=\mathbb {P} \left(A_{n}\mid A_{1}\cap \ldots \cap A_{n-1}\right)\mathbb {P} \left(A_{n-1}\mid A_{1}\cap \ldots \cap A_{n-2}\right)\cdot \ldots \cdot \mathbb {P} (A_{3}\mid A_{1}\cap A_{2})\mathbb {P} (A_{2}\mid A_{1})\mathbb {P} (A_{1})\\&=\mathbb {P} (A_{1})\mathbb {P} (A_{2}\mid A_{1})\mathbb {P} (A_{3}\mid A_{1}\cap A_{2})\cdot \ldots \cdot \mathbb {P} (A_{n}\mid A_{1}\cap \dots \cap A_{n-1})\\&=\prod _{k=1}^{n}\mathbb {P} (A_{k}\mid A_{1}\cap \dots \cap A_{k-1})\\&=\prod _{k=1}^{n}\mathbb {P} \left(A_{k}\,{\Bigg |}\,\bigcap _{j=1}^{k-1}A_{j}\right).\end{aligned}}}
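This factorization can be verified numerically on a finite sample space with equally likely outcomes. Below is a minimal sketch; the helper names prob, cond_prob and chain_rule_prob are illustrative and not part of the article:

```python
from fractions import Fraction

def prob(event, omega):
    """Probability of an event under the uniform distribution on omega."""
    return Fraction(len(event & omega), len(omega))

def cond_prob(a, b, omega):
    """Conditional probability P(a | b); taken to be 0 when P(b) = 0."""
    pb = prob(b, omega)
    return prob(a & b, omega) / pb if pb else Fraction(0)

def chain_rule_prob(events, omega):
    """Product P(A_1) * P(A_2 | A_1) * ... * P(A_n | A_1 ∩ ... ∩ A_{n-1})."""
    result = Fraction(1)
    prefix = set(omega)                   # running intersection A_1 ∩ ... ∩ A_{k-1}
    for a in events:
        result *= cond_prob(a, prefix, omega)
        prefix &= a
    return result

# Example: three events on the sample space of a fair six-sided die.
omega = {1, 2, 3, 4, 5, 6}
a1, a2, a3 = {1, 2, 3, 4}, {2, 4, 6}, {2, 4}
assert chain_rule_prob([a1, a2, a3], omega) == prob(a1 & a2 & a3, omega)
```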
Example 1
For {\displaystyle n=4}, i.e. four events, the chain rule reads

{\displaystyle {\begin{aligned}\mathbb {P} (A_{1}\cap A_{2}\cap A_{3}\cap A_{4})&=\mathbb {P} (A_{4}\mid A_{3}\cap A_{2}\cap A_{1})\mathbb {P} (A_{3}\cap A_{2}\cap A_{1})\\&=\mathbb {P} (A_{4}\mid A_{3}\cap A_{2}\cap A_{1})\mathbb {P} (A_{3}\mid A_{2}\cap A_{1})\mathbb {P} (A_{2}\cap A_{1})\\&=\mathbb {P} (A_{4}\mid A_{3}\cap A_{2}\cap A_{1})\mathbb {P} (A_{3}\mid A_{2}\cap A_{1})\mathbb {P} (A_{2}\mid A_{1})\mathbb {P} (A_{1})\end{aligned}}}.
Example 2
We randomly draw 4 cards without replacement from a standard deck of 52 cards. What is the probability that we have picked 4 aces?
First, we set {\textstyle A_{n}:=\left\{{\text{draw an ace in the }}n^{\text{th}}{\text{ try}}\right\}}. The successive probabilities are

{\displaystyle \mathbb {P} (A_{1})={\frac {4}{52}},\qquad \mathbb {P} (A_{2}\mid A_{1})={\frac {3}{51}},\qquad \mathbb {P} (A_{3}\mid A_{1}\cap A_{2})={\frac {2}{50}},\qquad \mathbb {P} (A_{4}\mid A_{1}\cap A_{2}\cap A_{3})={\frac {1}{49}}}.
Applying the chain rule,

{\displaystyle \mathbb {P} (A_{1}\cap A_{2}\cap A_{3}\cap A_{4})={\frac {4}{52}}\cdot {\frac {3}{51}}\cdot {\frac {2}{50}}\cdot {\frac {1}{49}}}.
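Evaluating the product gives 1/270725, roughly 3.7 × 10⁻⁶. A minimal numerical check using Python's fractions module (variable names are illustrative):

```python
from fractions import Fraction
from math import prod

# Conditional probabilities from the chain rule for four aces drawn without replacement.
factors = [Fraction(4, 52), Fraction(3, 51), Fraction(2, 50), Fraction(1, 49)]
p_four_aces = prod(factors)

print(p_four_aces)          # 1/270725
print(float(p_four_aces))   # ≈ 3.69e-06
```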
Statement of the theorem and proof
Let {\displaystyle (\Omega ,{\mathcal {A}},\mathbb {P} )} be a probability space. Recall that the conditional probability of an {\displaystyle A\in {\mathcal {A}}} given {\displaystyle B\in {\mathcal {A}}} is defined as

{\displaystyle \mathbb {P} (A\mid B):={\begin{cases}{\frac {\mathbb {P} (A\cap B)}{\mathbb {P} (B)}},&\mathbb {P} (B)>0,\\0,&\mathbb {P} (B)=0.\end{cases}}}
Then we have the following theorem.
Chain rule — Let {\displaystyle (\Omega ,{\mathcal {A}},\mathbb {P} )} be a probability space. Let {\displaystyle A_{1},\ldots ,A_{n}\in {\mathcal {A}}}. Then

{\displaystyle {\begin{aligned}\mathbb {P} \left(A_{1}\cap A_{2}\cap \ldots \cap A_{n}\right)&=\mathbb {P} (A_{1})\mathbb {P} (A_{2}\mid A_{1})\mathbb {P} (A_{3}\mid A_{1}\cap A_{2})\cdot \ldots \cdot \mathbb {P} (A_{n}\mid A_{1}\cap \dots \cap A_{n-1})\\&=\mathbb {P} (A_{1})\prod _{j=2}^{n}\mathbb {P} (A_{j}\mid A_{1}\cap \dots \cap A_{j-1}).\end{aligned}}}
Proof
The formula follows immediately by recursion:

{\displaystyle {\begin{aligned}(1)&&&\mathbb {P} (A_{1})\mathbb {P} (A_{2}\mid A_{1})&=&\qquad \mathbb {P} (A_{1}\cap A_{2})\\(2)&&&\mathbb {P} (A_{1})\mathbb {P} (A_{2}\mid A_{1})\mathbb {P} (A_{3}\mid A_{1}\cap A_{2})&=&\qquad \mathbb {P} (A_{1}\cap A_{2})\mathbb {P} (A_{3}\mid A_{1}\cap A_{2})\\&&&&=&\qquad \mathbb {P} (A_{1}\cap A_{2}\cap A_{3}),\end{aligned}}}
where we used the definition of conditional probability in the first step. Iterating this argument up to {\displaystyle A_{n}} yields the general statement.
Chain rule for discrete random variables
Two random variables
For two discrete random variables X and Y, we use the events {\displaystyle A:=\{X=x\}} and {\displaystyle B:=\{Y=y\}} in the definition above, and find the joint distribution as
{\displaystyle \mathbb {P} (X=x,Y=y)=\mathbb {P} (X=x\mid Y=y)\mathbb {P} (Y=y),}
or
{\displaystyle \mathbb {P} _{(X,Y)}(x,y)=\mathbb {P} _{X\mid Y}(x\mid y)\mathbb {P} _{Y}(y),}

where {\displaystyle \mathbb {P} _{X}(x):=\mathbb {P} (X=x)} is the probability distribution of X and {\displaystyle \mathbb {P} _{X\mid Y}(x\mid y)} the conditional probability distribution of X given Y.
Finitely many random variables
Let {\displaystyle X_{1},\ldots ,X_{n}} be random variables and {\displaystyle x_{1},\dots ,x_{n}\in \mathbb {R} }. By the definition of conditional probability,

{\displaystyle \mathbb {P} \left(X_{n}=x_{n},\ldots ,X_{1}=x_{1}\right)=\mathbb {P} \left(X_{n}=x_{n}\mid X_{n-1}=x_{n-1},\ldots ,X_{1}=x_{1}\right)\mathbb {P} \left(X_{n-1}=x_{n-1},\ldots ,X_{1}=x_{1}\right)}
and using the chain rule, where we set {\displaystyle A_{k}:=\{X_{k}=x_{k}\}}, we can find the joint distribution as
{\displaystyle {\begin{aligned}\mathbb {P} \left(X_{1}=x_{1},\ldots ,X_{n}=x_{n}\right)&=\mathbb {P} \left(X_{1}=x_{1}\mid X_{2}=x_{2},\ldots ,X_{n}=x_{n}\right)\mathbb {P} \left(X_{2}=x_{2},\ldots ,X_{n}=x_{n}\right)\\&=\mathbb {P} (X_{1}=x_{1})\mathbb {P} (X_{2}=x_{2}\mid X_{1}=x_{1})\mathbb {P} (X_{3}=x_{3}\mid X_{1}=x_{1},X_{2}=x_{2})\cdot \ldots \\&\qquad \cdot \mathbb {P} (X_{n}=x_{n}\mid X_{1}=x_{1},\dots ,X_{n-1}=x_{n-1})\end{aligned}}}
Example
For {\displaystyle n=3}, i.e. three random variables, the chain rule reads
{\displaystyle {\begin{aligned}\mathbb {P} _{(X_{1},X_{2},X_{3})}(x_{1},x_{2},x_{3})&=\mathbb {P} (X_{1}=x_{1},X_{2}=x_{2},X_{3}=x_{3})\\&=\mathbb {P} (X_{3}=x_{3}\mid X_{2}=x_{2},X_{1}=x_{1})\mathbb {P} (X_{2}=x_{2},X_{1}=x_{1})\\&=\mathbb {P} (X_{3}=x_{3}\mid X_{2}=x_{2},X_{1}=x_{1})\mathbb {P} (X_{2}=x_{2}\mid X_{1}=x_{1})\mathbb {P} (X_{1}=x_{1})\\&=\mathbb {P} _{X_{3}\mid X_{2},X_{1}}(x_{3}\mid x_{2},x_{1})\mathbb {P} _{X_{2}\mid X_{1}}(x_{2}\mid x_{1})\mathbb {P} _{X_{1}}(x_{1}).\end{aligned}}}
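A minimal numerical sketch of this three-variable factorization; the probability tables below are invented for illustration and are not from the article:

```python
# Illustrative (made-up) factors for three binary random variables X1, X2, X3.
p_x1 = {0: 0.3, 1: 0.7}                        # P(X1 = x1)
p_x2_given_x1 = {0: {0: 0.6, 1: 0.4},          # P(X2 = x2 | X1 = x1)
                 1: {0: 0.1, 1: 0.9}}
p_x3_given_x1x2 = {(0, 0): {0: 0.5, 1: 0.5},   # P(X3 = x3 | X1 = x1, X2 = x2)
                   (0, 1): {0: 0.2, 1: 0.8},
                   (1, 0): {0: 0.7, 1: 0.3},
                   (1, 1): {0: 0.4, 1: 0.6}}

def joint(x1, x2, x3):
    """Chain rule: P(X1=x1, X2=x2, X3=x3) = P(X3 | X1, X2) * P(X2 | X1) * P(X1)."""
    return p_x3_given_x1x2[(x1, x2)][x3] * p_x2_given_x1[x1][x2] * p_x1[x1]

# The resulting joint pmf sums to one over all eight outcomes.
total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
assert abs(total - 1.0) < 1e-12
```

The same construction underlies the factorizations used in Bayesian networks mentioned in the introduction, where each variable carries a conditional table given the variables it depends on.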
Bibliography
René L. Schilling (2021), Measure, Integral, Probability & Processes - Probab(ilistical)ly the Theoretical Minimum (1st ed.), Technische Universität Dresden, Germany, ISBN 979-8-5991-0488-9
William Feller (1968), An Introduction to Probability Theory and Its Applications, vol. I (3rd ed.), New York / London / Sydney: Wiley, ISBN 978-0-471-25708-0
Russell, Stuart J.; Norvig, Peter (2003), Artificial Intelligence: A Modern Approach (2nd ed.), Upper Saddle River, New Jersey: Prentice Hall, ISBN 0-13-790395-2, p. 496.
References
^ Schilling, René L. (2021). Measure, Integral, Probability & Processes - Probab(ilistical)ly the Theoretical Minimum. Technische Universität Dresden, Germany. p. 136ff. ISBN 979-8-5991-0488-9.
^ Schum, David A. (1994). The Evidential Foundations of Probabilistic Reasoning. Northwestern University Press. p. 49. ISBN 978-0-8101-1821-8.
^ Klugh, Henry E. (2013). Statistics: The Essentials for Research (3rd ed.). Psychology Press. p. 149. ISBN 978-1-134-92862-0.
^ Virtue, Pat. "10-606: Mathematical Foundations for Machine Learning" (PDF).