Chain rule (probability)
Not to be confused with the chain rule in calculus.
In probability theory, the chain rule[1] (also called the general product rule[2][3]) describes how to calculate the probability of the intersection of events that are not necessarily independent, or, correspondingly, the joint distribution of random variables, using conditional probabilities. The rule allows a joint probability to be expressed in terms of conditional probabilities alone.[4] It is notably used in the context of discrete stochastic processes and in applications such as the study of Bayesian networks, which describe a probability distribution in terms of conditional probabilities.
Chain rule for events
Two events
For two events A and B, the chain rule states that

{\displaystyle \mathbb {P} (A\cap B)=\mathbb {P} (B\mid A)\mathbb {P} (A)},

where {\displaystyle \mathbb {P} (B\mid A)} denotes the conditional probability of B given A.
Example
Urn A contains 1 black ball and 2 white balls, and Urn B contains 1 black ball and 3 white balls. Suppose we pick an urn at random and then select a ball from that urn. Let event A be choosing the first urn, i.e. {\displaystyle \mathbb {P} (A)=\mathbb {P} ({\overline {A}})=1/2}, where {\displaystyle {\overline {A}}} is the complementary event of A. Let event B be choosing a white ball. The probability of choosing a white ball, given that we have chosen the first urn, is {\displaystyle \mathbb {P} (B\mid A)=2/3}. The intersection {\displaystyle A\cap B} then describes choosing the first urn and a white ball from it. The probability can be calculated by the chain rule as follows:
{\displaystyle \mathbb {P} (A\cap B)=\mathbb {P} (B\mid A)\mathbb {P} (A)={\frac {2}{3}}\cdot {\frac {1}{2}}={\frac {1}{3}}.}
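This can be checked by direct enumeration. The following is a minimal sketch, assuming a uniform choice of urn followed by a uniform choice of ball from that urn; the variable names are illustrative and not from the source:

```python
from fractions import Fraction

# Urn A: 1 black, 2 white; Urn B: 1 black, 3 white.
urns = {"A": ["black", "white", "white"],
        "B": ["black", "white", "white", "white"]}

# Chain rule: P(urn, ball) = P(ball | urn) * P(urn), with P(urn) = 1/2.
p = {}
for urn, balls in urns.items():
    for colour in set(balls):
        p[(urn, colour)] = Fraction(balls.count(colour), len(balls)) * Fraction(1, 2)

# P(A ∩ B): first urn and a white ball.
assert p[("A", "white")] == Fraction(1, 3)
```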
Finitely many events
For events {\displaystyle A_{1},\ldots ,A_{n}} whose intersection does not have probability zero, the chain rule states
{\displaystyle {\begin{aligned}\mathbb {P} \left(A_{1}\cap A_{2}\cap \ldots \cap A_{n}\right)&=\mathbb {P} \left(A_{n}\mid A_{1}\cap \ldots \cap A_{n-1}\right)\mathbb {P} \left(A_{1}\cap \ldots \cap A_{n-1}\right)\\&=\mathbb {P} \left(A_{n}\mid A_{1}\cap \ldots \cap A_{n-1}\right)\mathbb {P} \left(A_{n-1}\mid A_{1}\cap \ldots \cap A_{n-2}\right)\mathbb {P} \left(A_{1}\cap \ldots \cap A_{n-2}\right)\\&=\mathbb {P} \left(A_{n}\mid A_{1}\cap \ldots \cap A_{n-1}\right)\mathbb {P} \left(A_{n-1}\mid A_{1}\cap \ldots \cap A_{n-2}\right)\cdot \ldots \cdot \mathbb {P} (A_{3}\mid A_{1}\cap A_{2})\mathbb {P} (A_{2}\mid A_{1})\mathbb {P} (A_{1})\\&=\mathbb {P} (A_{1})\mathbb {P} (A_{2}\mid A_{1})\mathbb {P} (A_{3}\mid A_{1}\cap A_{2})\cdot \ldots \cdot \mathbb {P} (A_{n}\mid A_{1}\cap \dots \cap A_{n-1})\\&=\prod _{k=1}^{n}\mathbb {P} (A_{k}\mid A_{1}\cap \dots \cap A_{k-1})\\&=\prod _{k=1}^{n}\mathbb {P} \left(A_{k}\,{\Bigg |}\,\bigcap _{j=1}^{k-1}A_{j}\right).\end{aligned}}}
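This factorization can be verified numerically on a finite sample space with equally likely outcomes. Below is a minimal sketch; the helper names prob, cond_prob and chain_rule_prob are illustrative and not part of the article:

```python
from fractions import Fraction

def prob(event, omega):
    """Probability of an event under the uniform distribution on omega."""
    return Fraction(len(event & omega), len(omega))

def cond_prob(a, b, omega):
    """Conditional probability P(a | b); taken to be 0 when P(b) = 0."""
    pb = prob(b, omega)
    return prob(a & b, omega) / pb if pb else Fraction(0)

def chain_rule_prob(events, omega):
    """Product P(A_1) * P(A_2 | A_1) * ... * P(A_n | A_1 ∩ ... ∩ A_{n-1})."""
    result = Fraction(1)
    prefix = set(omega)                   # running intersection A_1 ∩ ... ∩ A_{k-1}
    for a in events:
        result *= cond_prob(a, prefix, omega)
        prefix &= a
    return result

# Example: three events on the sample space of a fair six-sided die.
omega = {1, 2, 3, 4, 5, 6}
a1, a2, a3 = {1, 2, 3, 4}, {2, 4, 6}, {2, 4}
assert chain_rule_prob([a1, a2, a3], omega) == prob(a1 & a2 & a3, omega)
```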
Example 1
For {\displaystyle n=4}, i.e. four events, the chain rule reads

{\displaystyle {\begin{aligned}\mathbb {P} (A_{1}\cap A_{2}\cap A_{3}\cap A_{4})&=\mathbb {P} (A_{4}\mid A_{3}\cap A_{2}\cap A_{1})\mathbb {P} (A_{3}\cap A_{2}\cap A_{1})\\&=\mathbb {P} (A_{4}\mid A_{3}\cap A_{2}\cap A_{1})\mathbb {P} (A_{3}\mid A_{2}\cap A_{1})\mathbb {P} (A_{2}\cap A_{1})\\&=\mathbb {P} (A_{4}\mid A_{3}\cap A_{2}\cap A_{1})\mathbb {P} (A_{3}\mid A_{2}\cap A_{1})\mathbb {P} (A_{2}\mid A_{1})\mathbb {P} (A_{1})\end{aligned}}}.
Example 2
We randomly draw 4 cards without replacement from a standard deck of 52 cards. What is the probability that we have picked 4 aces?
First, we set {\textstyle A_{n}:=\left\{{\text{draw an ace in the }}n^{\text{th}}{\text{ try}}\right\}}. The successive probabilities are

{\displaystyle \mathbb {P} (A_{1})={\frac {4}{52}},\qquad \mathbb {P} (A_{2}\mid A_{1})={\frac {3}{51}},\qquad \mathbb {P} (A_{3}\mid A_{1}\cap A_{2})={\frac {2}{50}},\qquad \mathbb {P} (A_{4}\mid A_{1}\cap A_{2}\cap A_{3})={\frac {1}{49}}}.
Applying the chain rule,

{\displaystyle \mathbb {P} (A_{1}\cap A_{2}\cap A_{3}\cap A_{4})={\frac {4}{52}}\cdot {\frac {3}{51}}\cdot {\frac {2}{50}}\cdot {\frac {1}{49}}}.
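Evaluating the product gives 1/270725, roughly 3.7 × 10⁻⁶. A minimal numerical check using Python's fractions module (variable names are illustrative):

```python
from fractions import Fraction
from math import prod

# Conditional probabilities from the chain rule for four aces drawn without replacement.
factors = [Fraction(4, 52), Fraction(3, 51), Fraction(2, 50), Fraction(1, 49)]
p_four_aces = prod(factors)

print(p_four_aces)          # 1/270725
print(float(p_four_aces))   # ≈ 3.69e-06
```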
Statement of the theorem and proof
Let {\displaystyle (\Omega ,{\mathcal {A}},\mathbb {P} )} be a probability space. Recall that the conditional probability of an {\displaystyle A\in {\mathcal {A}}} given {\displaystyle B\in {\mathcal {A}}} is defined as

{\displaystyle \mathbb {P} (A\mid B):={\begin{cases}{\frac {\mathbb {P} (A\cap B)}{\mathbb {P} (B)}},&\mathbb {P} (B)>0,\\0,&\mathbb {P} (B)=0.\end{cases}}}
Then we have the following theorem.
Chain rule — Let {\displaystyle (\Omega ,{\mathcal {A}},\mathbb {P} )} be a probability space. Let {\displaystyle A_{1},\ldots ,A_{n}\in {\mathcal {A}}}. Then

{\displaystyle {\begin{aligned}\mathbb {P} \left(A_{1}\cap A_{2}\cap \ldots \cap A_{n}\right)&=\mathbb {P} (A_{1})\mathbb {P} (A_{2}\mid A_{1})\mathbb {P} (A_{3}\mid A_{1}\cap A_{2})\cdot \ldots \cdot \mathbb {P} (A_{n}\mid A_{1}\cap \dots \cap A_{n-1})\\&=\mathbb {P} (A_{1})\prod _{j=2}^{n}\mathbb {P} (A_{j}\mid A_{1}\cap \dots \cap A_{j-1}).\end{aligned}}}
Proof
The formula follows immediately by recursion:

{\displaystyle {\begin{aligned}(1)&&&\mathbb {P} (A_{1})\mathbb {P} (A_{2}\mid A_{1})&=&\qquad \mathbb {P} (A_{1}\cap A_{2})\\(2)&&&\mathbb {P} (A_{1})\mathbb {P} (A_{2}\mid A_{1})\mathbb {P} (A_{3}\mid A_{1}\cap A_{2})&=&\qquad \mathbb {P} (A_{1}\cap A_{2})\mathbb {P} (A_{3}\mid A_{1}\cap A_{2})\\&&&&=&\qquad \mathbb {P} (A_{1}\cap A_{2}\cap A_{3}),\end{aligned}}}
where we used the definition of conditional probability in the first step. Iterating this argument up to {\displaystyle A_{n}} yields the general statement.
Chain rule for discrete random variables
Two random variables
For two discrete random variables X and Y, we use the events {\displaystyle A:=\{X=x\}} and {\displaystyle B:=\{Y=y\}} in the definition above, and find the joint distribution as
{\displaystyle \mathbb {P} (X=x,Y=y)=\mathbb {P} (X=x\mid Y=y)\mathbb {P} (Y=y),}
or
{\displaystyle \mathbb {P} _{(X,Y)}(x,y)=\mathbb {P} _{X\mid Y}(x\mid y)\mathbb {P} _{Y}(y),}

where {\displaystyle \mathbb {P} _{X}(x):=\mathbb {P} (X=x)} is the probability distribution of X and {\displaystyle \mathbb {P} _{X\mid Y}(x\mid y)} the conditional probability distribution of X given Y.
Finitely many random variables
Let {\displaystyle X_{1},\ldots ,X_{n}} be random variables and {\displaystyle x_{1},\dots ,x_{n}\in \mathbb {R} }. By the definition of conditional probability,

{\displaystyle \mathbb {P} \left(X_{n}=x_{n},\ldots ,X_{1}=x_{1}\right)=\mathbb {P} \left(X_{n}=x_{n}\mid X_{n-1}=x_{n-1},\ldots ,X_{1}=x_{1}\right)\mathbb {P} \left(X_{n-1}=x_{n-1},\ldots ,X_{1}=x_{1}\right)}
and using the chain rule, where we set {\displaystyle A_{k}:=\{X_{k}=x_{k}\}}, we can find the joint distribution as
{\displaystyle {\begin{aligned}\mathbb {P} \left(X_{1}=x_{1},\ldots ,X_{n}=x_{n}\right)&=\mathbb {P} \left(X_{1}=x_{1}\mid X_{2}=x_{2},\ldots ,X_{n}=x_{n}\right)\mathbb {P} \left(X_{2}=x_{2},\ldots ,X_{n}=x_{n}\right)\\&=\mathbb {P} (X_{1}=x_{1})\mathbb {P} (X_{2}=x_{2}\mid X_{1}=x_{1})\mathbb {P} (X_{3}=x_{3}\mid X_{1}=x_{1},X_{2}=x_{2})\cdot \ldots \\&\qquad \cdot \mathbb {P} (X_{n}=x_{n}\mid X_{1}=x_{1},\dots ,X_{n-1}=x_{n-1})\end{aligned}}}
Example
For {\displaystyle n=3}, i.e. three random variables, the chain rule reads
{\displaystyle {\begin{aligned}\mathbb {P} _{(X_{1},X_{2},X_{3})}(x_{1},x_{2},x_{3})&=\mathbb {P} (X_{1}=x_{1},X_{2}=x_{2},X_{3}=x_{3})\\&=\mathbb {P} (X_{3}=x_{3}\mid X_{2}=x_{2},X_{1}=x_{1})\mathbb {P} (X_{2}=x_{2},X_{1}=x_{1})\\&=\mathbb {P} (X_{3}=x_{3}\mid X_{2}=x_{2},X_{1}=x_{1})\mathbb {P} (X_{2}=x_{2}\mid X_{1}=x_{1})\mathbb {P} (X_{1}=x_{1})\\&=\mathbb {P} _{X_{3}\mid X_{2},X_{1}}(x_{3}\mid x_{2},x_{1})\mathbb {P} _{X_{2}\mid X_{1}}(x_{2}\mid x_{1})\mathbb {P} _{X_{1}}(x_{1}).\end{aligned}}}
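A minimal numerical sketch of this three-variable factorization; the probability tables below are invented for illustration and are not from the article:

```python
# Illustrative (made-up) factors for three binary random variables X1, X2, X3.
p_x1 = {0: 0.3, 1: 0.7}                        # P(X1 = x1)
p_x2_given_x1 = {0: {0: 0.6, 1: 0.4},          # P(X2 = x2 | X1 = x1)
                 1: {0: 0.1, 1: 0.9}}
p_x3_given_x1x2 = {(0, 0): {0: 0.5, 1: 0.5},   # P(X3 = x3 | X1 = x1, X2 = x2)
                   (0, 1): {0: 0.2, 1: 0.8},
                   (1, 0): {0: 0.7, 1: 0.3},
                   (1, 1): {0: 0.4, 1: 0.6}}

def joint(x1, x2, x3):
    """Chain rule: P(X1=x1, X2=x2, X3=x3) = P(X3 | X1, X2) * P(X2 | X1) * P(X1)."""
    return p_x3_given_x1x2[(x1, x2)][x3] * p_x2_given_x1[x1][x2] * p_x1[x1]

# The resulting joint pmf sums to one over all eight outcomes.
total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
assert abs(total - 1.0) < 1e-12
```

The same construction underlies the factorizations used in Bayesian networks mentioned in the introduction, where each variable carries a conditional table given the variables it depends on.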
Bibliography
René L. Schilling (2021), Measure, Integral, Probability & Processes - Probab(ilistical)ly the Theoretical Minimum (1st ed.), Technische Universität Dresden, Germany, ISBN 979-8-5991-0488-9
William Feller (1968), An Introduction to Probability Theory and Its Applications, vol. I (3rd ed.), New York / London / Sydney: Wiley, ISBN 978-0-471-25708-0
Russell, Stuart J.; Norvig, Peter (2003), Artificial Intelligence: A Modern Approach (2nd ed.), Upper Saddle River, New Jersey: Prentice Hall, ISBN 0-13-790395-2, p. 496.
References
^ Schilling, René L. (2021). Measure, Integral, Probability & Processes - Probab(ilistical)ly the Theoretical Minimum. Technische Universität Dresden, Germany. p. 136ff. ISBN 979-8-5991-0488-9.
^ Schum, David A. (1994). The Evidential Foundations of Probabilistic Reasoning. Northwestern University Press. p. 49. ISBN 978-0-8101-1821-8.
^ Klugh, Henry E. (2013). Statistics: The Essentials for Research (3rd ed.). Psychology Press. p. 149. ISBN 978-1-134-92862-0.
^ Virtue, Pat. "10-606: Mathematical Foundations for Machine Learning" (PDF).