Probability with Measure
\(\newcommand{\footnotename}{footnote}\)
\(\def \LWRfootnote {1}\)
\(\newcommand {\footnote }[2][\LWRfootnote ]{{}^{\mathrm {#1}}}\)
\(\newcommand {\footnotemark }[1][\LWRfootnote ]{{}^{\mathrm {#1}}}\)
\(\let \LWRorighspace \hspace \)
\(\renewcommand {\hspace }{\ifstar \LWRorighspace \LWRorighspace }\)
\(\newcommand {\mathnormal }[1]{{#1}}\)
\(\newcommand \ensuremath [1]{#1}\)
\(\newcommand {\LWRframebox }[2][]{\fbox {#2}} \newcommand {\framebox }[1][]{\LWRframebox } \)
\(\newcommand {\setlength }[2]{}\)
\(\newcommand {\addtolength }[2]{}\)
\(\newcommand {\setcounter }[2]{}\)
\(\newcommand {\addtocounter }[2]{}\)
\(\newcommand {\arabic }[1]{}\)
\(\newcommand {\number }[1]{}\)
\(\newcommand {\noalign }[1]{\text {#1}\notag \\}\)
\(\newcommand {\cline }[1]{}\)
\(\newcommand {\directlua }[1]{\text {(directlua)}}\)
\(\newcommand {\luatexdirectlua }[1]{\text {(directlua)}}\)
\(\newcommand {\protect }{}\)
\(\def \LWRabsorbnumber #1 {}\)
\(\def \LWRabsorbquotenumber "#1 {}\)
\(\newcommand {\LWRabsorboption }[1][]{}\)
\(\newcommand {\LWRabsorbtwooptions }[1][]{\LWRabsorboption }\)
\(\def \mathchar {\ifnextchar "\LWRabsorbquotenumber \LWRabsorbnumber }\)
\(\def \mathcode #1={\mathchar }\)
\(\let \delcode \mathcode \)
\(\let \delimiter \mathchar \)
\(\def \oe {\unicode {x0153}}\)
\(\def \OE {\unicode {x0152}}\)
\(\def \ae {\unicode {x00E6}}\)
\(\def \AE {\unicode {x00C6}}\)
\(\def \aa {\unicode {x00E5}}\)
\(\def \AA {\unicode {x00C5}}\)
\(\def \o {\unicode {x00F8}}\)
\(\def \O {\unicode {x00D8}}\)
\(\def \l {\unicode {x0142}}\)
\(\def \L {\unicode {x0141}}\)
\(\def \ss {\unicode {x00DF}}\)
\(\def \SS {\unicode {x1E9E}}\)
\(\def \dag {\unicode {x2020}}\)
\(\def \ddag {\unicode {x2021}}\)
\(\def \P {\unicode {x00B6}}\)
\(\def \copyright {\unicode {x00A9}}\)
\(\def \pounds {\unicode {x00A3}}\)
\(\let \LWRref \ref \)
\(\renewcommand {\ref }{\ifstar \LWRref \LWRref }\)
\( \newcommand {\multicolumn }[3]{#3}\)
\(\require {textcomp}\)
\(\newcommand {\intertext }[1]{\text {#1}\notag \\}\)
\(\let \Hat \hat \)
\(\let \Check \check \)
\(\let \Tilde \tilde \)
\(\let \Acute \acute \)
\(\let \Grave \grave \)
\(\let \Dot \dot \)
\(\let \Ddot \ddot \)
\(\let \Breve \breve \)
\(\let \Bar \bar \)
\(\let \Vec \vec \)
\(\DeclareMathOperator {\var }{var}\)
\(\DeclareMathOperator {\cov }{cov}\)
\(\newcommand {\nN }{n \in \mathbb {N}}\)
\(\newcommand {\Br }{{\cal B}(\R )}\)
\(\newcommand {\F }{{\cal F}}\)
\(\newcommand {\ds }{\displaystyle }\)
\(\newcommand {\st }{\stackrel {d}{=}}\)
\(\newcommand {\uc }{\stackrel {uc}{\rightarrow }}\)
\(\newcommand {\la }{\langle }\)
\(\newcommand {\ra }{\rangle }\)
\(\newcommand {\li }{\liminf _{n \rightarrow \infty }}\)
\(\newcommand {\ls }{\limsup _{n \rightarrow \infty }}\)
\(\newcommand {\limn }{\lim _{n \rightarrow \infty }}\)
\(\def \to {\rightarrow }\)
\(\def \iff {\Leftrightarrow }\)
\(\def \ra {\Rightarrow }\)
\(\def \sw {\subseteq }\)
\(\def \mc {\mathcal }\)
\(\def \mb {\mathbb }\)
\(\def \sc {\setminus }\)
\(\def \v {\textbf }\)
\(\def \E {\mb {E}}\)
\(\def \P {\mb {P}}\)
\(\def \R {\mb {R}}\)
\(\def \C {\mb {C}}\)
\(\def \N {\mb {N}}\)
\(\def \Q {\mb {Q}}\)
\(\def \Z {\mb {Z}}\)
\(\def \B {\mb {B}}\)
\(\def \~{\sim }\)
\(\def \-{\,;\,}\)
\(\def \qed {$\blacksquare $}\)
\(\def \1{\unicode {x1D7D9}}\)
\(\def \cadlag {c\`{a}dl\`{a}g}\)
\(\def \p {\partial }\)
\(\def \l {\left }\)
\(\def \r {\right }\)
\(\def \Om {\Omega }\)
\(\def \om {\omega }\)
\(\def \eps {\epsilon }\)
\(\def \de {\delta }\)
\(\def \ov {\overline }\)
\(\def \sr {\stackrel }\)
\(\def \Lp {\mc {L}^p}\)
\(\def \Lq {\mc {L}^q}\)
\(\def \Lone {\mc {L}^1}\)
\(\def \Ltwo {\mc {L}^2}\)
\(\def \toae {\sr {\rm a.e.}{\to }}\)
\(\def \toas {\sr {\rm a.s.}{\to }}\)
\(\def \top {\sr {\mb {\P }}{\to }}\)
\(\def \tod {\sr {\rm d}{\to }}\)
\(\def \toLp {\sr {\Lp }{\to }}\)
\(\def \toLq {\sr {\Lq }{\to }}\)
\(\def \eqae {\sr {\rm a.e.}{=}}\)
\(\def \eqas {\sr {\rm a.s.}{=}}\)
\(\def \eqd {\sr {\rm d}{=}}\)
\(\def \Sa {(S1)}\)
\(\def \Sb {(S2)}\)
\(\def \Sc {(S3)}\)
\(\def \Scp {(S3')}\)
\(\def \Ma {(M1)}\)
\(\def \Mb {(M2)}\)
\(\def \La {(L1)}\)
\(\def \Lb {(L2)}\)
\(\def \Lc {(L3)}\)
\(\def \Ld {(L4)}\)
\(\def \Le {(L5)}\)
6.2 The Paley-Zygmund inequality \((\Delta )\)
Recall that Markov’s and Chebyshev’s inequalities gave upper bounds on \(\P [X\geq c]\) in terms of the moments of the random variable \(X\). The Paley-Zygmund inequality is similar in style, but instead it
gives a lower bound on the probability that \(X\) exceeds a fraction of its mean \(\E [X]\).
Lemma (Paley-Zygmund inequality).
Let \(X\) be a non-negative random variable and suppose
that \(0<\E [X^2]<\infty \). Then for any \(\theta \in [0,1]\),
\[\P [X>\theta \E [X]] \geq (1-\theta )^2\frac {\E [X]^2}{\E [X^2]}.\]
Proof: Note that \(X=X\1_{\{X\leq \theta \E [X]\}} + X\1_{\{X>\theta \E [X]\}}\) and take
expectations to obtain
\(\seteqnumber{0}{6.}{1}\)
\begin{equation}
\label {eq:pz_0} \E [X]=\E \l [X\1_{\{X\leq \theta \E [X]\}}\r ] + \E \l [X\1_{\{X>\theta \E [X]\}}\r ].
\end{equation}
We will bound the two terms on the right hand side of the above. If \(\1_{\{X\leq \theta \E [X]\}}\) is non-zero then \(X\leq \theta \E [X]\), so also \(X\1_{\{X\leq \theta \E [X]\}}\leq \theta \E
[X]\). Hence we have \(\E [X\1_{\{X\leq \theta \E [X]\}}] \leq \theta \E [X]\). For the second term, we apply the Cauchy-Schwarz inequality (Exercise 5.12) and obtain that
\[\E \l [X\1_{\{X>\theta \E [X]\}}\r ]\leq \l (\E [X^2]\E [\1_{\{X>\theta \E [X]\}}^2]\r )^{1/2}.\]
Indicator functions take only the values zero and one, hence \(\1_A^2=\1_A\), and by (4.2) they satisfy \(\E [\1_A]=\P
[A]\). We thus obtain that the right hand side above is bounded by \((\E [X^2]\P [X>\theta \E [X]])^{1/2}\). Putting all this into (6.2) we obtain
\[\E [X]\leq \theta \E [X] +\l (\E [X^2]\P [X>\theta \E [X]]\r )^{1/2}\]
which rearranges to the required result. ∎
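To spell out the final rearrangement: since \(0<\E [X^2]<\infty \) and \(X\geq 0\), we have \(\E [X]>0\), and as \(\theta \in [0,1]\) both sides below are non-negative, so squaring is valid. Subtracting \(\theta \E [X]\) gives
\[(1-\theta )\E [X]\leq \l (\E [X^2]\P [X>\theta \E [X]]\r )^{1/2}, \qquad \text {hence}\qquad (1-\theta )^2\E [X]^2\leq \E [X^2]\,\P [X>\theta \E [X]],\]
and dividing through by \(\E [X^2]>0\) gives the stated bound.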
The most common application of the Paley-Zygmund inequality is to set \(\theta =0\) and obtain
\(\seteqnumber{0}{6.}{2}\)
\begin{equation}
\label {eq:paley_zygmund_0} \P [X>0]\geq \frac {\E [X]^2}{\E [X^2]}.
\end{equation}
In combination with upper bounds on \(\E [X]\) and lower bounds on \(\E [X^2]\), equation (6.3)
is often used to show that \(X\) is not identically zero. This technique is particularly useful when the random variable \(X\) is known to be a limit of some sequence \((X_n)\), and bounds on \(\E [X]\) and \(\E
[X^2]\) can be obtained from corresponding bounds on \(\E [X_n]\) and \(\E [X_n^2]\) via the monotone and dominated convergence theorems.
More generally (6.3) simply provides a lower bound on the probability that some random variable is
non-zero. It is often called the second moment method.
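As a quick numerical sanity check (not part of the argument above), the inequality can be verified directly for a small discrete distribution. The values and probabilities below are arbitrary illustrative choices:

```python
# Sanity check of the Paley-Zygmund inequality for a small discrete
# random variable X with P[X = v] = p for each (p, v) pair below.
values = [0.0, 1.0, 5.0]
probs  = [0.7, 0.2, 0.1]

EX  = sum(p * v for p, v in zip(probs, values))     # E[X]  = 0.7
EX2 = sum(p * v**2 for p, v in zip(probs, values))  # E[X^2] = 2.7

for theta in [0.0, 0.25, 0.5, 0.75]:
    # Left hand side: P[X > theta * E[X]], computed exactly
    lhs = sum(p for p, v in zip(probs, values) if v > theta * EX)
    # Right hand side: the Paley-Zygmund lower bound
    rhs = (1 - theta) ** 2 * EX**2 / EX2
    assert lhs >= rhs, (theta, lhs, rhs)
    print(f"theta={theta:.2f}:  P[X > theta E[X]] = {lhs:.4f}  >=  bound = {rhs:.4f}")
```

For \(\theta =0\) this checks the second moment method directly: \(\P [X>0]=0.3\) against the bound \(0.7^2/2.7\approx 0.181\).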
Example (isolated vertices in the Erdős-Rényi graph).
Consider a graph with \(n\) vertices. For each (unordered) pair of distinct vertices
\((v_1,v_2)\), the edge from \(v_1\) to \(v_2\) is present in the graph with probability \(p\), independently of all other pairs of vertices. This graph is known as the Erdős-Rényi graph, often denoted by
\(G(n,p)\). We are interested in whether \(G(n,p)\) contains isolated vertices, which are vertices that have no edges connected to them, as \(n\to \infty \).
Let \(E_i\) be the event that vertex \(i\) is isolated, and let \(Y_n=\sum _{i=1}^n \1_{E_i}\) be the number of isolated vertices. The probability of each vertex being isolated is \(\P [E_i]=(1-p)^{n-1}\), so
the expected number of isolated vertices is \(\E [Y_n]=\sum _{i=1}^n\P [E_i]=n(1-p)^{n-1}\).
A pair of distinct vertices \(i\neq j\) is incident to \(2n-3\) potential edges in total (not \(2n-2\), because the edge between \(i\) and \(j\) is counted once, not twice), so \(\P [E_i\cap E_j]=(1-p)^{2n-3}\). With the help of
Exercise 2.5 we can calculate the second moment too, as
\(\seteqnumber{0}{6.}{3}\)
\begin{align*}
\E [Y_n^2] &= \sum _{i,j=1}^n \P [E_i\cap E_j] \\ &= \sum _{i=1}^n\P [E_i] +\sum _{\stackrel {i,j=1}{i\neq j}}^n \P [E_i\cap E_j] \\ &=
n(1-p)^{n-1}+n(n-1)(1-p)^{2n-3}.
\end{align*}
Note that since \(Y_n\) takes integer values, \(\P [Y_n>0]=\P [Y_n\geq 1]\). The Paley-Zygmund inequality gives that
\[\P [Y_n\geq 1]\geq \frac {n^2(1-p)^{2n-2}}{n(1-p)^{n-1}+n(n-1)(1-p)^{2n-3}} =\frac {1}{\frac {1}{n}(1-p)^{1-n}+\frac {n-1}{n}(1-p)^{-1}}.\]
Allowing \(p\) to depend on \(n\), suppose that \(n(1-p)^n\to \infty \) as \(n\to \infty \). This condition forces \(p\to 0\) (if \(p\) were bounded away from \(0\) then \(n(1-p)^n\to 0\)), so the second term in the denominator tends to \(1\), while the first term, which is of order \(\frac {1}{n(1-p)^n}\), tends to \(0\). Hence \(\P [Y_n\geq 1]\to 1\) as \(n\to
\infty \). If that condition holds, then for large \(n\) it is very likely that \(G(n,p)\) contains at least one isolated vertex.
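The moment calculation above can be checked against simulation. The sketch below samples \(G(n,p)\) by Monte Carlo and compares the empirical frequency of isolated vertices with the Paley-Zygmund lower bound; the choices \(n=40\), \(p=2/n\), and the number of trials are illustrative, not prescribed by the text:

```python
import random

def has_isolated_vertex(n, p, rng):
    """Sample G(n, p) and report whether some vertex has degree zero."""
    degree = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:  # each edge present independently with prob p
                degree[i] += 1
                degree[j] += 1
    return any(d == 0 for d in degree)

rng = random.Random(0)
n, trials = 40, 2000
p = 2.0 / n  # with p = c/n we have n(1-p)^n -> infinity

empirical = sum(has_isolated_vertex(n, p, rng) for _ in range(trials)) / trials

# Paley-Zygmund bound built from the first two moments of Y_n
EY  = n * (1 - p) ** (n - 1)
EY2 = n * (1 - p) ** (n - 1) + n * (n - 1) * (1 - p) ** (2 * n - 3)
bound = EY ** 2 / EY2

print(f"empirical P[Y_n >= 1] ~ {empirical:.3f},  Paley-Zygmund bound = {bound:.3f}")
assert empirical >= bound - 0.05  # small slack for Monte Carlo error
```

For these parameters \(\E [Y_n]\approx 5.4\), so isolated vertices are very likely and the empirical frequency comfortably exceeds the bound of roughly \(0.83\).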