last updated: May 9, 2024

Probability with Measure

Chapter B Solutions to exercises

Chapter 1
  • 1.1 For example, \(\{3\}\cup \{4\}=\{3,4\}\), which is not an element of \(A\), but \(\sigma \)-fields are closed under taking (finite and countable) unions.

  • 1.2

    • (a) To show \(\Sigma _{1} \cap \Sigma _{2}\) is a \(\sigma \)-field we must verify (S1) to (S3).

      (S1) Since \(S \in \Sigma _{1}\) and \(S \in \Sigma _{2}\), \(S \in \Sigma _{1} \cap \Sigma _{2}\).

      (S2) Suppose \((A_{n})\) is a sequence of sets in \(\Sigma _{1} \cap \Sigma _{2}\). Then \(A_{n} \in \Sigma _{1}\) for all \(\nN \) and so \(\bigcup _{n=1}^{\infty }A_{n} \in \Sigma _{1}\). But also \(A_{n} \in \Sigma _{2}\) for all \(\nN \) and so \(\bigcup _{n=1}^{\infty }A_{n} \in \Sigma _{2}\). Hence \(\bigcup _{n=1}^{\infty }A_{n} \in \Sigma _{1} \cap \Sigma _{2}\).

      (S3) If \(A \in \Sigma _{1} \cap \Sigma _{2}\) then \(A^{c} \in \Sigma _{1}\) and \(A^{c} \in \Sigma _{2}\). Hence \(A^{c} \in \Sigma _{1} \cap \Sigma _{2}\).

    • (b) \(\Sigma _{1} \cup \Sigma _{2}\) is not in general a \(\sigma \)-field, because if \(A \in \Sigma _{1}\) and \(B \in \Sigma _{2}\) there is no reason why \(A \cup B \in \Sigma _{1} \cup \Sigma _{2}\). For example let \(S = \{1,2,3\}, \Sigma _{1} = \{\emptyset , \{1\}, \{2,3\}, S\}, \Sigma _{2} = \{\emptyset , \{2\}, \{1,3\}, S\}, A = \{1\}, B = \{2\}\). Then \(A \cup B = \{1,2\}\) is neither in \(\Sigma _{1}\) nor \(\Sigma _{2}\).

  • 1.3 We check S1-S3. Note that we need to show that \(\Sigma _X\) is a \(\sigma \)-field on \(X\) (and not a \(\sigma \)-field on \(S\)).

    (S1) Taking \(A=\emptyset \in \Sigma \) we have \(\emptyset \cap X=\emptyset \in \Sigma _X\). Taking \(A=S\in \Sigma \), we have \(S\cap X = X\in \Sigma _X\).

    (S2) Each element of \(\Sigma _X\) has the form \(A\cap X\) with \(A\in \Sigma \). Its complement within \(X\) is \(X\sc (X\cap A)=X\cap (S\sc A)\in \Sigma _X\), because \(S\sc A\in \Sigma \).

    (S3) Suppose \((A_n\cap X)_{n\in \N }\) is a sequence of sets in \(\Sigma _X\), where each \(A_n\in \Sigma \). Then \(\bigcup _{n=1}^\infty (A_n\cap X)=X\cap \l (\bigcup _{n=1}^\infty A_n\r )\in \Sigma _X\) because \(\bigcup _{n=1}^\infty A_n\in \Sigma \).

  • 1.4

    • (a) We can write \(A \cup B = (A \sc B) \cup (B \sc A) \cup (A \cap B)\) as a disjoint union (draw a diagram!). Hence using finite additivity of measures we obtain

      \[ m(A \cup B) = m(A\sc B) + m(B\sc A) + m(A \cap B).\]

      Hence

      \begin{align*} m(A \cup B) + m(A \cap B) & = m(A\sc B) + m(B\sc A ) + 2m(A\cap B)\\ & = [m(A \sc B) + m(A \cap B)] + [m(B \sc A) + m(A \cap B)]\\ & = m(A) + m(B), \end{align*} where we use the fact that \(A\) is the disjoint union of \(A \sc B\) and \(A \cap B\), and the analogous result for \(B\sc A\).

      Note that the possibility that \(m(A \cap B) = \infty \) is allowed for within this proof, because we have not used subtraction and therefore the undefined quantity \(\infty -\infty \) does not arise.

    • (b) \(m(A \cup B) \leq m(A \cup B) + m(A \cap B) = m(A) + m(B)\) follows immediately from (a) as \(m(A \cap B) \geq 0\). The general case is proved by induction. We’ve just established \(n=2\). Now suppose the result holds for some \(n\). Then

      \begin{align*} m\left (\bigcup _{i=1}^{n+1}A_{i}\right ) = m\left (\bigcup _{i=1}^{n}A_{i} \cup A_{n+1}\right ) &\leq m\left (\bigcup _{i=1}^{n}A_{i}\right ) + m(A_{n+1}) \\ &\leq \sum _{i=1}^{n}m(A_{i}) +m(A_{n+1}) = \sum _{i=1}^{n+1}m(A_{i}). \end{align*} The first inequality is justified by the \(n=2\) case, and the second is justified by the inductive hypothesis.
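
    For illustration (not part of the exercise), taking \(m=\lambda \) to be Lebesgue measure with \(A=[0,2]\) and \(B=[1,3]\), the identity in part (a) reads

    \[\lambda (A\cup B)+\lambda (A\cap B)=\lambda ([0,3])+\lambda ([1,2])=3+1=4=\lambda (A)+\lambda (B).\]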

  • 1.5

    • (a) We have that \((km)(\emptyset )=km(\emptyset )=0\) because \(m(\emptyset )=0\).

      If \((A_n)_{n\in \N }\) is a sequence of disjoint measurable sets then

      \[\sum \limits _{n=1}^\infty (km)(A_n)=k\sum \limits _{n=1}^\infty m(A_n)=k m\l (\bigcup _{n=1}^\infty A_n\r )=(km)\l (\bigcup _{n=1}^\infty A_n\r ).\]

      Note that in the second equality above we have used that \(m\) is \(\sigma \)-additive. Hence \(km\) is \(\sigma \)-additive.

      Thus \(km\) is a measure.

      If \(m\) is a finite measure, then by taking \(k=\frac {1}{m(S)}\) it follows immediately that \(\P (\cdot )=\frac {m(\cdot )}{m(S)}\) is a measure. Noting that \(\P (S)=\frac {m(S)}{m(S)}=1\), \(\P \) is a probability measure.

    • (b) We have \(m_B(\emptyset )=m(\emptyset \cap B)=m(\emptyset )=0\).

      If \((A_n)_{n\in \N }\) is a sequence of disjoint measurable sets then \((A_n\cap B)_{n\in \N }\) are also disjoint and measurable, hence

      \[\sum \limits _{n=1}^\infty m_B(A_n)=\sum \limits _{n=1}^\infty m(A_n\cap B)=m\l (\bigcup _{n=1}^\infty A_n\cap B\r )=m\l (\l (\bigcup _{n=1}^\infty A_n\r )\cap B\r )=m_B\l (\bigcup _{n=1}^\infty A_n\r ).\]

      Here to deduce the second equality we use the \(\sigma \)-additivity of \(m\).

      Thus \(m_B\) is a measure.

    • (c) Applying part (a) to \(m_B\), it is immediate that \(\P _B\) is a probability measure.

      If \(m\) itself is a probability measure, say we write \(m=\P \), then \(\P _B(A)\) is the conditional probability of the event \(A\) given that the event \(B\) occurs.
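
      For illustration (not part of the exercise), take \(m=\lambda \) to be Lebesgue measure on \(\R \) and \(B=[0,2]\). Then \(\P _B(A)=\frac {\lambda (A\cap [0,2])}{2}\), so for example

      \[\P _B([1,5])=\frac {\lambda ([1,2])}{2}=\frac 12.\]

      In this case \(\P _B\) is the uniform distribution on \([0,2]\).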

  • 1.6 By definition \((a,b) \in {\cal B}(\R )\). We’ve shown in the notes that \(\{a\}, \{b\} \in {\cal B}(\R )\) and so by S(ii), \([a,b] = \{a\} \cup (a,b) \cup \{b\} \in {\cal B}(\R )\).

  • 1.7

    • (a)

      • (i) We have that \(A\cap A^c=\emptyset \) and \(A\cup A^c=S\), so \(m(A)+m(A^c)=m(S)=M\). Because \(m(S)<\infty \) we have also that \(m(A)<\infty \), hence we may subtract \(m(A)\) and obtain \(m(A^c)=M-m(A)\).

      • (ii) Let \((A_n)\) be a decreasing sequence of sets. Then \(B_n=S\sc A_n\) defines an increasing sequence of sets, so by the first part of Lemma 1.7.1 we have \(m(B_n)\to m(B)\) where \(B=\cup _j B_j\).

        By part (i) we have

        \begin{align*} m(B_n)&=m(S\sc A_n)=m(S)-m(A_n)\\ m(B)&=m\l (\bigcup _j (S\sc A_j)\r )=m(S\sc \cap _j A_j)=m(S)-m(\cap _j A_j). \end{align*} Thus \(m(S)-m(A_n)\to m(S)-m(\cap _j A_j)\). Since \(m(S)<\infty \) we may subtract it, and after multiplying by \(-1\) we obtain that \(m(A_n)\to m(\cap _j A_j)\).

    • (b) Let \(S=\R \), \(\Sigma =\mc {B}(\R )\) and \(m=\lambda \) be Lebesgue measure on \(\R \). Set \(A_n=(-\infty ,-n]\). Note that \(\cap _n A_n=\emptyset \) so \(\lambda (\cap _n A_n)=0\). However, \(m(A_n)=\infty \) for all \(n\), so \(m(A_n)\nrightarrow m(\cap _n A_n)\) in this case.

  • 1.8 We have \(m(S\sc E_n)=0\) for all \(n\in \N \). Hence by set algebra and the union bound (Lemma 1.7.2)

    \[m\l (S\sc \l (\bigcap _{n=1}^\infty E_n\r )\r )=m\l (\bigcup _{n=1}^\infty S\sc E_n\r ) \leq \sum _{n=1}^\infty m(S\sc E_n)=0\]

    as required.

  • 1.8 (alternative solution) The sets \(S\sc E_n\) are null sets, so by Lemma 1.8.1 we have that \(\bigcup _{n=1}^\infty S\sc E_n\) is null. By set algebra we have \(S\sc \bigcap _{n=1}^\infty E_n=\bigcup _{n=1}^\infty S\sc E_n\), hence \(\bigcap _{n=1}^\infty E_n\) has full measure.

  • 1.9 There are \(n \choose r\) subsets of size \(r\) for \(0 \leq r \leq n\) and so the total number of subsets is \(\sum _{r=0}^{n}{n \choose r} = (1 + 1)^{n} = 2^{n}\). Here we used the binomial theorem \((x + y)^{n} = \sum _{r=0}^{n}{n \choose r}x^{r}y^{n-r}\).
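
    For illustration (not part of the exercise), with \(n=3\) the subsets of \(\{1,2,3\}\) number

    \[{3 \choose 0}+{3 \choose 1}+{3 \choose 2}+{3 \choose 3}=1+3+3+1=8=2^3.\]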

  • 1.10

    • (a) Note that each element of \(\Pi \) is a subset of \(S\). Hence \(\Pi \) itself is a subset of the power set \(\mc {P}(S)\) of \(S\). Since \(S\) is a finite set, \(\mc {P}(S)\) is also a finite set, hence \(\Pi \) is also finite.

      Part (b) requires you to keep a very clear head. To solve a question like this you have to explore what you have deduced from what else, with lots of thinking ‘if I knew this then I would also know that’ and then trying to fit a bigger picture together, connecting your start point to your desired end point. Analysis can often be like this.

    • (b)

      • (i) Suppose \(\Pi _i\cap \Pi _j \neq \emptyset \). Note that \(\Pi _i \cap \Pi _j\) is a subset of both \(\Pi _i\) and \(\Pi _j\).

        By definition of \(\Pi \), any subset of \(\Pi _i\) is either equal to \(\Pi _i\) or is equal to \(\emptyset \). Since we assume that \(\Pi _i\cap \Pi _j \neq \emptyset \), we therefore have \(\Pi _i=\Pi _i\cap \Pi _j\). Similarly, \(\Pi _j=\Pi _i\cap \Pi _j\).

        Hence \(\Pi _i=\Pi _j\), but this contradicts the fact that the \(\Pi _i\) are distinct from each other. Thus we have a contradiction and in fact we must have \(\Pi _i\cap \Pi _j = \emptyset \).

      • (ii) By definition of \(\Pi \) we have \(\cup _{i=1}^k \Pi _i \sw S\). Suppose \(\cup _{i=1}^k \Pi _i \neq S\). Then \(C=S\sc \cup _{i=1}^k \Pi _i\) is a non-empty set in \(\Sigma \).

        Since \(C\) is disjoint from all the \(\Pi _i\), we must have \(C\notin \Pi \). Noting that \(C\in \Sigma \), by definition of \(\Pi \) this implies that there is some\(^{1}\) \(B_1\subset C\) such that \(B_1\neq \emptyset \).

        We have that \(B_1\) is disjoint from all the \(\Pi _i\), so we must have \(B_1\notin \Pi \). Thus by the same reasoning (as we gave for \(C\)) there exists \(B_2\subset B_1\) such that \(B_2\neq \emptyset \). Iterating, we construct an infinite decreasing sequence of sets \(C\supset B_1\supset B_2 \supset B_3\ldots \) each strictly smaller than the previous one, none of which are empty. However, this is impossible because \(C\sw S\) is a finite set.

      • (iii) Let \(i\in I\). So \(\Pi _i\cap A\neq \emptyset \). Noting that \(\Pi _i\cap A\sw \Pi _i\), by definition of \(\Pi \) we must have \(\Pi _i\cap A=\Pi _i\). That is, \(\Pi _i\sw A\). Since we have this for all \(i\in I\), we have \(\cup _{i\in I} \Pi _i\sw A\).

        Now suppose that \(A\sc \cup _{i\in I} \Pi _i\neq \emptyset \). Since by (ii) we have \(S=\cup _{i=1}^k \Pi _i\), and the union is disjoint by (i), this means that there is some \(\Pi _j\) with \(j\notin I\) such that \(A\cap \Pi _j\neq \emptyset \). However \(A\cap \Pi _j\sw \Pi _j\) so by definition of \(\Pi \) we must have \(\Pi _j\cap A=\Pi _j\). That is \(\Pi _j\sw A\), but then we would have \(j \in I\), which is a contradiction.

        Thus \(A\sc \cup _{i\in I} \Pi _i\) must be empty, and we conclude that \(A=\cup _{i\in I} \Pi _i\).

  • 1.11

    • (a) Recall that \(C_n\) is the union of \(2^n\) disjoint closed intervals, each with length \(3^{-n}\), and that \(C=\cap _n C_n\), with notation as in Example 1.1.1.

      Suppose, for a contradiction, that \((a,b)\sw C\) with \(a<b\). Then \((a,b)\sw C_n\) for all \(n\). Choose \(n\) such that \(3^{-n}<\frac 12(b-a)\). Let us write the \(2^n\) disjoint closed intervals making up \(C_n\) as \(I_1,\ldots ,I_{2^n}\). The point \(c=\frac {a+b}{2}\) must fall into precisely one of these intervals, say \(I_j\). Since \(I_j\) has length \(3^{-n}\), which is less than \(\frac 12(b-a)\), we must have \(I_j\sw (a,b)\) (draw a picture!). However, \(C_{n+1}\) does not contain all of \(I_j\), because the middle part of \(I_j\) will be removed – so we cannot have \((a,b)\sw C_{n+1}\). Thus we have reached a contradiction.

    • (b) For a counterexample, consider a variant of the construction of the Cantor set, where instead of removing the middle thirds at stage \(n\), we instead remove the middle \(1-e^{-1/n^2}\) (from each component of \(C_n\)). Then, by the same argument as in the proof of Lemma 1.5.4, we would have

      \[\lambda (C)=\lim _n \lambda (C_n)=\lim _{n\to \infty } e^{-1}e^{-1/4}e^{-1/9}\ldots e^{-1/n^2}=\lim _{n\to \infty } \exp \l (-\sum _1^n \frac {1}{i^2}\r ) =\exp \l (-\sum _1^\infty \frac {1}{n^2}\r ). \]

      We have that \(\lambda (C)\) is positive because \(\sum _1^\infty \frac {1}{n^2}<\infty \); in fact \(\sum _1^\infty \frac {1}{n^2}=\frac {\pi ^2}{6}\), so this construction gives \(\lambda (C)=e^{-\pi ^2/6}\approx 0.19\).

      A similar argument as in part (a) applies here, and shows that \(C\) does not contain any open intervals. The length of each interval within \(C_{n+1}\) is less than half the length of the intervals in \(C_{n}\) (because each interval of \(C_n\) has a middle part removed to become two intervals in \(C_{n+1}\)). Thus, by a trivial induction, each of the \(2^{n}\) disjoint closed intervals in \(C_n\) has length \(\leq (\frac 12)^{n}\). You can check that we can apply the same argument as in part (a), but replacing \(3^{-n}\) with \((\frac 12)^{n}\).

1 \(X\subset Y\) means that \(X\sw Y\) and \(X\neq Y\) i.e. \(X\) is strictly smaller than the set \(Y\)

Chapter 2
  • 2.1 We will prove the forwards implication first. If \((a_n)\) is bounded then there exists \(M\in \R \) with \(|a_n|\leq M\) for all \(n\in \N \). Hence \(|\sup _{k\geq n} a_k|\leq M\) and \(|\inf _{k\geq n} a_k|\leq M\) for all \(n\in \N \), which (using that limits preserve weak inequalities) implies that \(|\limsup _n a_n|\leq M\) and \(|\liminf _n a_n|\leq M\). In particular, both are real valued.

    For the reverse implication, suppose that both \(\liminf _n a_n\) and \(\limsup _n a_n\) are elements of \(\R \). Hence, the sequences \(b_n=\sup _{k\geq n}a_k\) and \(c_n=\inf _{k\geq n} a_k\) are bounded (because sequences that converge in \(\R \) are necessarily bounded). In particular, \(b_1=\sup _{k\geq 1}a_k\) and \(c_1=\inf _{k\geq 1}a_k\) are elements of \(\R \), which implies that the sequence \((a_n)\) is bounded.

  • 2.2 If \(x<0\) then \(f_n(x)=0\) for all \(n\). If \(x>0\) then for \(n\) large enough that \(\frac 1n<x\) we have \(f_n(x)=0\). Hence \(f_n(x)\to 0\) for all \(x\neq 0\), which means \(f_n\to 0\) almost everywhere.

    (Note that \(f_n(0)=n\), which does not tend to zero.)

  • 2.3 We have that \(f_n(x)\to f(x)\) for all \(x\in S\sc A\), and \(g_n(x)\to g(x)\) for all \(x\in S\sc B\), where \(m(A)=m(B)=0\). Hence \(m(A\cup B)\leq m(A)+m(B)=0\), so \(m(A\cup B)=0\). For all \(x\in S\sc (A\cup B)\) we have \(f_n(x)\to f(x)\) and \(g_n(x)\to g(x)\), hence for such \(x\) we have \((f_n+g_n)(x)\to (f+g)(x)\) and \((f_ng_n)(x)\to (fg)(x)\). Therefore \(f_n+g_n\to f+g\) and \(f_ng_n\to fg\), both almost everywhere.

  • 2.4

    • (a) For \(N\in \N \) we have \(\sum _{n=1}^N a_n + \sum _{n=1}^N b_n = \sum _{n=1}^N (a_n + b_n)\). Since \(a_n,b_n\geq 0\), all of these terms are monotone increasing in \(N\), and therefore have limits (in \(\ov {\R }\)) by Lemma 2.1.1 as \(N\to \infty \). The result follows by uniqueness of limits.

    • (b) We check (M1) and (M2). For (M1), we have that \((m_1+m_2)(\emptyset )=m_1(\emptyset )+m_2(\emptyset )=0+0=0\).

      For (M2), if \((A_n)_{n\in \N }\) is a sequence of disjoint measurable sets then by part (a) we have

      \begin{align*} \sum \limits _{n=1}^\infty (m_1+m_2)(A_n) &=\sum \limits _{n=1}^\infty \l (m_1(A_n)+m_2(A_n)\r ) \\ &=\sum \limits _{n=1}^\infty m_1(A_n)+\sum \limits _{n=1}^\infty m_2(A_n) \\ &= m_1\l (\bigcup _{n=1}^\infty A_n\r )+m_2\l (\bigcup _{n=1}^\infty A_n\r ) \\ &=(m_1+m_2)\l (\bigcup _{n=1}^\infty A_n\r ). \end{align*} Thus \(m_1+m_2\) is a measure.

    • (c) Combining part (b) of this question with the result of Exercise 1.5 part (a), we can show (by a trivial induction) that if \(m_{1},m_{2} \ldots , m_{n}\) are measures and \(c_{1}, c_{2}, \ldots , c_{n}\) are non-negative numbers then \(c_{1}m_{1} + c_{2}m_{2} + \cdots + c_{n}m_{n}\) is a measure. Apply this with \(m_{j} = \de _{x_{j}} (1 \leq j \leq n)\) to obtain the result.

      To get a probability measure we need \(\sum _{j=1}^{n}c_{j} = 1\). Then, as \(\de _{x}\) is a probability measure for all \(x\), we have \(m(S) = \sum _{j=1}^{n}c_{j}\de _{x_{j}}(S) = \sum _{j=1}^{n}c_{j} = 1.\)

  • 2.5

    • (a)

      • (i) If \(x \in A\) and \(x \in B\), the equation becomes \(1 =1 + 1 -1 = 1\),

        If \(x \in A\) and \(x \notin B\), the equation becomes \(1 =1 + 0 -0 = 1\),

        If \(x \notin A\) and \(x \in B\), the equation becomes \(1 =0 + 1 -0 = 1\),

        If \(x \notin A\) and \(x \notin B\), the equation becomes \(0 =0 + 0 - 0 = 0\), so we have equality in all cases.

      • (ii) Since \(A = B \cup (A\sc B)\) and \(B \cap (A\sc B) = \emptyset \), we can apply (i) to obtain that \({\1}_{A} = {\1}_{B} + {\1}_{A\sc B}\).

      • (iii) Note that both sides are non-zero if and only if both \(x\in A\) and \(x\in B\), in which case both sides are equal to \(1\).

    • (b) The function \(\sum _1^\infty \1_{A_n}\) is defined as a pointwise limit as \(N\to \infty \) of the partial sums \(\sum _1^N \1_{A_n}\), which are themselves defined pointwise.

      For the last part, if \(x \notin A\) then \(x \notin A_{n}\) for all \(\nN \) and so lhs \(=\) rhs \(= 0\). If \(x \in A\) then \(x \in A_{n}\) for one and only one \(\nN \) and so lhs \(=\) rhs \(= 1\).

  • 2.6

    • (a)

      • (i) For all \(\nN \) we have \(\sup _{k \geq n}(a_{k} + b_{k}) \leq \sup _{k \geq n}a_{k} + \sup _{k \geq n}b_{k}\). Taking limits on both sides gives \(\limsup _n (a_n+b_n)\leq \limsup _n a_n + \limsup _n b_n\).

      • (ii) Note that for non-negative sequences \(\sup _{k \geq n}(a_{k}b_{k})\leq \left (\sup _{k \geq n}a_{k}\right )\left (\sup _{k \geq n}b_{k}\right )\), and take limits on both sides.

      • (iii) Note that \(\sup _{k \geq n} ca_{k} = c \sup _{k \geq n}a_{k}\) because \(c\geq 0\), and take limits on both sides.

      • (iv) Note that \(\sup _{k\geq n} a_k \leq \sup _{k\geq n} b_k\), and take limits on both sides.

    • (b)

      • (i) Putting \((-a_n)\) and \((-b_n)\) in place of \((a_n)\) and \((b_n)\) in part (a)(i), Lemma 2.2.3 gives that \(-\liminf _n (a_n+b_n) \leq (-\liminf _n a_n) + (-\liminf _n b_n)\), which gives \(\liminf _n (a_n+b_n)\geq \liminf _n a_n + \liminf _n b_n\).

      • (ii) Note that for non-negative sequences \(\inf _{k \geq n}(a_{k}b_{k})\geq \left (\inf _{k \geq n}a_{k}\right )\left (\inf _{k \geq n}b_{k}\right )\), and take limits on both sides as in (a)(ii).

      • (iii) Putting \((-a_n)\) in place of \((a_n)\) in part (a)(iii), Lemma 2.2.3 gives that \(-\liminf _n ca_n = c(-\liminf _n a_n)\), hence \(\liminf _n ca_n = c\liminf _n a_n\).

      • (iv) Note that \(\inf _{k\geq n} a_k \leq \inf _{k\geq n} b_k\), and take limits on both sides as in (a)(iv), to obtain \(\liminf _n a_n \leq \liminf _n b_n\).
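
    For illustration (not part of the exercise), the inequalities in (a)(i) and (b)(i) can be strict: taking \(a_n=(-1)^n\) and \(b_n=(-1)^{n+1}\) gives \(a_n+b_n=0\) for all \(n\), so

    \[\limsup _n (a_n+b_n)=0<2=\limsup _n a_n+\limsup _n b_n, \qquad \liminf _n (a_n+b_n)=0>-2=\liminf _n a_n+\liminf _n b_n.\]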

  • 2.7 It is clear that \(e^{-nx^2}\to 0\) as \(n\to \infty \) for all \(x\neq 0\), and that \(e^{-n0^2}=e^0=1\) for all \(n\). Hence \(f_n\to \1_{\{0\}}\) pointwise.

    If the convergence was uniform then, from real analysis, for any sequence \(a_n\to a\) we would have that \(f_n(a_n)\to f(a)\) as \(n\to \infty \). However, if we take \(a_n=\frac {1}{\sqrt {n}}\) then \(f_n(a_n)=e^{-1}\), which does not converge to \(f(\lim _n a_n)=f(0)=1\).

  • 2.8

    • (a) It is clear that \(d(x,y)=d(y,x)\) and that \(d(x,x)=0\). If \(x\neq y\) then \(d(x,y)>0\), because \(\arctan \) is strictly increasing on \([-\infty ,\infty ]\). For the triangle law we use the triangle law for \(\R \) to deduce that

      \[d(x,z)=|\arctan (x)-\arctan (z)|\leq |\arctan (x)-\arctan (y)|+|\arctan (y)-\arctan (z)|=d(x,y)+d(y,z).\]

      There is nothing special about \(\arctan \) here. Any function that is a strictly increasing map from \(\ov {\R }\) to a bounded subset of \(\R \) will do.

    • (b) Recall that a metric space is compact if and only if it is sequentially compact. Let \((a_n)\) be a sequence in \(\ov {\R }\). If \((a_n)\) lies within \(\R \) and is bounded, then \((a_n)\) has a subsequence convergent to some \(a\in \R \) (by the Heine-Borel theorem). Alternatively, if \((a_n)\) is unbounded then it contains a subsequence \((a_{r_n})\) such that \(|a_{r_n}|\to \infty \). In particular there must be a subsequence that converges to \(+\infty \) or to \(-\infty \). Thus \(\ov {\R }\) is sequentially compact, and thus compact.

    • (c) If \((a_n)\) converges then any subsequence of \((a_n)\) converges to the same limit. Hence (i) \(\ra \) (ii).

      For the reverse implication, let \((a_n)\) be a sequence in \(\ov {\R }\) and assume (ii). Recall that a metric space is sequentially compact if and only if it is compact, hence \(\ov {\R }\) is sequentially compact. We will argue by contradiction: suppose that \((a_n)\) does not converge to \(a\). Then there exists \(\eps >0\) and an infinite subsequence \((a_{r_n})\) of \((a_n)\) such that \(|a_{r_n}-a|\geq \eps \) for all \(n\in \N \). By sequential compactness \((a_{r_n})\) has a convergent subsequence. By (ii), this convergent subsequence converges to \(a\), which is a contradiction. Hence in fact \(a_n\to a\).

  • 2.9 It suffices to show that \(\liminf _n a_n=\inf \mathscr {L}\). The corresponding result for \(\limsup \) follows by multiplying both sides by \(-1\) and using Lemma 2.2.3.

    Let \((a_{r_n})\) be a convergent subsequence of \((a_n)\). Note that this implies \(r_n\geq n\). It follows immediately that \(\inf _{k\geq n} a_k\leq \inf _{k\geq n} a_{r_k}\). Hence also \(\liminf _n a_n \leq \liminf _n a_{r_n}\). By Lemma 2.2.2 we have \(\liminf _n a_{r_n}=\lim _n a_{r_n}\). Since \((a_{r_n})\) was an arbitrary convergent subsequence we thus have \(\liminf _n a_n\leq \inf \mathscr {L}\).

    To complete the proof we need to show the reverse inequality. We will argue by contradiction. Suppose that \(\liminf _n a_n < \inf \mathscr {L}\). Let us write \(a=\liminf _n a_n\). Let \(\eps >0\) be such that \(a+\eps < \inf \mathscr {L}\). By definition of \(\liminf \), there exists \(N\in \N \) such that for all \(n\geq N\) we have \(\inf _{k\geq n} a_k \leq a+\eps \). Hence we can define a subsequence \((a_{r_n})\) of \((a_n)\) by setting

    \begin{align*} r_1 = \inf \{k\geq N\- a_k\leq a +\eps \},\\ r_{n+1} = \inf \{k>r_n\- a_k\leq a +\eps \}. \end{align*} We have \(a_{r_n}\leq a+\eps \) for all \(n\), and by part (b) of Exercise 2.8 the sequence \((a_{r_n})\) has a convergent subsequence. The limit of this subsequence must be less than or equal to \(a+\eps \), which implies that \(\inf \mathscr {L}\leq a+\eps <\inf \mathscr {L}\). This is a contradiction, which completes the proof.

Chapter 3
  • 3.1 Note that there are very many different ways to solve the various parts of this question, using the results in Sections 3.1 and 3.2.

    • (a) We have \(f^{-1}([c,\infty ))=\emptyset \) if \(\alpha <c\) and \(f^{-1}([c,\infty ))=\R \) if \(c\leq \alpha \). In both cases, \(f^{-1}([c,\infty ))\) is Borel.

    • (b) We can write \(g(x)=\1_{[0,\infty )}(x)e^x\). Indicator functions of measurable sets are measurable, and \(x\mapsto e^x\) is continuous, hence measurable. Products of measurable functions are measurable, hence \(g\) is measurable.

    • (c) \(x\mapsto \sin (\cos x)\) is a continuous function, because the composition of continuous functions is also continuous. Hence \(h\) is measurable.

    • (d) The function \(\sin \) is continuous, hence measurable. A similar method to (b) shows that \(x^2\1_{[0,\infty )}(x)\) is measurable. The composition of Borel measurable functions is measurable, hence \(i\) is measurable.

  • 3.2 If \(f\) is measurable then \(f^{-1}((a,b))\in \Sigma \), as \((a,b)\in \mc {B}(\R )\).

    For the reverse implication, we have that \(f^{-1}((a,b))\in \Sigma \) for all \(-\infty \leq a<b\leq \infty \). Take \(b=\infty \) and we have \(f^{-1}((a,\infty ))\in \Sigma \) for all \(a\in \R \). From this, Lemma 3.1.4 gives that \(f\) is measurable.

  • 3.3 Note that \(|f|=\max (0,f)+\max (0,-f)\). Theorem 3.1.5 implies that all parts of this formula are operations that preserve measurability, hence \(|f|\) is measurable.

  • 3.4

    • (a) For any \(a\in \R \), we have

      \[h^{-1}((a,\infty ))=\{x\in \R \- f(x+y)>a\}=\{z-y\in \R \-f(z)>a\}=(f^{-1}((a,\infty )))_{-y}.\]

      Here we use the notation \(A_y=\{a+y\-a\in A\}\) from Section 1.4. Using that \(A_y\in \mc {B}(\R )\) whenever \(A\in \mc {B}(\R )\), we have that \(h^{-1}((a,\infty ))\in \mc {B}(\R )\), and hence \(h\) is measurable.

      Alternative: Write \(h = f \circ \tau _{y}\) where \(\tau _{y}(x) = x + y\). The mapping \(\tau _{y}\) is continuous and hence measurable and so \(h\) is measurable by Lemma 3.2.1.

    • (b) If \(f\) is differentiable then it is continuous, hence also measurable by Lemma 3.2.1.

      For each \(x \in \R \) we have \(f^{\prime }(x) = \lim _{h \rightarrow 0}\frac {f(x + h) - f(x)}{h}\); in particular, taking \(h=\frac 1n\) and letting \(n\to \infty \), we may write \(f'(x)=\lim _{n\to \infty }n\l (f(x+\frac 1n)-f(x)\r )\) as a pointwise limit of a sequence of functions. Note that \(x \rightarrow f(x+\frac 1n)\) is measurable by part (a), so \(x \rightarrow n\l (f(x + \frac 1n) - f(x)\r )\) is measurable by Theorem 3.1.5. Hence \(f^{\prime }\) is also measurable, again by Theorem 3.1.5.

    • (c) Recall that we have shown all intervals (i.e. sets of the form \((a,b)\), \([a,b)\) and so on) are Borel sets. We use the description of intervals given in the hint.

      Suppose \(f\) is monotone increasing. Fix \(c\in \R \) and consider \(I=f^{-1}((c,\infty ))\). We want to show that \(I\) is an interval. Let \(a,b\in I\), so that we have \(f(a),f(b)>c\), and let \(a<d<b\). We have \(a\leq d\) so, by monotonicity of \(f\) we have \(f(a)\leq f(d)\). Thus \(f(d)>c\), so \(d\in I\). Hence \(I\) is an interval, so \(I\in \mc {B}(\R )\). We thus have that \(f^{-1}((c,\infty ))\in \Sigma \) for all \(c\in \R \), so \(f\) is measurable by Lemma 3.1.4.

  • 3.5 We need to show that if \(f\) and \(g\) are simple functions and \(\alpha ,\beta \in \R \), then \(\alpha f+\beta g\) is also a simple function. This follows from the same calculation as in the first step of the proof of Lemma 4.1.2.

  • 3.6

    • (a) For any \(a\in \R \), we have \((f+\alpha )^{-1}((a,\infty ))=\{x\in \R \-f(x)+\alpha >a\}=\{x\in \R \-f(x)>a-\alpha \}=f^{-1}((a-\alpha ,\infty ))\). Hence \((f+\alpha )^{-1}((a,\infty ))\in \Sigma \) by measurability of \(f\), which by Lemma 3.1.4 implies that \(f+\alpha \) is measurable.

    • (b) First note that if \(\alpha =0\) then \((\alpha f)(x)= 0\) for all \(x\), so in this case \(\alpha f\) is measurable by Exercise 3.1 part (a).

      Consider when \(\alpha >0\). For any \(a\in \R \), we have \((\alpha f)^{-1}((a,\infty ))=\{x\in \R \-\alpha f(x)>a\}=\{x\in \R \-f(x)>a/\alpha \}=f^{-1}((a/\alpha ,\infty ))\). For \(\alpha <0\), similarly \((\alpha f)^{-1}((a,\infty ))=\{x\in \R \-\alpha f(x)>a\}=\{x\in \R \-f(x)<a/\alpha \}=f^{-1}((-\infty ,a/\alpha ))\). In both cases these are measurable sets, so Lemma 3.1.4 gives that \(\alpha f\) is measurable.

    • (c) Note that \((G \circ f)^{-1}(A) = f^{-1}(G^{-1}(A))\). For \(A\in \mc {B}(\R )\), measurability of \(G\) implies that \(G^{-1}(A)\in \mc {B}(\R )\), and measurability of \(f\) thus implies that \(f^{-1}(G^{-1}(A))\in \Sigma \). Hence \(G\circ f\) is measurable.

  • 3.7

    • (a) Let \(x\in O_1\cup O_2\). If \(x\in O_1\) then there is an open interval \(I_1\sw O_1\) containing \(x\), and \(I_1\) is then an open interval within \(O_1\cup O_2\) containing \(x\). The same argument applies if \(x\in O_2\), with \(x\in I_2\sw O_2\). Hence \(O_1\cup O_2\) is open.

      Now let \(x\in O_1\cap O_2\). Then for each \(i=1,2\) we have an open interval \(I_i\sw O_i\) containing \(x\). Let us write \(I_1=(a_1,b_1), I_2=(a_2,b_2)\), and \(c_1=\max (a_1,a_2)\), \(c_2=\min (b_1,b_2)\). Then \((c_1,c_2)=I_1\cap I_2\), and since \(x\in I_1\cap I_2\) we have \(x\in (c_1,c_2)\). In particular this means \(c_1<c_2\), so \(I_1\cap I_2\) is an open interval. Also \(I_1\cap I_2\sw O_1\cap O_2\), so \(O_1\cap O_2\) is open.

    • (b)

      • 1. This is true. We can use exactly the same method as in part (a): let \(x\in \cup _n O_n\), so that \(x\in O_i\) for some \(i\); then we have an open interval \(I_i\sw O_i\) containing \(x\), so \(I_i\sw \cup _n O_n\), and we are done.

      • 2. This is false. A counterexample is given by \(O_n=(\frac {-1}{n},1+\frac {1}{n})\), for which \(\cap _n O_n=[0,1]\).

    • (c) Let \((C_n)_{n\in \N }\) be a sequence of closed sets. Then \(\R \sc C_n\) is open, for each \(n\). Using set operations we have

      \begin{align*} \R \sc (C_1\cup C_2)&=(\R \sc C_1)\cap (\R \sc C_2)\\ \R \sc (C_1\cap C_2)&=(\R \sc C_1)\cup (\R \sc C_2)\\ \R \sc \l (\bigcup _n C_n\r )&=\bigcap _n\l (\R \sc C_n\r )\\ \R \sc \l (\bigcap _n C_n\r )&=\bigcup _n\l (\R \sc C_n\r ) \end{align*} The first two equations combined with part (a) tell us that both the results of part (a) carry over to closed sets: both \(C_1\cap C_2\) and \(C_1\cup C_2\) are closed.

      From the fourth equation, since \(\R \sc C_n\) is open (for all \(n\)), using (b)(i) we see that \(\R \sc \l (\bigcup _n C_n\r )\) is also open, hence \(\bigcap _n C_n\) is closed.

      However, we can’t do the same for the third equation, because (b)(ii) was false. Instead, we can take complements of our counterexample in (b)(ii) to find a counterexample here, giving \(C_n=\R \sc (\frac {-1}{n},1+\frac {1}{n})=(-\infty ,\frac {-1}{n}]\cup [1+\frac {1}{n},\infty )\). Then \(\cup _nC_n=(-\infty ,0)\cup (1,\infty )\) which is not closed (because its complement \([0,1]\) is not open).

  • 3.8

    • (a) It is sufficient to consider the case \(x = a\). Then for any \(\eps > 0\) and arbitrary \(\de \), \(f(a - \de ) = 0 < f(a) + \eps = 1 + \eps \) and \(f(a + \de ) = f(a) < f(a) + \eps = 1 + \eps \).

    • (b) It is sufficient to consider the case \(x = n\) for some integer \(n\). Again for any \(\eps > 0\) and arbitrary \(\de \), \(f(n-\de ) = n-1 < f(n) + \eps = n + \eps \) and \(f(n+\de ) = n < f(n) + \eps = n + \eps \).

    • (c) Let \(U = f^{-1}((-\infty , a))\). We will show that \(U\) is open. Then it is a Borel set and \(f\) is measurable. Fix \(x \in U\) and let \(\eps = a - f(x)\). Then there exists \(\de > 0\) so that \(|x - y| < \de \Rightarrow f(y) < f(x) + \eps = a\) and so \(y \in U\). We have shown that for each \(x \in U\) there exists an open interval (of radius \(\de \)) so that if \(y\) is in this interval then \(y \in U\). Hence \(U\) is open.

Chapter 4
  • 4.1

    • (a) \(f=2\1_{[-2,-1]}-\1_{(-1,1)}+3\1_{[1,2)}-5\1_{[2,3)}\)

      \(\int _{\R } f\,dm = 2((-1)-(-2))-1(1-(-1))+3(2-1)-5(3-2) = -2\)

    • (b) \(f_+=2\1_{[-2,-1]}+3\1_{[1,2)}\) and \(f_-=\1_{(-1,1)}+5\1_{[2,3)}\)

      \(\int _{\R } f_+\,dm = 2((-1)-(-2))+3(2-1) = 5\) and \(\int _{\R } f_-\,dm = 1(1-(-1))+5(3-2) = 7\)

  • 4.2 Let us write \(f = \sum _{i=1}^{n}c_{i}{\1}_{A_{i}}\), where the \(A_i\) are disjoint measurable sets. Then

    \[{\1}_{A}f = \sum _{i=1}^{n}c_{i}{\1}_{A_{i}}{\1}_{A} = \sum _{i=1}^{n}c_{i}{\1}_{A_{i}\cap A},\]

    where we have used \(\1_A\1_B=\1_{A\cap B}\) (which might be obvious, but it is also from Exercise 2.5).

    Note that if \(i\neq j\) then \((A_i\cap A)\cap (A_j\cap A)=A\cap (A_i\cap A_j)=A\cap \emptyset =\emptyset \), so the \(A\cap A_i\) are disjoint sets. Hence \(\1_A f\) is a simple function.

  • 4.3 Note that \(f^2\) is a non-negative function. Applying Lemma 4.2.3 to \(f^2\) gives

    \[m(\{x\in S\- |f(x)|\geq c\})=m(\{x\in S\- |f(x)^2|\geq c^2\})\leq \frac {1}{c^2}\int _S f^2\,dm.\]

    Similarly, applying it to \(|f|^p\) gives \(m(\{x\in S\- |f(x)|\geq c\})\leq \frac {1}{c^p}\int _S |f|^p\,dm.\) Note that in general we need to use \(|f|\) instead of \(f\) to ensure non-negativity.

  • 4.4

    • (a) Noting \(f_+\) and \(f_-\) are both non-negative, with non-negative integrals, we have

      \[\l |\int _S f\,dm\r | = \l |\int _S f_+\,dm - \int _S f_-\,dm\r | \leq \int _S f_+\,dm + \int _S f_-\,dm = \int _S |f|\,dm.\]

    • (b) By the triangle inequality we have \(|f(x)+g(x)|\leq |f(x)|+|g(x)|\). Thus by monotonicity and linearity for integrals of non-negative functions (Lemmas 4.2.2 and Lemma 4.3.2) we have

      \[\int _S |f+g|\,dm\leq \int _S |f|+|g|\,dm = \int _S |f|\,dm + \int _S |g|\,dm.\]

    • (c) Using linearity for integrals of non-negative functions (Lemma 4.3.2), if \(c \geq 0\) then

      \[\int _{S}cf\,dm = \int _{S}cf_{+}\,dm - \int _{S}cf_{-}\,dm = c\int _{S}f_{+}\,dm - c\int _{S}f_{-}\,dm = c\int _{S}f \,dm.\]

      If \(c = -1\) then \((-f)_{+} = f_{-}\) and \((-f)_{-} = f_{+}\), so

      \[ \int _{S}(-f)\,dm = \int _{S}f_{-}\,dm - \int _{S}f_{+}\,dm = -\left (\int _{S}f_{+}\,dm - \int _{S}f_{-}\,dm\right ) = -\int _{S}f\,dm.\]

      Finally for general \(c < 0\) write \(c =(-1)\times d\) where \(d > 0\) and use the two cases we’ve just proved.

    • (d) If \(f \leq g\) then \(g - f \geq 0\) so by monotonicity for integrals of non-negative functions (Lemma 4.2.2), \(\int _{S}(g-f)\,dm \geq 0\). By linearity (c.f. the hint) this is equivalent to \(\int _{S}g\,dm - \int _{S}f\,dm \geq 0\), which gives \(\int _{S}g\,dm \geq \int _{S}f\,dm\) as required.

  • 4.5 Suppose that \(|f|\leq C\) where \(C\in [0,\infty )\). By monotonicity and linearity (from Lemmas 4.2.2 and 4.3.2) we have \(\int _S |f|\,dm \leq \int _S C\,dm=C\int _S 1\,dm = Cm(S)<\infty .\) Hence \(f\in \mc {L}^1\).

    Note that we do not use Theorem 4.5.3 here because, at the point where we need monotonicity and linearity, we do not yet know that \(f\in \mc {L}^1\).

  • 4.6 By Riemann integration, we have

    \[\int _{1/n}^1 \log x\,dx=\l [x\log x - x\r ]_{1/n}^1=(-1)-\l (\frac {1}{n}\log \frac {1}{n}-\frac {1}{n}\r )=\frac {1+\log n}{n} -1.\]

    Noting that \(\log x\in (-\infty ,0)\) for \(x\in (0,1)\), multiplying the above by \(-1\) gives

    \[\int _{1/n}^1|\log x|\,dx=1-\frac {1+\log n}{n}.\]

    We have that \(g_n(x)=|\log x|\1_{x\in (1/n,1)}\) is a monotone increasing sequence of non-negative functions, with pointwise convergence to \(g(x)=|\log x|\) for \(x\in (0,1)\). Hence, by the monotone convergence theorem,

    \[\int _0^1|\log x|\,dx=\lim _{n\to \infty }\l (1-\frac {1+\log n}{n}\r )=1.\]

    Thus \(\log x\) is in \(\mc {L}^1\) on \((0,1)\).
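
    As a check of the antiderivative used above (for illustration only), for \(x>0\) we have

    \[\frac {d}{dx}\l (x\log x - x\r )=\log x + 1 - 1=\log x.\]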

  • 4.7 Let \(x \in \R \) be arbitrary. If \(x\neq 0\) then we can find \(n_{0} \in \mathbb {N}\) so that \(\frac {1}{n_{0}} < |x|\), and then for all \(n \geq n_{0}\) we have \(f_{n}(x) = n{\1}_{(0, 1/n)}(x) = 0\); if \(x=0\) then \(f_n(x)=0\) for all \(n\) because \(0\notin (0,1/n)\). So we have proved that \(\lim _{n \rightarrow \infty }f_{n}(x) = 0\). But for all \(\nN \)

    \[ \int _{\R }|f_{n}(x) - 0|dx = n\int _{\R }{\1}_{(0, 1/n)}(x)dx = n\frac {1}{n} = 1,\]

    and so we cannot find any function in the sequence that gets arbitrarily close to \(0\) in the \({\cal L}_{1}\) sense.

    The MCT does not apply here because \((f_n)\) is not monotone. The DCT does not apply because, if it did, then the DCT would give \(\int _\R f_n \to \int _\R f=0\) which is not true! We conclude that there is no dominating integrable function for \((f_n)\), because all the other conditions for the DCT do hold.

    \((\star )\) Extension: Fatou’s lemma does apply, and would give \(\int _\R \liminf _n f_n=\int _\R 0 \leq \liminf _n\int _\R f_n = \liminf _n 1=1\) which we already knew because we could calculate the integrals explicitly in this case.

  • 4.8 Since \(|\cos (\cdot )| \leq 1\) we have \(|\cos (\alpha x)f(x)| \leq |f(x)|\) for all \(x \in \R \). Lemma 4.6.1 thus gives that this function is in \(\mc {L}^1\).

    Let \(f_n(x)=\cos (x/n)f(x)\). Note that \(\lim _{n \rightarrow \infty }\cos (x/n) = \cos (0) = 1\) for all \(x \in \R \). Hence \(f_n\to f\) pointwise. Moreover, \(f\in \mc {L}^1\) is a dominating function for the sequence \((f_n)\), so we may use the dominated convergence theorem to deduce that

    \[ \lim _{n \rightarrow \infty }\int _{\R }\cos (x/n)f(x)dx = \int _{\R } \lim _{n \rightarrow \infty }\cos (x/n)f(x)dx = \int _{\R }f(x)dx.\]
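
    For illustration (not part of the exercise), with \(f=\1_{[0,1]}\) the integrals can be computed explicitly:

    \[\int _{\R }\cos (x/n)\1_{[0,1]}(x)\,dx=\int _0^1\cos (x/n)\,dx=n\sin (1/n)\to 1=\int _{\R }\1_{[0,1]}(x)\,dx,\]

    in agreement with the conclusion of the dominated convergence argument.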

  • 4.9 Let \(g=|f|\1_A\) and \(g_n=|f|\1_{\bigcup _{i=1}^n A_i}\). Thus \(0\leq g_n\leq g_{n+1}\) and \(g_n\to g\) pointwise. From the monotone convergence theorem we have \(\int _S g_n\,dm \to \int _S g\,dm\). Note that \(\int _S g\,dm=\int _A |f|\,dm\). Since the \(A_i\) are disjoint we have \(\1_{\bigcup _{i=1}^n A_i}=\sum _{i=1}^n\1_{A_i}\). Hence by linearity we have \(g_n=|f|\sum _{i=1}^n \1_{A_i}=\sum _{i=1}^n |f|\1_{A_i}\), so

    \[\int _S g_n\,dm = \sum _{i=1}^n \int _S |f|\1_{A_i}\,dm = \sum _{i=1}^n \int _{A_i} |f|\,dm.\]

    Fitting all this together, we have shown that

    \[\lim _{n\to \infty }\sum _{i=1}^n\int _{A_i}|f|\,dm = \int _A |f|\,dm.\]

  • 4.10

    • (a) We can write

      \[a^{(N)}_n=\sum \limits _{i=1}^N a_i \1_{\{i\}}(n).\]

      Noting that \(\{i\}\) is measurable and \(a_i\in \R \), this is a simple function. Since \(\#(\{i\})=1\), using (4.3) we have

      \[\int _\N a^{(N)}\,d\#=\sum \limits _{i=1}^N a_i.\]

      We have \(a^{(N)}\to a\) pointwise as \(N\to \infty \), and \(0\leq a^{(N)}\leq a^{(N+1)}\), so by the monotone convergence theorem

      \[\int _\N a\,d\#=\lim _{N\to \infty }\int _\N a^{(N)}\,d\#=\sum \limits _{i=1}^\infty a_i,\]

      as required.

    • (b) We have that \(a\in \mc {L}^1\) if and only if \(|a|\in \mc {L}^1\), which from part (a) occurs if and only if \(\sum _n |a_n|<\infty \). That is, \(a\in \mc {L}^1\) if and only if it is absolutely convergent.

      Lastly, writing \(a=a_+-a_-\) we have

      \begin{align*} \int _\N a\,d\# &=\int _\N a_+\,d\#-\int _\N a_-\,d\# \\ &=\sum \limits _{n=1}^\infty \max (a_n,0)-\sum \limits _{n=1}^\infty \max (-a_n,0) \\ &=\sum \limits _{n=1}^\infty \l (\max (a_n,0)-\max (-a_n,0)\r ) \\ &=\sum \limits _{n=1}^\infty a_n. \end{align*} Here, the third line follows from the second using absolute convergence, which allows us to rearrange infinite series.
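
      For illustration of part (b) (not part of the exercise), the sequence \(a_n=\frac {(-1)^{n+1}}{n}\) has a convergent series \(\sum _{n=1}^\infty a_n=\log 2\), but \(a\notin \mc {L}^1\) with respect to counting measure because

      \[\int _\N |a|\,d\#=\sum \limits _{n=1}^\infty \frac {1}{n}=\infty .\]

      So convergence of \(\sum _n a_n\) is not enough; it is absolute convergence that characterises membership of \(\mc {L}^1\).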

  • 4.11 Reflexivity is obvious as \(f(x) = f(x)\) for all \(x \in S\). So is symmetry, because \(f(x) \eqae g(x)\) if and only if \(g(x) \eqae f(x)\). For transitivity, let \(A = \{x \in S; f(x) \neq g(x)\}, B = \{x \in S; g(x) \neq h(x)\}\) and \(C = \{x \in S; f(x) \neq h(x)\}\). Then \(C \subseteq A \cup B\) and so \(m(C) \leq m(A) + m(B) = 0\). Thus if \(f\eqae g\) and \(g\eqae h\) we have \(f\eqae h\).

  • 4.12 Note that \(f - f_{n} \geq 0\) for all \(\nN \), so by Fatou’s lemma

    \[ \li \int _{S} (f - f_{n})\,dm \geq \int _{S} \li (f - f_{n})\,dm.\]

    Rearranging both sides,

    \begin{align*} \int _{S}f \,dm + \li \int _{S}(-f_{n})\, dm &\geq \int _{S}f \,dm + \int _{S}\li (- f_{n})\,dm, \\ \li -\left (\int _{S}f_{n}\,dm\right ) &\geq \int _{S}\li (- f_{n})\,dm. \end{align*} Multiplying both sides by \(-1\) reverses the inequality to yield

    \[ -\li -\left (\int _{S}f_{n}\,dm\right ) \leq \int _{S}\left (-\li (- f_{n})\right )\,dm.\]

    Hence by Lemma 2.2.3 we have

    \[ \ls \int _{S}f_{n}\,dm \leq \int _{S}\ls f_{n} \,dm.\]

  • 4.13 Dominated convergence theorem, complex version. Let \(f_n,f\) be functions from \(S\) to \(\C \). Suppose that \(f_n\) is measurable and:
    1. There is a function \(g\in \Lone _\C \) such that \(|f_{n}| \leq |g|\) almost everywhere.
    2. \(f_n\to f\) almost everywhere.
    Then \(f\in \Lone _\C \) and
    \[ \int _{S} f_{n}dm\to \int _{S} f dm \]
    as \(n\to \infty \).

    To prove this, note that \(f_n\to f\) implies that \(\Re (f_n)\to \Re (f)\) and \(\Im (f_n)\to \Im (f)\), all almost everywhere (in \(\C \) or \(\R \) as appropriate). The function \(|g|:S\to \R \) satisfies \(\int _S |g|\,dm<\infty \) because \(g\in \Lone _\C \). We have \(|\Re (f_n)|\leq |f_n|\leq |g|\) and \(|\Im (f_n)|\leq |f_n|\leq |g|\), hence \(|g|\) serves as a dominating function to apply the DCT to the real and imaginary parts of \(f_n\). The result then follows by linearity.

  • 4.14 We have

    \begin{align*} \int _0^x ie^{iay}\,dy & = i\int _0^x \cos (ay)\,dy - \int _0^x \sin (ay)\,dy \\ &= i\frac {1}{a}(\sin (ax)-0)-\frac {1}{a}(-\cos (ax)-(-1)) \\ &= \frac {1}{a}\l (e^{iax}-1\r ). \end{align*}
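
    As a check (for illustration only), differentiating the right hand side with respect to \(x\) gives

    \[\frac {d}{dx}\l (\frac {1}{a}\l (e^{iax}-1\r )\r )=\frac {1}{a}\,ia\,e^{iax}=ie^{iax},\]

    which is the integrand, as expected.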

  • 4.15 All of them do, where we treat \(|\cdot |\) as the complex modulus and replace functions with \(\C \) valued equivalents. Details are left for you.

  • 4.16 Define \(f_{n}(x) = f(t_{n}, x)\) for each \(\nN , x \in S\). Then \(|f_{n}(x)| \leq g(x)\) for all \(x \in S\). Since \(g\in \Lone \), by dominated convergence

    \begin{align*} \lim _{n \rightarrow \infty }\int _{S}f(t_{n}, x)dm(x) & = \int _{S}\lim _{n \rightarrow \infty }f_{n}(x)dm(x)\\ & = \int _{S}\lim _{n \rightarrow \infty }f(t_{n}, x) dm(x)\\ & = \int _{S}f(t, x)dm(x), \end{align*} where we used the continuity assumption (ii) in the last step.

  • 4.17 Let \((h_n)\) be an arbitrary sequence such that \(h_n\to 0\) and define \(a_{n,t}(x)=\frac {f(t+h_n,x)-f(t,x)}{h_n}\).

    Since \(\frac {\p f}{\p t}\) exists we have \(a_{n,t}(x)\to \frac {\p f}{\p t}(t,x)\) as \(n\to \infty \) for all \(x\). By the mean value theorem there exists \(\theta _n\in [0,1]\) such that \(a_{n,t}(x)=\frac {\p f}{\p t}(t+\theta _n h_n,x)\), hence \(|a_{n,t}(x)|\leq h(x)\). Thus by dominated convergence \(\int _S a_{n,t}(x)\,dm(x)\to \int _S \frac {\p f}{\p t}(t,x)\,dm(x)\).

    By linearity of the integral we have

    \begin{align*} \frac {\p }{\p t}\int _S f(t,x)\,dm(x) &=\lim _{n\to \infty }\frac {1}{h_n}\l (\int _S f(t+h_n,x)\,dm(x)-\int _Sf(t,x)\,dm(x)\r )\\ &=\lim _{n\to \infty }\int _S a_{n,t}(x)\,dm(x) \end{align*} and the result follows.

  • 4.18

    • (a) For each \(x \in \R , \nN \), the expression for \(f_{n}(x)\) is a telescopic sum. If you begin to write it out, you see that terms cancel in pairs and you obtain

      \[ f_{n}(x) = -2xe^{-x^{2}} + 2(n+1)^{2}xe^{-(n+1)^{2}x^{2}}.\]

      Using the fact that \(\lim _{N \rightarrow \infty }N^{2}e^{-yN^{2}} = 0\) for all \(y > 0\) (and noting that the second term vanishes anyway when \(x=0\)), we find that

      \[ \lim _{n \rightarrow \infty }f_{n}(x) = f(x) = -2xe^{-x^{2}}.\]

    • (b) The functions \(f\) and \(f_{n}\) are continuous and so Riemann integrable over the closed interval \([0,a]\). We can calculate (which is left for you) that \(\int _{0}^{a}f(x)dx = -2\int _{0}^{a}xe^{-x^{2}}dx = e^{-a^{2}} - 1\). But on the other hand

      \[\begin {aligned} \int _{0}^{a}f_{n}(x)\,dx & = \sum _{r=1}^{n}\int _{0}^{a}\l (-2r^{2}xe^{-r^{2}x^{2}} + 2(r+1)^{2}xe^{-(r+1)^{2}x^{2}}\r )\,dx\\ & = \sum _{r=1}^{n}\l (e^{-r^{2}a^{2}} - e^{-(r+1)^{2}a^{2}}\r )\\ & = e^{-a^{2}} - e^{-(n+1)^{2}a^{2}} \rightarrow e^{-a^{2}}~\mbox {as}~n \rightarrow \infty . \end {aligned}\]

      So we conclude that \(\int _{0}^{a}f(x)\,dx \neq \lim _{n \rightarrow \infty }\int _{0}^{a}f_{n}(x)\,dx\).

  • 4.19 Using the fact that \(|e^{-ixy}| \leq 1\), by Lemma 4.2.2 extended to the complex case we have that

    \[ |\widehat {f}(y)| \leq \int _{\R }|e^{-ixy}|\;|f(x)|\,dx \leq \int _{\R }|f(x)|\,dx < \infty .\]

    For the linearity, we have

    \[\begin {aligned} \widehat {af + bg}(y) & = \int _{\R }e^{-ixy}(a f(x) + b g(x))\,dx \\ & = a\int _{\R }e^{-ixy}f(x)\,dx + b\int _{\R }e^{-ixy}g(x)\,dx \\ & = a \widehat {f}(y) + b \widehat {g}(y). \end {aligned}\]
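
    For illustration (not part of the exercise), with the convention \(\widehat {f}(y)=\int _{\R }e^{-ixy}f(x)\,dx\) used above, taking \(f=\1_{[-1,1]}\) gives

    \[\widehat {f}(y)=\int _{-1}^1 e^{-ixy}\,dx=\frac {e^{iy}-e^{-iy}}{iy}=\frac {2\sin y}{y}\]

    for \(y\neq 0\), and \(\widehat {f}(0)=2\). Note that \(|\widehat {f}(y)|\leq 2=\int _{\R }|f(x)|\,dx\), consistent with the bound just proved.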

  • 4.20 \(x \rightarrow {\1}_{\mathbb {Q}}(x)\cos (nx)\) is \(\mc {L}^1\) on \([-\pi ,\pi ]\) as \(|{\1}_{\mathbb {Q}}(x)\cos (nx)| \leq |\cos (nx)|\) for all \(x \in \R \) and \(x \rightarrow \cos (nx)\) is \(\mc {L}^1\) on \([-\pi ,\pi ]\). Similarly \(x \rightarrow {\1}_{\mathbb {Q}}(x)\sin (nx)\) is \(\mc {L}^1\) on \([-\pi ,\pi ]\). So the Fourier coefficients \(a_{n}\) and \(b_{n}\) are well-defined as Lebesgue integrals. As \(|\cos (nx)| \leq 1\), we have \(a_{n} = 0\) for all \(n \in \mathbb {Z}_{+}\) since

    \[\begin {aligned} |a_{n}| & \leq \frac {1}{\pi }\int _{-\pi }^{\pi }{\1}_{\mathbb {Q}}(x)|\cos (nx)|\,dx \\ & \leq \frac {1}{\pi }\int _{-\pi }^{\pi }{\1}_{\mathbb {Q}}(x)\,dx = 0. \end {aligned}\]

    By a similar argument, \(b_{n} = 0\) for all \(\nN \). So it is possible to associate a Fourier series to \({\1}_{\mathbb {Q}}\), but this Fourier series will be identically zero.

    This illustrates that pointwise convergence is not the right tool for examining convergence of Fourier series!

  • 4.21 Note that \(f_{a}\) is measurable by Problem 3.4(a). To show that \(f_a\in \Lone \), we use

    \[ \int _{\R }|f_{a}(x)|\,dx = \int _{\R }|f(x - a)|\,dx = \int _{\R }|f(x)|\,dx < \infty .\]

    Then

    \[ \widehat {f_{a}}(y) = \int _{\R }e^{-ixy}f(x-a) dx,\]

    and the result follows on making a change of variable \(u = x-a\).

  • 4.22 Let \(y \in \R \) and \((y_{n})\) be an arbitrary sequence converging to \(y\) as \(n \rightarrow \infty \). We need to show that the sequence \((\widehat {f}(y_{n}))\) converges to \(\widehat {f}(y)\). We have

    \begin{align*} |\widehat {f}(y_{n}) - \widehat {f}(y)| & = \left |\int _{\R }e^{-ixy_{n}}f(x)dx - \int _{\R }e^{-ixy}f(x)\,dx\right |\\ & \leq \int _{\R }|e^{-ixy_{n}}- e^{-ixy}|\,|f(x)|\,dx. \end{align*} Now \(|e^{-ixy_{n}}- e^{-ixy}| \leq |e^{-ixy_{n}}| + |e^{-ixy}| = 2\) and the function \(x \rightarrow 2|f(x)|\) is \(\mc {L}^1\). Also the mapping \(y \rightarrow e^{-ixy}\) is continuous, and so \(\lim _{n \rightarrow \infty }|e^{-ixy_{n}}- e^{-ixy}| = 0\). The result follows from these two facts, and the use of Lebesgue’s dominated convergence theorem.

  • 4.23 To prove that \(y \rightarrow \widehat {f}(y)\) is differentiable, we need to show that

    \(\lim _{h \rightarrow 0}(\widehat {f}(y + h) - \widehat {f}(y))/h\) exists for each \(y \in \R \). We have

    \[\begin {aligned} \frac {\widehat {f}(y + h) - \widehat {f}(y)}{h} & = \frac {1}{h}\int _{\R }(e^{-ix(y + h)} - e^{-ixy})f(x)\,dx \\ & = \int _{\R }e^{-ixy}\left (\frac {e^{-ihx} - 1}{h}\right )f(x)\,dx. \end {aligned}\]

    Since \(|e^{-ixy}| \leq 1\), and using the hint with \(b = hx\), we get

    \begin{align*} \left |\frac {\widehat {f}(y + h) - \widehat {f}(y)}{h}\right | \;\leq \; \int _{\R }\left |\frac {e^{-ihx} - 1}{h}\right |\;|f(x)|\,dx \;\leq \; \int _{\R }|x||f(x)|\,dx < \infty . \end{align*} Then we can use Lebesgue’s dominated convergence theorem to get

    \[\begin {aligned} \lim _{h \rightarrow 0}\frac {\widehat {f}(y + h) - \widehat {f}(y)}{h} & = \int _{\R }e^{-ixy}\lim _{h \rightarrow 0}\left (\frac {e^{-ihx} - 1}{h}\right )f(x)\,dx \\ & = -i\int _{\R }e^{-ixy}xf(x)\,dx = -i\widehat {g}(y), \end {aligned}\]

    and the result is proved. In the last step we used

    \[ \lim _{h \rightarrow 0}\frac {e^{-ihx} - 1}{h} = \left .\frac {d}{dy}e^{-ixy}\right |_{y=0} = -ix.\]

  • 4.24 First observe that by the hint and Problem 3.4 part (a), the mapping \((x, y) \rightarrow f(x-y)g(y)\) is measurable. Let \(K = \sup _{x \in \R }|g(x)| < \infty \), since \(g\) is bounded. Then since \(f\in \Lone \) we have

    \[ |(f*g)(x)| \leq \int _{\R }|f(x - y)|\,|g(y)|\,dy \leq K \int _{\R }|f(x - y)|\,dy = K \int _{\R }|f(y)|\,dy < \infty .\]

    We also have by Fubini’s theorem

    \begin{align*} \int _{\R }\int _{\R }|f(x-y)g(y)|\,dy\,dx &\leq \int _{\R }\left (\int _{\R }|f(x - y)|\,|g(y)|\,dy \right )dx \\ &= \int _{\R }\left (\int _{\R }|f(x-y)|\,dx \right )|g(y)|\,dy \\ &= \int _{\R }|f(x)|\,dx \int _{\R }|g(y)|\,dy < \infty , \end{align*} from which it follows that \(f*g\in \Lone \).

    By a similar argument using Fubini’s theorem, we have that

    \begin{align*} \widehat {f * g}(y) & = \int _{\R }e^{-ixy}\int _{\R }f(x -z)g(z)\,dz\, dx \\ & = \int _{\R }\left (\int _{\R }e^{-iy(u + z)}f(u)\,du \right )g(z)\,dz \\ & = \int _{\R }e^{-iyu}f(u)\,du \int _{\R }e^{-iyz}g(z)\,dz\\ & = \widehat {f}(y)\widehat {g}(y), \end{align*} where we used the change of variable \(x = u + z\).

Chapter 5
  • 5.1 Monotone Convergence Theorem. Let \((X_{n})\) be an increasing sequence of non-negative random variables that converges almost surely to a random variable \(X\). Then \(\E [X_n]\to \E [X]\).

    Dominated Convergence Theorem. Let \((X_{n})\) be a sequence of random variables that converges almost surely to a random variable \(X\). Suppose that there exists a random variable \(Y\in \Lone \) such that \(|X_{n}| \leq Y\) almost surely, for all \(n\in \N \). Then \(X\in \Lone \) and \(\E [X_n]\to \E [X]\).

    Markov’s Inequality. Let \(X\) be a non-negative random variable and let \(c>0\). Then \(\P [X\geq c]\leq \frac {1}{c}\E [X]\).

    Chebyshev’s Inequality. Let \(X\) be a non-negative random variable and let \(c>0\). Then \(\P [X\geq c]\leq \frac {1}{c^2}\E [X^2]\).

    Theorem 4.4.1. Let \(X\) be a non-negative random variable on the probability space \((\Omega ,\mc {F},\P )\). Then \(\nu :\mc {F}\to [0,\infty ]\) defined by \(\nu (A)=\E [\1_A X]\) is a measure.

    Fatou’s Lemma. Let \((X_{n})\) be a sequence of non-negative random variables. Then \(\E [\liminf _n X_n]\leq \liminf _n \E (X_{n})\).

  • 5.2 Put \(|X-\E [X]|\) in place of \(X\) in the version of Chebyshev’s inequality in Exercise 5.1, to obtain \(\P [|X-\E [X]|\geq c]\leq \frac {1}{c^2}\E \l [(X-\E [X])^2\r ]=\frac {\var (X)}{c^2}\).
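
    For illustration (not part of the exercise), if \(X\) has mean \(\mu \) and variance \(\sigma ^2<\infty \) then taking \(c=2\sigma \) (assuming \(\sigma >0\)) gives

    \[\P [|X-\mu |\geq 2\sigma ]\leq \frac {\var (X)}{(2\sigma )^2}=\frac 14.\]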

  • 5.3 \(\P [U\in A]=\int _A \frac {1}{b-a}\,dx=\frac {\lambda (A)}{b-a}\), where \(\lambda \) denotes Lebesgue measure.

  • 5.4

    • (a) \(\P [X>x]=1-\P [X\leq x]=1-F(x)\) and \(\P [x<X\leq y]=\P [X\leq y]-\P [X\leq x]=F(y)-F(x)\).

    • (b) Let \((a_{n})\) be an increasing sequence that tends to \(\infty \). Define \(A_{n} = \{\omega \in \Omega ; X(\omega ) \leq a_{n}\}\). Then \((A_{n})\) increases to \(\Omega \) and by Lemma 5.1.1,

      \[\lim _{x \rightarrow \infty }F(x) = \lim _{n \rightarrow \infty }\P [A_{n}] = \P [\Omega ] = 1.\]

      Similarly, let \(B_{n} = \{\omega \in \Omega ; X(\omega ) \leq - a_{n}\}\). Then \((B_{n})\) decreases to \(\emptyset \) and by Lemma 5.1.1,

      \[\lim _{x \rightarrow -\infty }F(x) = \lim _{n \rightarrow \infty }\P [B_{n}] = \P [\emptyset ] = 0.\]

  • 5.5 From Lemma 5.2.1 we have that \(x\mapsto F_X(x)=\P [X\leq x]\) is right-continuous and monotone increasing. At each \(x\) such that \(\P [X=x]>0\) the function \(x\mapsto F_X(x)\) has an upwards jump. The region it jumps through is \(Q_x=(F_X(x-),F_X(x))\), which is a non-empty open interval, and in particular contains a rational number \(q_x\) (in fact, infinitely many rationals, but one will do).

    Hence, for each \(x\) with \(\P [X=x]>0\) we have a rational number \(q_x\), and because \(F_X\) is increasing we have \(q_x\neq q_y\) whenever \(x\neq y\). Therefore \(x\mapsto q_x\) is an injective map from \(\{x\in \R \-\P [X=x]>0\}\) to \(\Q \). Since \(\Q \) is countable, so is \(\{x\in \R \-\P [X=x]>0\}\).

  • 5.6 Recall that \(\E [X]=\int _\Omega X\,d\P \). If \(X:\Omega \to \R \) is a simple function given by \(X(\omega )=\sum _{i=1}^n c_i\1_{A_i}\) then

    \[\frac {1}{m(\Omega )}\int _\Omega X\,dm=\frac {1}{m(\Omega )}\sum _{i=1}^n c_im(A_i)=\sum _{i=1}^n c_i \frac {m(A_i)}{m(\Omega )}=\sum _{i=1}^n c_i \P [A_i]=\E [X].\]

    For non-negative measurable \(X:\Omega \to \R \), by Theorem 3.5.2 take a sequence of non-negative increasing simple functions \(X_n:\Omega \to \R \) such that \(X_n(\omega )\to X(\omega )\) for all \(\omega \in \Omega \). From what we have already proved, we have \(\frac {1}{m(\Omega )}\int _\Omega X_n\,dm = \E [X_n]\) for all \(n\in \N \). Letting \(n\to \infty \), the monotone convergence theorem gives that \(\frac {1}{m(\Omega )}\int _\Omega X\,dm = \E [X]\).

    Lastly, if \(X\in \mc {L}^1\) then we may write \(X=X_+-X_-\), where \(X_+,X_-:\Omega \to \R \) are non-negative and measurable. From what we have already proved we have

    \[\frac {1}{m(\Omega )}\int _\Omega X\,dm = \frac {1}{m(\Omega )}\int _\Omega X_+\,dm \;-\; \frac {1}{m(\Omega )}\int _\Omega X_-\,dm = \E [X_+]-\E [X_-]=\E [X].\]

    as required.

  • 5.7

    • (a) Define \(B_n=\cap _{i=1}^n A_i\). Then \((B_n)\) is a decreasing sequence of sets and, since \(\P \) is a finite measure, by Lemma 5.1.1 we have \(\P [B_n]\to \P [\cap _{i=1}^\infty B_i]\) as \(n\to \infty \). Since \(\cap _{i=1}^\infty A_i=\cap _{i=1}^\infty B_i\) we thus have \(\P [\cap _{i=1}^\infty A_i]=\lim _{n\to \infty }\P [\cap _{i=1}^n A_i]\). Using independence on the right hand side, we obtain

      \[\P [\cap _{i=1}^\infty A_i]=\lim _{n\to \infty }\P [A_1]\P [A_2]\ldots \P [A_n]=\prod _{i=1}^\infty \P [A_i]\]

      as required. Note that the limit on the right hand side exists because \(\P [A_1]\P [A_2]\ldots \P [A_n]\) is decreasing as \(n\) increases.

    • (b) There are many ways to answer this question, but they all focus around the possibility that \(\prod _{n=1}^\infty \P [A_n]\) might be zero, in which case (5.5) might not give us any information.

      For example: if \(0<\P [A_{n}] < 1-\kappa \) for infinitely many \(n\), where \(\kappa >0\) does not depend on \(n\), then \(\prod _{n=1}^{\infty }\P [A_{n}] = 0\), so \(\P \l [\bigcap _{n=1}^{\infty }A_{n}\r ] = \prod _{n=1}^{\infty }\P [A_{n}]\) would hold in, for example, the case where all the \(A_n\) were disjoint. Disjoint events with non-zero probability are always dependent (note that if one occurs then all the others do not!), so clearly this ‘alternative’ definition is not what we want.

  • 5.8

    • (a) We have \(\P [A\cap B]=\P [A]\P [B]\). Noting that \(A^c\cap B^c=(A\cup B)^c\), we have

      \begin{align*} \P [A^c\cap B^c]=\P [(A\cup B)^c] &=1-\P [A\cup B]\\ &=1-\P [A]-\P [B]+\P [A\cap B]\\ &=1-\P [A]-\P [B]+\P [A]\P [B]\\ &=(1-\P [A])(1-\P [B])\\ &=\P [A^c]\P [B^c]. \end{align*} Hence \(A^c\) and \(B^c\) are independent.

    • (b) If \(A,B \in {\cal B}(\R )\) and \(f,g\) are Borel measurable, then \(f^{-1}(A), g^{-1}(B) \in {\cal B}(\R )\) and so

      \[\begin {aligned} \P [f(X) \in A, g(Y) \in B] & = \P [X \in f^{-1}(A), Y \in g^{-1}(B)]\\ & = \P [X \in f^{-1}(A)]\,\P [Y \in g^{-1}(B)]\\ & = \P [f(X) \in A]\,\P [g(Y) \in B]. \end {aligned}\]

  • 5.9

    • (a) We have \(XY=U^2V(1-V)=0\) because \(V(1-V)=0\), hence \(\E [XY]=0\). Note that \(\E [U]=0\). By independence of \(U\) and \(V\), \(\E [X]=\E [U]\E [V]=0\) and \(\E [Y]=\E [U]\E [1-V]=0\). Hence \(\E [XY]=\E [X]\E [Y]\).

      To see that \(X\) and \(Y\) are not independent, note that \(\{X=0\}=\{V=0\}\) and \(\{Y=0\}=\{V=1\}\). Thus \(\P [X=Y=0]=0\) but \(\P [X=0]=\P [Y=0]=\frac 12\), so \(\P [X=Y=0]\neq \P [X=0]\P [Y=0]\).

    • (b) It is clear that \(X\) and \(Y\) are independent. Considering \(X\) and \(Z\), for any \(a,b\in \{-1,1\}\) we have \(\P [X=a,Z=b]=\P [X=a,XY=b]=\P [X=a,Y=ab]=\frac 14=\P [X=a]\P [Z=b]\). The same calculation applies to \(Y\) and \(Z\). We thus have pairwise independence.

      However, \(\P [X=Y=Z=1]=\P [X=Y=1]=\P [X=1]\P [Y=1]=\frac 12\frac 12=\frac 14\) and \(\P [X=1]\P [Y=1]\P [Z=1]=\frac 12\frac 12\frac 12=\frac 18\). Hence \(\{X,Y,Z\}\) is not a set of independent random variables.

  • 5.10 The constant \(M\) provides a dominating function for \((X_n)\) and we have \(\E [M]=M<\infty \), so (the constant function) \(M\) is in \(\Lone \). By the dominated convergence theorem we have \(\E [X_n]\to \E [X]\).

  • 5.11

    • (a) If \(X=k\) and \(k\in \N \) then

      \[\sum \limits _{n=1}^\infty \1_{\{X\geq n\}}=\sum \limits _{n=1}^\infty \1_{\{k\geq n\}}=k=X\]

      because the first \(k\) terms of the sum are \(1\) and the rest are \(0\). Since \(X\) only takes values in \(\N \), we have

      \[X=\sum _{n=1}^\infty \1_{\{X\geq n\}}.\]

      By the monotone convergence theorem

      \[ \E [X] = \E \l [\sum _{n=1}^{\infty }{\1}_{\{X \geq n\}}\r ] = \sum _{n=1}^{\infty }\E [{\1}_{\{X \geq n\}}] = \sum _{n=1}^{\infty }\P [X \geq n].\]

    • (b) Let \(X_1=\lfloor Y\rfloor \) and \(X_2=\lceil Y\rceil \), that is \(Y\) rounded down and up (respectively) to the nearest integer. We can apply part (a) to both \(X_1\) and \(X_2\), since they take values in \(\N \cup \{0\}\).

      Note that for \(n\in \N \) we have \(X_1\geq n\) if and only if \(Y\geq n\). Hence

      \[\sum \limits _{n=1}^\infty \P [Y\geq n] =\sum \limits _{n=1}^\infty \P [X_1\geq n] =\E [X_1] \leq \E [Y].\]

      Here, the last line follows by monotonicity, since \(X_1\leq Y\).

      For \(X_2\) we need to be slightly more careful. We have \(Y\leq X_2\leq Y+1\), hence \(\P [X_2\geq n]\leq \P [Y+1\geq n]\) for each \(n\). Therefore

      \[ \E [Y]\leq \E [X_2] =\sum \limits _{n=1}^\infty \P [X_2\geq n] \leq \sum \limits _{n=1}^\infty \P [Y+1\geq n] = \P [Y\geq 0]+\sum \limits _{n=1}^\infty \P [Y\geq n] = 1+\sum \limits _{n=1}^\infty \P [Y\geq n]. \]
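
    For illustration of part (a) (not part of the exercise), suppose \(X\) has the geometric distribution \(\P [X=k]=(1-p)^{k-1}p\) for \(k\in \N \), where \(p\in (0,1)\). Then \(\P [X\geq n]=(1-p)^{n-1}\) and

    \[\sum \limits _{n=1}^\infty \P [X\geq n]=\sum \limits _{n=1}^\infty (1-p)^{n-1}=\frac {1}{p}=\E [X],\]

    in agreement with the formula proved in part (a).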

  • 5.12

    • (a) By linearity, the quadratic function \(g(t) = \E [X^{2}] + 2t\E [XY] + t^{2}\E [Y^{2}] \geq 0\) for all \(t \in \R \). A non-negative quadratic function has at most one real root, and hence has a non-positive discriminant (i.e. \(b^2-4ac\leq 0\)). Hence \(4(\E [XY])^{2} - 4\E [X^{2}]\E [Y^{2}]\leq 0\) and the result follows.

    • (b) Put \(Y=1\) in the Cauchy-Schwarz inequality from (a) to get \(\E [|X|] \leq \E [X^{2}]^{\frac {1}{2}} < \infty \). Thus \(X\in \Lone \). By Lemma 4.2.2 we have \(|\E [X]| \leq \E [|X|]\). Combining our two inequalities gives \(|\E [X]|^{2} \leq \E [X^{2}]\).

    • (c) If \(\E [X^2]<\infty \) then by part (b) \(\E [|X|]<\infty \), so by linearity we have

      \[ \var (X) = \E [(X - \mu )^{2}] = \E [X^{2}] - 2\mu \E [X] + \mu ^{2} = \E [X^{2}] - \mu ^{2}.\]

      Hence \(\var (X)<\infty \).

      Conversely, suppose that \(\var (X)<\infty \), and note that by assumption we also have \(\E [X]<\infty \). We can write \(X^2=(X-\E [X])^2+2X\E [X]-\E [X]^2\) and note that all terms here are in \(\Lone \) by our assumptions, thus

      \[\E [X^2]=\var (X)+2\E [X]\E [X]-\E [X]^2=\var (X)+\E [X]^2.\]

      Hence \(\E [X^2]\) is finite.

  • 5.13

    • (a) Since \(e^{-ax} \leq 1\) for all \(x \geq 0\) we have

      \[ \E \l [e^{-aX}\r ] = \int _{0}^{\infty }e^{-ax}dp_{X}(x) \leq \int _{0}^{\infty }dp_{X}(x) = 1.\]

    • (b) Using the fact that \(e^{a|x|} = \sum _{n=0}^{\infty }\frac {a^{n}|x|^{n}}{n!}\) for all \(x \in \R \) we see that for each \(\nN , |x|^{n} \leq \frac {n!}{a^{n}}e^{a|x|}\) and so by monotonicity

      \[ \E \l [|X|^{n}\r ] \leq \frac {n!}{a^{n}}\E \l [e^{a|X|}\r ] < \infty .\]
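
    For illustration of part (a) (not part of the exercise), if \(X\) has the exponential distribution with rate \(1\) then for \(a\geq 0\)

    \[\E \l [e^{-aX}\r ]=\int _0^\infty e^{-ax}e^{-x}\,dx=\frac {1}{1+a}\leq 1,\]

    consistent with the bound in part (a).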

  • 5.14 If \(f\) is an indicator function: \(f = {\1}_{A}\) for some \(A \in {\cal B}(\R )\):

    \[ \int _{\Omega }{\1}_{A}(X(\omega ))d\P (\omega ) = \P (X \in A) = p_{X}(A) = \int _{\R }{\1}_{A}(x)p_{X}(dx),\]

    and so the result holds in this case. It extends to simple functions by linearity. If \(f\) is non-negative and bounded

    \[\begin {aligned} \int _{\Omega }f(X(\omega ))d\P (\omega ) & = \sup \left \{\int _{\Omega }g(\omega )d\P (\omega ); g~\mbox {simple on}~\Omega , 0 \leq g \leq f \circ X\right \}\\ & = \sup \left \{\int _{\Omega }h(X(\omega ))d\P (\omega ); h~\mbox {simple on}~\R , 0 \leq h \circ X \leq f \circ X\right \}\\ & = \sup \left \{\int _{\R }h(x)p_{X}(dx); h~\mbox {simple},~0 \leq h \leq f\right \}\\ & = \int _{\R }f(x)dp_{X}(x).\end {aligned}\]

    In the general case write \(f = f_{+} - f_{-}\), so that \(f(X)=f_+(X)-f_-(X)\) (details left for you; a sketch of this step is given at the end of this solution).

    If \(f\) is non-negative but not necessarily bounded, the result still holds but both \(\int _\Omega f(X(\omega ))\,d\P (\omega )\) and \(\int _\R f(x)\,dp_X(x)\) may be (simultaneously) infinite.
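
    For completeness, here is one way to write out the omitted step, assuming \(f(X)\in \Lone \) so that the subtraction below is well defined. Writing \(f=f_+-f_-\), and noting that \((f\circ X)_\pm =f_\pm \circ X\), the non-negative case gives

    \[\int _\Omega f(X(\omega ))\,d\P (\omega ) = \int _\Omega f_+(X(\omega ))\,d\P (\omega ) - \int _\Omega f_-(X(\omega ))\,d\P (\omega ) = \int _\R f_+(x)\,dp_X(x) - \int _\R f_-(x)\,dp_X(x) = \int _\R f(x)\,dp_X(x).\]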

  • 5.15

    • (a) Since \(\P [E_n]\geq \eps \) we have \(\P [\Omega \sc E_n]\leq 1-\eps \). Hence for all \(N\in \N \) we have

      \begin{align*} \P [\cup _n E_n] \geq \P \l [\bigcup _{n=1}^N E_n\r ] = 1-\P \l [\Omega \sc \bigcup _{n=1}^N E_n\r ] = 1-\P \l [\bigcap _{n=1}^N \Omega \sc E_n\r ] = 1-\prod _{n=1}^N\P [\Omega \sc E_n] \geq 1-(1-\eps )^N. \end{align*} In the above, the final equality (the product) is obtained using independence of the \((E_n)\), which carries over to their complements. As the above holds for all \(N\in \N \) and \((1-\eps )^N\to 0\) as \(N\to \infty \), we obtain that \(\P \l [\cup _{n} E_n\r ]=1.\)

    • (b) Suppose that there exists \(\omega \in \Omega \) such that \(\P [\omega ]>0\). Define a sequence of events \((E'_n)\) by setting \(E'_n=E_n\) if \(\omega \notin E_n\) and \(E'_n=\Omega \sc E_n\) if \(\omega \in E_n\). Clearly \(\omega \notin \cup _{n\in \N } E'_n\). By part (a) of exercise 5.8 the events \((E'_n)_{n\in \N }\) are independent of one another. We have \(\P [E'_n]\geq \eps \) for all \(n\in \N \), so from part (a) we have that \(\P [\cup _{n\in \N } E'_n]=1\). However \(\omega \notin \cup _{n\in \N } E'_n\) and \(\P [\omega ]>0\), so \(\P [\cup _{n\in \N } E'_n]\leq 1-\P [\omega ]<1\), which is a contradiction. Hence \(\P [\omega ]=0\) for all \(\omega \in \Omega \).

      For the last part, if \(\Omega \) were countable then we could write \(\Omega =\{\omega _1,\omega _2,\ldots \}\) and by countable additivity we would have \(1=\P [\Omega ]=\sum _{i\in \N }\P [\omega _i]\). Hence at least one \(\omega _i\) would satisfy \(\P [\omega _i]>0\), but we have already shown that this cannot happen.

Chapter 6
  • 6.1 Let us first calculate the moment generating function of the Poisson distribution (or you could look it up). If \(X\) has the Poisson\((\lambda )\) distribution then \(\P [X=k]=\frac {\lambda ^ke^{-\lambda }}{k!}\) for \(k=0,1,2,\ldots \). Hence we have

    \begin{align*} \E [e^{tX}] =\sum _{k=0}^\infty \frac {\lambda ^ke^{-\lambda }}{k!} e^{tk} =e^{-\lambda }\sum _{k=0}^\infty \frac {(e^t\lambda )^k}{k!} =e^{-\lambda }e^{e^t\lambda } =e^{\lambda (e^t-1)}. \end{align*} Putting \(n\lambda \) as the parameter, we obtain \(\E [e^{tX_n}]=e^{n\lambda (e^t-1)}\) as required.

    To derive the Chernoff bound, note that for \(t>0\), by Markov's inequality we have

    \[ \P [X_n\geq n\lambda ^2] =\P [e^{tX_n}\geq e^{tn\lambda ^2}] \leq e^{-tn\lambda ^2}\E [e^{tX_n}] = \exp \l (n\lambda (e^t-1-\lambda t)\r ). \]

    Differentiating the right hand side with respect to \(t\) gives \(n\lambda (e^t-\lambda )\exp \l (n\lambda (e^t-1-\lambda t)\r )\), which vanishes when \(e^t=\lambda \), that is when \(t=\log \lambda \); this is where the bound is minimized. Substituting this value of \(t\), we obtain the Chernoff bound

    \[ \P [X_n\geq n\lambda ^2] \leq \exp \l (n\lambda (\lambda -1-\lambda \log \lambda )\r ). \]

    Note that \(\lambda -1-\lambda \log \lambda \leq 0\) for all \(\lambda >0\), with equality only when \(\lambda =1\). Since the choice \(t=\log \lambda \) requires \(t>0\), this gives a useful (exponentially decaying) bound provided \(\lambda >1\).
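
    For example (purely as an illustration), taking \(\lambda =2\) the exponent is \(n\lambda (\lambda -1-\lambda \log \lambda )=2n(1-2\log 2)\approx -0.77n\), so the bound reads

    \[ \P [X_n\geq 4n]\leq \exp \l (2n(1-2\log 2)\r )\leq e^{-0.77n}, \]

    which decays exponentially fast in \(n\).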

  • 6.2 By linearity we have \(\E [X_n]=\sum _{i=1}^n \P [E_i]\) and

    \begin{align*} \E [X_n^2] = \E \l [\sum _{i=1}^n \1_{E_i}^2 + \sum _{i=1}^n\sum _{\stackrel {j=1}{j\neq i}}^n\1_{E_i}\1_{E_j}\r ] = \E \l [\sum _{i=1}^n \1_{E_i} + \sum _{i=1}^n\sum _{\stackrel {j=1}{j\neq i}}^n\1_{E_i\cap E_j}\r ] = \sum _{i=1}^n \P [E_i] + \sum _{i=1}^n\sum _{\stackrel {j=1}{j\neq i}}^n \P [E_i\cap E_j]. \end{align*} Hence,

    \[\frac {\E [X_n]^2}{\E [X_n^2]} =\frac {\l (\sum _{i=1}^n \P [E_i]\r )^2} {\sum _{i=1}^n \P [E_i] + \sum _{i=1}^n\sum _{\stackrel {j=1}{j\neq i}}^n\P [E_i\cap E_j]} =\frac {1} {\l (\sum _{i=1}^n \P [E_i]\r )^{-1}+\frac {\sum _{i=1}^n\sum _{\stackrel {j=1}{j\neq i}}^n\P [E_i\cap E_j]}{\l (\sum _{i=1}^n \P [E_i]\r )^2}}. \]

    Using that \(\P [E_i\cap E_j]\leq \P [E_i]\P [E_j]\) we have

    \[ \frac {\sum _{i=1}^n\sum _{\stackrel {j=1}{j\neq i}}^n\P [E_i\cap E_j]}{\l (\sum _{i=1}^n \P [E_i]\r )^2} \leq \frac {\sum _{i=1}^n \P [E_i]^2+\sum _{i=1}^n\sum _{\stackrel {j=1}{j\neq i}}^n\P [E_i]\P [E_j]}{\l (\sum _{i=1}^n \P [E_i]\r )^2}=1, \]

    hence

    \[\frac {\E [X_n]^2}{\E [X_n^2]} \geq \frac {1}{\l (\sum _{i=1}^n \P [E_i]\r )^{-1}+1}.\]

    The right hand side of the above tends to \(1\) as \(n\to \infty \) because \(\sum _{i=1}^\infty \P [E_i]=\infty \). From the Paley-Zygmund inequality we have \(1\geq \P [X_n\geq 1] \geq \frac {\E [X_n]^2}{\E [X_n^2]}\), so by the sandwich rule \(\P [X_n\geq 1]\to 1\) as \(n\to \infty \).

  • 6.3 Here we use Lemma 6.3.2 (and Remark 6.3.4) to check for convexity, by calculating the second derivative. This part is left for you.

    • (a) The function \(x\mapsto x^{4}\) is convex, so Jensen’s inequality gives \(\E [X]^{4} \leq \E [X^{4}]\).

    • (b) The function \(x\mapsto x^{1/4}\) is not convex, but \(x\mapsto -x^{1/4}\) is convex for \(x\geq 0\), so Jensen’s inequality gives \(-\E [X]^{1/4} \leq \E [-X^{1/4}]\), that is \(\E [X^{1/4}]\leq \E [X]^{1/4}\).

    • (c) The function \(x\mapsto e^x\) is convex, so Jensen’s inequality gives \(e^{\E [X]}\leq \E [e^X]\).

    • (d) If we take the case \(\P [X=0]=\P [X=\frac {\pi }{2}]=\frac 12\) then we have \(\E [\cos (X)]=\frac 12(1)+\frac 12(0)=\frac 12\) and \(\cos (\E [X])=\cos (\frac {\pi }{4})=\frac {1}{\sqrt {2}}\). Hence \(\E [\cos (X)] < \cos (\E [X])\).

      If we take the case \(\P [X=\pi ]=\P [X=\frac {\pi }{2}]=\frac 12\) then we have \(\E [\cos (X)]=\frac 12(-1)+\frac 12(0)=-\frac 12\) and \(\cos (\E [X])=\cos (\frac {3\pi }{4})=-\frac {1}{\sqrt {2}}\). Hence \(\E [\cos (X)] > \cos (\E [X])\).

      Therefore no inequality holds in general between \(\E [\cos (X)]\) and \(\cos (\E [X])\).

      The point here is that \(x\mapsto \cos (x)\) is convex on \([\frac {\pi }{2},\pi ]\) and \(x\mapsto -\cos (x)\) is convex on \([0,\frac {\pi }{2}]\), which allows us to produce examples of random variables where the inequality goes both ways.

  • 6.4 The function \(g(x)=-\log (x)\) is convex for \(x>0\) by Lemma 6.3.2, because \(g''(x)=\frac {1}{x^2}\geq 0\). Applying Jensen’s inequality to \(X\), where \(X\) has the uniform distribution on \(\{x_1,\ldots ,x_n\}\), we obtain

    \[-\log \l (\frac {x_1+\ldots +x_n}{n}\r ) \leq -\frac {\log (x_1)+\ldots +\log (x_n)}{n}.\]

    Rearranging, we obtain \(\frac {1}{n}\log (x_1x_2\ldots x_n)=\log (\sqrt [n]{x_1x_2\ldots x_n})\leq \log (\frac {x_1+\ldots +x_n}{n})\). The required result follows since \(x\mapsto \log x\) is monotone increasing.
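
    As a quick numerical check (not part of the exercise), take \(n=3\) with \(x_1=1\), \(x_2=2\), \(x_3=4\):

    \[\sqrt [3]{x_1x_2x_3}=\sqrt [3]{8}=2 \leq \frac {x_1+x_2+x_3}{3}=\frac {7}{3},\]

    consistent with the AM-GM inequality.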

  • 6.5 Since \(p\leq q\) the function \(g(x)=x^{q/p}\) is convex for \(x\geq 0\), by Lemma 6.3.2 (you should check this). We apply Jensen’s inequality to \(|X|^p\) and \(g(x)\), which gives that

    \[(\E [|X|^p])^{q/p} \leq \E [(|X|^p)^{q/p}]=\E [|X|^q]<\infty .\]

    Hence \(\E [|X|^p]<\infty \).

  • 6.6 We have \(X\geq 0\). By monotonicity this implies \(\E [X]\geq 0\). If \(\E [X]=0\) then it would follow from Lemma 4.2.5 that \(X\eqas 0\), which would imply \(\E [X^2]=0\). This is a contradiction, hence in fact \(\E [X]>0\).

    Rearranging the Paley-Zygmund inequality from (6.3) gives that \(\P [X=0]\leq 1-\frac {\E [X]^2}{\E [X^2]}\).

    To obtain the other case of the minimum, note that if \(X=0\) then \(|X-\E [X]|=\E [X]\), so \(\{X=0\}\sw \{|X-\E [X]|\geq \E [X]\}\) and hence \(\P [X=0]\leq \P [|X-\E [X]|\geq \E [X]]\). Using Chebyshev's inequality from Exercise 5.2, with \(c=\E [X]>0\), we therefore have \(\P [X=0]\leq \frac {\var (X)}{\E [X]^2}=\frac {\E [X^2]-\E [X]^2}{\E [X]^2}=\frac {\E [X^2]}{\E [X]^2}-1\).

Chapter 7
  • 7.1 Let \(E_{m}\) be the event that starting at the \(m\)th toss, \(k\) consecutive heads appear. Then \(\P [E_{m}] = 1/2^{k}\). Set \(A_n=E_{nk+1}\); the events \((A_n)\) depend on disjoint blocks of tosses and so are independent. Moreover, \(\sum _{n=1}^{\infty }\P [A_n] = \infty \), so by the second Borel-Cantelli lemma \(\P [A_n\text { i.o.}]=1\). In particular, \(k\) consecutive heads appear infinitely often, almost surely.

  • 7.2

    • (a) You might reasonably think that this is obvious - if \((A_n)\) occurs eventually then it occurs for all \(n\) after some \(N\), and of course there are infinitely many such \(n\) so then \((A_n)\) occurs infinitely often. Let’s give a proof anyway.

      Suppose \(\omega \in \{A_n\text { e.v.}\}=\bigcup _m\bigcap _{n\geq m} A_n\). Then, for at least one value of \(m\), we have \(\omega \in A_n\) for all \(n\geq m\). Take any \(k\in \N \) and pick some \(n\geq \max (m,k)\). Then \(\omega \in \bigcup _{i\geq k}A_i\), but this holds for all \(k\), which implies \(\omega \in \bigcap _k\bigcup _{i\geq k} A_i=\{A_i\text { i.o.}\}\).

    • (b) By the laws of set algebra we have

      \begin{align*} \Omega \sc \{A_n\text { i.o.}\} =\Omega \sc \l (\bigcap _m\bigcup _{n\geq m}A_n\r ) =\bigcup _m\l (\Omega \sc \l (\bigcup _{n\geq m}A_n\r )\r ) =\bigcup _m\bigcap _{n\geq m}\Omega \sc A_n =\{\Omega \sc A_n\text { e.v.}\}. \end{align*} It follows immediately that \(1-\P [A_n\text { i.o.}]=\P [A_n^c\text { e.v.}].\)

    • (c) Define \(B_m=\cap _{n\geq m} A_n\). Note that \(B_m\) is increasing. Note that \(\P [B_m]\leq \P [A_m]\) because \(B_m\sw A_m\). Thus by Lemma 5.1.1 we have

      \begin{equation} \label {eq:evbound} \P [A_n\text { e.v.}]=\P [\cup _m B_m]=\lim _{m\to \infty }\P [B_m]=\liminf _{m\to \infty }\P [B_m]\leq \liminf _{m\to \infty }\P [A_m]. \end{equation}

      In the above, we must switch from \(\lim \) to \(\liminf \) before using \(\P [B_m]\leq \P [A_m]\), because we cannot be sure if \(\lim _n\P [A_n]\) exists (and in general it will not).

      Using (b), we then have

      \begin{equation} \label {eq:iobound} \P [A_n\text { i.o.}]=1-\P [A_n^c\text { e.v.}]\geq 1-\liminf _{m\to \infty }\P [A^c_m]=1-\liminf _{m\to \infty }(1-\P [A_m])=-\liminf _{m\to \infty }-\P [A_m]=\limsup _{m\to \infty }\P [A_m]. \end{equation}

      Note that \(\liminf _{m\to \infty }\P [A_m]\leq \limsup _{m\to \infty }\P [A_m]\) holds automatically. Putting these two displayed bounds together completes the argument.

  • 7.3 The sequence is assumed independent and identically distributed, which means that \(\P [X_n\leq x]=\P [X_1\leq x]\) for all \(x\), and in particular \(\P [X_n\leq x]\to \P [X_1\leq x]\) as \(n\to \infty \). Thus \(X_n\stackrel {d}{\to } X_1\).

    Let \(a\in (0,1]\). Since \(X_n\) only takes the values \(0\) and \(1\), \(|X_n-X_1|\) only takes the values \(0\) and \(1\). Thus \(\{|X_n-X_1|>a\}=\{|X_n-X_1|=1\}=\{X_n=1,X_1=0\}\cup \{X_n=0,X_1=1\}\). For \(n>1\), since \(X_n\) and \(X_1\) are independent we thus have

    \[\P [|X_n-X_1|>a]=\P [X_n=1,X_1=0]+\P [X_n=0,X_1=1]=\frac 12\cdot \frac 12+\frac 12\cdot \frac 12=\frac 12\]

    which does not tend to zero as \(n\to \infty \). Thus \(X_n\) does not converge to \(X_1\) in probability.

  • 7.4

    • (a) We have

      \[\E [|X_n-0|]=\E [X_n]=n\frac {1}{n^2}+0\l (1-\frac 1{n^2}\r )=\frac {1}{n}\to 0\]

      so \(X_n\stackrel {L^1}{\to }0\). Since \(\sum \frac {1}{n^2}<\infty \), by the first Borel-Cantelli lemma we have \(\P [X_n=n\text { i.o.}]=0\). Since \(X_n\) is either equal to \(n\) or \(0\), this means that \(\P [X_n=0\text { e.v.}]=1\). Thus \(X_n\stackrel {a.s.}{\to } 0\).

    • (b) We have

      \[\E [|X_n-0|]=\E [X_n]=n\frac {1}{n}+0\l (1-\frac 1{n}\r )=1\]

      which does not tend to zero, so \(X_n\) does not converge to \(0\) in \(L^1\). Since \(\sum \frac {1}{n}=\infty \) and the \(X_n\) are independent, by the second Borel-Cantelli lemma we have \(\P [X_n=n\text { i.o.}]=1\). Thus \(X_n\) does not converge almost surely to \(0\).

    • (c) We have

      \[\E [|X_n-0|]=\E [X_n]=n^2\frac {1}{n^2}+0\l (1-\frac 1{n^2}\r )=1\]

      which does not tend to zero, so \(X_n\) does not converge to \(0\) in \(L^1\). Since \(\sum \frac {1}{n^2}<\infty \), by the first Borel-Cantelli lemma we have \(\P [X_n=n^2\text { i.o.}]=0\). Since \(X_n\) is either equal to \(n^2\) or \(0\), this means that \(\P [X_n=0\text { e.v.}]=1\). Thus \(X_n\stackrel {a.s.}{\to } 0\).

    • (d) We have

      \[\E [|X_n-0|]=\E [X_n]=\sqrt {n}\frac {1}{n}+0\l (1-\frac 1{n}\r )=\frac {1}{\sqrt {n}}\to 0\]

      so \(X_n\stackrel {L^1}{\to }0\). Since \(\sum \frac {1}{n}=\infty \) and the \(X_n\) are independent, by the second Borel-Cantelli lemma we have \(\P [X_n=\sqrt {n}\text { i.o.}]=1\). Thus \(X_n\) does not converge almost surely to \(0\).

    • (e) In cases (a), (c) and (d) this follows from Lemma 7.2.1. For case (b), since \(X_n\) only takes the values \(0\) and \(n\) we have that \(\{|X_n-0|>a\}=\{X_n=n\}\) whenever \(a<n\), in which case \(\P [|X_n-0|>a]=\P [X_n=n]=\frac {1}{n}\to 0\) as \(n\to \infty \). Thus \(X_n\stackrel {\P }{\to } 0\).

  • 7.5 Let us first show that \(X_n\stackrel {\P }{\to }0\). Given any \(\eps > 0\) and \(c > 0\) we can find \(m\in \N \) such that \(\frac {1}{2^{m}c} < \eps \). The key point is that for \(n>2^m\) the length of the interval \(A_n\) is less than or equal to \(\frac {1}{2^m}\), and since our probability measure is Lebesgue measure this gives \(\E [1_{A_n}]\leq \frac {1}{2^m}\). Hence, for all \(n > 2^m\), by Markov’s inequality

    \[ \P [|X_{n} - 0| > c] = \P [{\1}_{A_{n}} > c] \leq \frac {\E [{\1}_{A_{n}}]}{c} < \frac {1}{2^{m}c} < \eps .\]

    On the other hand \((X_{n})\) cannot converge to \(0\) almost surely. In fact, for every \(\omega \in \Omega \) the sequence \(X_n(\omega )\) fails to converge: given any \(\nN \), the intervals \(A_m\) at each subsequent level still cover \([0,1]\), so we can find \(m>n\) with \(\omega \in A_m\), i.e. \(X_m(\omega )=1\); and since these intervals have length tending to zero, we can also find \(m'>n\) with \(\omega \notin A_{m'}\), i.e. \(X_{m'}(\omega )=0\). Hence \(X_n(\omega )\) takes both of the values \(0\) and \(1\) infinitely often, so it cannot converge (to anything) as \(n\to \infty \). In particular, there is no almost sure convergence to zero.

    The best way to think about this question is to rewrite it in terms of probability. Lebesgue measure on \([0,1]\) is the distribution of a uniform random variable \(U\). Then \(X_n=\1_{A_n}\) is equal to \(1\) if that uniform random variable falls into \(A_n\), and zero otherwise. Fix some sampled value for \(U\), and then think about how the sequence \(X_n\) will behave.

  • 7.6

    • (a) We need to show that \(X\) and \(Y\) have the same distribution (i.e. they have the same distribution functions \(F_X(x)=F_Y(x)\)).

      If \(x\in \R \) is such that \(\P [X=x]=\P [Y=x]=0\), then \(x\) is a continuity point of both \(F_X\) and \(F_Y\), so we have both \(\P [X_n\leq x]\to \P [X\leq x]\) and \(\P [X_n\leq x]\to \P [Y\leq x]\). By uniqueness of limits for real sequences we have \(\P [X\leq x]=\P [Y\leq x]\).

      By exercise 5.5 there are at most countably many \(x\in \R \) such that \(\P [X=x]>0\), and similarly for \(Y\). Therefore, for all but a countable set of \(x\in \R \), we have \(F_X(x)=F_Y(x)\). From Lemma 5.2.1 we have that both \(F_X\) and \(F_Y\) are right continuous: given any \(x\in \R \) we may pick \(x_k\downarrow x\) avoiding the countable exceptional set and let \(k\to \infty \). Hence, in fact, \(F_X(x)=F_Y(x)\) for all \(x\in \R \).

    • (b) By definition of convergence in probability, for any \(a>0\), for any \(\eps >0\) there exists \(N\in \N \) such that, for all \(n\geq N\),

      \[\P [|X_n-X|>a]<\eps \hspace {1pc}\text { and }\hspace {1pc}\P [|X_n-Y|>a]<\eps .\]

      By the triangle inequality we have

      \begin{equation} \label {eq:ps_uniq_limit} \P [|X-Y|>2a]=\P [|X-X_n+X_n-Y|>2a]\leq \P [|X-X_n|+|X_n-Y|>2a]. \end{equation}

      If \(|X-X_n|+|X_n-Y|>2a\) then \(|X-X_n|>a\) or \(|X_n-Y|>a\) (or possibly both). Hence, continuing (B.3), for all \(n\geq N\) we have

      \[\P [|X-Y|>2a]\leq \P [|X_n-X|>a]+\P [|X_n-Y|>a]\leq 2\eps .\]

      Since this is true for any \(\eps >0\), we have \(\P [|X-Y|>2a]=0\) for each \(a>0\). Taking a countable union, \(\P [X\neq Y]=\P \l [\bigcup _{k\in \N }\{|X-Y|>\frac {2}{k}\}\r ]\leq \sum _{k\in \N }\P \l [|X-Y|>\frac {2}{k}\r ]=0\), and hence \(\P [X=Y]=1\).

  • 7.7 Without loss of generality (as in the argument given for the general case) we may assume that \(\E (X_{n}) = 0\) for all \(\nN \). If this is not the case, we consider \(X_{n} - \mu \) in place of \(X_{n}\).

    The proof proceeds in exactly the same way as when the random variables are independent, but needs the following extra calculation:

    \[\begin {aligned} \var (\overline {X_n}) & = \frac {1}{n^{2}}\E \l [\left (\sum _{i=1}^{n}X_{i}\right )^{2}\r ]\\ & = \frac {1}{n^{2}}\sum _{i=1}^{n}\sum _{j=1}^{n}\E [X_{i}X_{j}]\\ & = \frac {1}{n^{2}}\sum _{i=1}^{n}\E [X_{i}^{2}]\\ & = \frac {\sigma ^{2}}{n}. \end {aligned}\]

  • 7.8

    • (a) Write

      \[\min (1,X)=\min (1,X)\1_{\{X<a\}} + \min (1,X)\1_{\{X\geq a\}}\]

      and take expectations, giving

      \begin{align*} \E [\min (1,X)] &= \E [\min (1,X)\1_{\{X<a\}}] + \E [\min (1,X)\1_{\{X\geq a\}}] \\ &\leq \E [a] + \E [\1_{\{X\geq a\}}] \\ &=a + \P [X\geq a]. \end{align*} To deduce the second line of the above we use monotonicity of \(\E \).

    • (b) \((\Leftarrow ):\) Suppose that \(\E [\min (1,X_n)]\to 0\). For \(a\in (0,1]\) we have

      \[\P [X_n\geq a]=\P [\min (1,X_n)\geq a]\leq \frac {1}{a}\E [\min (1,X_n)]\]

      which tends to zero as \(n\to \infty \). Here we use Markov’s inequality.

      For \(a\geq 1\) we have \(\P [X_n\geq a]\leq \P [X_n\geq 1]\to 0\).

      \((\Rightarrow ):\) Suppose that \(X_n\stackrel {\P }{\to }0\). Let \(a\in (0,1]\). Then \(\P [X_n\geq a]\to 0\).

      From part (a) we have

      \[0\leq \E [\min (1,X_n)]\leq a+\P [X_n\geq a].\]

      Let \(\eps >0\). Choose \(a=\frac {\eps }{2}\) and let \(N\in \N \) be large enough that \(\P [X_n\geq a]\leq \frac {\eps }{2}\) for all \(n\geq N\). Then \(0\leq \E [\min (1,X_n)]\leq \frac {\eps }{2}+\frac {\eps }{2}=\eps \) for all \(n\geq N\). Hence \(\E [\min (1,X_n)]\to 0\).

  • 7.9

    • (a) Write

      \[\begin {aligned} \phi (u) & = \frac {1}{\sqrt {2 \pi }}\int _{\R }e^{iuy}e^{-\frac {1}{2}y^{2}}dy \\ & = \frac {1}{\sqrt {2 \pi }}\int _{\R }\cos (uy)e^{-\frac {1}{2}y^{2}}dy + i \frac {1}{\sqrt {2 \pi }}\int _{\R }\sin (uy)e^{-\frac {1}{2}y^{2}}dy. \end {aligned}\]

      As \(|\cos (uy)\,y\,e^{-\frac {1}{2}y^{2}}| \leq |y|e^{-\frac {1}{2}y^{2}}\) and \(|\sin (uy)\,y\,e^{-\frac {1}{2}y^{2}}| \leq |y|e^{-\frac {1}{2}y^{2}}\), and \(y \mapsto |y|e^{-\frac {1}{2}y^{2}}\) is in \(\Lone _\R \), we may apply Problem 4.17 to the real and imaginary parts to deduce that \(u \mapsto \phi (u)\) is differentiable and its derivative at \(u \in \R \) is

      \[ \phi '(u) = \frac {i}{\sqrt {2 \pi }}\int _{\R }e^{iuy}ye^{-\frac {1}{2}y^{2}}dy.\]

      Now integrate by parts to find that

      \[\begin {aligned} \phi '(u) & = \frac {i}{\sqrt {2 \pi }}\left [-e^{iuy}e^{-\frac {1}{2}y^{2}}\right ]_{-\infty }^{\infty } - \frac {1}{\sqrt {2 \pi }}\int _{-\infty }^{\infty }ue^{iuy}e^{-\frac {1}{2}y^{2}}dy\\ & = -u \phi (u). \end {aligned}\]

      So we have the ordinary differential equation \(\ds \frac {d \phi (u)}{du} = -u \phi (u)\) with initial condition \(\phi (0) = 1\), and the result follows by using the standard separation of variables technique, as sketched below.
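
      To spell out this last step (one way to solve the ODE): since \(\phi '(u)=-u\phi (u)\),

      \[\frac {d}{du}\l (\phi (u)\,e^{\frac {1}{2}u^{2}}\r ) = \l (\phi '(u)+u\phi (u)\r )e^{\frac {1}{2}u^{2}} = 0,\]

      so \(\phi (u)e^{\frac {1}{2}u^{2}}\) is constant and equal to its value \(\phi (0)=1\) at \(u=0\); hence \(\phi (u)=e^{-\frac {1}{2}u^{2}}\) for all \(u\in \R \).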

    • (b) First suppose that we have established the case for \(Y\sim N(0,1)\) i.e. we know that \(\phi _{Y}(u) = e^{-\frac {1}{2}u^{2}}\) for all \(u \in \R \). Then since \(X = \mu + \sigma Y\), we have

      \[\begin {aligned} \phi _{X}(u) & = \E (e^{iu (\mu + \sigma Y)})\\ & = e^{iu\mu }\E (e^{i(u\sigma )Y}) = e^{iu\mu }\phi _{Y}(u\sigma ) = e^{i\mu u - \frac {1}{2}\sigma ^{2}u^{2}}.\end {aligned}\]

  • 7.10 In this case \(\mu = p\) and \(\sigma = \sqrt {p(1-p)}\) and so we can write

    \[ \frac {S_{n} - np}{\sqrt {np(1-p)}} \tod N(0,1)\]

    The random variable \(S_{n}\) is the sum of \(n\) i.i.d. Bernoulli random variables and so is binomial with mean \(np\) and variance \(np(1-p)\). Hence for large \(n\) it is approximately equal to \(N(np,np(1-p))\) in distribution, which allows us to approximate binomial random variables with normal random variables (and, conversely, to approximate a normal random variable by a suitably standardized binomial).
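
    For instance (purely as an illustration), with \(n=100\) and \(p=\frac 12\) we have \(S_{100}\approx N(50,25)\) in distribution, so the approximation gives

    \[\P [S_{100}\geq 60]\approx \P \l [N(0,1)\geq \frac {60-50}{5}\r ]=\P [N(0,1)\geq 2]\approx 0.023.\]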

  • 7.11 We will apply the AM-GM inequality from Exercise 6.4 to \(x_1=x_2=\ldots =x_{n}=1+\frac {x}{n}\) and \(x_{n+1}=1\), where \(x\geq -n\) (so that \(x_i\geq 0\)). This gives

    \[\l (1+\frac {x}{n}\r )^{\frac {n}{n+1}}\leq \frac {n(1+\frac {x}{n})+1}{n+1}=1+\frac {x}{n+1}.\]

    Raising both sides to the power of \(n+1\) we obtain that \(f_n(x)\leq f_{n+1}(x)\), for \(x\geq -n\) and \(n\in \N \).

    The sequence \(f_n(x)=(1+\frac {x}{n})^n\) is thus a sequence of continuous functions that satisfies \(f_n(x)\leq f_{n+1}(x)\) for all \(x\geq -n\). The pointwise limit is \(f(x)=e^x\). Hence Dini’s theorem (applied to the sequence \((f_n)_{n\geq M}\)) gives that the convergence is uniform on all intervals \([-M,M]\) where \(M\in (0,\infty )\).

    Note: we don’t have uniform convergence on \(\R \). We have to work around this difficulty by restricting to an interval \([-M,M]\) instead.

    Uniform convergence (together with continuity of the limit) implies that \(f_n(x_n)\to f(x)\) whenever \(x_n\to x\) within \([-M,M]\); this step is spelled out below. In the notation of Lemma 7.5.2, if we set \(x_n=y+\alpha _n\) then we have \(x_n\to y\). Choosing \(M=\sup _n |x_n|\), which is finite because the convergent sequence \((x_n)\) is bounded, we obtain that \(f_n(y+\alpha _n)\to f(y)\), that is \((1+\frac {y+\alpha _n}{n})^n\to e^y\), as required.
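
    To spell out the step used above: if \(f_n\to f\) uniformly on \([-M,M]\), \(f\) is continuous, and \(x_n\to x\) with \(x_n,x\in [-M,M]\) for all \(n\), then

    \[|f_n(x_n)-f(x)|\leq \sup _{z\in [-M,M]}|f_n(z)-f(z)|+|f(x_n)-f(x)|\to 0,\]

    where the first term tends to zero by uniform convergence and the second by continuity of \(f\).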

  • 7.12

    • (a) For the deterministic random variable \(X=c\), the only discontinuity of its distribution function is at the value \(c\), where it jumps from \(0\) to \(1\). Therefore, from \(X_n\stackrel {d}{\to } c\) we have that for all \(\eps >0\), \(\P [X_n\leq c-\eps ]\to \P [c<c-\eps ]=0\) and \(\P [X_n\leq c+\eps ]\to \P [c\leq c+\eps ]=1\), as \(n\to \infty \). From the second statement we may deduce that \(\P [X_n\geq c+\eps ]\to 0\) for all \(\eps >0\). We thus have

      \[\P [|X_n-c|\geq \eps ] = \P [X_n\leq c-\eps ]+\P [X_n\geq c+\eps ]\to 0 \]

      as \(n\to \infty \), which is to say that \(X_n\stackrel {\P }{\to } c\).

    • (b) Let \((X_n)\) be a sequence of independent random variables such that \(X_n\stackrel {\P }{\to } X\). We will argue by contradiction: suppose that \(X\) is not almost surely equal to a constant. Then there exist \(c\in \R \) and \(\eps ,\de >0\) such that \(\P [X\leq c-\eps ]\geq \de \) and \(\P [X\geq c+\eps ]\geq \de \).

      By Lemma 7.2.4 there is a subsequence \((Y_n)\) of \((X_n)\) such that \(Y_n\stackrel {a.s.}{\to } X\). By Lemma 7.2.1 we have that \(Y_n\stackrel {\P }{\to } X\).

      Since \(Y_n\stackrel {\P }{\to } X\), there exists \(N\in \N \) such that for all \(n\geq N\) we have \(\P [|Y_n-X|\geq \eps /2]\leq \de /2\). For \(n\geq N\) we thus have \(\P [Y_n\leq c-\eps /2]\geq \de /2\) and \(\P [Y_n\geq c+\eps /2]\geq \de /2\). Hence also

      \[ \sum _n\P [Y_n\leq c-\eps /2]=\infty \quad \quad \text { and }\quad \quad \sum _n\P [Y_n\geq c+\eps /2]=\infty . \]

      The \((X_n)\) are independent, hence so are the elements of the subsequence \((Y_n)\). From the second Borel-Cantelli lemma we obtain that

      \[\P [Y_n\leq c-\eps /2\text { infinitely often, and }Y_n\geq c+\eps /2\text { infinitely often}]=1.\]

      However, this contradicts the fact that \(Y_n\stackrel {a.s}{\to }X\).

      We have therefore reached a contradiction, so in fact there exists some \(c\in \R \) such that \(\P [X=c]=1\).

  • 7.13

    • (a) We first show that complete convergence implies almost sure convergence. This part does not require independence. Let \(A_\eps =\{|X_n-X|\leq \eps \;\text { e.v.}\}\) and note that \(A_{1/m}\) is a decreasing sequence of sets (as \(m\in \N \) increases), and that

      \[\bigcap _{m\in \N }A_{1/m}=\bigcap _{\eps >0}A_\eps =\{X_n\to X\}.\]

      If \(X_n\) converges completely to \(X\) then, by the first Borel-Cantelli lemma, \(\P [|X_n-X|\geq \eps \;\text { i.o.}]=0\) which implies that \(\P [A_{1/m}]=1\) for all \(m\in \N \). Since \((A_{1/m})\) is decreasing we obtain that \(\P [\cap _{m\in \N }A_{1/m}]=1\), and hence \(\P [X_n\to X]=1\), so we have almost sure convergence.

      Let us now show that if the \((X_n)\) are independent then almost sure convergence implies complete convergence. Almost sure convergence implies convergence in probability (Lemma 7.2.1), so by part (b) of exercise 7.12 we have that \(X_n\stackrel {a.s.}{\to } X=c\) where \(c\in \R \) is deterministic. For any \(\eps >0\), the events \(E_n(\eps )=\{|X_n-c|\geq \eps \}\) are independent. The fact that \(X_n\stackrel {a.s.}{\to } c\) means that \(\P [E_n(\eps )\text { i.o.}]=0\). Hence, by (the contrapositive of) the second Borel-Cantelli lemma (here we use independence), we must have \(\sum _n \P [E_n(\eps )]<\infty \), as required.

    • (b) Let \(U\) be a uniform random variable on \([0,1]\) and define

      \[X_n= \begin {cases} 1 & \text { if }U\leq \frac 1n \\ 0 & \text { otherwise.} \end {cases} \]

      Then \(\P [X_n\to 0]=\P [U>0]=1\), so \(X_n\stackrel {a.s.}{\to } 0\).

      For \(\eps \in (0,1]\) we have \(\P [|X_n-0|\geq \eps ]=\P [X_n=1]=\frac 1n\), so \(\sum _n\P [|X_n-0|\geq \eps ]=\infty \), hence \(X_n\) does not converge completely to \(0\).