last updated: May 9, 2024

Chapter 5 Probability with Measure

In this chapter we will examine probability theory from the measure theoretic perspective. The realisation that measure theory is the foundation of probability is due to the Russian mathematician A. N. Kolmogorov (1903-1987) who in 1933 published the hugely influential “Grundbegriffe der Wahrscheinlichkeitsrechnung” (in English: Foundations of the Theory of Probability). Since that time, measure theory has underpinned all mathematically rigorous work in probability theory and has been a vital tool in enabling the theory to develop both conceptually and in applications.

We have already noted that a probability is a measure, random variables are measurable functions and expectation is a Lebesgue integral – but it is not fair to claim that “probability theory” can be reduced to a subset of “measure theory”. This is because in probability we model chance and unpredictability, which brings in a set of intuitions and ideas that go well beyond those of weights and measures.

The Polish mathematician Mark Kac (1914-1984) famously described probability theory as “measure theory with a soul.” A less eloquent observation is that the notation tends to be much easier to handle in probability. We introduced probability measures as an example in Section 1.3.1, but let us give a formal definition here.

  • Definition 5.0.1 A measure \(m\) is said to be a probability measure if it has total mass \(1\).

A measure space \((S,\Sigma ,m)\) is said to be a probability space if \(m\) is a probability measure.

5.1 Probability

In Chapters 5-6 we will work over general probability spaces of the form \((\Omega , {\cal F}, \P )\). An event is a measurable set \(A\in \mc {F}\). Since \(\P \) has total mass \(1\) and measures are monotone, we have

\[ \P [\Omega ] = 1~~~\mbox {and}~~~0 \leq \P [A] \leq 1~\mbox {for all}~A \in {\cal F}.\]

Intuitively, \(\P [A]\) is the probability that the event \(A \in {\cal F}\) takes place. We will generally assign a special status to probability measures and expectations by writing their arguments in square brackets e.g. \(\P [A]\) instead of \(\P (A)\). This is just a convention – there is no difference in mathematical meaning.

In probability we often use ‘complement’ notation, that is \(A^c=\Omega \setminus A\). The standard formulae \(\P [A^c]=1-\P [A]\) and \(\P [A\cup B]=\P [A]+\P [B]-\P [A\cap B]\) are simply restatements of equations (1.3) and (1.4) in probabilistic notation. We sometimes write \(\P [A\text { and }B]=\P [A\cap B]\) and \(\P [A\text { or }B]=\P [A\cup B]\).
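These identities can be checked directly on a finite probability space. A minimal Python sketch, using a fair die as an illustrative example (the example and all names are ours, not from the text):

```python
from fractions import Fraction

# A finite probability space: a fair six-sided die, each outcome
# carrying mass 1/6. (Illustrative example, not from the text.)
omega = range(1, 7)
P = {w: Fraction(1, 6) for w in omega}

def prob(event):
    """P[A] for an event A, given as a set of outcomes."""
    return sum(P[w] for w in event)

A = {w for w in omega if w % 2 == 0}   # "the roll is even"
B = {w for w in omega if w >= 4}       # "the roll is at least 4"

# P[A^c] = 1 - P[A]
assert prob(set(omega) - A) == 1 - prob(A)

# P[A or B] = P[A] + P[B] - P[A and B]
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)
```

Using exact rational arithmetic (`Fraction`) avoids floating-point noise, so the identities hold exactly rather than approximately.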

Let us first update the results of Section 1.7 into the language of probability. Recall that a sequence of sets \((A_{n})\) with \(A_{n} \in {\cal F}\) for all \(n\in \N \) is increasing if \(A_{n} \subseteq A_{n+1}\) for all \(n\in \N \), and decreasing if \(A_{n} \supseteq A_{n+1}\) for all \(n\in \N \).

  • Lemma 5.1.1 Let \(A_n,B_n\in \mc {F}\).

    • 1. Suppose \((A_{n})\) is increasing and \(A=\bigcup _n A_n\). Then \(\P [A] = \lim _{n \rightarrow \infty }\P [A_{n}]\).

    • 2. Suppose \((B_{n})\) is decreasing and \(B=\bigcap _n B_n\). Then \(\P [B] = \lim _{n \rightarrow \infty }\P [B_{n}]\).

Proof: This is just Lemma 1.7.1 rewritten in the notation of probability. Note that the condition of part 2 holds automatically here, because in probability all events (i.e. measurable sets) have finite measure.   ∎

The intuition for the above lemma should be clear. The set \(A_n\) gets bigger as \(n\to \infty \) and, in doing so, gets ever closer to \(A\); the same is true of their probabilities. Similarly for \(B_n\), which gets smaller and closer to \(B\). This result is a probabilistic analogue of the well known fact that monotone increasing (resp. decreasing) sequences of real numbers converge to their respective \(\sup \)s and \(\inf \)s.
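Part 1 of the lemma can be illustrated numerically. Take \(\Omega =\{1,2,3,\dots \}\) with \(\P [\{k\}]=2^{-k}\) and the increasing events \(A_n=\{1,\dots ,n\}\), whose union is \(\Omega \); then \(\P [A_n]=1-2^{-n}\to 1=\P [\Omega ]\). A short sketch (this particular space is our illustrative choice):

```python
# Continuity of measure on a countable probability space:
# Omega = {1, 2, 3, ...} with P[{k}] = 2^{-k}. The increasing events
# A_n = {1, ..., n} have union Omega, and P[A_n] = 1 - 2^{-n} -> 1.
def prob_A(n):
    return sum(2.0 ** -k for k in range(1, n + 1))

probs = [prob_A(n) for n in (1, 5, 10, 30)]

# The sequence of probabilities is increasing...
assert all(p1 < p2 for p1, p2 in zip(probs, probs[1:]))
# ...and converges to P[Omega] = 1.
assert abs(prob_A(30) - 1) < 1e-8
```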

  • Definition 5.1.2 A random variable \(X\) is a measurable function \(X:\Omega \to \R \), where we use the measurable spaces \((\Omega , {\cal F})\) and \((\R , {\cal B}(\R ))\).

If \(A \in {\cal B}(\R )\), it is standard to use the notation \(\{X \in A\}\) to denote the event \(X^{-1}(A)\). This event is an element of \(\cal F\), by Definition 5.1.2, which in turn means that the probability \(\P [X\in A]\) is defined. Writing \(X\in A\) allows us to think of \(X\) as an object that takes a random value, and this random value might (or might not) fall into the set \(A\subseteq \R \). We can thus connect our intuition for probability to the formal machinery of measure theory.

The law or distribution of \(X\) is given by \(p_{X}(B) = \P [X^{-1}(B)]\) for \(B \in {\cal B}(\R )\). Thus

\[ p_{X}(B) = \P [X \in B] = \P [X^{-1}(B)] = \P [\{\omega \in \Omega ; X(\omega ) \in B\}].\]

This equation is the fundamental connection between probability and measure theory. As the next lemma shows, random variables are just another way to think about measures, designed to make it easy for us to think about ‘objects that take a random value’.
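On a finite sample space the law is easy to compute explicitly as a pushforward. A sketch, where \(\Omega \) models two fair coin flips and \(X\) counts the heads (the example is our illustration, not from the text):

```python
from fractions import Fraction

# Omega: two independent fair coin flips, each outcome with mass 1/4.
omega = [("H", "H"), ("H", "T"), ("T", "H"), ("T", "T")]
P = {w: Fraction(1, 4) for w in omega}

def X(w):
    """The random variable: number of heads."""
    return w.count("H")

def law(B):
    """p_X(B) = P[X^{-1}(B)] = P[{w in Omega : X(w) in B}]."""
    return sum(P[w] for w in omega if X(w) in B)

# The law of X is the Binomial(2, 1/2) distribution...
assert law({0}) == Fraction(1, 4)
assert law({1}) == Fraction(1, 2)
assert law({2}) == Fraction(1, 4)
# ...and, as Lemma 5.1.3 asserts, it is a probability measure: total mass 1.
assert law({0, 1, 2}) == 1
```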

  • Lemma 5.1.3 Let \(X:\Omega \to \R \) be a random variable. The law of \(X\) is a probability measure on \((\R ,\mc {B}(\R ))\).

Proof: We have \(p_X(\R )=\P [X\in \R ]=1\) and \(p_X(\emptyset )=\P [X\in \emptyset ]=0\), so the total mass is \(1\) and (M1) holds. It remains to check (M2). Let \((A_n)_{n\in \N }\) be a sequence of disjoint Borel sets and set \(A=\cup _n A_n\). Define \(B_n=\cup _{i=1}^n A_i\), which makes \((B_n)\) an increasing sequence of subsets of \(\R \) such that \(\cup _n B_n=\cup _n A_n\). Hence \(\{X\in B_n\}\) is an increasing sequence of subsets of \(\Omega \), with \(\cup _n\{X\in B_n\}=\{X\in \cup _n B_n\}=\{X\in \cup _n A_n\}\). From Lemma 5.1.1 we have

\begin{equation} \label {eq:law_is_meas_1} p_X(B_n)=\P [X\in B_n]\to \P [X\in A]=p_X(A). \end{equation}

Also,

\begin{equation} \label {eq:law_is_meas_2} p_X(B_n)=\P [X\in B_n]=\sum _{i=1}^n \P [X\in A_i]=\sum _{i=1}^n p_X(A_i). \end{equation}

Combining (5.1) and (5.2) gives \(\sum _{i=1}^\infty p_X(A_i)=p_X(A)\), which proves (M2).   ∎

  • Definition 5.1.4 The expectation of \(X\) is the Lebesgue integral

    \[ \E [X] = \int _{\Omega }X(\omega )\,d\P (\omega ).\]

According to Definition 4.5.1 this is possibly undefined, and when it is defined it is an extended real number. Two cases are worth pointing out:

  • • \(X\geq 0\), in which case Definition 4.2.1 defines \(\E [X]\in [0,\infty ]\).

  • • \(X\in \Lone \), which occurs precisely when \(\E [X]\in \R \).

Note also that for all \(A \in {\cal F}\)

\[ \E [{\1}_{A}]= \P [A]\]

by Definition 4.1.1 because \(\1_A:\Omega \to \R \) is a simple function.
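On a finite probability space the Lebesgue integral reduces to a weighted sum, \(\E [X]=\sum _{\omega }X(\omega )\,\P [\{\omega \}]\), and the identity \(\E [\1 _A]=\P [A]\) can be verified directly. A sketch (the die example is ours):

```python
from fractions import Fraction

# A fair die as a finite probability space (illustrative example).
omega = range(1, 7)
P = {w: Fraction(1, 6) for w in omega}

def expectation(X):
    """E[X] = sum over omega of X(w) P[{w}] -- the integral on a finite space."""
    return sum(X(w) * P[w] for w in omega)

A = {5, 6}

def indicator_A(w):
    return 1 if w in A else 0

# E[1_A] = P[A]
assert expectation(indicator_A) == sum(P[w] for w in A)
# E[X] for X(w) = w: the familiar mean of a fair die roll.
assert expectation(lambda w: w) == Fraction(7, 2)
```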

By Theorem 3.1.5, essentially anything we can think of doing with random variables will just give us back more random variables. In particular, any Borel measurable function \(f\) from \(\R \) to \(\R \) enables us to construct a new random variable \(f(X)\), which is defined pointwise via \(f(X)(\omega ) = f(X(\omega ))\) for all \(\omega \in \Omega \). For example we may take \(f(x) = x^{n}\) for any \(n\in \N \), giving rise to the random variable \(X^n\). If \(\E [X^n]\) exists then it is known as the \(n\)th moment of \(X\).

We often write \(\mu _X=\E [X]\), when it is defined. If \(X\) has a finite second moment then we also write \(\sigma _X^2=\var (X)=\E [(X-\mu _X)^{2}]\), called the variance of \(X\), which is always defined in this case as a consequence of Problem 5.12. When it is clear which random variable we mean, we might write simply \(\mu \) and \(\sigma \) in place of \(\mu _X\) and \(\sigma _X\).
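Moments and the variance are again concrete on a finite space. A sketch for the fair die (our illustrative example), which also checks the equivalent formula \(\var (X)=\E [X^2]-\mu _X^2\), valid whenever the second moment is finite:

```python
from fractions import Fraction

# A fair die as a finite probability space (illustrative example).
omega = range(1, 7)
P = {w: Fraction(1, 6) for w in omega}

def expectation(X):
    return sum(X(w) * P[w] for w in omega)

def moment(n):
    """The n-th moment E[X^n] of X(w) = w."""
    return expectation(lambda w: w ** n)

mu = moment(1)
var = expectation(lambda w: (w - mu) ** 2)

assert mu == Fraction(7, 2)
# var(X) = E[X^2] - mu^2, the equivalent expanded form.
assert var == moment(2) - mu ** 2 == Fraction(35, 12)
```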

We will study convergence of random variables in Section 7.2. For now, note that in probability we use the term almost surely in place of the measure theoretic almost everywhere. The meaning is the same, for example \(X\eqas Y\) means that \(\P [X=Y]=1\), and \(X_n\toas X\) means that \(\P [X_n\to X]=1\).

The monotone and dominated convergence theorems, Markov’s inequality, all the properties of integrals, and so on, can all be re-written in the language of probability. This is for you to do, with several examples in Exercise 5.1.

\((\Delta )\) Those of you taking MAS61022 can now begin your independent reading of Chapter 6, after solving Exercises 5.1 and 5.2. Chapter 6 does not depend on the rest of Chapter 5.