Probability with Measure

1.3 Measure

The next question we need to ask is: what does it mean to measure an object? We want a general framework that we can use for concepts such as length, weight and volume. From the last section, we know that we are looking for a function \(m:\Sigma \to [0,\infty ]\) where \(\Sigma \) will be an appropriately chosen \(\sigma \)-field.

Definition 1.3.1 Let \((S, \Sigma )\) be a measurable space. A mapping \(m: \Sigma \rightarrow [0, \infty ]\) is known as a measure if it satisfies
- (M1) \(m(\emptyset ) = 0\).
- (M2) If \((A_{n})_{n\in \N }\) is a sequence of sets where each \(A_{n} \in \Sigma \) and if these sets are pairwise disjoint (meaning that \(A_{n} \cap A_{k} = \emptyset \) whenever \(k \neq n\)) then
  
  \[m\left (\bigcup _{n=1}^{\infty }A_{n}\right ) = \sum _{n=1}^{\infty }m(A_{n}).\]
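As a sanity check on the definition, here is a minimal Python sketch for the special case of a finite set \(S\) with \(\Sigma = {\cal P}(S)\). This is an assumption: the definition itself covers arbitrary measurable spaces. In the finite case a pairwise disjoint sequence \((A_n)\) contains at most finitely many non-empty sets, so by (M1) property (M2) reduces to additivity over finitely many sets. By induction it is then enough to check \(m(A\cup B)=m(A)+m(B)\) for disjoint pairs. The helper names `power_set` and `satisfies_measure_axioms` are hypothetical.

```python
from itertools import chain, combinations


def power_set(S):
    """All subsets of the finite set S, as frozensets."""
    items = list(S)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def satisfies_measure_axioms(S, m):
    """Check (M1), non-negativity and pairwise additivity of m on P(S).

    Assumes S is finite, so that (M2) reduces to additivity for
    disjoint pairs (see the lead-in above).
    """
    subsets = power_set(S)
    if m(frozenset()) != 0:                  # (M1)
        return False
    for A in subsets:
        if m(A) < 0:                         # values must lie in [0, inf]
            return False
        for B in subsets:
            # additivity on disjoint pairs
            if not (A & B) and m(A | B) != m(A) + m(B):
                return False
    return True


S = {1, 2, 3}
print(satisfies_measure_axioms(S, len))                    # True: counting measure
print(satisfies_measure_axioms(S, lambda A: len(A) ** 2))  # False: not additive
```

Checking every pair is exponential in \(|S|\), which is harmless for the tiny examples here but shows why measures on large or infinite spaces are handled by proofs rather than enumeration.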

Note that (M2) relies on property (S3) to guarantee that \(\bigcup _{n=1}^{\infty }A_{n}\in \Sigma \), so that the left-hand side is defined. Property (M2) is often known as \(\sigma \)-additivity. Crucially, because (M2) applies to a (countably) infinite sequence of sets \((A_n)\), it will allow us to take limits in ways that involve measures. Limits are how we rigorously justify that approximations work, so we need them if we are to create a theory that will ultimately be useful to experimentalists and modellers.

Property (M2) encapsulates the idea that if we take a collection of objects, then their total measure should equal the sum of their individual measures, provided they don't overlap with each other. For example, we might take 1kg of flour and divide it into 3 piles weighing 100g, 250g and 650g. We could also imagine dividing our 1kg of flour into an infinite sequence of piles, each half the size of the previous one, with sizes 500g, 250g, 125g, 62.5g, …, that sum (as an infinite series) to 1kg.
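The infinite sequence of piles is a geometric series: the \(n\)-th pile weighs \(1000/2^n\) g, so after \(N\) piles the total is \(1000(1-2^{-N})\) g, which tends to 1000g as \(N\to \infty \). The short Python sketch below computes these partial sums exactly with the standard `fractions.Fraction` type. The function names `pile` and `total_of_first` are hypothetical.

```python
from fractions import Fraction


def pile(n):
    """Weight in grams of the n-th pile (n = 1, 2, ...): 1000 / 2**n."""
    return Fraction(1000, 2 ** n)


def total_of_first(N):
    """Total weight in grams of the first N piles."""
    return sum(pile(n) for n in range(1, N + 1))


for N in (1, 2, 3, 4, 10, 20):
    # The shortfall from 1kg is 1000 / 2**N, so it halves with each new pile.
    shortfall = 1000 - total_of_first(N)
    print(N, float(total_of_first(N)), float(shortfall))
```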

Property (M1) is much less remarkable. It simply states that the empty set has zero measure. This represents our feeling that an empty region of space has zero length/weight/volume/etc.

Definition 1.3.2 A triplet \((S, \Sigma , m)\) where \(S\) is a set, \(\Sigma \) is a \(\sigma \)-field on \(S\), and \(m:\Sigma \to [0,\infty ]\) is a measure is known as a measure space.

Definition 1.3.3 The extended real number \(m(S)\) is called the total mass of \(m\). The measure \(m\) is said to be finite if \(m(S) < \infty \).

Let us now assume that \((S,\Sigma ,m)\) is a measure space, and record some useful properties of measures.

• If \(A_{1}, \ldots , A_{n} \in \Sigma \) and are pairwise disjoint then

\[m(A_1\cup \ldots \cup A_n)=m(A_1)+\ldots +m(A_n).\]

This is known as finite additivity of measures. We’ll often think of it as part of (M2).

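To prove it, we apply to (M2) the same idea that we used for (S3) in the case of \(\sigma \)-fields. Define \(A'_i=A_i\) for \(i\leq n\) and \(A'_i=\emptyset \) for \(i>n\). The sets \(A'_1, A'_2, \ldots \) are pairwise disjoint (the empty set is disjoint from every set) and \(\bigcup _{i=1}^\infty A'_i=\bigcup _{i=1}^n A_i\). By (M2) we have \(m\left (\bigcup _{i=1}^\infty A'_i\right )=\sum _{i=1}^\infty m(A'_i)\). By (M1) we have \(m(\emptyset )=0\), so every term with \(i>n\) vanishes and this reduces to \(m(\bigcup _{i=1}^n A_i)=\sum _{i=1}^n m(A_i)\).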
• If \(A, B \in \Sigma \) with \(A\sw B\) then \(m(A)\leq m(B)\). This property is known as the monotonicity property of measures.

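To prove it, note that \(B\sc A\in \Sigma \) (because \(\Sigma \) is closed under complements and intersections) and write \(B\) as the disjoint union \(B = (B\sc A) \cup A\). By finite additivity, \(m(B)=m((B\sc A)\cup A)=m(B\sc A)+m(A)\geq m(A)\), where the inequality holds because \(m(B\sc A)\geq 0\).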

Note that if \(m(A)\) is finite then we can subtract \(m(A)\) from both sides of \(m(B)=m(B\sc A)+m(A)\) to obtain
\(\seteqnumber{0}{1.}{2}\)
\begin{equation} \label {eq:meas_complements} m(B\sc A) = m(B) - m(A) \end{equation}

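However, this only works if \(m(A)\) is finite! If \(m(A)=\infty \) then monotonicity gives \(m(B)=\infty \) as well, and the right-hand side would be the undefined expression \(\infty -\infty \).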
• If \(A, B \in \Sigma \) are arbitrary (i.e. not necessarily disjoint) then
\(\seteqnumber{0}{1.}{3}\)
\begin{equation} \label {eq:meas_in_out} m(A \cup B) + m(A \cap B) = m(A) + m(B). \end{equation}

The proof of this is Problem 1.4 part (a). Note that if \(m(A \cap B) < \infty \) we have \(m(A \cup B) = m(A) + m(B) - m(A \cap B)\), which you might recognize as the inclusion-exclusion formula \(\P [A\cup B]=\P [A]+\P [B]-\P [A\cap B]\) from probability.
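The properties above can be checked numerically on a small finite example. The sketch below assumes a four-point set \(S\) with \(\Sigma = {\cal P}(S)\) and a weighted measure, where \(m(A)\) is the sum of the (made-up, illustrative) non-negative weights of the points of \(A\). Such an \(m\) is a measure. The sketch tests monotonicity, the complement formula (1.3) and identity (1.4) for every pair of subsets. Python's floating-point infinity illustrates why (1.3) needs \(m(A)<\infty \): `math.inf - math.inf` evaluates to `nan`, which is not a number. The names `weighted_measure` and `power_set` are hypothetical.

```python
import math
from itertools import chain, combinations


def power_set(S):
    """All subsets of the finite set S, as frozensets."""
    items = list(S)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def weighted_measure(weights):
    """m(A) = sum of the weights of the points in A (weights must be >= 0)."""
    return lambda A: sum(weights[x] for x in A)


weights = {"a": 2, "b": 5, "c": 0, "d": 1}   # illustrative weights only
m = weighted_measure(weights)
subsets = power_set(weights)

for A in subsets:
    for B in subsets:
        # identity (1.4): no subtraction needed, so it also holds for infinite values
        assert m(A | B) + m(A & B) == m(A) + m(B)
        if A <= B:
            assert m(A) <= m(B)              # monotonicity
            assert m(B - A) == m(B) - m(A)   # (1.3), valid because m(A) is finite

# If m(A) = m(B) = inf, the right-hand side of (1.3) carries no information:
print(math.inf - math.inf)   # nan
```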

1.3.1 Examples of measures

Here are three important first examples of measure spaces. We can’t yet introduce examples based on length or volume; this will come later in the course.

1. Counting Measure Let \(S\) be any set and take \(\Sigma = {\cal P}(S)\). For each \(A \subseteq S\) the counting measure \(m=\#\) is given by

\[\#(A) = \text {the number of elements in}~A.\]

I hope it's intuitively obvious to you that this is a measure; note that \(\#(A)=\infty \) whenever \(A\) is infinite. We'll omit checking the details.
2. Dirac Measure This measure is named after the physicist Paul Dirac. Let \((S, \Sigma )\) be an arbitrary measurable space and fix \(x \in S\). The Dirac measure \(m=\de _{x}\) is defined by

\[ \de _{x}(A) = \begin {cases} 1 & \text {if}~x \in A\\ 0 & \text {if}~x \notin A \end {cases} \]

Checking properties (M1) and (M2) in this case is left for you.

A useful fact: if \(S\) is countable then we can write the counting measure \(\#\) in terms of Dirac measures, as \(\#(A) = \sum _{x \in S}\de _{x}(A).\)
3. Probability

Consider a finite set \(S=\{x_1,\ldots ,x_n\}\), which we'll call the sample space, and call each of the \(x_i\) an outcome. Let \(\Sigma \) be the set of all subsets of \(S\). Let \((p_i)_{i=1}^n\) be a sequence of numbers in \([0,1]\) such that \(\sum _{i=1}^n p_i=1\). For \(A\in \Sigma \) we define a measure \(m=\P \) by setting
\(\seteqnumber{0}{1.}{4}\)
\begin{equation} \label {eq:prob_dirac} \P [A]=\sum _{i=1}^n p_i\de _{x_i}(A). \end{equation}

In words, to each outcome \(x_i\) we assign probability \(p_i\), that is \(\P [\{x_i\}]=p_i\). If a set \(A\) contains several outcomes, then its probability is precisely the sum of their individual probabilities. Finding the probability of an event is just another kind of measuring!

We could treat a countable set \(S\) similarly, with a countable sequence of \(p_i\) and a countable summation (i.e. an infinite series) in (1.5). Probability, however, mostly requires uncountable sample spaces (e.g. the normal distribution on the real line). In this case (1.5) breaks down completely, because there is no such thing as an uncountable sum. One of the outcomes of this course will be a rigorous basis for probability theory with uncountable sample spaces.

In general, a measure \(m\) is said to be a probability measure if its total mass is \(1\), i.e. \(m(S)=1\).
4. Integration

In previous analysis courses you viewed Riemann integration as a way of calculating area – that is, measuring the area of two-dimensional shapes. You’ve probably also viewed various types of integrals as ways of calculating volumes, at some point. So, we should expect integration to fit naturally into our theory of measures.

In Chapter 4 we will introduce Lebesgue integration. Lebesgue integration is ‘the’ modern theory of integration on which mathematical modelling now relies. We will see that Lebesgue integration interacts nicely with measure theory, whilst Riemann integration doesn’t. In fact, Lebesgue integration will also be the key tool for setting up a rigorous basis for probability theory.
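Examples 1 to 3 can be made concrete for a finite \(S\) with a few lines of Python. This sketch works only under that assumption, since, as explained in example 3, uncountable sample spaces cannot be handled by a sum like (1.5). The names `counting`, `dirac` and `prob_measure` are hypothetical. Sets are ordinary Python sets, and exact `Fraction` arithmetic avoids rounding in the probabilities.

```python
from fractions import Fraction


def counting(A):
    """Counting measure #(A) of a finite set A."""
    return len(A)


def dirac(x):
    """The Dirac measure delta_x, returned as a function of a set A."""
    return lambda A: 1 if x in A else 0


def prob_measure(outcomes, probs):
    """P[A] = sum_i p_i * delta_{x_i}(A), as in (1.5)."""
    if len(outcomes) != len(probs):
        raise ValueError("need one probability per outcome")
    if any(p < 0 or p > 1 for p in probs) or sum(probs) != 1:
        raise ValueError("need each p_i in [0, 1] with sum 1")
    deltas = [dirac(x) for x in outcomes]
    return lambda A: sum(p * d(A) for p, d in zip(probs, deltas))


S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}
# For finite S, the counting measure is a sum of Dirac measures.
assert counting(A) == sum(dirac(x)(A) for x in S)

# A fair die: p_i = 1/6 for each of the six outcomes.
P = prob_measure(sorted(S), [Fraction(1, 6)] * 6)
print(P(A))   # 1/2
print(P(S))   # 1, so P is a probability measure
```

Representing a measure as a Python function from sets to numbers mirrors the definition \(m:\Sigma \to [0,\infty ]\) directly. It is practical here only because every subset of a finite \(S\) can be listed explicitly.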