Stochastic Processes and Financial Mathematics
(part one)
Chapter 3 Conditional expectation and martingales
We will introduce conditional expectation, which provides us with a way to estimate random quantities based on only partial information. We will also introduce martingales, which are the mathematical way to capture the concept of a fair game.
3.1 Conditional expectation
Suppose \(X\) and \(Z\) are random variables that take only finitely many values \(\{x_1,\ldots ,x_n\}\) and \(\{z_1,\ldots ,z_m\}\), respectively. In earlier courses, ‘conditional expectation’ was defined as follows:
\begin{eqnarray} \P [X=x_i|Z=z_j] &=& \P [X=x_i,Z=z_j] / \P [Z=z_j] \notag \\ \E [X|Z=z_j] &=& \sum _i x_i \P [X=x_i|Z=z_j] \notag \\ Y = \E [X|Z] \mbox { where:} && \mbox { if $Z(\om )=z_j$, then $Y(\om )=\E [X|Z=z_j]$} \label {eq:naive_cond_exp} \end{eqnarray}
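As an illustration of the naive definition, here is a minimal Python sketch, assuming \(X\) and \(Z\) are described by a finite joint probability mass function stored as a dictionary that maps pairs \((x_i,z_j)\) to \(\P[X=x_i,Z=z_j]\). The function name naive_conditional_expectation is hypothetical, exact arithmetic with fractions.Fraction avoids rounding, and we assume \(\P[Z=z_j]>0\) for every \(z_j\) that appears.

```python
from collections import defaultdict
from fractions import Fraction


def naive_conditional_expectation(joint):
    """Return a dict z -> E[X | Z = z], following the naive definition.

    `joint` maps pairs (x, z) to P[X = x, Z = z]. Assumes every z that
    appears has P[Z = z] > 0 (hypothetical helper, for illustration only).
    """
    p_z = defaultdict(Fraction)  # marginal probabilities P[Z = z]
    for (x, z), p in joint.items():
        p_z[z] += p
    # E[X | Z = z] = sum_i x_i * P[X = x_i, Z = z] / P[Z = z]
    return {
        z: sum((x * p for (x, zz), p in joint.items() if zz == z), Fraction(0)) / pz
        for z, pz in p_z.items()
    }


# Example: Z takes the values 0 and 1, each with probability 1/2.
joint = {
    (0, 0): Fraction(1, 8),
    (2, 0): Fraction(3, 8),
    (1, 1): Fraction(1, 4),
    (3, 1): Fraction(1, 4),
}
cond = naive_conditional_expectation(joint)
print(cond)  # E[X | Z=0] = 3/2 and E[X | Z=1] = 2
```

The dictionary cond plays the role of the map \(z_j\mapsto \E[X|Z=z_j]\); the random variable \(Y\) is then obtained by evaluating it at \(Z(\om)\).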
You might also have seen a second definition, using probability density functions, for continuous random variables. These definitions are problematic for several reasons, chiefly that (1) it is not immediately clear how the two definitions interact and (2) we don’t want to be restricted to handling only discrete or only continuous random variables.
In this section, we define the conditional expectation of random variables using \(\sigma \)-fields. In this setting we are able to give a unified definition which is valid for general random variables. The definition is originally due to Kolmogorov (in 1933), and is sometimes referred to as Kolmogorov’s conditional expectation. It is one of the most important concepts in modern probability theory.
Conditional expectation is a mathematical tool that serves the following purpose. We have a probability space \((\Omega ,\mc {F},\P )\) and a random variable \(X:\Omega \to \R \). However, \(\mc {F}\) is large and we want to work with a sub-\(\sigma \)-field \(\mc {G}\) instead. As a result, we want a random variable \(Y\) such that
1. \(Y\) is \(\mc {G}\)-measurable
2. \(Y\) is ‘the best’ way to approximate \(X\) with a \(\mc {G}\)-measurable random variable
The second statement on this wish-list does not fully make sense; there are many different ways in which we could compare \(X\) to a potential \(Y\).
Why might we want to do this? Imagine we are conducting an experiment in which we gradually gain information about the result \(X\). This corresponds to gradually seeing a larger and larger \(\mc {G}\), with access to more and more information. At all times we want to keep a prediction of what the future looks like, based on the currently available information. This prediction is \(Y\).
It turns out there is only one natural way in which to realize our wish-list (which is convenient, and somewhat surprising). It is the following:
Theorem 3.1.1 (Conditional Expectation) Let \(X\) be an \(L^1\) random variable on \((\Om ,\F ,\P )\). Let \(\G \) be a sub-\(\sigma \)-field of \(\F \). Then there exists a random variable \(Y\in L^1\) such that
1. \(Y\) is \(\G \)-measurable,
2. for every \(G\in \G \), we have \(\E [Y\1_G]=\E [X\1_G]\).
Moreover, if \(Y'\in L^1\) is a second random variable satisfying these conditions, \(\P [Y=Y']=1\).
The first and second statements here correspond respectively to the items on our wish-list.
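We write \(\E [X|\mc {G}]\) for a random variable \(Y\) satisfying these conditions, and call any such \(Y\) a version of the conditional expectation of \(X\) given \(\mc {G}\). Since any two versions are almost surely equal, we sometimes refer to \(Y\) simply as the conditional expectation of \(X\) given \(\mc {G}\). This is a slight abuse of notation, but it is commonplace and harmless.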
The proof of Theorem 3.1.1 is beyond the scope of this course. Loosely speaking, there is an abstract recipe which constructs \(\E [X|\mc {G}]\). It begins with the random variable \(X\) and then averages out all the information that is not accessible to \(\mc {G}\), leaving only as much randomness as \(\mc {G}\) can support; the result is \(\E [X|\mc {G}]\). In this sense the map \(X\mapsto \E [X|\mc {G}]\) simplifies \(X\) (i.e. reduces the amount of randomness in it) in a very particular way, to make it \(\mc {G}\)-measurable.
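In the special case of a finite sample space, where \(\mc {G}\) is generated by a finite partition of \(\Omega \) into events (‘atoms’) of positive probability, this recipe can be made completely explicit: on each atom, \(\E [X|\mc {G}]\) is the \(\P \)-weighted average of \(X\) over that atom. The Python sketch below is only an illustration under these assumptions (it does not cover the general case of Theorem 3.1.1), and its helper names are hypothetical. It builds \(Y\) by averaging over atoms and then checks condition 2 of the theorem against every \(G\in \mc {G}\), i.e. every union of atoms.

```python
from fractions import Fraction
from itertools import combinations


def cond_exp_on_partition(prob, X, atoms):
    """Y = E[X | G], where G is generated by `atoms`: a list of disjoint sets of
    outcomes covering Omega, each with positive probability.
    `prob` maps outcome -> P[{outcome}] and `X` maps outcome -> X(outcome)."""
    Y = {}
    for A in atoms:
        p_A = sum(prob[w] for w in A)
        average = sum(prob[w] * X[w] for w in A) / p_A  # weighted average of X over A
        for w in A:
            Y[w] = average  # Y is constant on each atom, so Y is G-measurable
    return Y


def expect_on(prob, V, G):
    """E[V 1_G] on a finite probability space."""
    return sum((prob[w] * V[w] for w in G), Fraction(0))


def satisfies_condition_2(prob, X, Y, atoms):
    """Check E[Y 1_G] = E[X 1_G] for every G in G, i.e. every union of atoms."""
    for r in range(len(atoms) + 1):
        for chosen in combinations(atoms, r):
            G = set().union(*chosen)
            if expect_on(prob, Y, G) != expect_on(prob, X, G):
                return False
    return True


# Example: a fair die, X(w) = w, and G generated by the partition {odd, even}.
prob = {w: Fraction(1, 6) for w in range(1, 7)}
X = {w: Fraction(w) for w in range(1, 7)}
atoms = [{1, 3, 5}, {2, 4, 6}]
Y = cond_exp_on_partition(prob, X, atoms)
print(Y)                                         # 3 on odd outcomes, 4 on even outcomes
print(satisfies_condition_2(prob, X, Y, atoms))  # True
```

Note that the constant random variable \(7/2=\E [X]\) is also \(\mc {G}\)-measurable, but it fails the check: it matches \(X\) only for \(G=\emptyset \) and \(G=\Omega \), not for \(G=\{\text{odd}\}\).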
It is important to remember that \(\E [X|\mc {G}]\) is (in general) a random variable. It is also important to remember that the two objects
\[\E [X|\mc {G}]\hspace {1pc}\text { and }\hspace {1pc}\E [X|Z=z]\]
are quite different. They are both useful. We will explore the connection between them later in this section. Before doing so, let us look at a basic example.
Let \(X_1,X_2\) be independent random variables such that \(\P [X_i=-1]=\P [X_i=1]=\frac {1}{2}\). Set \(\mc {F}=\sigma (X_1,X_2)\). We will show that
\begin{equation} \label {eq:condexpexguess} \E [X_1+X_2|\sigma (X_1)]=X_1. \end{equation}
To do so, we should check that \(X_1\) satisfies the two conditions in Theorem 3.1.1, with
\begin{align*} X&=X_1+X_2\\ Y&=X_1\\ \mc {G}&=\sigma (X_1). \end{align*}
The first condition is immediate, since by Lemma 2.2.5 \(X_1\) is \(\sigma (X_1)\)-measurable, i.e. \(Y\in m\mc {G}\). To see the second condition, let \(G\in \sigma (X_1)\). Then \(\1_G\in m\sigma (X_1)\) by Lemma 2.4.2 and \(X_2\in m\sigma (X_2)\). Since \(X_1\) and \(X_2\) are independent, these \(\sigma \)-fields are independent, so \(\1_G\) and \(X_2\) are independent. Hence
\begin{align*} \E [(X_1+X_2)\1_G] &=\E [X_1\1_G]+\E [\1_G X_2]\\ &=\E [X_1\1_G]+\E [\1_G]\E [X_2]\\ &=\E [X_1\1_G]+\P [G]\cdot 0\\ &=\E [X_1\1_G]. \end{align*}
This equation says precisely that \(\E [X\1_G]=\E [Y\1_G]\). We have now checked both conditions, so by Theorem 3.1.1 we have \(\E [X|\mc {G}]=Y\), meaning that \(\E [X_1+X_2|\sigma (X_1)]=X_1\), which proves our claim in (3.2).
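This example can also be confirmed by brute force. The Python sketch below (hypothetical names, exact arithmetic) enumerates the four equally likely outcomes \((X_1,X_2)\in \{-1,1\}^2\), lists the four events of \(\sigma (X_1)\), namely \(\emptyset \), \(\{X_1=-1\}\), \(\{X_1=1\}\) and \(\Omega \), and compares \(\E [X\1_G]\) with \(\E [Y\1_G]\) for each of them. Condition 1 needs no computation, since \(Y=X_1\) depends on the outcome only through \(X_1\).

```python
from fractions import Fraction
from itertools import product

# Omega = {-1, 1}^2; each outcome w = (x1, x2) has probability 1/4.
omega = list(product([-1, 1], repeat=2))
prob = {w: Fraction(1, 4) for w in omega}


def X(w):
    return w[0] + w[1]  # X = X_1 + X_2


def Y(w):
    return w[0]  # candidate Y = X_1


# The four events of sigma(X_1).
events = {
    "empty": set(),
    "X1=-1": {w for w in omega if w[0] == -1},
    "X1=1": {w for w in omega if w[0] == 1},
    "Omega": set(omega),
}


def expect_on(f, G):
    """E[f 1_G] for a function f of the outcome."""
    return sum((prob[w] * f(w) for w in G), Fraction(0))


results = {name: (expect_on(X, G), expect_on(Y, G)) for name, G in events.items()}
for name, (lhs, rhs) in results.items():
    print(f"{name}: E[X 1_G] = {lhs}, E[Y 1_G] = {rhs}")
```

Each printed line shows two equal values; for instance the line for \(\{X_1=1\}\) reads X1=1: E[X 1_G] = 1/2, E[Y 1_G] = 1/2.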
The intuition for this, which is plainly visible in our calculation, is as follows. Think of conditional expectation as an operation that averages out all the randomness in \(X=X_1+X_2\) that is not \(\mc {G}\)-measurable, where \(\mc {G}=\sigma (X_1)\). Since \(X_2\) is independent of \(\sigma (X_1)\), we average out \(X_2\) completely, replacing it by \(\E [X_2]=0\).
We could equally think of \(X_1\) as being our best guess for \(X_1+X_2\), given only information in \(\sigma (X_1)\), since \(\E [X_2]=0\). In general, guessing \(\E [X|\mc {G}]\) is not so easy!
Relationship to the naive definition \(\offsyl \)
Conditional expectation extends the ‘naive’ definition of (3.1). Naturally, the ‘new’ conditional expectation is much more general (and, moreover, it is what we require later in the course), but we should still take the time to relate it to the naive definition.
To see the connection, we focus on the case where \(X,Z\) are random variables with finite sets of values \(\{x_1,\ldots ,x_n\}\), \(\{z_1,\ldots ,z_m\}\). Let \(Y\) be the naive version of conditional expectation defined in (3.1). That is,
\[Y(\omega )=\sum \limits _{j}\1_{\{Z(\omega )=z_j\}}\E [X|Z=z_j].\]
We now use Theorem 3.1.1 to check that, in fact, \(Y\) is a version of \(\E [X|\sigma (Z)]\). To do so, we verify that \(Y\) satisfies the two properties listed in Theorem 3.1.1.
• Since \(Z\) only takes finitely many values \(\{z_1,\ldots ,z_m\}\), from the above equation we have that \(Y\) only takes finitely many values. These values are \(\{y_1,\ldots , y_m\}\) where \(y_j=\E [X|Z=z_j]\); note that the \(y_j\) need not be distinct. We have
\begin{align*} Y^{-1}(y_j)&=\{\omega \in \Omega : Y(\omega )=\E [X|Z=z_j]\}\\ &=\bigcup _{k\,:\,y_k=y_j}\{\omega \in \Omega : Z(\omega )=z_k\}\\ &=\bigcup _{k\,:\,y_k=y_j}Z^{-1}(z_k)\in \sigma (Z). \end{align*} This is sufficient (although we will omit the details) to show that \(Y\) is \(\sigma (Z)\)-measurable.
• We can calculate
\begin{align*} \E [Y\1_{\{Z=z_j\}}]&=y_j\E [\1_{\{Z=z_j\}}]\\ &=y_j\P [Z=z_j]\\ &=\sum \limits _i x_i\P [X=x_i|Z=z_j]\P [Z=z_j]\\ &=\sum \limits _i x_i \P [X=x_i\text { and }Z=z_j]\\ &=\E \Big [\sum \limits _{i}x_i\1_{\{X=x_i\}}\1_{\{Z=z_j\}}\Big ]\\ &=\E [X\1_{\{Z=z_j\}}]. \end{align*} Properly, to check that \(Y\) satisfies the second property in Theorem 3.1.1, we need to check \(\E [Y\1_G]=\E [X\1_G]\) for a general \(G\in \sigma (Z)\), not just for \(G=\{Z=z_j\}\). However, since \(Z\) takes only finitely many values, every \(G\in \sigma (Z)\) is a finite disjoint union of events of the form \(\{Z=z_j\}\), so by linearity of expectation it is enough to consider only such \(G\).
Therefore, we have \(Y=\E [X|\sigma (Z)]\) almost surely. In this course we favour writing \(\E [X|\sigma (Z)]\) instead of \(\E [X|Z]\), to make it clear that we are looking at conditional expectation with respect to a \(\sigma \)-field.
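The argument above can be checked on a small example. The Python sketch below uses hypothetical helper names and assumes a finite joint probability mass function with \(\P [Z=z_j]>0\) for each listed \(z_j\). It computes the naive values \(y_j=\E [X|Z=z_j]\), finds the set \(S\) of all \(z_j\) with \(y_j=y\) (so that \(Y^{-1}(y)=\{Z\in S\}\)), and checks \(\E [Y\1_G]=\E [X\1_G]\) for every event \(G=\{Z\in S\}\) of \(\sigma (Z)\). The example is chosen so that two of the \(y_j\) coincide.

```python
from fractions import Fraction
from itertools import combinations


def naive_y(joint):
    """Map z -> E[X | Z = z], where joint[(x, z)] = P[X = x, Z = z]."""
    p_z, num = {}, {}
    for (x, z), p in joint.items():
        p_z[z] = p_z.get(z, Fraction(0)) + p
        num[z] = num.get(z, Fraction(0)) + x * p
    return {z: num[z] / p_z[z] for z in p_z}


def preimage(y, value):
    """The set S of values z such that Y^{-1}(value) = {Z in S}."""
    return {z for z, v in y.items() if v == value}


def is_version(joint):
    """Check E[Y 1_G] = E[X 1_G] for every G = {Z in S} in sigma(Z)."""
    y = naive_y(joint)
    zs = sorted(y)
    for r in range(len(zs) + 1):
        for S in combinations(zs, r):
            lhs = sum((y[z] * p for (x, z), p in joint.items() if z in S), Fraction(0))
            rhs = sum((x * p for (x, z), p in joint.items() if z in S), Fraction(0))
            if lhs != rhs:
                return False
    return True


joint = {
    (0, "a"): Fraction(1, 4),
    (2, "a"): Fraction(1, 4),
    (1, "b"): Fraction(1, 4),
    (3, "c"): Fraction(1, 8),
    (1, "c"): Fraction(1, 8),
}
y = naive_y(joint)
print(y)                         # y_a = y_b = 1 and y_c = 2
print(sorted(preimage(y, 1)))    # ['a', 'b']: Y = 1 exactly on {Z = a} union {Z = b}
print(is_version(joint))         # True
```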