last updated: December 10, 2024

Bayesian Statistics

\(\newcommand{\footnotename}{footnote}\) \(\def \LWRfootnote {1}\) \(\newcommand {\footnote }[2][\LWRfootnote ]{{}^{\mathrm {#1}}}\) \(\newcommand {\footnotemark }[1][\LWRfootnote ]{{}^{\mathrm {#1}}}\) \(\let \LWRorighspace \hspace \) \(\renewcommand {\hspace }{\ifstar \LWRorighspace \LWRorighspace }\) \(\newcommand {\mathnormal }[1]{{#1}}\) \(\newcommand \ensuremath [1]{#1}\) \(\newcommand {\LWRframebox }[2][]{\fbox {#2}} \newcommand {\framebox }[1][]{\LWRframebox } \) \(\newcommand {\setlength }[2]{}\) \(\newcommand {\addtolength }[2]{}\) \(\newcommand {\setcounter }[2]{}\) \(\newcommand {\addtocounter }[2]{}\) \(\newcommand {\arabic }[1]{}\) \(\newcommand {\number }[1]{}\) \(\newcommand {\noalign }[1]{\text {#1}\notag \\}\) \(\newcommand {\cline }[1]{}\) \(\newcommand {\directlua }[1]{\text {(directlua)}}\) \(\newcommand {\luatexdirectlua }[1]{\text {(directlua)}}\) \(\newcommand {\protect }{}\) \(\def \LWRabsorbnumber #1 {}\) \(\def \LWRabsorbquotenumber "#1 {}\) \(\newcommand {\LWRabsorboption }[1][]{}\) \(\newcommand {\LWRabsorbtwooptions }[1][]{\LWRabsorboption }\) \(\def \mathchar {\ifnextchar "\LWRabsorbquotenumber \LWRabsorbnumber }\) \(\def \mathcode #1={\mathchar }\) \(\let \delcode \mathcode \) \(\let \delimiter \mathchar \) \(\def \oe {\unicode {x0153}}\) \(\def \OE {\unicode {x0152}}\) \(\def \ae {\unicode {x00E6}}\) \(\def \AE {\unicode {x00C6}}\) \(\def \aa {\unicode {x00E5}}\) \(\def \AA {\unicode {x00C5}}\) \(\def \o {\unicode {x00F8}}\) \(\def \O {\unicode {x00D8}}\) \(\def \l {\unicode {x0142}}\) \(\def \L {\unicode {x0141}}\) \(\def \ss {\unicode {x00DF}}\) \(\def \SS {\unicode {x1E9E}}\) \(\def \dag {\unicode {x2020}}\) \(\def \ddag {\unicode {x2021}}\) \(\def \P {\unicode {x00B6}}\) \(\def \copyright {\unicode {x00A9}}\) \(\def \pounds {\unicode {x00A3}}\) \(\let \LWRref \ref \) \(\renewcommand {\ref }{\ifstar \LWRref \LWRref }\) \( \newcommand {\multicolumn }[3]{#3}\) \(\require {textcomp}\) \(\newcommand {\intertext }[1]{\text {#1}\notag 
\\}\) \(\let \Hat \hat \) \(\let \Check \check \) \(\let \Tilde \tilde \) \(\let \Acute \acute \) \(\let \Grave \grave \) \(\let \Dot \dot \) \(\let \Ddot \ddot \) \(\let \Breve \breve \) \(\let \Bar \bar \) \(\let \Vec \vec \) \(\require {colortbl}\) \(\let \LWRorigcolumncolor \columncolor \) \(\renewcommand {\columncolor }[2][named]{\LWRorigcolumncolor [#1]{#2}\LWRabsorbtwooptions }\) \(\let \LWRorigrowcolor \rowcolor \) \(\renewcommand {\rowcolor }[2][named]{\LWRorigrowcolor [#1]{#2}\LWRabsorbtwooptions }\) \(\let \LWRorigcellcolor \cellcolor \) \(\renewcommand {\cellcolor }[2][named]{\LWRorigcellcolor [#1]{#2}\LWRabsorbtwooptions }\) \(\require {mathtools}\) \(\newenvironment {crampedsubarray}[1]{}{}\) \(\newcommand {\smashoperator }[2][]{#2\limits }\) \(\newcommand {\SwapAboveDisplaySkip }{}\) \(\newcommand {\LaTeXunderbrace }[1]{\underbrace {#1}}\) \(\newcommand {\LaTeXoverbrace }[1]{\overbrace {#1}}\) \(\newcommand {\LWRmultlined }[1][]{\begin {multline*}}\) \(\newenvironment {multlined}[1][]{\LWRmultlined }{\end {multline*}}\) \(\let \LWRorigshoveleft \shoveleft \) \(\renewcommand {\shoveleft }[1][]{\LWRorigshoveleft }\) \(\let \LWRorigshoveright \shoveright \) \(\renewcommand {\shoveright }[1][]{\LWRorigshoveright }\) \(\newcommand {\shortintertext }[1]{\text {#1}\notag \\}\) \(\newcommand {\vcentcolon }{\mathrel {\unicode {x2236}}}\) \(\renewcommand {\intertext }[2][]{\text {#2}\notag \\}\) \(\newenvironment {fleqn}[1][]{}{}\) \(\newenvironment {ceqn}{}{}\) \(\newenvironment {darray}[2][c]{\begin {array}[#1]{#2}}{\end {array}}\) \(\newcommand {\dmulticolumn }[3]{#3}\) \(\newcommand {\LWRnrnostar }[1][0.5ex]{\\[#1]}\) \(\newcommand {\nr }{\ifstar \LWRnrnostar \LWRnrnostar }\) \(\newcommand {\mrel }[1]{\begin {aligned}#1\end {aligned}}\) \(\newcommand {\underrel }[2]{\underset {#2}{#1}}\) \(\newcommand {\medmath }[1]{#1}\) \(\newcommand {\medop }[1]{#1}\) \(\newcommand {\medint }[1]{#1}\) \(\newcommand {\medintcorr }[1]{#1}\) \(\newcommand {\mfrac 
}[2]{\frac {#1}{#2}}\) \(\newcommand {\mbinom }[2]{\binom {#1}{#2}}\) \(\newenvironment {mmatrix}{\begin {matrix}}{\end {matrix}}\) \(\newcommand {\displaybreak }[1][]{}\) \( \def \offsyl {(\oslash )} \def \msconly {(\Delta )} \) \( \DeclareMathOperator {\var }{var} \DeclareMathOperator {\cov }{cov} \DeclareMathOperator {\Bin }{Bin} \DeclareMathOperator {\Geo }{Geometric} \DeclareMathOperator {\Beta }{Beta} \DeclareMathOperator {\Unif }{Uniform} \DeclareMathOperator {\Gam }{Gamma} \DeclareMathOperator {\Normal }{N} \DeclareMathOperator {\Exp }{Exp} \DeclareMathOperator {\Cauchy }{Cauchy} \DeclareMathOperator {\Bern }{Bernoulli} \DeclareMathOperator {\Poisson }{Poisson} \DeclareMathOperator {\Weibull }{Weibull} \DeclareMathOperator {\IGam }{IGamma} \DeclareMathOperator {\NGam }{NGamma} \DeclareMathOperator {\ChiSquared }{ChiSquared} \DeclareMathOperator {\Pareto }{Pareto} \DeclareMathOperator {\NBin }{NegBin} \DeclareMathOperator {\Studentt }{Student-t} \DeclareMathOperator *{\argmax }{arg\,max} \DeclareMathOperator *{\argmin }{arg\,min} \) \( \def \to {\rightarrow } \def \iff {\Leftrightarrow } \def \ra {\Rightarrow } \def \sw {\subseteq } \def \mc {\mathcal } \def \mb {\mathbb } \def \sc {\setminus } \def \wt {\widetilde } \def \v {\textbf } \def \E {\mb {E}} \def \P {\mb {P}} \def \R {\mb {R}} \def \C {\mb {C}} \def \N {\mb {N}} \def \Q {\mb {Q}} \def \Z {\mb {Z}} \def \B {\mb {B}} \def \~{\sim } \def \-{\,;\,} \def \qed {$\blacksquare $} \CustomizeMathJax {\def \1{\unicode {x1D7D9}}} \def \cadlag {c\`{a}dl\`{a}g} \def \p {\partial } \def \l {\left } \def \r {\right } \def \Om {\Omega } \def \om {\omega } \def \eps {\epsilon } \def \de {\delta } \def \ov {\overline } \def \sr {\stackrel } \def \Lp {\mc {L}^p} \def \Lq {\mc {L}^p} \def \Lone {\mc {L}^1} \def \Ltwo {\mc {L}^2} \def \toae {\sr {\rm a.e.}{\to }} \def \toas {\sr {\rm a.s.}{\to }} \def \top {\sr {\mb {\P }}{\to }} \def \tod {\sr {\rm d}{\to }} \def \toLp {\sr {\Lp }{\to }} \def \toLq {\sr {\Lq }{\to }} 
\def \eqae {\sr {\rm a.e.}{=}} \def \eqas {\sr {\rm a.s.}{=}} \def \eqd {\sr {\rm d}{=}} \def \approxd {\sr {\rm d}{\approx }} \def \Sa {(S1)\xspace } \def \Sb {(S2)\xspace } \def \Sc {(S3)\xspace } \)

1.2 Equality in distribution

Let \(X\) be a random variable taking values in \(\R ^d\). The law or distribution of \(X\) is the function \(A\mapsto \P [X\in A]\), which tells us how likely the value of \(X\) is to be within the set \(A\sw \R ^d\).

  • Definition 1.2.1 Let \(X\) and \(Y\) be random variables taking values in \(\R ^d\). We say that \(X\) and \(Y\) are equal in distribution if \(\P [X\in A]=\P [Y\in A]\) for all \(A\sw \R ^d\). We write this relationship as \(X\eqd Y\).

In the case \(d=1\) we also have the cumulative distribution function \(F_X(x)=\P [X\leq x]\) which tells us how likely the value of \(X\) is to be less than or equal to \(x\in \R \). For random variables \(X\) and \(Y\) taking values in \(\R \),

\begin{equation} \label {eq:cdf_df_d=1} F_X=F_Y\text { if and only if }X\eqd Y. \end{equation}

We won’t prove (1.2) within this course, although it is hopefully not surprising to you.

  • Example 1.2.2 It is important to understand that Definition 1.2.1 is not the same thing as equality. For example, let \(X\sim \Normal (0,1)\) and let \(Y=-X\). Then

    \[\P [Y\leq x] =\P [-x\leq X] =\int _{-x}^\infty \frac {1}{\sqrt {2\pi }}e^{-y^2/2}\,dy =\int _{-\infty }^x \frac {1}{\sqrt {2\pi }}e^{-z^2/2}\,dz =\P [X\leq x], \]

    where we have made the substitution \(z=-y\). Hence \(F_X=F_Y\) so from (1.2) we have \(X\eqd Y\). But \(X=Y\) only happens when \(X=Y=0\), which has probability zero.

    A perhaps simpler example: if \(X\) and \(Y\) are independent \(\Normal (0,1)\) random variables then \(X\eqd Y\). However, \(\P [X=Y]=\P [X-Y=0]\), and \(X-Y\sim \Normal (0,1+1)=\Normal (0,2)\) is a continuous random variable, so \(\P [X=Y]=0\).
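
The first example can be checked by simulation. The following is a minimal sketch using only the Python standard library; the seed, the sample size and the helper `ecdf` are illustrative choices and not part of the course material. It draws samples of \(X\sim \Normal (0,1)\), sets \(Y=-X\), compares the empirical c.d.f.s of \(X\) and \(Y\) with the exact c.d.f., and counts how often \(X=Y\).

```python
import random
from statistics import NormalDist

rng = random.Random(0)          # fixed seed (arbitrary choice) for reproducibility
n = 100_000                     # sample size (arbitrary choice)

xs = [rng.gauss(0.0, 1.0) for _ in range(n)]   # samples of X ~ N(0,1)
ys = [-x for x in xs]                          # Y = -X, built from the same samples


def ecdf(samples, t):
    """Empirical c.d.f.: the fraction of samples that are <= t."""
    return sum(s <= t for s in samples) / len(samples)


exact = NormalDist(0.0, 1.0)    # exact N(0,1) distribution, for comparison
for t in (-1.0, 0.0, 0.5, 2.0):
    print(f"t={t:+.1f}  F_X~{ecdf(xs, t):.3f}  F_Y~{ecdf(ys, t):.3f}  exact={exact.cdf(t):.3f}")

# Equality in distribution is not equality: X = Y would require X = 0.
frac_equal = sum(x == y for x, y in zip(xs, ys)) / n
print("fraction of samples with X = Y:", frac_equal)
```

The two empirical c.d.f.s should agree with each other, and with the exact c.d.f., up to sampling error, even though the samples of \(X\) and \(Y\) essentially never coincide.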

Note that we have used the notation \(\sim \Normal (0,1)\) in Example 1.2.2. We might wonder what the difference between the symbols \(\sim \) and \(\eqd \) is. Formally, they have the same meaning, but we tend to use \(\sim \) when we are referring to a named distribution, and \(\eqd \) when we are comparing two existing random variables. That is a convention and not a rule, so you can use \(\sim \) and \(\eqd \) interchangeably if you wish.

1.2.1 Identifying distributions

In this section we give some results that help to identify the relationship \(X\eqd Y\). Note that this also helps us identify when random variables have named distributions. The discrete case is easily dealt with.

  • Lemma 1.2.3 Suppose that \(X\) and \(Y\) are discrete random variables. Then \(X\eqd Y\) if and only if \(p_X=p_Y\).

Note that the statement \(p_X=p_Y\) means that the functions \(p_X\) and \(p_Y\) are equal, that is \(p_X(x)=p_Y(x)\) for all \(x\in \R ^d\). The proof is left for you in Problem 1.9.
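
Lemma 1.2.3 can be illustrated on a small sample space where everything is computed exactly. The sketch below is an illustration only: the two-coin-flip setup and the helper `pmf` are assumptions made for this example. It takes \(X\) to be the number of heads and \(Y\) the number of tails in two fair coin flips, and compares their probability mass functions.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All four equally likely outcomes of two fair coin flips.
outcomes = list(product("HT", repeat=2))
prob = Fraction(1, len(outcomes))


def pmf(random_variable):
    """Exact p.m.f. of a random variable given as a function of the outcome."""
    masses = Counter()
    for outcome in outcomes:
        masses[random_variable(outcome)] += prob
    return dict(masses)


def X(outcome):
    return outcome.count("H")   # number of heads


def Y(outcome):
    return outcome.count("T")   # number of tails


p_X, p_Y = pmf(X), pmf(Y)
print(p_X == p_Y)   # True when the two p.m.f.s agree as functions
print(sum(prob for o in outcomes if X(o) == Y(o)))   # P[X = Y]
```

Here \(p_X=p_Y\), so \(X\eqd Y\), yet \(X=Y\) only on the outcomes with exactly one head, which have total probability \(\frac 12\).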

The situation for continuous random variables is a bit more complicated. If \(X\) and \(Y\) are continuous random variables with \(f_X=f_Y\), then it is clear from (1.1) that \(X\eqd Y\), but it is possible to have \(X\eqd Y\) and for \(f_X\) and \(f_Y\) to be ‘different in an unimportant way’. You should have already seen examples of this situation, like the following.

  • Example 1.2.4 The probability density functions

    \[ f_X(x)= \begin {cases} 1 & \text { for }x\in (0,1) \\ 0 & \text { otherwise,} \end {cases} \qquad f_Y(y)= \begin {cases} 1 & \text { for }y\in [0,1] \\ 0 & \text { otherwise,} \end {cases} \]

    define random variables \(X\) and \(Y\). Note that \(f_X(x)=f_Y(x)\) for all \(x\in \R \) except for \(x=0\) and \(x=1\), so \(f_X\) and \(f_Y\) are different, but only very slightly! You might think of these as the continuous uniform distributions \(X\sim \Unif ((0,1))\) and \(Y\sim \Unif ([0,1])\), but they are really the same distribution because \(\P [X=0]=\P [X=1]=\P [Y=0]=\P [Y=1]=0\).

We need to handle this point carefully because, in this course, we don’t assume enough mathematical background to explain precisely what we mean by ‘different in an unimportant way’. We do need to know the following facts, however:

  • For a random variable \(X\) with range in \(\R \), changing the value of \(f_X(x)\) on a finite set of \(x\in \R \) will not change the distribution of \(X\) (as in Example 1.2.4).

  • For a random variable \(X\) with range in \(\R ^2\), the same is true, but we can also change the value of \(f_X(x)\) on a finite set of lines (in \(\R ^2\)) without changing the distribution of \(X\).

    Similar things work in higher dimensions too, but we won’t need those.
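
To see informally why a finite set of points cannot matter, we can compare Riemann sums of the two densities from Example 1.2.4. The following is a rough numerical sketch, not a proof; the function `left_riemann_sum` and the choice of the interval \([-1,2]\) are assumptions made for this illustration. The grid is chosen so that \(x=0\) and \(x=1\) are grid points, which is where \(f_X\) and \(f_Y\) differ.

```python
def f_X(x):
    """Density from Example 1.2.4 that is 1 on the open interval (0,1)."""
    return 1.0 if 0.0 < x < 1.0 else 0.0


def f_Y(x):
    """Density from Example 1.2.4 that is 1 on the closed interval [0,1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


def left_riemann_sum(f, n):
    """Left Riemann sum of f over [-1, 2] with step 1/n.

    The grid points are (i - n) / n, so x = 0 and x = 1 lie exactly on the grid.
    """
    return sum(f((i - n) / n) for i in range(3 * n)) / n


for n in (10, 100, 1000, 10_000):
    s_X, s_Y = left_riemann_sum(f_X, n), left_riemann_sum(f_Y, n)
    print(f"n={n:>6}  sum f_X={s_X:.4f}  sum f_Y={s_Y:.4f}  difference={s_Y - s_X:.4f}")
```

The two sums differ by \(2/n\), coming from the two grid points at which the densities disagree. This difference vanishes as the grid is refined, so both densities give the same integrals and hence the same probabilities.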

We sometimes think of random variables as being defined by probability mass functions or probability density functions. This is a slight abuse of terminology: as we have discussed above, the p.m.f. and p.d.f. specify the distribution of a random variable, not the random variable itself. If you are asked to ‘find’ the random variable \(X\), or to find the distribution of \(X\), then a statement of the p.m.f. or p.d.f. will suffice. You should always specify the range of values for which the p.m.f. or p.d.f. is non-zero.

1.2.2 Normalizing constants

Often you will find that the p.m.f. or p.d.f. of some random variable \(X\) appears in the form

\begin{equation} \label {eq:normalizing_constants} \P [X=x]=\frac {1}{Z}g(x)\qquad \text { or }\qquad f_X(x)=\frac {1}{Z}g(x) \end{equation}

where \(Z\) does not depend on \(x\). In such cases \(Z\) is known as a normalizing constant. Its role is to make sure that the p.m.f. sums (over \(x\in R_X\)) to one, or that the p.d.f. integrates (again, over \(x\in R_X\)) to one. We have written \(\frac {1}{Z}\) because normalizing constants often appear in a denominator, e.g. \(\frac {1}{\sqrt {2\pi }}\) in \(f_{\Normal (0,1)}(x)=\frac {1}{\sqrt {2\pi }}e^{-x^2/2}\), but they don’t have to appear that way up, e.g. \(\lambda \) in \(f_{\Exp (\lambda )}(x)=\lambda e^{-\lambda x}\).
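
Because \(Z\) is determined by \(g\), it can be recovered numerically. The sketch below is an illustration under stated assumptions: the helper `midpoint_integral`, the truncation of \(\R \) to \([-10,10]\) and the number of grid cells are all choices made here, not part of the course. It integrates \(g(x)=e^{-x^2/2}\) and compares the result with \(\sqrt {2\pi }\).

```python
import math


def g(x):
    """Unnormalized N(0,1) density: the part of the p.d.f. that depends on x."""
    return math.exp(-x * x / 2)


def midpoint_integral(f, a, b, n):
    """Midpoint-rule approximation to the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


# Truncating to [-10, 10] is an assumption: the tails beyond it are negligible.
Z = midpoint_integral(g, -10.0, 10.0, 20_000)
print(Z, math.sqrt(2 * math.pi))
```

The two printed values should agree to many decimal places, matching the normalizing constant \(\frac {1}{\sqrt {2\pi }}\) of the \(\Normal (0,1)\) density.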

  • Lemma 1.2.5 Suppose that \(X\) and \(Y\) are random variables.

    • 1. If \(X\) and \(Y\) are discrete, with probability mass functions in the form \(p_X(x)=\frac {1}{Z}g(x)\) and \(p_Y(x)=\frac {1}{Z'}g(x)\) then \(X\eqd Y\).

    • 2. If \(X\) and \(Y\) are continuous, with probability density functions in the form \(f_X(x)=\frac {1}{Z}g(x)\) and \(f_Y(x)=\frac {1}{Z'}g(x)\) then \(X\eqd Y\).

Proof: \(\offsyl \) Note that in (1.3) the normalizing constant \(Z\) is determined by \(g(x)\); in the discrete case we have \(Z=\sum _{x\in R}g(x)\) and in the continuous case we have \(Z=\int _\R g(x)\,dx\). Hence in both cases, the fact that \(X\) and \(Y\) are random variables implies that \(Z=Z'\). The lemma now follows from Lemma 1.2.3 for the discrete case, and from our discussion below Lemma 1.2.3 for the continuous case.   ∎

  • Example 1.2.6 If \(X\sim \Gam (\alpha ,\beta )\) then \(f_X(x)=\frac {\beta ^\alpha }{\Gamma (\alpha )}x^{\alpha -1}e^{-\beta x}\) for \(x>0\), where \(\Gamma :(0,\infty )\to (0,\infty )\) is the \(\Gamma \)-function. By Lemma 1.2.5, if \(Y\) is any other random variable with p.d.f. of the form \(f_Y(y)=(\text {constant})\times y^{\alpha -1}e^{-\beta y}\) for \(y>0\), then we have \(Y\sim \Gam (\alpha ,\beta )\).
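
The claim that the constant is forced by the \(x\)-dependent part can be checked numerically. The following is a minimal sketch, assuming the illustrative values \(\alpha =3\), \(\beta =2\) and a hypothetical helper `midpoint_integral`. It integrates \(x^{\alpha -1}e^{-\beta x}\) over a truncated range and compares the reciprocal of the result with \(\frac {\beta ^\alpha }{\Gamma (\alpha )}\).

```python
import math

alpha, beta = 3.0, 2.0   # illustrative parameter values (an assumption)


def kernel(x):
    """The x-dependent part of the Gamma(alpha, beta) density, for x > 0."""
    return x ** (alpha - 1) * math.exp(-beta * x)


def midpoint_integral(f, a, b, n):
    """Midpoint-rule approximation to the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


# Truncating (0, infinity) to (0, 40) is an assumption; the tail is negligible here.
Z = midpoint_integral(kernel, 0.0, 40.0, 200_000)
constant = 1 / Z
print(constant, beta ** alpha / math.gamma(alpha))
```

Both printed values should be close to \(4\), which is \(\frac {2^3}{\Gamma (3)}\): whatever constant multiplies \(x^{\alpha -1}e^{-\beta x}\) in a p.d.f., it must be this one.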

When we have \(p_X(x)=\frac {1}{Z}g(x)\) and \(p_Y(x)=\frac {1}{Z'}g(x)\), as in part 1 of Lemma 1.2.5, it is common to summarize this relationship as \(p_X \propto p_Y\). In words, \(p_X\) is proportional to \(p_Y\). The same applies to part 2 of Lemma 1.2.5. This notation can save time, but we will avoid using it while we are focused on understanding conditioning and Bayesian models in Chapters 1-3. We will begin to use it in Section 4.1, where it will become very helpful in keeping our calculations tidy, and we will discuss it further at that point.