last updated: October 24, 2024

Bayesian Statistics


8.3 Markov chain Monte Carlo

We will assume that the parameter space \(\Pi \) is a subset of \(\mathbb {R}^d\). Recall from Theorems 2.4.1 and 3.1.2 that \(\Pi \) is also the range of both the prior \(\Theta \) and the posterior \(\Theta |_{\{X=x\}}\). As usual, we will assume that the prior distribution \(\Theta \) is a continuous distribution with p.d.f. \(f_\Theta \).

We want to use the Metropolis-Hastings algorithm from Section 8.2 for the purposes of Bayesian inference. Our target is the posterior distribution \(Y=\Theta |_{\{X=x\}}\). Let us think about what capabilities we need in order to do this.

  • 1. We first need a choice of joint distribution \((Y,Q)\), and the ability to take samples \(\tilde {y}\) from the proposal distribution \(Q|_{\{Y=y\}}\), for any value of \(y\).

    Random walk case: Choose \(Q=\Theta |_{\{X=x\}}+Z\), where \(Z\) satisfies \(f_Z(z)=f_Z(-z)\), as in Example 8.2.4. Then \(\tilde {y}=y+Z\) has the distribution \(Q|_{\{\Theta |_{\{X=x\}}=y\}}\). This only works if the parameter \(\theta \) takes values in \(\mathbb {R}\) (or, more generally, \(\mathbb {R}^d\)).

    General case: If we can’t use the symmetric case, because e.g. \(\theta \) takes values in some bounded or half-infinite interval, then there is a more difficult choice to make here.

  • 2. The second requirement is that we can calculate the value of \(\alpha \) in (8.1)/(8.4).

    Random walk case: We can calculate the p.d.f. \(f_{\Theta |_{\{X=x\}}}\) using Theorems 2.4.1 and 3.1.2. In the symmetric case this is all that we need.

    General case: If we can’t use the symmetric case, then we also need to evaluate \(f_{Y|_{\{Q=q\}}}\) and \(f_{Q|_{\{Y=y\}}}\) from our chosen joint distribution \((Y,Q)\). Our choice of \((Y,Q)\) (in the step above) will usually allow us to write down the joint p.d.f. \(f_{Y,Q}\), from which we can use Lemma 1.6.1 to calculate \(f_{Y|_{\{Q=q\}}}\) and \(f_{Q|_{\{Y=y\}}}\).

  • 3. The last thing that we require is that the proposals are accepted fairly often, in step 3 of the MH algorithm. As a rough guide, an acceptance rate of \(15\%\) to \(50\%\) is generally viewed as sufficient.

    This happens naturally in many cases. When it does not, some additional techniques are required, some of which appear in the MSc material next semester.
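As a quick illustration of requirement 3, the following Python sketch (our own, not part of the notes) measures the empirical acceptance rate of a random walk proposal for different increment scales \(\sigma \). A standard normal density stands in for the posterior here; the function name and parameter values are illustrative assumptions.

```python
import math
import random

def acceptance_rate(sigma, n_steps=20000, seed=0):
    """Empirical acceptance rate of random-walk Metropolis-Hastings with
    N(0, sigma^2) increments, targeting a standard normal density as a
    stand-in for the posterior (an illustrative assumption, not a model
    from these notes)."""
    rng = random.Random(seed)
    y, accepted = 0.0, 0
    for _ in range(n_steps):
        y_prop = y + rng.gauss(0.0, sigma)  # step 1: propose
        # step 2: alpha = min(1, f(y_prop)/f(y)) with f the N(0,1) p.d.f.
        alpha = min(1.0, math.exp(0.5 * y**2 - 0.5 * y_prop**2))
        if rng.random() < alpha:            # step 3: accept or reject
            y, accepted = y_prop, accepted + 1
    return accepted / n_steps

# Tiny increments are almost always accepted but explore the posterior very
# slowly; huge increments are rejected almost every time.
rate_small, rate_mid, rate_big = (acceptance_rate(s) for s in (0.1, 2.5, 25.0))
```

This is why the increment scale \(\sigma \) is treated as a tuning parameter: too small and the chain barely moves, too large and almost every proposal is rejected.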

We’ll focus on the symmetric case for the remainder of Section 8.3, because the choice of \(Q\) and its associated complications can be quite difficult in the general case. We’ll write down a full description of the MCMC algorithm for Bayesian inference, in the random walk case, in Section 8.3.1.

8.3.1 MCMC algorithm for the random walk case

We start with a (discrete or continuous) Bayesian model \((X,\Theta )\), where the parameter space is \(\Pi =\mathbb {R}^d\). We want to obtain samples of \(\Theta |_{\{X=x\}}\), and we know the p.d.f. \(f_{\Theta |_{\{X=x\}}}\) from Theorems 2.4.1/3.1.2.

We must first choose an initial point \(y_0\). Its value doesn’t matter too much: anywhere ‘in the middle’ of \(\Pi \), or near the mass of \(\Theta \), will do. We must also choose a continuous distribution for \(Z\) satisfying \(f_Z(z)=f_Z(-z)\) for all \(z\in \mathbb {R}^d\). A common choice is \(Z\sim N(0,\sigma ^2)\), as in Example 8.2.4.

Then, given \(y_m\), we define \(y_{m+1}\) as follows.

  • 1. Sample \(z\) from \(Z\) and set \(\tilde {y}=y_m+z\).

  • 2. Calculate \(\alpha =\min \left (1,\frac {f_{\Theta |_{\{X=x\}}}(\tilde {y})}{f_{\Theta |_{\{X=x\}}}(y_m)}\right )\).

  • 3. Then, set \(y_{m+1}=\begin {cases} \tilde {y} & \text { with probability }\alpha , \\ y_m & \text { with probability }1-\alpha . \end {cases} \)

We repeat steps 1-3 until \(m\) is large enough that values taken by the \(y_m\) are no longer affected by the choice of \(y_0\). This often has to be judged by eye, from a plot of the sequence \((y_m)\).
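Steps 1–3, together with the burn-in, can be sketched in Python. This is a minimal sketch, not part of the notes: the function name, the standard normal stand-in for \(f_{\Theta |_{\{X=x\}}}\), and the fixed burn-in cutoff are all our own illustrative choices.

```python
import math
import random

def rw_metropolis(log_target, y0, sigma, n_steps, seed=0):
    """Random-walk Metropolis sampler implementing steps 1-3, in one dimension.

    log_target need only be the log-p.d.f. up to an additive constant,
    since the normalising constant cancels in the ratio defining alpha.
    """
    rng = random.Random(seed)
    y = y0
    ys = [y0]
    for _ in range(n_steps):
        # Step 1: sample z from Z ~ N(0, sigma^2) and set y_prop = y_m + z.
        y_prop = y + rng.gauss(0.0, sigma)
        # Step 2: alpha = min(1, f(y_prop)/f(y_m)), computed on the log scale.
        log_alpha = min(0.0, log_target(y_prop) - log_target(y))
        # Step 3: move to y_prop with probability alpha, else stay at y_m.
        if rng.random() < math.exp(log_alpha):
            y = y_prop
        ys.append(y)
    return ys

# Illustrative target: a standard normal posterior, log f(y) = -y^2/2 + const.
# Starting far from the mass (y0 = 10) makes the burn-in visible in a plot.
chain = rw_metropolis(lambda y: -0.5 * y * y, y0=10.0, sigma=1.0, n_steps=5000)
samples = chain[1000:]  # discard a burn-in, chosen here by a fixed cutoff
```

Repeating the call with different seeds gives further chains, whose post-burn-in values can be pooled into a histogram.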

Repeating the whole procedure yields multiple samples of \(\Theta |_{\{X=x\}}\), which we can plot as a histogram to obtain an approximation of the distribution. This is exactly what we already did in Example 8.2.3.