last updated: October 24, 2024

Bayesian Statistics

\(\newcommand{\footnotename}{footnote}\) \(\def \LWRfootnote {1}\) \(\newcommand {\footnote }[2][\LWRfootnote ]{{}^{\mathrm {#1}}}\) \(\newcommand {\footnotemark }[1][\LWRfootnote ]{{}^{\mathrm {#1}}}\) \(\let \LWRorighspace \hspace \) \(\renewcommand {\hspace }{\ifstar \LWRorighspace \LWRorighspace }\) \(\newcommand {\mathnormal }[1]{{#1}}\) \(\newcommand \ensuremath [1]{#1}\) \(\newcommand {\LWRframebox }[2][]{\fbox {#2}} \newcommand {\framebox }[1][]{\LWRframebox } \) \(\newcommand {\setlength }[2]{}\) \(\newcommand {\addtolength }[2]{}\) \(\newcommand {\setcounter }[2]{}\) \(\newcommand {\addtocounter }[2]{}\) \(\newcommand {\arabic }[1]{}\) \(\newcommand {\number }[1]{}\) \(\newcommand {\noalign }[1]{\text {#1}\notag \\}\) \(\newcommand {\cline }[1]{}\) \(\newcommand {\directlua }[1]{\text {(directlua)}}\) \(\newcommand {\luatexdirectlua }[1]{\text {(directlua)}}\) \(\newcommand {\protect }{}\) \(\def \LWRabsorbnumber #1 {}\) \(\def \LWRabsorbquotenumber "#1 {}\) \(\newcommand {\LWRabsorboption }[1][]{}\) \(\newcommand {\LWRabsorbtwooptions }[1][]{\LWRabsorboption }\) \(\def \mathchar {\ifnextchar "\LWRabsorbquotenumber \LWRabsorbnumber }\) \(\def \mathcode #1={\mathchar }\) \(\let \delcode \mathcode \) \(\let \delimiter \mathchar \) \(\def \oe {\unicode {x0153}}\) \(\def \OE {\unicode {x0152}}\) \(\def \ae {\unicode {x00E6}}\) \(\def \AE {\unicode {x00C6}}\) \(\def \aa {\unicode {x00E5}}\) \(\def \AA {\unicode {x00C5}}\) \(\def \o {\unicode {x00F8}}\) \(\def \O {\unicode {x00D8}}\) \(\def \l {\unicode {x0142}}\) \(\def \L {\unicode {x0141}}\) \(\def \ss {\unicode {x00DF}}\) \(\def \SS {\unicode {x1E9E}}\) \(\def \dag {\unicode {x2020}}\) \(\def \ddag {\unicode {x2021}}\) \(\def \P {\unicode {x00B6}}\) \(\def \copyright {\unicode {x00A9}}\) \(\def \pounds {\unicode {x00A3}}\) \(\let \LWRref \ref \) \(\renewcommand {\ref }{\ifstar \LWRref \LWRref }\) \( \newcommand {\multicolumn }[3]{#3}\) \(\require {textcomp}\) \(\newcommand {\intertext }[1]{\text {#1}\notag 
\\}\) \(\let \Hat \hat \) \(\let \Check \check \) \(\let \Tilde \tilde \) \(\let \Acute \acute \) \(\let \Grave \grave \) \(\let \Dot \dot \) \(\let \Ddot \ddot \) \(\let \Breve \breve \) \(\let \Bar \bar \) \(\let \Vec \vec \) \(\require {colortbl}\) \(\let \LWRorigcolumncolor \columncolor \) \(\renewcommand {\columncolor }[2][named]{\LWRorigcolumncolor [#1]{#2}\LWRabsorbtwooptions }\) \(\let \LWRorigrowcolor \rowcolor \) \(\renewcommand {\rowcolor }[2][named]{\LWRorigrowcolor [#1]{#2}\LWRabsorbtwooptions }\) \(\let \LWRorigcellcolor \cellcolor \) \(\renewcommand {\cellcolor }[2][named]{\LWRorigcellcolor [#1]{#2}\LWRabsorbtwooptions }\) \(\require {mathtools}\) \(\newenvironment {crampedsubarray}[1]{}{}\) \(\newcommand {\smashoperator }[2][]{#2\limits }\) \(\newcommand {\SwapAboveDisplaySkip }{}\) \(\newcommand {\LaTeXunderbrace }[1]{\underbrace {#1}}\) \(\newcommand {\LaTeXoverbrace }[1]{\overbrace {#1}}\) \(\newcommand {\LWRmultlined }[1][]{\begin {multline*}}\) \(\newenvironment {multlined}[1][]{\LWRmultlined }{\end {multline*}}\) \(\let \LWRorigshoveleft \shoveleft \) \(\renewcommand {\shoveleft }[1][]{\LWRorigshoveleft }\) \(\let \LWRorigshoveright \shoveright \) \(\renewcommand {\shoveright }[1][]{\LWRorigshoveright }\) \(\newcommand {\shortintertext }[1]{\text {#1}\notag \\}\) \(\newcommand {\vcentcolon }{\mathrel {\unicode {x2236}}}\) \(\renewcommand {\intertext }[2][]{\text {#2}\notag \\}\) \(\newenvironment {fleqn}[1][]{}{}\) \(\newenvironment {ceqn}{}{}\) \(\newenvironment {darray}[2][c]{\begin {array}[#1]{#2}}{\end {array}}\) \(\newcommand {\dmulticolumn }[3]{#3}\) \(\newcommand {\LWRnrnostar }[1][0.5ex]{\\[#1]}\) \(\newcommand {\nr }{\ifstar \LWRnrnostar \LWRnrnostar }\) \(\newcommand {\mrel }[1]{\begin {aligned}#1\end {aligned}}\) \(\newcommand {\underrel }[2]{\underset {#2}{#1}}\) \(\newcommand {\medmath }[1]{#1}\) \(\newcommand {\medop }[1]{#1}\) \(\newcommand {\medint }[1]{#1}\) \(\newcommand {\medintcorr }[1]{#1}\) \(\newcommand {\mfrac 
}[2]{\frac {#1}{#2}}\) \(\newcommand {\mbinom }[2]{\binom {#1}{#2}}\) \(\newenvironment {mmatrix}{\begin {matrix}}{\end {matrix}}\) \(\newcommand {\displaybreak }[1][]{}\) \( \def \offsyl {(\oslash )} \def \msconly {(\Delta )} \) \( \DeclareMathOperator {\var }{var} \DeclareMathOperator {\cov }{cov} \DeclareMathOperator {\Bin }{Bin} \DeclareMathOperator {\Geo }{Geometric} \DeclareMathOperator {\Beta }{Beta} \DeclareMathOperator {\Unif }{Uniform} \DeclareMathOperator {\Gam }{Gamma} \DeclareMathOperator {\Normal }{N} \DeclareMathOperator {\Exp }{Exp} \DeclareMathOperator {\Cauchy }{Cauchy} \DeclareMathOperator {\Bern }{Bernoulli} \DeclareMathOperator {\Poisson }{Poisson} \DeclareMathOperator {\Weibull }{Weibull} \DeclareMathOperator {\IGam }{IGamma} \DeclareMathOperator {\NGam }{NGamma} \DeclareMathOperator {\ChiSquared }{ChiSquared} \DeclareMathOperator {\Pareto }{Pareto} \DeclareMathOperator {\NBin }{NegBin} \DeclareMathOperator {\Studentt }{Student-t} \DeclareMathOperator *{\argmax }{arg\,max} \DeclareMathOperator *{\argmin }{arg\,min} \) \( \def \to {\rightarrow } \def \iff {\Leftrightarrow } \def \ra {\Rightarrow } \def \sw {\subseteq } \def \mc {\mathcal } \def \mb {\mathbb } \def \sc {\setminus } \def \wt {\widetilde } \def \v {\textbf } \def \E {\mb {E}} \def \P {\mb {P}} \def \R {\mb {R}} \def \C {\mb {C}} \def \N {\mb {N}} \def \Q {\mb {Q}} \def \Z {\mb {Z}} \def \B {\mb {B}} \def \~{\sim } \def \-{\,;\,} \def \qed {$\blacksquare $} \CustomizeMathJax {\def \1{\unicode {x1D7D9}}} \def \cadlag {c\`{a}dl\`{a}g} \def \p {\partial } \def \l {\left } \def \r {\right } \def \Om {\Omega } \def \om {\omega } \def \eps {\epsilon } \def \de {\delta } \def \ov {\overline } \def \sr {\stackrel } \def \Lp {\mc {L}^p} \def \Lq {\mc {L}^p} \def \Lone {\mc {L}^1} \def \Ltwo {\mc {L}^2} \def \toae {\sr {\rm a.e.}{\to }} \def \toas {\sr {\rm a.s.}{\to }} \def \top {\sr {\mb {\P }}{\to }} \def \tod {\sr {\rm d}{\to }} \def \toLp {\sr {\Lp }{\to }} \def \toLq {\sr {\Lq }{\to }} 
\def \eqae {\sr {\rm a.e.}{=}} \def \eqas {\sr {\rm a.s.}{=}} \def \eqd {\sr {\rm d}{=}} \def \approxd {\sr {\rm d}{\approx }} \def \Sa {(S1)\xspace } \def \Sb {(S2)\xspace } \def \Sc {(S3)\xspace } \)

Chapter 7 Testing and parameter estimation

In this chapter we discuss aspects of statistical testing and parameter inference, using the Bayesian models set up in earlier chapters. Throughout this chapter we work in the situation of a discrete or absolutely continuous Bayesian model \((X,\Theta )\), where we have data \(x\) and posterior \(\Theta |_{\{X=x\}}\). We keep all of our usual notation: the parameter space is \(\Pi \), the model family is \((M_\theta )_{\theta \in \Pi }\), and the range of the model is \(R\). Note that \(M_\theta \) could have the form \(M_\theta \sim (Y_\theta )^{\otimes n}\) for some random variable \(Y_\theta \) with parameter \(\theta \), corresponding to \(n\) i.i.d. data points.

We have noted in Chapter 5 that a well-chosen prior distribution can lead to a more accurate posterior distribution. Statistical testing is often used in situations where multiple different perspectives are involved, and this makes the specification of prior beliefs more complicated. For example, trials of medical treatments involve patients, pharmaceutical companies and regulators, all of whom have different levels of trust in each other as well as potentially different prior beliefs. It is common practice to check how much the results of statistical tests depend upon the choice of prior, often by varying the prior or comparing to a weakly informative prior.

7.1 Hypothesis testing

Hypothesis testing is surprisingly simple within the Bayesian framework. We first need to introduce the terminology used to present the results.

  • Definition 7.1.1 Let \(A\) and \(B\) be events such that \(\P [A\cup B]=1\) and \(A\cap B=\emptyset \). The odds ratio of \(A\) against \(B\) is

    \[O_{A,B}=\frac {\P [A]}{\P [B]}.\]

    It expresses how much more likely \(A\) is than \(B\). For example, \(O_{A,B}=2\) means that \(A\) is twice as likely to occur as \(B\); if \(O_{A,B}=1\) then \(A\) and \(B\) are equally likely.

Take a Bayesian model \((X,\Theta )\) with parameter space \(\Pi \). We split the parameter space into two pieces: \(\Pi =\Pi _0\cup \Pi _1\) where \(\Pi _0\cap \Pi _1=\emptyset \). We consider two competing hypotheses: \(H_0\) is that the unknown parameter \(\theta \) is within the set \(\Pi _0\), and \(H_1\) is that the unknown parameter \(\theta \) is within the set \(\Pi _1\).

  • Definition 7.1.2 The prior odds of \(H_0\) against \(H_1\) is defined to be

    \[\frac {\P [\Theta \in \Pi _0]}{\P [\Theta \in \Pi _1]}.\]

    Given the data \(x\), the posterior odds of \(H_0\) against \(H_1\) is defined to be

    \[\frac {\P [\Theta |_{\{X=x\}}\in \Pi _0]}{\P [\Theta |_{\{X=x\}}\in \Pi _1]}.\]

Note that the prior odds involve the prior \(\Theta \), and the posterior odds involve the posterior \(\Theta |_{\{X=x\}}\), but otherwise the formulae are identical. We assume implicitly that \(\P [\Theta \in \Pi _0]\) and \(\P [\Theta \in \Pi _1]\) are both non-zero, which by Theorems 2.4.1 and 3.1.2 implies that the same is true for \(\Theta |_{\{X=x\}}\). Note also that the prior and posterior odds are only well defined for proper prior and posterior distributions, or else we cannot make sense of the probabilities above.

It is often helpful to get a feel for how much the data has influenced the result of the test. For these purposes we also define the Bayes factor

\begin{equation} \label {eq:bayes_factor} B=\frac {\text {posterior odds}}{\text {prior odds}}. \end{equation}

Our next lemma shows why \(B\) is important. It is equal to the ratio of the likelihoods of the event \(\{X=x\}\), i.e. of the data that we have, conditional on \(\Theta \in \Pi _0\) and \(\Theta \in \Pi _1\). In other words, \(B\) is the ratio of the likelihood of the data under \(H_0\) to the likelihood of the data under \(H_1\).

  • Lemma 7.1.3 In the notation above, the Bayes factor satisfies \(B=\frac {L_{X|_{\{\Theta \in \Pi _0\}}}(x)}{L_{X|_{\{\Theta \in \Pi _1\}}}(x)}\) where \(L\) denotes the likelihood function.

Proof: We split the proof into two cases, depending on whether the Bayesian model is discrete or absolutely continuous. In the discrete case we have

\[B =\frac {\P [\Theta |_{\{X=x\}}\in \Pi _0]\P [\Theta \in \Pi _1]}{\P [\Theta |_{\{X=x\}}\in \Pi _1]\P [\Theta \in \Pi _0]} =\frac {\frac {\P [\Theta \in \Pi _0, X=x]}{\P [X=x]}\P [\Theta \in \Pi _1]}{\frac {\P [\Theta \in \Pi _1, X=x]}{\P [X=x]}\P [\Theta \in \Pi _0]} =\frac {\frac {\P [\Theta \in \Pi _0,X=x]}{\P [\Theta \in \Pi _0]}}{\frac {\P [\Theta \in \Pi _1,X=x]}{\P [\Theta \in \Pi _1]}} =\frac {\P [X|_{\{\Theta \in \Pi _0\}}=x]}{\P [X|_{\{\Theta \in \Pi _1\}}=x]}. \]

We have used equation (1.4) from Lemma 1.4.1 several times here. The continuous case is left for you, in Exercise 7.7. ∎
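The discrete case of Lemma 7.1.3 can be checked numerically. The following is a minimal sketch only: the four-point prior, the binomial model and the data \(x=7\) out of \(n=10\) are hypothetical choices made for illustration and do not come from the text. It computes the Bayes factor from the prior and posterior odds, and separately as the ratio of the likelihoods of the data conditional on each hypothesis.

```python
from math import comb

# Hypothetical discrete Bayesian model (illustration only, not from the text):
# Theta takes four values, and X given Theta = theta is Bin(n, theta).
thetas = [0.2, 0.4, 0.6, 0.8]
prior = [0.1, 0.4, 0.3, 0.2]
n, x = 10, 7

# Hypotheses as sets of indices into `thetas`.
Pi0 = {2, 3}  # H_0: theta in {0.6, 0.8}
Pi1 = {0, 1}  # H_1: theta in {0.2, 0.4}


def binom_pmf(k, n, p):
    """P[Bin(n, p) = k]."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def mass(weights, region):
    """Total weight assigned to the parameter values indexed by `region`."""
    return sum(w for i, w in enumerate(weights) if i in region)


# Joint probabilities P[Theta = theta, X = x], then the posterior by normalising.
joint = [w * binom_pmf(x, n, t) for t, w in zip(thetas, prior)]
evidence = sum(joint)  # P[X = x]
posterior = [j / evidence for j in joint]

prior_odds = mass(prior, Pi0) / mass(prior, Pi1)
posterior_odds = mass(posterior, Pi0) / mass(posterior, Pi1)
bayes_factor = posterior_odds / prior_odds

# Likelihood of the data conditional on each hypothesis:
# P[X = x | Theta in Pi_i] = P[Theta in Pi_i, X = x] / P[Theta in Pi_i].
lik0 = mass(joint, Pi0) / mass(prior, Pi0)
lik1 = mass(joint, Pi1) / mass(prior, Pi1)
likelihood_ratio = lik0 / lik1

print(bayes_factor, likelihood_ratio)
```

The two printed values should agree up to floating point error, as the lemma predicts.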

As a rough guide to interpreting the Bayes factor, the following table\(^1\) is often used:

Bayes factor    Interpretation: evidence in favour of \(H_0\) over \(H_1\)
1 to 3.2        Indecisive / not worth more than a bare mention
3.2 to 10       Substantial
10 to 100       Strong
above 100       Decisive

Note that a high value of \(B\) only says that \(H_0\) should be preferred over \(H_1\). It does not tell us anything objective about how good our model \((M_\theta )\) is; it only tells us that \(X|_{\{\Theta \in \Pi _0\}}\) is a better fit for \(x\) than \(X|_{\{\Theta \in \Pi _1\}}\) is.

Values of the Bayes factor below \(1\) suggest evidence in favour of \(H_1\) over \(H_0\). In such a case we can swap the roles of \(H_0\) and \(H_1\), which corresponds to the Bayes factor changing from \(B\) to \(1/B\), and we can then use the same table to discuss the weight of evidence in favour of \(H_1\) over \(H_0\).
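The table lookup, together with the swap from \(B\) to \(1/B\) when \(B<1\), can be sketched as a small helper. The function name `interpret_bayes_factor` is hypothetical, and the table does not say which category the boundary values \(3.2\), \(10\) and \(100\) belong to; assigning them to the weaker category is an assumption made here.

```python
def interpret_bayes_factor(B):
    """Return (favoured hypothesis, strength of evidence) for a Bayes factor B.

    Assumption: boundary values (3.2, 10, 100) fall in the weaker category.
    """
    if B <= 0:
        raise ValueError("a Bayes factor must be positive")

    # For B < 1, swap the roles of H_0 and H_1, which replaces B by 1/B.
    favoured = "H0" if B >= 1 else "H1"
    b = B if B >= 1 else 1 / B

    if b <= 3.2:
        strength = "indecisive"
    elif b <= 10:
        strength = "substantial"
    elif b <= 100:
        strength = "strong"
    else:
        strength = "decisive"
    return favoured, strength


print(interpret_bayes_factor(4.56))
print(interpret_bayes_factor(0.005))
```

As noted above, the label describes only the relative support for one hypothesis over the other, not the quality of the model family itself.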

\(^1\) From Kass & Raftery (1995).

  • Example 7.1.4 Returning to Example 4.5.3, suppose that we wished to test the hypothesis that the speed camera is, on average, overestimating the speed of cars. According to our posterior, a car travelling at exactly 30mph will have a recorded speed with a \(N(\mu ,\frac {1}{\tau })\) distribution where \((\mu ,\tau )\sim \NGam (30.14, 10.01, 6.00, 1.24)\).

    The speed camera on average overestimates the speed when \(\mu >30\), and underestimates when \(\mu <30\). The probability that \(\mu \) is exactly \(30\) is zero, because our posterior \(\NGam \) is a continuous distribution, so we will simply ignore that possibility. We don’t care about the location of \(\tau \) here, so we simply allow it to take any value \(\tau \in (0,\infty )\). This gives us the hypotheses

    \begin{align*} H_0&:\text { that }(\mu ,\tau )\in \Pi _0=(30,\infty )\times (0,\infty ), \\ H_1&:\text { that }(\mu ,\tau )\in \Pi _1=(-\infty ,30)\times (0,\infty ). \end{align*} We have

    \begin{equation*} \P [(\mu ,\tau )\in \Pi _0]=\int _{30}^\infty \int _0^\infty f_{\NGam (30.14, 10.01, 6.00, 1.24)}(\mu ,\tau )\,d\tau \,d\mu = 0.82, \end{equation*}

    computed numerically and rounded to two decimal places. Note that \(\P [(\mu ,\tau )\in \Pi _1]=1-\P [(\mu ,\tau )\in \Pi _0]\), which gives a posterior odds ratio of

    \[\frac {\P [\Theta |_{\{X=x\}}\in \Pi _0]}{\P [\Theta |_{\{X=x\}}\in \Pi _1]}=\frac {0.82}{1-0.82}=4.56\]

    again rounded to two decimal places. The prior odds ratio, calculated via the same procedure, is exactly \(1\). This occurs because the symmetry of the prior \(\NGam (30,\frac {1}{10^2},1,\frac 15)\) distribution about \(\mu =30\) (this symmetry is visible in the sketch in Example 4.5.3) gives that \(\P [\NGam (30,\frac {1}{10^2},1,\frac 15)\in \Pi _1]=\P [\NGam (30,\frac {1}{10^2},1,\frac 15)\in \Pi _0]=\frac 12\). Hence the Bayes factor for this hypothesis test is

    \begin{equation} B=\frac {4.56}{1.00}=4.56. \end{equation}

    Based on our table above, we have substantial evidence that the speed camera is overestimating speeds.

    A potential problem with our test is that we have not cared about how much the camera is overestimating speeds. The (marginal) mean of \(\mu \) in our posterior distribution is \(30.11\), which is only slightly larger than the true speed \(30\), and this suggests that the error is fairly small. We would need to be careful about communicating the result of our test, to avoid giving the wrong impression.

    Note that we have used a small amount of Bayesian shorthand in this example, by writing \(\mu \) and \(\tau \) for both random variables and samples of these random variables.
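The numerical step in Example 7.1.4 can be sketched as follows. This is a rough check under a stated assumption, not necessarily the method used to produce the figures above: it assumes that \(\NGam (m,p,a,b)\) means \(\tau \sim \Gam (a,b)\) and \(\mu |\tau \sim N(m,\frac {1}{p\tau })\), in which case \((\mu -m)\sqrt {ap/b}\) has a Student-t distribution with \(2a\) degrees of freedom and the double integral reduces to a one-dimensional one. The function names are hypothetical. Because the posterior parameters are only given to two decimal places, the output may differ from the quoted values in the second decimal place.

```python
from math import exp, lgamma, log, pi, sqrt


def student_t_pdf(t, nu):
    """Density of the Student-t distribution with nu degrees of freedom."""
    log_c = lgamma((nu + 1) / 2) - lgamma(nu / 2) - 0.5 * log(nu * pi)
    return exp(log_c - (nu + 1) / 2 * log(1 + t * t / nu))


def student_t_cdf(t, nu, steps=2000):
    """P[T <= t], via Simpson's rule on [0, |t|] and symmetry about 0."""
    a = abs(t)
    if a == 0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = student_t_pdf(0.0, nu) + student_t_pdf(a, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * student_t_pdf(i * h, nu)
    area = total * h / 3
    return 0.5 + area if t > 0 else 0.5 - area


def prob_mu_greater(threshold, m, p, a, b):
    """P[mu > threshold] for (mu, tau) ~ NGamma(m, p, a, b).

    Assumes the parameterisation stated in the text before this block.
    """
    scale = sqrt(b / (a * p))
    return 1 - student_t_cdf((threshold - m) / scale, 2 * a)


# Posterior and prior parameters as quoted in Example 7.1.4.
post_p0 = prob_mu_greater(30, 30.14, 10.01, 6.00, 1.24)
prior_p0 = prob_mu_greater(30, 30, 1 / 10**2, 1, 1 / 5)

posterior_odds = post_p0 / (1 - post_p0)
prior_odds = prior_p0 / (1 - prior_p0)
bayes_factor = posterior_odds / prior_odds

print(round(post_p0, 2), round(posterior_odds, 2), round(bayes_factor, 2))
```

Reducing to the marginal distribution of \(\mu \) avoids a two-dimensional numerical integral; integrating the joint density over \(\Pi _0\) directly should give the same probability.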