last updated: October 24, 2024

Bayesian Statistics


4.5 The normal distribution with unknown mean and variance

We’ve considered the normal distribution with fixed mean but unknown variance in Example 4.2.4, and then more generally in Lemma 4.2.2. The situation of fixed variance and unknown mean is treated in Exercise 4.4. We’ll deal here with the case where both the mean and variance are unknown parameters.

In the formulae obtained in Lemma 4.2.2, both quantities relating to the variance (\(\sigma ^2\) and \(s^2\)) appeared only as \(\frac {1}{\sigma ^2}\) and \(\frac {1}{s^2}\). This suggests that we would obtain neater formulae if we parameterized the variance part as \(\tau =\frac {1}{\sigma ^2}\). The parameter \(\tau \) is known as the precision. We will use this form in the next lemma, and also in Exercise 4.4.

The conjugate prior in this case is complicated. It is the Normal-Gamma distribution, written \(\NGam (m,p,a,b)\) with p.d.f.

\begin{align} f_{\NGam (m,p,a,b)}(\mu ,\tau ) \;&=\;f_{\Normal (m,\frac {1}{p\tau })}(\mu )\, f_{\Gam (a,b)}(\tau ) \label {eq:NGamma_pre} \\ &=\; \sqrt {\frac {p\tau }{2\pi }}\exp \l (-\frac {p\tau }{2}(\mu -m)^2\r )\;\; \frac {b^a}{\Gamma (a)}\tau ^{a-1}e^{-b\tau } \notag \\ &\;\propto \tau ^{a-\frac 12}\exp \l (-\frac {p\tau }{2}(\mu -m)^2-b\tau \r ). \label {eq:NGamma} \end{align} You can find this p.d.f. on the reference sheets in Appendix A. Note that it is a two-dimensional distribution, which we will use to construct random versions of the parameters \(\mu \) and \(\tau \) in \(\Normal (\mu ,\frac {1}{\tau })\). The restrictions on the \(\NGam \) parameters are that \(m\in \R \) and \(p,a,b>0\). The range is \(\mu \in \R \) and \(\tau >0\). Here’s a contour plot of the p.d.f. of \(\NGam (2,1,3,2)\) to give some idea of what is going on here:

(image)
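If you want to reproduce a contour plot like this yourself, here is a minimal sketch using numpy, scipy and matplotlib. The function name ngamma_pdf and the grid ranges are our own illustrative choices; the density itself is just (4.8).

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm, gamma

def ngamma_pdf(mu, tau, m, p, a, b):
    """Normal-Gamma density from (4.8): Normal(m, 1/(p*tau)) times Gamma(a, b)."""
    # scipy's gamma is parameterised by shape and scale, so rate b becomes scale 1/b
    return norm.pdf(mu, loc=m, scale=1.0 / np.sqrt(p * tau)) * gamma.pdf(tau, a, scale=1.0 / b)

# grid over (mu, tau) for the NGam(2, 1, 3, 2) example above
mu_grid, tau_grid = np.meshgrid(np.linspace(-2.0, 6.0, 200), np.linspace(0.01, 5.0, 200))
density = ngamma_pdf(mu_grid, tau_grid, m=2, p=1, a=3, b=2)

plt.contour(mu_grid, tau_grid, density, levels=10)
plt.xlabel(r"$\mu$")
plt.ylabel(r"$\tau$")
plt.show()
```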

Let us note a couple of facts about the \(\NGam (m,p,a,b)\) distribution before we proceed further. If \((U,T)\sim \NGam (m,p,a,b)\) then:

  • The marginal distribution of \(T\) is \(\Gam (a,b)\).

    This follows easily from (4.8). Integrating out \(\mu \) removes the term \(f_{\Normal (m,\frac {1}{p\tau })}(\mu )\), which is a p.d.f. in \(\mu \) and so integrates to \(1\), leaving only the term \(f_{\Gam (a,b)}(\tau )\).

  • The conditional distribution of \(U|_{\{T=\tau \}}\) is \(\Normal (m,\frac {1}{p\tau })\).

    This also follows from (4.8), by applying Lemma 1.6.1. We already know that \(f_T(\tau )=f_{\Gam (a,b)}(\tau )\), so

    \[f_{U|_{\{T=\tau \}}}(\mu )=\frac {f_{\Normal (m,\frac {1}{p\tau })}(\mu )\, f_{T}(\tau )}{f_{T}(\tau )}=f_{\Normal (m,\frac {1}{p\tau })}(\mu ).\]

Note that we are using \(U\) as a capital \(\mu \) and \(T\) as a capital \(\tau \), to preserve our usual relationship between random variables and the arguments of their probability density functions. These two facts won’t be used in our proof of conjugacy, but hopefully they help explain the formula (4.8) and the picture below it.
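Incidentally, these two facts give a simple way to simulate from \(\NGam (m,p,a,b)\): first draw \(T\sim \Gam (a,b)\), then draw \(U\) from \(\Normal (m,\frac {1}{pT})\). Here is a minimal sketch using numpy; the function name sample_ngamma and the parameter values are ours, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_ngamma(m, p, a, b, size=1):
    """Draw (U, T) from NGam(m, p, a, b) via its marginal and conditional distributions."""
    # marginal: T ~ Gamma(a, b); numpy's gamma takes shape and scale = 1/rate
    tau = rng.gamma(shape=a, scale=1.0 / b, size=size)
    # conditional: U given T = tau is Normal(m, 1/(p*tau))
    mu = rng.normal(loc=m, scale=1.0 / np.sqrt(p * tau))
    return mu, tau

mu, tau = sample_ngamma(m=2, p=1, a=3, b=2, size=10_000)
print(tau.mean())  # should be close to a/b = 1.5, the mean of Gamma(3, 2)
```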

Our next goal is to state the conjugacy between \(\NGam \) and the Normal distribution. We will need the sample-mean-variance identity, which states that for all \(x\in \R ^n\)

\begin{equation} \label {eq:sample-mean-variance} \sum _{i=1}^n(x_i-\mu )^2=ns^2 + n(\bar {x}-\mu )^2 \end{equation}

where \(\bar {x}=\frac {1}{n}\sum _{1}^n x_i\) and \(s^2=\frac {1}{n}\sum _{1}^n(x_i-\bar {x})^2\).
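It is easy to check (4.10) numerically; in the sketch below, the data vector and the value of \(\mu \) are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=20)        # any data vector will do
mu = 0.7                       # any fixed value of mu
n = len(x)

xbar = x.mean()
s2 = ((x - xbar) ** 2).mean()  # note: divides by n, matching the definition above

lhs = ((x - mu) ** 2).sum()
rhs = n * s2 + n * (xbar - mu) ** 2
print(np.isclose(lhs, rhs))    # True, as (4.10) claims
```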

  • Remark 4.5.1 \(\offsyl \) To deduce (4.10), let \(Z\) be a random variable with the uniform distribution on \(\{x_1,\ldots ,x_n\}\). Note that \(\E [Z]=\bar {x}\) and \(\var (Z)=s^2\). The identity follows from the fact that \(\var (Z)=\var (Z-\mu )=\E [(Z-\mu )^2]-\E [Z-\mu ]^2\): this gives \(s^2=\frac {1}{n}\sum _1^n(x_i-\mu )^2-(\bar {x}-\mu )^2\), and multiplying through by \(n\) and rearranging gives (4.10).

  • Lemma 4.5.2 (Normal-NGamma conjugate pair) Let \((X,\Theta )\) be a continuous Bayesian model with model family \(M_{\mu ,\tau }=\Normal (\mu ,\frac {1}{\tau })^{\otimes n}\), with parameters \(\mu \in \R \) and \(\tau >0\). Suppose that the prior is \(\Theta =(U,T)\sim \NGam (m,p,a,b)\) and let \(x\in \R ^n\). Then \(\Theta |_{\{X=x\}}\sim \NGam \l (m^*,p^*,a^*,b^*\r )\) where

    \[\begin {alignedat}{2} m^* &= \frac {n\bar {x}+mp}{n+p} \qquad \qquad && p^* = n+p \\ a^* &= a + \frac {n}{2} && b^* = b + \frac {n}{2}\l (s^2+\frac {p}{n+p}\l (\bar {x}-m\r )^2\r ), \end {alignedat}\]

    where \(\bar {x}=\frac {1}{n}\sum _1^n x_i\) and \(s^2=\frac 1n\sum _1^n (x_i-\bar {x})^2\).

Proof: \(\offsyl \) From Theorem 3.1.2 we have that for all \(\mu \in \R \) and \(\tau >0\),

\begin{align*} f_{(U,T)|_{\{X=x\}}}(\mu ,\tau ) &\propto \l (\prod _{i=1}^n\sqrt {\frac {\tau }{2\pi }}\exp \l (-\frac {\tau (\mu -x_i)^2}{2}\r )\r ) \tau ^{a-\frac 12}\exp \l (-\frac {p\tau }{2}(\mu -m)^2-b\tau \r ) \\ &\propto \tau ^{\frac {n}{2}+a-\frac 12}\exp \l [-\frac {\tau }{2}\l (\sum _{i=1}^n(\mu -x_i)^2 + p(\mu -m)^2+2b\r )\r ] \\ &\propto \tau ^{\frac {n}{2}+a-\frac 12}\exp \l [-\frac {\tau }{2}\l (ns^2+n(\mu -\bar {x})^2 + p(\mu -m)^2+2b\r )\r ] \\ &\propto \tau ^{\frac {n}{2}+a-\frac 12}\exp \l [-\frac {\tau }{2}\mc {Q}(\mu )\r ] \end{align*} To deduce the third line we have used (4.10). We have

\[\mc {Q}(\mu )= \mu ^2 \stackrel {A}{\overbrace {\l (n+p\r )}} -2\mu \stackrel {B}{\overbrace {\l (n\bar {x} +mp\r )}} + \stackrel {C}{\overbrace {\l (ns^2 + n\bar {x}^2+pm^2+2b\r )}}.\]

Completing the square (with the help of the reference sheet) we obtain \(\mc {Q}(\mu )=A(\mu -\frac {B}{A})^2+C-\frac {B^2}{A}\) and hence

\begin{align*} f_{(U,T)|_{\{X=x\}}}(\mu ,\tau ) &\propto \tau ^{\frac {n}{2}+a-\frac 12}\exp \l [-\frac {\tau }{2}A\l (\mu -\frac {B}{A}\r )^2-\tau \frac {1}{2}\l (C-\frac {B^2}{A}\r )\r ]. \end{align*} The right hand side is now in the form of (4.9), so we can extract the posterior \(\NGam \) parameters. From the exponent of \(\tau \) we have \(a^*=a+\frac {n}{2}\). From the term \((\mu -\ldots )^2\) we have \(m^*=\frac {B}{A}=\frac {n\bar {x}+mp}{n+p}\) and from the coefficient of this term we have \(p^*=A=n+p\). This leaves only

\begin{align*} b^* &=\frac {1}{2}\l (C-\frac {B^2}{A}\r ) \\ &=\frac {1}{2}\l [ns^2 + n\bar {x}^2+pm^2+2b-\frac {1}{n+p}\l (n\bar {x}+mp\r )^2\r ] \\ &=b+\frac {ns^2}{2}+\frac {1}{2(n+p)}\l [(n+p)n\bar {x}^2+(n+p)pm^2-n^2\bar {x}^2-2mpn\bar {x}-m^2p^2\r ] \\ &=b+\frac {ns^2}{2}+\frac {1}{2(n+p)}\l [pn\bar {x}^2+npm^2-2mpn\bar {x}\r ] \\ &=b+\frac {ns^2}{2}+\frac {np}{2(n+p)}(\bar {x}-m)^2, \end{align*} which matches the formula for \(b^*\) in the statement of the lemma and completes the proof.   ∎
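The update rule in Lemma 4.5.2 is straightforward to implement. Here is a minimal sketch; the function name ngamma_update is our own choice. Applied to the ten datapoints from Example 4.2.4, it would reproduce the posterior parameters quoted in Example 4.5.3 below.

```python
import numpy as np

def ngamma_update(x, m, p, a, b):
    """Posterior NGam parameters from Lemma 4.5.2, given data x and an NGam(m, p, a, b) prior."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xbar = x.mean()
    s2 = ((x - xbar) ** 2).mean()  # divides by n, as in the lemma
    m_star = (n * xbar + m * p) / (n + p)
    p_star = n + p
    a_star = a + n / 2
    b_star = b + (n / 2) * (s2 + (p / (n + p)) * (xbar - m) ** 2)
    return m_star, p_star, a_star, b_star
```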

  • Example 4.5.3 Coming back to Example 4.2.4, regarding testing a speed camera, we can now do a Bayesian update where both the mean and variance are unknown parameters.

    (image) (image)

    On the left we’ve taken an \(\NGam (30,\frac {1}{10^2},1,\frac 15)\) prior. Note that \(p=\frac {1}{10^2}\) corresponds to a standard deviation of \(10\), so our prior is well spread out about its mean \(m=30\) on the \(\mu \)-axis. The values chosen for \(a\) and \(b\) ensure that the prior is also well spread out on the \(\tau \)-axis. On the right we have the resulting \(\NGam (30.14, 10.01, 6.00, 1.24)\) posterior. These posterior parameters were computed using Lemma 4.5.2, with the same ten datapoints as in Example 4.2.4. As we would expect from Example 4.2.4, the mass of the posterior has concentrated close to \(\mu \approx 30\). The parameter \(\tau \) has settled on a fairly wide range of fairly large values, but remember that \(\tau =\frac {1}{\sigma ^2}\), so the corresponding range of likely values for the standard deviation \(\sigma \) is in fact a small range of small numbers.

    Comparing the predictive densities gives:

    (image)

    Our new predictive distribution is still broadly similar to the manufacturer’s \(\Normal (30,0.16)\) claim, but it now looks more spread out, and the mean remains slightly higher than the manufacturer’s claim. A point that might worry us is that our new predictive p.d.f. is not close to zero at 31mph, whereas the manufacturer claims that it should be; our data suggests that the camera is more likely to overestimate speeds than the manufacturer has claimed. We can’t reasonably say more without statistical testing, which we’ll study in Chapter 7.

    Think for a moment about how much numerical work has been done to produce a graph of the predictive pdf here. According to (3.5), for each point \(x'\) on the graph, to obtain \(f_{\text {pred}}(x')\) we need to integrate \(f_{\Normal (\mu ,\frac {1}{\tau })}(x')f_{\NGam (30.14, 10.01, 6.00, 1.24)}(\mu ,\tau )\) with respect to both \(\mu \) and \(\tau \), over a region of \(\R ^2\) that includes the spike visible on the posterior density. Numerical integration over \(\R ^2\) is expensive – doing it once is not very noticeable, but doing it repeatedly usually is. The graph was made using \(30\) \(x\)-axis values and took \(255\) seconds to create, using scipy.integrate.dblquad for the integration. If we had three unknown parameters then we would have to integrate in \(\R ^3\), which is even worse. In fact, problems of this type with several parameters quickly become computationally infeasible via direct numerical integration. They need a more subtle numerical technique, which we’ll introduce in Chapter 8.
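To make the cost concrete, here is a sketch of how a single value of the predictive p.d.f. might be computed with scipy.integrate.dblquad, using the posterior parameters from Example 4.5.3. The finite integration limits are an illustrative truncation of \(\R ^2\) chosen to cover essentially all of the posterior mass; they are not necessarily the exact region used to produce the graph above.

```python
import numpy as np
from scipy.stats import norm, gamma
from scipy.integrate import dblquad

# posterior parameters from Example 4.5.3
m, p, a, b = 30.14, 10.01, 6.00, 1.24

def posterior_pdf(mu, tau):
    """The NGam(m, p, a, b) posterior density, as in (4.8)."""
    return norm.pdf(mu, loc=m, scale=1.0 / np.sqrt(p * tau)) * gamma.pdf(tau, a, scale=1.0 / b)

def predictive_pdf(x_new):
    """f_pred(x') as in (3.5): integrate f_N(mu, 1/tau)(x') against the posterior over (mu, tau)."""
    integrand = lambda tau, mu: norm.pdf(x_new, loc=mu, scale=1.0 / np.sqrt(tau)) * posterior_pdf(mu, tau)
    # outer variable mu runs over [25, 35]; inner variable tau runs over [1e-6, 50]
    value, _ = dblquad(integrand, 25.0, 35.0, 1e-6, 50.0)
    return value

print(predictive_pdf(31.0))  # one point of the predictive p.d.f., e.g. at 31 mph
```

Repeating this call for every point on the \(x\)-axis is what makes producing the whole graph slow.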