Bayesian Statistics
Chapter 3 Continuous Bayesian models
In this chapter we expand our results from Chapter 2 to also cover continuous data. This has the consequence that we will make more use of probability density functions than in Chapter 2, which causes some of the formulae to change and/or simplify. The key ideas do not change.
3.1 Continuous Bayesian models
In this section we construct a version of the model from Section 2.2 that is suitable for continuous data. We use a continuous family of random variables \((M_\theta )\) in place of the discrete family used in Section 2.2. It behaves in much the same way, except that where we previously used the p.m.f. of the discrete random variable \(M_\theta \), we now use the p.d.f. of the continuous random variable \(M_\theta \).
We need two ingredients to construct the model:
1. Let \((M_\theta )_{\theta \in \Pi }\) be a family of continuous random variables with range \(R\sw \R ^n\) and parameter space \(\Pi \sw \R ^d\). Write \(f_{M_\theta }:\R ^n\to [0,\infty )\) for the p.d.f. of \(M_\theta \).
2. Let \(f_\Theta :\R ^d\to [0,\infty )\) be a probability density function with range \(\Pi \); that is, \(f_\Theta (\theta )>0\) exactly when \(\theta \in \Pi \).
We use the same terminology as in Section 2.2: we refer to the family \((M_\theta )\) as the model family, and we will also use this term for \((f_{M_\theta })\). We say that \(\Pi \) is the parameter space of the model and \(R\) is the range of the model. The random variable \(\Theta \), with p.d.f. \(f_\Theta \), is known as the prior of the model.
Definition 3.1.1 The continuous Bayesian model associated to \((M_\theta )\) and \(f_\Theta \) is the random variable \((X,\Theta )\in \R ^n\times \R ^d\) with distribution given by
\(\seteqnumber{0}{3.}{0}\)\begin{equation} \label {eq:bayes_continuous_general} \P [X\in A,\Theta \in B] =\int _B\P [M_\theta \in A]f_\Theta (\theta )\,d\theta =\int _B\int _A f_{M_\theta }(x) f_\Theta (\theta )\,dx\,d\theta . \end{equation}
The random variable \((X,\Theta )\) is continuous, with p.d.f. \(f_{M_\theta }(x) f_\Theta (\theta )\).
The symbols \(\theta ,\Theta ,x,X\) have the same interpretations as listed in Section 2.2, and we won’t repeat that list here. A warning: note that the p.d.f. \(f_{M_\theta }(x) f_\Theta (\theta )\) in (3.1) is not of the factorized form \(g(x)h(\theta )\), because \(f_{M_\theta }(x)\) depends on both \(\theta \) and \(x\). Just as in Section 2.2, in general \(X\) and \(\Theta \) are dependent random variables.
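This dependence can be seen numerically. The following sketch uses a hypothetical conjugate choice of model (not one from the text), \(M_\theta \sim N(\theta ,1)\) with prior \(\Theta \sim N(0,1)\), and approximates the double integral in (3.1) by a midpoint rule on a grid.

```python
import numpy as np

# Hypothetical conjugate model (an illustration, not from the text):
# M_theta ~ N(theta, 1) with prior Theta ~ N(0, 1).
def norm_pdf(u, mean, var):
    return np.exp(-(u - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

d = 0.02
grid = np.arange(-8.0, 8.0, d) + d / 2          # cell midpoints
X, T = np.meshgrid(grid, grid)                  # (x, theta) grid

# Joint p.d.f. f_{M_theta}(x) f_Theta(theta) from Definition 3.1.1
joint = norm_pdf(X, T, 1.0) * norm_pdf(T, 0.0, 1.0)

# P[X >= 0, Theta >= 0] via the double integral in (3.1)
p = np.sum(joint[(X >= 0) & (T >= 0)]) * d * d
print(round(p, 3))   # 0.375 = 3/8; independence would give 1/2 * 1/2 = 1/4
```

Here \(\P [X\geq 0]=\P [\Theta \geq 0]=\frac 12\) by symmetry, so if \(X\) and \(\Theta \) were independent the printed probability would be \(\frac 14\), not \(\frac 38\).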
As in the discrete case, from Section 1.7 and Lemma 1.7.1 we have that \(\Theta \) has p.d.f. \(f_\Theta \), and that \(X|_{\{\Theta =\theta \}}\eqd M_\theta \) whenever \(f_\Theta \) is continuous at \(\theta \). To find the (marginal) p.d.f. of the data \(X\) we must instead integrate out the \(\theta \) variable from the joint p.d.f., giving
\(\seteqnumber{0}{3.}{1}\)\begin{equation} \label {eq:bayes_continuous_sampling_pdf} f_X(x)=\int _{\R ^d}f_{M_\theta }(x)f_\Theta (\theta )\,d\theta . \end{equation}
This is known as the sampling p.d.f. or sampling density of the model, and the distribution of \(X\) is the sampling distribution.
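The integral in (3.2) can be checked numerically. The sketch below again uses a hypothetical conjugate choice (not from the text), \(M_\theta \sim N(\theta ,1)\) with prior \(\Theta \sim N(0,1)\); for this choice the sampling distribution works out to be \(N(0,2)\), which gives us something to compare against.

```python
import numpy as np

# Hypothetical conjugate model (an illustration, not from the text):
# M_theta ~ N(theta, 1) with prior Theta ~ N(0, 1).
def norm_pdf(u, mean, var):
    return np.exp(-(u - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

d = 0.001
theta = np.arange(-10.0, 10.0, d)       # grid standing in for the theta integral

def sampling_pdf(x):
    # (3.2): f_X(x) = int f_{M_theta}(x) f_Theta(theta) dtheta
    return np.sum(norm_pdf(x, theta, 1.0) * norm_pdf(theta, 0.0, 1.0)) * d

# For this choice the sampling distribution is N(0, 2), so compare:
print(sampling_pdf(0.0), norm_pdf(0.0, 0.0, 2.0))   # both approx 0.2821
```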
The posterior of the model is \(\Theta |_{\{X=x\}}\), which is defined using Lemma 1.6.1. For the same reasons as in the discrete case, we can hope that using \(\Theta |_{\{X=x\}}\) in place of \(\Theta \) will result in an improved model. Let’s work out the distribution of the posterior (in general) before we do an example with some data.
Theorem 3.1.2 (Bayesian updates for continuous data) Let \((X,\Theta )\) be a continuous Bayesian model, with parameter space \(\Pi \), prior p.d.f. \(f_\Theta \), model family \((M_\theta )\) and range \(R\). Write \(f_{M_\theta }\) for the p.d.f. of \(M_\theta \). Suppose that \(x\in R\). Then the posterior \(\Theta |_{\{X=x\}}\) is a continuous random variable and has p.d.f.
\(\seteqnumber{0}{3.}{2}\)\begin{equation} \label {eq:bayes_rule_continuous_data} f_{\Theta |_{\{X=x\}}}(\theta )= \frac {1}{Z}f_{M_\theta }(x)f_{\Theta }(\theta ) \end{equation}
where \(Z=\int _\Pi f_{M_\theta }(x)f_{\Theta }(\theta )\,d\theta \). The range of \(\Theta |_{\{X=x\}}\) is \(\Pi \), the same range as for \(\Theta \).
Proof: The proof is similar to our proof of Theorem 2.4.1, except that we use Lemma 1.6.1 (in place of Lemma 1.5.1) to find the p.d.f. of \(\Theta |_{\{X=x\}}\).
A difficulty is that Lemma 1.6.1 requires continuity conditions, which are not always satisfied in the situation here (although they often are). For that reason, we will only give a proof covering the special case where both \(f_{M_\theta }(x)\) and \(f_\Theta (\theta )\) are continuous functions. In that case, from Lemma 1.6.1 we have that
\(\seteqnumber{0}{3.}{3}\)\begin{align} f_{\Theta |_{\{X=x\}}}(\theta ) =\frac {f_{(X,\Theta )}(x,\theta )}{f_X(x)} =\frac {f_{M_{\theta }}(x)f_{\Theta }(\theta )}{f_X(x)}. \label {eq:bayes_thm_continuous_data_1} \end{align} We have used the p.d.f. from Definition 3.1.1 in the numerator above. For the denominator, we already found \(f_X(x)\) in (3.2), which gives \(f_X(x)=Z\). This gives (3.3). The definition of a continuous Bayesian model requires that the prior \(\Theta \) has range \(\Pi \), so \(f_\Theta (\theta )>0\) if and only if \(\theta \in \Pi \). Since \(x\in R\) we have \(f_{M_\theta }(x)>0\) for all \(\theta \in \Pi \). Hence the range of \(\Theta |_{\{X=x\}}\) is \(\Pi \). ∎
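The update in Theorem 3.1.2 can be sketched numerically. The model below is a hypothetical conjugate choice (not from the text): with \(M_\theta \sim N(\theta ,1)\) and prior \(\Theta \sim N(0,1)\), a standard conjugacy calculation gives the posterior \(N(x/2,\frac 12)\), which we can check against (3.3) evaluated on a grid.

```python
import numpy as np

# Hypothetical conjugate model (an illustration, not from the text):
# M_theta ~ N(theta, 1) with prior Theta ~ N(0, 1); observed data x.
def norm_pdf(u, mean, var):
    return np.exp(-(u - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

d = 0.001
theta = np.arange(-10.0, 10.0, d)
x = 1.7

# Numerator of (3.3): f_{M_theta}(x) f_Theta(theta), then normalize by Z
unnorm = norm_pdf(x, theta, 1.0) * norm_pdf(theta, 0.0, 1.0)
Z = np.sum(unnorm) * d
posterior = unnorm / Z                  # posterior p.d.f. on the grid

# Standard conjugacy calculation says the posterior is N(x/2, 1/2); compare:
print(np.max(np.abs(posterior - norm_pdf(theta, x / 2, 0.5))))   # very close to 0
```

Note that \(Z\) here is exactly the normalizing constant from the statement of Theorem 3.1.2, approximated by a rectangle rule.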
Remark 3.1.3 \(\offsyl \) In fact equation (3.3) only holds for almost all \(x\in R\), but it works for all \(x\) when we have ‘enough continuity’ in some suitable sense. This is generally sufficient for practical purposes and we won’t worry about this issue within these notes. To see a natural case where (3.3) fails for a particular choice of \(x\), take \(\Theta \sim \Gamma (\frac 14,1)\) and \(M_\theta \sim N(0,\theta )\), with the data \(x=0\). Then according to (3.3) we have \(f_{\Theta |_{\{X=0\}}}(\theta ) =\frac {1}{Z}\frac {1}{\sqrt {2\pi \theta }}e^{-0^2/2\theta }\frac {1^{1/4}}{\Gamma (1/4)}\theta ^{1/4-1}e^{-\theta } =\frac {1}{Z'}\theta ^{-5/4}e^{-\theta }\) for \(\theta >0\). This does not define a p.d.f. since \(\int _0^\infty \theta ^{-5/4}e^{-\theta }\,d\theta =\infty \). The problem stems from the fact that \((x,\theta )\mapsto f_{M_\theta }(x)\) is discontinuous at \((0,0)\), which causes the continuity conditions mentioned in the above proof to fail. In this case, for any \(x\neq 0\) we do obtain a posterior p.d.f. that integrates to one.
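The divergence in Remark 3.1.3 can also be seen numerically: near \(\theta =0\) we have \(\theta ^{-5/4}e^{-\theta }\approx \theta ^{-5/4}\), so \(\int _\epsilon ^1\theta ^{-5/4}e^{-\theta }\,d\theta \) grows roughly like \(4\epsilon ^{-1/4}\) as \(\epsilon \to 0\). The sketch below approximates this integral with a trapezoid rule on a log-spaced grid (log spacing is an implementation choice, made so the grid resolves the singularity at \(0\)).

```python
import numpy as np

# Remark 3.1.3: the would-be normalizing constant
# Z = int_0^inf theta^(-5/4) e^(-theta) dtheta is infinite. The divergence
# lives near 0, so approximate the integral over [eps, 1] and let eps shrink.
def tail_integral(eps, n=4000):
    t = np.geomspace(eps, 1.0, n)       # log spacing resolves the singularity
    f = t ** (-1.25) * np.exp(-t)
    return np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(t))   # trapezoid rule

for eps in [1e-2, 1e-4, 1e-6]:
    # grows without bound, roughly like 4 * eps**(-1/4)
    print(eps, tail_integral(eps))
```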
Equation (3.3) is a ‘continuous data’ analogue of equation (2.11). Both equations are often known as (versions of) Bayes’ rule, for the historical reasons that we discussed in Section 2.3. In both cases \(Z\) is a normalizing constant, ensuring that the p.d.f. of \(\Theta |_{\{X=x\}}\) integrates to \(1\). The only difference between (2.11) and (3.3) is that:
• (2.11) features the p.m.f. \(\P [M_\theta =x]\) of the (discrete) model family;
• (3.3) features the p.d.f. \(f_{M_\theta }(x)\) of the (continuous) model family.
Lastly, once we have obtained \(\Theta |_{\{X=x\}}\) we construct a new Bayesian model with \(\Theta |_{\{X=x\}}\) in place of \(\Theta \). As before:
Definition 3.1.4 The predictive distribution is given by replacing the prior \(\Theta \) with the posterior \(\Theta |_{\{X=x\}}\) inside the sampling distribution. Hence, from (3.2), the predictive distribution is that of a continuous random variable with p.d.f.
\(\seteqnumber{0}{3.}{4}\)\begin{equation} \label {eq:bayes_continuous_predictive_pdf} f_{X'}(x')=\int _{\R ^d}f_{M_{\theta }}(x')f_{\Theta |_{\{X=x\}}}(\theta )\,d\theta . \end{equation}
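The predictive density (3.5) can be computed by the same numerical recipe as the sampling density, just with the posterior in place of the prior. Continuing the hypothetical conjugate example (not from the text), \(M_\theta \sim N(\theta ,1)\) with prior \(\Theta \sim N(0,1)\): the posterior given \(X=x\) is \(N(x/2,\frac 12)\), and the predictive distribution works out to be \(N(x/2,\frac 32)\).

```python
import numpy as np

# Hypothetical conjugate model (an illustration, not from the text):
# M_theta ~ N(theta, 1), prior Theta ~ N(0, 1), observed data x.
# The posterior is then N(x/2, 1/2) and the predictive should be N(x/2, 3/2).
def norm_pdf(u, mean, var):
    return np.exp(-(u - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

d = 0.001
theta = np.arange(-10.0, 10.0, d)
x = 1.7

unnorm = norm_pdf(x, theta, 1.0) * norm_pdf(theta, 0.0, 1.0)
posterior = unnorm / (np.sum(unnorm) * d)       # p.d.f. of Theta given X = x

def predictive_pdf(x_new):
    # (3.5): integrate f_{M_theta}(x') against the posterior instead of the prior
    return np.sum(norm_pdf(x_new, theta, 1.0) * posterior) * d

print(predictive_pdf(0.0), norm_pdf(0.0, x / 2, 1.5))   # approximately equal
```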