Bayesian Statistics

$\newcommand{\footnotename}{footnote}$ $\def \LWRfootnote {1}$ $\newcommand {\footnote }[2][\LWRfootnote ]{{}^{\mathrm {#1}}}$ $\newcommand {\footnotemark }[1][\LWRfootnote ]{{}^{\mathrm {#1}}}$ $\let \LWRorighspace \hspace $ $\renewcommand {\hspace }{\ifstar \LWRorighspace \LWRorighspace }$ $\newcommand {\mathnormal }[1]{{#1}}$ $\newcommand \ensuremath [1]{#1}$ $\newcommand {\LWRframebox }[2][]{\fbox {#2}} \newcommand {\framebox }[1][]{\LWRframebox } $ $\newcommand {\setlength }[2]{}$ $\newcommand {\addtolength }[2]{}$ $\newcommand {\setcounter }[2]{}$ $\newcommand {\addtocounter }[2]{}$ $\newcommand {\arabic }[1]{}$ $\newcommand {\number }[1]{}$ $\newcommand {\noalign }[1]{\text {#1}\notag \\}$ $\newcommand {\cline }[1]{}$ $\newcommand {\directlua }[1]{\text {(directlua)}}$ $\newcommand {\luatexdirectlua }[1]{\text {(directlua)}}$ $\newcommand {\protect }{}$ $\def \LWRabsorbnumber #1 {}$ $\def \LWRabsorbquotenumber "#1 {}$ $\newcommand {\LWRabsorboption }[1][]{}$ $\newcommand {\LWRabsorbtwooptions }[1][]{\LWRabsorboption }$ $\def \mathchar {\ifnextchar "\LWRabsorbquotenumber \LWRabsorbnumber }$ $\def \mathcode #1={\mathchar }$ $\let \delcode \mathcode $ $\let \delimiter \mathchar $ $\def \oe {\unicode {x0153}}$ $\def \OE {\unicode {x0152}}$ $\def \ae {\unicode {x00E6}}$ $\def \AE {\unicode {x00C6}}$ $\def \aa {\unicode {x00E5}}$ $\def \AA {\unicode {x00C5}}$ $\def \o {\unicode {x00F8}}$ $\def \O {\unicode {x00D8}}$ $\def \l {\unicode {x0142}}$ $\def \L {\unicode {x0141}}$ $\def \ss {\unicode {x00DF}}$ $\def \SS {\unicode {x1E9E}}$ $\def \dag {\unicode {x2020}}$ $\def \ddag {\unicode {x2021}}$ $\def \P {\unicode {x00B6}}$ $\def \copyright {\unicode {x00A9}}$ $\def \pounds {\unicode {x00A3}}$ $\let \LWRref \ref $ $\renewcommand {\ref }{\ifstar \LWRref \LWRref }$ $ \newcommand {\multicolumn }[3]{#3}$ $\require {textcomp}$ $\newcommand {\intertext }[1]{\text {#1}\notag \\}$ $\let \Hat \hat $ $\let \Check \check $ $\let \Tilde \tilde $ $\let \Acute \acute $ $\let 
\Grave \grave $ $\let \Dot \dot $ $\let \Ddot \ddot $ $\let \Breve \breve $ $\let \Bar \bar $ $\let \Vec \vec $ $\require {colortbl}$ $\let \LWRorigcolumncolor \columncolor $ $\renewcommand {\columncolor }[2][named]{\LWRorigcolumncolor [#1]{#2}\LWRabsorbtwooptions }$ $\let \LWRorigrowcolor \rowcolor $ $\renewcommand {\rowcolor }[2][named]{\LWRorigrowcolor [#1]{#2}\LWRabsorbtwooptions }$ $\let \LWRorigcellcolor \cellcolor $ $\renewcommand {\cellcolor }[2][named]{\LWRorigcellcolor [#1]{#2}\LWRabsorbtwooptions }$ $\require {mathtools}$ $\newenvironment {crampedsubarray}[1]{}{}$ $\newcommand {\smashoperator }[2][]{#2\limits }$ $\newcommand {\SwapAboveDisplaySkip }{}$ $\newcommand {\LaTeXunderbrace }[1]{\underbrace {#1}}$ $\newcommand {\LaTeXoverbrace }[1]{\overbrace {#1}}$ $\newcommand {\LWRmultlined }[1][]{\begin {multline*}}$ $\newenvironment {multlined}[1][]{\LWRmultlined }{\end {multline*}}$ $\let \LWRorigshoveleft \shoveleft $ $\renewcommand {\shoveleft }[1][]{\LWRorigshoveleft }$ $\let \LWRorigshoveright \shoveright $ $\renewcommand {\shoveright }[1][]{\LWRorigshoveright }$ $\newcommand {\shortintertext }[1]{\text {#1}\notag \\}$ $\newcommand {\vcentcolon }{\mathrel {\unicode {x2236}}}$ $\renewcommand {\intertext }[2][]{\text {#2}\notag \\}$ $\newenvironment {fleqn}[1][]{}{}$ $\newenvironment {ceqn}{}{}$ $\newenvironment {darray}[2][c]{\begin {array}[#1]{#2}}{\end {array}}$ $\newcommand {\dmulticolumn }[3]{#3}$ $\newcommand {\LWRnrnostar }[1][0.5ex]{\\[#1]}$ $\newcommand {\nr }{\ifstar \LWRnrnostar \LWRnrnostar }$ $\newcommand {\mrel }[1]{\begin {aligned}#1\end {aligned}}$ $\newcommand {\underrel }[2]{\underset {#2}{#1}}$ $\newcommand {\medmath }[1]{#1}$ $\newcommand {\medop }[1]{#1}$ $\newcommand {\medint }[1]{#1}$ $\newcommand {\medintcorr }[1]{#1}$ $\newcommand {\mfrac }[2]{\frac {#1}{#2}}$ $\newcommand {\mbinom }[2]{\binom {#1}{#2}}$ $\newenvironment {mmatrix}{\begin {matrix}}{\end {matrix}}$ $\newcommand {\displaybreak }[1][]{}$ $ \def \offsyl {(\oslash )} 
\def \msconly {(\Delta )} $ $ \DeclareMathOperator {\var }{var} \DeclareMathOperator {\cov }{cov} \DeclareMathOperator {\Bin }{Bin} \DeclareMathOperator {\Geo }{Geometric} \DeclareMathOperator {\Beta }{Beta} \DeclareMathOperator {\Unif }{Uniform} \DeclareMathOperator {\Gam }{Gamma} \DeclareMathOperator {\Normal }{N} \DeclareMathOperator {\Exp }{Exp} \DeclareMathOperator {\Cauchy }{Cauchy} \DeclareMathOperator {\Bern }{Bernoulli} \DeclareMathOperator {\Poisson }{Poisson} \DeclareMathOperator {\Weibull }{Weibull} \DeclareMathOperator {\IGam }{IGamma} \DeclareMathOperator {\NGam }{NGamma} \DeclareMathOperator {\ChiSquared }{ChiSquared} \DeclareMathOperator {\Pareto }{Pareto} \DeclareMathOperator {\NBin }{NegBin} \DeclareMathOperator {\Studentt }{Student-t} \DeclareMathOperator *{\argmax }{arg\,max} \DeclareMathOperator *{\argmin }{arg\,min} $ \( \def \to {\rightarrow } \def \iff {\Leftrightarrow } \def \ra {\Rightarrow } \def \sw {\subseteq } \def \mc {\mathcal } \def \mb {\mathbb } \def \sc {\setminus } \def \wt {\widetilde } \def \v {\textbf } \def \E {\mb {E}} \def \P {\mb {P}} \def \R {\mb {R}} \def \C {\mb {C}} \def \N {\mb {N}} \def \Q {\mb {Q}} \def \Z {\mb {Z}} \def \B {\mb {B}} \def \~{\sim } \def \-{\,;\,} \def \qed {$\blacksquare $} \CustomizeMathJax {\def \1{\unicode {x1D7D9}}} \def \cadlag {c\`{a}dl\`{a}g} \def \p {\partial } \def \l {\left } \def \r {\right } \def \Om {\Omega } \def \om {\omega } \def \eps {\epsilon } \def \de {\delta } \def \ov {\overline } \def \sr {\stackrel } \def \Lp {\mc {L}^p} \def \Lq {\mc {L}^p} \def \Lone {\mc {L}^1} \def \Ltwo {\mc {L}^2} \def \toae {\sr {\rm a.e.}{\to }} \def \toas {\sr {\rm a.s.}{\to }} \def \top {\sr {\mb {\P }}{\to }} \def \tod {\sr {\rm d}{\to }} \def \toLp {\sr {\Lp }{\to }} \def \toLq {\sr {\Lq }{\to }} \def \eqae {\sr {\rm a.e.}{=}} \def \eqas {\sr {\rm a.s.}{=}} \def \eqd {\sr {\rm d}{=}} \def \approxd {\sr {\rm d}{\approx }} \def \Sa {(S1)\xspace } \def \Sb {(S2)\xspace } \def \Sc {(S3)\xspace } \)

1.7 Families with random parameters

In this section we take a model family $(M_\theta )_{\theta \in \Pi }$ and treat the parameter $\theta $ as a random variable, denoted by a capital letter $\Theta $. We think of first sampling the value of $\Theta $ and then, using whatever value we obtain, taking a sample $X$ from $M_\Theta $. The resulting distribution is sometimes known as a compound or mixture distribution. We will discuss in detail how this idea becomes useful in Section 2.1. For now, let us note that it increases the range of models that we have available.
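The two-stage sampling described above can be sketched in a few lines of Python. This is only an illustration under assumed choices that do not come from the text: we take $\Theta \sim \Beta (2,3)$ and $M_\theta =\Bin (10,\theta )$, and the function name `sample_pair` is hypothetical.

```python
import random

def sample_pair(n_trials, a, b, rng):
    """Draw (theta, x): first Theta ~ Beta(a, b), then X ~ Bin(n_trials, theta).

    The Beta prior and Binomial model are illustrative assumptions.
    """
    theta = rng.betavariate(a, b)  # stage 1: sample the parameter
    # stage 2: sample X from M_theta, here a sum of n_trials Bernoulli(theta) draws
    x = sum(rng.random() < theta for _ in range(n_trials))
    return theta, x

rng = random.Random(0)
pairs = [sample_pair(10, 2.0, 3.0, rng) for _ in range(20000)]

# For this choice, E[X] = 10 * E[Theta] = 10 * 2 / (2 + 3) = 4,
# so the sample mean should be close to 4.
mean_x = sum(x for _, x in pairs) / len(pairs)
print(mean_x)
```

Each call returns one realisation of the pair $(X,\Theta )$. Discarding the first coordinate leaves a sample of $X$ from the compound distribution.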

To make sense of the idea, let us state it more precisely. We want random variables $X$ and $\Theta $ such that $X|_{\{\Theta =\theta \}}\eqd M_\theta $. In this section we show that a pair $(X,\Theta )$ with this property is given by the distribution

\begin{equation} \label {eq:bayes_model_pre} \P [X\in A,\Theta \in B]=\int _B\P [M_\theta \in A]f_\Theta (\theta )\,d\theta , \end{equation}

where $(M_\theta )$ is a family of distributions with range $R\sw \R ^n$, as defined in Section 1.3, and $f_\Theta $ is a probability density function with range $\Pi \sw \R ^d$. This is a type of random variable you may not have seen before. We will shortly show that the $\Theta $ part is a continuous random variable, but the $X$ part might be discrete or continuous, depending on $(M_\theta )$.

Our notation strongly suggests that we expect $f_\Theta $ to be the (marginal) probability density function of $\Theta $. We can confirm this by setting $A=\R ^n$: since $\P [M_\theta \in \R ^n]=1$, equation (1.13) becomes $\P [\Theta \in B]=\int _B f_\Theta (\theta )\,d\theta $. We can also find the marginal distribution of $X$, by setting $B=\R ^d$, giving

\[\P [X\in A]=\P [X\in A, \Theta \in \R ^d]=\int _{\R ^d}\P [M_\theta \in A]f_\Theta (\theta )\,d\theta ,\]

but that formula doesn’t really explain what is going on here. The relationship that we are interested in comes from the following lemma.
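As a numerical illustration of the marginal formula for $X$, the sketch below evaluates the integral with a midpoint rule. The choices $\Theta \sim \Beta (2,3)$ and $M_\theta =\Bin (10,\theta )$ are assumptions made only for this example, and the helper names are hypothetical.

```python
import math

def binom_pmf(k, n, theta):
    """P[Bin(n, theta) = k]."""
    return math.comb(n, k) * theta**k * (1 - theta)**(n - k)

def beta_pdf(theta, a, b):
    """Density of Beta(a, b) at theta in (0, 1)."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * theta**(a - 1) * (1 - theta)**(b - 1)

def marginal_pmf(k, n, a, b, steps=20000):
    """Midpoint rule for the integral over (0, 1) of P[M_theta = k] f_Theta(theta)."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h  # midpoint of the i-th subinterval
        total += binom_pmf(k, n, theta) * beta_pdf(theta, a, b)
    return h * total

n, a, b = 10, 2.0, 3.0
marginal = [marginal_pmf(k, n, a, b) for k in range(n + 1)]

# The marginal probabilities of X should sum to 1 up to quadrature error.
print(sum(marginal))
```

Here $A=\{k\}$ and the integral over $\R ^d$ reduces to an integral over $(0,1)$, because $f_\Theta $ is zero elsewhere.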

Lemma 1.7.1 Let $(M_\theta )$ be a model family and let $(X,\Theta )$ have the distribution given by (1.13). Suppose that $f_\Theta (\theta )>0$ and that $t\mapsto f_\Theta (t)$ is continuous at $t=\theta $. Then $X|_{\{\Theta =\theta \}}\eqd M_\theta $.

Proof: $\offsyl $ We give a sketch proof to illustrate the idea, in similar style to Lemma 1.6.1, taking $d=1$ to keep the notation simple. From Lemma 1.5.1, for $A\sw \R ^n$ we have

\begin{equation*} \P [X|_{\{|\Theta -\theta |\leq \eps \}}\in A] =\frac {\P [|\Theta -\theta |\leq \eps , X\in A]}{\P [|\Theta -\theta |\leq \eps ]} =\frac {\int _{\theta -\eps }^{\theta +\eps }\P [M_t\in A]f_\Theta (t)\,dt}{\int _{\theta -\eps }^{\theta +\eps }f_\Theta (t)\,dt}. \end{equation*}

The second equality follows from (1.13) with $B=[\theta -\eps ,\theta +\eps ]$, for the numerator, and from the fact that $f_\Theta $ is the p.d.f. of $\Theta $, for the denominator. Using continuity, from the statement of the lemma and from Assumption 1.3.2, for $|\theta -t|\leq \eps $ we can approximate $f_\Theta (t)\approx f_\Theta (\theta )$ and $\P [M_t\in A]\approx \P [M_\theta \in A]$. This gives

\[\P [X|_{\{|\Theta -\theta |\leq \eps \}}\in A] \approx \frac {\int _{\theta -\eps }^{\theta +\eps }\P [M_{\theta }\in A]f_\Theta (\theta )\,dt}{\int _{\theta -\eps }^{\theta +\eps }f_\Theta (\theta )\,dt} =\frac {2\eps f_{\Theta }(\theta )\P [M_\theta \in A]}{2\eps f_{\Theta }(\theta )}=\P [M_\theta \in A].\]

Letting $\eps \to 0$ we have $\P [X|_{\{|\Theta -\theta |\leq \eps \}}\in A]\to \P [M_\theta \in A]$, so by Definition 1.2.1 and (1.9) we have that $X|_{\{\Theta =\theta \}}$ is well defined and $X|_{\{\Theta =\theta \}}\eqd M_\theta $. ∎
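The limiting argument in the proof can be checked by simulation: condition on $|\Theta -\theta |\leq \eps $ for a small $\eps $ and compare the empirical distribution of $X$ with $M_\theta $. The sketch below is an illustration only. It assumes $\Theta \sim \Beta (2,3)$, $M_\theta =\Bin (5,\theta )$, $\theta =0.4$ and $\eps =0.01$, none of which come from the text, and the function name `conditional_freqs` is hypothetical.

```python
import math
import random

def conditional_freqs(theta0, eps, n_trials, a, b, n_samples, rng):
    """Empirical distribution of X among samples with |Theta - theta0| <= eps."""
    counts = [0] * (n_trials + 1)
    kept = 0
    for _ in range(n_samples):
        theta = rng.betavariate(a, b)  # sample Theta ~ Beta(a, b)
        if abs(theta - theta0) > eps:
            continue  # discard samples outside the conditioning window
        # sample X ~ Bin(n_trials, theta) as a sum of Bernoulli draws
        x = sum(rng.random() < theta for _ in range(n_trials))
        counts[x] += 1
        kept += 1
    return [c / kept for c in counts], kept

rng = random.Random(1)
theta0, n_trials = 0.4, 5
freqs, kept = conditional_freqs(theta0, 0.01, n_trials, 2.0, 3.0, 400000, rng)

# Probability mass function of Bin(n_trials, theta0), the claimed limit.
target = [math.comb(n_trials, k) * theta0**k * (1 - theta0)**(n_trials - k)
          for k in range(n_trials + 1)]

# Largest discrepancy between the conditional frequencies and the target.
max_gap = max(abs(f - t) for f, t in zip(freqs, target))
print(kept, max_gap)
```

The discrepancy has two sources: Monte Carlo noise, which shrinks as more samples are kept, and the bias from using a window of positive width, which shrinks as $\eps \to 0$.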