last updated: October 24, 2024

Bayesian Statistics


1.5 Conditioning and correlations

This section demonstrates the effect of taking a jointly distributed random variable \((X,Y)\in \R ^2\) and conditioning \(X\) to lie within a particular region. If \(X\) and \(Y\) are independent then conditioning \(X\) will have no effect on \(Y\) (see Exercise 1.10 for details), but if they are dependent then conditioning on the location of \(X\) will change the distribution of the \(y\) coordinate. This is because \(X\) and \(Y\) affect each other, so forcing \(X\) into a particular region will also have some effect on \(Y\).

Let us introduce some notation for these ideas.

  • We write \((X,Y)|_{\{X\in A\}}\) as a shorthand for \((X,Y)|_{\{(X,Y)\in A\times \R ^d\}}\), where we restrict the location of \(X\) (to be inside \(A\)) but we do not restrict the location of \(Y\) (because \(Y\in \R ^d\) is true anyway).

  • We write \(Y|_{\{X\in A\}}\) for the \(y\) coordinate of \((X,Y)|_{\{X\in A\}}\).

In this notation, the idea we described above is that, if \(X\) and \(Y\) are dependent, the random variables \(Y\) and \(Y|_{\{X\in A\}}\) will have different distributions.
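This effect is easy to see numerically. The following is a minimal Python sketch, not part of the notes; the correlation \(0.8\) and the set \(A=[1,\infty )\) are arbitrary choices made for illustration. It simulates a dependent pair \((X,Y)\) and conditions on \(\{X\in A\}\) by keeping only the samples whose \(X\) coordinate lands in \(A\), then compares the empirical behaviour of \(Y\) and \(Y|_{\{X\in A\}}\).

    import numpy as np

    rng = np.random.default_rng(0)

    # Simulate a dependent pair (X, Y): bivariate normal with correlation 0.8.
    n = 100_000
    rho = 0.8
    x = rng.standard_normal(n)
    y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)

    # Condition on {X in A} with A = [1, infinity): keep only samples with X in A.
    y_given_A = y[x >= 1.0]

    print("mean of Y            :", y.mean())          # close to 0
    print("mean of Y | {X in A} :", y_given_A.mean())  # noticeably positive

With \(\rho =0\), so that \(X\) and \(Y\) are independent, the two means would agree, matching the claim above.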

  • Lemma 1.5.1 Let \(X\) and \(Y\) be random variables, with \(A\sw R_X\), \(B\sw R_Y\) and \(\P [X\in A]>0\). Then

    \begin{equation} \label {eq:cond_corr} \P [Y|_{\{X\in A\}}\in B]=\frac {\P [X\in A,Y\in B]}{\P [X\in A]}. \end{equation}

Proof: From part 2 of Lemma 1.4.1,

\begin{equation} \label {eq:cond_corr_pre} \P [(X,Y)|_{\{X\in A\}}\in A\times B]=\frac {\P [(X,Y)\in A\times B]}{\P [(X,Y)\in A\times \R ^d]}=\frac {\P [X\in A,Y\in B]}{\P [X\in A]}. \end{equation}

Part 1 of Lemma 1.4.1 tells us that \((X,Y)|_{\{X\in A\}}\) has range \(A\times R_Y\). Hence \(X|_{\{X\in A\}}\) has range \(A\), which means that \(\P [Y|_{\{X\in A\}}\in B]=\P [(X,Y)|_{\{X\in A\}}\in A\times B]\). Combining this fact with (1.8) completes the proof.   ∎

If \(X\) and \(Y\) are discrete then taking \(A=\{x\}\) and \(B=\{y\}\) gives us \(\P [Y|_{\{X=x\}}=y]=\frac {\P [X=x,Y=y]}{\P [X=x]}.\) This formula, and more generally (1.7), should feel familiar. In earlier courses you will probably have seen equations like \(\P [Y=y\,|\,X=x]=\frac {\P [X=x,Y=y]}{\P [X=x]}\), which has essentially the same meaning but different notation. The reason for introducing \(Y|_{\{X\in A\}}\) is simply that it is easier to understand probability when we can imagine random objects.
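The discrete formula above translates directly into a short computation. Below is a minimal Python sketch, not part of the notes; the joint pmf and the helper name conditional_pmf_of_Y are made up for illustration.

    # Conditional pmf of Y given {X = x}, via P[Y=y | X=x] = P[X=x, Y=y] / P[X=x].
    joint = {  # toy joint pmf of (X, Y); the numbers are arbitrary but sum to 1
        (0, 0): 0.1, (0, 1): 0.3,
        (1, 0): 0.4, (1, 1): 0.2,
    }

    def conditional_pmf_of_Y(joint, x):
        """Return the pmf of Y conditioned on {X = x}, as a dict {y: probability}."""
        p_x = sum(p for (xi, _), p in joint.items() if xi == x)
        return {y: p / p_x for (xi, y), p in joint.items() if xi == x}

    print(conditional_pmf_of_Y(joint, 1))  # {0: 0.666..., 1: 0.333...}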

  • Example 1.5.2 Suppose that we roll a fair die, with outcomes \(Z=1,2,\ldots ,6\). Define the random variables

    \[X= \begin {cases} 1 & \text { if $Z$ is odd,} \\ 0 & \text { if $Z$ is even,} \end {cases} \qquad Y= \begin {cases} 0 & \text { if }Z\leq 3, \\ 1 & \text { if }Z\geq 4, \end {cases} \]

    which are dependent. We can illustrate their joint distribution with a table of values:

    \[\begin {array}{c|cccccc} Z & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline X & 1 & 0 & 1 & 0 & 1 & 0 \\ Y & 0 & 0 & 0 & 1 & 1 & 1 \\ \P [\text {column}] & \tfrac 16 & \tfrac 16 & \tfrac 16 & \tfrac 16 & \tfrac 16 & \tfrac 16 \end {array}\]

    Each column is a possible outcome, and each outcome has probability \(\tfrac 16\). Conditioning on the event \(X=1\) forces the outcome to lie within the columns where \(X=1\), that is \(Z\in \{1,3,5\}\).

    The distribution of \(Y\) is easily found:

    \begin{align*} \P [Y=0]&=\P [Z\in \{1,2,3\}]=\tfrac 16+\tfrac 16+\tfrac 16 = \tfrac 12, \\ \P [Y=1]&=\P [Z\in \{4,5,6\}]=\tfrac 16+\tfrac 16+\tfrac 16 = \tfrac 12. \end{align*} By Lemma 1.5.1 we have

    \[\P [Y|_{\{X=1\}}=0]=\frac {\P [Y=0,X=1]}{\P [X=1]}=\frac {\P [Z\in \{1,3\}]}{\P [Z\in \{1,3,5\}]} =\frac {\tfrac 16+\tfrac 16}{\tfrac 16+\tfrac 16+\tfrac 16}=\frac {2}{3}\]

    and so \(\P [Y|_{\{X=1\}}=1]=1-\frac 23=\frac 13\). As we expected, the distributions of \(Y\) and \(Y|_{\{X=1\}}\) are different.
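We can sanity-check these numbers by simulation. The following is a minimal Python sketch, not part of the notes, that repeats the die roll of Example 1.5.2 many times and conditions on \(\{X=1\}\) by keeping only the rolls where \(X=1\).

    import numpy as np

    rng = np.random.default_rng(1)

    # Roll a fair die many times and build X and Y as in Example 1.5.2.
    z = rng.integers(1, 7, size=200_000)
    x = (z % 2 == 1).astype(int)   # 1 if Z is odd, 0 if Z is even
    y = (z >= 4).astype(int)       # 0 if Z <= 3, 1 if Z >= 4

    # Condition on {X = 1}: keep only the rolls with X = 1.
    y_given_x1 = y[x == 1]

    print("P[Y = 0]        is approximately", np.mean(y == 0))           # about 1/2
    print("P[Y|_{X=1} = 0] is approximately", np.mean(y_given_x1 == 0))  # about 2/3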

Compare the situation of Example 1.5.2 to that of a random variable \((X,Y)\) taking values in \(\R ^n\times \R ^d\equiv \R ^{n+d}\). Here, \(X\) takes values in \(\R ^n\) and \(Y\) takes values in \(\R ^d\). If \(X\) and \(Y\) are dependent, then we should still expect that conditioning on the location of one will affect the distribution of the other. This is the key fact to take away from this section.

  • Remark 1.5.3 In a multivariate situation, say \((X,Y_1,Y_2)\), we could do something similar and find the conditional distributions of \((Y_1,Y_2)|_{\{X\in A\}}\) as well as \(Y_1|_{\{X\in A\}}\) and \(Y_2|_{\{X\in A\}}\). A slightly subtle point is that

    \[(Y_1,Y_2)|_{\{X\in A\}}\eqd \l (Y_1|_{\{X\in A\}}, Y_2|_{\{X\in A\}}\r ).\]

    In words, conditioning two coordinates on the same event is equivalent to conditioning each coordinate individually on that event, as we would intuitively expect.

    We will use this fact later on when we work with multivariate situations, but we won’t include a proof within our course. It isn’t difficult to check, but it wouldn’t help us understand anything more than we already do.