Bayesian Statistics
\( \def \offsyl {(\oslash )} \def \msconly {(\Delta )} \)
\( \DeclareMathOperator {\var }{var} \DeclareMathOperator {\cov }{cov} \DeclareMathOperator {\Bin }{Bin} \DeclareMathOperator {\Geo }{Geometric} \DeclareMathOperator {\Beta
}{Beta} \DeclareMathOperator {\Unif }{Uniform} \DeclareMathOperator {\Gam }{Gamma} \DeclareMathOperator {\Normal }{N} \DeclareMathOperator {\Exp }{Exp} \DeclareMathOperator
{\Cauchy }{Cauchy} \DeclareMathOperator {\Bern }{Bernoulli} \DeclareMathOperator {\Poisson }{Poisson} \DeclareMathOperator {\Weibull }{Weibull} \DeclareMathOperator {\IGam
}{IGamma} \DeclareMathOperator {\NGam }{NGamma} \DeclareMathOperator {\ChiSquared }{ChiSquared} \DeclareMathOperator {\Pareto }{Pareto} \DeclareMathOperator {\NBin }{NegBin}
\DeclareMathOperator {\Studentt }{Student-t} \DeclareMathOperator *{\argmax }{arg\,max} \DeclareMathOperator *{\argmin }{arg\,min} \)
\( \def \to {\rightarrow } \def \iff {\Leftrightarrow } \def \ra {\Rightarrow } \def \sw {\subseteq } \def \mc {\mathcal } \def \mb {\mathbb } \def \sc {\setminus } \def \wt
{\widetilde } \def \v {\textbf } \def \E {\mb {E}} \def \P {\mb {P}} \def \R {\mb {R}} \def \C {\mb {C}} \def \N {\mb {N}} \def \Q {\mb {Q}} \def \Z {\mb {Z}} \def \B {\mb {B}}
\def \~{\sim } \def \-{\,;\,} \def \qed {$\blacksquare $} \CustomizeMathJax {\def \1{\unicode {x1D7D9}}} \def \cadlag {c\`{a}dl\`{a}g} \def \p {\partial } \def \l
{\left } \def \r {\right } \def \Om {\Omega } \def \om {\omega } \def \eps {\epsilon } \def \de {\delta } \def \ov {\overline } \def \sr {\stackrel } \def \Lp {\mc {L}^p} \def
\Lq {\mc {L}^q} \def \Lone {\mc {L}^1} \def \Ltwo {\mc {L}^2} \def \toae {\sr {\rm a.e.}{\to }} \def \toas {\sr {\rm a.s.}{\to }} \def \top {\sr {\mb {\P }}{\to }} \def \tod {\sr
{\rm d}{\to }} \def \toLp {\sr {\Lp }{\to }} \def \toLq {\sr {\Lq }{\to }} \def \eqae {\sr {\rm a.e.}{=}} \def \eqas {\sr {\rm a.s.}{=}} \def \eqd {\sr {\rm d}{=}} \def \approxd
{\sr {\rm d}{\approx }} \def \Sa {(S1)\xspace } \def \Sb {(S2)\xspace } \def \Sc {(S3)\xspace } \)
Chapter 1 Conditioning
1.1 Random variables
Let \(X\) be a random variable taking values in \(\R \). You should think of \(X\) as an object that takes a random value. Most of the things we interact with are random: for example, when we buy a pair of shoes we do not know how long they will last, and when we walk home later we do not know how much rain will fall. In principle we might think of anything as being random, but within this course we will restrict ourselves to random variables that take values in \(\R ^d\). We won’t use bold symbols for vectors. Typically we will write \(x\) or \(y\) for elements of \(\R ^d\), and when we need coordinates we’ll write e.g. \(x=(x_1,\ldots ,x_d)\in \R ^d\), where \(x_i\in \R \).
We are interested in two particular types of random variable in this course, captured by the following definition.
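In standard form, and as a sketch rather than the course's exact wording, the two types are as follows. The symbol \(p_X\) and the phrasing are assumptions; \(f_X\), \(A\) and the label (1.1) are taken from the surrounding text. A random variable \(X\) is discrete if there is a finite or countable set \(A\sw \R ^d\) with \(\P [X\in A]=1\), in which case its probability mass function is \(p_X(x)=\P [X=x]\). It is continuous if there is a function \(f_X:\R ^d\to [0,\infty )\), the probability density function (p.d.f.), such that
\[\P [X\in A]=\int _A f_X(x)\,dx \tag{1.1}\]
for all \(A\sw \R ^d\).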
Most random variables used in statistical inference are one of these two types. In this course we will use reference sheets of named distributions, found in Appendix A, covering a very large range of examples. These reference sheets will be made available in the exam. You should be familiar with relationships
between named distributions that were discussed in earlier courses, for example the relationship between Bernoulli trials and the Geometric and Binomial distributions.
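As a reminder of that relationship, the following minimal sketch builds Binomial and Geometric samples directly from Bernoulli trials. The function names and parameter values are chosen for illustration only, and the Geometric distribution is assumed here to count the number of trials up to and including the first success; the reference sheets may use the other common convention (failures before the first success).

```python
import random

def bernoulli_trials(p, rng):
    """Yield an endless stream of independent Bernoulli(p) outcomes (True = success)."""
    while True:
        yield rng.random() < p

def binomial_sample(n, p, rng):
    """Number of successes among n Bernoulli(p) trials."""
    trials = bernoulli_trials(p, rng)
    return sum(next(trials) for _ in range(n))

def geometric_sample(p, rng):
    """Number of trials up to and including the first success (assumed convention)."""
    for count, success in enumerate(bernoulli_trials(p, rng), start=1):
        if success:
            return count

rng = random.Random(0)
n, p, reps = 10, 0.3, 20000

# Sample means should be close to the theoretical means n*p and 1/p.
binomial_mean = sum(binomial_sample(n, p, rng) for _ in range(reps)) / reps
geometric_mean = sum(geometric_sample(p, rng) for _ in range(reps)) / reps
print("Binomial: ", binomial_mean, "vs", n * p)
print("Geometric:", geometric_mean, "vs", 1 / p)
```

Both sample means should land close to the theoretical values, which is a quick sanity check that the two named distributions really are summaries of the same underlying sequence of trials.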
Note that the integral in (1.1) is over a set \(A\sw \R ^d\), with variable \(x\in \R ^d\). We’ll generally use this notation instead of writing out multiple integral signs (e.g. \(\int \int \cdots \int \ldots \,dx_1\,dx_2\cdots dx_d\)) in this course.
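To make the notation concrete in the case \(d=2\), the sketch below approximates \(\P [X\in A]\) as an integral of a density over a box \(A\). The density (two independent \(\Exp (1)\) coordinates), the box \(A=[0,1]^2\), and the helper names are all assumptions chosen for illustration, not part of the course material.

```python
import math

def density(x):
    """Joint p.d.f. of two independent Exp(1) coordinates, with x = (x1, x2)."""
    x1, x2 = x
    return math.exp(-x1 - x2) if x1 > 0 and x2 > 0 else 0.0

def integrate_over_box(f, lower, upper, steps=400):
    """Midpoint-rule approximation of the integral of f over the box
    A = [lower[0], upper[0]] x [lower[1], upper[1]] in R^2."""
    h1 = (upper[0] - lower[0]) / steps
    h2 = (upper[1] - lower[1]) / steps
    total = 0.0
    for i in range(steps):
        x1 = lower[0] + (i + 0.5) * h1
        for j in range(steps):
            x2 = lower[1] + (j + 0.5) * h2
            total += f((x1, x2))
    return total * h1 * h2

# P[X in A] for A = [0,1]^2, written as a single integral over A.
approx = integrate_over_box(density, (0.0, 0.0), (1.0, 1.0))
# For this density the double integral factorises, giving a closed form.
exact = (1 - math.exp(-1)) ** 2
print(approx, exact)
```

The single call `integrate_over_box(density, ...)` plays the role of \(\int _A f_X(x)\,dx\); the nested loops inside it are the multiple integral signs that the compact notation hides.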
We will often view a constant, say \(a\in \R \), as a deterministic random variable, that is, a random variable \(X\) with \(\P [X=a]=1\). This is another slight abuse of terminology, but it is natural and it won’t cause any trouble. Note that deterministic random variables are a special type of discrete random variable, because all of the probability sits on the single value \(a\).
1.1.1 \(\offsyl \) Technicalities
In this off-syllabus section we mention three technical points. They are aimed mainly at students with more technical backgrounds in analysis and probability theory. We won’t discuss these points in lectures.
1. More advanced textbooks use the term absolutely continuous for the class of random variables that we have called continuous. The complication arises because there are random variables for which \(F_X\) is a continuous function but no p.d.f. \(f_X\) exists. These random variables are usually associated with random fractals and are rarely used within statistics, so in statistics it is common to drop the word ‘absolutely’.
2. In this course we will use the convention that probability density functions must be continuous (as functions) except where they are zero. You can check that all of the distributions on the reference sheet in Appendix A are given in this form.
In fact, probability density functions \(f_X(x)\) are only defined almost everywhere; the phrase ‘for almost all \(x\)’ is also commonly used. We cannot explain the precise meaning of this within this course, and many (otherwise good) textbooks on Bayesian statistics fail to note that the difficulty exists. Loosely, the same distribution can be defined using two (or more) different probability density functions \(f_X(x)\) and \(f'_X(x)\), but it will always be the case that \(f_X(x)=f'_X(x)\) for ‘almost all’ values of \(x\). For example, the \(\Unif (0,1)\) distribution can be given a density equal to \(1\) on \((0,1)\) or equal to \(1\) on \([0,1]\) (and zero elsewhere); the two functions differ only at the points \(0\) and \(1\). We will discuss the matter further in Section 1.2 and in Remarks 3.1.3 and 6.1.2.
3. In our definitions and results above, the sets \(A\) for which we evaluate \(\P [X\in A]\) must be Borel subsets of \(\R ^d\). In practice this technicality does not restrict us at all and we will continue to ignore this point for the remainder of the course.
Taking care of these issues rigorously requires some background on Lebesgue integration, but we do not assume that background for this course.
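For readers curious about point 1 above, a standard example (not named in the text) of a random variable whose distribution function is continuous but which has no p.d.f. is the Cantor distribution: write \(X=\sum _{i\geq 1} 2B_i/3^i\), where the \(B_i\) are independent \(\Bern (1/2)\) random variables. The sketch below samples from a truncation of this sum; the function name, the number of digits and the thresholds are choices made for illustration, and the truncated sum is strictly speaking a discrete approximation.

```python
import random

def cantor_sample(rng, digits=30):
    """Draw from a truncation of the Cantor distribution.

    Each ternary digit is 0 or 2 with probability 1/2, so the digit 1
    never appears and the sample lies in the Cantor set.
    """
    return sum(2 * rng.randint(0, 1) / 3**i for i in range(1, digits + 1))

rng = random.Random(1)
samples = [cantor_sample(rng) for _ in range(5000)]

# No sample lands in the interior of the removed middle third of [0, 1].
in_middle_third = sum(0.34 < x < 0.66 for x in samples)
# Roughly half of the samples lie to the left of the gap.
left_fraction = sum(x < 0.5 for x in samples) / len(samples)
print(in_middle_third, left_fraction)
```

The samples avoid the middle third entirely, and the same happens at every smaller scale, so all of the probability sits on a set of total length zero. That is why no density function can describe this distribution, even though no single point carries positive probability.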