Bayesian Statistics
8.4 Gibbs sampling
Recall that our parameter $\theta = (\theta_1, \ldots, \theta_d)$ is, in general, a $d$-dimensional vector. For large $d$ the MH algorithm can become very slow, because the Markov chain has to explore a $d$-dimensional space, and exploring a high-dimensional space takes far longer than exploring a low-dimensional one.
Example 8.4.1 Imagine you have parked your car inside a multi-story car park and then forgotten where you’ve parked it. If you know which level your car is on then you will only have to search on that level, and you will find your car in minutes. If you don’t know which level, it will take you much longer. The first case is exploring $\mathbb{R}^2$, the second is exploring $\mathbb{R}^3$. Actually, to make this example properly match the difference between $\mathbb{R}^2$ and $\mathbb{R}^3$, the car park should have the same number of floors as there are parking spaces along the side-length of each floor! The problem only gets worse as the dimension $d$ of $\mathbb{R}^d$ increases.
One strategy for working around this problem is to update the parameters $\theta_1, \ldots, \theta_d$ one at a time, rather than all at once.
Remark 8.4.2 In reality, instead of updating the $\theta_i$ one-by-one, it is common to update the parameters in small batches. For example, we might update $(\theta_1, \theta_2)$ in one step, then $(\theta_3, \theta_4)$ in the next step, and so on. It is helpful to put related parameters, with values that might strongly influence each other, within the same batch.
When we update the parameters in turn, a common choice of proposal distribution is to set the proposal for $\theta_i$ equal to the conditional distribution of $\theta_i$ given the data $x$ and the current values of all the other parameters.

Definition 8.4.3 The distributions of $\theta_i$ given $(\theta_j)_{j \neq i}$ and $x$, for $i = 1, \ldots, d$, are known as the full conditional distributions. From Lemma 1.6.1 the full conditional has p.d.f. given by

$p(\theta_i \mid (\theta_j)_{j \neq i}, x) \;=\; \dfrac{p(\theta \mid x)}{\int p(\theta \mid x)\, d\theta_i} \;\propto\; p(\theta \mid x). \qquad (8.12)$

We can calculate $p(\theta \mid x)$ from Theorems 2.4.1/3.1.2, which provides a strategy for calculating (8.12) analytically. Note that (8.12) treats $x$ and the $\theta_j$ with $j \neq i$ as constants; the only variable is $\theta_i$.
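A short calculation suggests why this choice of proposal is so convenient. Writing $\theta_{-i} = (\theta_j)_{j \neq i}$ and $q(\theta' \mid \theta)$ for the proposal density (notation introduced here just for this calculation), suppose $\theta'$ agrees with $\theta$ except in coordinate $i$, and that $\theta_i'$ is proposed from the full conditional (8.12). Then

\begin{align*}
\frac{p(\theta' \mid x)\, q(\theta \mid \theta')}{p(\theta \mid x)\, q(\theta' \mid \theta)}
= \frac{p(\theta_i' \mid \theta_{-i}, x)\, p(\theta_{-i} \mid x) \cdot p(\theta_i \mid \theta_{-i}, x)}
       {p(\theta_i \mid \theta_{-i}, x)\, p(\theta_{-i} \mid x) \cdot p(\theta_i' \mid \theta_{-i}, x)}
= 1,
\end{align*}

where we have factored $p(\theta \mid x) = p(\theta_i \mid \theta_{-i}, x)\, p(\theta_{-i} \mid x)$, and likewise for $\theta'$, which has the same $\theta_{-i}$. The MH acceptance probability is therefore equal to $1$: every proposal is accepted.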
The Gibbs sampler that results from these strategies is as follows.
1. Choose an initial point $\theta^0 = (\theta^0_1, \ldots, \theta^0_d)$.

2. For each $i = 1, \ldots, d$ in turn, sample $\theta^{n+1}_i$ from the full conditional
$p(\theta_i \mid \theta^n_1, \ldots, \theta^n_{i-1}, \theta^n_{i+1}, \ldots, \theta^n_d, x)$
and set
$\theta^{n+1} = (\theta^n_1, \ldots, \theta^n_{i-1}, \theta^{n+1}_i, \theta^n_{i+1}, \ldots, \theta^n_d).$
Note that we increment the value of $n$ each time that we increment $i$. When $i$ reaches $d$, return to $i = 1$ and repeat. Repeat this step until $n$ is large enough that the values taken by the $\theta^n$ are no longer affected by the choice of $\theta^0$.

3. The final value of $\theta^n$ is now a sample of the posterior distribution of $\theta$ given $x$; a code sketch of this procedure is given after the list.
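As a supplement, here is a minimal Python sketch of the procedure above. It is only illustrative: the function and argument names (gibbs_sampler, full_conditional_samplers, and so on) are not from these notes, and it assumes we can supply, for each $i$, a routine that samples from the full conditional (8.12).

import numpy as np

def gibbs_sampler(theta0, full_conditional_samplers, n_steps, rng=None):
    # theta0: initial point theta^0, an array of length d (step 1).
    # full_conditional_samplers: list of d functions; the i-th takes
    #   (theta, rng) and returns a sample of theta_i from its full
    #   conditional given the other coordinates (the data x is assumed
    #   to be baked into each function).
    # n_steps: total number of single-coordinate updates (step 2).
    rng = np.random.default_rng() if rng is None else rng
    theta = np.array(theta0, dtype=float)
    d = len(theta)
    samples = []
    i = 0
    for n in range(n_steps):
        # One step of the chain: update coordinate i only, so n is
        # incremented each time i is.
        theta[i] = full_conditional_samplers[i](theta, rng)
        samples.append(theta.copy())
        i = (i + 1) % d  # when i reaches d, return to i = 1 and repeat
    return np.array(samples)

The early part of the output corresponds to the burn-in described in step 2 and would be discarded; the final value plays the role of the sample in step 3.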
Note that we need to be able to take samples from the full conditionals (8.12) in order to carry out step 2.
For the MH algorithm we had Theorem 8.2.2 to tell us that, for large $n$, the $\theta^n$ are approximately samples of the posterior distribution; an analogous result holds for the Gibbs sampler.
If the full conditionals can’t be easily sampled from, then one strategy is to use the MH algorithm (run inside of step 2 above) to obtain samples of the full conditional distributions.
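For illustration, a sketch of what one such inner MH update might look like (again, not from these notes; log_post is assumed to evaluate the log of the unnormalised posterior density, and step is a tuning parameter):

import numpy as np

def mh_update_coordinate(theta, i, log_post, step, rng):
    # One random-walk MH step for coordinate i, targeting the full
    # conditional (8.12), i.e. the posterior viewed as a function of
    # theta_i alone with the other coordinates held fixed.
    proposal = theta.copy()
    proposal[i] = theta[i] + step * rng.normal()
    # Symmetric proposal, so the acceptance ratio is just the ratio of
    # (unnormalised) posterior densities.
    log_alpha = log_post(proposal) - log_post(theta)
    if np.log(rng.uniform()) < log_alpha:
        return proposal
    return theta

One or more such updates can replace the exact draw from the full conditional in step 2.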
Example 8.4.5 This example comes from Sections 1.1.1/7.5.3/8.6.2 of the book ‘Bayesian Approach to Interpreting Archaeological Data’ by Buck et al. (1996). The data comes from radiocarbon dating, and is a vector $x = (x_1, \ldots, x_d)$ of estimated ages obtained (via carbon dating) from $d$ different objects. We write $\theta_j$ for the true age of object $j$, which is unknown. Our model for the age of each object is $X_j = \theta_j + \epsilon_j$, where $\epsilon_j \sim N(0, \sigma_j^2)$, and we assume that the estimation errors $\epsilon_j$ are independent, for each $j$. For simplicity we will assume that the $\sigma_j$ are known parameters, so our model family has parameters $\theta = (\theta_1, \ldots, \theta_d)$. We thus have the model family $\{\, N(\theta_1, \sigma_1^2) \otimes \cdots \otimes N(\theta_d, \sigma_d^2) \;:\; \theta \in \mathbb{R}^d \,\}$, in which the coordinates $X_j$ are independent. From the historical context of the objects, it is known that $\theta_1 \leq \theta_2 \leq \cdots \leq \theta_d$, so we condition our model on this event. We can use Exercise 1.8 to do this conditioning, resulting in a new model family given by

$\{\, N(\theta_1, \sigma_1^2) \otimes \cdots \otimes N(\theta_d, \sigma_d^2) \;:\; \theta \in \mathbb{R}^d,\ \theta_1 \leq \theta_2 \leq \cdots \leq \theta_d \,\}.$

We use the Bayesian model with this model family and the improper prior $\pi(\theta) \propto 1$. By Theorem 3.1.2 we obtain that the posterior distribution has p.d.f.

$p(\theta \mid x) \;\propto\; \mathbb{1}\{\theta_1 \leq \theta_2 \leq \cdots \leq \theta_d\} \displaystyle\prod_{j=1}^{d} \exp\!\left( -\frac{(x_j - \theta_j)^2}{2\sigma_j^2} \right). \qquad (8.13)$
Apart from the indicator function, this is the same density as that of $x$ given $\theta$ in our model, except now we treat $\theta$ rather than $x$ as the variable. The density is symmetric in $x_j$ and $\theta_j$, and so we already know this distribution. It is the distribution of independent $N(x_j, \sigma_j^2)$ random variables, $j = 1, \ldots, d$, conditioned on the event $\theta_1 \leq \theta_2 \leq \cdots \leq \theta_d$. One way to simulate samples of this distribution is via rejection sampling: simulate $\theta_j \sim N(x_j, \sigma_j^2)$ independently for each $j$, and reject the sample until it satisfies $\theta_1 \leq \theta_2 \leq \cdots \leq \theta_d$. The trouble is that unless $d$ is small, we will mostly end up rejecting the samples, because the condition we have imposed is an unlikely one.

From (8.13) and (8.12) we have full conditionals given by

$p(\theta_i \mid (\theta_j)_{j \neq i}, x) \;\propto\; \mathbb{1}\{\theta_1 \leq \theta_2 \leq \cdots \leq \theta_d\} \displaystyle\prod_{j=1}^{d} \exp\!\left( -\frac{(x_j - \theta_j)^2}{2\sigma_j^2} \right)$
$\phantom{p(\theta_i \mid (\theta_j)_{j \neq i}, x)} \;\propto\; \mathbb{1}\{\theta_{i-1} \leq \theta_i \leq \theta_{i+1}\} \exp\!\left( -\frac{(x_i - \theta_i)^2}{2\sigma_i^2} \right)$

(where we set $\theta_0 = -\infty$ and $\theta_{d+1} = +\infty$ to make convenient notation). Note that $\theta_i$ is the only variable here, and the second line follows because $x$ and the $\theta_j$ with $j \neq i$ are treated as constants. We recognize this full conditional distribution as that of $N(x_i, \sigma_i^2)$ conditioned on the event $\theta_{i-1} \leq \theta_i \leq \theta_{i+1}$. These full conditionals are much easier to sample from: we use rejection sampling, sample $\theta_i \sim N(x_i, \sigma_i^2)$ and reject until we obtain a sample for which $\theta_{i-1} \leq \theta_i \leq \theta_{i+1}$. Hence, in this situation we have all the necessary ingredients to use a Gibbs sampler.
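To make this concrete, here is a minimal Python sketch of the resulting Gibbs sampler (illustrative only: the function names are not from these notes, and the usage values at the end are made up rather than taken from Buck et al.).

import numpy as np

def sample_full_conditional(i, theta, x, sigma, rng):
    # Sample theta_i from N(x_i, sigma_i^2) conditioned on
    # theta_{i-1} <= theta_i <= theta_{i+1}, by rejection sampling.
    # The conventions theta_0 = -infinity and theta_{d+1} = +infinity
    # handle the two end coordinates.
    lower = theta[i - 1] if i > 0 else -np.inf
    upper = theta[i + 1] if i < len(theta) - 1 else np.inf
    while True:
        candidate = rng.normal(x[i], sigma[i])
        if lower <= candidate <= upper:
            return candidate

def gibbs_radiocarbon(x, sigma, n_sweeps, rng=None):
    # Gibbs sampler for the posterior (8.13): true ages theta given the
    # carbon-dated estimates x, under the ordering constraint.
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    theta = np.sort(x)  # any initial point satisfying the ordering will do
    samples = []
    for _ in range(n_sweeps):
        for i in range(len(x)):
            theta[i] = sample_full_conditional(i, theta, x, sigma, rng)
        samples.append(theta.copy())
    return np.array(samples)

# Hypothetical usage, with made-up numbers:
# draws = gibbs_radiocarbon(x=[1030.0, 1080.0, 1110.0], sigma=[40.0, 40.0, 40.0], n_sweeps=5000)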