MATH3871 Assignment 1
Robert Tan School of Mathematics and Statistics
[email protected]
Question 1

Let θ be the true proportion of people over the age of 40 in your community with hypertension. Consider the following thought experiment:
Part (a) Making an educated guess, suppose we choose an initial point estimate of θ = 0.2, obtained by taking the expectation of a Beta(2, 8) distribution. We choose this type of distribution since it is the conjugate prior of a binomial distribution, which is the distribution of our data.
Part (b) If we survey for hypertension within the community, and the first five people randomly selected have 4 positives, then our posterior distribution can be evaluated as follows:
\[
f_{\theta \mid x}(\theta) \propto p_\theta(\theta) \times p_{x \mid \theta}(x) \propto \theta (1-\theta)^7 \times \theta^4 (1-\theta) \propto \theta^5 (1-\theta)^8.
\]
So our posterior has a Beta(6, 9) distribution, and the new point estimate using the expected value is 6/(6+9) = 0.4.
Part (c) If our final survey results are 400 positives out of 1000 people, we can once again compute the posterior as follows:
\[
f_{\theta \mid x}(\theta) \propto p_\theta(\theta) \times p_{x \mid \theta}(x) \propto \theta (1-\theta)^7 \times \theta^{400} (1-\theta)^{600} \propto \theta^{401} (1-\theta)^{607}.
\]
So our posterior has a Beta(402, 608) distribution, and the new point estimate using the expected value is 402/(402+608) = 0.39802.
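As a quick numerical cross-check (my own addition, not part of the assignment working), the two conjugate Beta updates above can be reproduced in R by adding the observed successes and failures to the prior shape parameters:

# prior Beta(2, 8): prior mean 2 / (2 + 8) = 0.2
a0 <- 2; b0 <- 8

# part (b): 4 positives out of 5 surveyed
a1 <- a0 + 4; b1 <- b0 + 1
a1 / (a1 + b1)      # posterior mean 6 / 15 = 0.4

# part (c): 400 positives out of 1000 surveyed
a2 <- a0 + 400; b2 <- b0 + 600
a2 / (a2 + b2)      # posterior mean 402 / 1010 = 0.39802 (to 5 d.p.)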
Question 2

Let x_1, . . . , x_n ∈ R^d be n iid d-dimensional vectors. Suppose that we wish to model x_i ∼ N_d(µ, Σ) for i = 1, . . . , n, where µ ∈ R^d is an unknown mean vector, and Σ is a known positive semi-definite covariance matrix.
Part (a)

Claim. By adopting the conjugate prior µ ∼ N_d(µ_0, Σ_0), the resulting posterior distribution of µ | x_1, . . . , x_n is N_d(µ̂, Σ̂), where
\[
\hat{\mu} = \left( \Sigma_0^{-1} + n\Sigma^{-1} \right)^{-1} \left( \Sigma_0^{-1}\mu_0 + n\Sigma^{-1}\bar{x} \right)
\quad \text{and} \quad
\hat{\Sigma} = \left( \Sigma_0^{-1} + n\Sigma^{-1} \right)^{-1}.
\]

Proof. We have the prior µ ∼ N_d(µ_0, Σ_0), so
\[
f_\mu(\mu) = \frac{1}{(2\pi)^{d/2} |\Sigma_0|^{1/2}} \exp\left( -\frac{1}{2} (\mu - \mu_0)^\top \Sigma_0^{-1} (\mu - \mu_0) \right).
\]
We also have the likelihood function as follows:
\[
L(x_1, \dots, x_n \mid \mu) = \frac{1}{(2\pi)^{nd/2} |\Sigma|^{n/2}} \exp\left( -\frac{1}{2} \sum_{i=1}^{n} (x_i - \mu)^\top \Sigma^{-1} (x_i - \mu) \right).
\]
Calculating the posterior:
\[
f_{\mu \mid x_1, \dots, x_n}(\mu) \propto p(\mu) \times L(x_1, \dots, x_n \mid \mu) \propto \exp\left( -\frac{1}{2} (\mu - \mu_0)^\top \Sigma_0^{-1} (\mu - \mu_0) - \frac{1}{2} \sum_{i=1}^{n} (x_i - \mu)^\top \Sigma^{-1} (x_i - \mu) \right).
\]
Expanding and eliminating the constant terms due to proportionality:
\[
\propto \exp\left( -\frac{1}{2} \left[ \mu^\top \Sigma_0^{-1} \mu - \mu^\top \Sigma_0^{-1} \mu_0 - \mu_0^\top \Sigma_0^{-1} \mu + n \mu^\top \Sigma^{-1} \mu - \sum_{i=1}^{n} x_i^\top \Sigma^{-1} \mu - \sum_{i=1}^{n} \mu^\top \Sigma^{-1} x_i \right] \right).
\]
Adding in a constant term to “complete the square” and factorising (again, we can do this because of proportionality):
\[
\propto \exp\left( -\frac{1}{2} \left[ \mu^\top - \left( \mu_0^\top \Sigma_0^{-1} + n \bar{x}^\top \Sigma^{-1} \right) \left( \Sigma_0^{-1} + n\Sigma^{-1} \right)^{-1} \right] \left( \Sigma_0^{-1} + n\Sigma^{-1} \right) \left[ \mu - \left( \Sigma_0^{-1} + n\Sigma^{-1} \right)^{-1} \left( \Sigma_0^{-1} \mu_0 + n\Sigma^{-1} \bar{x} \right) \right] \right).
\]
Using (Ax)^⊤ = x^⊤A^⊤ and the fact that covariance matrices (and their inverses) are symmetric and hence invariant under the transpose, we obtain
\[
\left[ \left( \Sigma_0^{-1} + n\Sigma^{-1} \right)^{-1} \left( \Sigma_0^{-1} \mu_0 + n\Sigma^{-1} \bar{x} \right) \right]^\top = \left( \mu_0^\top \Sigma_0^{-1} + n \bar{x}^\top \Sigma^{-1} \right) \left( \Sigma_0^{-1} + n\Sigma^{-1} \right)^{-1}.
\]
So we have
\[
f_{\mu \mid x_1, \dots, x_n}(\mu) \propto \exp\left( -\frac{1}{2} \left[ \mu - \left( \Sigma_0^{-1} + n\Sigma^{-1} \right)^{-1} \left( \Sigma_0^{-1} \mu_0 + n\Sigma^{-1} \bar{x} \right) \right]^\top \left( \Sigma_0^{-1} + n\Sigma^{-1} \right) \left[ \mu - \left( \Sigma_0^{-1} + n\Sigma^{-1} \right)^{-1} \left( \Sigma_0^{-1} \mu_0 + n\Sigma^{-1} \bar{x} \right) \right] \right),
\]
which means the posterior distribution is a multivariate normal N_d(µ̂, Σ̂), where
\[
\hat{\mu} = \left( \Sigma_0^{-1} + n\Sigma^{-1} \right)^{-1} \left( \Sigma_0^{-1} \mu_0 + n\Sigma^{-1} \bar{x} \right)
\quad \text{and} \quad
\hat{\Sigma} = \left( \Sigma_0^{-1} + n\Sigma^{-1} \right)^{-1}.
\]
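As an illustrative sketch (my own addition; the values of d, n, µ_0, Σ_0 and Σ below are arbitrary and the data are simulated), the posterior quantities µ̂ and Σ̂ could be computed numerically in R as follows:

set.seed(1)
d <- 2; n <- 50
mu0    <- c(0, 0)                       # prior mean
Sigma0 <- diag(2)                       # prior covariance
Sigma  <- matrix(c(1, 0.3, 0.3, 1), 2)  # known data covariance

# simulate data with true mean (1, -1), purely for illustration
x <- matrix(rnorm(n * d), n, d) %*% chol(Sigma) + matrix(c(1, -1), n, d, byrow = TRUE)
xbar <- colMeans(x)

A <- solve(Sigma0) + n * solve(Sigma)   # Sigma0^{-1} + n Sigma^{-1}
Sigma.hat <- solve(A)                   # posterior covariance
mu.hat <- Sigma.hat %*% (solve(Sigma0) %*% mu0 + n * solve(Sigma) %*% xbar)
mu.hat                                  # posterior mean, shrunk between mu0 and xbar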
Part (b) We now derive Jeffreys’ prior π_J(µ) for µ. We have the likelihood function from above:
\[
L(x_1, \dots, x_n \mid \mu) = \frac{1}{(2\pi)^{nd/2} |\Sigma|^{n/2}} \exp\left( -\frac{1}{2} \sum_{i=1}^{n} (x_i - \mu)^\top \Sigma^{-1} (x_i - \mu) \right).
\]

Lemma. If x is an n × 1 vector, and A is an n × n matrix, then we have
\[
\frac{d}{dx}\, x^\top A x = x^\top \left( A + A^\top \right).
\]
Proof. We shall use Einstein’s summation convention for this proof for clarity. Let x = (x_1, . . . , x_n)^⊤, let e_j be the j-th basis column vector, and let [A]_{ij} = a_{ij}. Then
\[
\frac{d}{dx}\, x^\top A x = \nabla\left( x_i a_{ij} x_j \right) = \left( 2 a_{ii} x_i + (a_{ij} + a_{ji}) x_j \right) e_i \quad \text{where } j \neq i,
\]
since we have one x_i^2 term and the rest are x_i x_j terms, so
\[
\frac{d}{dx}\, x^\top A x = \left( x_i a_{ii} + a_{ij} x_j \right) e_i + \left( x_i a_{ii} + a_{ji} x_j \right) e_i = x^\top \left( A + A^\top \right).
\]
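As an informal numerical sanity check of the lemma (my own addition, not part of the proof), the analytic derivative x^⊤(A + A^⊤) can be compared with a finite-difference gradient in R:

set.seed(2)
n <- 4
A <- matrix(rnorm(n * n), n, n)     # a generic (non-symmetric) matrix
x <- rnorm(n)
f <- function(x) as.numeric(t(x) %*% A %*% x)

# central finite differences for the gradient of f at x
h <- 1e-6
grad.fd <- sapply(1:n, function(j) {
  e <- numeric(n); e[j] <- h
  (f(x + e) - f(x - e)) / (2 * h)
})

grad.analytic <- as.numeric(t(x) %*% (A + t(A)))
max(abs(grad.fd - grad.analytic))   # should be very close to zero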
Now, returning to the derivation of Jeffreys’ prior:
\[
\log L(x_1, \dots, x_n \mid \mu) = -\log\left( (2\pi)^{nd/2} |\Sigma|^{n/2} \right) - \frac{1}{2} \sum_{i=1}^{n} (x_i - \mu)^\top \Sigma^{-1} (x_i - \mu)
\]
\[
\frac{d}{d\mu} \log L = -\frac{1}{2}\, \frac{d}{d\mu} \left[ n \mu^\top \Sigma^{-1} \mu - \sum_{i=1}^{n} x_i^\top \Sigma^{-1} \mu - \sum_{i=1}^{n} \mu^\top \Sigma^{-1} x_i + \sum_{i=1}^{n} x_i^\top \Sigma^{-1} x_i \right]
= -\frac{1}{2}\, \frac{d}{d\mu} \left[ n \mu^\top \Sigma^{-1} \mu - n \bar{x}^\top \Sigma^{-1} \mu - n \mu^\top \Sigma^{-1} \bar{x} + \sum_{i=1}^{n} x_i^\top \Sigma^{-1} x_i \right].
\]
Using the above lemma, with the fact that Σ and hence Σ^{-1} are symmetric, and noting that d/dx (x^⊤A) = d/dx (A^⊤x)^⊤ = A, where A is an n × k matrix with k ∈ Z^+ (a result we can confirm easily using summation notation):
\[
\frac{d}{d\mu} \log L = -n \left( \mu^\top \Sigma^{-1} - \bar{x}^\top \Sigma^{-1} \right)
\qquad \therefore \quad \frac{d^2}{d\mu^2} \log L = -n\Sigma^{-1}.
\]
\[
\therefore \quad \pi_J(\mu) \propto \left| \mathbb{E}\left[ -\frac{d^2}{d\mu^2} \log L \right] \right|^{1/2} = \left| n\Sigma^{-1} \right|^{1/2} \propto 1,
\]
since the determinant of a constant matrix, and hence its square root, is itself a constant. We see that Jeffreys’ prior for the multivariate normal distribution with fixed covariance matrix and unknown mean vector is simply proportional to a constant. This result is similar to the one for a univariate Gaussian distribution with fixed variance, which also has a constant (improper) distribution for its Jeffreys’ prior.
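As a rough numerical illustration (my own addition, with arbitrary example values), the claim that the Hessian of the log-likelihood is the constant matrix −nΣ^{-1}, irrespective of µ, can be checked with finite differences in R:

set.seed(3)
d <- 2; n <- 30
Sigma <- matrix(c(2, 0.5, 0.5, 1), 2)
x <- matrix(rnorm(n * d), n, d) %*% chol(Sigma)   # simulated data, covariance Sigma

# log-likelihood in mu, dropping the constant normalising term
loglik <- function(mu) {
  -0.5 * sum(apply(x, 1, function(xi) t(xi - mu) %*% solve(Sigma) %*% (xi - mu)))
}

# finite-difference Hessian of loglik at a given mu
hess <- function(mu, h = 1e-4) {
  H <- matrix(0, d, d)
  for (j in 1:d) for (k in 1:d) {
    ej <- numeric(d); ej[j] <- h
    ek <- numeric(d); ek[k] <- h
    H[j, k] <- (loglik(mu + ej + ek) - loglik(mu + ej - ek) -
                loglik(mu - ej + ek) + loglik(mu - ej - ek)) / (4 * h^2)
  }
  H
}

hess(c(0, 0))       # approximately -n * solve(Sigma) ...
hess(c(5, -3))      # ... and the same at any other mu
-n * solve(Sigma)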
Question 3

Part (a) We know that p̂ is an estimate of the ratio of the area of the circle to the area of the square. This ratio’s true value is πr²/(2r)² = π/4, so this means 4p̂ is an estimate of π.
Part (b)

> n <- 1000
> x1 <- runif(n, -1, 1)
> x2 <- runif(n, -1, 1)
> ind <- ((x1^2 + x2^2) < 1)
> pi.hat <- 4 * (sum(ind) / n)
> pi.hat
[1] 3.156

The above R code gives a one-trial estimate of 4p̂ = π̂ = 3.156.
Part (c) We know that b_i is a Bernoulli r.v. with probability π/4, so its variance is (π/4)(1 − π/4). Then the sampling variability of π̂ = (4/n) Σ_{i=1}^n b_i is simply
\[
\frac{4^2}{n^2} \times n \times \frac{\pi}{4} \left( 1 - \frac{\pi}{4} \right) = \frac{\pi(4 - \pi)}{n},
\]
so by the Central Limit Theorem we have
\[
\hat{\pi}_n \xrightarrow{d} N\!\left( \pi, \frac{\pi(4 - \pi)}{n} \right).
\]
Note: we can re-write this in terms of p as
\[
\hat{\pi}_n \xrightarrow{d} N\!\left( 4p, \frac{16p(1 - p)}{n} \right).
\]
Part (d)

n <- 1000
p <- 0.7854
pi <- 4*p
var <- 16*p*(1-p)/n
pi.hat <- c()
for (i in c(1:1000)) {
  x1 <- runif(n, -1, 1)
  x2 <- runif(n, -1, 1)
  ind <- ((x1^2 + x2^2) < 1)
  pi.hat[i] <- 4 * (sum(ind) / n)
}
hist(pi.hat, breaks = 20, freq = FALSE)
x <- seq(min(pi.hat), max(pi.hat), length = 100)
y <- dnorm(x, mean = 4*p, sd = sqrt(var))
points(x, y, type = "l")

[Figure: histogram of pi.hat (x-axis roughly 3.00 to 3.30, y-axis density) with the overlaid normal density curve.]
We can see that the histogram fits the overlay distribution fairly well.
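To complement the visual comparison, a short numerical check (my own addition, run after the simulation loop above so that pi.hat, p and n are still in scope) could compare the simulated mean and standard deviation of pi.hat with the theoretical values 4p and sqrt(16p(1-p)/n) used for the overlay:

c(mean(pi.hat), 4 * p)                       # simulated vs. theoretical mean
c(sd(pi.hat), sqrt(16 * p * (1 - p) / n))    # simulated vs. theoretical standard deviation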
Part (e) We know that the variance is given by 16p(1 − p)/n, which is maximised at p = 0.5, giving 4/n. We choose to maximise the variance since this will result in maximal Monte Carlo sampling variability, which is what we need for the most conservative estimate of the sample size n required to estimate π to within 0.01 with at least 95% probability. Solving for n:
\[
P\left( |\hat{\pi} - \pi| \leq 0.01 \right) \geq 0.95.
\]
We apply the CLT and use a normal approximation to get:
\[
P\left( -\frac{0.01}{2/\sqrt{n}} \leq Z \leq \frac{0.01}{2/\sqrt{n}} \right) \geq 0.95 \quad \text{where } Z \sim N(0, 1)
\]
\[
2\left[ P\left( Z \leq \frac{0.01}{2/\sqrt{n}} \right) - 0.5 \right] \geq 0.95
\]
\[
P\left( Z \leq \frac{0.01}{2/\sqrt{n}} \right) \geq 0.975
\]
\[
\frac{0.01}{2/\sqrt{n}} \geq 1.96 \quad \Longrightarrow \quad \sqrt{n} \geq 392 \quad \Longrightarrow \quad n \geq 153664.
\]
So n = 153664 is a conservative sample size for estimating π to within 0.01 with at least 95% probability. To be even more conservative, we could round up to 160000 samples (after all, we are using a normal approximation).
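A short R sketch (my own addition) reproduces the sample-size calculation and, using the worst-case variance assumed above, roughly checks the coverage at n = 153664 by simulation:

z <- 1.96                               # standard normal quantile used above
n.req <- ceiling((2 * z / 0.01)^2)      # (2z / 0.01)^2 = 392^2 = 153664
n.req

# Monte Carlo check of the coverage at this sample size (moderately slow)
set.seed(4)
hits <- replicate(200, {
  x1 <- runif(n.req, -1, 1)
  x2 <- runif(n.req, -1, 1)
  pi.est <- 4 * mean(x1^2 + x2^2 < 1)
  abs(pi.est - base::pi) <= 0.01        # base::pi in case pi was reassigned in part (d)
})
mean(hits)                              # should be about 0.95 or higher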