Nov. 22, 2003, revised Dec. 27, 2003
Hayashi Econometrics Econometrics
Solution to Chapter 1 Analytical Exercises 1. (Reproducin (Reproducing g the answer answer on p. 84 of the book)
(y−Xβ ) (y − Xβ ) = [(y − Xb) + X(b − β )] [(y − Xb) + X(b − β)] (by the add-and-subtract strategy)
= [( y − Xb) + (b − β ) X ][(y − Xb) + X(b − β )]
= ( y − Xb) (y − Xb) + ( b − β ) X (y − Xb)
+ (y − Xb) X(b − β) + ( b − β) X X(b − β)
= ( y − Xb) (y − Xb) + 2(b − β) X (y − Xb) + ( b − β) X X(b − β) (since (b − β ) X (y − Xb) = ( y − Xb) X(b − β)) = ( y − Xb) (y − Xb) + ( b − β ) X X(b − β )
(since X (y − Xb) = 0 by the normal equations) ≥ (y − Xb) (y − Xb) n
(since (b − β ) X X(b − β ) = z z =
zi2 ≥ 0 where z ≡ X(b − β)). )).
i=1
2. (a), (a), (b). (b). If X is an n × K matrix matrix of full column rank, then X X is symmetric and invertible. It is very straightforward to show (and indeed you’ve been asked to show in the text) that MX ≡ In − X(X X)−1 X is symmetric and idempotent and that M XX = 0 . In this question, set X = 1 (vector of ones). (c) M1 y = [In − 1(1 1)−1 1 ]y
1 = n)) 11 y (since 1 1 = n n n 1 = y − 1 yi = y − 1· y n = y −
i=1
(d) Replace “ y” by “X” in (c). 3. Special case of the solution solution to the next exercise. exercise. 4. From the normal equations equations (1.2.3) of the text, we obtain (a)
X1 X2
. [X1 .. X 2 ]
b1 b2
=
X1 X2
y.
Using the rules of multiplication of partitioned matrices, it is straightforward to derive ( ∗) and (∗∗) from the above. 1
(b) By premultiplying both sides of (∗) in the question by X1 (X1 X1 )−1 , we obtain X1 (X1 X1 )−1 X1 X1 b1 = −X1 (X1 X1 )−1 X1 X2 b2 + X1 (X1 X1 )−1 X1 y
⇔
X1 b1 = −P1 X2 b2 + P1 y
Substitution of this into ( ∗∗) yields X2 (−P1 X2 b2 + P1 y) + X2 X2 b2 = X 2 y
⇔
X2 (I − P1 )X2 b2 = X 2 (I − P1 )y
⇔ ⇔
X2 M1 X2 b2 = X 2 M1 y
X2 M1 M1 X2 b2 = X 2 M1 M1 y (since M 1 is symmetric & idempotent)
X2 X2 b2 = X2 y.
⇔
Therefore,
b2 = (X2 X2 )−1 X2 y
(The matrix X2 X2 is invertible because X2 is of full column rank. To see that X2 is of full column rank, suppose not. Then there exists a non-zero vector c such that X2 c = 0 . But
X2 c = X 2 c − X1 d where d ≡ (X1 X1 )−1 X1 X2 c. That is, Xπ X π = 0 for π ≡
−d . This is c
. a contradiction because X = [X1 .. X2 ] is of full column rank and π = 0 .) (c) By premultipl premultiplying ying both sides of y y = X 1 b1 + X2 b2 + e by M 1 , we obtain M1 y = M 1 X1 b1 + M1 X2 b2 + M1 e.
Since M1 X1 = 0 and y ≡ M1 y, the above equation can be rewritten as
y = M 1 X2 b2 + M1 e
= X2 b2 + M1 e.
M1 e = e because
M1 e = (I − P1 )e
= e − P1 e = e − X1 (X1 X1 )−1 X1 e = e (since X1 e = 0 by normal equations). equations). (d) From (b), we have
b2 = (X2 X2 )−1 X2 y
= (X2 X2 )−1 X2 M1 M1 y = (X2 X2 )−1 X2 y.
Therefore, b2 is the OLS coefficient estimator for the regression y on X2 . The resi residua duall vector from the regression is
y − X2 b2 = (y − y) + ( y − X2 b2 )
= (y − M1 y) + ( y − X2 b2 ) = (y − M1 y) + e (by (c)) = P 1 y + e. 2
(b) By premultiplying both sides of (∗) in the question by X1 (X1 X1 )−1 , we obtain X1 (X1 X1 )−1 X1 X1 b1 = −X1 (X1 X1 )−1 X1 X2 b2 + X1 (X1 X1 )−1 X1 y
⇔
X1 b1 = −P1 X2 b2 + P1 y
Substitution of this into ( ∗∗) yields X2 (−P1 X2 b2 + P1 y) + X2 X2 b2 = X 2 y
⇔
X2 (I − P1 )X2 b2 = X 2 (I − P1 )y
⇔ ⇔
X2 M1 X2 b2 = X 2 M1 y
X2 M1 M1 X2 b2 = X 2 M1 M1 y (since M 1 is symmetric & idempotent)
X2 X2 b2 = X2 y.
⇔
Therefore,
b2 = (X2 X2 )−1 X2 y
(The matrix X2 X2 is invertible because X2 is of full column rank. To see that X2 is of full column rank, suppose not. Then there exists a non-zero vector c such that X2 c = 0 . But
X2 c = X 2 c − X1 d where d ≡ (X1 X1 )−1 X1 X2 c. That is, Xπ X π = 0 for π ≡
−d . This is c
. a contradiction because X = [X1 .. X2 ] is of full column rank and π = 0 .) (c) By premultipl premultiplying ying both sides of y y = X 1 b1 + X2 b2 + e by M 1 , we obtain M1 y = M 1 X1 b1 + M1 X2 b2 + M1 e.
Since M1 X1 = 0 and y ≡ M1 y, the above equation can be rewritten as
y = M 1 X2 b2 + M1 e
= X2 b2 + M1 e.
M1 e = e because
M1 e = (I − P1 )e
= e − P1 e = e − X1 (X1 X1 )−1 X1 e = e (since X1 e = 0 by normal equations). equations). (d) From (b), we have
b2 = (X2 X2 )−1 X2 y
= (X2 X2 )−1 X2 M1 M1 y = (X2 X2 )−1 X2 y.
Therefore, b2 is the OLS coefficient estimator for the regression y on X2 . The resi residua duall vector from the regression is
y − X2 b2 = (y − y) + ( y − X2 b2 )
= (y − M1 y) + ( y − X2 b2 ) = (y − M1 y) + e (by (c)) = P 1 y + e. 2
This does not equal e because P1 y is not necessarily necessarily zero. The SSR from the regression of y on X2 can be written as
(y − X2 b2 ) (y − X2 b2 ) = ( P1 y + e) (P1 y + e) = ( P1 y) (P1 y) + e e (since P 1 e = X 1 (X1 X1 )−1 X1 e = 0 ).
This does not equal e e if P1 y is not zero.
(e) From (c), y = X2 b2 + e. So
y y = (X2 b2 + e) (X2 b2 + e)
= b 2 X2 X2 b2 + e e (since X2 e = 0 ).
Since b2 = (X2 X2 )−1 X2 y, we have b 2 X2 X2 b2 = y X2 (X2 M1 X2 )−1 X2 y. (f)
(i) Let b1 be the OLS coefficient estimator for the regression of y on X 1 . Then
b1 = (X1 X1 )−1 X1 y
= ( X1 X1 )−1 X1 M1 y
= ( X1 X1 )−1 (M1 X1 ) y = 0 (since M1 X1 = 0 ).
So S So S SR 1 = (y − X1 b1 ) (y − X1 b1 ) = y y. (ii) Since the residual vector vector from the regression of y on X2 equals e by (c), S (c), SSR SR 2 = e e. (iii) From the Frisch-W Frisch-Waugh augh Theorem, the residuals from the regression of y on X1 and So SSR SR 3 = e e. X2 equal those from the regression of M 1 y (= y) on M 1 X2 (= X2 ). So S
5. (a) The hint is as good as the answer.
(b) Let ε ≡ y −Xβ, the residuals from the restricted regression. By using the add-and-subtract strategy, we obtain
ε ≡ y − Xβ = (y − Xb) + X(b − β).
So
SSR SS RR = [(y − Xb) + X(b − β)] [(y − Xb) + X(b − β )]
= ( y − Xb) (y − Xb) + ( b − β ) X X(b − β) But S But S SR U = ( y − Xb) (y − Xb), so
(since X (y − Xb) = 0 ).
SS RR − SSR SS RU = ( b − β ) X X(b − β )
= (Rb − r) [R(X X)−1 R ]−1 (Rb − r)
(usi (using ng the the expr expres esio ion n for for β from (a))
= λ R(X X)−1 R λ
(using (using the expresion expresion for λ from (a))
= ε X(X X)−1 X ε
(by the first order order conditio conditions ns that that X (y − Xβ) = R λ)
= ε Pε. (c) The F The F -ratio -ratio is defined as F ≡
(Rb − r) [R(X X)−1 R ]−1 (Rb − r)/r s2 3
(where r (where r = #r)
(1.4.9)
Since (Rb − r) [R(X X)−1 R ]−1 (Rb − r) = SSR S SR R − SS RU as shown above, the F -ratio F -ratio can be rewritten as (SS RR − SSR SS RU )/r 2 s (SS RR − SSR SS RU )/r = e e/(n − K ) (SS RR − SSR SS RU )/r = SSR SS RU /(n − K )
F =
Therefore, (1.4.9)=(1.4.11). 6. (a) Unrestricted model : y = Xβ X β + ε, where
y (N ×1)
=
y1 .. .
,
X
(N ×K )
yn
=
1 .. .
x12 . . . x1K .. .. ... . . 1 xn2 . . . xnK
,
=
β (K ×1)
β 1 .. .
.
β n
Restricted Restricted model: y = Xβ X β + ε, Rβ R β = r , where
R
((K −1)×K )
=
0 1 0 0 0 1 .. .. . . 0 0
... 0 ... 0 .. . 1
,
r
((K −1)×1)
=
0 .. .
.
0
Obviously, the restricted OLS estimator of β is
β
(K ×1)
=
y 0 .. . 0
.
So Xβ =
y y .. . y
= 1 · y.
(You can use the formula for the unrestricted OLS derived in the previous exercise, β = verify y this.) this.) If SSR SS RU and SS RR are the b − (X X)−1 R [R(X X)−1 R ]−1 (Rb − r), to verif minimized sums of squared residuals from the unrestricted and restricted models, they are calculated as n
(yi − y )2
SSR SS RR = (y − Xβ) (y − Xβ) =
i=1
n
SS RU = ( y − Xb) (y − Xb) = e e =
e2i
i=1
Therefore, n
SS RR − SSR SS RU =
i=1
4
n
2
(yi − y ) −
i=1
e2i .
(A)
On the other hand,
(b − β) (X X)(b − β ) = (Xb − Xβ) (Xb − Xβ) n
(yi − y)2 .
=
i=1
Since S SR R − SSRU = (b − β) (X X)(b − β) (as shown in Exercise 5(b)), n
n
2
n
e2i
(yi − y) −
i=1
(yi − y)2 .
=
i=1
(B)
i=1
(b) F = = =
(SSRR − SSRU )/(K − 1) n 2 i=1 ei /(n − K )
(
n i=1 (yi
− y)2 −
n 2 i=1 ei )/(K −
n 2 i=1 ei /(n
(yi −y)2 /(K −1) n (yi −y)2 i=1 n e 2i /(n−K ) i=1 n (yi −y)2 i=1
1)
(by equation (A) above)
− K )
n 2 i=1 (yi − y) /(K − 1) n 2 i=1 ei /(n − K ) n
(by Exercise 5(c))
(by equation (B) above) n
i=1
=
=
R2 /(K − 1) (1 − R2 )/(n − K )
(by dividing both numerator & denominator by
(yi − y)2 )
i=1
(by the definition or R2 ).
7. (Reproducing the answer on pp. 84-85 of the book)
(a) βGLS − β = Aε where A ≡ (X V−1 X)−1 X V−1 and b − β GLS = Bε where B ≡ (X X)−1 X − (X V−1 X)−1 X V−1 . So
Cov(β GLS − β, b − β GLS) = Cov(Aε, Bε) = A Var(ε)B = σ 2 AVB . It is straightforward to show that AVB = 0 . (b) For the choice of H indicated in the hint,
Var(β ) − Var(β GLS) = −CVq−1 C . If C = 0 , then there exists a nonzero vector z such that C z ≡ v = 0 . For such z ,
z [Var(β) − Var(βGLS)]z = −v Vq−1 v < 0
which is a contradiction because β GLS is efficient.
5
(since Vq is positive definite),
Nov. 25, 2003, Revised February 23, 2010
Hayashi Econometrics
Solution to Chapter 2 Analytical Exercises 1. For any ε > 0,
1 n
Prob( zn > ε) =
| |
→ 0
as n
→ ∞.
So, plim zn = 0. On the other hand, E(zn ) = which means that limn→∞ E(zn ) =
n
− 1 · 0 + 1 · n2 = n, n
n
∞.
2. As shown in the hint, (z n
− µ)2 = (z − E(z n
2
n ))
+ 2(z n
− E(z
n ))(E(z n )
− µ) + (E(z ) − µ)2. n
Take the expectation of both sides to obtain E[(z n
− µ)2] = E[(z − E(z n
2
n ))
] + 2E[z n
= Var(z n ) + (E(z n )
− µ)2
− E(z
− µ) + (E(z ) − µ)2 − E(z )] = E(z ) − E(z
n )](E(z n )
(because E[z n
n
n
n
n)
= 0).
Take the limit as n
→ ∞ of both sides to obtain lim E[(z − µ)2 ] = lim Var(z ) + lim (E(z ) − µ)2
n→∞
n
n→∞
n
n
n→∞
= 0 (because lim E(z n ) = µ, lim Var(z n ) = 0). n→∞
n→∞
Therefore, zn
→m.s. µ. By Lemma 2.2(a), this implies z →p µ. n
3. (a) Since an i.i.d. process is ergodic stationary, Assumption 2.2 is implied by Assumption 2.2 . Assumptions 2.1 and 2.2 imply that gi xi εi is i.i.d. Since an i.i.d. process with mean zero is mds (martingale differences), Assumption 2.5 is implied by Assumptions 2.2 and 2.5 .
≡ ·
(b) Rewrite the OLS estimator as
− β = (X X) 1X ε = S 1 g . (A) {x } is i.i.d., {x x } is i.i.d. So by Kolmogorov’s Second Strong
b
Since by Assumption 2.2 LLN, we obtain
−
−
xx
i
i i
→p Σ
S
xx
xx
The convergence is actually almost surely, but almost sure convergence implies convergence in probability. Since Σ is invertible by Assumption 2.4, by Lemma 2.3(a) we get xx
S−1 xx
1
→p Σ
1
−
xx
.
Similarly, under Assumption 2.1 and 2.2 gi is i.i.d. By Kolmogorov’s Second Strong LLN, we obtain
{ }
→p E(g ),
g
i
which is zero by Assumption 2.3. So by Lemma 2.3(a), S−1 g xx
→p Σ 1· 0 = 0. −
xx
Therefore, plimn→∞(b β) = 0 which implies that the OLS estimator b is consistent. Next, we prove that the OLS estimator b is asymptotically normal. Rewrite equation(A) above as
−
√ n(b − β) = S 1 √ ng. −
xx
As already observed, gi is i.i.d. with E(gi ) = 0 . The variance of gi equals E(gi gi ) = S since E(gi ) = 0 by Assumption 2.3. So by the Lindeberg-Levy CLT,
{ }
√ ng → N (0, S). d
→p Σ 1. Thus by Lemma 2.4(c), √ n(b − β) → N (0, Σ 1 S Σ 1). d
Furthermore, as already noted, S −1
−
xx
xx
−
−
xx
xx
4. The hint is as good as the answer. 5. As shown in the solution to Chapter 1 Analytical Exercise 5, SSR R SSRR
− SSR
U
= (Rb
−
− r) [R(X X)
1
U can
− SSR 1 (Rb − r).
be written as
R ]−
Using the restrictions of the null hypothesis, Rb
− r = R(b − β)
= R (X X)−1 X ε (since b = RS
−
1
xx
g (where g
1 n
≡
−
− β = (X X) x · ε .).
1
X ε)
n
i
i
i=1
Also [R(X X)−1 R]−1 = n [RS−1 R]−1 . So
·
SSRR
xx
− SSR
U
√
√
= ( n g) S−1 R (R S−1 R )−1 R S−1 ( n g). xx
xx
xx
Thus SSRR
− SSR
U
s2
√
√
= ( n g) S−1 R (s2 R S−1 R )−1 R S−1 ( n g) xx
xx
xx
1 = z n A− n zn ,
where
≡ R S 1(√ n g),
≡ s2 R S
−
zn
An
xx
By Assumption 2.2, plim S Lemma 2.4(c), we have:
= Σ
xx
xx
1
−
xx
R .
. By Assumption 2.5, −
→d N (0, RΣ
zn
1
xx
2
SΣ −1 R ). xx
√ ng →
d
N (0, S). So by
But, as shown in (2.6.4), S = σ 2 Σ under conditional homoekedasticity (Assumption 2.7). So the expression for the variance of the limiting distribution above becomes xx
RΣ−1 SΣ −1 R = σ 2 RΣ−1 R xx
xx
xx
≡ A.
Thus we have shown:
→d z, z ∼ N (0, A).
zn
2 2 As already observed, S p Σ . By Assumption 2.7, σ = E(εi ). So by Proposition 2.2, s2 p σ2 . Thus by Lemma 2.3(a) (the “Continuous Mapping Theorem”), An p A. Therefore, by Lemma 2.4(d), 1 zn A− z A−1 z. n zn xx
→
→
xx
→
→d
But since Var(z) = A , the distribution of z A−1 z is chi-squared with #z degrees of freedom. 6. For simplicity, we assumed in Section 2.8 that yi , xi is i.i.d. Collecting all the assumptions made in Section 2.8,
{
}
(i) (linearity) y i = x i β + εi . (ii) (random sample) yi , xi is i.i.d.
{
}
(iii) (rank condition) E( xi xi ) is non-singular. (iv) E(ε2i xi xi ) is non-singular. (v) (stronger version of orthogonality) E(εi xi ) = 0 (see (2.8.5)).
|
(vi) (parameterized conditional heteroskedasticity) E(ε2i xi ) = z i α.
|
These conditions together are stronger than Assumptions 2.1-2.5. (a) We wish to verify Assumptions 2.1-2.3 for the regression equation (2.8.8). Clearly, Assumption 2.1 about the regression equation (2.8.8) is satisfied by (i) about the original regression. Assumption 2.2 about (2.8.8) (that ε2i , xi is ergodic stationary) is satisfied by (i) and (ii). To see that Assumption 2.3 about (2.8.8) (that E(zi ηi ) = 0) is satisfied, note first that E(ηi xi ) = 0 by construction. Since zi is a function of xi , we have E(ηi zi ) = 0 by the Law of Iterated Expectation. Therefore, Assumption 2.3 is satisfied. The additional assumption needed for (2.8.8) is Assumption 2.4 that E( zi zi ) be nonsingular. With Assumptions 2.1-2.4 satisfied for (2.8.8), the OLS estimator α is consistent by Proposition 2.1(a) applied to (2.8.8).
{
}
|
|
(b) Note that α
α = (α
α)
(α
− − −∗∗ −
α) and use the hint.
(c) Regarding the first term of ( ), by Kolmogorov’s LLN, the sample mean in that term converges in probability to E(xi εi zi ) provided this population mean exists. But E(xi εi zi ) = E[zi xi E(εi zi )].
· ·
|
By (v) (that E(εi xi ) = 0) and the Law of Iterated Expectations, E( εi zi ) = 0. Thus E(xi εi zi ) = 0. Furthermore, plim(b β ) = 0 since b is consistent when Assumptions 2.1-2.4 (which are implied by Assumptions (i)-(vi) above) are satisfied for the original regression. Therefore, the first term of ( ) converges in probability to zero. Regarding the second term of ( ), the sample mean in that term converges in probability to E(x2i zi ) provided this population mean exists. Then the second term converges in probability to zero because plim(b β ) = 0.
|
|
−
∗∗ −
∗∗
3
√ n,
(d) Multiplying both sides of ( ) by
∗
√ · − √ −
√ n(α − α) = 1 = n
1 n
n
n
1
−
zi zi
i=1
1
1 n
−
zi zi
2 n(b
i=1
n
zi vi
i=1
1 β ) n
n
i=1
√ 1 x ε z + n(b − β )· (b − β ) i i i
n
n
x2i zi .
i=1
Under Assumptions 2.1-2.5 for the original regression (which are implied by Assumptions (i)-(vi) above), n(b β ) converges in distribution to a random variable. As shown in (c), n1 n p 0. So by Lemma 2.4(b) the first term in the brackets vanishes i=1 xi εi zi 2 (converges to zero in probability). As shown in (c), (b β ) n1 n i=1 xi zi vanishes provided E(x2i zi ) exists and is finite. So by Lemma 2.4(b) the second term, too, vanishes. Therefore, n(α α) vanishes, provided that E(zi zi ) is non-singular.
√ − →
√ −
−
7. This exercise is about the model in Section 2.8, so we continue to maintain Assumptions (i)(vi) listed in the solution to the previous exercise. Given the hint, the only thing to show is that the LHS of ( ) equals Σ −1 S Σ−1 , or more specifically, that plim n1 X VX = S . Write S as
∗∗
xx
xx
S = E(ε2i xi xi )
= E[E(ε2i xi )xi xi ]
|
= E(zi α xi xi )
(since E(ε2i xi ) = z i α by (vi)).
|
Since xi is i.i.d. by (ii) and since zi is a function of xi , zi αxi xi is i.i.d. So its sample mean converges in probability to its population mean E( zi α xi xi ), which equals S. The sample mean can be written as 1 n
n
zi αxi xi
i=1
1 = n =
n
vi xi xi
(by the definition of v i , where vi is the i-th diagonal element of V )
i=1
1 X VX. n
8. See the hint. 9. (a) E(gt gt−1 , gt−2 , . . . , g2 ) = E[E(gt εt−1 , εt−2 , . . . , ε1 ) gt−1 , gt−2 , . . . , g2 ] (by the Law of Iterated Expectations) = E[E(εt εt−1 εt−1 , εt−2 , . . . , ε1 ) gt−1 , gt−2 , . . . , g2 ] = E[εt−1 E(εt εt−1 , εt−2 , . . . , ε1 ) gt−1 , gt−2 , . . . , g2 ] (by the linearity of conditional expectations) =0 (since E(εt εt−1 , εt−2 , . . . , ε1 ) = 0).
|
| ·
| |
|
|
| |
4
(b) E(gt2 ) = E(ε2t ε2t−1 )
·
= E[E(ε2t ε2t−1 εt−1 , εt−2 , . . . , ε 1 )]
· | = E[E(ε |ε 1 , ε 2 t
(by the Law of Total Expectations)
2
t−2 , . . . , ε1 )εt−1 ]
t−
= E(σ 2 ε2t−1 )
(by the linearity of conditional expectations)
(since E(ε2t εt−1 , εt−2 , . . . , ε1 ) = σ 2 )
|
= σ 2 E(ε2t−1 ). But
E(ε2t−1 ) = E[E(ε2t−1 εt−2 , εt−3 , . . . , ε1 )] = E(σ 2 ) = σ 2 .
|
(c) If εt is ergodic stationary, then εt εt−1 is ergodic stationary (see, e.g., Remark 5.3 on p. 488 of S. Karlin and H. Taylor, A First Course in Stochastic Processes , 2nd. ed., Academic Press, 1975, which states that “For any function φ, the sequence Y n = φ(X n , X n+1 , . . . ) generates an ergodic stationary process whenever X n is ergodic stationary”.) Thus the Billingsley CLT (see p. 106 of the text) is applicable to nγ1 = n n1 n t=j +1 gt .
{ }
{ ·
}
{ } √
√
(d) Since ε 2t is ergodic stationary, γ0 converges in probability to E(ε2t ) = σ 2 . As shown in (c), 4 nγ1 n γ d N (0, σ ). So by Lemma 2.4(c) d N (0, 1). γ
√ →
√ → 1
0
10. (a) Clearly, E(yt ) = 0 for all t = 1, 2, . . . .
Cov(yt , yt−j ) =
(1 + θ12 + θ22 )σε2 (θ1 + θ1 θ2 )σε2 θ2 σε2 0
for j = 0 for j = 1, for j = 2, for j > 2,
So neither E(yt ) nor Cov(yt , yt−j ) depends on t. (b) E(yt yt−j , yt−j −1 , . . . , y0 , y−1 ) = E(yt εt−j , εt−j −1 , . . . , ε0 , ε−1 ) (as noted in the hint) = E(εt + θ1 εt−1 + θ2 εt−2 εt−j , εt−j −1 , . . . , ε0 , ε−1 )
|
=
|
|
εt + θ1 εt−1 + θ2 εt−2 θ1 εt−1 + θ2 εt−2 θ2 εt−2 0
for for for for
which gives the desired result.
5
j = 0, j = 1, j = 2, j > 2,
(c)
√
Var( n y) = =
1 [Cov(y1 , y1 + n
··· + y
n)
··· + Cov(y
n , y1 +
··· + y
1 [(γ 0 + γ 1 + + γ n−2 + γ n−1 ) + (γ 1 + γ 0 + γ 1 + n + + (γ n−1 + γ n−2 + + γ 1 + γ 0 )]
···
···
=
+
1 [nγ 0 + 2(n n
··· + γ
n−2 )
···
− 1)γ 1 + ··· + 2(n − j)γ + ··· + 2γ
n−1 ]
j
−
n−1
= γ 0 + 2
n )]
j γ j . n
1
j =1
(This is just reproducing (6.5.2) of the book.) Since γ j = 0 for j > 2, one obtains the desired result. (d) To use Lemma 2.1, one sets z n = ny. However, Lemma 2.1, as stated in the book, inadvertently misses the required condition that there exist an M > 0 such that E( zn s+δ ) < M for all n for some δ > 0. Provided this technical condition is satisfied, the variance of the limiting distribution of ny is the limit of Var( ny), which is γ 0 + 2(γ 1 + γ 2 ).
√
√
| |
√
11. (a) In the auxiliary regression, the vector of the dependent variable is e and the matrix of . regressors is [ X .. E]. Using the OLS formula,
1
α = B
1
−
n
X e
1
E e n
.
X e = 0 by the normal equations for the original regression. The j -th element of n1 E e is
1 (ej +1 e1 + n
···
1 + en en−j ) = nt
n
et et−j .
=j +1
which equals γj defined in (2.10.9). (b) The j-th column of n1 X E is n1 n t=j +1 xt et−j (which, incidentally, equals µj defined on p. 147 of the book). Rewrite it as follows.
1 nt
n
·
· − − · − xt et−j
=j +1
1 nt
n
1 = nt
n
=
xt (εt−j
xt−j (b
xt εt−j
1 nt
β ))
=j +1
=j +1
n
xt xt−j
(b
=j +1
The last term vanishes because b is consistent for β . Thus n1 probability to E(xt εt−j ). The (i, j) element of the symmetric matrix n1 E E is, for i
·
1 (e1+i−j e1 + n
···
1 + en−j en−i ) = nt
n t=j +1
xt et−j converges in
≥ j,
n−j
=1+i−j
6
− β)
et et−(i−j ) .
·
Using the relation et = ε t
− x (b − β), this can be rewritten as
t
n−j
1 nt
εt εt−(i−j )
=1+i−j
−
− (b −
n−j
1 nt
(xt εt−(i−j ) + xt−(i−j ) εt ) (b
=1+i−j
1 β) nt
n−j
xt xt−(i−j ) (b
=1+i−j
− β)
− β).
The type of argument that is by now routine (similar to the one used on p. 145 for (2.10.10)) shows that this expression converges in probability to γ i−j , which is σ 2 for i = j and zero for i = j.
(c) As shown in (b), plim B = B. Since Σ is non-singular, B is non-singular. So B−1 converges in probability to B−1 . Also, using an argument similar to the one used in (b) for showing that plim n1 E E = I p , we can show that plim γ = 0 . Thus the formula in (a) shows that α converges in probability to zero. xx
1
(d) (The hint should have been: “ n E e = γ . Show that
SSR
n
=
1 n
ee
the auxiliary regression can be written as
−
α
0 .” The SSR from γ
. . 1 1 SSR = (e [X .. E]α) (e [X .. E ]α) n n . 1 = (e [X .. E]α) e (by the normal equation for the auxiliary regression) n . 1 1 = e e α [X .. E] e n n
− − −
1 = e e n =
1 ee n
−
− − − − ≡ ≡ − − 1
α
n
X e
1
n
α
E e
0
γ
(since X e = 0 and
1 E e = γ ). n
As shown in (c), plim α = 0 and plim γ = 0 . By Proposition 2.2, we have plim n1 e e = σ 2 . Hence SSR/n (and therefore SSR /(n K p)) converges to σ 2 in probability. (e) Let
R
0
( p×K )
.. . I p
. [X .. E ].
, V
The F -ratio is for the hypothesis that R α = 0 . The F -ratio can be written as 1
−
(Rα) R(V V)−1 R (Rα)/p F = . SSR/(n K p)
7
( )
∗
Using the expression for α in (a) above, R α can be written as
Rα =
.. . I p
0
( p×K )
=
0
(K ×1)
1
−
B
γ
( p×1)
.. . I p
0
( p×K )
= B22 γ .
B11 (K ×K ) B21 ( p×K )
B12 (K × p) B22 ( p× p)
0
(K ×1)
γ
( p×1)
( )
∗∗
Also, R (V V)−1 R in the expression for F can be written as R(V V)−1 R =
1 R B−1 R n
1 = n
0
( p×K )
(since .. . I p
1 V V = B) n
B11 (K ×K ) B21 ( p×K )
B12 (K × p) B22 ( p× p)
0
(K × p)
I p
1 22 B . n Substitution of ( ) and ( ) into ( ) produces the desired result. (f) Just apply the formula for partitioned inverses. =
∗∗∗
√ − √ · −
∗∗
(
∗ ∗ ∗)
∗
→ − − → → − −
(g) Since nρ nγ /σ2 p 0 and Φ p Φ, it should be clear that the modified Box-Pierce Q (= n ρ (I p Φ)−1 ρ) is asymptotically equivalent to n γ (I p Φ)−1 γ /σ 4 . Regarding the pF statistic given in (e) above, consider the expression for B22 given in (f) above. Since the j -th element of n1 X E is µj defined right below (2.10.19) on p. 147, we have
→
1 1 E X S−1 XE , n n
s2 Φ =
so
xx
B22 =
1 EE n
As shown in (b), n1 E E p σ 2 I p . Therefore, B22 cally equivalent to n γ (I p Φ)−1 γ /σ 4 .
1
−
s2 Φ
.
1 p σ2 (I p
Φ)−1 , and pF is asymptoti-
12. The hints are almost as good as the answer. Here, we give solutions to (b) and (c) only. (b) We only prove the first convergence result. 1 n
r
xt xt =
t=1
r n
1 r
r
xt xt = λ
t=1
The term in parentheses converges in probability to Σ (c) We only prove the first convergence result. r
√ 1 n
t=1
xt εt =
·
xx
1 r
r
xt xt
.
t=1
as n (and hence r) goes to infinity.
√ · √ √ · r n
1 r
r
xt εt =
t=1
λ
1 r
r
xt εt
.
t=1
The term in parentheses converges in distribution to N (0, σ 2 Σ ) as n (and hence r) goes to infinity. So the whole expression converges in distribution to N (0, λ σ 2 Σ ). xx
xx
8
December 27, 2003
Hayashi Econometrics
Solution to Chapter 3 Analytical Exercises 1. If A is symmetric and idempotent, then A = A and AA = A. So x Ax = x AAx = x A Ax = z z 0 where z Ax.
≥
≡
2. (a) By assumption, xi , εi is jointly stationary and ergodic, so by ergodic theorem the first term of ( ) converges almost surely to E( x2i ε2i ) which exists and is finite by Assumption 3.5. (b) zi x2i εi is the product of x i εi and x i zi . By using the Cauchy-Schwarts inequality, we obtain
{
∗
}
|≤
E(x2i ε2i ) E(x2i zi2 ).
E( xi εi xi zi )
|
·
E(x2i ε2i ) exists and is finite by Assumption 3.5 and E( x2i zi2 ) exists and is finite by Assumption 3.6. Therefore, E( xi zi xi εi ) is finite. Hence, E(xi zi xi εi ) exists and is finite. (c) By ergodic stationarity the sample average of z i x2i εi converges in probability to some finite number. Because δ is consistent for δ by Proposition 3.1, δ δ converges to 0 in probability. Therefore, the second term of ( ) converges to zero in probability. (d) By ergodic stationarity and Assumption 3.6 the sample average of z i2 x2i converges in probability to some finite number. As mentioned in (c) δ δ converges to 0 in probability. Therefore, the last term of ( ) vanishes.
|
·
|
·
−
∗
−
∗
3. (a) Q
≡
= =
−1 Σxz WΣxz xz WΣxz (Σxz WSWΣxz ) −1 −1 Σxz C CΣxz C WΣxz )−1 Σxz WΣxz xz WΣxz (Σxz WC H H Σxz WΣxz (G G)−1 Σxz WΣxz −1
1
−
Σxz S
Σxz
−Σ −Σ
− H H − H G(G G) G H H [I − G(G G) G ]H
= = K = H MG H.
1
−
(b) First, we show that M G is symmetric and idempotent. MG
MG MG
= = = =
−
−
−
−
1
IK
IK
−
1
−
MG .
= IK IK G(G G) 1 G IK = IK G(G G) 1 G = MG .
−
1
− G(G(G G) ) − G((G G) G ) − G(G G) G
IK
− I
K
G(G G)
1
−
1
−
G + G(G G)
Thus, MG is symmetric and idempotent. For any L-dimensional vector x ,
x Qx
= x H MG Hx = z MG z (where z Hx) 0 (since M G is positive semidefinite) .
≡
≥
Therefore, Q is positive semidefinite. 1
1
−
G G(G G)
G
4. (the answer on p. 254 of the book simplified) If W is as defined in the hint, then
WSW = W
1
−
and Σxz WΣxz = Σzz A
Σzz .
So (3.5.1) reduces to the asymptotic variance of the OLS estimator. By (3.5.11), it is no smaller than (Σxz S 1 Σxz ) 1 , which is the asymptotic variance of the efficient GMM estimator.
−
−
− − − − − − −
5. (a) From the expression for δ (S 1 ) (given in (3.5.12)) and the expression for gn (δ ) (given in (3.4.2)), it is easy to show that g n (δ (S 1 )) = Bsxy . But Bsxy = Bg because −
−
1
Sxz )
−
1
Sxz )
−
−
−
Bsxy = (IK
Sxz (Sxz S
= (IK
Sxz (Sxz S
Sxz S
1
Sxz S
Sxz )
1
= C C, we obtain B S
1
1
1
−
= (Sxz
Sxz (Sxz S
= (Sxz
Sxz )δ + Bg
−
1
)sxy
1
)(Sxz δ + g)
(since yi = zi δ + εi )
1
Sxz (Sxz S
−
−
−
Sxz S
Sxz )δ + ( IK
1
−
Sxz )
1
−
= Bg.
(b) Since S
1
−
CB
−
1
−
)g
B = B C CB = (CB) (CB). But
= C(IK Sxz (Sxz S 1 Sxz ) 1 Sxz S 1 ) = C CSxz (Sxz C CSxz ) 1 Sxz C C = C A(A A) 1 A C (where A CSxz ) = [IK A(A A) 1 A ]C
Sxz S
− −
≡
−
−
−
−
−
−
≡
MC.
So B S 1 B = (MC) (MC) = C M MC. It should be routine to show that M is symmetric and idempotent. Thus B S 1 B = C MC. The rank of M equals its trace, which is
−
−
trace(M) = trace(IK A(A A) 1 A ) = trace(IK ) trace(A(A A)
−
− − A) ) − trace(A A(A A) )
= trace(IK = K trace(IL ) = K L.
− −
1
−
1
−
(c) As defined in (b), C C = S 1 . Let D be such that D D = S 1 . The choice of C and D is not unique, but it would be possible to choose C so that plim C = D. Now,
−
v
−
≡ √ n(Cg) = C(√ n g).
By using the Ergodic Stationary Martingale Differences CLT, we obtain So
√ → N (0, Avar(v))
v = C( n g)
d
where Avar(v) = = = = 2
DSD
D(D D) 1
−
DD IK .
1
−
1
−
D
D
D
√ n g →
d
N (0, S).
(d)
J (δ (S
1
−
1
−
), S
1
1
−
) = n gn (δ (S
−
1
−
· )) S g (δ (S )) n · (Bg) S (Bg) (by (a)) n · g B S Bg n · g C MCg (by (b)) √ v Mv (since v ≡ nCg). 1
= = = =
n
−
1
−
Since v d N (0, IK ) and M is idempotent, v Mv is asymptotically chi-squared with degrees of freedom equaling the rank of M = K L.
→
1
6. From Exercise 5, J = n g B S
·
−
−
Bg. Also from Exercise 5, Bg = Bsxy .
7. For the most parts, the hints are nearly the answer. Here, we provide answers to (d), (f), (g), (i), and (j). (d) As shown in (c), J 1 = v1 M1 v1 . It suffices to prove that v1 = C1 F C
≡ √ nC g √ = nC F g √ = nC F C Cg √ nCg =C FC
v1
1
v.
1
1
1
1
1
1
−
−
1
−
−
1
= C1 F C
v (since v
≡ √ nCg).
(f) Use the hint to show that A D = 0 if A1 M1 = 0. It should be easy to show that A1 M1 = 0 from the definition of M1 .
(g) By the definition of M in Exercise 5, MD = D A(A A) 1 A D. So MD = D since A D = 0 as shown in the previous part. Since both M and D are symmetric, DM = D M = (MD) = D = D. As shown in part (e), D is idempotent. Also, M is idempotent as shown in Exercise 5. So (M D)2 = M 2 DM MD + D2 = M D. As shown in Exercise 5, the trace of M is K L. As shown in (e), the trace of D is K 1 L. So the trace of M D is K K 1 . The rank of a symmetric and idempotent matrix is its trace.
−
−
−
− −
−
−
−
−
−
g C DCg = g FC1 M1 C1 F g
1
−
1
−
= g1 B1 (S11 )
1
−
B1 F g (since C1 M1 C1 = B1 (S11 )
B1 g1 (since g1 = F g).
From the definition of B1 and the fact that sx B1 sx y . So
1
y
B1 from (a))
= Sx z δ + g1 , it follows that B1 g1 = 1
1
1
−
g1 B1 (S11 )
1
−
B1 g1 = sx y B1 (S11 ) 1
B1 sx
1
−
= sxy FB1 (S11 )
1
y
B1 F sxy
(since s x y = F sxy ) 1
= sxy C DCsxy .
3
1
−
= sxy FC1 M1 C1 F sxy (since B1 (S11 )
B.
(C DC = FC1 M1 C1 F by the definition of D in (d))
= g FB1 (S11 )
1
−
(i) It has been shown in Exercise 6 that g C MCg = sxy C MCsxy since C MC = B S Here, we show that g C DCg = sxy C DCsxy .
B1 = C1 M1 C1 from (a))
(j) M
− D is positive semi-definite because it is symmetric and idempotent.
8. (a) Solve the first-order conditions in the hint for δ to obtain 1 (S WSxz ) 2n xz
−
δ = δ (W)
1
−
R λ.
Substitute this into the constraint Rδ = r to obtain the expression for λ in the question. Then substitute this expression for λ into the above equation to obtain the expression for δ in the question. (b) The hint is almost the answer.
· − − √ √ − √ −
(c) What needs to be shown is that n (δ (W) δ ) (Sxz WSxz )(δ (W) δ ) equals the Wald statistic. But this is immediate from substitution of the expression for δ in (a). 9. (a) By applying (3.4.11), we obtain n(δ 1
δ)
n(δ 1
δ)
(Sxz W1 Sxz )
=
1
Sxz W1
1
Sxz W2
−
−
(Sxz W2 Sxz )
ng.
By using Billingsley CLT, we have
√ ng → N (0, S). d
Also, we have
→
(Sxz W1 Sxz )
√ √ − → − n(δ 1 n(δ 1
δ)
d
δ)
=
(b)
Sxz W1
1
Sxz W2
−
(Sxz W2 Sxz )
Therefore, by Lemma 2.4(c),
1
−
p
Q1 1 Σxz W1 . Q2 1 Σxz W2 −
−
N 0, N 0,
A11 A21
−
−
A12 A22
−
−
.
√ nq can be rewritten as √ nq = √ n(δ − δ ) = √ n(δ − δ) − √ n(δ − δ) =
1
2
1
Therefore, we obtain
2
− √ √ −− 1
1
n(δ 1
δ)
n(δ 2
δ)
√ nq → N (0, Avar(q)). d
where
−
Avar(q) = 1
1
. Q1 1 Σxz W1 1 . ( . W2 Σxz Q2 1 ) S W Σ Q xz 1 1 Q2 1 Σxz W2
A11 A21
A12 A22
1 = A11 + A22 1
−
4
−A −A 12
21 .
.
1
(c) Since W2 = S
−
, Q 2 , A12 , A 21 , and A22 can be rewritten as follows:
= Σxz W2 Σxz = Σxz S 1 Σxz ,
Q2
−
= Q1 1 Σxz W1 S S 1 Σxz Q2 1 = Q1 1 (Σxz W1 Σxz )Q2 1 = Q1 1 Q1 Q2 1 = Q2 1 ,
−
A12
−
−
−
−
−
−
−
= Q2 1 Σxz S = Q2 1 , −
A21
1
−
SW1 Σxz Q1 1 −
−
1
Σxz )
1
Σxz )
−
−
= (Σxz S = (Σxz S = Q2 1 .
A22
1
−
Σxz S
1
−
1
−
SS
Σxz (Σxz S
1
−
1
−
Σxz )
1
−
−
Substitution of these into the expression for Avar( q) in (b), we obtain Avar(q) =
A11
= A11
1
−
−Q − (Σ
2
= Avar(δ (W1 )) 10. (a)
≡ E(x z )
σxz
i
i
1
−
xz S
1
−
Σxz )
− Avar(δ(S
1
−
)).
= E(xi (xi β + vi )) = β E(x2i ) + E(xi vi ) = βσ x2 = 0 (by assumptions (2), (3), and (4)).
−
(b) From the definition of δ , δ
δ =
1
n
1
−
n
xi zi
n
1
n
i=1
i=1
xi εi = s xz1 −
1 n
n
xi εi .
i=1
We have xi zi = xi (xi β + v i ) = x2i β + xi vi , which, being a function of (xi , η i ), is ergodic stationary by assumption (1). So by the Ergodic theorem, sxz p σxz . Since σxz = 0 by (a), we have s xz1 p σ xz1 . By assumption (2), E( xi εi ) = 0. So by assumption (1), we have n 1 xi εi p 0. Thus δ δ p 0. i=1 n −
(c)
→
→
sxz
→
−
− → ≡ 1
n
=
1
n
n
xi zi
i=1 n
(x2i β + xi vi )
i=1
1 1
n
n
1
2
√ n n x + n → 0 · E(x ) + E(x v ) =
i
i=1
=
i=1
2
p
i
i
0
5
i
xi vi (since β =
√ 1n )
(d)
√ ns
xz
=
n
1 n
n
√ 1
2
xi +
n
i=1
xi vi .
i=1
By assumption (1) and the Ergodic Theorem, the first term of RHS converges in probability to E(x2i ) = σ x2 > 0. Assumption (2) and the Martingale Differences CLT imply that n
√ 1
n
xi vi
i=1
→ a ∼ N (0, s
22
d
).
Therefore, by Lemma 2.4(a), we obtain
√ ns → σ + a. xz
−
(e) δ
2
d
x
δ can be rewritten as
− δ
√
δ = ( nsxz )
1
−
√ ng . 1
From assumption (2) and the Martingale Differences CLT, we obtain
√ ng → b ∼ N (0, s 1
d
11
).
where s11 is the (1, 1) element of S . By using the result of (d) and Lemma 2.3(b),
− → √ √ √ δ
δ
d
(σx2 + a)
1
−
b.
(a, b) are jointly normal because the joint distribution is the limiting distribution of n g =
−
ng 1
n( n1
n
i=1
xi vi )
(f) Because δ δ converges in distribution to ( σx2 + a)
1
−
6
.
b which is not zero, the answer is No.
January 8, 2004, answer to 3(c)(i) simplified, February 23, 2004
Hayashi Econometrics
Solution to Chapter 4 Analytical Exercises
1. It should be easy to show that Amh = n1 Zm PZh and that cmh = n1 Zm Pyh . Going back to the formula (4.5.12) on p. 278 of the book, the first matrix on the RHS (the matrix to be inverted) is a partitioned matrix whose ( m, h) block is Amh . It should be easy to see that it equals 1 n
1
−
Z (Σ
1
−
[Z (Σ n
⊗ P)y.
1
⊗ P)Z]. Similarly, the second matrix on the RHS of (4.5.12) equals
2. The sprinkled hints are as good as the answer. 3. (b) (amplification of the answer given on p. 320) In this part only, for notational brevity, let zi be a m Lm × 1 stacked vector collecting ( zi1 , . . . , ziM ).
E(εim | Z) = E( εim | z1 , z2 , . . . , zn ) (since Z collects z i ’s) = E( εim | zi ) (since (εim , zi ) is independent of zj ( j = i )) =0 (by the strengthened orthogonality conditions). The (i, j ) element of the n × n matrix E(εm εh | Z) is E(εim εjh | Z). E(εim εjh | Z) = E(εim εjh | z1 , z2 , . . . , zn ) = E( εim εjh | zi , zj ) (since (εim , zi , εjh , zj ) is independent of z k (k = i, j )). For j = i , this becomes E(εim εjh | zi , zj ) = E [E(εim εjh | zi , zj , εjh ) | zi , zj ] = E [εjh E(εim | zi , zj , εjh ) | zi , zj ]
(by the Law of Iterated Expectations) (by linearity of conditional expectations)
= E [εjh E(εim | zi ) | zi , zj ] (since (εim , zi ) is independent of (εjh , zj )) =0 (since E(εim | zi ) = 0). For j = i , E(εim εjh | Z) = E(εim εih | Z) = E(εim εih | zi ). Since xim = xi and xi is the union of ( zi1 , . . . , ziM ) in the SUR model, the conditional homoskedasticity assumption, Assumption 4.7, states that E( εim εih | zi ) = E(εim εih | xi ) = σ mh . (c)
(i) We need to show that Assumptions 4.1-4.5, 4.7 and (4.5.18) together imply Assumptions 1.1-1.3 and (1.6.1). Assumption 1.1 (linearity) is obviously satisfied. Assumption 1.2 (strict exogeneity) and (1.6.1) have been verified in 3(b). That leaves Assumption 1.3 (the rank condition that Z (defined in Analytical Exercise 1) be of full column rank). Since Z is block diagonal, it suffices to show that Zm is of full column rank for m = 1, 2, . . . , M . The proof goes as follows. By Assumption 4.5, 1
S is non-singular. By Assumption 4.7 and the condition (implied by (4.5.18)) that the set of instruments be common across equations, we have S = Σ ⊗ E(xi xi ) (as in (4.5.9)). So the square matrix E( xi xi ) is non-singular. Since n1 X X (where X is the n × K data matrix, as defined in Analytical Exercise 1) converges almost surely to E(xi xi ), the n × K data matrix X is of full column rank for sufficiently large n. Since Zm consists of columns selected from the columns of X, Zm is of full column
rank as well. (ii) The hint is the answer. (iii) The unbiasedness of δSUR follows from (i), (ii), and Proposition 1.7(a). (iv) Avar(δSUR ) is (4.5.15) where Amh is given by (4.5.16 ) on p. 280. The hint shows that it equals the plim of n · Var(δSUR | Z).
(d) For the most part, the answer is a straightforward modification of the answer to (c). The only part that is not so straightforward is to show in part (i) that the M n × L matrix Z is of full column rank. Let Dm be the Dm matrix introduced in the answer to (c), so zim = D m xi and Zm = XD m . Since the dimension of xi is K and that of zim is L, the M matrix Dm is K × L. The m=1 K m × L matrix Σxz in Assumption 4.4 can be written as
D1
Σxz
(KM ×L)
= [ IM ⊗ E(xi xi )]D where
D
(KM ×L)
≡
.. .
.
DM
Since Σxz is of full column rank by Assumption 4.4 and since E(xi xi ) is non-singular, D is of full column rank. So Z = (IM ⊗ X)D is of full column rank if X is of full column rank. X is of full column rank for sufficiently large n if E(xi xi ) is non-singular. 4. (a) Assumptions 4.1-4.5 imply that the Avar of the efficient multiple-equation GMM estimator is (Σxz S−1 Σxz )−1 . Assumption 4.2 implies that the plim of Sxz is Σxz . Under Assumptions 4.1, 4.2, and 4.6, the plim of S is S . (b) The claim to be shown is just a restatement of Propositions 3.4 and 3.5.
(c) Use (A9) and (A6) of the book’s Appendix A. Sxz and W are block diagonal, so WSxz (Sxz WSxz )−1 is block diagonal. (d) If the same residuals are used in both the efficient equation-by-equation GMM and the efficient multiple-equation GMM, then the S in (∗∗) and the S in (Sxz S−1 Sxz )−1 are numerically the same. The rest follows from the inequality in the question and the hint.
(e) Yes. (f) The hint is the answer. 5. (a) For the LW 69 equation, the instruments (1 , MED ) are 2 in number while the number of the regressors is 3. So the order condition is not satisfied for the equation. (b) (reproducing the answer on pp. 320-321)
1 1
E(S69 ) E(IQ ) E(S80 ) E(IQ ) E(MED ) E(S69 · MED ) E(IQ · MED ) E(MED ) E(S80 · MED ) E(IQ · MED )
E(LW69 ) β 0 E(LW80 ) β 1 = . E(LW69 · MED ) β 2 E(LW80 · MED )
The condition for the system to be identified is that the 4 × 3 coefficient matrix is of full column rank. 2
(c) (reproducing the answer on p. 321) If IQ and MED are uncorrelated, then E( IQ · MED ) = E(IQ ) · E(MED ) and the third column of the coefficient matrix is E( IQ ) times the first column. So the matrix cannot be of full column rank.
6. (reproducing the answer on p. 321) εim = y im − zim δm = ε im − zim (δm − δm ). So 1 n
n
[εim − zim (δm − δ m )][εih − zih (δh − δ h )] = (1) + (2) + (3) + (4) ,
i=1
where (1) =
n
1 n
εim εih ,
i=1
(2) = −(δm − δm )
n
(4) = (δm − δ m )
zih · εim ,
n
zim · εih ,
i=1 n
1
(3) = −(δ h − δh )
n
1
i=1
n
1
n
zim zih (δ h − δ h ).
i=1
As usual, under Assumption 4.1 and 4.2, (1) →p σ mh (≡ E(εim εih )). For (4), by Assumption 4.2 and the assumption that E(zim zih ) is finite, zim zih converges in probability to a (finite) matrix. So (4) →p 0. Regarding (2), by Cauchy-Schwartz, E(|zimj · εih |) ≤
1 n
i
·
2 ) · E(ε2 ), E(zimj ih
where z imj is the j -th element of zim . So E(zim · εih ) is finite and (2) →p 0 because δm − δ m → p 0. Similarly, (3) →p 0.
7. (a) Let B, Sxz , and W be as defined in the hint. Also let
1
n
=
sxy (MK ×1)
.. .
n i=1
1
n
Then
n i=1
xi · yi1
xi · yiM
.
1
−
δ 3SLS =
Sxz WSxz
1
−
= (I ⊗ B )(Σ −
= Σ
1
Sxz Wsxy
−
1 ⊗ S− )(I ⊗ B) xx
1 B ⊗ B S− xx
1
−
1 = Σ ⊗ (B S− B)−1 xx
1
−
Σ
1
−
Σ
1
1 sxy ⊗ B S− xx
1 ⊗ B S− sxy xx
1 1 B)−1 B S− sxy = IM ⊗ (B S− xx xx
=
1 B)−1 B S−1 (B S− xx xx .. . −
xx
n i=1
xi · yi1
,
1 B)−1 B S−1 1
(B Sxx
1
n
n
3
n i=1
−
(I ⊗ B )(Σ
xi · yiM
1
1 ⊗ S− )sxy xx
which is a stacked vector of 2SLS estimators. (b) The hint is the answer. 8. (a) The efficient multiple-equation GMM estimator is
Sxz S−1 Sxz
1
−
Sxz S−1 sxy ,
where S xz and s xy are as defined in (4.2.2) on p. 266 and S−1 is a consistent estimator of S. Since xim = z im here, Sxz is square. So the above formula becomes
1 S− S Sxz xz
−
1
1 Sxz S−1 sxy = S − s , xz xy
which is a stacked vector of OLS estimators.
(b) The SUR is efficient multiple-equation GMM under conditional homoskedasticity when the set of orthogonality conditions is E(zim · εih ) = 0 for all m, h. The OLS estimator derived above is (trivially) efficient multiple-equation GMM under conditional homoskedasticity when the set of orthogonality conditions is E(zim · εim ) = 0 for all m. Since the sets of orthogonality conditions differ, the efficient GMM estimators differ. 9. The hint is the answer (to derive the formula in (b) of the hint, use the SUR formula you derived in Analytical Exercise 2(b)).
1 10. (a) Avar(δ1,2SLS ) = σ 11 A− 11 .
(b) Avar(δ1,3SLS ) equals G −1 . The hint shows that G =
1 σ11
A11 .
11. Because there are as many orthogonality conditions as there are coefficients to be estimated, it is possible to choose δ so that gn (δ ) defined in the hint is a zero vector. Solving
n
1
n
zi1 ·yi1 + · · · +
i=1
1 n
n
ziM ·yiM −
i=1
1
n
n
zi1 zi1 + · · · +
i=1
1 n
n
ziM ziM δ = 0
i=1
for δ, we obtain
δ =
1
n
n
i=1
zi1 zi1 + · · · +
1 n
n
i=1
1
−
ziM ziM
1
n
which is none other than the pooled OLS estimator.
4
n
i=1
zi1 ·yi1 + · · · +
1 n
n
i=1
ziM ·yiM ,
January 9, 2004
Hayashi Econometrics
Solution to Chapter 5 Analytical Exercises 1. (a) Let ( a , b ) be the OLS estimate of ( α , β ) . Define MD as in equation (4) of the hint. By the Frisch-Waugh theorem, b is the OLS coefficient estimate in the regression of M D y on MD F. The proof is complete if we can show the claim that
y = M D y and F = M D F,
where y and F are defined in (5.2.2) and (5.2.3). This is because the fixed-effects estimator can be written as (F F)1 F y (see (5.2.4)). But the above claim follows immediately if we 1 can show that MD = I n ⊗ Q, where Q ≡ IM − M 1M 1M , the annihilator associated with 1M . 1
−
MD = I Mn − (In ⊗ 1M ) [(In ⊗ 1M ) (In ⊗ 1M )] 1
−
= I Mn − (In ⊗ 1M ) [(In ⊗ 1M 1M )]
(In ⊗ 1M )
(In ⊗ 1M )
= I Mn − (In ⊗ 1M ) [(In ⊗ M )]−1 (In ⊗ 1M ) 1 = I Mn − (In ⊗ 1M )(In ⊗ )(In ⊗ 1M ) M
= I Mn − (In ⊗
1 M
1M 1M )
= (In ⊗ IM ) − (In ⊗ = (In ⊗ (IM −
1 M
1 M
1M 1M )
1M 1M ))
= I n ⊗ Q. (b) As indicated in the hint to (a), we have a = (D D)−1 (D y − D Fb). It should be straightforward to show that
1M y1
D D = M I n ,
D y =
.. .
1M F1 b
,
D Fb =
1M yn
Therefore,
a =
.. .
.
1M Fn b
1
(1M y1 − 1M F1 b) M .. . . 1 M
(1M yn − 1M Fn b)
The desired result follows from this because b equals the fixed-effects estimator βFE and f i1
1M yi = (yi1 + · · · + yiM ) and 1M Fn b = 1 M
.. .
f iM
1
M
f im
b =
m=1
b.
(c) What needs to be shown is that (3) and conditions (i)-(iv) listed in the question together imply Assumptions 1.1-1.4. Assumption 1.1 (linearity) is none other than (3). Assumption 1.3 is a restatement of (iv). This leaves Assumptions 1.2 (strict exogeneity) and Assumption 1.4 (spherical error term) to be verified. The following is an amplification of the answer to 1.(c) on p. 363. E(ηi | W) = E(ηi | F) (since D is a matrix of constants) = E(ηi | F1 , . . . , Fn ) = E(ηi | Fi ) (since (ηi , Fi ) is indep. of F j for j = i ) by (i) = 0
(by (ii)).
Therefore, the regressors are strictly exogenous (Assumption 1.2). Also, E(ηi ηi | W) = E(ηi ηi | F) = E(ηi ηi | Fi ) = σ η2 IM (by the spherical error assumption (iii)). For i = j , E(η i ηj | W) = E(ηi ηj | F) = E( ηi ηj | F1 , . . . , Fn ) = E( ηi ηj | Fi , Fj )
(since (ηi , Fi , ηj , Fj ) is indep. of Fk for k = i, j by (i))
= E[E(η i ηj | Fi , Fj , ηi ) | Fi , Fj ] = E[ ηi E(ηj | Fi , Fj , ηi ) | Fi , Fj ] = E[ ηi E(ηj | Fj ) | Fi , Fj ] = 0
(since (ηj , Fj ) is independent of (ηi , Fi ) by (i))
(since E(ηj | Fj ) by (ii)).
So E(ηη | W) = σ η2 IMn (Assumption 1.4). Since the assumptions of the classical regression model are satisfied, Propositions 1.1 holds for the OLS estimator ( a, b). The estimator is unbiased and the Gauss-Markov theorem holds. As shown in Analytical Exercise 4.(f) in Chapter 1, the residual vector from the original regression (3) (which is to regress y on D and F) is numerically the same as the residual vector from the regression of y (= M D y) on F (= M D F)). So the two SSR ’s are the same.
2. (a) It is evident that C 1M = 0 if C is what is referred to in the question as the matrix of first differences. Next, to see that C 1M = 0 if C is an M × (M − 1) matrix created by dropping one column from Q, first note that by construction of Q , we have: 1M =
Q (M ×M )
0
,
(M ×1)
which is a set of M equations. Drop one row from Q and call it C and drop the corresponding element from the 0 vector on the RHS. Then C
((M −1)×M )
1M =
0
.
((M −1)×1)
(b) By multiplying both sides of (5.1.1 ) on p. 329 by C , we eliminate 1 M · bi γ and 1M · αi . 2
(c) Below we verify the five conditions. • The random sample condition is immediate from (5.1.2). • Regarding the orthogonality conditions, as mentioned in the hint, (5.1.8b) can be written as E(ηi ⊗ xi ) = 0 . This implies the orthogonality conditions because E(ηi ⊗ xi ) = E[(C ⊗ IK )(ηi ⊗ xi )] = (C ⊗ IK ) E(ηi ⊗ xi ).
• As shown on pp. 363-364, the identification condition to be verified is equivalent to (5.1.15) (that E( QFi ⊗ xi ) be of full column rank). • Since ε i = 1 M · αi + η i , we have ηi ≡ C ηi = C εi . So ηi ηi = C εi εi C and
E(ηi ηi | xi ) = E(C εi εi C | xi ) = C E(εi εi | xi )C = C ΣC.
(The last equality is by (5.1.5).) • By the definition of gi , we have: gi gi = ηi ηi ⊗ x i xi . But as just shown above, η i η i = C εi εi C. So
gi gi = C εi εi C ⊗ xi xi = (C ⊗ IK )(εi εi ⊗ xi xi )(C ⊗ IK ).
Thus
E(gi gi ) = (C ⊗ IK )E[(εi εi ⊗ xi xi )](C ⊗ IK ) = ( C ⊗ IK ) E(gi gi )(C ⊗ IK )
(since g i ≡ εi ⊗ xi ).
Since E(gi gi ) is non-singular by (5.1.6) and since C is of full column rank, E(gi gi ) is non-singular.
(d) Since Fi ≡ C Fi , we can rewrite Sxz and s xy as
n
1
Sxz = (C ⊗ IK )
n
i=1
Sxz WSxz =
= =
1
n
1
n
1
n
n
Fi ⊗ xi (C ⊗ IK ) (C C)
i=1 n
C(C C)−1 C ⊗
Fi ⊗ xi
i=1 n
F i ⊗ xi
Q⊗
i=1
n
1
n
1
−
i=1
(since C(C C)
n
1
−
xi xi
1
(C ⊗ IK )
i=1
−
1
Fi ⊗ xi
i=1
F i ⊗ xi
i=1
C = Q , as mentioned in the hint).
Sxz Wsxy =
1
n
n
F i ⊗ xi
Q⊗
i=1
3
n
n
Similarly,
1
n
1
n
1
n
i=1
−
1
−
xi xi
n
i=1
n
1
xi xi
⊗
yi ⊗ xi .
n
n
1
n
1
Fi ⊗ xi , sxy = (C ⊗ IK )
So
1
n
n
xi xi
i=1
1
−
1
n
n
yi ⊗ xi .
i=1
n
F i ⊗ xi
i=1
Noting that f im is the m-th row of Fi and writing out the Kronecker products in full, we obtain
M
Sxz WSxz =
M
q mh
m=1 h=1 M
Sxz Wsxy =
M
q mh
m=1 h=1
n
1
f im xi
n
n
i=1 n
1
f im xi
n
i=1
1
1
−
xi xi
n
i=1 n
1
n
n
1
1
1
−
xi xi
n
i=1
n
xi f ih
,
i=1 n
xi · yih
,
i=1
where q mh is the (m, h) element of Q. (This is just (4.6.6) with xim = xi , zim = f im , 1
W = Q ⊗
n
n i=1
1
−
xi xi
xi “dissappears”. So
M
.) Since x i includes all the elements of F i , as noted in the hint, M
Sxz WSxz =
q mh
m=1 h=1 M
M
Sxz Wsxy =
q mh
m=1 h=1
n
1
f im f ih =
n
n
1
n
i=1
f im · yih =
i=1
n
1
n
1
n
M
M
i=1
m=1 h=1
n
M
q mh f imf ih ,
M
q mh f im · yih .
m=1 h=1
i=1
Using the “beautifying” formula (4.6.16b), this expression can be simplified as
Sxz WSxz = Sxz Wsxy =
1 n
1 n
n
Fi QFi ,
i=1 n
Fi Qyi .
i=1
1
−
So Sxz WSxz
Sxz Wsxy is the fixed-effects estimator.
(e) The previous part shows that the fixed-effects estimator is not efficient because the W in (10) does not satisfy the efficiency condition that plim W = S−1 . Under conditional homoskedasticity, S = E(ηi η i ) ⊗ E(xi xi ). Thus, with Ψ being a consistent estimator of E(η i ηi ), the efficient GMM estimator is given by setting 1
−
W = Ψ
⊗
1
n
n
1
−
xi xi
.
i=1
This is none other than the random-effects estimator applied to the system of M − 1 equations (9). By setting Zi = Fi , Σ = Ψ, yi = yi in (4.6.8 ) and (4.6.9 ) on p. 293, we obtain (12) and (13) in the question. It is shown on pp. 292-293 that these “beautified” formulas are numerically equivalent versions of (4.6.8) and (4.6.9). By Proposition 4.7, the random-effects estimator (4.6.8) is consistent and asymptotically normal and the asymptotic variance is given by (4.6.9). As noted on p. 324, it should be routine to show that those conditions verified in (c) above are sufficient for the hypothesis of Proposition 4.7. In particular, the Σ xz referred to in Assumption 4.4 can be written as E( Fi ⊗ xi ). In (c), we’ve verified that this matrix is of full column rank.
(f) Proposition 4.1, which is about the estimation of error cross moments for the multipleequation model of Section 4.1, can easily be adapted to the common-coefficient model of Section 4.6. Besides linearity, the required assumptions are (i) that the coefficient estimate 4
(here βFE ) used for calculating the residual vector be consistent and (ii) that the cross moment between the vector of regressors from one equation (a row from Fi ) and those from another (another row from Fi ) exist and be finite. As seen in (d), the fixed-effects estimator βFE is a GMM estimator. So it is consistent. As noted in (c), E(xi xi ) is non-singular. Since xi contains all the elements of F i , the cross moment assumption is satisfied. (g) As noted in (e), the assumptions of Proposition 4.7 holds for the present model in question. It has been verified in (f) that Ψ defined in (14) is consistent. Therefore, Proposition 4.7(c) holds for the present model. (h) Since ηi ≡ C ηi , we have E(ηi ηi ) = E (C ηi ηi C) = ση2 C C (the last equality is by (15)). By setting Ψ = ση2 C C in the expression for W in the answer to (e) (thus setting
n i=1
1
W = ση2 C C ⊗
n
1
−
xi xi
), the estimator can be written as a GMM estimator
(Sxz WSxz )−1 Sxz Wsxy . Clearly, it is numerically equal to the GMM estimator with n i=1 xi xi
1
W = C C ⊗
n
1
−
, which, as was verified in (d), is the fixed-effects estimator.
(i) Evidently, replacing C by B ≡ CA in (11) does not change Q. So the fixed-effects estimator is invariant to the choice of C . To see that the numerical values of (12) and (13) ˇ i ≡ B Fi and y ˇ i ≡ B yi . That is, the original M are invariant to the choice of C, let F equations (5.1.1 ) are transformed into M − 1 equations by B = CA, not by C. Then ˇ i = A Fi and y ˇ is the estimated error cross moment matrix when (14) is ˇ i = A yi . If Ψ F ˇ i replacing Fi , then we have: Ψ ˇ = A ΨA. So ˇ i replacing yi and F used with y
ˇΨ ˇ F i
1
−
−
ˇ i = F A(A ΨA)−1 A Fi = F AA−1 Ψ F i i
−1 ˇΨ ˇ −1 y ˇ i = Fi Ψ yi . Similarly, F i
1
−
1
(A )−1 A Fi = Fi Ψ
Fi .
3. From (5.1.1 ), vi = C (yi − Fi β ) = C ηi . So E(vi vi ) = E(C ηi ηi C) = C E(ηi ηi )C = ση2 C C. By the hint, plim
SSR = trace (C C)−1 ση2 C C = σ η2 trace[IM −1 ] = σ η2 · (M − 1). n
4. (a) bi is absent from the system of M equations (or bi is a zero vector). yi1
yi =
.. .
, Fi =
yiM
yi0
.. .
.
yi,M −1
(b) Recursive substitution (starting with a substitution of the first equation of the system into the second) yields the equation in the hint. Multiply both sides of the equation by η ih and take expectations to obtain E(yim · ηih ) = E(ηim · ηih ) + ρ E(ηi,m−1 · ηih ) + · · · + ρm−1 E(ηi1 · ηih ) 1 − ρm + E(αi · ηih ) + ρm E(yi0 · ηih ) 1−ρ = E(ηim · ηih ) + ρ E(ηi,m−1 · ηih ) + · · · + ρm−1 E(ηi1 · ηih ) (since E(αi · ηih ) = 0 and E(yi0 · ηih ) = 0) =
ρm−h ση2
0
if h = 1, 2, . . . , m , if h = m + 1, m + 2, . . . . 5
(c) That E(yim · ηih ) = ρm−h ση2 for m ≥ h is shown in (b). Noting that Fi here is a vector, not a matrix, we have: E(Fi Qη i ) = E[trace(Fi Qηi )] = E[trace(ηi Fi Q)] = trace[E(ηi Fi )Q] 1
= trace[E(ηi Fi )(IM − = trace[E(ηi Fi )] − = trace[E(ηi Fi )] −
1
M
11 )]
trace[E(ηi Fi )11 ]
M
1 M
1 E(η i Fi )1.
By the results shown in (b), E( ηi Fi ) can be written as
E(ηi Fi ) = σ η2
0 0 .. .
1 0 .. .
ρ
1 .. .
0 · ·· 0 · ·· 0 · ··
ρ2 ρ
..
· · · ρM −2 · · · ρM −3 .. ··· .
. · ·· 0 1 · ·· · ·· 0 · ·· · ·· · ··
ρ
1 0
So, in the above expression for E(Fi Qη i ), trace[E( ηi Fi )] = 0 and
.
1 E(η i Fi )1 = sum of the elements of E( η i Fi )
= sum of the first row + · · · + sum of the last row = σ η2
1 − ρM −1 1 − ρM −2 1−ρ + + ··· + 1−ρ 1−ρ 1−ρ
M − 1 − M ρ + ρ M . = σ η (1 − ρ)2 2
(d) (5.2.6) is violated because E( f im · ηih ) = E(yi,m−1 · ηih ) = 0 for h ≤ m − 1. 5. (a) The hint shows that
E(Fi Fi ) = E(QFi ⊗ xi ) IM ⊗ E(xi xi )
1
−
E(QFi ⊗ xi ).
By (5.1.15), E(QFi ⊗ xi ) is of full column rank. So the matrix product above is nonsingular. (b) By (5.1.5) and (5.1.6 ), E(εi εi ) is non-singular.
(c) By the same sort of argument used in (a) and (b) and noting that Fi ≡ C Fi , we have
E(Fi Ψ−1 Fi ) = E(C Fi ⊗ xi ) Ψ−1 ⊗ E(xi xi )
1
−
We’ve verified in 2(c) that E( C Fi ⊗ xi ) is of full column rank.
6
E(C Fi ⊗ xi ).
6. This question presumes that
xi =
f i1
.. .
f iM bi
and f im = A m xi .
(a) The m-th row of Fi is f im and f im = x i Am .
(b) The rank condition (5.1.15) is that E( Fi ⊗ xi ) be of full column rank (where Fi ≡ QFi ). By the hint, E( Fi ⊗ x i ) = [IM ⊗ E( xi xi )](Q ⊗ I K )A. Since E(xi xi ) is non-singular, IM ⊗ E(xi xi ) is non-singular. Multiplication by a non-singular matrix does not alter rank.
7. The hint is the answer.
7
September 10, 2004
Hayashi Econometrics
Solution to Chapter 6 Analytical Exercises 1. The hint is the answer. 2. (a) Let σ n
≡
n 2 j =0 ψj .
Then
m
E[(yt,m
2
t,n )
−y
]=E
ψj εt−j
j =n+1 m
= σ
2
2
ψj2 (since εt is white noise)
{ }
j =n+1
2
= σ αm
| − α |. n
Since ψj is absolutely summable (and hence square summable), αn converges. So αm αn as m, n . Therefore, E[(yt,m yt,n )2 ] 0 as m, n , which means yt,n converges in mean square in n by (i).
{ } { } | − |→∞ →∞ − → →∞ { } (b) Since y →m.s. y as shown in (a), E(y ) = lim E(y ) by (ii). But E(y ) = 0. − µ →m.s. y − µ as n → ∞, (c) Since y − µ →m.s. y − µ and y E[(y − µ)(y − µ)] = lim E[(y , − µ)(y , − µ)]. t,n
t
t
t,n
t
t,n
n→∞
t−j,n
t
t,n
t−j
t−j
t,n
n→∞
t−j ,n
(d) (reproducing the answer on pp. 441-442 of the book) Since ψj is absolutely summable, ψj 0 as j . So for any j, there exists an A > 0 such that ψj +k A for all j, k. So ψj +k ψk A ψk . Since ψk (and hence Aψk ) is absolutely summable, so is ψj +k ψk (k = 0, 1, 2, . . .) for any given j . Thus by (i),
→
{
|
· }
{ }
→∞ · |≤ | |
{ }
∞
|γ | = σ j
2
{
≤ | ∞
ψj +k ψk
k=0
σ
2
|
ψj +k ψk = σ
|
|
ψj +k ψk <
|| | ∞.
k=0
k
∞
∞
∞
ajk =
j =0
ψk ψj +k
j =0
||
| ≤ |ψ
k
| |
M
ψj <
j =0
∞
≡ |
| ∞.
∞
ψj
j =0
|
and sk
Then sk is summable because sk fore, by (ii),
{ }
2
| · |ψ |. Then
| | | Let
|≤
∞
k=0
Now set a jk in (ii) to ψj +k
|
}
≡ |ψ
k
| | j =0
ψj +k .
|
| | ≤ |ψ | · M and {ψ } is absolutely summable. There-
∞
k
k
∞
|
ψj +k
j =0 k=0
| · |ψ
k
|
<
∞.
This and the first inequality above mean that γ j is absolutely summable.
{ }
1
3. (a) γ j = Cov(yt,n , yt−j,n ) = Cov(h0 xt + h1 xt−1 + n
=
n
·· · + h
n xt−n , h0 xt−j +
h1 xt−j −1 +
·· · + h
n xt−j −n )
hk h Cov(xt−k , xt−j − )
k=0 =0 n n
=
hk h γ jx+−k .
k=0 =0
(b) Since hj is absolutely summable, we have y t,n m.s. y t as n by Proposition 6.2(a). Then, using the facts (i) and (ii) displayed in Analytical Exercise 2, we can show:
{ } n
→
→∞
n
hk h γ jx+−k = Cov(yt,n , yt−j,n )
k=0 =0
= E(yt,n yt−j,n )
as n result.
→ ∞.
− E(y
t,n ) E(yt−j,n )
That is,
n k=0
n =0
→ E(y y
hk h γ jx+−k
4. (a) (8) solves the difference equation y j
− E(y ) E(y ) = Cov(y , y ) converges as n → ∞ , which is the desired t
t−j )
t
− φ1y 1 − φ2y
yj
− φ1y 1 − φ2y
j−
j −2 =
t−j
t
t−j
0 because
j −2
j−
− φ1(c10λ1 +1 + c20λ2 +1) − φ2(c10λ1 +2 + c20λ2 = c 10 λ1 (1 − φ1 λ1 − φ2 λ21 ) + c20 λ1 (1 − φ1 λ2 − φ2 λ22 ) =0 (since λ 1 and λ 2 are the roots of 1 − φ1 z − φ2 z 2 = 0). j −j = (c10 λ− 1 + c20 λ2 )
−j
−j
−j
−j
−j
+2
)
−j
Writing down (8) for j = 0, 1 gives 1 −1 y0 = c10 + c20 , y1 = c 10 λ− 1 + c20 λ2 .
Solve this for (c10, c20 ) given (y0 , y1 , λ1 , λ2 ). (b) This should be easy. (c) For j
≥ J , we have j
n j
ξ < bj . Define B as
B
≡ max
ξ 2 n ξ j 3 n ξ 3 (J 1)n ξ J −1 , 2 , 3 ,..., b b b bJ −1
−
.
Then, by construction, j n ξ j B or j n ξ j B bj j b for j = 0, 1,..,J 1. Choose A so that A > 1 and A > B. Then j n ξ j < bj < A bj for j J and j n ξ j B bj < A bj for all j = 0, 1, . . . , J 1.
≥
≤
− ≤
≥
−
(d) The hint is the answer. 5. (a) Multiply both sides of (6.2.1 ) by y t−j the desired result.
− µ and take the expectation of both sides to derive
(b) The result follows immediately from the MA representation yt−j φ2 εt−j −2 + .
− µ = ε
·· ·
2
t−j + φ εt−j −1 +
(c) Immediate from (a) and (b). (d) Set j = 1 in (10) to obtain γ 1
− ργ 0 = 0. Combine this with (9) to solve for ( γ 0, γ 1): γ 0 =
σ2 σ2 , γ = φ. 1 1 φ2 1 φ2
−
−
Then use (10) as the first-order difference equation for j = 2, 3, . . . in γ j with the initial σ σ condition γ 1 = 1− φ. This gives: γ j = 1− φj , verifying (6.2.5). φ φ 2
2
2
2
6. (a) Should be obvious. (b) By the definition of mean-square convergence, what needs to be shown is that E[(xt xt,n )2 ] 0 as n .
→
→∞ E[(x − x
t,n )
t
−
2
] = E[(φn xt−n )2 ]
(since x t = x t,n + φn xt−n )
= φ 2n E(x2t−n ) (since φ < 1 and E(x2t−n ) <
→0
| |
∞).
(c) Should be obvious. 7. (d) By the hint, what needs to be shown is that ( F)n ξ t−n (F)n ξ t−n . m.s. 0. Let zn Contrary to the suggestion of the hint, which is to show the mean-square convergence of the components of zn , here we show an equivalent claim (see Review Question 2 to Section 2.1) that lim E(zn zn ) = 0.
→
≡
n→∞
zn zn =
trace(zn zn ) = trace[ ξ t−n [(F)n ] [(F)n ]ξt−n ] = trace ξ t−n ξ t−n [(F)n ] [(F)n ]
{
}
Since the trace and the expectations operator can be interchanged, E(zn zn ) = trace E(ξt−n ξ t−n )[(F)n ] [(F)n ] .
{
}
Since ξ t is covariance-stationary, we have E( ξ t−n ξ t−n ) = V (the autocovariance matrix). Since all the roots of the characteristic equation are less than one in absolute value, Fn = n 0. T(Λ) T−1 converges to a zero matrix. We can therefore conclude that E( zn zn ) n
(e) ψn is the (1,1) element of T(Λ)
→
T 1. −
8. (a) 1 φt c E(yt ) = c + φt E(y0 ) , 1 φ 1 φ 1 φ2t 2 σ2 2t Var(yt ) = σ + φ Var(y ) , 0 1 φ2 1 φ2 1 φ2(t−j ) 2 σ2 j 2(t−j ) Cov(yt , yt−j ) = φ j σ + φ Var(y ) φ . 0 1 φ2 1 φ2
− −
→ − → −
− − − −
→
−
(b) This should be easy to verify given the above formulas. 9. (a) The hint is the answer. (b) Since γ j 0, the result proved in (a) implies that n2 nj=1 γ j by the inequality for Var(y) shown in the question, Var(y) 0.
→
3
| | → 0. Also, γ 0/n → 0. So →
10. (a) By the hint, n
j aj
j =1
So
≤ N
n
n
n
ak +
j =1 k=j
1 n
n
j aj <
j =1
ak < N M + (n
j =N +1 k=j
− N ) 2ε .
NM n N ε N M ε + < + . n n 2 n 2
−
By taking n large enough, NM/n can be made less than ε/2. (b) From (6.5.2),
√ Var( n y) = γ + 2 0
− − { }
n−1
1
j =1
j γ j = γ 0 + 2 n
n−1
γ j
j =1
2 n
n−1
j γ j .
j =1
The term in brackets converges to ∞ j =−∞ γ j if γ j is summable. (a) has shown that the last term converges to zero if γ j is summable.
{ }
4
September 14, 2004
Hayashi Econometrics
Solution to Chapter 7 Analytical Exercises 1. (a) Since a(w) = 1 ⇔ f (y|x; θ) = f (y|x; θ0 ), we have Prob[a(w) = 1] = Prob[f (y|x; θ) = f (y|x; θ0 )]. But Prob[f (y|x; θ) = f (y|x; θ0 )] > 0 by hypothesis. (b) Set c(x) = log(x) in Jensen’s inequality. a(w) is non-constant by (a). (c) By the hint, E[a(w)|x] = 1. By the Law of Total Expectation, E[ a(w)] = 1. (d) By combining (b) and (c), E[log(a(w))] < log(1) = 0. But log(a(w)) = log f (y|x; θ) − log f (y|x; θ0 ). 2. (a) (The answer on p. 505 is reproduced here.) Since f (y | x; θ) is a hypothetical density, its integral is unity:
∫ f(y | x; θ) dy = 1.    (1)
This is an identity, valid for any θ ∈ Θ. Differentiating both sides of this identity with respect to θ, we obtain
∂/∂θ ∫ f(y | x; θ) dy = 0  (p×1).    (2)
If the order of differentiation and integration can be interchanged, then
∂/∂θ ∫ f(y | x; θ) dy = ∫ ∂f(y | x; θ)/∂θ dy.    (3)
But by the definition of the score, s(w; θ) f(y | x; θ) = ∂f(y | x; θ)/∂θ. Substituting this into (3), we obtain
∫ s(w; θ) f(y | x; θ) dy = 0  (p×1).    (4)
This holds for any θ ∈ Θ, in particular, for θ0. Setting θ = θ0, we obtain
∫ s(w; θ0) f(y | x; θ0) dy = E[s(w; θ0) | x] = 0  (p×1).    (5)
Then, by the Law of Total Expectations, we obtain the desired result.
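A small numerical illustration of (5): in the toy model y ∼ N(θ, 1) (an assumed example, not the model of the exercise) the score is y − θ, and its sample mean is close to zero at θ0 but not elsewhere.

# Sketch: the score has mean zero at the true parameter, E[s(w; theta0)] = 0.
# Illustrated with y ~ N(theta0, 1), an assumed toy model; theta0 and 0.2 are arbitrary.
import numpy as np

rng = np.random.default_rng(3)
theta0 = 0.7
y = rng.normal(theta0, 1.0, 1_000_000)

score_at_true = y - theta0      # score of N(theta, 1) evaluated at theta = theta0
score_elsewhere = y - 0.2       # score evaluated away from theta0
print(round(score_at_true.mean(), 4), round(score_elsewhere.mean(), 4))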
(b) By the hint,
∫ H(w; θ) f(y|x; θ) dy + ∫ s(w; θ) s(w; θ)′ f(y|x; θ) dy = 0  (p×p).
The desired result follows by setting θ = θ0.
3. (a) For the linear regression model with θ ≡ (β′, σ²)′, the objective function is the average log likelihood:
Q_n(θ) = −(1/2) log(2π) − (1/2) log(σ²) − (1/(2σ²))·(1/n) Σ_{t=1}^{n} (y_t − x_t′β)².
To obtain the concentrated average log likelihood, take the partial derivative with respect to σ² and set it equal to 0, which yields
σ² = (1/n) Σ_{t=1}^{n} (y_t − x_t′β)² ≡ (1/n) SSR(β).
Substituting this into the average log likelihood, we obtain the concentrated average log likelihood (concentrated with respect to σ²):
Q_n(β, (1/n)SSR(β)) = −(1/2) log(2π) − 1/2 − (1/2) log((1/n) SSR(β)).
The unconstrained ML estimator (β̂, σ̂²) of θ0 is obtained by maximizing this concentrated average log likelihood with respect to β, which yields β̂, and then setting σ̂² = (1/n) SSR(β̂). The constrained ML estimator, (β̃, σ̃²), is obtained by doing the same subject to the constraint Rβ = c. But, as is clear from the expression for the concentrated average log likelihood shown above, maximizing the concentrated average log likelihood is equivalent to minimizing the sum of squared residuals SSR(β).
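A quick numerical sketch of this equivalence on simulated data (dimensions and parameter values below are arbitrary): the concentrated average log likelihood is maximized at the OLS coefficients, and σ̂² is SSR(β̂)/n.

# Sketch: maximizing the concentrated average log likelihood over beta is the same as
# minimizing SSR(beta), so the maximizer is OLS. Data and dimensions are illustrative.
import numpy as np

rng = np.random.default_rng(4)
n, K = 500, 3
X = rng.normal(size=(n, K))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

def concentrated_Q(beta):
    ssr = np.sum((y - X @ beta) ** 2)
    return -0.5 * np.log(2 * np.pi) - 0.5 - 0.5 * np.log(ssr / n)   # concentrated average log likelihood

b_ols = np.linalg.solve(X.T @ X, X.T @ y)
for _ in range(1_000):   # the OLS/ML estimate should dominate nearby perturbations
    assert concentrated_Q(b_ols) >= concentrated_Q(b_ols + rng.normal(scale=0.1, size=K))
sigma2_hat = np.sum((y - X @ b_ols) ** 2) / n     # sigma^2 hat = SSR(beta hat)/n
print(b_ols.round(3), round(sigma2_hat, 3))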
(b) Just substitute σ̂² = (1/n) SSR(β̂) and σ̃² = (1/n) SSR(β̃) into the concentrated average log likelihood above.
(c) As explained in the hint, both σ̂² and σ̃² are consistent for σ0². Reproducing (part of) (7.3.18) of Example 7.10,
−E[H(w_t; θ0)] = [ (1/σ0²)·E(x_t x_t′)   0
                   0′                    1/(2σ0⁴) ].    (7.3.18)
Clearly, both Σ̂ and Σ̃ are consistent for −E[H(w_t; θ0)] because both σ̂² and σ̃² are consistent for σ0² and (1/n) Σ_{t=1}^{n} x_t x_t′ is consistent for E(x_t x_t′).
(d) The a(θ) and A(θ) in Table 7.2 for the present case are
a(θ) = Rβ − c,   A(θ) = [ R ⋮ 0 ]   (with R of order r×K and 0 of order r×1).
Also, observe that
∂Q_n(θ̃)/∂θ = [ (1/σ̃²)·(1/n) Σ_{t=1}^{n} x_t(y_t − x_t′β̃)
               −1/(2σ̃²) + (1/(2σ̃⁴))·(1/n) Σ_{t=1}^{n} (y_t − x_t′β̃)² ] = [ (1/SSR_R)·X′(y − Xβ̃)
                                                                              0 ]
and
Q_n(θ̂) = −(1/2) log(2π) − 1/2 − (1/2) log(SSR_U/n),   Q_n(θ̃) = −(1/2) log(2π) − 1/2 − (1/2) log(SSR_R/n).
Substitute these expressions and the expressions for Σ̂ and Σ̃ given in the question into the Table 7.2 formulas, and just do the matrix algebra.
(e) The hint is the answer.
(f) Let x ≡ SSR_R/SSR_U. Then x ≥ 1 and W/n = x − 1, LR/n = log(x), and LM/n = 1 − 1/x. Draw the graph of these three functions of x with x on the horizontal axis. Observe that their values at x = 1 are all 0 and their slopes at x = 1 are all one. Also observe that for x > 1, x − 1 > log(x) > 1 − 1/x.
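A minimal numerical check of the ordering x − 1 ≥ log(x) ≥ 1 − 1/x for x ≥ 1, which is what delivers W ≥ LR ≥ LM; the grid below is an arbitrary illustration.

# Sketch: verify x - 1 >= log(x) >= 1 - 1/x on a grid of x >= 1,
# the inequality behind W >= LR >= LM. The grid is an arbitrary illustration.
import numpy as np

x = np.linspace(1.0, 50.0, 100_000)
w, lr, lm = x - 1.0, np.log(x), 1.0 - 1.0 / x
assert np.all(w >= lr) and np.all(lr >= lm)
print("W/n >= LR/n >= LM/n holds on the grid; values at x = 1:", w[0], lr[0], lm[0])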
September 22, 2004
Hayashi Econometrics
Solution to Chapter 8 Analytical Exercises
1. From the hint,
Σ_{t=1}^{n} (y_t − Π′x_t)(y_t − Π′x_t)′ = Σ_{t=1}^{n} v̂_t v̂_t′ + (Π̂ − Π)′ [ Σ_{t=1}^{n} x_t x_t′ ] (Π̂ − Π).
But
(Π̂ − Π)′ [ Σ_{t=1}^{n} x_t x_t′ ] (Π̂ − Π)
is positive semi-definite.
2. Since y_t = Π0′x_t + v_t, we have y_t − Π′x_t = v_t + (Π0 − Π)′x_t. So
E[(y_t − Π′x_t)(y_t − Π′x_t)′] = E[(v_t + (Π0 − Π)′x_t)(v_t + (Π0 − Π)′x_t)′]
= E(v_t v_t′) + E[v_t x_t′(Π0 − Π)] + E[(Π0 − Π)′x_t v_t′] + (Π0 − Π)′ E(x_t x_t′)(Π0 − Π)
= E(v_t v_t′) + (Π0 − Π)′ E(x_t x_t′)(Π0 − Π)    (since E(x_t v_t′) = 0).
So
Ω(Π) → Ω0 + (Π0 − Π)′ E(x_t x_t′)(Π0 − Π)
almost surely. By the matrix algebra result cited in the previous question,
|Ω0 + (Π0 − Π)′ E(x_t x_t′)(Π0 − Π)| ≥ |Ω0| > 0.
So for sufficiently large n, Ω(Π) is positive definite.
3. (a) Multiply both sides of z_tm′ = [ y_t′ S_m ⋮ x_t′ C_m ] from the left by x_t to obtain
x_t z_tm′ = [ x_t y_t′ S_m ⋮ x_t x_t′ C_m ].    (∗)
Do the same to the reduced form y_t′ = x_t′ Π0 + v_t′ to obtain
x_t y_t′ = x_t x_t′ Π0 + x_t v_t′.
Substitute this into (∗) to obtain
x_t z_tm′ = [ x_t x_t′ Π0 S_m + x_t v_t′ S_m ⋮ x_t x_t′ C_m ] = x_t x_t′ [ Π0 S_m ⋮ C_m ] + x_t [ v_t′ S_m ⋮ 0′ ].
Take the expected value of both sides and use the fact that E(x_t v_t′) = 0 to obtain the desired result.
(b) Use the reduced form y_t = Π0′x_t + v_t to derive
y_t + Γ^{−1}Bx_t = v_t + (Π0′ + Γ^{−1}B)x_t,
as in the hint. So
(y_t + Γ^{−1}Bx_t)(y_t + Γ^{−1}Bx_t)′ = [v_t + (Π0′ + Γ^{−1}B)x_t][v_t + (Π0′ + Γ^{−1}B)x_t]′
= v_t v_t′ + (Π0′ + Γ^{−1}B)x_t v_t′ + v_t x_t′(Π0′ + Γ^{−1}B)′ + (Π0′ + Γ^{−1}B)x_t x_t′(Π0′ + Γ^{−1}B)′.
Taking the expected value and noting that E(x_t v_t′) = 0, we obtain
E[(y_t + Γ^{−1}Bx_t)(y_t + Γ^{−1}Bx_t)′] = E(v_t v_t′) + (Π0′ + Γ^{−1}B) E(x_t x_t′)(Π0′ + Γ^{−1}B)′.    (7)
Since {y_t, x_t} is i.i.d., the probability limit of Ω(δ) is given by this expectation. In this expression, E(v_t v_t′) equals Γ0^{−1}Σ0(Γ0^{−1})′ because, by definition, v_t ≡ Γ0^{−1}ε_t and Σ0 ≡ E(ε_t ε_t′).
(c) What needs to be proved is that plim |Ω(δ)| is minimized only if ΓΠ0′ + B = 0. Let A ≡ Γ0^{−1}Σ0(Γ0^{−1})′ be the first term on the RHS of (7) and let D ≡ (Π0′ + Γ^{−1}B) E(x_t x_t′)(Π0′ + Γ^{−1}B)′ be the second term. Since Σ0 is positive definite and Γ0^{−1} is non-singular, A is positive definite. Since E(x_t x_t′) is positive definite, D is positive semi-definite. Then use the following matrix inequality (which is slightly different from the one mentioned in Analytical Exercise 1 on p. 552; it is Theorem 22 on p. 21 of Matrix Differential Calculus with Applications in Statistics and Econometrics by Jan R. Magnus and Heinz Neudecker, Wiley, 1988): Let A be positive definite and B positive semi-definite. Then
|A + B| ≥ |A|,
with equality if and only if B = 0. Hence
plim |Ω(δ)| = |A + D| ≥ |A| = |Γ0^{−1}Σ0(Γ0^{−1})′|,
with equality |A + D| = |A| only if D = 0. Since E(x_t x_t′) is positive definite, D ≡ (Π0′ + Γ^{−1}B) E(x_t x_t′)(Π0′ + Γ^{−1}B)′ is a zero matrix only if Π0′ + Γ^{−1}B = 0, which holds if and only if ΓΠ0′ + B = 0 since Γ is non-singular (the parameter space is such that Γ is non-singular).
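A quick numerical illustration of the Magnus-Neudecker determinant inequality used here, |A + B| ≥ |A| for A positive definite and B positive semi-definite; the matrices below are randomly generated examples, not objects from the model.

# Sketch: |A + D| >= |A| when A is positive definite and D is positive semi-definite.
# A and D are random illustrative matrices, not quantities from the exercise.
import numpy as np

rng = np.random.default_rng(7)
for _ in range(1_000):
    M = rng.normal(size=(4, 4))
    A = M @ M.T + 4 * np.eye(4)          # positive definite
    L = rng.normal(size=(4, 2))
    D = L @ L.T                          # positive semi-definite (rank at most 2)
    assert np.linalg.det(A + D) >= np.linalg.det(A) - 1e-9
print("determinant inequality verified on random draws")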
(d) For m = 1, the LHS of (8) is α_1′, the first row of [Γ ⋮ B]; in the example of the question it equals
α_1′ = (1, −γ_11, −β_11, −β_12, 0).
The RHS is
e_1′ − δ_1′ [ S_1′  0
              0    C_1′ ]   (a 1×(M+K) row vector, with S_1′ of order M_1×M and C_1′ of order K_1×K),
and with δ_1 = (γ_11, β_11, β_12)′ and the selection matrices of the example written out, this equals
(1, 0, 0, 0, 0) − (γ_11, β_11, β_12) [ 0 1 0 0 0 ; 0 0 1 0 0 ; 0 0 0 1 0 ] = (1, −γ_11, −β_11, −β_12, 0),
which is the LHS.
(e) Since α_m′ is the m-th row of [Γ ⋮ B], the m-th row of the LHS of (9) equals
α_m′ [Π0 ⋮ I_K]′ = ( e_m′ − δ_m′ [ S_m′  0
                                    0    C_m′ ] ) [Π0 ⋮ I_K]′    (by (8))
= e_m′ [Π0 ⋮ I_K]′ − δ_m′ [ S_m′ Π0′
                            C_m′ ]
= ( [Π0 ⋮ I_K] e_m )′ − δ_m′ [ S_m′ Π0′
                               C_m′ ]
= π_0m′ − δ_m′ [ S_m′ Π0′
                 C_m′ ]    (by the definition of π_0m),
where S_m′ is M_m×M, C_m′ is K_m×K, and Π0 is K×M.
(f) By definition (see (8.5.10)), Γ0Π0′ + B0 = 0. The same argument as in (e), with δ_m replaced by δ_0m, shows that δ_0m is a solution to (10). Rewrite (10) by taking the transpose:
Ax = y with A ≡ [ Π0 S_m ⋮ C_m ], x ≡ δ_m, y ≡ π_0m.    (10′)
A necessary and sufficient condition that δ_0m is the only solution to (10′) is that the coefficient matrix in (10′), which is K × L_m (where L_m = M_m + K_m), be of full column rank (that is, that the rank of the matrix equal the number of columns, L_m). We have shown in (a) that this condition is equivalent to the rank condition for identification of the m-th equation.
(g) The hint is the answer.
4. In this part, we let F_m stand for the K × L_m matrix [ Π0 S_m ⋮ C_m ]. Since x_tK does not appear in the system, the last row of Π0 is a vector of zeros and the last row of C_m is a vector of zeros. So the last row of F_m is a vector of zeros:
F_m = [ F̃_m
        0′ ],
where F̃_m is the (K−1) × L_m matrix consisting of the first K−1 rows of F_m and 0′ is 1 × L_m.
Dropping x_tK from the list of instruments means dropping the last row of F_m, which does not alter the full column rank condition. The asymptotic variance of the FIML estimator is given in (4.5.15) with (4.5.16) on p. 278. Using (6) on (4.5.16), we obtain
A_mh = F_m′ E(x_t x_t′) F_h = [ F̃_m′  0 ] [ E(x̃_t x̃_t′)   E(x_tK x̃_t)
                                             E(x_tK x̃_t′)   E(x_tK²) ] [ F̃_h
                                                                          0′ ] = F̃_m′ E(x̃_t x̃_t′) F̃_h,
where x̃_t denotes the first K−1 elements of x_t. This shows that the asymptotic variance is unchanged when x_tK is dropped.
September 16, 2004
Hayashi Econometrics
Solution to Chapter 9 Analytical Exercises
1. From the hint, we have
(1/T) Σ_{t=1}^{T} ∆ξ_t ξ_{t−1} = (1/2)·(ξ_T/√T)² − (1/2)·(ξ_0/√T)² − (1/(2T)) Σ_{t=1}^{T} (∆ξ_t)².    (∗)
Consider the second term on the RHS of (∗). Since E(ξ_0/√T) → 0 and Var(ξ_0/√T) → 0, ξ_0/√T converges in mean square (by Chebychev's LLN), and hence in probability, to 0. So the second term vanishes (converges in probability to zero) (this can actually be shown directly from the definition of convergence in probability). Next, consider the expression ξ_T/√T in the first term on the RHS of (∗). It can be written as
ξ_T/√T = (1/√T)(ξ_0 + ∆ξ_1 + · · · + ∆ξ_T) = ξ_0/√T + (1/√T) Σ_{t=1}^{T} ∆ξ_t.
As just seen, ξ_0/√T vanishes. Since ∆ξ_t is I(0) satisfying (9.2.1)-(9.2.3), the hypothesis of Proposition 6.9 is satisfied (in particular, the absolute summability in the hypothesis of the Proposition is satisfied because it is implied by the one-summability (9.2.3a)). So
(1/√T) Σ_{t=1}^{T} ∆ξ_t →_d λX,   X ∼ N(0, 1),
where λ² is the long-run variance of ∆ξ_t. Regarding the third term on the RHS of (∗), since ∆ξ_t is ergodic stationary, (1/(2T)) Σ_{t=1}^{T} (∆ξ_t)² converges in probability to (1/2)γ_0. Finally, by Lemma 2.4(a) we conclude that the RHS of (∗) converges in distribution to (1/2)(λ²X² − γ_0).
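A simulation sketch of this limit for the simplest case in which ξ_t is a pure random walk with N(0, σ²) increments, so that λ² = γ_0 = σ² and the limit is (σ²/2)(X² − 1); sample size, replication count, and σ are illustrative choices.

# Sketch: (1/T) * sum_t dxi_t * xi_{t-1} for a driftless random walk converges in
# distribution to (sigma^2/2)*(X^2 - 1), X ~ N(0,1). All settings are illustrative.
import numpy as np

rng = np.random.default_rng(5)
T, reps, sigma = 1_000, 5_000, 1.0

dxi = rng.normal(0.0, sigma, (reps, T))
xi = np.cumsum(dxi, axis=1)                                     # xi_t with xi_0 = 0
xi_lag = np.hstack([np.zeros((reps, 1)), xi[:, :-1]])           # xi_{t-1}
stat = (dxi * xi_lag).sum(axis=1) / T                           # (1/T) * sum dxi_t * xi_{t-1}

limit = 0.5 * sigma**2 * (rng.standard_normal(reps) ** 2 - 1)   # (sigma^2/2)*(X^2 - 1)
print(round(stat.mean(), 3), round(limit.mean(), 3))            # both means near 0
print(round(stat.var(), 3), round(limit.var(), 3))              # both variances near sigma^4/2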
2. (a) The hint is the answer.
(b) From (a),
T·(ρ^µ − 1) = [ (1/T) Σ_{t=1}^{T} ∆y_t y^µ_{t−1} ] / [ (1/T²) Σ_{t=1}^{T} (y^µ_{t−1})² ].
Apply Proposition 9.2(d) to the numerator and Proposition 9.2(c) to the denominator.
(c) Since {y_t} is a random walk, λ² = γ_0. Just set λ² = γ_0 in (4) of the question.
(d) First, a proof that α* →_p 0. By the algebra of OLS,
α* = (1/T) Σ_{t=1}^{T} (y_t − ρ^µ y_{t−1})
  = (1/T) Σ_{t=1}^{T} (∆y_t − (ρ^µ − 1) y_{t−1})
  = (1/T) Σ_{t=1}^{T} ∆y_t − (ρ^µ − 1)·(1/T) Σ_{t=1}^{T} y_{t−1}
  = (1/T) Σ_{t=1}^{T} ∆y_t − (1/√T)·[T·(ρ^µ − 1)]·[ (1/(√T·T)) Σ_{t=1}^{T} y_{t−1} ].
The first term after the last equality, (1/T) Σ_{t=1}^{T} ∆y_t, vanishes (converges to zero in probability) because ∆y_t is ergodic stationary and E(∆y_t) = 0. To show that the second term after the last equality vanishes, we first note that (1/√T)·T·(ρ^µ − 1) vanishes because T·(ρ^µ − 1) converges to a random variable by (b). By (6) in the hint, (1/(√T·T)) Σ_{t=1}^{T} y_{t−1} converges to a random variable. Therefore, by Lemma 2.4(b), the whole second term vanishes.
Now turn to s². From the hint,
s² = (1/(T−2)) Σ_{t=1}^{T} (∆y_t − α*)² − (2/(T−2))·[T·(ρ^µ − 1)]·(1/T) Σ_{t=1}^{T} (∆y_t − α*) y_{t−1} + (1/(T−2))·[T·(ρ^µ − 1)]²·(1/T²) Σ_{t=1}^{T} (y_{t−1})².    (∗)
Since α* →_p 0, it should be easy to show that the first term on the RHS of (∗) converges to γ_0 in probability. Regarding the second term, rewrite it as
−(2/(T−2))·[T·(ρ^µ − 1)]·(1/T) Σ_{t=1}^{T} ∆y_t y_{t−1} + (2/(T−2))·√T·[T·(ρ^µ − 1)]·α*·(1/(√T·T)) Σ_{t=1}^{T} y_{t−1}.    (∗∗)
By Proposition 9.2(b), (1/T) Σ_{t=1}^{T} ∆y_t y_{t−1} converges to a random variable. So does T·(ρ^µ − 1). Hence the first term of (∗∗) vanishes. Turning to the second term of (∗∗), (6) in the question means that (1/(√T·T)) Σ_{t=1}^{T} y_{t−1} converges to a random variable. It should now be routine to show that the whole second term of (∗∗) vanishes. A similar argument, this time utilizing Proposition 9.2(a), shows that the third term of (∗) vanishes.
(e) By (7) in the hint and (3), a little algebra yields
t^µ = (ρ^µ − 1) / [ s·√( 1 / Σ_{t=1}^{T} (y^µ_{t−1})² ) ] = [ (1/T) Σ_{t=1}^{T} ∆y_t y^µ_{t−1} ] / [ s·√( (1/T²) Σ_{t=1}^{T} (y^µ_{t−1})² ) ].
Use Proposition 9.2(c) and (d) with λ² = γ_0 = σ² and the fact that s is consistent for σ to complete the proof.
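A small Monte Carlo sketch of (b)-(e) under a driftless random walk (all simulation settings are illustrative): the empirical distributions of T·(ρ^µ − 1) and t^µ are far from normal, which is why the DF tables are needed.

# Sketch: Monte Carlo for the demeaned Dickey-Fuller coefficient and t statistics
# under a driftless random walk null. T and reps are illustrative choices.
import numpy as np

rng = np.random.default_rng(6)
T, reps = 500, 5_000
rho_stats = np.empty(reps)
t_stats = np.empty(reps)

for r in range(reps):
    y = np.cumsum(rng.normal(size=T + 1))        # driftless random walk
    ylag, dy = y[:-1], np.diff(y)
    X = np.column_stack([np.ones(T), ylag])      # regress dy_t on a constant and y_{t-1}
    b = np.linalg.lstsq(X, dy, rcond=None)[0]
    resid = dy - X @ b
    s2 = resid @ resid / (T - 2)
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    rho_stats[r] = T * b[1]                      # T*(rho_mu - 1), since b[1] estimates rho_mu - 1
    t_stats[r] = b[1] / se
print("5% quantiles:", round(np.quantile(rho_stats, 0.05), 2), round(np.quantile(t_stats, 0.05), 2))
print("(the N(0,1) 5% quantile is about -1.64, so the t ratio is clearly non-normal)")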
3. (a) The hint is the answer.
(b) From (a), we have
T·(ρ^τ − 1) = [ (1/T) Σ_{t=1}^{T} ∆y_t y^τ_{t−1} ] / [ (1/T²) Σ_{t=1}^{T} (y^τ_{t−1})² ].
Let ξ_t and ξ^τ_t be as defined in the hint. Then ∆y_t = δ + ∆ξ_t and y^τ_t = ξ^τ_t. By construction, Σ_{t=1}^{T} y^τ_{t−1} = 0. So
T·(ρ^τ − 1) = [ (1/T) Σ_{t=1}^{T} ∆ξ_t ξ^τ_{t−1} ] / [ (1/T²) Σ_{t=1}^{T} (ξ^τ_{t−1})² ].
Since {ξ_t} is driftless I(1), Proposition 9.2(e) and (f) can be used here.
(c) Just observe that λ² = γ_0 if {y_t} is a random walk with or without drift.
4. From the hint,
(1/T) Σ_{t=1}^{T} y_{t−1} ε_t = ψ(1)·(1/T) Σ_{t=1}^{T} w_{t−1} ε_t + (1/T) Σ_{t=1}^{T} η_{t−1} ε_t + (y_0 − η_0)·(1/T) Σ_{t=1}^{T} ε_t.    (∗)
Consider first the second term on the RHS of (∗). Since η_{t−1}, which is a function of (ε_{t−1}, ε_{t−2}, . . .), is independent of ε_t, we have E(η_{t−1} ε_t) = E(η_{t−1}) E(ε_t) = 0. Then by the ergodic theorem this second term vanishes. Regarding the third term of (∗), (1/T) Σ_{t=1}^{T} ε_t →_p 0. So the whole third term vanishes. Lastly, consider the first term on the RHS of (∗). Since {w_t} is a random walk and ε_t = ∆w_t, Proposition 9.2(b) with λ² = γ_0 = σ² implies
(1/T) Σ_{t=1}^{T} w_{t−1} ε_t →_d (σ²/2)·[W(1)² − 1].
5. Comparing Propositions 9.6 and 9.7, the null is the same (that {∆y_t} is a zero-mean stationary AR(p), φ(L)∆y_t = ε_t, whose MA representation is ∆y_t = ψ(L)ε_t with ψ(L) ≡ φ(L)^{−1}), but the augmented autoregression in Proposition 9.7 has an intercept. The proof of Proposition 9.7 (for p = 1) makes appropriate changes to the argument developed on pp. 587-590. Let b and β be as defined in the hint. The A_T and c_T for the present case are
A_T = [ (1/T²) Σ_{t=1}^{T} (y^µ_{t−1})²                       (1/(T√T)) Σ_{t=1}^{T} y^µ_{t−1} (∆y_{t−1})^(µ)
        (1/(T√T)) Σ_{t=1}^{T} y^µ_{t−1} (∆y_{t−1})^(µ)         (1/T) Σ_{t=1}^{T} [(∆y_{t−1})^(µ)]² ],

c_T = [ (1/T) Σ_{t=1}^{T} y^µ_{t−1} ε^µ_t
        (1/√T) Σ_{t=1}^{T} (∆y_{t−1})^(µ) ε^µ_t ] = [ (1/T) Σ_{t=1}^{T} y^µ_{t−1} ε_t
                                                      (1/√T) Σ_{t=1}^{T} (∆y_{t−1})^(µ) ε_t ],

where ε^µ_t is the residual from the regression of ε_t on a constant for t = 1, 2, . . . , T.
• (1,1) element of A_T: Since y_t is driftless I(1) under the null, Proposition 9.2(c) can be used to claim that (1/T²) Σ_{t=1}^{T} (y^µ_{t−1})² →_d λ² ∫ (W^µ)², where λ² = σ²[ψ(1)]² with σ² ≡ Var(ε_t).
• (2,2) element of A_T: Since (∆y_{t−1})^(µ) = ∆y_{t−1} − (1/T) Σ_{t=1}^{T} ∆y_{t−1}, this element can be written as
(1/T) Σ_{t=1}^{T} [(∆y_{t−1})^(µ)]² = (1/T) Σ_{t=1}^{T} (∆y_{t−1})² − [ (1/T) Σ_{t=1}^{T} ∆y_{t−1} ]².
Since E(∆y_{t−1}) = 0 and E[(∆y_{t−1})²] = γ_0 (the variance of ∆y_t), this expression converges in probability to γ_0.
• Off-diagonal elements of A_T: each equals
(1/(√T·T)) Σ_{t=1}^{T} (∆y_{t−1})^(µ) y^µ_{t−1} = (1/√T)·[ (1/T) Σ_{t=1}^{T} (∆y_{t−1}) y_{t−1} ] − [ (1/(√T·T)) Σ_{t=1}^{T} y_{t−1} ]·[ (1/T) Σ_{t=1}^{T} ∆y_{t−1} ].
The term in the square brackets is (9.4.14), which is shown to converge to a random variable (Review Question 3 of Section 9.4). The next term, (1/(√T·T)) Σ_{t=1}^{T} y_{t−1}, converges to a random variable by (6) assumed in Analytical Exercise 2(d). The last term, (1/T) Σ_{t=1}^{T} ∆y_{t−1}, converges to zero in probability. Therefore, the off-diagonal elements vanish.
Taken together, we have shown that A_T is asymptotically diagonal:
A_T →_d [ λ² ∫_0^1 [W^µ(r)]² dr    0
          0                        γ_0 ],
so
(A_T)^{−1} →_d [ ( λ² ∫_0^1 [W^µ(r)]² dr )^{−1}    0
                 0                                 γ_0^{−1} ].
Now turn to c_T.
• 1st element of c_T: Recall that y^µ_{t−1} ≡ y_{t−1} − (1/T) Σ_{t=1}^{T} y_{t−1}. Combine this with the BN decomposition y_{t−1} = ψ(1)w_{t−1} + η_{t−1} + (y_0 − η_0), with w_{t−1} ≡ ε_1 + · · · + ε_{t−1}, to obtain
(1/T) Σ_{t=1}^{T} y^µ_{t−1} ε_t = ψ(1)·(1/T) Σ_{t=1}^{T} w^µ_{t−1} ε_t + (1/T) Σ_{t=1}^{T} η^µ_{t−1} ε_t,
where w^µ_{t−1} ≡ w_{t−1} − (1/T) Σ_{t=1}^{T} w_{t−1}, and η^µ_{t−1} is defined similarly. Since η_{t−1} is independent of ε_t, the second term on the RHS vanishes. Noting that ∆w_t = ε_t and applying Proposition 9.2(d) to the random walk {w_t}, we obtain
(1/T) Σ_{t=1}^{T} w^µ_{t−1} ε_t →_d (σ²/2)·{ [W^µ(1)]² − [W^µ(0)]² − 1 }.
Therefore, the 1st element of c_T converges in distribution to
c_1 ≡ σ²ψ(1)·(1/2)·{ [W^µ(1)]² − [W^µ(0)]² − 1 }.
• 2nd element of c_T: Using the definition (∆y_{t−1})^(µ) ≡ ∆y_{t−1} − (1/T) Σ_{t=1}^{T} ∆y_{t−1}, it should be easy to show that it converges in distribution to
c_2 ∼ N(0, γ_0·σ²).
·
−
1)
→d
σ2 ψ(1) λ2
1 2
·
− √ · − [W (1)µ ]2
[W (0)µ ]2
1 [W µ (r)]2 dr 0
T (ζ 1
−1
λ2 T (ρµ σ 2 ψ(1)
or
· ·
−
2
ζ 1 )
→d N 0, σγ 0
.
1)
µ ρ
→d DF ,
Repeating exactly the same argument that is given in the subsection entitled “Deriving Test Statistics” on p. 590, we can claim that σ λψ(1) is consistently estimated by 1/(1 ζ ) . This completes the proof of claim (9.4.34) of Proposition 9.7. 2
−
2
6. (a) The hint is the answer.
(b) The proof should be straightforward. 7. The one-line proof displayed in the hint is (with i replaced by k to avoid confusion)
| | ∞
j =0
αj =
− ∞
∞
j =0
k=j +1
ψk
≤ | | | ∞
∞
j =0 k=j +1
ψk =
∞
k=0
k ψk <
| ∞,
( )
∗
where ψk (k = 0, 1, 2,...) is one-summable as assumed in (9.2.3a). We now justify each of the equalities and inequalities. For this purpose, we reproduce here the facts from calculus shown on pp. 429-430:
{ }
4