Matrix Analysis for Scientists & Engineers
Alan J. Laub
University of California
Davis, California
Copyright © 2005 by the Society for Industrial and Applied Mathematics.

10 9 8 7 6 5 4 3 2 1

All rights reserved. Printed in the United States of America. No part of this book may be reproduced, stored, or transmitted in any manner without the written permission of the publisher. For information, write to the Society for Industrial and Applied Mathematics, 3600 University City Science Center, Philadelphia, PA 19104-2688.
MATLAB® is a registered trademark of The MathWorks, Inc. For MATLAB product information, please contact The MathWorks, Inc., 3 Apple Hill Drive, Natick, MA 01760-2098 USA, 508-647-7000, Fax: 508-647-7101, [email protected], www.mathworks.com.
Mathematica is a registered trademark of Wolfram Research, Inc.

Mathcad is a registered trademark of Mathsoft Engineering & Education, Inc.

Library of Congress Cataloging-in-Publication Data

Laub, Alan J., 1948-
Matrix analysis for scientists and engineers / Alan J. Laub.
p. cm.
Includes bibliographical references and index.
ISBN 0-89871-576-8 (pbk.)
1. Matrices. 2. Mathematical analysis. I. Title.
QA188.L38 2005
512.9'434 dc22
2004059962
About the cover: The original artwork featured on the cover was created by freelance artist Aaron Tallon of Philadelphia, PA. Used by permission.
SIAM is a registered trademark.
To my wife, Beverley
(who captivated me in the UBC math library nearly forty years ago)
Contents

Preface

1  Introduction and Review
   1.1  Some Notation and Terminology
   1.2  Matrix Arithmetic
   1.3  Inner Products and Orthogonality
   1.4  Determinants

2  Vector Spaces
   2.1  Definitions and Examples
   2.2  Subspaces
   2.3  Linear Independence
   2.4  Sums and Intersections of Subspaces

3  Linear Transformations
   3.1  Definition and Examples
   3.2  Matrix Representation of Linear Transformations
   3.3  Composition of Transformations
   3.4  Structure of Linear Transformations
   3.5  Four Fundamental Subspaces

4  Introduction to the Moore-Penrose Pseudoinverse
   4.1  Definitions and Characterizations
   4.2  Examples
   4.3  Properties and Applications

5  Introduction to the Singular Value Decomposition
   5.1  The Fundamental Theorem
   5.2  Some Basic Properties
   5.3  Row and Column Compressions

6  Linear Equations
   6.1  Vector Linear Equations
   6.2  Matrix Linear Equations
   6.3  A More General Matrix Linear Equation
   6.4  Some Useful and Interesting Inverses

7  Projections, Inner Product Spaces, and Norms
   7.1  Projections
        7.1.1  The four fundamental orthogonal projections
   7.2  Inner Product Spaces
   7.3  Vector Norms
   7.4  Matrix Norms

8  Linear Least Squares Problems
   8.1  The Linear Least Squares Problem
   8.2  Geometric Solution
   8.3  Linear Regression and Other Linear Least Squares Problems
        8.3.1  Example: Linear regression
        8.3.2  Other least squares problems
   8.4  Least Squares and Singular Value Decomposition
   8.5  Least Squares and QR Factorization

9  Eigenvalues and Eigenvectors
   9.1  Fundamental Definitions and Properties
   9.2  Jordan Canonical Form
   9.3  Determination of the JCF
        9.3.1  Theoretical computation
        9.3.2  On the +1's in JCF blocks
   9.4  Geometric Aspects of the JCF
   9.5  The Matrix Sign Function

10 Canonical Forms
   10.1  Some Basic Canonical Forms
   10.2  Definite Matrices
   10.3  Equivalence Transformations and Congruence
         10.3.1  Block matrices and definiteness
   10.4  Rational Canonical Form

11 Linear Differential and Difference Equations
   11.1  Differential Equations
         11.1.1  Properties of the matrix exponential
         11.1.2  Homogeneous linear differential equations
         11.1.3  Inhomogeneous linear differential equations
         11.1.4  Linear matrix differential equations
         11.1.5  Modal decompositions
         11.1.6  Computation of the matrix exponential
   11.2  Difference Equations
         11.2.1  Homogeneous linear difference equations
         11.2.2  Inhomogeneous linear difference equations
         11.2.3  Computation of matrix powers
   11.3  Higher-Order Equations

12 Generalized Eigenvalue Problems
   12.1  The Generalized Eigenvalue/Eigenvector Problem
   12.2  Canonical Forms
   12.3  Application to the Computation of System Zeros
   12.4  Symmetric Generalized Eigenvalue Problems
   12.5  Simultaneous Diagonalization
         12.5.1  Simultaneous diagonalization via SVD
   12.6  Higher-Order Eigenvalue Problems
         12.6.1  Conversion to first-order form

13 Kronecker Products
   13.1  Definition and Examples
   13.2  Properties of the Kronecker Product
   13.3  Application to Sylvester and Lyapunov Equations

Bibliography

Index
Preface

This book is intended to be used as a text for beginning graduate-level (or even senior-level) students in engineering, the sciences, mathematics, computer science, or computational science who wish to be familiar with enough matrix analysis that they are prepared to use its tools and ideas comfortably in a variety of applications. By matrix analysis I mean linear algebra and matrix theory together with their intrinsic interaction with and application to linear dynamical systems (systems of linear differential or difference equations). The text can be used in a one-quarter or one-semester course to provide a compact overview of much of the important and useful mathematics that, in many cases, students meant to learn thoroughly as undergraduates, but somehow didn't quite manage to do. Certain topics that may have been treated cursorily in undergraduate courses are treated in more depth and more advanced material is introduced. I have tried throughout to emphasize only the more important and "useful" tools, methods, and mathematical structures. Instructors are encouraged to supplement the book with specific application examples from their own particular subject area.

The choice of topics covered in linear algebra and matrix theory is motivated both by applications and by computational utility and relevance. The concept of matrix factorization is emphasized throughout to provide a foundation for a later course in numerical linear algebra. Matrices are stressed more than abstract vector spaces, although Chapters 2 and 3 do cover some geometric (i.e., basis-free or subspace) aspects of many of the fundamental notions. The books by Meyer [18], Noble and Daniel [20], Ortega [21], and Strang [24] are excellent companion texts for this book. Upon completion of a course based on this text, the student is then well-equipped to pursue, either via formal courses or through self-study, follow-on topics on the computational side (at the level of [7], [11], [23], or [25], for example) or on the theoretical side (at the level of [12], [13], or [16], for example).

Prerequisites for using this text are quite modest: essentially just an understanding of calculus and definitely some previous exposure to matrices and linear algebra. Basic concepts such as determinants, singularity of matrices, eigenvalues and eigenvectors, and positive definite matrices should have been covered at least once, even though their recollection may occasionally be "hazy." However, requiring such material as prerequisite permits the early (but "out-of-order" by conventional standards) introduction of topics such as pseudoinverses and the singular value decomposition (SVD). These powerful and versatile tools can then be exploited to provide a unifying foundation upon which to base subsequent topics. Because tools such as the SVD are not generally amenable to "hand computation," this approach necessarily presupposes the availability of appropriate mathematical software on a digital computer. For this, I highly recommend MATLAB® although other software such as
Mathematica® or Mathcad® is also excellent. Since this text is not intended for a course in numerical linear algebra per se, the details of most of the numerical aspects of linear algebra are deferred to such a course.

The presentation of the material in this book is strongly influenced by computational issues for two principal reasons. First, "real-life" problems seldom yield to simple closed-form formulas or solutions. They must generally be solved computationally and it is important to know which types of algorithms can be relied upon and which cannot. Some of the key algorithms of numerical linear algebra, in particular, form the foundation upon which rests virtually all of modern scientific and engineering computation. A second motivation for a computational emphasis is that it provides many of the essential tools for what I call "qualitative mathematics." For example, in an elementary linear algebra course, a set of vectors is either linearly independent or it is not. This is an absolutely fundamental concept. But in most engineering or scientific contexts we want to know more than that. If a set of vectors is linearly independent, how "nearly dependent" are the vectors? If they are linearly dependent, are there "best" linearly independent subsets? These turn out to be much more difficult problems and frequently involve research-level questions when set in the context of the finite-precision, finite-range floating-point arithmetic environment of most modern computing platforms.

Some of the applications of matrix analysis mentioned briefly in this book derive from the modern state-space approach to dynamical systems. State-space methods are now standard in much of modern engineering where, for example, control systems with large numbers of interacting inputs, outputs, and states often give rise to models of very high order that must be analyzed, simulated, and evaluated. The "language" in which such models are conveniently described involves vectors and matrices. It is thus crucial to acquire a working knowledge of the vocabulary and grammar of this language. The tools of matrix analysis are also applied on a daily basis to problems in biology, chemistry, econometrics, physics, statistics, and a wide variety of other fields, and thus the text can serve a rather diverse audience. Mastery of the material in this text should enable the student to read and understand the modern language of matrices used throughout mathematics, science, and engineering.

While prerequisites for this text are modest, and while most material is developed from basic ideas in the book, the student does require a certain amount of what is conventionally referred to as "mathematical maturity." Proofs are given for many theorems. When they are not given explicitly, they are either obvious or easily found in the literature. This is ideal material from which to learn a bit about mathematical proofs and the mathematical maturity and insight gained thereby. It is my firm conviction that such maturity is neither encouraged nor nurtured by relegating the mathematical aspects of applications (for example, linear algebra for elementary state-space theory) to an appendix or introducing it "on-the-fly" when necessary. Rather, one must lay a firm foundation upon which subsequent applications and perspectives can be built in a logical, consistent, and coherent fashion.

I have taught this material for many years, many times at UCSB and twice at UC Davis, and the course has proven to be remarkably successful at enabling students from disparate backgrounds to acquire a quite acceptable level of mathematical maturity and rigor for subsequent graduate studies in a variety of disciplines. Indeed, many students who completed the course, especially the first few times it was offered, remarked afterward that if only they had had this course before they took linear systems, or signal processing,
or estimation theory, etc., they would have been able to concentrate on the new ideas they wanted to learn, rather than having to spend time making up for deficiencies in their background in matrices and linear algebra. My fellow instructors, too, realized that by requiring this course as a prerequisite, they no longer had to provide as much time for "review" and could focus instead on the subject at hand. The concept seems to work.
AJL, June 2004
Chapter 1

Introduction and Review
1.1  Some Notation and Terminology
We begin with a brief introduction to some standard notation and terminology to be used throughout the text. This is followed by a review of some basic notions in matrix analysis and linear algebra.

The following sets appear frequently throughout subsequent chapters:
1. R^n = the set of n-tuples of real numbers represented as column vectors. Thus, x ∈ R^n means

       x = [ x_1 ]
           [  .  ]
           [  .  ]
           [ x_n ],

   where x_i ∈ R for i ∈ n. Henceforth, the notation n denotes the set {1, ..., n}.

   Note: Vectors are always column vectors. A row vector is denoted by y^T, where y ∈ R^n and the superscript T is the transpose operation. That a vector is always a column vector rather than a row vector is entirely arbitrary, but this convention makes it easy to recognize immediately throughout the text that, e.g., x^T y is a scalar while x y^T is an n × n matrix.
2. C^n = the set of n-tuples of complex numbers represented as column vectors.

3. R^{m×n} = the set of real (or real-valued) m × n matrices.
4. R_r^{m×n} = the set of real m × n matrices of rank r. Thus, R_n^{n×n} denotes the set of real nonsingular n × n matrices.
5. C^{m×n} = the set of complex (or complex-valued) m × n matrices.
6. C_r^{m×n} = the set of complex m × n matrices of rank r.
We now classify some of the more familiar "shaped" matrices. A matrix A ∈ R^{n×n} (or A ∈ C^{n×n}) is

• diagonal if a_{ij} = 0 for i ≠ j.
• upper triangular if a_{ij} = 0 for i > j.
• lower triangular if a_{ij} = 0 for i < j.
• tridiagonal if a_{ij} = 0 for |i − j| > 1.
• pentadiagonal if a_{ij} = 0 for |i − j| > 2.
• upper Hessenberg if a_{ij} = 0 for i − j > 1.
• lower Hessenberg if a_{ij} = 0 for j − i > 1.

Each of the above also has a "block" analogue obtained by replacing scalar components in the respective definitions by block submatrices. For example, if A ∈ R^{n×n}, B ∈ R^{n×m}, and C ∈ R^{m×m}, then the (m + n) × (m + n) matrix [ A  B ; 0  C ] is block upper triangular.

The transpose of a matrix A is denoted by A^T and is the matrix whose (i, j)th entry is the (j, i)th entry of A, that is, (A^T)_{ij} = a_{ji}. Note that if A ∈ R^{m×n}, then A^T ∈ R^{n×m}. If A ∈ C^{m×n}, then its Hermitian transpose (or conjugate transpose) is denoted by A^H (or sometimes A*) and its (i, j)th entry is (A^H)_{ij} = ā_{ji}, where the bar indicates complex conjugation; i.e., if z = α + jβ (j = i = √−1), then z̄ = α − jβ. A matrix A is symmetric if A = A^T and Hermitian if A = A^H. We henceforth adopt the convention that, unless otherwise noted, an equation like A = A^T implies that A is real-valued while a statement like A = A^H implies that A is complex-valued.
Remark 1.1. While √−1 is most commonly denoted by i in mathematics texts, j is the more common notation in electrical engineering and system theory. There is some advantage to being conversant with both notations. The notation j is used throughout the text but reminders are placed at strategic locations.

Example 1.2.
1. A = [ 5   7 ]   is symmetric (and Hermitian).
       [ 7   2 ]

2. A = [ 5       7 + j ]   is complex-valued symmetric but not Hermitian.
       [ 7 + j   2     ]

3. A = [ 5       7 + j ]   is Hermitian (but not symmetric).
       [ 7 − j   2     ]
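The distinctions above are easy to check numerically. The following is a minimal illustrative sketch in Python with NumPy (an assumed environment for these snippets; the Preface recommends MATLAB, and any comparable tool works the same way):

    import numpy as np

    # The three matrices of Example 1.2.
    A1 = np.array([[5, 7], [7, 2]], dtype=complex)
    A2 = np.array([[5, 7 + 1j], [7 + 1j, 2]])
    A3 = np.array([[5, 7 + 1j], [7 - 1j, 2]])

    def is_symmetric(A):
        # A = A^T (plain transpose, no conjugation)
        return np.allclose(A, A.T)

    def is_hermitian(A):
        # A = A^H (conjugate transpose)
        return np.allclose(A, A.conj().T)

    for name, A in [("A1", A1), ("A2", A2), ("A3", A3)]:
        print(name, "symmetric:", is_symmetric(A), " Hermitian:", is_hermitian(A))
    # Expected: A1 is both, A2 is symmetric only, A3 is Hermitian only.

The symmetry test compares A with its plain transpose, while the Hermitian test uses the conjugate transpose, mirroring the definitions above.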
Transposes of block matrices can be defined in an obvious way. For example, it is easy to see that if A_{ij} are appropriately dimensioned subblocks, then

       [ A_{11}   A_{12} ]^T     [ A_{11}^T   A_{21}^T ]
       [ A_{21}   A_{22} ]     = [ A_{12}^T   A_{22}^T ].
1.2  Matrix Arithmetic

It is assumed that the reader is familiar with the fundamental notions of matrix addition, multiplication of a matrix by a scalar, and multiplication of matrices.

A special case of matrix multiplication occurs when the second matrix is a column vector x, i.e., the matrix-vector product Ax. A very important way to view this product is to interpret it as a weighted sum (linear combination) of the columns of A. That is, suppose
       A = [a_1, ..., a_n] ∈ R^{m×n} with a_i ∈ R^m,   and   x = [x_1, ..., x_n]^T ∈ R^n.

Then

       Ax = x_1 a_1 + x_2 a_2 + ... + x_n a_n ∈ R^m.
The importance of this interpretation cannot be overemphasized. As a numerical example, take

       A = [ 9   8   7 ],     x = [ 3 ]
           [ 6   5   4 ]          [ 2 ]
                                  [ 1 ].

Then we can quickly calculate dot products of the rows of A with the column x to find

       Ax = [ 50 ],
            [ 32 ]

but this matrix-vector product can also be computed via

       3 · [ 9 ]  +  2 · [ 8 ]  +  1 · [ 7 ].
           [ 6 ]         [ 5 ]         [ 4 ]
For large arrays of numbers, there can be important computer-architecture-related advantages to preferring the latter calculation method.

For matrix multiplication, suppose A ∈ R^{m×n} and B = [b_1, ..., b_p] ∈ R^{n×p} with b_i ∈ R^n. Then the matrix product AB can be thought of as above, applied p times:

       AB = A [b_1, ..., b_p] = [Ab_1, ..., Ab_p].
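The two viewpoints, dot products with rows versus a linear combination of columns, can be compared directly on the numerical example above. A small sketch (Python/NumPy assumed, as before):

    import numpy as np

    A = np.array([[9, 8, 7],
                  [6, 5, 4]])
    x = np.array([3, 2, 1])

    # Row-oriented view: dot products of the rows of A with x.
    row_view = np.array([A[i, :] @ x for i in range(A.shape[0])])

    # Column-oriented view: a weighted sum (linear combination) of the columns of A.
    col_view = sum(x[j] * A[:, j] for j in range(A.shape[1]))

    print(row_view)   # [50 32]
    print(col_view)   # [50 32]

    # Matrix multiplication applied column by column: AB = [Ab_1, ..., Ab_p].
    B = np.array([[1, 0],
                  [0, 1],
                  [1, 1]])
    AB_by_columns = np.column_stack([A @ B[:, j] for j in range(B.shape[1])])
    print(np.allclose(AB_by_columns, A @ B))   # True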
There is also an alternative, but equivalent, formulation of matrix multiplication that appears frequently in the text and is presented below as a theorem. Again, its importance cannot be overemphasized. It is deceptively simple and its full understanding is well rewarded.

Theorem 1.3. Let U = [u_1, ..., u_n] ∈ R^{m×n} with u_i ∈ R^m and V = [v_1, ..., v_n] ∈ R^{p×n} with v_i ∈ R^p. Then

       U V^T = Σ_{i=1}^{n} u_i v_i^T  ∈ R^{m×p}.
If matrices C and D are compatible for multiplication, recall that (CD)^T = D^T C^T (or (CD)^H = D^H C^H). This gives a dual to the matrix-vector result above. Namely, if C ∈ R^{m×n} has row vectors c_j^T ∈ R^{1×n}, and is premultiplied by a row vector y^T ∈ R^{1×m}, then the product can be written as a weighted linear sum of the rows of C as follows:

       y^T C = y_1 c_1^T + ... + y_m c_m^T  ∈ R^{1×n}.
Theorem 1.3 can then also be generalized to its "row dual." The details are left to the reader.
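Theorem 1.3 is also easy to verify numerically: the sum of the n rank-one outer products u_i v_i^T reproduces the product U V^T. A brief sketch under the same assumptions:

    import numpy as np

    rng = np.random.default_rng(0)
    m, n, p = 4, 3, 5
    U = rng.standard_normal((m, n))   # columns u_1, ..., u_n
    V = rng.standard_normal((p, n))   # columns v_1, ..., v_n

    # Sum of n rank-one (outer product) terms u_i v_i^T.
    outer_sum = sum(np.outer(U[:, i], V[:, i]) for i in range(n))

    print(outer_sum.shape)                   # (4, 5), i.e., m x p
    print(np.allclose(outer_sum, U @ V.T))   # True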
1.3  Inner Products and Orthogonality
For vectors x, y ∈ R^n, the Euclidean inner product (or inner product, for short) of x and y is given by

       (x, y) := x^T y = Σ_{i=1}^{n} x_i y_i.

Note that the inner product is a scalar. If x, y ∈ C^n, we define their complex Euclidean inner product (or inner product, for short) by

       (x, y)_c := x^H y = Σ_{i=1}^{n} x̄_i y_i.

Note that (x, y)_c is the complex conjugate of (y, x)_c, i.e., the order in which x and y appear in the complex inner product is important. The more conventional definition of the complex inner product is (x, y)_c = y^H x = Σ_{i=1}^{n} x_i ȳ_i, but throughout the text we prefer the symmetry with the real case.
Example 1.4. Let x = [1  j]^T and y = [1  2]^T. Then

       (x, y)_c = x^H y = [1   −j] [ 1 ] = 1 − 2j,
                                   [ 2 ]

while

       (y, x)_c = y^H x = [1   2] [ 1 ] = 1 + 2j,
                                  [ j ]

and we see that, indeed, (x, y)_c is the complex conjugate of (y, x)_c.

Note that x^T x = 0 if and only if x = 0 when x ∈ R^n but that this is not true if x ∈ C^n. What is true in the complex case is that x^H x = 0 if and only if x = 0. To illustrate, consider the nonzero vector x above. Then x^T x = 0 but x^H x = 2.

Two nonzero vectors x, y ∈ R^n are said to be orthogonal if their inner product is zero, i.e., x^T y = 0. Nonzero complex vectors are orthogonal if x^H y = 0. If x and y are orthogonal and x^T x = 1 and y^T y = 1, then we say that x and y are orthonormal. A matrix A ∈ R^{n×n} is an orthogonal matrix if A^T A = A A^T = I, where I is the n × n identity matrix. The notation I_n is sometimes used to denote the identity matrix in R^{n×n} (or C^{n×n}). Similarly, a matrix A ∈ C^{n×n} is said to be unitary if A^H A = A A^H = I. Clearly an orthogonal or unitary matrix has orthonormal rows and orthonormal columns. There is no special name attached to a nonsquare matrix A ∈ R^{m×n} (or ∈ C^{m×n}) with orthonormal rows or columns.
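A short numerical check of these ideas, namely the order-dependence of the complex inner product from Example 1.4, the fact that x^T x can vanish for a nonzero complex x, and the defining property of an orthogonal matrix, might look as follows (Python/NumPy assumed):

    import numpy as np

    x = np.array([1, 1j])
    y = np.array([1, 2])

    # np.vdot conjugates its first argument, matching (x, y)_c = x^H y.
    print(np.vdot(x, y))     # (1-2j)
    print(np.vdot(y, x))     # (1+2j), the complex conjugate of the above

    print(x @ x)             # x^T x = 0 even though x != 0
    print(np.vdot(x, x))     # x^H x = 2

    # An orthogonal matrix: A^T A = I.
    theta = 0.3
    A = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    print(np.allclose(A.T @ A, np.eye(2)))   # True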
1.4  Determinants

It is assumed that the reader is familiar with the basic theory of determinants. For A ∈ R^{n×n} (or A ∈ C^{n×n}) we use the notation det A for the determinant of A. We list below some of the more useful properties of determinants.
Note that this is not a minimal set, i.e., several of the properties are consequences of one or more of the others.

 1. If A has a zero row or if any two rows of A are equal, then det A = 0.

 2. If A has a zero column or if any two columns of A are equal, then det A = 0.

 3. Interchanging two rows of A changes only the sign of the determinant.

 4. Interchanging two columns of A changes only the sign of the determinant.

 5. Multiplying a row of A by a scalar α results in a new matrix whose determinant is α det A.

 6. Multiplying a column of A by a scalar α results in a new matrix whose determinant is α det A.

 7. Multiplying a row of A by a scalar and then adding it to another row does not change the determinant.

 8. Multiplying a column of A by a scalar and then adding it to another column does not change the determinant.

 9. det A^T = det A (det A^H is the complex conjugate of det A if A ∈ C^{n×n}).

10. If A is diagonal, then det A = a_{11} a_{22} · · · a_{nn}, i.e., det A is the product of its diagonal elements.

11. If A is upper triangular, then det A = a_{11} a_{22} · · · a_{nn}.

12. If A is lower triangular, then det A = a_{11} a_{22} · · · a_{nn}.

13. If A is block diagonal (or block upper triangular or block lower triangular), with square diagonal blocks A_{11}, A_{22}, ..., A_{nn} (of possibly different sizes), then det A = det A_{11} det A_{22} · · · det A_{nn}.

14. If A, B ∈ R^{n×n}, then det(AB) = det A det B.

15. If A ∈ R_n^{n×n}, then det(A^{-1}) = 1 / det A.

16. If A ∈ R_n^{n×n} and D ∈ R^{m×m}, then det [ A  B ; C  D ] = det A · det(D − C A^{-1} B).

    Proof: This follows easily from the block LU factorization

       [ A  B ]   [ I          0 ] [ A   B              ]
       [ C  D ] = [ C A^{-1}   I ] [ 0   D − C A^{-1} B ].

17. If A ∈ R^{n×n} and D ∈ R_m^{m×m}, then det [ A  B ; C  D ] = det D · det(A − B D^{-1} C).

    Proof: This follows easily from the block UL factorization

       [ A  B ]   [ I   B D^{-1} ] [ A − B D^{-1} C   0 ]
       [ C  D ] = [ 0   I        ] [ C                D ].
Remark 1.5. The factorization of a matrix A into the product of a unit lower triangular matrix L (i.e., lower triangular with all 1's on the diagonal) and an upper triangular matrix U is called an LU factorization; see, for example, [24]. Another such factorization is UL where U is unit upper triangular and L is lower triangular. The factorizations used above are block analogues of these.

Remark 1.6. The matrix D − C A^{-1} B is called the Schur complement of A in [ A  B ; C  D ]. Similarly, A − B D^{-1} C is the Schur complement of D in [ A  B ; C  D ].
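Property 16 and the Schur complement of Remark 1.6 can be spot-checked numerically on randomly generated blocks. A minimal sketch (Python/NumPy assumed; A is shifted toward the identity only to keep it comfortably nonsingular for this illustration):

    import numpy as np

    rng = np.random.default_rng(1)
    n, m = 3, 2
    A = rng.standard_normal((n, n)) + n * np.eye(n)   # nonsingular for this example
    B = rng.standard_normal((n, m))
    C = rng.standard_normal((m, n))
    D = rng.standard_normal((m, m))

    M = np.block([[A, B],
                  [C, D]])

    lhs = np.linalg.det(M)
    # Property 16: det M = det A * det(D - C A^{-1} B), the Schur complement of A.
    schur = D - C @ np.linalg.solve(A, B)
    rhs = np.linalg.det(A) * np.linalg.det(schur)

    print(np.isclose(lhs, rhs))   # True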
EXERCISES

1. If A ∈ R^{n×n} and α is a scalar, what is det(αA)? What is det(−A)?

2. If A is orthogonal, what is det A? If A is unitary, what is det A?

3. Let x, y ∈ R^n. Show that det(I − x y^T) = 1 − y^T x.

4. Let U_1, U_2, ..., U_k ∈ R^{n×n} be orthogonal matrices. Show that the product U = U_1 U_2 · · · U_k is an orthogonal matrix.

5. Let A ∈ R^{n×n}. The trace of A, denoted Tr A, is defined as the sum of its diagonal elements, i.e., Tr A = Σ_{i=1}^{n} a_{ii}.

   (a) Show that the trace is a linear function; i.e., if A, B ∈ R^{n×n} and α, β ∈ R, then Tr(αA + βB) = α Tr A + β Tr B.

   (b) Show that Tr(AB) = Tr(BA), even though in general AB ≠ BA.

   (c) Let S ∈ R^{n×n} be skew-symmetric, i.e., S^T = −S. Show that Tr S = 0. Then either prove the converse or provide a counterexample.

6. A matrix A ∈ R^{n×n} is said to be idempotent if A^2 = A.

   (a) Show that the matrix

           A = 1/2 [ 2cos²θ   sin 2θ ]
                   [ sin 2θ   2sin²θ ]

       is idempotent for all θ.

   (b) Suppose A ∈ R^{n×n} is idempotent and A ≠ I. Show that A must be singular.
Chapter 2
Vector Spaces
In this chapter we give a brief review of some of the basic concepts of vector spaces. The emphasis is on finite-dimensional vector spaces, including spaces formed by special classes of matrices, but some infinite-dimensional examples are also cited. An excellent reference for this and the next chapter is [10], where some of the proofs that are not given here may be found.
2.1  Definitions and Examples
Definition 2.1. A field is a set F together with two operations +, · : F × F → F such that

(A1) α + (β + γ) = (α + β) + γ for all α, β, γ ∈ F.
(A2) there exists an element 0 ∈ F such that α + 0 = α for all α ∈ F.
(A3) for all α ∈ F, there exists an element (−α) ∈ F such that α + (−α) = 0.
(A4) α + β = β + α for all α, β ∈ F.
(M1) α · (β · γ) = (α · β) · γ for all α, β, γ ∈ F.
(M2) there exists an element 1 ∈ F such that α · 1 = α for all α ∈ F.
(M3) for all α ∈ F, α ≠ 0, there exists an element α^{-1} ∈ F such that α · α^{-1} = 1.
(M4) α · β = β · α for all α, β ∈ F.
= a·,8 +a· yy for for all a, ,8, y Elf. aa·- ((,8 p + y) y)=ci-p+aalia, p,ye¥.
Axioms (Al)-(A3) that (IF, +) is is aa group an abelian if (A4) also holds. Axioms (A1)-(A3) state state that (F, +) group and and an abelian group group if (A4) also holds. to), .)•) isis an Axioms Axioms (MI)-(M4) (M1)-(M4) state state that that (IF (F \\ {0}, an abelian abelian group. group. Generally speaking, speaking, when when no no confusion confusion can can arise, arise, the the multiplication multiplication operator operator "." is Generally "•" is not explicitly. not written written explicitly. 7
8
Chapter 2. Vector Spaces
Example 2.2. 2.2. Example 1. addition and is aa field. IR with with ordinary ordinary addition and multiplication multiplication is field. I. R 2. C with complex addition multiplication is 2. e with ordinary ordinary complex addition and and multiplication is aa field. field. 3. Raf.r] = = the field of 3. Ra[x] the field of rational rational functions functions in in the the indeterminate indeterminate xx =
{ao+
f30 +
atX f3t X
+ ... + apxP + ... + f3qXq
:aj,f3i EIR ;P,qEZ
+} ,
where = {O,l,2, {0,1,2, ... ...},}, is where Z+ Z+ = is aa field. field. mxn IR~ xn = 4. 4.RMr = {m m xx nn matrices matrices of of rank rank rr with with real real coefficients} coefficients) is is clearly clearly not not aa field field since, since, x for (Ml) does m= = n. n. Moreover, " is is not not aa field for example, example, (MI) does not not hold hold unless unless m Moreover, R" lR~xn field either either since (M4) (M4) does does not not hold (although the other 88 axioms hold). since hold in in general general (although the other axioms hold).
Definition vector space V together operations Definition 2.3. 2.3. A A vector space over over a a field field F IF is is a a set set V together with with two two operations -^VV and· and- :: IFF xxV -»•VV such such that that + ::VV xx VV -+ V -+ (VI) group. (VI) (V, (V, +) +) is is an an abelian abelian group. all a, a, f3 E F IF and andfor all vv E (V2) (V2) (a· ( a - pf3)) -. vv = = aa - .( (f3 P ' V. v) ) f ofor r all p e for all e V. V.
(V3) (a + f3). ft) • vv == a· a • vv + + pf3.• vv for F and for all vv e (V3) (a for all all a, a, p f3 € Elf andforall E V. V. (V4) a· a-(v w)=a-v w for all aa eElF F and for all v, w w Ee V. (V4) (v + w) = a . v + aa .w for all andfor all v, V. for all all vv E (V5) (V5) I· 1 • vv = = vv for eV V (1 (1 eElf). F). A vector vector space space is is denoted denoted by by (V, (V, F) IF) or, or, when when there there is is no no possibility possibility of of confusion confusion as as to to the the A underlying Id, simply V. underlying fie field, simply by by V.
Remark 2.4. Note that + + and from the + and and .• in Definition Remark 2.4. Note that and·• in in Definition Definition 2.3 2.3 are are different different from the + in Definition 2.1 in on different different objects in different different sets. In practice, practice, this this causes causes 2.1 in the the sense sense of of operating operating on objects in sets. In no confusion and the • operator operator is even written is usually usually not not even written explicitly. explicitly. no confusion and the· Example 2.5. Example 2.5. 1. (R", R) IR) with with addition addition defined defined by by I. (IRn,
and scalar multiplication defined by and scalar multiplication defined by
(en, e). is vector space. Similar definitions definitions hold hold for for (C", is aa vector space. Similar C).
2.2. Subspaces 2.2. Subspaces
99
JR) is vector space with addition addition defined defined by by 2. (JRmxn, (E mxn , E) is aa vector space with 2.
A+B=
[ ." P" a21 + + fJ2I .
amI
+ fJml
al2 a22
+ fJI2 + fJ22
aln + fJln a2n + fJ2n
am2 + fJm2
a mn
and scalar scalar multiplication and multiplication defined defined by by [ ya" y a 21 yA =
y a l2 y a 22
.
yaml
yam 2
ya," ya2n
.
+ fJmn
l
l
.
yamn
3. be an vector space be an be the 3. Let Let (V, (V, IF) F) be an arbitrary arbitrary vector space and and '0 V be an arbitrary arbitrary set. set. Let Let cf>('O, O(X>, V) V) be the set of of functions functions f/ mapping D to V. Then Then cf>('O, O(D, V) V) is is aa vector space with with addition addition set mapping '0 to V. vector space defined defined by by (f
+ g)(d) =
fed)
+ g(d)
for all d E '0 and for all f, g E cf>
and multiplication defined by and scalar scalar multiplication defined by (af)(d) = af(d) for all a E IF, for all d ED, and for all f E cf>. Special Special Cases: Cases: n (a) '0 V = = [to, [to, td, t\], (V, (V, IF) F) = = (JR (IR", E), and and the functions are are piecewise (a) the functions piecewise continuous continuous , JR), n n =: (C[to, td)n. =: (PC[to, (PC[f0, td)n t\]) or continuous continuous =: =: (C[? , h]) . 0
(b) '0
= [to, +00),
(V, IF)
= (JRn, JR), etc.
A E Ax(t)} is vector space 4. Let 4. Let A € JR(nxn. R"x". Then Then {x(t) (x(t) :: x(t) x(t) = = Ax(t}} is aa vector space (of (of dimension dimension n). n).
2.2 2.2
Subspaces Subspaces
Definition 2.6. 2.6. Let (V, IF) F) be be aa vector vector space space and and let let W W c~ V, V, W W f= = 0. 0. Then Then (W, (W, IF) F) is Definition Let (V, is aa subspace is itself space or, subspace of of (V, (V, IF) F) if if and and only only ifif (W, (W, IF) F) is itself aa vector vector space or, equivalently, equivalently, if if and and only only i f ( a w 1 + fJw2) ßW2) eE W for all a, a, fJß eE IF ¥ andforall and for all WI, w1, W2 w2 Ee W. if(awl foral! Remark 2.7. 2.7. The The latter latter characterization characterization of of aa subspace subspace is is often often the the easiest easiest way way to to check check Remark or that something in or prove prove that something is is indeed indeed aa subspace subspace (or (or vector vector space); space); i.e., i.e., verify verify that that the the set set in question Note, too, too, that this question is is closed closed under under addition addition and and scalar scalar multiplication. multiplication. Note, that since since 00 Ee IF, F, this implies that the zero vector must be in in any any subspace. subspace. implies that the zero vector must be Notation: When the the underlying underlying field field is understood, we we write write W W c~ V, the symbol Notation: When is understood, V, and and the symbol ~, c, when with vector vector spaces, spaces, is is henceforth henceforth understood to mean mean "is "is aa subspace subspace of." of." The The when used used with understood to less restrictive restrictive meaning meaning "is "is aa subset subset of' of" is is specifically specifically flagged flagged as as such. such. less
10
Chapter 2. Vector Spaces
Example 2.S. Example 2.8. x 1. (V,lF) and let W = {A e E R" JR.nxn A is symmetric}. Then 1. Consider (V, F) = = (JR.nxn,JR.) (R" X ",R) and = [A " :: A
We V. W~V.
Proof: symmetric. Then easily shown shown that + f3A2 fiAi is Proof' Suppose Suppose A\, AI, A A22 are are symmetric. Then it it is is easily that ctA\ aAI + is symmetric for for all all a, a, f3 R symmetric ft eE R. x ]Rnxn not a subspace of JR.nxn. 2. Let W = {A €E R" " :: A is orthogonal}. Then W is /wf R"x". 2 2 3. (V, F) = (R = [v1v2 identify v1 3. Consider Consider (V, IF) = (]R2,, R) JR.) and and for for each each vv €E R ]R2 of of the the form form vv = [~~ ]] identify VI with with with the the y-coordinate. y-coordinate. For For a, f3 R define the jc-coordinate x-coordinate in in the the plane plane and and V2 the u2 with ß eE R, define
W",/l =
{V : v =
[ ac
~
f3 ]
;
c
E
JR.} .
Then W_{α,β} is a subspace of V if and only if β = 0. As an interesting exercise, sketch W_{2,1}, W_{2,0}, W_{1/2,1}, and W_{1/2,0}. Note, too, that the vertical line through the origin (i.e., α = ∞) is also a subspace. All lines through the origin are subspaces. Shifted subspaces W_{α,β} with β ≠ 0 are called linear varieties.

Henceforth, we drop the explicit dependence of a vector space on an underlying field. Thus, V usually denotes a vector space with the underlying field generally being R unless explicitly stated otherwise.

Definition 2.9. If R and S are vector spaces (or subspaces), then R = S if and only if R ⊆ S and S ⊆ R.

Note: To prove two vector spaces are equal, one usually proves the two inclusions separately: An arbitrary r ∈ R is shown to be an element of S and then an arbitrary s ∈ S is shown to be an element of R.
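The closure criterion of Remark 2.7, as used in Example 2.8, is also easy to probe numerically: linear combinations of symmetric matrices stay symmetric, while even a simple sum of orthogonal matrices fails to be orthogonal. A sketch (Python/NumPy assumed):

    import numpy as np

    rng = np.random.default_rng(2)
    n = 3

    def random_symmetric(n):
        M = rng.standard_normal((n, n))
        return (M + M.T) / 2

    A1, A2 = random_symmetric(n), random_symmetric(n)
    alpha, beta = 1.7, -0.4
    S = alpha * A1 + beta * A2
    print(np.allclose(S, S.T))        # True: closed under linear combinations

    # Orthogonal matrices are not closed: I + I = 2I is not orthogonal.
    Q = np.eye(n)
    T = Q + Q
    print(np.allclose(T.T @ T, np.eye(n)))   # False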
2.3 2.3
Linear Independence Independence Linear
Let • •}} be V. Let X X = {v1, {VI, v2, V2, ••.• be aa nonempty nonempty collection collection of of vectors vectors u, Vi in in some some vector vector space space V. Definition 2.10. 2.10. X X is a linearly linearly dependent set set of of vectors if and only Definition if and only if if there exist exist k distinct distinct X and and scalars scalars aI, not all all zero zero such such that that elements VI, elements v1, ... . . . ,, Vk vk eE X a1, ..• . . . ,, (Xk ak not
X linearly independent if and and only any collection collection of distinct X is is aa linearly independent set set of of vectors vectors if only ififfor for any of kk distinct elements VI, v1, ... . . . ,,Vk of X . . . ,, ak, elements Vk of X and and for for any any scalars scalars a1, aI, ••• ak, al VI
+ ... + (XkVk = 0 implies
al
= 0, ... , ak = O.
11 11
2.3. Linear Independence Independence 2.3. Linear Example 2.11. Example 2.11.
~,
~
1. LetV = R3. Then Then {[ I. 1£t V =
However, [ Howe,."I
HiHi] } Ime~ly
is a linearly independent set. Why? i" independent.. Why?
i1[i1[l ]}
de~ndent ~t
linearly dependent set iss aa Iin=ly
(since v2 + + v3 0). (since 2v\ 2vI — - V2 V3 = = 0). xm tA m A E ]Rnxm. Thenconsider considerthe therows rows of ofeetA as vectors vectors in in C em[t [to, tIl 2. Let A e ]Rnxn R xn and 5B eE R" . Then BB as 0, t1] fA (recall that etA e denotes the matrix exponential, which is discussed in more detail in Chapter 11). 11). Independence these vectors vectors turns concept Chapter Independence of of these turns out out to to be be equivalent equivalent to to aa concept called to be be studied further in in what what follows. follows. called controllability, to studied further
Let v_i ∈ R^n, i ∈ k, and consider the matrix V = [v_1, ..., v_k] ∈ R^{n×k}. The linear dependence of this set of vectors is equivalent to the existence of a nonzero vector a ∈ R^k such that Va = 0. An equivalent condition for linear dependence is that the k × k matrix V^T V is singular. If the set of vectors is independent, and there exists a ∈ R^k such that Va = 0, then a = 0. An equivalent condition for linear independence is that the matrix V^T V is nonsingular.
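These equivalent characterizations can be checked with a rank computation or with the Gram matrix V^T V. A small sketch (Python/NumPy assumed; the particular vectors are chosen only for illustration):

    import numpy as np

    # Stack the vectors v_1, ..., v_k as the columns of V (here k = 3 vectors in R^3).
    V = np.array([[1., 2., 4.],
                  [3., 0., 6.],
                  [0., 1., 1.]])
    # The third column equals 2*v_1 + v_2, so this set is linearly dependent.

    k = V.shape[1]
    print(np.linalg.matrix_rank(V) < k)              # True: dependent
    print(np.isclose(np.linalg.det(V.T @ V), 0.0))   # V^T V is singular

    # Removing the dependent column leaves an independent set.
    W = V[:, :2]
    print(np.linalg.matrix_rank(W) == W.shape[1])    # True
    print(np.linalg.det(W.T @ W))                    # nonzero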
Definition 2.12. 2.12. Let X X = Vi E span of of Definition = {VI, [ v 1 , V2, v 2 , ...• . . }} be a collection of of vectors vi. e V. Then the span X is as X is defined defined as Sp(X) = Sp{VI, = {v :
V
V2, ... }
= (Xl VI
+ ... + (XkVk
;
(Xi ElF,
Vi
EX, kEN},
where N = {I, {1, 2, ... ...}. }. n 2.13. Let V = ]Rn and Example 2.13. =R and define
el
=
0 0
o
, e2 =
0 1 0
,'" ,en =
0 0 0
o
SpIel, e2, , en} ]Rn. Then Sp{e1, e2, ... ...,e = Rn. n} =
Definition of vectors V if if and Definition 2.14. 2.14. A A set set of vectors X X is is aa basis basis for for V and only only ijif
1. X X is a linearly independent set (of (of basis vectors), and and 2. Sp(X) Sp(X) = = V. 2. V.
Chapter 2. Vector Spaces
12 12
Example 2.15. {el, ... , en} for]Rn [e\,..., en} is a basis for IR" (sometimes called the natural natural basis). Now let bb1, ..., , bnn be a basis (with a specific order associated with the basis vectors) l , ... for V. Then Then for for all all v E e V there there exists exists aa unique unique n-tuple {E1 , ... . . . , ,E~n} n } such such that that for n-tuple {~I'
v= where
B
~
~Ibl
+ ... + ~nbn
[b".,b.l. x
= Bx,
DJ
~
Definition 2.16. The scalars {Ei} components (or sometimes the coordinates) coordinates) Definition 2.16. {~i } are called the components of ... , b } and are unique. We that the vector x of of , of v with respect to the basis {b (b1, ..., ] unique. We say nn l represents the vector v with respect to the basis B. B. components represents Example 2.17. In]Rn, In Rn, VI ]
:
+ V2e2 + ... + vne n ·
= vlel
Vn
We can can also also determine determine components components of of vv with with respect respect to to another another basis. For example, example, while We basis. For while [
~
] = I . el
+ 2 . e2,
with respect respect to to the basis with the basis
{[-~l[-!J} we have we have [
~
] = 3.[
-~
]
-~
+ 4· [
l
To see this, write [
~
] =
XI • [
= [ -~ Then Then
[ ~~ ] = [ -;
-
~ + ]
X2 • [
_! ]
-! ][ ~~ l -1
r
I [
;
]
=[ ~
l
Theorem 2.18. The number of elements in a basis of a vector space is independent of the particular basis considered.
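In practice, the components of a vector with respect to a given basis, as in Example 2.17, are found by solving the small linear system v = Bx for x. A sketch (Python/NumPy assumed; the basis below is a hypothetical one chosen for illustration, not necessarily that of the example):

    import numpy as np

    # A hypothetical basis for R^2, stored as the columns of B, and a vector v.
    B = np.array([[1., 1.],
                  [-1., 1.]])
    v = np.array([1., 2.])

    # v = B x, so the components of v with respect to this basis are x = B^{-1} v.
    x = np.linalg.solve(B, v)
    print(x)                       # components in the new basis
    print(np.allclose(B @ x, v))   # True: x_1*b_1 + x_2*b_2 reproduces v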
=
=
2.4. 2.4. Sums Sums and and Intersections Intersectionsof of Subspaces Subspaces
13 13
consistency, space, we we define define dim(O) O. A A consistency, and and because because the the 00 vector vector is is in in any any vector vector space, dim(O) = = 0. vector space V is is finite-dimensional finite-dimensional if there exists exists aa basis basis X with nn < < +00 elements; if there X with +00 elements; vector space V otherwise, otherwise, V V is is infinite-dimensional. infinite-dimensional.
Thus, Theorem 2.18 says says that number of in aa basis. basis. the number of elements elements in Thus, Theorem 2.18 that dim(V) dim (V) = the Example Example 2.20. 2.20. 1. dim(~n) = n. dim(Rn)=n. 2. dim(~mXn) dim(R mxn ) = mn. mn.
Note: Check basis for by the mn matrices m, jj Ee ~, Note: Check that that aa basis for ~mxn Rmxn is is given given by the mn matrices Eij; Eij; ii eE m, n, where Efj is all of elements are are 00 except except for (i, J)th j)th location. location. where Eij is aa matrix matrix all of whose whose elements for aa 11 in in the the (i, The collection of E;j Eij matrices matrices can can be be called called the the "natural "natural basis matrices." The collection of basis matrices." 3. dim(C[to, tJJ) t1]) = - +00. +00. T A =A AT} 4. dim{A dim{A E€ ~nxn Rnxn :: A } = = !n(n {1/2(n + 1). 1 (To see see why, why, determine 1) symmetric basis matrices.) matrices.) (To determine1/2n(n !n(n + 1) symmetric basis 2
5. A is upper (lower) triangular} = !n(n + 1). 1). 5. dim{A dim{A Ee ~nxn Rnxn :: A is upper (lower) triangular} =1/2n(n
2.4 2.4
Sums and Intersections of Subspaces Subspaces
Definition 2.21. 2.21. Let (V, JF') F) be vector space let 71, c V. sum and and intersection Definition Let (V, be a a vector space and and let R, SS S; V. The The sum intersection ofR and SS are defined respectively respectively by: of R, and are defined by:
1. R n+S S = {r {r + ss :: rr eE R, U, ss eE S}. 5}. 1. 2. ft R n S = R and S}. 2. H5 = {v {v :: vv Ee 7^ and vv Ee 5}.
Theorem 2.22. 2.22. Theorem kK
1. K CV V (in (in general, U\ -\+ 1. R + SS S; general, RI
=: L ]T R; ft/ S; C V, V, for for finite finite k). k). ... +h 7^ Rk =: 1=1
;=1
2. D5 CV V (in (in general, 2. 72. R n S S; general,
f] n
a eA CiEA
*R, CV V/or an arbitrary arbitrary index index set A). Raa S; for an set A).
Remark 2.23. 2.23. The U S, Remark The union union of of two two subspaces, subspaces, R C S, is is not not necessarily necessarily aa subspace. subspace.
Definition = R 0 SS is is the the direct direct sum sum of R and and SS ifif Definition 2.24. 2.24. T = REB ofR 1. R n S == 0, 0, and 1. n and
(L
L
2. (in general am/ ]P ft,2. U R + SS = = T (in general, ft; R; n (^ ft,-) R j ) == 00 and Ri == T). T). H; y>f
« The subspaces R, Rand are said said to complements of of each The subspaces and SS are to be be complements each other other in in T. T.
14 14
Chapter 2. 2. Vector Vector Spaces Spaces Chapter
2 Remark 2.25. unique. For example, consider V = jR2 2.25. The complement of R ft (or S) S) is not unique. =R and let let R ft be be any any line line through through the the origin. any other other distinct line through origin is and origin. Then Then any distinct line through the the origin is a complement of R. ft. Among all the complements there is a unique unique one orthogonal to R. ft. We discuss more about orthogonal complements elsewhere in the text.
Theorem 2.26. Suppose =RR O Theorem 2.26. Suppose T = EB S. Then Then
1. every written uniquely uniquely in every tt E€ T can can be be written in the the form form tt = rr + ss with with rr Ee Rand R and ss Ee S. S. 2. 2. dim(T) = = dim(R) dim(ft) + + dim(S).
Proof: To Proof: To prove the first part, suppose an arbitrary vector tt Ee T can be written in two ways as tt = r1 S2, where r2 Ee Rand R. and s1, e S. Then r2 = s2— s\. But rl + s1 Sl = r2 r2 + S2, where r1, rl, r2 SI, S2 S2 E Then r1 r, — - r2 S2 - SI. But as r1 -–r2 ft and 52 S. Since Since Rft n fl S = 0, 0, we r\ = r2 r-i and and s\ from rl r2 £ E Rand S2 -— si SI eE S. we must must have have rl SI = si S2 from which uniqueness follows. which uniqueness follows. 0 The statement of the second part is a special case of the next theorem. D Theorem 2.27. ft, S S of of a vector space space V, V, Theorem 2.27. For For arbitrary arbitrary subspaces subspaces R, a vector dim(R + S) = dim(R)
+ dim(S) -
dim(R n S).
x Example 2.28. jRn xn the Example 2.28. Let U be the subspace of upper triangular matrices in E" " and let £.c be the nxn subspace of lower triangUlar jRn xn. xn triangular matrices in R . Then it may be checked that U + + .c L= = jRn Rnxn nxn un.c jRnxn. while U n £ is the set of diagonal matrices in R . Using the fact that dim {diagonal (diagonal matrices} = = n, n, together with Examples 2.20.2 and 2.20.5, one can easily verify the validity validity of formula given given in Theorem 2.27. 2.27. of the the formula in Theorem
Example 2.29. Example 2.29. x jRnxn, and let R" ", and let S
Let (V, jR), let R (V, IF) F) = (jRnxn, (R n x n , R), ft be the set of skew-symmetric matrices in x be the set of symmetric matrices in jRnxn. the set in R" ". Then V = U $0 S. S.
n
x Proof: This follows easily from the fact that any A A E jRnxn written in the form Proof: e R" " can be written
1
TIT
A=2:(A+A )+2:(A-A).
The first matrix on the right-hand side above is in S while the second is in R. ft.
EXERCISES EXERCISES 1. ... , Vk} vd is a linearly dependent set. Then show that one of the vectors 1. Suppose {VI, {vi,..., must be a linear combination of the others. XI, *2, X2, ... Xk E jRn be nonzero mutually ... , 2. Let x\, . . . ,, x/c E R" mutually orthogonal vectors. Show that {XI, [x\,..., Xk} must be linearly independent independent set. set. Xk} must be aa linearly
3. Let VI, ... ,v , Vn jRn. Show that Av\,..., Av" •.. , Av AVnn are orv\,... are also orn be orthonormal vectors in R". x jRnxn thonormal if and only if A Ee R" " is orthogonal. 4. Consider = [2 Consider the vectors VI v\ — [2 1l]fr and V2 1*2== [3[3 1f. l] r .Prove Provethat thatVIviand andV2V2form forma abasis basis 2 for R v= = [4 [4 If l]r with respect to this basis. jR2.. Find the components of the vector v
Exercises Exercises
15
5. Let Let P denote set of polynomials of degree less than or or equal two of the form form 5. denote the the set of polynomials of degree less than equal to to two of the 2 Po p\xX + pix where Po, po, PI, p\, p2 e R. Show that is aa vector vector space space over over R E. Show Show Po + PI P2x2,, where P2 E R Show that P is x, and - 1 basis for Find the the that the polynomials polynomials 1, that the 1, *, and 2x2 2x2 — 1 are are aa basis for P. Find the components components of of the 22 polynomial 22 + + 3x 3x + 4x basis. 4x with with respect respect to to this this basis. polynomial 6. Prove Theorem case of only). 6. Prove Theorem 2.22 2.22 (for (for the the case of two two subspaces subspaces Rand R and S only).
7. Let denote the vector space space of of degree degree less equal to and of 7. Let Pnn denote the vector of polynomials polynomials of less than than or or equal to n, n, and of n the form p ( x ) = po + p\x + • • • + p x , where the coefficients /?, are all real. Let PE the form p(x) Po + PIX + ... + Pnxn, where the coefficients Pi are all real. Let PE n denote subspace of all even even polynomials in Pnn,, i.e., i.e., those that satisfy satisfy the property denote the the subspace of all polynomials in those that the property p(—x} = p(x). Similarly, let let PQ denote the subspace of of all all odd polynomials, i.e., i.e., p( -x) = p(x). Similarly, Po denote the subspace odd polynomials, those satisfying p(—x} = – p ( x ) . Show that P = P © POthose satisfying p(-x) = -p(x). Show that nn = PE E EB Po· 8. Repeat using instead instead the subspaces T 7" of of tridiagonal 8. Repeat Example Example 2.28 2.28 using the two two subspaces tridiagonal matrices matrices and and U of of upper upper triangular triangular matrices. matrices. U
This page intentionally intentionally left left blank blank This page
Chapter 3 Chapter 3
Linear Linear Transformations Transformations
3.1 3.1
Definition Definition and and Examples Examples
definition of of aa linear linear transformation (or (or linear map, linear function, function, We begin with the basic definition or linear operator) between two two vector vector spaces. or linear operator) between spaces. Let (V, F) IF) and and (W, IF) be be vector vector spaces. spaces. Then I:- :: V -+ Definition 3.1. Let (W, F) Then C -> W is aa linear transformation if and if transformation if and only only if I:-(avi + {3V2) al:-vi + {3I:-V2 for all all a, a, {3 ElF and for all v VI, V22e E V. £(avi pv2) = = aCv\ fi£v2 far £e F and far all V. },v The vector space space V is called called the I:- while while VV, W, the space into into the domain of of the the transformation transformation C the space The vector which it it maps, maps, is called the which is called the co-domain.
Example 3.2. Example 3.2. 1. Let F IF = R JR and take V W = PC[f PC[to, and take V= W +00). 1. Let 0, +00). Define £ I:- :: PC[t PC[to, +00) -+ PC[to, +00) by by Define -> PC[t 0, +00) 0, +00) vet)
f--+
wet) = (I:-v)(t) =
11
e-(t-r)v(r) dr.
to mxm 2. Let Let F IF = R JR and W = JRmxn. Fix M MEe R JRmxm.. and take V V= W R mx ". Fix mx -+ JRmxn mxn by Define £ I:- :: JRmxn R " -> M by
X
f--+
Y
= I:-X = MX.
n : a, 3. Let F IF = =n R JR and take V = P" pn = {p(x) ao0 + ct alx + ... +h aanx ai E E R} JR} and and 3. Let and take V= (p(x) = a }x H nx" 1 W = pn-l. w = -p -. I:- : V -+ p', where' where I denotes Define C.: —> W by I:-Lpp =— p', denotes differentiation differentiation with respect to x. x.
17
Chapter 3. Linear Chapters. Li near Transformations Transformations
18
3.2 3.2
Matrix Representation Representation of Linear Transformations Transformations Matrix of Linear
Linear conLinear transformations transformations between between vector vector spaces spaces with with specific specific bases bases can can be be represented represented conSpecifically, suppose £L : (V, F) IF) —>• ~ (W, IF) is linear and further veniently in matrix form. Specifically, (W, F) suppose that {Vi, ~} and {Wj, {w j, j E {u,, i eE n} e !!!.} m] are bases for V V and W, respectively. Then the ith column of A = = Mat £ L (the matrix representation of £L with respect to the given bases for V V and and W) of £i>, {w}j,•, jj eE m}. raj. In for W) is is the the representation representation of LVi with with respect respect to to {w In other other words, words,
al
n
:
A=
]
E
JR.mxn
a mn
represents £ since since represents L LVi = aliwl
+ ... + amiWm
=Wai,
where W= = [w\,..., wm]]and where W [WI, ... , w and
L depends on the particular bases for V is the ith z'th column of A. Note that A = Mat £ V and W. This could be reflected by subscripts, say, in the notation, but this is usually usually not done. uniquely determined determined (by linearity) The action of £L on an arbitrary vector Vv eE V V is uniquely by action on on aa basis. Thus, if v = = E1v1 + ... ••• + + E vn = = V Vxx (where and hence by its its action basis. Thus, if V ~I VI + ~nnVn (where u, v, and hence jc, x, is is arbitrary), then arbitrary), then LVx = Lv = ~ILvI
+ ... + ~nLvn
=~IWal+"'+~nWan
= WAx.
Thus, £V WA since xx was was arbitrary. arbitrary. Thus, LV = W A since When V= = R", W == lR. Rmm and and {Vi, [ v i , ii Ee n}, [ w jj', jj eE !!!.} m} are are the (natural) bases, bases When V JR.n, W ~}, {W the usual usual (natural) WA linea LV = W A becomes simply £ L = = A. A. We We thus commonly identify identify A A as a linear the equation £V transformation with its matrix i.e., transformation with its matrix representation, representation, i.e.,
m Thinking of as aa matrix matrix and from Rn Rm usually Thinking of A both both as and as as aa linear linear transformation transformation from JR." to to lR. usually causes causes no no naturally to appropriate matrix multiplication. confusion. Change of basis then corresponds naturally
3.3. Composition Transformations 3.3. Composition of ofTransformations
3.3
19 19
Composition Composition of Transformations
Consider three vector spaces U, V, and W Wand and transformations B from U to V and A from V to to W. W. Then Then we we can can define define aa new new transformation transformation C C as as follows: follows:
C The above diagram C = = AB. The above diagram illustrates illustrates the the composition composition of of transformations transformations C AB. Note Note that that in in most texts, the arrows above are reversed as follows:
C However, it might be useful to prefer the former since the transformations A and B appear in the same order order in dimZ// = = p, = n, n, in the same in both both the the diagram diagram and and the the equation. equation. If If dimU p, dimV dim V = and W = m, and if if we associate matrices the transformations transformations in in the and dim dim W m, and we associate matrices with with the the usual usual way, way, then composition composition of corresponds to to standard standard matrix multiplication. That That is, then of transformations transformations corresponds matrix mUltiplication. is, we have C C —A AB B .. The above is sometimes expressed expressed componentwise by the mxp
nxp
formula n
cij
=
L
aikbkj.
k=1
Two Two Special Special Cases: Inner Product: Let x, y eE Rn. ~n. Then their inner product is the scalar Inner Product: n
xTy = Lx;y;. ;=1
m Outer ~m, Outer Product: Product: Let x eE R , yy eE ~n. Rn. Then their outer product is the m x n matrix matrix
Note that any rank-one matrix A eE ~mxn Rmxn can be written in the form A = = xyT xyT H mxn mxn above (or xy xyH if A Ee C c ).). A rank-one symmetric matrix can be written in the form XX xx TT (or xx XXHH).).
20 20
Chapter Chapter 3. 3. LinearTransformations Li near Transformations
3.4 3.4
Structure Structure of of Linear Linear Transformations Transformations
Let A :: V --+ W be transformation. Let A V —> be aa linear linear transformation.
Definition3.3. A, denotedR(A), set {w Av for for some Definition 3.3. The The range range of of A, denotedlZ( A), is is the the set {w Ee W : w w= = Av some vv Ee V}. V}. Equivalently, R(A) =— {Av {Av : v Ee V}. V}. The range of of A is also known as the image of of A and denoted denoted Im(A). Im(A). The Av = of The nullspace of of A, denoted denoted N(A), N(A), is is the the set {v {v Ee V V : Av = O}. 0}. The The nullspace nullspace of kernel of of A and and denoted Ker (A). A is also known as the kernel (A). Theorem 3.4. Let Let A A :: V V --+ —>• W be be aa linear linear transformation. transformation. Then Then 1. R(A) 1. R ( A ) S; C W. W.
2. V. 2. N(A) N(A) S; c V.
Note N(A) and Note that that N(A) and R(A) R(A) are, are, in in general, general, subspaces subspaces of of different different spaces. spaces. mxn Theorem 3.5. Let A Ee R ~mxn.. If ... ,,a an], If A is written in in terms of of its columns as A = = [ai, [a\,... n], then then R(A) = Sp{al, ... , an} . then
Proof: The the defiProof: The proof proof of of this this theorem theorem is is easy, easy, essentially essentially following following immediately immediately from from the definition. 0 nition. D
Remark 3.6. Note Note that is that in in Theorem Theorem 3.5 and and throughout throughout the the text, text, the the same same symbol symbol (A) (A) is used to denote both aa linear the used to denote both linear transformation transformation and and its its matrix matrix representation representation with with respect respect to to the usual usual (natural) (natural) bases. bases. See See also also the the last paragraph of of Section Section 3.2. 3.2. Definition 3.7. ... , vk] vd be a set of 3.7. Let {VI, {v1,..., of nonzero vectors Vi u, Ee ~n. Rn. The set is said to be orthogonal orthogonal if if' vr vjvjv j = 00 for ^ jj and and orthonormal orthonormal if if vr vf vvjj = 88ij' 8tj is is the for ii f= where 8ij the be ij, where Kronecker Kronecker delta delta defined defined by by
8 = {I0 ij
ifi=j, if i f= j.
Example 3.8. 3.8.
J. [-: J}
1. {[
~
2. {[
~~i
is an an orthogonal orthogonal set. set. is
],[ -:~~ J}
is an an orthonormal orthonormal set. set. is
. h Vi • hogonaI set, . isan an en {I~ ~, ... ~ | IS 33.. If {{VI, t > i •.• , . . . ,,Vk Vk}} Wit with u, E.IN,. € 1Tlln M." IS is an ort orthogonal set,ththen —/==, - -.,, ~} —/=== an orthonormal orthonormal set. set.
I ~VI ^/v, VI vi
^/v'k vk ~~~
]
3.4. of Li near Transformations Transformations 3.4. Structure Structure of Linear
21 21
Definition 3.9. Let S <; ]Rn. Then the orthogonal complement of c Rn. Then the of S is defined defined as the set 1 S~={VE]Rn: S - = {v e Rn : vTs=OforallsES}. VTS = 0 for all s e S}.
Example 3.10. 3.10. Let
Then it can be shown that
Working from the definition, the computation involved is simply to find all nontrivial (i.e., nonzero) solutions of the system of equations 3xI -4xI
+ 5X2 + 7X3 = 0, + X2 + X3 = 0.
Note that there is nothing special about the two vectors in the basis defining S being orthogonal. Any set of vectors will do, including dependent spanning vectors (which would, of course, then give rise to redundant equations).
n,
n Theorem 3.11. 311 Let Theorem Let R SS C <; R ]Rn. The Then
2. S \B S~ = ]Rn. 3. (S~)l.
= S.
4.
n <; S
5.
(n + S)~ = nl. n S~.
6.
(n n S)~
if and only if S~ <;
= n~
n~.
+ S~.
Proof: Proof: We prove and discuss only item 2 here. The proofs of the other results are left left as exercises. Let {VI, ]Rn be an arbitrary {v1, ... ..., , Vk} vk} be an orthonormal basis for S and let x E e Rn vector. vector. Set Set k
L (xT Vi)Vi,
XI
=
X2
=X
;=1 -XI.
22 22
Chapter 3. Li Linear Chapters. near Transformations Transformations
Then e
we see that is orthogonal orthogonal to ..., , Vk Vk and and hence of these these we see that x2 X2 is to v1, VI, .•. hence to to any any linear linear combination combination of vectors. other words, S. We vectors. In In other words, X2 X2 is is orthogonal orthogonal to to any any vector vector in in S. We have have thus thus shown shown that that IRn. We We also have that SS U n S.l S + S.l S1 == Rn. S1 ==00 since the the only vector s Ee S orthogonal orthogonal to everything in (i.e., including everything in S (i.e., including itself) itself) is is 0. O. It decompositions, we It is also easy to see directly that, when we have such direct sum decompositions, can write vectors vectors in unique way way with with respect respect to to the the corresponding corresponding subspaces. can write in aa unique subspaces. Suppose, Suppose, 1 for example, = x'1+ , where x\, x 1 E S and x2, x' e S . Then Then for example, that that xx = = x1 XI + x2. X2 = x; + x' x~, where XI, x; E Sand X2, x~ E S.l. 2 2 T T (x; — - x1) XI/(x' (x~ - x2) X2) = 0 by definition definition of of ST. S.l. But But then then (x'1 (x; — - XI)T xd = 00 since (x'1 0 by x1) (x; (x'1 -– x1) since 2 — xx~2 — X2 = (x'1 — x1) (which x'2). Thus, -X2 = — -(x; -XI) (which follows follows by by rearranging rearranging the the equation equation x1+x2 XI +X2 = = x'1 x; + +x~). Thus, XI — = x'1 x; andx2 0 x1 and x2 == xx~. D 2. m Theorem 3.12. 3.12. Let Let A A :: IR -+ R IRm. Then Theorem Rnn —> . Then R(A Tr ).). (Note: for finite-dimensional 1. N(A).l N(A)1" = 7£(A (Note: This This holds only for finite-dimensional vector spaces.) spaces.) 1 2. R(A).l = J\f(A N(ATT).). (Note: also holds holds for for infinite-dimensional infinite-dimensional vector vector spaces.) 2. 'R,(A) ~ — (Note: This This also spaces.)
Proof: To To prove the first part, take an N(A). Then Ax Ax = = 0 and Proof: an arbitrary xx eE A/"(A). and this is T T Ax = But yyT Ax = = (AT x. Thus, Thus, Ax Ax = = 0 if and and only only if if xx equivalent to to yyT equivalent Ax = 00 for for all all y. v. But Ax ( A T yy{ ) x. 0 if T r orthogonal to all vectors of the form AT y, is orthogonal form A v, i.e., i.e., xx eE R(AT).l. R(A ) . Since Since xx was arbitrary, we ). have established established thatN(A).l that N(A)1 = R(A U(ATT}. The proof proof of of the the second second part part is is similar similar and and is left as as an an exercise. 0 The is left exercise. D m Let A A :: R IRnn -+ IRm. IRn :: Av Av = = 0} O} is is sometimes sometimes called called the the Definition 3.13. 3.13. Let Definition -> R . Then Then {v {v Ee R" m m TT right nullspace nullspace of of A. A. Similarly, Similarly, (w {w e E R IR :: w A = = 0} O} is is called called the left nullspace nullspace of right W A the left of A. A. Clearly, the right right nullspace nullspace is is A/"(A) N(A) while while the the left ). Clearly, the left nullspace nullspace is is N(A J\f(ATT).
Theorem 3.12 and and part Theorem 3.12 part 22 of of Theorem Theorem 3.11 3.11 can can be be combined combined to to give give two two very very funfundamental and useful decompositions decompositions of vectors in the domain and damental and co-domain of a linear transformation See also 2.26. A. See also Theorem Theorem 2.26. transformation A. m Theorem R"n -> . Then Theorem 3.14 3.14 (Decomposition (Decomposition Theorem). Theorem). Let Let A A :: IR -+ R IRm. Then
1. every every vector space R" IRn can can be written in in a a unique unique way way as as vv = 7. vector vv in in the the domain domain space be written = xx + y, y, ± E M(A) N(A) and E J\f(A) N(A).l = R(AT) N(A) EB ». where x € and y € ft(Ar) (i.e., (i.e., IR R"n = M(A) 0 R(A ft(ATr)).
2. every in the the co-domain Rmm can a unique asww = x+y, every vector vector w in co-domain space space IR can be be written written in ina unique way way as = x+y, 1 R(A) and and y e E ft(A) R(A).l- = Af(A N(AT)T ) (i.e., IRmm = R(A) 0 EBN(A ». where x eE U(A) (i.e., R = 7l(A) M(ATT)). This key key theorem theorem becomes becomes very very easy easy to to remember remember by by carefully studying and underThis carefully studying and understanding Figure Figure 3.1 in the the next next section. standing 3.1 in section.
3.5 3.5
Four Four Fundamental Fundamental Subspaces Subspaces
x Consider aa general general matrix matrix A A € E E^ lR;,xn. When thought thought of of as as aa linear linear transformation transformation from Consider ". When from IR E"n m to of A can be in terms fundamental subspaces subspaces to R IRm,, many many properties properties of A can be developed developed in terms of of the the four four fundamental
3.5. Four Four Fundamental Fundamental Subspaces Subspaces 3.5.
23 23
A
N(A)1-
r
r
X
EB {OJ
{O}Gl
m -r
n-r
Figure fundamental subspaces. Figure 3.1. 3.1. Four fundamental subspaces. R(A), 'R.(A)^, R(A)1-, AN(A), properties seem almost 7£(A), f ( A ) , and N(A)1-. N(A)T. Figure 3.1 3.1 makes many key properties obvious and and we return to to this this figure figure frequently frequently both both in in the the context context of of linear linear transformations obvious we return transformations and in in illustrating illustrating concepts concepts such such as as controllability controllability and and observability. observability. and
be aa linear linear transfortransforDefinition 3.15. Let W be spaces and and let let A Definition 3.15. Let V and and W be vector vector spaces A :: V -+ W be motion. mation. 1. A is onto onto (also (also called called epic epic or or surjective) surjective) ifR(A) ifR,(A) = = W. W. 1. A is 2. A is one-to-one one-to-one or or 1-1 1-1 (also (also called called monic monic or or injective) infective) if ifJ\f(A) 0. Two Two equivalent equivalent 2. A is N(A) == O. characterizations of A 1-1 that that are are often often easier to verify verify in in practice are the the characterizations of A being being 1-1 easier to practice are following: following: (a) AVI = AV2 (b)
VI
===} VI
= V2 .
t= V2 ===} AVI t= AV2 .
m Definition 3.16. 3.16. Let A : E" IR n -+ IRm. rank(A) = dim R(A). This is sometimes called -> R . Then rank(A) dimftCA). the column column rank rank of of A (maximum number of of independent independent columns). The row row rank rank of of A is
24 24
Chapter 3. LinearTransformations Chapter3. Linear Transformations
r dim 7£(A R(AT) ) (maximum number of of independent independent rows). rows). The dual notion to rank is the nullity of A, sometimes denoted of A, denoted nullity(A) nullity(A) or or corank(A), corank(A), and and is is defined defined as as dimN(A). dim A/"(A). n m Theorem 3.17. 3.17. Let A :: R ]Rn -> ~ R ]Rm.. Then dim K(A) R(A) = dimNCA)-L. dimA/'(A) ± . (Note: (Note: Since 1 TT N(A)-L" = = 7l(A R(A ),), this theorem is sometimes colloquially A/^A) colloquially stated "row rank of of A == column rank of of A.") A.")
Proof: Define a linear transformation T : N(A)-L Proof: J\f(A)~L ~ —>•R(A) 7£(A)byby Tv
=
Av for all v
E
N(A)-L.
Clearly T is 1-1 (since A/"(T) N(T) = = 0). To To see that T is also onto, take any W w eE R(A). 7£(A). Then by definition there is a vector xx Ee ]Rn Ax = R" such that Ax — w. w. Write xx = Xl x\ + X2, X2, where 1 Xl N(A)-L N(A). Then Ajti AXI = W N(A)-L.1. The last equality x\ Ee A/^A) - andx2 and jc2 eE A/"(A). u; = TXI r*i since Xl *i eE A/^A)shows that T R(A) = T is onto. We thus have that dim dim7?.(A) = dimN(A)-L dimA/^A^ since it is easily shown 1 basis for N(A)-L,, then {TVI, basis for R(A). if that if {VI, {ui, ... . . . ,, viv} abasis forA/'CA) {Tv\, ... . . . ,, Tv Tvrr]} is aabasis 7?.(A). Finally, if r } is a following string of equalities follows follows easily: we apply this and several previous results, the following T "column A" = rank(A) R(A) = R(AT) "column rank of A" rank(A) = = dim dim7e(A) = dimN(A)-L dim A/^A)1 = = dim dim7l(A ) = = rank(AT) rank(A r ) == "row rank of 0 of A." D The following corollary is immediate. Like the theorem, it is a statement about equality of dimensions; the subspaces subspaces themselves themselves are are not not necessarily in the the same same vector vector space. space. of dimensions; the necessarily in m Corollary 3.18. ]Rn ~ ]Rm.. Then dimN(A) R(A) = = n, where n is the 3.18. Let A : R" -> R dimA/"(A) + + dim dimft(A) dimension of dimension of the the domain domain of of A. A.
Proof: Theorems 3.11 3.11 and and 3.17 3.17 we we see see immediately Proof: From From Theorems immediately that that n = dimN(A) = dimN(A)
+ dimN(A)-L + dim R(A) .
0
For completeness, completeness, we include here a few miscellaneous results about ranks of sums and products of matrices. xn Theorem 3.19. ]Rnxn. 3.19. Let A, B Ee R" . Then
1. O:s rank(A 2. rank(A)
+ B)
:s rank(A)
+ rank(B) -
+ rank(B).
n :s rank(AB) :s min{rank(A), rank(B)}.
3. nullity(B) :s nullity(AB) :s nullity(A) 4.
if B is nonsingular,
rank(AB)
+ nullity(B).
= rank(BA) = rank(A) and N(BA) = N(A).
Part 44 of of Theorem 3.19 suggests suggests looking looking atthe at the general general problem of the four fundamental fundamental Part Theorem 3.19 problem of the four subspaces of matrix products. The basic results are contained in the following following easily proved theorem.
3.5. 3.5. Four Four Fundamental Fundamental Subspaces Subspaces
25 25
mxn nxp Theorem 3.20. IRmxn, IRnxp. 3.20. Let A Ee R , B Ee R . Then
1. RCAB) S; RCA). 2. N(AB) ;2 N(B). 3. R«AB)T) S; R(B T ). 4. N«AB)T) ;2 N(A T ).
The It The next next theorem theorem is is closely closely related related to to Theorem Theorem 3.20 3.20 and and is is also also easily easily proved. proved. It is and is extremely extremely useful useful in in text text that that follows, follows, especially especially when when dealing dealing with with pseudoinverses pseudoinverses and linear linear least least squares squares problems. problems. mxn Theorem 3.21. Let A Ee R IRmxn. 3.21. Let . Then
1. R(A)
= R(AA T ).
2. R(AT)
= R(A T A).
3. N(A) = N(A T A). 4. N(A T ) = N(AA T ).
We now now characterize characterize I-I 1-1 and and onto onto transformations transformations and and provide provide characterizations characterizations in We in terms of of rank and invertibility. terms rank and invertibility. Theorem Theorem 3.22. 3.22. Let A :: IR Rnn -+ -» IRm. Rm. Then 1. A is onto onto if if and and only only if //"rank(A) —m m (A (A has has linearly linearly independent independent rows rows or or is is said said to to 1. A is rank (A) = have full row AATT is have full row rank; rank; equivalently, equivalently, AA is nonsingular). nonsingular). 2. A is said 2. A is 1-1 1-1 if if and and only only ifrank(A) z/rank(A) = = nn (A (A has has linearly linearly independent independent columns columns or or is is said T to full column AT A is nonsingular). to have have full column rank; rank; equivalently, equivalently, A A nonsingular).
Proof' Proof part 1: A is R(A) = Proof: Proof of of part 1: If If A is onto, onto, dim dim7?,(A) —m m = — rank(A). rank (A). Conversely, Conversely, let let yy Ee IRm Rm T T ] n be arbitrary. Let jc x =A AT(AA (AAT)-I IRn.. Then y = Ax, i.e., y Ee R(A), A is onto. )~ y Y Ee R 7?.(A), so A A is = Proof Proof of of part part 2: 2: If If A is 1-1, 1-1, then then N(A) A/"(A) = = 0, 0, which which implies implies that that dimN(A)1dim A/^A)-1 = —nn — dim R(A 7£(ATr ),), and and hence hence dim dim 7£(A) Theorem 3.17. 3.17. Conversely, Conversely, suppose suppose AXI Ax\ = Ax^. dim R(A) = nn by by Theorem AX2. T Then A ATr A;ti AXI = A AT AX2, which implies x\ XI = X2 since A ATrAA is invertible. Thus, A A is Ax2, = x^. 1-1. D 1-1. D
Definition A :: V Definition 3.23. 3.23. A V -+ —» W W is is invertible invertible (or (or bijective) bijective) if if and and only only if if it it is is 1-1 1-1 and and onto. onto. Note that that if if A is invertible, invertible, then then dim dim V V = — dim dim W. W. Also, -»• E" is invertible invertible or or A is Also, A A :: W IRn1 -+ IR n is Note nonsingular ifand nonsingular if and only only ifrank(A) z/rank(A) = = n. n. x A E€ R" IR~xn, Note that in the special case when A ", the transformations A, A, AT, Ar, and A-I A"1 ± are N(A)1- and R(A). The are all all 1-1 1-1 and and onto onto between between the the two two spaces spaces M(A) and 7£(A). The transformations transformations AT AT ! and -I have range but is and A A~ have the the same same domain domain and and range but are are in in general general different different maps maps unless unless A A is T orthogonal. Similar remarks apply to A A and A~ A -T. .
26
Chapter 3. linear Chapters. Li near Transformations Transformations
If linear transformation is not invertible, it may still be right or left left invertible. DefiIf a linear concepts are followed by a theorem characterizing left left and right invertible nitions of these concepts transformations.
Definition V -> Definition 3.24. 3.24. Let Let A A :: V -+ W. Then Then 1. A is said to be right invertible ifif there exists a right inverse transformation A~ A-RR :: R AA -R = W -+ —> V such that AA~ = Iww,, where IIw transformation on W. w denotes the identity transfonnation L left inverse transformation A -L -+ 2. A is said to to be left invertible ifif there exists a left transformation A~ :: W —> L V such -L A A == Iv, such that that AA~ Iv, where where Iv Iv denotes denotes the the identity identity transfonnation transformation on on V. V.
Let A : V -+ Theorem 3.25. Let -> W. Then 1. A A is right right invertible invertible ifif and and only only ifif it it is onto. 1. onto. left invertible and only ifit 2. A is is left invertible if if and if it is 1-1. and only if and left left invertible, i.e., both Moreover, A is is invertible if if and if it is both right and invertible, i.e., both1-1 1-1 and and R L onto, in in which case A~ A -Il = = A~ A -R = A~ A -L. = . m Theorem 3.22 3.22 we see that if A : E" ]Rn -+ ]Rm Note: From Theorem ->• E is onto, then a right inverse R T T is given by A~ A -R = = A AT(AA (AAT) left inverse is given by ) -I.. Similarly, if A is 1-1, then a left L T L = (ATTA)-I1AT. AA~ = (A A)~ A .
3.26. Let Let A : V -» -+ V. V. Theorem 3.26. 1. If A - RR such that AA~ A A - RR = = I, then A is invertible. If there exists a unique right inverse A~ L left inverse A~ A -L A -LLA A = 2. If If there exists a unique left such that A~ = I, then A is invertible.
Proof: We prove the first part and proof of second to the reader. Notice the Proof: and leave leave the proof of the second the following: following: A(A- R + A-RA -I)
= AA- R + AA-RA = I
+IA -
A
A since AA -R = I
= I. R (A -R + AA -RRAA — - /)I) must must be be aa right right inverse inverse and, and, therefore, Thus, (A + therefore, by by uniqueness uniqueness itit must must be be R R R A -R + A~ A -RRA A -— I = A -R. A -RRA A = = /, I, i.e., i.e., that A~ A -R the case that A~ + = A~ . But this implies that A~ is aa left left inverse. inverse. It It then then follows follows from from Theorem Theorem 3.25 3.25 that that A A is is invertible. invertible. D 0
Example 3.27. 1. Let A = 2]:]R2 -+ E ]R1I.. Then A is onto. (Proof: (Proo!' Take any a E ]R1I; = [1 [1 2] : E2 -»• € E ; then one 2 can such that rank can always always find find vv eE E ]R2 such that [1 [1 2][^] 2][ ~~] = = a). a). Obviously Obviously A A has has full full row row rank (= 1) and A - RR = _~]j is a right (=1) and A~ = [ _j right inverse. inverse. Also, it is clear that there are are infinitely infinitely many A. In Chapter right inverses for A. Chapter 6 we characterize characterize all right inverses of a matrix by characterizing all solutions of the linear linear matrix matrix equation equation AR AR = characterizing all solutions of the = I.I.
27
Exercises
2. LetA ~ ]R2. Then A is 1-1. The only 2. Let A = [i]:]Rl [J] : E1 -> E2. ThenAis 1-1. (Proof (Proof: The only solution solution toO to 0 = = Av Av = = [i]v [I2]v is N(A) = A is that A A has has full is vv = 0, 0, whence whence A/"(A) = 00 so so A is 1-1). 1-1). It It is is now now obvious obvious that full column column L rank (=1) and A~ A -L = = [3 [3 -—1] 1] is a left inverse. Again, it is clear that there are A. In we characterize infinitely infinitely many many left left inverses inverses for for A. In Chapter Chapter 66 we characterize all all left left inverses inverses of of aa matrix LA = matrix by characterizing characterizing all all solutions solutions of of the the linear linear matrix matrix equation equation LA = I.I.
3. The matrix 3. The matrix A =
1 1 2 1 [ 3 1
when onto. give when considered considered as as aa linear linear transformation on on IE]R3,\ isisneither neither 1-1 1-1nor nor onto.We We give below bases bases for four fundamental below for its its four fundamental subspaces. subspaces.
EXERCISES EXERCISES 3 1. Let A A = consider A A as a linear linear transformation transformation mapping E ]R3 to ]R2. 1. Let = [[~8 5;3 i) J4 and consider E2. Find A with respect to Find the the matrix matrix representation representation of of A to the bases bases
{[lHHU]} of R3 and
{[il[~J}
2
of E . nx 2. Consider vector space ]Rnxn ]R, let 2. Consider the the vector space R " over over E, let S denote denote the the subspace subspace of of symmetric symmetric matrices, R denote matrices, and and let let 7£ denote the the subspace subspace of of skew-symmetric skew-symmetric matrices. matrices. For For matrices matrices nx ]Rnxn y) = Y). Show that, with X, Y Y Ee E " define their inner product by (X, (X, Y) = Tr(X Tr(XTr F). J. . respect this inner inner product, product, R respect to to this 'R, = —SS^.
3. Consider £, defined in Example 3.2.3. Is £, £, Consider the differentiation differentiation operator C £ I-I? 1-1? IsIs£ onto? onto? 4. Prove Theorem Theorem 3.4. 4. Prove 3.4.
Chapter 3. Linear Transformations Chapters. Linear Transformations
28 5. Prove Theorem 3.11.4. 3.Il.4. Theorem 3.12.2. 6. Prove Theorem
7. Determine Detennine bases for the four fundamental fundamental subspaces of the matrix
A=[~2 5~ 5~ ~]. 3 mxn 8. Suppose xn has a left left inverse. Show that ATT has a right inverse. Suppose A Ee IR Rm
n
9. Let = [[~J o]. Determine A/"(A) and and 7£(A). Are they equal? Is general? 9. Let A = DetennineN(A) R(A). Are they equal? Is this this true true in in general? If If this is true in general, prove it; if not, provide a counterexample. 9x48 E Mg 1R~9X48. linearly independent independent solutions 10. 10. Suppose A € . How many linearly solutions can be found to the homogeneous = 0? Ax = O? homogeneous linear linear system system Ax T 3.1 to illustrate the four fundamental subspaces associated e 11. Modify Figure 3.1 associated with A ATE nxm m IR nxm thought of as a transformation from from R IR m to IRn. R R".
Chapter Chapter 4 4
Introduction to the the Introduction to Moore-Penrose Moore-Pen rose Pseudoinverse Pseudoinverse In this introduction to generIn this chapter chapter we we give give aa brief brief introduction to the the Moore-Penrose Moore-Penrose pseudoinverse, pseudoinverse, aa generalization of the inverse of a matrix. The Moore-Penrose pseudoinverse is defined for any matrix and, as is is shown in the the following text, brings brings great notational and conceptual clarity matrix and, as shown in following text, great notational and conceptual clarity to of solutions solutions to arbitrary systems of linear linear equations equations and and linear linear least to arbitrary systems of least squares squares to the the study study of problems. problems.
4.1 4.1
Definitions Definitions and and Characterizations Characterizations
Consider aa linear linear transformation —>• y,y, where whereX Xand andY y arearearbitrary arbitraryfinitefiniteConsider transformation A A :: X X ---+ 1 dimensional N(A).l dimensional vector spaces. Define Define a transformation transformation T T :: Af(A) - ---+ —>• R(A) Tl(A) by by Tx = Ax for all x E NCA).l.
Then, as noted in the 3.17, T T is (1-1 and and onto), onto), and Then, as noted in the proof proof of of Theorem Theorem 3.17, is bijective bijective Cl-l and hence hence we we can define a unique inverse transformation TRCA) ---+ can T~l 1 :: 7£(A) —>•NCA).l. J\f(A}~L. This Thistransformation transformation can be used to give our first first definition A ++,, the the Moore-Penrose Moore-Penrose pseudoinverse pseudoinverse of of A. can be used to give our definition of of A A. neither provides provides nor suggests a good computational strategy Unfortunately, the definition neither good computational strategy for determining AA++.. for determining Definition A and and T as defined defined above, above, define define aa transformation transformation A A++ : Y ---+ X X by Definition 4.1. 4.1. With With A T as y —»• by
L + where y = = YI y\ + Yz j2 with y\ eE 7£(A) yi eE Tl(A} Then A is the where Y with Yl RCA) and and Yz RCA).l.. Then A+ is the Moore-Penrose Moore-Penrose pseudoinverse A. pseudoinverse of of A.
Although X X and and Y were arbitrary vector spaces let us us henceforth henceforth consider consider the the Although y were arbitrary vector spaces above, above, let 1 X X =W ~n and Y lP1.mm.. We We have thus defined A+ A + for all A A Ee IR™ lP1.;" xn. case X y =R ". A purely algebraic characterization A ++ is is given in the the next next theorem, theorem, which proved by by Penrose Penrose in characterization of of A given in which was was proved in 1955; 1955; see see [22]. [22].
29
30
Chapter 4. Introduction to to the the Moore-Penrose Moore-Penrose Pseudoinverse Pseudoinverse Chapter 4. Introduction
xn Theorem Let A A Ee lR;" A++ if Theorem 4.2. 4.2. Let R?xn. . Then Then G G= =A if and and only only ifif
(Pl) AGA = A. (PI) AGA = A.
(P2) GAG GAG = G. (P2) G.
=
(P3) (P3) (AG)T (AGf = AG. AG. (P4) (P4) (GA)T (GA)T == GA. GA.
Furthermore, A++ always Furthermore, A always exists exists and and is is unique. unique.
Note that nonsingular matrix matrix satisfies Penrose properties. Note that the the inverse inverse of of aa nonsingular satisfies all all four four Penrose properties. Also, Also, aa right right or or left left inverse inverse satisfies satisfies no no fewer fewer than than three three of of the the four four properties. properties. Unfortunately, Unfortunately, as as with 4.1, neither its proof with Definition Definition 4.1, neither the the statement statement of of Theorem Theorem 4.2 4.2 nor nor its proof suggests suggests aa computacomputational However, the the great providing aa tional algorithm. algorithm. However, the Penrose Penrose properties properties do do offer offer the great virtue virtue of of providing checkable the following following sense. that is is aa candidate checkable criterion criterion in in the sense. Given Given aa matrix matrix G G that candidate for for being being the G the pseudoinverse pseudoinverse of of A, A, one one need need simply simply verify verify the the four four Penrose Penrose conditions conditions (P1)-(P4). (P1)-(P4). If If G satisfies all four, must be A++.. Such often relatively satisfies all four, then then by by uniqueness, uniqueness, it it must be A Such aa verification verification is is often relatively straightforward. straightforward.
[a
[!
+ Example Verify directly A+ = Example 4.3. 4.3. Consider Consider A A == [']. Verify directly that that A = [| ~] f ] satisfies satisfies (PI)-(P4). (P1)-(P4). L A -L = Note Note that that other other left left inverses inverses (for (for example, example, A~ = [3 [3 -— 1]) 1]) satisfy satisfy properties properties (PI), (PI), (P2), (P2), and and (P4) (P4) but but not not (P3). (P3).
A++ is given in the following Still another characterization Still another characterization of of A is given in the following theorem, theorem, whose whose proof proof can While not this can be be found found in in [1, [1, p. p. 19]. 19]. While not generally generally suitable suitable for for computer computer implementation, implementation, this characterization can can be be useful for hand calculation of of small small examples. examples. characterization useful for hand calculation xn Theorem Let A A Ee lR;" Theorem 4.4. 4.4. Let R™xn. . Then Then
A+
= lim (AT A + 82 1) -I AT
(4.1)
= limAT(AAT +8 2 1)-1.
(4.2)
6--+0 6--+0
4.2 4.2
Examples Examples
verified by by using the above Each of Each of the the following following can can be be derived derived or or verified using the above definitions definitions or or characcharacterizations. terizations. T Example AT (AATT) A is Example 4.5. 4.5. X A+t == A (AA )~-I if if A is onto onto (independent (independent rows) rows) (A (A is is right invertible).
Example 4.6. A)-I AT A is invertible). Example 4.6. A+ A+ = = (AT (AT A)~ AT if if A is 1-1 1-1 (independent (independent columns) columns) (A (A is is left left invertible). Example Example 4.7. 4.7. For For any any scalar scalar a, a, if a
t= 0,
if a =0.
4.3. Properties Properties and and Applications 4.3. Applications
31 31
Example jRn, Example 4.8. 4.8. For For any any vector vector v Ee M", if v i= 0, if v = O.
Example 4.9. Example 4.9.
Example 4.10. Example 4.10.
4.3 4.3
r
[~ ~
[~
=[
~
~l
0
r 1 I
4
=[
I
4
I
4 I
4
Properties and and Applications Properties Applications
This section miscellaneous useful useful results on pseudoinverses. these This section presents presents some some miscellaneous results on pseudoinverses. Many Many of of these are are used used in in the the text text that that follows. follows. mx jRmxn"and orthogonal Theorem 4.11. 4.11. Let A Ee R andsuppose supposeUUEejRmxm, Rmxm,VVEejRnxn R n x "areare orthogonal(M(Mis is T -11 orthogonal if if MT M = MM ). Then ). Then orthogonal
Proof: For For the simply verify verify that that the the expression expression above above does indeed satisfy satisfy each each cof Proof: the proof, proof, simply does indeed the four 0 the four Penrose Penrose conditions. conditions. D nxn Theorem Let S jRnxn be with U SU = D, where where U and Theorem 4.12. 4.12. Let S Ee R be symmetric symmetric with UTTSU = D, U is is orthogonal orthogonal an + + TT + D is diagonal. diagonal. Then Then S S+ = U D+U where D D+ is is again again a a diagonal diagonal matrix matrix whose whose diagonc diagonal D is UD U , , where elements are are determined to Example elements determined according according to Example 4.7. 4.7.
Theorem 4.13. A E 4.13. For For all A e jRmxn, Rmxn,
1. A+
= (AT A)+ AT = AT (AA T)+.
2. (A T )+ = (A+{.
Proof: Both results can can be proved using the limit limit characterization characterization of of Theorem Theorem 4.4. The Proof: Both results be proved using the 4.4. The proof of of the the first is not particularly easy easy and and does not even even have the virtue virtue of of being being proof first result result is not particularly does not have the especially illuminating. illuminating. The The interested interested reader reader can can consult consult the proof in in [1, [1, p. p. 27]. The especially the proof 27]. The proof of the the second second result (which can can also also be easily by by verifying the four four Penrose Penrose proof of result (which be proved proved easily verifying the conditions) is is as as follows: follows: conditions) (A T )+ = lim (AA T ~--+O
+ 82 l)-IA
= lim [AT(AAT ~--+O
= [limAT(AAT
+ 82 l)-1{ + 82 l)-1{
~--+O
= (A+{.
0
32
Chapter 4. Introduction to to the the Moore-Penrose Moore-Penrose Pseudo Pseudoinverse Chapter 4. Introduction inverse
4.12 and 4.13 Note that by combining Theorems 4.12 4.13 we can, can, in theory at least, compute the Moore-Penrose pseudoinverse of any matrix (since AAT A AT and AT AT A are symmetric). This e.g., [7], [7], [II], [11], turns out to be a poor poor approach in finite-precision arithmetic, however (see, (see, e.g., [23]), and better methods are suggested in text that follows. Theorem Theorem 4.11 4.11 is suggestive of a "reverse-order" property for pseudoinverses of prodnets of of matrices such as as exists exists for of products. nroducts TTnfortnnatelv. in general, peneraK ucts matrices such for inverses inverses of Unfortunately, in
As example consider [0 1J B= A = = [0 I] and and B = [LI. : J. Then Then As an an example consider A (AB)+ = 1+ = I
while while B+ A+
= [~
[]
~J ~ = ~.
sufficient conditions under which the reverse-order reverse-order property does However, necessary and sufficient hold are known and we quote a couple of moderately useful results for reference. + + Theorem 4.14. 4.14. (AB)+ (AB)+ = = B B+ A A + ifif and and only only if if
1. n(BB T AT) ~ n(AT) and 2. n(A T AB) ~ nCB) .
Proof: For the proof, see [9]. Proof: [9].
0 D
+ Theorem 4.15. = B?A+, where BI AB\B+. 4.15. (AB) (AB)+ = B{ Ai, where BI = = A+AB A+ AB and and A) AI = = ABIB{.
Proof: For the proof, see [5]. Proof: [5].
0 D
n xr r xm lR~xr, lR~xm, A+. Theorem 4.16. 4.16. If If A eE R eR (AB)+ == B+ B+A+. r , B E r , then (AB)+ n xr T + Proof' Since A Ee R lR~xr, A)-IlAAT, A+ Proof: A+ = = (AT (ATA)~ , whence A AA = fIrr .• Similarly, Similarly, since r , then A+ xm + T T + B e E W lR;xm, we B+ BT(BBT)-I, BB+ f The by . , we have B = B (BB )~\ whence BB = I . The result then follows by r rr taking BIt = = B,At B, A\ = =A in Theorem Theorem 4.15. 4.15. D takingB A in 0
The following theorem gives some additional useful properties properties of pseudoinverses. mxn Theorem 4.17. 4.17. For For all A E e lR Rmxn ,,
1. (A+)+ = A. 2. (AT A)+ = A+(A T)+, (AA T )+ = (A T)+ A+. 3. n(A+)
= n(A T) = n(A+ A) = n(A TA).
4. N(A+)
= N(AA+) =
5.
If A
N«AA T)+)
is normal, then AkA+
=
= N(AA T) = N(A T).
A+ Ak and (Ak)+ = (A+)kforall integers k > O.
Exercises
33
xn Note: Recall Recall that A eE R" IRn xn is normal A ATT = = A ATTA. A. For For example, example, if if A A is is symmetric, symmetric, Note: that A is normal if if AA then it it is is normal. normal. However, However, aa matrix matrix can can be be none none of the skew-symmetric, skew-symmetric, or or orthogonal, orthogonal, then of the preceding but but still be normal, normal, such as preceding still be such as
A=[ -ba ab] for scalars a, E. for scalars a, b b eE R The next next theorem facilitating aa compact and unifying approach The theorem is is fundamental fundamental to to facilitating compact and unifying approach to studying studying the of solutions solutions of equations and linear least squares to the existence existence of of (matrix) (matrix) linear linear equations and linear least squares problems. problems. nxp MXm IRnxp, IRnxm. Theorem 4.18. Suppose Suppose A Ee R , B Ee E . Then Then R(B) K(B) cS; R(A) U(A) if if and and only only ifif B. AA+B == B. m Proof: Suppose R(A) and and take arbitrary jc x E IRm. RCA), so so Proof: Suppose R(B) K(B) cS; U(A) take arbitrary eR . Then Then Bx Bx eE R(B) H(B) cS; H(A), p there exists aa vector such that = Bx. have there exists vector yy Ee R IRP such that Ay Ay = Bx. Then Then we we have
Bx
= Ay = AA + Ay = AA + Bx,
where one the Penrose is used arbitrary, we where one of of the Penrose properties properties is used above. above. Since Since xx was was arbitrary, we have have shown shown that B = AA+ B. that B = AA+B. + To prove prove the converse, assume assume that that AA AA +B B = B take arbitrary arbitrary yy eE K(B). R(B). Then To the converse, B and and take Then m m there vector xx E IR such that Bx Bx = y, whereupon whereupon there exists exists aa vector eR such that = y, 0
y = Bx = AA+Bx E R(A).
EXERCISES EXERCISES
U ;].1 •
1. Use Theorem 4.4 to to compute pseudoinverse of of \ 2 1. Use Theorem 4.4 compute the the pseudoinverse
2
T + T + T x, Y IRn, show that (xyT)+ 2. If jc, y eE R", (xyT)+ == (x T(xx)+(yT x) (yy)+ y) yx yxT. mxn r 3. For For A A eE R IRmxn, prove that that 7£(A) RCA) = = 7£(AA R(AAT) using only only definitions definitions and and elementary 3. , prove ) using elementary properties Moore-Penrose pseudoinverse. pseudoinverse. of the the Moore-Penrose properties of mxn 4. For A A e E R IRmxn, , prove that R(A+) ft(A+) = R(A ft(ATr). pxn mx 5. For A A E IRPxn and BE IRmxn, thatN(A) S; A/"(S) N(B) if and A = B. eR 5 €R ", show that JV(A) C and only if BA+ fiA+A B. xn m A G E M" IRn xn, IRmmxm xm and suppose further that D 6. Let A , 5B eE JRn E n xxm , and D E€ E D is nonsingular. 6.
(a) Prove Prove or or disprove disprove that that
[~
AB D
(b) (b) Prove Prove or or disprove disprove that that
[~
B D
r r=[ =[
A+
0
A+
0
-A+ABD- i D- i
-A+BD- 1 D- i
l
].
This page intentionally intentionally left left blank blank This page
Chapter Chapter 5 5
Introduction to Introduction to the the Singular Singular Value Decomposition Value Decomposition
In this this chapter chapter we we give give aa brief brief introduction introduction to to the the singular value decomposition decomposition (SVD). (SVD). We We In singular value show that matrix has an SVD SVD and and describe describe some show that every every matrix has an some useful useful properties properties and and applications applications of this this important important matrix matrix factorization. factorization. The The SVD plays aa key key conceptual and computational of SVD plays conceptual and computational role throughout throughout (numerical) and its applications. role (numerical) linear linear algebra algebra and its applications.
5.1
The Fundamental Theorem Theorem
xn mxm Theorem 5.1. Let A eE R™ IR~xn.. Then there exist orthogonal matrices U E IRmxm and and Theorem 5.1. e R nxn nxn V V E€ IR R such such that that
A
n
= U~VT,
(5.1)
rxr
IRrxr,, and a\ UI > ur ) e E R diag(ul, ... where = [J ... ,,o>) > ••• > > U orr > More > 0. O. More where S ~ = [~ °0], SS = diagfcri, specifically, we have specifically,
A
= [U I
U2) [
~
= Ulsvt·
0 0
V IT VT
][ ]
(5.2)
2
(5.3)
nxr The submatrix sizes are all determined by r (which must be S n}), i.e., i.e., UI IRmxr,, < min{m, min{m, «}), U\ eE W U2 eE ^x(m-r) «xr j yV22 €E Rnxfo-r^ U2 IRrnx(m-rl,; Vi VI eE RIRnxr, IRnx(n-r),and andthethe0-O-subblocks inE~are arecompatibly compatibly JM^/ocJb in dimensioned. dimensioned.
r r Proof: Since AT A (ATAAi is symmetric and and nonnegative nonnegative definite; recall, for example, Proof: Since A A >:::::00( A s symmetric definite; recall, for example, [24, Ch. 6]), eigenvalues are are all real and and nonnegative. nonnegative. (Note: The rest rest of the proof proof follows [24, Ch. 6]), its its eigenvalues all real (Note: The of the follows analogously if if we we start start with with the the observation observation that that A AAT analogously A T ::::: > 00 and and the the details detailsare are left left to to the the reader reader T of eigenvalues AT A A by by {U?, with UI as an exercise.) Denote the the set as an exercise.) Denote set of eigenvalues of of A {of , i/ eE !!.} n} with a\ ::::: > ... • • • ::::: >U arr >> 0 = Ur+1 o>+i = = ... • • • = Un. an. Let Let {Vi, {u, , ii Ee !!.} n} be be aa set set of of corresponding corresponding orthonormal orthonormal eigenvectors eigenvectors 0= and V\ = [v\, ...,,Vvr r),] , V2Vi == [Vr+I, [vr+\,... . . .,V, vn n].].LettingS Letting S =—diag(uI, diag(cri,... . . .,u , rcf),r),we wecan can and let let VI [VI, ... r 2 T 2 A TAVi A VI = = VI S2.. Premultiplying by vt A TAVi A VI = vt VI S2 = the latter latter write A write ViS Premultiplying by Vf gives gives vt Vf A VfV^S = S2, S2, the equality following andpostmultiplying postmultiplyingby by of the the r;, Vi vectors. vectors. PrePre- and equality following from from the the orthonormality orthonormality of S-I the emotion equation S~l gives eives the
(5.4)
35
Chapter to the Chapter 5. 5. Introduction Introduction to the Singular Singular Value Value Decomposition Decomposition
36 36
Turning now to the the eigenvalue eigenvalue equations equations corresponding to the the eigenvalues eigenvalues ar+l, or+\,... . . . ,, a Turning now to corresponding to ann we we T have that A A TTAV A V2z = VzO = 0, whence Vi A T A V = O. Thus, A V = O. Now define the V20 Vf A AV22 0. AV2 0. Now mx/ l matrix VI IRmxr VI = AViS~ AViS-I. Ui E e M " by U\ . Then from (5.4) (5.4) we see see that VrVI UfU\ = = /; i.e., the 77IX( r) columns of VI are orthonormal. Choose any matrix V2 E IRmx(m-r) such that [VI columns U\ orthonormal. Choose U2 £ ^ ™~ [U\ V2] U2] is orthogonal. Then T V AV
=[ =[
VrAVI
Vr AVz
VIAVI
vI AVz
VrAVI
~]
vIA VI
]
since A AV V22 ==0.O. Referring the equation equation V U\I == A A VI V\ S-I S l defining since Referring to to the defining U\, VI, we we see see that that U{ V r AV\ A VI = = S and and vI 1/2 AVi = vI U^UiS = O. 0. The The latter latter equality equality follows follows from from the the orthogonality orthogonality of of the S A VI = VI S = the V 2.. Thus, we see that, in fact, VT A V = [~ ~], and defining this matrix columns of VI U\ and andU UTAV [Q Q], to S completes completes the to be be ~ the proof. proof. D 0 Definition Definition 5.2. 5.2. Let A A == V"i:. t/E VT VT be an SVD SVD of of A A as in Theorem 5.1. 5.1. 1. The set {ai, ... , ar}} is called called the set of [a\,..., of (nonzero) singular values values of of the matrix A and iI T proof of A;'-(2 (AT A) == is denoted ~(A). £(A). From the proof of Theorem 5.1 we see that ai(A) cr,(A) = A (A A) I
AtA.? (AA (AATT).).
min{m, n} Note that there are also min{m, n] -— r zero singular singular values.
2. The columns ofUV are called called the left singular vectors orthonormal columns of left singular vectors of of A (and are the orthonormal eigenvectors of of AA AATT).). eigenvectors 3. The columns of right singular of V are called called the right singular vectors vectors of of A (and are the orthonormal orthonormal eigenvectors of of AT A1A). A). x Remark complex case in which A E IC~ xn" is quite straightforward. Remark 5.3. 5.3. The analogous analogous complex e C™ straightforward. H The decomposition A = proof is essentially decomposition is A = V"i:. t/E V V H,, where V U and V V are unitary and the proof identical, except for Hermitian transposes replacing transposes.
Remark 5.4. Note that V Remark 5.4. U and V can be be interpreted interpreted as changes changes of basis in both the domain domain and co-domain co-domain spaces spaces with respect to has aa diagonal diagonal matrix matrix representation. representation. and with respect to which which A A then then has Specifically, Specifically, let C, C denote denoteAAthought thought of ofasasaalinear linear transformation transformation mapping mapping IRWn totoIRm. W. Then Then T rewriting A A = VT as as AV A V = V"i:. Mat C the bases = V"i:. U^V U E we we see see that Mat £ is is "i:. S with respect respect to the m (see [v\,..., for IR R"n and and {u {u\,..., for R (see the Section 3.2). 3.2). See See also also {VI, ... , vn }} for I, •.. , u m IRm the discussion discussion in in Section m]} for Remark 5.16. 5.16. Remark Remark decomposition is not unique. Remark 5.5. 5.5. The !:ingular singular value decomposition unique. For example, an examination of the proof proof of Theorem Theorem 5.1 reveals that any orthonormal orthonormal basis basis for for N(A) jV(A) can can be be used used for for V2. V2. • £lny there may be nonuniqueness nonuniqueness associated the columns V\ (and (and hence hence VI) U\) corcor• there may be associated with with the columns of of VI responding to multiple cr/'s. responding to multiple O'i'S.
37 37
5.1. 5.1. The The Fundamental Fundamental Theorem Theorem
• any U2 C/2can be used so long as [U [U\I U2] Ui] is orthogonal. orthogonal. U and V V can be changed (in tandem) by sign (or multiplier of the form form • columns of U eejej8 in the the complex case). case). What is unique, however, is the matrix I: E and the span of the columns of UI, U\, U2, f/2, VI, Vi, and V ¥22 (see Theorem Theorem 5.11). Note, too, too,that thataa"full "full SVD" SVD"(5.2) (5.2)can canalways alwaysbe beconstructed constructedfrom from a "compact SVD" SVD" (5.3). (5.3).
Computing an SVD by working directly with the eigenproblem for A ATT A A or Remark 5.6. 5.6. Computing T AA T is numerically poor in finite-precision arithmetic. Better algorithms exist that work AA directly on A via a sequence of orthogonal orthogonal transformations; transformations; see, e.g., [7], see, e.g., [7], [11], [11],[25], [25]. F/vamnlp Example 5.7.
A -- [10 01] - U I UT,
2 x 2 orthogonal orthogonal matrix, is an SVD. where U U is an arbitrary arbitrary 2x2 5.8. Example 5.8. A _ [ 1
-
0
-~ ]
sin e cose
cose = [ - sine
J[~ ~J[
cose sine
Sine] -cose '
where e 0 is arbitrary, is an SVD. Example 5.9. 5.9. Example I
A=U
-2y'5
3
-5-
2
y'5
n=[ [] 3 2
3
S-
0
2~
4y'5 15
][
3~ 0
_y'5 -3-
0
0][ 0 0
v'2 T v'2 T
v'2 T -v'2 -2-
]
I
3
=
2
3 2
3J2
[~ ~]
3
is an SVD. MX A e E IR Example 5.10. 5.10. Let A R nxn " be symmetric symmetric and positive definite. Let V V be an orthogonal orthogonal matrix of eigenvectors A, i.e., AV = A = A VTT is an eigenvectors that diagonalizes A, i.e., VT VT AV =A > > O. 0. Then A = V VAV SVDof A. SVD of A.
A factorization UI: VTr of m x nn matrix A A qualifies as an SVD if U t/SV o f aann m U and V are orthogonal and I: £ is an m x n "diagonal" matrix whose diagonal elements in the upper left comer A = UI:V A, then corner are positive (and ordered). For example, if A f/E VTT is an SVD of A, r r T T VI:TU V S C / i is s aan n SSVD V D ooff AT. A .
38 38
Chapter Introduction to the Singular Decomposition Chapter 5. 5. Introduction to the Singular Value Value Decomposition
5.2 5.2
Some Some Basic Basic Properties Properties
Theorem 5.11. Let A ∈ ℝ^{m×n} have a singular value decomposition A = UΣVᵀ. Using the notation of Theorem 5.1, the following properties hold:

1. rank(A) = r = the number of nonzero singular values of A.

2. Let U = [u₁, ..., uₘ] and V = [v₁, ..., vₙ]. Then A has the dyadic (or outer product) expansion
\[
A = \sum_{i=1}^{r} \sigma_i u_i v_i^T.
\tag{5.5}
\]

3. The singular vectors satisfy the relations
\[
A v_i = \sigma_i u_i,
\tag{5.6}
\]
\[
A^T u_i = \sigma_i v_i
\tag{5.7}
\]
for i = 1, ..., r.

4. Let U₁ = [u₁, ..., uᵣ], U₂ = [u_{r+1}, ..., uₘ], V₁ = [v₁, ..., vᵣ], and V₂ = [v_{r+1}, ..., vₙ]. Then

(a) R(U₁) = R(A) = N(Aᵀ)⊥;

(b) R(U₂) = R(A)⊥ = N(Aᵀ);

(c) R(V₁) = N(A)⊥ = R(Aᵀ);

(d) R(V₂) = N(A) = R(Aᵀ)⊥.
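To make part 4 concrete, here is a brief NumPy sketch (an illustration added here, not from the text; the test matrix and tolerance are arbitrary choices) that extracts orthonormal bases for the four fundamental subspaces from a full SVD.

    import numpy as np

    A = np.array([[1.0, 2.0, 3.0],
                  [2.0, 4.0, 6.0]])             # rank 1, m = 2, n = 3

    U, s, Vt = np.linalg.svd(A)
    r = int(np.sum(s > 1e-12 * s[0]))           # numerical rank

    U1, U2 = U[:, :r], U[:, r:]                 # bases for R(A) and N(A^T)
    V1, V2 = Vt[:r, :].T, Vt[r:, :].T           # bases for R(A^T) and N(A)

    print(np.allclose(A @ V2, 0))               # columns of V2 lie in N(A)
    print(np.allclose(A.T @ U2, 0))             # columns of U2 lie in N(A^T)
    print(np.allclose(A, U1 @ np.diag(s[:r]) @ V1.T))  # compact SVD recovers A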
Remark 5.12. Part 4 of the above theorem provides a numerically superior method for finding (orthonormal) bases for the four fundamental subspaces compared to methods based on, for example, reduction to row or column echelon form. Note that each subspace requires knowledge of the rank r. The relationship to the four fundamental subspaces is summarized nicely in Figure 5.1.

Remark 5.13. The elegance of the dyadic decomposition (5.5) as a sum of outer products and the key vector relations (5.6) and (5.7) explain why it is conventional to write the SVD as A = UΣVᵀ rather than, say, A = UΣV.

Theorem 5.14. Let A ∈ ℝ^{m×n} have a singular value decomposition A = UΣVᵀ as in Theorem 5.1. Then
\[
A^+ = V \Sigma^+ U^T,
\tag{5.8}
\]
where
\[
\Sigma^+ = \begin{bmatrix} S^{-1} & 0 \\ 0 & 0 \end{bmatrix} \in \mathbb{R}^{n \times m},
\tag{5.9}
\]
with the 0-subblocks appropriately sized.

[Figure 5.1. SVD and the four fundamental subspaces.]

Furthermore, if we let the columns of U and V be as defined in Theorem 5.11, then
\[
A^+ = \sum_{i=1}^{r} \frac{1}{\sigma_i}\, v_i u_i^T.
\tag{5.10}
\]
Proof: The proof follows easily by verifying the four Penrose conditions. □
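The dyadic form (5.10) translates directly into a few lines of NumPy. The following sketch (added here for illustration; the matrix and tolerance are arbitrary) builds A⁺ from the singular triples, compares it with numpy.linalg.pinv, and checks the Penrose conditions invoked in the proof.

    import numpy as np

    A = np.array([[1.0, 0.0],
                  [0.0, 0.0],
                  [0.0, 2.0]])

    U, s, Vt = np.linalg.svd(A)
    r = int(np.sum(s > 1e-12))

    # A^+ = sum_{i=1}^{r} (1/sigma_i) v_i u_i^T, as in (5.10)
    A_pinv = sum((1.0 / s[i]) * np.outer(Vt[i, :], U[:, i]) for i in range(r))

    print(np.allclose(A_pinv, np.linalg.pinv(A)))
    print(np.allclose(A @ A_pinv @ A, A))            # first Penrose condition
    print(np.allclose(A_pinv @ A @ A_pinv, A_pinv))  # and the remaining three
    print(np.allclose((A @ A_pinv).T, A @ A_pinv))
    print(np.allclose((A_pinv @ A).T, A_pinv @ A))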
Remark 5.15. Note that none of the expressions above quite qualifies as an SVD of A⁺ if we insist that the singular values be ordered from largest to smallest. However, a simple reordering accomplishes the task:
\[
A^+ = \sum_{i=1}^{r} \frac{1}{\sigma_{r-i+1}}\, v_{r-i+1} u_{r-i+1}^T.
\tag{5.11}
\]
This can also be written in matrix terms by using the so-called reverse-order identity matrix (or exchange matrix) P = [eᵣ, e_{r−1}, ..., e₂, e₁], which is clearly orthogonal and symmetric.
Then
\[
A^+ = (V_1 P)(P S^{-1} P)(P U_1^T)
\]
is the matrix version of (5.11). A "full SVD" can be similarly constructed.
Remark 5.16. Recall the linear transformation T used in the proof of Theorem 3.17 and in Definition 4.1. Since T is determined by its action on a basis, and since {v₁, ..., vᵣ} is a basis for N(A)⊥, T can be defined by Tvᵢ = σᵢuᵢ, i = 1, ..., r. Similarly, since {u₁, ..., uᵣ} is a basis for R(A), T⁻¹ can be defined by T⁻¹uᵢ = (1/σᵢ)vᵢ, i = 1, ..., r. From Section 3.2, the matrix representation for T with respect to the bases {v₁, ..., vᵣ} and {u₁, ..., uᵣ} is clearly S, while the matrix representation for the inverse linear transformation T⁻¹ with respect to the same bases is S⁻¹.
5.3  Row and Column Compressions
Row compression

Let A ∈ ℝ^{m×n} have an SVD given by (5.1). Then
\[
U^T A = \Sigma V^T
= \begin{bmatrix} S & 0 \\ 0 & 0 \end{bmatrix}
  \begin{bmatrix} V_1^T \\ V_2^T \end{bmatrix}
= \begin{bmatrix} S V_1^T \\ 0 \end{bmatrix} \in \mathbb{R}^{m \times n}.
\]
Notice that N(A) = N(UᵀA) = N(SV₁ᵀ) and the matrix SV₁ᵀ ∈ ℝ^{r×n} has full row rank. In other words, premultiplication of A by Uᵀ is an orthogonal transformation that "compresses" A by row transformations. Such a row compression can also be accomplished by orthogonal row transformations performed directly on A to reduce it to the form \(\begin{bmatrix} R \\ 0 \end{bmatrix}\), where R is upper triangular. Both compressions are analogous to the so-called row-reduced echelon form which, when derived by a Gaussian elimination algorithm implemented in finite-precision arithmetic, is not generally as reliable a procedure.
Column compression

Again, let A ∈ ℝ^{m×n} have an SVD given by (5.1). Then
\[
A V = U \Sigma
= \begin{bmatrix} U_1 & U_2 \end{bmatrix}
  \begin{bmatrix} S & 0 \\ 0 & 0 \end{bmatrix}
= \begin{bmatrix} U_1 S & 0 \end{bmatrix} \in \mathbb{R}^{m \times n}.
\]
This time, notice that R(A) = R(AV) = R(U₁S) and the matrix U₁S ∈ ℝ^{m×r} has full column rank. In other words, postmultiplication of A by V is an orthogonal transformation that "compresses" A by column transformations. Such a compression is analogous to the
so-called column-reduced echelon form, which is not generally a reliable procedure when performed by Gauss transformations in finite-precision arithmetic. For details, see, for example, [7], [11], [23], [25].
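The two compressions are easy to visualize numerically. The sketch below (illustrative only; the matrix is an arbitrary rank-two choice) forms UᵀA and AV from a NumPy SVD, and also shows the QR-based orthogonal row reduction mentioned above.

    import numpy as np

    A = np.array([[1.0, 2.0, 1.0],
                  [2.0, 4.0, 2.0],
                  [1.0, 0.0, 1.0]])       # rank 2

    U, s, Vt = np.linalg.svd(A)

    row_compressed = U.T @ A              # = Sigma V^T: last m - r rows are (numerically) zero
    col_compressed = A @ Vt.T             # = U Sigma: last n - r columns are (numerically) zero
    print(np.round(row_compressed, 12))
    print(np.round(col_compressed, 12))

    # Orthogonal reduction to upper triangular form via a QR factorization.
    Q, R = np.linalg.qr(A, mode="complete")
    print(np.round(Q.T @ A, 12))          # upper triangular, with a (numerically) zero last row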
EXERCISES

1. Let X ∈ ℝ^{m×n}. If XᵀX = 0, show that X = 0.

2. Prove Theorem 5.1 starting from the observation that AAᵀ ≥ 0.

3. Let A ∈ ℝⁿˣⁿ be symmetric but indefinite. Determine an SVD of A.

4. Let x ∈ ℝᵐ, y ∈ ℝⁿ be nonzero vectors. Determine an SVD of the matrix A ∈ ℝ^{m×n} defined by A = xyᵀ.
5. Determine SVDs of the matrices
[ ] [ ~l -1 0
-1
6. Let A ∈ ℝ^{m×n} and suppose W ∈ ℝ^{m×m} and Y ∈ ℝⁿˣⁿ are orthogonal.

(a) Show that A and WAY have the same singular values (and hence the same rank).

(b) Suppose that W and Y are nonsingular but not necessarily orthogonal. Do A and WAY have the same singular values? Do they have the same rank?

7. Let A ∈ ℝⁿˣⁿ be nonsingular. Use the SVD to determine a polar factorization of A, i.e., A = QP where Q is orthogonal and P = Pᵀ > 0. Note: this is analogous to the polar form z = re^{jθ} of a complex scalar z (where i = j = √−1).
Chapter 6

Linear Equations
In this chapter we examine existence and uniqueness of solutions of systems of linear equations. General linear systems of the form
\[
AX = B; \qquad A \in \mathbb{R}^{m \times n},\ B \in \mathbb{R}^{m \times k},
\tag{6.1}
\]
are studied and include, as a special case, the familiar vector system
\[
Ax = b; \qquad A \in \mathbb{R}^{n \times n},\ b \in \mathbb{R}^{n}.
\tag{6.2}
\]

6.1  Vector Linear Equations
We begin with a review of some of the principal results associated with vector linear systems.

Theorem 6.1. Consider the system of linear equations
\[
Ax = b; \qquad A \in \mathbb{R}^{m \times n},\ b \in \mathbb{R}^{m}.
\tag{6.3}
\]

1. There exists a solution to (6.3) if and only if b ∈ R(A).

2. There exists a solution to (6.3) for all b ∈ ℝᵐ if and only if R(A) = ℝᵐ, i.e., A is onto; equivalently, there exists a solution if and only if rank([A, b]) = rank(A), and this is possible only if m ≤ n (since m = dim R(A) = rank(A) ≤ min{m, n}).

3. A solution to (6.3) is unique if and only if N(A) = 0, i.e., A is 1-1.

4. There exists a unique solution to (6.3) for all b ∈ ℝᵐ if and only if A is nonsingular; equivalently, A ∈ ℝ^{m×m} and A has neither a 0 singular value nor a 0 eigenvalue.

5. There exists at most one solution to (6.3) for all b ∈ ℝᵐ if and only if the columns of A are linearly independent, i.e., N(A) = 0, and this is possible only if m ≥ n.

6. There exists a nontrivial solution to the homogeneous system Ax = 0 if and only if rank(A) < n.
Proof: The proofs are straightforward and can be consulted in standard texts on linear algebra. Note that some parts of the theorem follow directly from others. For example, to prove part 6, note that x = 0 is always a solution to the homogeneous system. Therefore, we must have the case of a nonunique solution, i.e., A is not 1-1, which implies rank(A) < n by part 3. □
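As a quick numerical companion to parts 1 and 2 of Theorem 6.1 (an illustration added here, not from the text; the matrices and vectors are arbitrary), the rank test rank([A, b]) = rank(A) is easy to apply with NumPy:

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [2.0, 4.0],
                  [0.0, 1.0]])

    def solvable(A, b):
        """Ax = b has a solution iff rank([A, b]) == rank(A)."""
        return np.linalg.matrix_rank(np.column_stack([A, b])) == np.linalg.matrix_rank(A)

    b_in = -5.0 * A[:, 0] + 3.0 * A[:, 1]   # constructed to lie in R(A)
    b_out = np.array([1.0, 0.0, 0.0])       # does not lie in R(A)
    print(solvable(A, b_in), solvable(A, b_out))   # True False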
6.2  Matrix Linear Equations

In this section we present some of the principal results concerning existence and uniqueness of solutions of the general matrix linear system (6.1). Note that the results of Theorem 6.1 follow from those below for the special case k = 1, while results for (6.2) follow by specializing even further to the case m = n.

Theorem 6.2 (Existence). The matrix linear equation
\[
AX = B; \qquad A \in \mathbb{R}^{m \times n},\ B \in \mathbb{R}^{m \times k},
\tag{6.4}
\]
has a solution if and only if R(B) ⊆ R(A); equivalently, a solution exists if and only if AA⁺B = B.

Proof: The subspace inclusion criterion follows essentially from the definition of the range of a matrix. The matrix criterion is Theorem 4.18. □

Theorem 6.3. Let A ∈ ℝ^{m×n}, B ∈ ℝ^{m×k} and suppose that AA⁺B = B. Then any matrix of the form
\[
X = A^+ B + (I - A^+ A) Y, \qquad Y \in \mathbb{R}^{n \times k} \text{ arbitrary},
\tag{6.5}
\]
is a solution of
\[
AX = B.
\tag{6.6}
\]
Furthermore, all solutions of (6.6) are of this form.

Proof: To verify that (6.5) is a solution, premultiply by A:
\[
\begin{aligned}
AX &= A A^+ B + A (I - A^+ A) Y \\
   &= B + (A - A A^+ A) Y \quad \text{by hypothesis} \\
   &= B \quad \text{since } A A^+ A = A \text{ by the first Penrose condition.}
\end{aligned}
\]
That all solutions are of this form can be seen as follows. Let Z be an arbitrary solution of (6.6), i.e., AZ = B. Then we can write
\[
\begin{aligned}
Z &= A^+ A Z + (I - A^+ A) Z \\
  &= A^+ B + (I - A^+ A) Z,
\end{aligned}
\]
and this is clearly of the form (6.5). □
Remark 6.4. When A is square and nonsingular, A⁺ = A⁻¹ and so (I − A⁺A) = 0. Thus, there is no "arbitrary" component, leaving only the unique solution X = A⁻¹B.

Remark 6.5. It can be shown that the particular solution X = A⁺B is the solution of (6.6) that minimizes Tr XᵀX. (Tr(·) denotes the trace of a matrix; recall that Tr XᵀX = Σᵢⱼ x²ᵢⱼ.)
Theorem 6.6 (Uniqueness). A solution of the matrix linear equation
\[
AX = B; \qquad A \in \mathbb{R}^{m \times n},\ B \in \mathbb{R}^{m \times k},
\tag{6.7}
\]
is unique if and only if A⁺A = I; equivalently, (6.7) has a unique solution if and only if N(A) = 0.

Proof: The first equivalence is immediate from Theorem 6.3. The second follows by noting that A⁺A = I can occur only if r = n, where r = rank(A) (recall r ≤ n). But rank(A) = n if and only if A is 1-1 or N(A) = 0. □
Example 6.7. Suppose A ∈ ℝⁿˣⁿ. Find all solutions of the homogeneous system Ax = 0.

Solution:
\[
x = A^+ 0 + (I - A^+ A) y = (I - A^+ A) y,
\]
where y ∈ ℝⁿ is arbitrary. Hence, there exists a nonzero solution if and only if A⁺A ≠ I. This is equivalent to either rank(A) = r < n or A being singular. Clearly, if there exists a nonzero solution, it is not unique.

Computation: Since y is arbitrary, it is easy to see that all solutions are generated from a basis for R(I − A⁺A). But if A has an SVD given by A = UΣVᵀ, then it is easily checked that I − A⁺A = V₂V₂ᵀ and R(V₂V₂ᵀ) = R(V₂) = N(A).
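The identification I − A⁺A = V₂V₂ᵀ can be checked directly. The following NumPy sketch is illustrative only; the singular matrix and the vector y are arbitrary choices.

    import numpy as np

    A = np.array([[1.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0],
                  [1.0, 1.0, 1.0]])          # singular, rank 2

    Ap = np.linalg.pinv(A)
    P_null = np.eye(3) - Ap @ A              # orthogonal projection onto N(A)

    U, s, Vt = np.linalg.svd(A)
    r = int(np.sum(s > 1e-12))
    V2 = Vt[r:, :].T                         # orthonormal basis for N(A)

    print(np.allclose(P_null, V2 @ V2.T))    # I - A^+ A = V2 V2^T
    y = np.array([1.0, -2.0, 5.0])           # arbitrary y
    x = P_null @ y                           # a nonzero solution of Ax = 0
    print(np.allclose(A @ x, 0), np.allclose(x, 0))   # True False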
Example 6.8. Characterize all right inverses of a matrix A ∈ ℝ^{m×n}; equivalently, find all solutions R of the equation AR = I_m. Here, we write I_m to emphasize the m × m identity matrix.

Solution: There exists a right inverse if and only if R(I_m) ⊆ R(A) and this is equivalent to AA⁺I_m = I_m. Clearly, this can occur if and only if rank(A) = r = m (since r ≤ m) and this is equivalent to A being onto (A⁺ is then a right inverse). All right inverses of A are then of the form
\[
R = A^+ I_m + (I_n - A^+ A) Y = A^+ + (I - A^+ A) Y,
\]
where Y ∈ ℝ^{n×m} is arbitrary. There is a unique right inverse if and only if A⁺A = I (N(A) = 0), in which case A must be invertible and R = A⁻¹.
Example 6.9. Consider the system of linear first-order difference equations
\[
x_{k+1} = A x_k + B u_k
\tag{6.8}
\]
with A ∈ ℝⁿˣⁿ and B ∈ ℝ^{n×m} (n ≥ 1, m ≥ 1). The vector xₖ in linear system theory is known as the state vector at time k while uₖ is the input (control) vector. The general solution of (6.8) is given by
\[
x_k = A^k x_0 + \sum_{j=0}^{k-1} A^{k-1-j} B u_j
\tag{6.9}
\]
\[
\phantom{x_k} = A^k x_0 + \begin{bmatrix} B & AB & \cdots & A^{k-1} B \end{bmatrix}
\begin{bmatrix} u_{k-1} \\ u_{k-2} \\ \vdots \\ u_0 \end{bmatrix}
\tag{6.10}
\]
for k ≥ 1. We might now ask the question: Given x₀ = 0, does there exist an input sequence {uⱼ}_{j=0}^{k-1} such that xₖ takes an arbitrary value in ℝⁿ? In linear system theory, this is a question of reachability. Since m ≥ 1, from the fundamental Existence Theorem, Theorem 6.2, we see that (6.8) is reachable if and only if
\[
\mathcal{R}\big(\begin{bmatrix} B & AB & \cdots & A^{n-1} B \end{bmatrix}\big) = \mathbb{R}^n
\]
or, equivalently, if and only if
\[
\operatorname{rank} \begin{bmatrix} B & AB & \cdots & A^{n-1} B \end{bmatrix} = n.
\]
A related question is the following: Given an arbitrary initial vector x₀, does there exist an input sequence {uⱼ}_{j=0}^{n-1} such that xₙ = 0? In linear system theory, this is called controllability. Again from Theorem 6.2, we see that (6.8) is controllable if and only if
\[
\mathcal{R}(A^n) \subseteq \mathcal{R}\big(\begin{bmatrix} B & AB & \cdots & A^{n-1} B \end{bmatrix}\big).
\]
Clearly, reachability always implies controllability and, if A is nonsingular, controllability and reachability are equivalent. The matrices
\[
A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 1 \\ 0 \end{bmatrix}
\]
provide an example of a system that is controllable but not reachable.

The above are standard conditions with analogues for continuous-time models (i.e., linear differential equations). There are many other algebraically equivalent conditions.
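A small NumPy sketch of the rank test (added here for illustration; the nilpotent A below is the matrix of the example just given, and B is varied to show both outcomes):

    import numpy as np

    def reachability_matrix(A, B):
        """Form [B, AB, ..., A^(n-1) B] for x_{k+1} = A x_k + B u_k."""
        n = A.shape[0]
        blocks, AkB = [], B
        for _ in range(n):
            blocks.append(AkB)
            AkB = A @ AkB
        return np.hstack(blocks)

    A = np.array([[0.0, 1.0],
                  [0.0, 0.0]])
    B_reach = np.array([[0.0], [1.0]])       # rank [B, AB] = 2: reachable
    B_ctrl = np.array([[1.0], [0.0]])        # rank [B, AB] = 1: not reachable, yet A^2 = 0,
                                             # so every x_0 is driven to 0 (controllable)

    for B in (B_reach, B_ctrl):
        R = reachability_matrix(A, B)
        print(np.linalg.matrix_rank(R) == A.shape[0])   # True, then False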
Example 6.10. We now introduce an output vector yₖ to the system (6.8) of Example 6.9 by appending the equation
\[
y_k = C x_k + D u_k,
\tag{6.11}
\]
with C ∈ ℝ^{p×n} and D ∈ ℝ^{p×m} (p ≥ 1). We can then pose some new questions about the overall system that are dual in the system-theoretic sense to reachability and controllability. The answers are cast in terms that are dual in the linear algebra sense as well. The condition dual to reachability is called observability: When does knowledge of {uⱼ}_{j=0}^{n-1} and {yⱼ}_{j=0}^{n-1} suffice to determine (uniquely) x₀? As a dual to controllability, we have the notion of reconstructibility: When does knowledge of {uⱼ}_{j=0}^{n-1} and {yⱼ}_{j=0}^{n-1} suffice to determine (uniquely) xₙ? The fundamental duality result from linear system theory is the following:

(A, B) is reachable [controllable] if and only if (Aᵀ, Bᵀ) is observable [reconstructible].
To derive a condition for observability, notice that
\[
y_k = C A^k x_0 + \sum_{j=0}^{k-1} C A^{k-1-j} B u_j + D u_k.
\tag{6.12}
\]
Thus,
\[
\begin{bmatrix}
y_0 - D u_0 \\
y_1 - C B u_0 - D u_1 \\
\vdots \\
y_{n-1} - \sum_{j=0}^{n-2} C A^{n-2-j} B u_j - D u_{n-1}
\end{bmatrix}
=
\begin{bmatrix} C \\ CA \\ \vdots \\ CA^{n-1} \end{bmatrix} x_0.
\tag{6.13}
\]
Let v denote the (known) vector on the left-hand side of (6.13) and let R denote the matrix on the right-hand side. Then, by definition, v ∈ R(R), so a solution exists. By the fundamental Uniqueness Theorem, Theorem 6.6, the solution is then unique if and only if N(R) = 0, or, equivalently, if and only if
\[
\mathcal{N}\begin{bmatrix} C \\ CA \\ \vdots \\ CA^{n-1} \end{bmatrix} = 0.
\]
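For a concrete check of this condition (an illustrative NumPy sketch, not from the text; the system matrices are arbitrary, with D = 0 and zero input for brevity), one can stack the matrix in (6.13) and solve for x₀:

    import numpy as np

    A = np.array([[0.0, 1.0],
                  [-2.0, -3.0]])
    C = np.array([[1.0, 0.0]])               # single output; D = 0 and u_k = 0 for simplicity
    n = A.shape[0]

    Obs = np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(n)])
    print(np.linalg.matrix_rank(Obs) == n)   # observable: N(Obs) = 0

    x0 = np.array([1.0, -1.0])
    y = Obs @ x0                             # the left-hand side of (6.13) when u = 0
    x0_hat, *_ = np.linalg.lstsq(Obs, y, rcond=None)
    print(np.allclose(x0_hat, x0))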
6.3  A More General Matrix Linear Equation

Theorem 6.11. Let A ∈ ℝ^{m×n}, B ∈ ℝ^{m×q}, and C ∈ ℝ^{p×q}. Then the equation
\[
AXC = B
\tag{6.14}
\]
has a solution if and only if AA⁺BC⁺C = B, in which case the general solution is of the form
\[
X = A^+ B C^+ + Y - A^+ A Y C C^+,
\tag{6.15}
\]
where Y ∈ ℝ^{n×p} is arbitrary.

A compact matrix criterion for uniqueness of solutions to (6.14), namely CC⁺ ⊗ A⁺A = I, requires the notion of the Kronecker product of matrices for its statement.
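A numerical spot-check of Theorem 6.11 (illustrative only; the random matrices and seed are arbitrary, and B is constructed so that the solvability condition holds):

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((3, 2))          # full column rank (almost surely)
    C = rng.standard_normal((4, 5))          # full row rank (almost surely)
    X_true = rng.standard_normal((2, 4))
    B = A @ X_true @ C                       # guarantees AA^+ B C^+ C = B

    Ap, Cp = np.linalg.pinv(A), np.linalg.pinv(C)
    print(np.allclose(A @ Ap @ B @ Cp @ C, B))   # existence criterion

    Y = rng.standard_normal((2, 4))          # arbitrary
    X = Ap @ B @ Cp + Y - Ap @ A @ Y @ C @ Cp    # the general solution (6.15)
    print(np.allclose(A @ X @ C, B))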
6.4  Some Useful and Interesting Inverses

In many applications, the coefficient matrices of interest are square and nonsingular. Listed below is a small collection of useful matrix identities, particularly for block matrices, associated with matrix inverses. In these identities, A ∈ ℝⁿˣⁿ, B ∈ ℝ^{n×m}, C ∈ ℝ^{m×n}, and D ∈ ℝ^{m×m}. Invertibility is assumed for any component or subblock whose inverse is indicated. Verification of each identity is recommended as an exercise for the reader.
1. (A + BDC)⁻¹ = A⁻¹ − A⁻¹B(D⁻¹ + CA⁻¹B)⁻¹CA⁻¹. This result is known as the Sherman-Morrison-Woodbury formula. It has many applications (and is frequently "rediscovered") including, for example, formulas for the inverse of a sum of matrices such as (A + D)⁻¹ or (A⁻¹ + D⁻¹)⁻¹. It also yields very efficient "updating" or "downdating" formulas in expressions such as (A + xxᵀ)⁻¹ (with symmetric A ∈ ℝⁿˣⁿ and x ∈ ℝⁿ) that arise in optimization theory. (A numerical spot-check of this identity appears after this list.)
2.
\[
\begin{bmatrix} I & B \\ 0 & I \end{bmatrix}^{-1}
= \begin{bmatrix} I & -B \\ 0 & I \end{bmatrix},
\qquad
\begin{bmatrix} I & 0 \\ C & I \end{bmatrix}^{-1}
= \begin{bmatrix} I & 0 \\ -C & I \end{bmatrix}.
\]

3.
\[
\begin{bmatrix} I & B \\ 0 & -I \end{bmatrix}^{-1}
= \begin{bmatrix} I & B \\ 0 & -I \end{bmatrix},
\qquad
\begin{bmatrix} I & 0 \\ C & -I \end{bmatrix}^{-1}
= \begin{bmatrix} I & 0 \\ C & -I \end{bmatrix}.
\]
Both of these matrices satisfy the matrix equation X² = I, from which it is obvious that X⁻¹ = X. Note that the positions of the I and −I blocks may be exchanged.
4.
\[
\begin{bmatrix} A & B \\ 0 & D \end{bmatrix}^{-1}
= \begin{bmatrix} A^{-1} & -A^{-1} B D^{-1} \\ 0 & D^{-1} \end{bmatrix}.
\]

5.
\[
\begin{bmatrix} A & 0 \\ C & D \end{bmatrix}^{-1}
= \begin{bmatrix} A^{-1} & 0 \\ -D^{-1} C A^{-1} & D^{-1} \end{bmatrix}.
\]
6.
\[
\begin{bmatrix} I + BC & B \\ C & I \end{bmatrix}^{-1}
= \begin{bmatrix} I & -B \\ -C & I + CB \end{bmatrix}.
\]

7.
\[
\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1}
= \begin{bmatrix} A^{-1} + A^{-1} B E C A^{-1} & -A^{-1} B E \\ -E C A^{-1} & E \end{bmatrix},
\]
where E = (D − CA⁻¹B)⁻¹ (E is the inverse of the Schur complement of A). This result follows easily from the block LU factorization in property 16 of Section 1.4.
8.
\[
\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1}
= \begin{bmatrix} F & -F B D^{-1} \\ -D^{-1} C F & D^{-1} + D^{-1} C F B D^{-1} \end{bmatrix},
\]
where F = (A − BD⁻¹C)⁻¹. This result follows easily from the block UL factorization in property 17 of Section 1.4.
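Identities like these are easy to spot-check numerically. The sketch below (illustrative only; random matrices and an arbitrary seed) verifies the Sherman-Morrison-Woodbury formula of item 1:

    import numpy as np

    rng = np.random.default_rng(2)
    n, m = 5, 2
    A = np.eye(n) + 0.1 * rng.standard_normal((n, n))
    B = rng.standard_normal((n, m))
    C = rng.standard_normal((m, n))
    D = np.eye(m) + 0.1 * rng.standard_normal((m, m))

    lhs = np.linalg.inv(A + B @ D @ C)
    rhs = np.linalg.inv(A) - np.linalg.inv(A) @ B @ np.linalg.inv(
        np.linalg.inv(D) + C @ np.linalg.inv(A) @ B) @ C @ np.linalg.inv(A)
    print(np.allclose(lhs, rhs))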
EXERCISES

1. As in Example 6.8, characterize all left inverses of a matrix A ∈ ℝ^{m×n}.

2. Let A ∈ ℝ^{m×n}, B ∈ ℝ^{m×k} and suppose A has an SVD as in Theorem 5.1. Assuming R(B) ⊆ R(A), characterize all solutions of the matrix linear equation
AX=B in in terms terms of of the the SVD SVD of of AA.
Exercises Exercises
49
3. Let Let jc, x, yy Ee E" IRn and that X x TTyy i= that 3. and suppose suppose further further that ^ 1. 1. Show Show that T -1
(/ - xy)
1
= I -
xTy -1
T
xy .
4. IRn and x TTyy ^i= 1. 4. Let x, y E€ E" and suppose suppose further that that X 1. Show Show that that
-cxJ C
'
where Cc = 1/(1 where 1/(1 -— x xTTy). y). x 5. Let Let A A e E R" 1R~ xn and let A -11 have have columns columns c\, Cl, ... and individual elements Yij. " and let A" ..., ,Ccn and individual elements y;y. l T Assume that Yji x/( i= 7^ 00 for some i/ and and j.j. Show Show that that the —A —eie : (i.e., (i.e., Assume that for some the matrix matrix B B = A -— ~i e;e; A with with — yl subtracted subtracted from from its its (zy)th (ij)th element) element) is is singular. A singular. l' Hint: Show that ct E<=N(B). M(B). Hint: Show that Ci
6. reconstructibility takes the 6. As in in Example Example 6.10, 6.10, check check directly directly that the condition condition for for reconstructibility the form form
N[
fA J
~
CA n -
1
N(A n ).
This page intentionally intentionally left left blank blank This page
Chapter 7 Chapter 7
Projections, Projections, Inner Inner Product Product Spaces, and and Norms Norms Spaces,
7.1 7.1
Projections
Definition 7.1. Let V be vector space with V V=X 0 Y. y. By Theorem 2.26, 2.26, every every vv Ee V Definition 7.1. Let V be a a vector space with X EEl By Theorem V has aa unique unique decomposition with xx eE X and and yy Ee y. y :• V V ---+ —>• X <; c V y. Define Define PX pX,y V has decomposition vv = xx + yy with by by PX,yV = x for all v
E
V.
PX,y is called the projection on on X X along y. Px,y is called the (oblique) (oblique) projection along 3^.
Figure 7.1 7.1 displays projection of on both and Y 3^ in the case case V = = Figure displays the the projection of vvon both X and in the
]R2.
y
x
Figure Figure 7.1. 7.1. Oblique Oblique projections. projections.
Theorem px.y is and P# pl. yy — = px.y. Theorem 7.2. 7.2. Px,y is linear linear and Px,y-
Theorem 7.3. A linear transformation transformation P is aa projection if and if it it is Theorem 7.3. A linear P is projection if and only only if is idempotent, idempotent, i.e., i.e., P a projection if and only if I —P a projection. Infact, Px,yp22 = = P. P. Also, Also, P P is isaprojectionifandonlyifl -P is isaprojection. Infact, Py,x Py.x — = II — -px.y. Proof: Suppose P say on along Y y (using (using the the notation of Definition Definition 7.1). 7.1). Proof: Suppose P is is aa projection, projection, say on X X along notation of 51 51
52 52
Chapter Product Spaces, Norms Chapter 7. 7. Projections, Projections, Inner Inner Product Spaces, and and Norms
2 Let u e V V be be arbitrary. arbitrary. Then Then Pv Pv = = P(x P(x + + y) y) = = Px Px = = x. x. Moreover, Moreover, P = P PPv Let v E p 2vv = Pv — = 2 2 Px Pv. Thus, p2 = p2 = P. Let X = v} Px = = xx = = Pv. Thus, P = P. P. Conversely, Conversely, suppose suppose P = P. Let X = {v {v Ee V V :: Pv Pv = = v} and Y y = {v {v E€ V V :: Pv 0}. It It is is easy easy to to check check that that X and Y 3^are aresubspaces. subspaces. We Wenow nowprove prove and Pv = OJ. X and that V= X y. First First note note that that iftfveX, then Pv If vv Ee Y, y, then = O. 0. Hence Hence that V X $0 y. v E X, then Pv = v. v. If then Pv Pv = if X ny, be arbitrary. Let if vv E€ X n y, then then vv = = O. 0. Now Now let let vu Ee V V be arbitrary. Then Then vv = Pv Pv + (I (I -- P)v. P)v. Let xx = = Pv, Pv, y y = = (I (I -- P)v. P)v. Then Then Px Px = = P p 22vv = = Pv Pv = = x x so so xx Ee X, while Py = P(l P)v X, while Py = P(I - P}v== 2 2 Pv -- P 0 so so Y y Ee y. Thus, Thus, V V= X y and and the on X along Y y is is P. P. Pv p vv = 0 X $0 Y the projection projection on X along Essentially the the same same argument argument shows shows that is the the projection on Y y along along X. D Essentially that /I -— P P is projection on X. 0 L Definition 7.4. where Y X1-, PX.X px.xl. is Definition 7.4. In In the the special special case case where y = X^, *s called called an an orthogonal orthogonal projecprojecL tion and tion and we we then then use use the the notation notation P PX = PX.XL PX,X x = xn Theorem 7.5. P E jRnxn is projection (onto R(P)) if 7.5. P e E" is the the matrix matrix of of an an orthogonal orthogonal projection (onto K(P)} if and and only only 2 T ifPp2 = p P . if P = pT. L Proof: Let Let P be an an orthogonal orthogonal projection projection (on (on X, say,along alongXX1-) } and andlet letx,jc,yy Ee jR" R"bebe Proof: P be X, say, arbitrary. Note that (I (/ -- P)x = (I (I -- PX,X^X = P Theorem 7.3. 7.3. Thus, Thus, P)x = px.xJ.)x = PXJ..xx by Theorem arbitrary. Note that x±,xx by L (I P)x Ee X X1-. Py Ee X, X, we (I - - P)x (/ -- P)x . Since Py wehave have(py)T ( P y f ((II - - P)x P)x==yT yTpT PT(I P)x==O.0. T T T Since and yy were arbitrary, we have P pT (I - P) P) = pT = pT Since xx and were arbitrary, we must must have (I — = O. 0. Hence Hence P = P PP = = P, P, T with the second second equality equality following following since since P is symmetric. symmetric. Conversely, Conversely, suppose suppose P is is aa with the pTPP is symmetric projection projection matrix and let let xx be arbitrary. Write Write xx = = P Pxx + (I (I -— P)x. Then symmetric matrix and be arbitrary. P)x. Then T x TTPpT P)x = = x x TTP(I P(l -- P}x P)x = = 0. O. Thus, since Px Px e E U(P), R(P), then (/ (I -- P)x P)x 6 E R(P)1x (I(I -- P)x ft(P)1 and P P must must be an orthogonal orthogonal projection. projection. D and be an 0
7.1.1 7.1 .1
The four orthogonal projections projections The four fundamental fundamental orthogonal
mxn Using the notation of Theorems A E jRmxII with SVD A A = Theorems 5.1 5.1 and 5.11, 5.11, let A 6 R = U!:V UT,VTT = UtSVf. Then Then U\SVr r
PR(A)
AA+
U\U[
Lu;uT, ;=1 m
PR(A).L
1- AA+
U2 U
!
LUiUT,
i=r+l 11
PN(A)
1- A+A
V2V{
L
ViVf,
i=r+l
PN(A)J.
A+A
VIV{
r LViVT i=l
are easily easily checked checked to to be be (unique) (unique) orthogonal orthogonal projections projections onto onto the the respective four fundafundaare respective four mental mental subspaces. subspaces,
7.1. 7.1. Projections Projections
53
n Example 7.6. Determine the the orthogonal orthogonal projection M" on another another nonzero Example 7.6. Determine projection of of aa vector vector v Ee IR on nonzero n vector w Ee IRn. R. Solution: Think Think of of the the vector w as as an an element element of of the the one-dimensional one-dimensional subspace subspace R( IZ(w). Solution: vector w w). Then desired projection Then the the desired projection is is simply simply
Pn(w)v = ww+v wwTv
=
(using 4.8) (using Example Example 4.8)
(WTV) T W. W
W
Moreover, the the vector orthogonal to w and such such that that v = P Pvv + zz is is given given by Moreover, vector zz that that is is orthogonal to wand by zz = = PK( Pn(w)"' = (/(l — - PK(W))V Pn(w»v = = vv — - (^-^ (:;~)j w. w. See See Figure Figure 7.2. 7.2. A A direct direct calculation calculation shows shows W)±Vv = that z and and u; are, in fact, orthogonal: orthogonal: ware, in fact, that
v
z
w
Pv
Figure 7.2. projection on on aa "line." Figure 7.2. Orthogonal Orthogonal projection "line."
Example 7.7. 7.7. Recall Recall the of Theorem Theorem 3.11. 3.11. There, { v \ ,... . . . ,, Vk} Vk} was was an an orthomormal orthornormal Example the proof proof of There, {VI, basis for aa subset subset S of arbitrary vector vector xx Ee R" chosen and and aa formula formula for for XI x\ basis for of W IRn.1. An An arbitrary IRn was was chosen appeared rather XI is simply the orthogonal projection projection of of rather mysteriously. The expression for x\ xX on on S. Specifically, Specifically,
Example 7.8. 7.8. Recall Recall the diagram of the four four fundamental subspaces. The indicated direct direct Example the diagram of the fundamental subspaces. The indicated sum decompositions of the domain E" IR n and co-domain IR Rmm are given easily as follows. Let Wn1 be arbitrary vector. vector. Then Then be an an arbitrary Let Xx Ee IR X
=
PN(A)u
+ PN(A)X
= A+ Ax + (I = VI
- A+ A)x
vt x + V Vi x 2
(recall VVT = I).
Chapter 7. 7. Projections, Projections, Inner Inner Product Product Spaces, Spaces, and and Norms Norms Chapter
54
Similarly, let y E e ]R arbitrary vector. Then Similarly, let Y IR mm be be an an arbitrary vector. Then Y
= PR(A)Y + PR(A)~Y = AA+y + ( l - AA+)y = U1Ur y + U2U[ Y (recall UU T =
I).
Example 7.9. 7.9. Let Let Example
Then Then
1/4 1/4 ] 1/4 1/4
o o 4] into the sum of of aa vector in N(A)-L A/'CA)-1 4V uniquely uniquely into the sum vector in r
and can decompose [2 3 and we we can decompose the the vector vector [2 3 and aa vector vector in in J\f(A), N(A), respectively, respectively, as as follows: follows: and
[!]~ =
=
7.2
A' Ax
+ (l -
A' A)x
1/2 -1/2 1/2 1/2 0] [ 2] [ -1/2 1/2 + [ 1~2 1~2 ~ o o
!
5/2] [-1/2] 1~2 . [ 5~2 +
Inner Product Product Spaces Inner
Definition V be vector space Then (', { • , .)• ) :: V V xx V is a inner Definition 7.10. 7.10. Let Let V be aa vector space over over R. IR. Then V -+ IR is a real real inner product ifif 1. (x, x) ::: Ofor aU E V and (x, only ifx O. > Qfor all x 6V ( x , xx)} ==00 if if and only ifx = = 0.
2. (x, y) (y,x)forallx,y 2. (x, y) = = (y, x) for all x, y eE V. V. 3. (x, {*, aYI cryi + + PY2) ^2) == a(x, a(x,Yl) y\) + + f3(x, /3(jt, Y2) y^}for for all allx, jc,Yl, yi,Y2 j2 E^ VVand/or and for all alia, R. a, f3ftEe IR. 3. T Example 7.11. 7.11. Let Let V = R". IRn. Then Then {^, (x, y} y) = X x TyY is is the the "usual" Euclidean inner inner product product or or Example V= "usual" Euclidean dot product. T Example IRn. Then (x, y)QQ = X T Qy, Qy, where Q Q = = Q Q TT > Example 7.12. 7.12. Let V V= = E". (jc, y) =X > 0 is is an an arbitrary n x n positive definite definite matrix, defines defines a "weighted" inner product. T Definition 7.13. 7.13. IfIf A Ee R IRmmxxn, ATE IR nnxm xm is the unique linear transformation transformation or map Definition ", then A e R T E R IRmm and andfor IRn. such that {x, (x, Ay) =- {AT (A x, y) for all x € for all y e R".
7.2. 7.2. Inner Inner product Product Spaces Spaces
55 55
It is easy easy to to check check that, that, with with this this more more "abstract" of transpose, transpose, and It is "abstract" definition definition of and if if the the T (i, y)th j)th element element of of A A is is a aij, then the the (i, (i, y)th j)th element element of of A AT is ap. It can also be checked (/, is a/,. It can also be checked (; , then T T that all the usual usual properties properties of of the the transpose transpose hold, hold, such = B BT AT. the that all the such as as (AB) (Afl) = A . However, However, the
definition above allows us us to to extend the concept concept of of transpose transpose to to the the case case of of weighted weighted inner inner definition above allows extend the mxn products in the following way. Suppose A A eE R ]Rm xn and let (., .) Q and (., .) R, with Q {-, -}g (•, -}R, with Qand and R positive positive definite, definite, be be weighted weighted inner inner products products on on R IRmm and and W, IRn, respectively. respectively. Then Then we we can can define the the "weighted transpose" A A## as the unique unique map map that that satisfies define "weighted transpose" as the satisfies # m (x, Ay) AY)Q (A#x, all xx E IRm IRn.1. (x, = (A x, Y)R y)R for all eR and for all Yy Ee W Q =
T # By Example Example 7.12 7.l2 above, above, we we must must then then have have X xT QAy = x x TT(A (A#{ Ry for all x, x, y. y. Hence Hence we we By QAy ) Ry for all # T # = (A#{ R. Taking transposes transposes (of AT Q = = RA RA#. must have QA QA = (A ) R. (of the usual variety) gives A Q . Since R is is nonsingular, nonsingular, we we find find Since R
A# = R-1A TQ. Q. A* = /r'A'
We can generalize the notion of = 0) to Q-orthogonality We can also also generalize the notion of orthogonality orthogonality (x (xTTyy = 0) to Q -orthogonality (Q (Q is is aa positive positive definite definite matrix). matrix). Two Two vectors vectors x, x, yy Ee IRn W are are Q-orthogonal <2-orthogonal (or (or conjugate conjugate with with T Q) if if (x, X T Qy O. Q Q-orthogonality is an important tool tool used used in in respect to to Q) respect ( x , yy)} QQ = X Qy = 0. -orthogonality is an important studying conjugate conjugate direction direction methods methods in in optimization optimization theory. studying theory. Let V be a a vector vector space space over over
C is is aa complex complex inner product ifif inner product
1. 0 for all all xx eE V and ((x, and only only if O. 1. (x, ( x , xx)) :::: > Qfor V and x , xx)) = =00 ifif and ifxx = = 0.
2. (x, y) (y, x) e V. V. 2. (x, y) = (y, x) for for all all x, x, yy E 3. (x,ayi = a(x, y2}forallx, y\, yY22 Ee V V and for alia, 3. (x, aYI + fiy f3Y2) a(x, y\) yll + fi(x, f3(x, Y2) for all x, YI, andfor all a, f3ft 6 E C. c. 2) = Remark 7.15. could use Remark 7.15. We We could use the the notation notation {•, (., -} ·)ec to to denote denote aa complex complex inner inner product, product, but but if the the vectors vectors involved complex-valued, the the complex complex inner inner product product is is to to be be understood. if involved are are complex-valued, understood. Note, too, too, from from part part 22 of of the the definition, definition, that that ((x, must be be real real for for all all x. Note, x , xx)) must x. Remark 7.16. Note from parts 22 and and 3 3 of of Definition Definition 7.14 7.14 that that we we have have Remark 7.16. Note from parts
(ax\ + fix2, y) = a(x\, y) + P(x2, y}. Remark 7.17. The Euclidean Euclidean inner inner product product of x, y E is given given by by Remark 7.17. The of x, eC C"n is n
(x, y)
= LXiYi = xHy. i=1
H The conventional the complex Euclidean inner inner product product is is (x, (x, y} y) = yyHxx but but we we The conventional definition definition of of the complex Euclidean HH use its its complex complex conjugate conjugate x yy here here for for symmetry symmetry with with the the real real case. use case.
Remark 7.1S. 7.18. A (x, y} Remark A weighted weighted inner inner product product can can be be defined defined as as in in the the real real case case by by (x, y)Q = Q — H Qy, for arbitrary arbitrary Q Q = Q QH > 0. o. The notion notion of Q Q-orthogonality Xx HHQy, > -orthogonality can can be be similarly similarly generalized to the the complex generalized to complex case. case.
56 56
Chapter 7. 7. Projections, Projections, Inner Inner Product Product Spaces, and Norms Chapter Spaces, and Norms
Definition 7.19. (V, IF) F) endowed is called Definition 7.19. A A vector vector space space (V, endowed with with aa specific specific inner inner product product is called an an inner If F = C, call V V aa complex complex inner space. If inner product product space. space. If IF = e, we we call inner product product space. If FIF == R, R we we call V Va space. a real real inner inner product product space. call Example 7.20. 7.20. Example T 1. Check that = IRR"n xxn" with with the the inner inner product product (A, (A, B) B) = = Tr Tr A AT B is is aa real real inner inner product product 1. Check that V = B space. Note other choices choices are since by of the function, space. Note that that other are possible possible since by properties properties of the trace trace function, T T BTTAA = Tr A BTT = = Tr BAT. Tr AT TrA BB = = Tr TrB = TrAB TrBA . nx H 2. V= = e Cnxn " with the inner inner product (A, B) B) = Tr Tr A is aa complex complex inner 2. Check Check that that V with the product (A, AHBB is inner product space. Again, other choices choices are possible. product space. Again, other are possible.
Definition V be inner product V, we (or Definition 7.21. 7.21. Let Let V be an an inner product space. space. For For vv eE V, we define define the the norm norm (or length) \\v\\ = = */(v, v). This This is ( - , -.).) . length) ofv ofv by by IIvll -J(V,V). is called called the the norm norm induced induced by by (', Example Example 7.22. 7.22. n 1. If If V V = = IR E." with inner product, 1. with the the usual usual inner product, the the induced induced norm norm is is given given by by II||i>|| v II = n 2 21
(Li=l V i )2.(E,=i
9\ 7
2. If V V = = en C" with inner product, 2. If with the the usual usual inner product, the the induced induced norm norm is is given given by by II\\v\\ v II = 22 ! "n (L...i=l IVi I )*. ) . (£? = ,l»,-l Theorem 7.23. Let be an an orthogonal an inner inner product Then Theorem 7.23. Let P P be orthogonal projection projection on on an product space space V. Then \\Pv\\ < Ilvll \\v\\forallv V. IIPvll ::::: for all v e E V. # Proof: Since P is is an projection, P p22 = P = pH. the notation p## denotes Proof: Since P an orthogonal orthogonal projection, = P =P . (Here, (Here, the notation P denotes # the unique transformation that that satisfies satisfies (Pu, ( P u , vv)} = = (u, (u, P v) for for all If this the unique linear linear transformation p#v) all u, u, vv eE V. If this # T = R" IRn (or (or en), where P p# is simply simply the usual P pT (or seems little too abstract, consider seems aa little too abstract, consider V = C"), where is the usual (or H # pH)). Hence ((Pv, v) = = (P (P 22v, v, v) v) = = (Pv, (Pv, P p#v) = ((Pv, Pv) = = \\Pv\\ IIPvll 22 ::: O. Now Now // -- PPisis P )). Hence P v , v) v) = P v , Pv) > 0. also aa projection, so the the above applies and and we also projection, so above result result applies we get get
0::::: ((I - P)v. v) = (v. v) - (Pv, v) =
from which the theorem follows. follows. from which the theorem
IIvll2 - IIPvll 2
0
Definition norm induced on an "usual" inner product The norm induced on an inner inner product product space space by by the the "usual" inner product Definition 7.24. 7.24. The is called norm. natural norm. is called the the natural In case V = = C" en or or V == R", IR n, the the natural natural norm norm is is also also called the Euclidean Euclidean norm. norm. In In In case called the the next next section, section, other on these spaces are are defined. defined. A converse to the other norms norms on these vector vector spaces A converse to the the above above IIx II — = .j(X,X}, an inner inner procedure is is also also available. That is, is, given norm defined defined by by \\x\\ procedure available. That given aa norm •>/(•*> x), an product can be defined via product can be defined via the the following. following.
7.3. 7.3. Vector Vector Norms Norms
57 57
Theorem 7.25 Theorem 7.25 (Polarization (Polarization Identity). Identity). 1. For x, x, yy E product is 1. For € m~n, R", an an inner inner product is defined defined by by
IIx+YIl2~IIX_YI12_
(x,y)=xTy=
IIx + yll2 _ IIxll2 _ lIyll2 2
2. For For x, x, yy eE C", en, an an inner inner product product is by 2. is defined defined by
where = ii = = \/—T. where jj = .J=I.
7.3 7.3
Vector Norms Vector Norms
Definition 7.26. vector space. IR is Definition 7.26. Let Let (V, (V, IF) F) be be aa vector space. Then Then II\ \ -. \ II\ : V V ---+ ->• R is aa vector vector norm norm ifit if it satisfies following three satisfies the the following three properties: properties: 1. Ilxll::: Ofor all x E V and IIxll = 0 ifand only ifx
2. Ilaxll = lalllxllforallx
E
Vandforalla
E
= O.
IF.
3. IIx + yll :::: IIxll + IIYliforall x, y E V. (This seen readily from the illus(This is is called called the the triangle triangle inequality, inequality, as as seen readily from the usual usual diagram diagram illus two vectors vectors in in ]R2 .) trating sum of trating the the sum of two R2.) Remark 7.27. 7.27. It the remainder this section to state for complexRemark It is is convenient convenient in in the remainder of of this section to state results results for complexvalued vectors. The specialization specialization to the real real case case is is obvious. obvious. valued vectors. The to the A vector said to Definition 7.28. Definition 7.28. A vector space space (V, (V, IF) F) is is said to be be aa normed normed linear linear space space if if and and only only ifif there exists exists aa vector vector norm norm II|| .• II|| :: V V ---+ -> ]R R satisfying satisfying the the three three conditions conditions of of Definition there Definition 7.26. 7.26.
Example Example 7.29. 7.29.
1. HOlder norms, p-norms, are by 1. For For x Ee en, C", the the Holder norms, or or p-norms, are defined defined by
Special Special cases: cases: (a) Ilx III = L:7=1
IXi
I (the "Manhattan" norm). 1
(b) Ilxllz = (L:7=1Ix;l2)2 = (c) Ilxlioo
= maxlx;l IE!!
=
(X
H
1
X)2
(the Euclidean norm).
lim IIxllp-
p---++oo
(The that requires (The second second equality equality is is aa theorem theorem that requires proof.) proof.)
58 58
Chapter 7. Projections, Projections, Inner Inner Product Spaces, and and Norms Chapter 7. Product Spaces, Norms 2. Some weighted weighted p-norms: p-norms: 2. Some L~=ld;lx;l, whered; O. (a) IIxll1.D ||JC||,.D = = E^rf/l*/!, where 4 > > 0. 1
(b) IIx IIz.Q — = (x = QH Ikllz.g (xhH Qx) QXY 2,> where Q = QH > > 0 (this norm is more commonly denoted II|| .• IIQ)' ||c). denoted
3. vector space space (C[to, (C[to, ttl, t \ ] , 1Ft), R), define define the vector norm 3. On On the the vector the vector norm 11111 = max 1/(t)I· to:::.t~JI
On the vector space space «e[to, ((C[to, ttlr, t\])n, 1Ft), R), define define the the vector On the vector vector norm norm 1111100 = max II/(t) 11 00 , tO~t:5.tl Theorem Inequality). Let Let x, x, yy E Fhcorem 7.30 7.30 (HOlder (Holder Inequality). e en. C". Then Ther, I
I
p
q
-+-=1. A particular particular case the Holder HOlder inequality A case of of the inequality is is of of special special interest. interest.
Theorem 7.31 (Cauchy-Bunyakovsky-Schwarz Inequality). Inequality). Let C". Then Theorem 7.31 (Cauchy-Bunyakovsky-Schwarz Let x, x, y y eE en. Then
with equality are linearly dependent. with equality ifif and and only only ifif xx and and yyare linearly dependent. x2 Proof' Consider the matrix [x y] y] e E en Proof: Consider the matrix [x C"x2 .. Since Since
is definite matrix, matrix, its must be be nonnegative. nonnegative. In words, is aa nonnegative nonnegative definite its determinant determinant must In other other words, H H H H H H H H H o y, we yl ~< 0 ~ < (x ( x xx)(yH ) ( y y y) ) -— (x ( x yy)(yH ) ( y x x). ) . Since Since yH y xx == xx y, we see see immediately immediately that that IXH \XHy\ D IIxll2l1yllz. 0 \\X\\2\\y\\2Note: This is not not the algebraic proof proof of of the the Cauchy-Bunyakovsky-Schwarz Note: This is the classical classical algebraic Cauchy-Bunyakovsky-Schwarz (C-B-S) e.g., [20, However, it to remember. remember. (C-B-S) inequality inequality (see, (see, e.g., [20, p. p. 217]). 217]). However, it is is particularly particularly easy easy to Remark 7.32. The between two x, yy eE C" en may by Remark 7.32. The angle angle e 0 between two nonzero nonzero vectors vectors x, may be be defined defined by cos# = 1I;~~1~1112' I, „ |.^|| , 0 < 0 < 5-. The C-B-S inequality is thus equivalent to the statement cos e = 0 ~ e ~ I' The C-B-S inequality is thus equivalent to the statement Il-Mmlylb — ^ | cose COS 01| ~< 1. 1. 1 Remark 7.33. Theorem 7.31 and Remark Remark 7.32 product spaces. Remark 7.33. Theorem 7.31 and 7.32 are are true true for for general general inner inner product spaces. x nxn Remark 7.34. The The norm norm II|| .• 112 ||2 is unitarily invariant, if U U E€ e C" " is is unitary, unitary, then Remark 7.34. is unitarily invariant, i.e., i.e., if then H H H \\Ux\\2 = IIxll2 \\x\\2 (Proof (Proof. IIUxili \\Ux\\l = xXHUHUx U Ux = xHx X X = = IIxlli)· \\x\\\). However, However, 11·111 || - ||, and || - 1^ IIUxll2 and 1I·IIClO
7.4. Matrix Matrix Norms Norms 7.4.
59 59
are not invariant. Similar Similar remarks remarks apply apply to to the the unitary unitary invariance invariance of of norms norms of of real real are not unitarily unitarily invariant. vectors under orthogonal transformation. vectors under orthogonal transformation. Remark 7.35. 7.35. If If x, yy E€ en C" are are orthogonal, orthogonal, then then we we have have the Identity Remark the Pythagorean Pythagorean Identity
Ilx ± YII~
= IIxll~
+ IIYII~,
_ _//. the proof proof of of which follows easily easily from from liz ||z||2 z z. the which follows II~2 = ZH
Theorem 7.36. All norms are equivalent; there exist 7.36. All norms on on en C" are equivalent; i.e., i.e., there exist constants constants CI, c\, C2 c-i (possibly (possibly depending on onn) depending n) such such that that
Example 7.37. 7.37. For For xx EG en, C", the the following following inequalities inequalities are are all all tight bounds; i.e., i.e., there there exist exist Example tight bounds; vectors for which equality holds: holds: vectors xx for which equality
Ilxlll :::: Jn Ilxlb Ilxll2:::: IIxll» IIxlloo :::: IIxll»
Ilxlll :::: n IIxlloo; IIxl12 :::: Jn Ilxll oo ; IIxlioo :::: IIxllz.
Finally, we Finally, we conclude conclude this this section section with with aa theorem theorem about about convergence convergence of of vectors. vectors. ConConvergence of of aa sequence sequence of of vectors to some some limit vector can can be converted into into aa statement vergence vectors to limit vector be converted statement about numbers, i.e., terms of about convergence convergence of of real real numbers, i.e., convergence convergence in in terms of vector vector norms. norms.
Theorem 7.38. 7.38. Let Let II· \\ • II\\ be be aa vector vector norm norm and and suppose suppose v, v, v(l), i» (1) , v(2), v(2\ ... ... Ee en. C". Then Then lim
V(k)
k4+00
7.4 7.4
=
v
if and only if
lim k~+oo
II v(k)
-
v
II = O.
Matrix Norms Norms Matrix
In this section we we introduce introduce the the concept concept of of matrix norm. As As with with vectors, vectors, the for In this section matrix norm. the motivation motivation for using matrix norms is is to to have have aa notion notion of of either either the the size size of of or or the the nearness of matrices. matrices. The The using matrix norms nearness of of former the latter to make make sense former notion notion is is useful useful for for perturbation perturbation analysis, analysis, while while the latter is is needed needed to sense of "convergence" vector space xn ,, IR) is "convergence" of of matrices. matrices. Attention Attention is is confined confined to to the the vector space (IRm (Wnxn R) since since that that is what arises arises in in the majority of of applications. applications. Extension Extension to to the complex case case is is straightforward what the majority the complex straightforward and and essentially essentially obvious. obvious. mx Definition 7.39. 7.39. II· || • II|| : IR Rmxn " -> E is is aa matrix matrix norm if it it satisfies the following Definition ~ IR norm if satisfies the following three three properties: properties:
IR mxn and
IIAII
2.
lIaAl1 =
3.
IIA + BII :::: IIAII + IIBII for all A, BE IRmxn. (As with vectors, this is called the triangle inequality.)
~
Ofor all A
E
lalliAliforall A E
IR
IIAII
if and only if A
1.
mxn
= 0
andfor all a E IR.
= O.
60
Chapter Chapter 7. 7. Projections, Projections, Inner Inner Product Product Spaces, Spaces, and and Norms Norms
Example 7.40. 7.40. Let A Ee lR,mxn. R mx ". Then the Frobenius norm (or matrix Euclidean norm) is defined by
IIAIIF
t
~ (t. ai;) I ~ (t.
altA)) 1
~
(T, (A' A)) 1
~
(T, (AA '));
(where rank(A)). ^wncic r = = laiiK^/i;;. Example 7.41. Let A A E e lR,mxn. Rmxn. Then the matrix matrix p-norms are defined by
=
IIAII P
max
IIAxll = max Ilxli p IIxllp=1
-_P
Ilxllp;60
IIAxll
. p
The following three special cases are important because they are "computable." "computable." Each is a theorem and requires a proof. I. The "maximum column sum" norm is 1.
2. 2. The "maximum row sum" norm is IIAlioo = max rE!!l.
(t
laUI).
J=1
3. 3. The spectral norm is tTL
T
IIAII2 = Amax(A A) = A~ax(AA ) = a1(A).
Note: IIA+llz
=
l/ar(A), where r
= rank(A).
mxn
Example 7.42. lR,mxn.. The Schattenp-norms Example 7.42. Let A EE R Schatten/7-norms are defined by I
IIAlls.p = (at'
+ ... + a!)"".
Some special cases of Schatten /?-norms p-norms are equal to norms defined previously. previously. For example, || . || 5 2 = || • ||5i00 = 11·115.2 = ||II . \\IIFF and and 11'115,00 = ||II •. ||112'2. The The norm norm ||II .• ||115.1 is often often called called the the trace trace norm. norm. 5>1 is mx Example 7.43. lR,mxn Example 7.43. Let A Ee K "._ Then "mixed" norms can also be defined by
IIAII p,q
= max IIAxil p 11.<110#0 IIxllq
Example 7.44. 7.44. The "matrix analogue analogue of of the the vector vector I-norm," 1-norm," IIAlis || A\\s == Li.j ^ j laij \ai}; I,|, isisaa norm. norm. Example The "matrix The concept of a matrix norm alone is not altogether useful since it does not allow us to estimate estimate the the size size of of aa matrix matrix product product A B in in terms of the the sizes sizes of of A A and and B B individually. individually. to AB terms of
7.4. 7.4. Matrix Matrix Norms Norms
61 61
Notice that that this this difficulty did not not arise vectors, although although there there are are analogues analogues for, Notice difficulty did arise for for vectors, for, e.g., e.g., inner definition. inner products products or or outer outer products products of of vectors. vectors. We We thus thus need need the the following following definition. mxn nxk Let A A eE R ]Rmxn,, B B eE R ]Rnxk.. Then norms \\II .• II", Ilfl'p,and Definition 7.45. 7.45. Let Definition Then the the norms \\a, \\II·• \\ and II \\. •lIy\\y are are mutually consistent if \\ A B \\ a < IIAllfllIBlly. \\A\\p\\B\\y. AA matrix matrix norm norm 11·11 \\ • \\isis said said toto be be consistent consistent mutuallyconsistentifIlABII,,::S ifif \\AB\\ < ||II A the matrix defined. II A B II ::s A ||1111|| Bfi||II whenever whenever the matrix product product is is defined.
Example Example 7.46. 7.46. 1. ||II·• ||/7 and II ||. •II ||pp for for all all pp are are consistent consistent matrix matrixnorms. norms. 1. II F and 2. The "mixed" 2. "mixed" norm norm
II· 11 100 ,
IIAxll1 = max - - = max laijl x;60 Ilx 1100 i,j
is consistent. For =B = [: \\ is aa matrix matrix norm norm but but it it is is not not consistent. For example, example, take take A A = B = ||Afl|| l. IIABIII,oo 2 while IIAIII,ooIlBIII,oo li00 = 2while||A|| li00 ||B|| 1>00 = 1.
J1. Then Then :].
The -norms are are examples examples of (or induced The pp-norms of matrix matrix norms norms that that are are subordinate subordinate to to (or induced by) by) i.e., aa vector vector norm, norm, i.e., IIAxl1 IIAII = max - - = max IIAxl1 x;60 IIx II Ilxll=1 IIAxll Pp . 11^4^11 (or, ., . . )),. For such subordinate oper(or, more more generally, generally, ||A|| IIAllp,q == max^o maxx;60 IIxll For such subordmate norms, norms, also also called caUedoperq ator norms, we clearly have ||Ajc|| < ||A||1|jt||. ||Afijc|| ::s < ||A||||fljc|| < IIAIIIIBllllxll, ||A||||fl||||jt||, atornorms, wec1earlyhave IIAxll ::s IIAllllxll· Since Since IIABxl1 IIAlIllBxll ::s it follows follows that that all all subordinate norms are are consistent. it subordinate norms consistent. Theorem = ||A|| ||jc*|| ifif the the matrix norm is There exists exists a a vector vector x* x* such such that that ||Ajt*|| IIAx*11 = IIAllllx*11 matrix norm is Theorem 7.47. 7.47. There subordinate the vector norm. subordinate to to the vector norm. Theorem 7.48. 7.48. IfIf \\II •. 11m is aa consistent matrix norm, norm, there there exists norm \\II .• \\IIvv Theorem \\m is consistent matrix exists aa vector vector norm consistent < \\A\\ \\x\\vv.' consistent with with it, it, i.e., i.e., HAjcJI^ IIAxliv ::s IIAlimm Ilxli
Not every consistent matrix norm is subordinate to a vector norm. For example, consider ||·||_F. Then ||Ax||_2 ≤ ||A||_F ||x||_2, so ||·||_2 is consistent with ||·||_F, but there does not exist a vector norm ||·|| such that ||A||_F is given by max_{x ≠ 0} ||Ax|| / ||x||.

Useful Results

The following miscellaneous results about matrix norms are collected for future reference. The interested reader is invited to prove each of them as an exercise.

1. ||I_n||_p = 1 for all p, while ||I_n||_F = √n.

2. For A ∈ R^{m×n}, the following inequalities are all tight, i.e., there exist matrices A for which equality holds:

      ||A||_1 ≤ √m ||A||_2,    ||A||_1 ≤ m ||A||_∞,    ||A||_1 ≤ √m ||A||_F;
      ||A||_2 ≤ √n ||A||_1,    ||A||_2 ≤ √m ||A||_∞,    ||A||_2 ≤ ||A||_F;
      ||A||_∞ ≤ n ||A||_1,     ||A||_∞ ≤ √n ||A||_2,    ||A||_∞ ≤ √n ||A||_F;
      ||A||_F ≤ √n ||A||_1,    ||A||_F ≤ √n ||A||_2,    ||A||_F ≤ √m ||A||_∞.
3. For A ∈ R^{m×n},

      max_{i,j} |a_{ij}| ≤ ||A||_2 ≤ √(mn) max_{i,j} |a_{ij}|.

4. The norms ||·||_F and ||·||_2 (as well as all the Schatten p-norms, but not necessarily other p-norms) are unitarily invariant; i.e., for all A ∈ R^{m×n} and for all orthogonal matrices Q ∈ R^{m×m} and Z ∈ R^{n×n}, ||QAZ||_α = ||A||_α for α = 2 or F.
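As a quick numerical sanity check of several of the results above (an editorial illustration, not part of the original text), the sketch below evaluates the standard matrix norms of a random 5 x 3 matrix with NumPy and confirms two of the tight inequalities as well as the orthogonal invariance of the 2- and F-norms.

    import numpy as np

    rng = np.random.default_rng(0)
    m, n = 5, 3
    A = rng.standard_normal((m, n))

    n1, n2 = np.linalg.norm(A, 1), np.linalg.norm(A, 2)        # max column sum, largest singular value
    ninf, nF = np.linalg.norm(A, np.inf), np.linalg.norm(A, 'fro')

    assert n1 <= np.sqrt(m) * n2                               # ||A||_1 <= sqrt(m) ||A||_2
    assert n2 <= nF                                            # ||A||_2 <= ||A||_F
    assert np.abs(A).max() <= n2 <= np.sqrt(m * n) * np.abs(A).max()

    Q, _ = np.linalg.qr(rng.standard_normal((m, m)))           # orthogonal Q
    Z, _ = np.linalg.qr(rng.standard_normal((n, n)))           # orthogonal Z
    assert np.isclose(np.linalg.norm(Q @ A @ Z, 2), n2)        # unitary invariance of ||.||_2
    assert np.isclose(np.linalg.norm(Q @ A @ Z, 'fro'), nF)    # unitary invariance of ||.||_F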
Convergence

The following theorem uses matrix norms to convert a statement about convergence of a sequence of matrices into a statement about the convergence of an associated sequence of scalars.

Theorem 7.49. Let ||·|| be a matrix norm and suppose A, A^{(1)}, A^{(2)}, ... ∈ R^{m×n}. Then

      lim_{k→+∞} A^{(k)} = A   if and only if   lim_{k→+∞} ||A^{(k)} - A|| = 0.
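For concreteness (an editorial illustration, not part of the original text), the following sketch builds a sequence A^{(k)} = A + e^{-k} E and observes that the scalar sequence ||A^{(k)} - A||_F decreases toward 0, which by Theorem 7.49 is equivalent to A^{(k)} converging to A.

    import numpy as np

    A = np.array([[1.0, 2.0], [3.0, 4.0]])
    E = np.ones((2, 2))
    errs = [np.linalg.norm((A + np.exp(-k) * E) - A, 'fro') for k in range(1, 8)]
    assert all(e2 < e1 for e1, e2 in zip(errs, errs[1:]))      # strictly decreasing toward 0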
EXERCISES

1. If P is an orthogonal projection, prove that P^+ = P.

2. Suppose P and Q are orthogonal projections and P + Q = I. Prove that P - Q must be an orthogonal matrix.

3. Prove that I - A^+ A is an orthogonal projection. Also, prove directly that V_2 V_2^T is an orthogonal projection, where V_2 is defined as in Theorem 5.1.

4. Suppose that a matrix A ∈ R^{m×n} has linearly independent columns. Prove that the orthogonal projection onto the space spanned by these column vectors is given by the matrix P = A(A^T A)^{-1} A^T.

5. Find the (orthogonal) projection of the vector [2 3 4]^T onto the subspace of R^3 spanned by the plane 3x - y + 2z = 0.

6. Prove that R^{n×n} with the inner product (A, B) = Tr A^T B is a real inner product space.

7. Show that the matrix norms ||·||_2 and ||·||_F are unitarily invariant.

8. Definition: Let A ∈ R^{n×n} and denote its set of eigenvalues (not necessarily distinct) by {λ_1, ..., λ_n}. The spectral radius of A is the scalar

      ρ(A) = max_i |λ_i|.
   Let
A=[~14
0 12
~].
5
   Determine ||A||_F, ||A||_1, ||A||_2, ||A||_∞, and ρ(A).

9. Let
A=[~4 9~ 2~].
   Determine ||A||_F, ||A||_1, ||A||_2, ||A||_∞, and ρ(A). (An n × n matrix, all of whose columns and rows as well as main diagonal and antidiagonal sum to s = n(n^2 + 1)/2, is called a "magic square" matrix. If M is a magic square matrix, it can be proved that ||M||_p = s for all p.)

10. Let A = xy^T, where both x, y ∈ R^n are nonzero. Determine ||A||_F, ||A||_1, ||A||_2, and ||A||_∞ in terms of ||x||_α and/or ||y||_β, where α and β take the value 1, 2, or ∞ as appropriate.
Chapter 8

Linear Least Squares Problems

8.1 The Linear Least Squares Problem

Problem: Suppose A ∈ R^{m×n} with m ≥ n and b ∈ R^m is a given vector. The linear least squares problem consists of finding an element of the set

      X = {x ∈ R^n : ρ(x) = ||Ax - b||_2 is minimized}.
Solution: The set X has a number of easily verified properties:

1. A vector x ∈ X if and only if A^T r = 0, where r = b - Ax is the residual associated with x. The equations A^T r = 0 can be rewritten in the form A^T Ax = A^T b and the latter form is commonly known as the normal equations, i.e., x ∈ X if and only if x is a solution of the normal equations. For further details, see Section 8.2.

2. A vector x ∈ X if and only if x is of the form

      x = A^+ b + (I - A^+ A) y,  where y ∈ R^n is arbitrary.     (8.1)
   To see why this must be so, write the residual r in the form

      r = (b - P_{R(A)} b) + (P_{R(A)} b - Ax).

   Now, (P_{R(A)} b - Ax) is clearly in R(A), while

      (b - P_{R(A)} b) = (I - P_{R(A)}) b = P_{R(A)^⊥} b ∈ R(A)^⊥,

   so these two vectors are orthogonal. Hence,

      ||r||_2^2 = ||b - Ax||_2^2 = ||b - P_{R(A)} b||_2^2 + ||P_{R(A)} b - Ax||_2^2

   from the Pythagorean identity (Remark 7.35). Thus, ||Ax - b||_2^2 (and hence ρ(x) = ||Ax - b||_2) assumes its minimum value if and only if

      Ax = P_{R(A)} b,     (8.2)
   and this equation always has a solution since AA^+ b ∈ R(A). By Theorem 6.3, all solutions of (8.2) are of the form

      x = A^+ A A^+ b + (I - A^+ A) y
        = A^+ b + (I - A^+ A) y,

   where y ∈ R^n is arbitrary. The minimum value of ρ(x) is then clearly equal to

      ||b - P_{R(A)} b||_2 = ||(I - AA^+) b||_2 ≤ ||b||_2,

   the last inequality following by Theorem 7.23.

3. X is convex. To see why, consider two arbitrary vectors x_1 = A^+ b + (I - A^+ A) y and x_2 = A^+ b + (I - A^+ A) z in X. Let θ ∈ [0, 1]. Then the convex combination θx_1 + (1 - θ)x_2 = A^+ b + (I - A^+ A)(θy + (1 - θ)z) is clearly in X.
4. X has a unique element x* of minimal 2-norm. In fact, x* = A^+ b is the unique vector that solves this "double minimization" problem, i.e., x* minimizes the residual ρ(x) and is the vector of minimum 2-norm that does so. This follows immediately from convexity or directly from the fact that all x ∈ X are of the form (8.1) and

      ||x||_2^2 = ||A^+ b||_2^2 + ||(I - A^+ A) y||_2^2,

   which follows since the two vectors are orthogonal.

5. There is a unique solution to the least squares problem, i.e., X = {x*} = {A^+ b}, if and only if A^+ A = I or, equivalently, if and only if rank(A) = n.

Just as for the solution of linear equations, we can generalize the linear least squares problem to the matrix case.

Theorem 8.1. Let A ∈ R^{m×n} and B ∈ R^{m×k}. The general solution to

      min_{X ∈ R^{n×k}} ||AX - B||_2

is of the form

      X = A^+ B + (I - A^+ A) Y,

where Y ∈ R^{n×k} is arbitrary. The unique solution of minimum 2-norm or F-norm is X = A^+ B.
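A short NumPy sketch (editorial illustration, not part of the original text) of Theorem 8.1: the minimum-norm solution of the matrix least squares problem is X = A^+ B, computed here with the library pseudoinverse.

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((6, 3))        # m x n
    B = rng.standard_normal((6, 2))        # m x k

    X = np.linalg.pinv(A) @ B              # X = A^+ B, the minimum F-norm solution
    residual = np.linalg.norm(A @ X - B, 'fro')
    # For this full-column-rank A, A^+ A = I, so X is in fact the unique solution.
    assert np.allclose(np.linalg.pinv(A) @ A, np.eye(3))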
Remark 8.2. Notice that solutions of the linear least squares problem look exactly the same as solutions of the linear system AX = B. The only difference is that in the case of linear least squares solutions, there is no "existence condition" such as R(B) ⊆ R(A). If the existence condition happens to be satisfied, then equality holds and the least squares residual is 0. Of all solutions that give a residual of 0, the unique solution X = A^+ B has minimum 2-norm or F-norm.

Remark 8.3. If we take B = I_m in Theorem 8.1, then X = A^+ can be interpreted as saying that the Moore-Penrose pseudoinverse of A is the best (in the matrix 2-norm sense) matrix such that AX approximates the identity.
Remark 8.4. Many other interesting and useful approximation results are available for the matrix 2-norm (and F-norm). One such is the following. Let A ∈ R_r^{m×n} with SVD

      A = UΣV^T = Σ_{i=1}^r σ_i u_i v_i^T.

Then a best rank k approximation to A for 1 ≤ k ≤ r, i.e., a solution to

      min_{M ∈ R_k^{m×n}} ||A - M||_2,

is given by

      M_k = Σ_{i=1}^k σ_i u_i v_i^T.
The special case in which m = n and k = n - 1 gives a nearest singular matrix to A ∈ R_n^{n×n}.

8.2 Geometric Solution
Looking at the schematic provided in Figure 8.1, it is apparent that minimizing ||Ax - b||_2 is equivalent to finding the vector x ∈ R^n for which p = Ax is closest to b (in the Euclidean norm sense). Clearly, r = b - Ax must be orthogonal to R(A). Thus, if Ay is an arbitrary vector in R(A) (i.e., y is arbitrary), we must have

      0 = (Ay)^T (b - Ax)
        = y^T A^T (b - Ax)
        = y^T (A^T b - A^T Ax).

Since y is arbitrary, we must have A^T b - A^T Ax = 0 or A^T Ax = A^T b.

Special case: If A is full (column) rank, then x = (A^T A)^{-1} A^T b.
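The normal equations can be solved directly when A has full column rank. The sketch below (an editorial illustration, not part of the original text) compares that approach with a library least squares solver and checks that the residual is orthogonal to R(A).

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((8, 3))
    b = rng.standard_normal(8)

    x_normal = np.linalg.solve(A.T @ A, A.T @ b)       # x = (A^T A)^{-1} A^T b
    x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)    # QR/SVD-based library solver
    assert np.allclose(x_normal, x_lstsq)

    r = b - A @ x_normal
    assert np.allclose(A.T @ r, 0.0)                   # A^T r = 0: residual orthogonal to R(A)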
8.3 Linear Regression and Other Linear Least Squares Problems

8.3.1 Example: Linear regression

Suppose we have m measurements (t_1, y_1), ..., (t_m, y_m) for which we hypothesize a linear (affine) relationship

      y = αt + β     (8.3)
Figure 8.1. Projection of b on R(A).

for certain constants α and β. One way to solve this problem is to find the line that best fits the data in the least squares sense; i.e., with the model (8.3), we have
      y_1 = αt_1 + β + δ_1,
      y_2 = αt_2 + β + δ_2,
        ...
      y_m = αt_m + β + δ_m,

where δ_1, ..., δ_m are "errors" and we wish to minimize δ_1^2 + ... + δ_m^2. Geometrically, we are trying to find the best line that minimizes the (sum of squares of the) distances from the given data points. See, for example, Figure 8.2.

Figure 8.2. Simple linear regression.
Note that distances are measured in the vertical sense from the points to the line (as indicated, for example, for the point (t_1, y_1)). However, other criteria are possible. For example, one could measure the distances in the horizontal sense, or the perpendicular distance from the points to the line could be used. The latter is called total least squares. Instead of 2-norms, one could also use 1-norms or ∞-norms. The latter two are computationally
much more difficult to handle, and thus we present only the more tractable 2-norm case in the text that follows.

The m "error equations" can be written in matrix form as

      y = Ax + δ,

where

      y = [y_1, ..., y_m]^T,   A = [t_1 1; t_2 1; ...; t_m 1],   x = [α; β],   δ = [δ_1, ..., δ_m]^T.

We then want to solve the problem

      min_x δ^T δ = min_x (Ax - y)^T (Ax - y)

or, equivalently,

      min_x ||δ||_2^2 = min_x ||Ax - y||_2^2.     (8.4)
Solution: x = [α; β] is a solution of the normal equations A^T Ax = A^T y where, for the special form of the matrices above, we have

      A^T A = [ Σ_i t_i^2   Σ_i t_i ;   Σ_i t_i   m ]   and   A^T y = [ Σ_i t_i y_i ;   Σ_i y_i ].

The solution for the parameters α and β can then be written

      [α; β] = (A^T A)^{-1} A^T y.
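The regression fit is easily reproduced numerically. The sketch below (an editorial illustration with made-up data, not part of the original text) forms A with columns t and 1 and recovers α and β both from a library solver and from the 2 x 2 normal equations.

    import numpy as np

    t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

    A = np.column_stack([t, np.ones_like(t)])          # columns: t_i and 1
    (alpha, beta), *_ = np.linalg.lstsq(A, y, rcond=None)
    alpha2, beta2 = np.linalg.solve(A.T @ A, A.T @ y)  # closed form via normal equations
    assert np.allclose([alpha, beta], [alpha2, beta2])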
8.3.2 Other least squares problems

Suppose the hypothesized model is not the linear equation (8.3) but rather is of the form

      y = f(t) = c_1 φ_1(t) + ... + c_n φ_n(t).     (8.5)

In (8.5) the φ_i(t) are given (basis) functions and the c_i are constants to be determined to minimize the least squares error. The matrix problem is still (8.4), where we now have

      A = [ φ_1(t_1) ... φ_n(t_1) ;  ...  ;  φ_1(t_m) ... φ_n(t_m) ],   x = [c_1, ..., c_n]^T.

An important special case of (8.5) is least squares polynomial approximation, which corresponds to choosing φ_i(t) = t^{i-1}, i = 1, ..., n, although this choice can lead to computational
difficulties because of numerical ill conditioning for large n. Numerically better approaches are based on orthogonal polynomials, piecewise polynomial functions, splines, etc.

The key feature in (8.5) is that the coefficients c_i appear linearly. The basis functions φ_i can be arbitrarily nonlinear. Sometimes a problem in which the c_i's appear nonlinearly can be converted into a linear problem. For example, if the fitting function is of the form y = f(t) = c_1 e^{c_2 t}, then taking logarithms yields the equation log y = log c_1 + c_2 t. Then defining ȳ = log y, ĉ_1 = log c_1, and ĉ_2 = c_2 results in a standard linear least squares problem.
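The exponential example above can be checked numerically. The sketch below (an editorial illustration with synthetic noiseless data, not part of the original text) recovers c_1 and c_2 from the linearized problem in (log c_1, c_2).

    import numpy as np

    t = np.linspace(0.0, 2.0, 20)
    c1_true, c2_true = 2.0, -1.3
    y = c1_true * np.exp(c2_true * t)

    A = np.column_stack([np.ones_like(t), t])          # model: log y = log c1 + c2 t
    coeffs, *_ = np.linalg.lstsq(A, np.log(y), rcond=None)
    c1_est, c2_est = np.exp(coeffs[0]), coeffs[1]
    assert np.allclose([c1_est, c2_est], [c1_true, c2_true])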
8.4 Least Squares and Singular Value Decomposition

In the numerical linear algebra literature (e.g., [4], [7], [11], [23]), it is shown that solution of linear least squares problems via the normal equations can be a very poor numerical method in finite-precision arithmetic. Since the standard Kalman filter essentially amounts to sequential updating of normal equations, it can be expected to exhibit such poor numerical behavior in practice (and it does). Better numerical methods are based on algorithms that work directly and solely on A itself rather than A^T A. Two basic classes of algorithms are based on SVD and QR (orthogonal-upper triangular) factorization, respectively. The former is much more expensive but is generally more reliable and offers considerable theoretical insight.

In this section we investigate solution of the linear least squares problem

      min_x ||Ax - b||_2,   A ∈ R^{m×n}, b ∈ R^m,     (8.6)
via the SVD. Specifically, we assume that A has an SVD given by A = UΣV^T = U_1 S V_1^T as in Theorem 5.1. We now note that

      ||Ax - b||_2^2 = ||UΣV^T x - b||_2^2
                     = ||ΣV^T x - U^T b||_2^2            since ||·||_2 is unitarily invariant
                     = ||Σz - c||_2^2                    where z = V^T x, c = U^T b
                     = || [S 0; 0 0][z_1; z_2] - [c_1; c_2] ||_2^2
                     = || [S z_1 - c_1; -c_2] ||_2^2.

The last equality follows from the fact that if v = [v_1; v_2], then ||v||_2^2 = ||v_1||_2^2 + ||v_2||_2^2 (note that orthogonality is not what is used here; the subvectors can have different lengths). This explains why it is convenient to work above with the square of the norm rather than the norm. As far as the minimization is concerned, the two are equivalent. In fact, the last quantity above is clearly minimized by taking z_1 = S^{-1} c_1. The subvector z_2 is arbitrary, while the minimum value of ||Ax - b||_2 is ||c_2||_2.
Now transform back to the original coordinates:

      x = Vz = [V_1 V_2][z_1; z_2]
             = V_1 z_1 + V_2 z_2
             = V_1 S^{-1} c_1 + V_2 z_2
             = V_1 S^{-1} U_1^T b + V_2 z_2.

The last equality follows from

      c = U^T b = [U_1^T b; U_2^T b] = [c_1; c_2].

Note that since z_2 is arbitrary, V_2 z_2 is an arbitrary vector in R(V_2) = N(A). Thus, x has been written in the form x = A^+ b + (I - A^+ A) y, where y ∈ R^n is arbitrary. This agrees, of course, with (8.1).

The minimum value of the least squares residual is

      ||c_2||_2 = ||U_2^T b||_2,

and we clearly have that

      minimum least squares residual is 0
         ⟺ b is orthogonal to all vectors in U_2
         ⟺ b is orthogonal to all vectors in R(A)^⊥
         ⟺ b ∈ R(A).

Another expression for the minimum residual is ||(I - AA^+) b||_2. This follows easily since

      ||(I - AA^+) b||_2^2 = ||U_2 U_2^T b||_2^2 = b^T U_2 U_2^T U_2 U_2^T b = b^T U_2 U_2^T b = ||U_2^T b||_2^2.

Finally, an important special case of the linear least squares problem is the so-called full-rank problem, i.e., A ∈ R_n^{m×n}. In this case the SVD of A is given by

      A = UΣV^T = [U_1 U_2][S; 0] V_1^T,

and there is thus "no V_2 part" to the solution.
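The SVD-based construction of the minimum-2-norm solution x = V_1 S^{-1} U_1^T b can be written out directly. The sketch below (an editorial illustration, not part of the original text) does so for a deliberately rank-deficient A and checks the result against the library pseudoinverse.

    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.standard_normal((6, 4))
    A[:, 3] = A[:, 0] + A[:, 1]                        # force rank(A) = 3 < n
    b = rng.standard_normal(6)

    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    r = int(np.sum(s > 1e-12 * s[0]))                  # numerical rank
    x = Vt[:r].T @ ((U[:, :r].T @ b) / s[:r])          # x = V1 S^{-1} U1^T b = A^+ b
    assert np.allclose(x, np.linalg.pinv(A) @ b)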
8.5 Least Squares and QR Factorization

In this section, we again look at the solution of the linear least squares problem (8.6) but this time in terms of the QR factorization. This matrix factorization is much cheaper to compute than an SVD and, with appropriate numerical enhancements, can be quite reliable. To simplify the exposition, we add the simplifying assumption that A has full column rank, i.e., A ∈ R_n^{m×n}. It is then possible, via a sequence of so-called Householder or Givens transformations, to reduce A in the following way. A finite sequence of simple orthogonal row transformations (of Householder or Givens type) can be performed on A to reduce it to triangular form. If we label the product of such orthogonal row transformations as the orthogonal matrix Q^T ∈ R^{m×m}, we have

      Q^T A = [R; 0].     (8.7)
where R ∈ R_n^{n×n} is upper triangular. Now write Q = [Q_1 Q_2], where Q_1 ∈ R^{m×n} and Q_2 ∈ R^{m×(m-n)}. Both Q_1 and Q_2 have orthonormal columns. Multiplying through by Q in (8.7), we see that
      A = Q [R; 0]                              (8.8)
        = [Q_1 Q_2][R; 0]
        = Q_1 R.                                (8.9)

Any of (8.7), (8.8), or (8.9) are variously referred to as QR factorizations of A. Note that (8.9) is essentially what is accomplished by the Gram-Schmidt process, i.e., by writing AR^{-1} = Q_1 we see that a "triangular" linear combination (given by the coefficients of R^{-1}) of the columns of A yields the orthonormal columns of Q_1. Now note that

      ||Ax - b||_2^2 = ||Q^T Ax - Q^T b||_2^2        since ||·||_2 is unitarily invariant
                     = || [R; 0] x - [c_1; c_2] ||_2^2.
EXERCISES EXERCISES xn m + 1. For A E ffi. mxn ffi. m,, and and any any y E ffi. n , check check directly directly that that (I A)y and 1. For €W , ,b b Ee E e R", (I -- A++A)y and A A+b b are orthogonal orthogonal vectors. are vectors.
2. yt): 2. Consider Consider the the following following set set of of measurements measurements (*,, (Xi, Yi): (1,2), (2,1), (3,3). (a) Find Find the the best best (in (in the the 2-norm 2-norm sense) sense) line line of of the the form form yy = = ax ax + + ftfJ that that fits fits this (a) this data. data.
(b) (in the sense) line = ay of the the form form jc x = ay + (3 fJ that that fits fits this this (b) Find Find the the best best (in the 2-norm 2-norm sense) line of data. data. n• q, and and q qz2 are are two two orthonormal orthonormal vectors vectors and and b b is is aa fixed fixed vector, vector, all in ffi. 3. Suppose 3. Suppose qi all in R".
(a) Find the the optimal optimallinear combination aq^ aql + + (3q2 that is is closest closest to to b b (in (in the the 2-norm (a) Find linear combination fiq2 that 2-norm sense). sense). (b) Let Let rr denote the "error vector" bb — - ctq\ aql — - {3qz. that rr is is orthogonal (b) denote the "error vector" flq2- Show Show that orthogonal to to both^i q2. both ql and and q2.
Exercises Exercises
73
4. Find all all solutions the linear linear least 4. Find solutions of of the least squares squares problem problem min II Ax - bll 2 x
when A = [
~
5. 5. Consider the problem of of finding the minimum 2-norm 2-norm solution solution of of the linear least least «rmarp« nrr»h1<=>m squares problem min II Ax - bl1 2 x
when A =
[~ ~
] and b = [
!1
The solution is
(a) Consider aa perturbation E\ = = [~ [0 pi of A, is aa small small positive (a) Consider perturbation EI ~] of A, where where 88 is positive number. number. of the the above Solve the perturbed perturbed version version of Solve the above problem, problem,
where AI = A+ What happens - yyII2 approaches 0? O? where AI = A + E E\.I . What happens to to IIx* ||jt* — ||2 as as 88 approaches
n
(b) Now the perturbation (b) Now consider consider the perturbation EI E2 == \[~0 s~\ of of A, A, where where again again 88 is is aa small small positive number. number. Solve the perturbed perturbed problem positive Solve the problem min II A 2 z - bib z
where —A happens to to IIx* \\x* -— z|| as 88 approaches approaches O? 0? where A A22 = A +E E22.• What What happens zll22 as 6. Use four Penrose Penrose conditions conditions and the fact fact that that QI Q\ has has orthonormal orthonormal columns to 6. Use the the four and the columns to x if A ~;:,xn"can verify verify that that if A eE R™ canbe befactored factoredininthe theform form(8.9), (8.9),then thenA+ A+== RR~IlQf. Q\. x 7. Let Let A A eE R" ~nxn, not necessarily necessarily nonsingular, nonsingular, and and suppose suppose A A = QR, where where Q is is 1. ", not QTT. orthogonal. Prove that A A ++ = R+ R+Q
This page intentionally intentionally left left blank blank This page
Chapter 9 Chapter 9
Eigenvalues Eigenvalues and and Eigenvectors Eigenvectors
9.1 9.1
Fundamental Definitions Definitions and and Properties Fundamental Properties
nxn Definition 9.1. A nonzero nonzero vector vector xx eE C" en is right eigenvector eigenvector of of A A eE C e nxn if there exists exists Definition 9.1. A is aa right if there aa scalar scalar A A. Ee e, C, called called an an eigenvalue, eigenvalue, such such that that
(9.1)
Ax = AX.
Similarly, aa nonzero nonzero vector vector yy eE C" en is is a a left left eigenvector corresponding to to an an eigenvalue eigenvalue Similarly, eigenvector corresponding a if Mif
(9.2) By taking taking Hermitian Hennitian transposes transposes in in (9.1), (9.1), we we see immediately that that X x HH is By see immediately is aa left left eigeneigenH vector of of A A H associated associated with with I. Note that that if if xx [y] [y] is is aa right right [left] [left] eigenvector eigenvector of of A, A, then then vector A. Note so [ay] for for any any nonzero nonzero scalar E C. One One often-used often-used scaling scaling for for an an eigenvector eigenvector is is so is is ax ax [ay] scalar aa E aa = — 1/ \j'||;t|| so that that the the scaled scaled eigenvector eigenvector has has nonn norm 1. 1. The The 2-nonn 2-norm is is the the most most common common IIx II so nonn used used for for such such scaling. norm scaling. polynomialn det (A - Al) is is called called the the characteristic characteristic polynomial polynomial Definition 9.2. 9.2. The Definition The polynomial n (A) (A.) == det(A—A,/) of (Note that This of A. A. (Note that the the characteristic characteristic polynomial polynomial can can also also be be defined defined as as det(A./ det(Al — - A). A). This results in in at at most most aa change change of of sign sign and, and, as as aa matter matter of of convenience, convenience, we we use use both both forms results forms throughout the the text.} text.) throughout
The It can The following following classical classical theorem theorem can can be be very very useful useful in in hand hand calculation. calculation. It can be be proved easily easily from the Jordan Jordan canonical canonical fonn to be be discussed discussed in the text text to to follow (see, for proved from the form to in the follow (see, for example, [21D or directly directly using using elementary elementary properties properties of of inverses inverses and and determinants determinants (see, (see, example, [21]) or for example, example, [3]). for [3]). nxn Theorem 9.3 9.3 (Cayley-Hamilton). (Cayley-Hamilton). For For any any A A eE C enxn n(A) = = 0. O. Theorem ,, n(A) 2 Example + 2A, aneasy easyexercise exercise toto Example 9.4. 9.4. Let Let A A = [~g [-~ ~g]. -~]. Then Then n(k) n(A) = X A2 + 2A - —3.3. ItItisisan 2 verify n(A) = =A A2 + 2A 2A -- 31 31 = 0. O. verify that that n(A) x
nxn It can determinants that C" ",, then then It can be be proved proved from from elementary elementary properties properties of of detenninants that if if A A eE e
n(A) is aa polynomial polynomial of of degree n. Thus, Thus, the the Fundamental Fundamental Theorem Theorem of of Algebra Algebra says says that that 7t (X) is degree n.
75
76
Chapter Eigenvectors Chapter 9. 9. Eigenvalues Eigenvalues and and Eigenvectors
n(A) has has nn roots, roots, possibly possibly repeated. the determinant 7r(A) repeated. These These roots, roots, as as solutions solutions of of the determinant equation equation n(A)
= det(A -
AI)
= 0,
(9.3)
are the eigenvalues A and the singularity matrix A A -— XI, AI, and are the eigenvalues of of A and imply imply the singularity of of the the matrix and hence hence further further guarantee corresponding nonzero nonzero eigenvectors. guarantee the the existence existence of of corresponding eigenvectors.
c
x of A A Ee C"nxn of A, A, i.e., of Definition Definition 9.5. 9.5. The The spectrum spectrum of " is is the the set set of of all all eigenvalues eigenvalues of i.e., the the set set of all polynomialn(A). spectrum of of A A is denoted A A(A). all roots roots of of its its characteristic characteristic polynomial n(X). The The spectrum is denoted (A).
form form
A Ee en A], ... , X An. Let the eigenvalues of A C"xxn " be denoted X\,..., n. Then if we write (9.3) in the n(A) = det(A - AI) = (A] - A) ... (An - A)
(9.4)
and = 00 in we get get the fact that A] .• A.2 A2 ... and set set A X= in this this identity, identity, we the interesting interesting fact that det(A) del (A) = = AI • • •AnAM(see (see also Theorem Theorem 9.25). If n(A) has real coefficients. coefficients. Hence the roots of 7r(A), n(A), i.e., the If A Ee 1Ftnxn, Wxn, then n(X) eigenvalues A, must must occur eigenvalues of of A, occur in in complex complex conjugate conjugate pairs. pairs.
Example 9.6. 9.6. Let a, ft R and and let = [[~f3 _^ !]. £ ]. Then Then n(A) jr(A.) = A A.22- - 2aA 2aA++aa22++f32 ft2 and and Example Let a, f3 Ee 1Ft let A A = A has has eigenvalues f3j (where A eigenvalues aa ± fij (where j = ii = R). •>/—!)• If A E If A € 1Ftnxn, R"x", then there is an easily checked checked relationship between the left and right T A and AT (take transposes of if eigenvectors eigenvectors of of A and A (take Hermitian Hermitian transposes of both both sides sides of of (9.2». (9.2)). Specifically, Specifically, if left eigenvector of of A A corresponding to A A eE A(A), A(A), then yy is a right eigenvector of of AT y is a left AT corresponding to IA. €E A A(A). (A). Note, too, that by elementary properties of of the determinant, r we have A(A) A(A) = = A(A A(AT), A(A) = A(A) only A E we always always have ), but but that that A(A) = A(A) only if if A e 1Ftnxn. R"x".
Definition 9.7. IfX is aa root multiplicity m m ofjr(X), that A X is is an an eigenvalue A Definition 9.7. If A is root of of multiplicity of n(A), we we say say that eigenvalue of of A of algebraic multiplicity m. multiplicity of of algebraic multiplicity m. The The geometric geometric multiplicity ofXA is is the the number number of of associated associated independent eigenvectors eigenvectors = = nn -— rank(A A/) = = dimN(A dim J\f(A -— AI). XI). independent rank(A -— AI) If AE A(A) has has algebraic then 1I :::: if If A € A(A) algebraic multiplicity multiplicity m, m, then < dimN(A dimA/"(A -— AI) A/) :::: < m. m. Thus, Thus, if we denote the the geometric geometric multiplicity of A A by by g, we must have 1I :::: < gg :::: < m. m. we denote multiplicity of g, then then we must have x Definition A matrix matrix A A Ee W 1Ftnxn is said said to an eigenvalue whose Definition 9.8. 9.8. A " is to be be defective defective if if it it has has an eigenvalue whose geometric multiplicity multiplicity is geometric is not not equal equal to to (i.e., (i.e., less less than) than) its its algebraic algebraic multiplicity. multiplicity. Equivalently, Equivalently, A A is is said said to to be be defective defective ifif it it does does not not have have nn linearly linearly independent independent (right (right or or left) left) eigenvectors. eigenvectors.
From the Cayley-Hamilton Theorem, we know that n(A) O. However, n(A) = = 0. However, it is possible for for A to satisfy satisfy aa lower-order example, if = \[~1Q ®], satA to lower-order polynomial. polynomial. For For example, if A A = ~], then then A A satsible 2 (Je -— 1)2 = O.0. But the smaller isfies (1 isfies I) = But it it also also clearly clearly satisfies satisfies the smaller degree degree polynomial polynomial equation equation
a - n =0o.
(it. - 1) ;;;:;
neftnhion minimal polynomial polynomial of Of A A G l::: l!if.nxn ix the (hI' polynomial polynomilll o/(X) a(A) oJ Definition ~.~. 5.5. Thll The minimal K""" is of IPll.ft least degree such that O. degree such that a(A) a (A) ~=0.
It a(Je) is unique (unique the coefficient It can can be be shown shown that that or(l) is essentially essentially unique (unique if if we we force force the coefficient of the highest A to to be such aa polynomial polynomial is is said to be monic and and we we of the highest power power of of A be + +1,1. say; say; such said to be monic generally write et a(A) generally write (A) as as aa monic monic polynomial polynomial throughout throughout the the text). text). Moreover, Moreover, itit can can also also be be
9.1. Fundamental 9.1. Fundamental Definitions Definitions and and Properties Properties
77 77
nonzero polynomial polynomial fi(k} f3(A) for which ftf3(A) O. In particular, shown that aa(A) (A.) divides every every nonzero (A) = 0. particular, a(A) a(X) divides n(A). n(X). a(A) There is an algorithm to determine or (A.) directly directly (without (withoutknowing knowing eigenvalues eigenvalues and and asasUnfortunately, this algorithm, algorithm, called the Bezout Bezout algorithm, sociated eigenvector eigenvector structure). Unfortunately, algorithm, is numerically unstable. Example 9.10. Example 9.10. The above definitions are illustrated below for a series of matrices, each 4 4, i.e., n(A) (A — - 2) 2)4. of which has an eigenvalue 2 of algebraic multiplicity 4, 7r(A) = (A . We denote the geometric multiplicity by g. g.
A-[~ -
0
0
A~[~ A~U
A~U
2
0 0
0 I 2
0 0
2
0 0 I 2
0 0 0 2
0 0
2
!]
~
~
~
ha,"(A)
] ha< a(A)
(A - 2)' ""d g
(A - 2)' ""d g
~ ~
1.
2.
0 0 0 2
~
] h'" a(A)
~
(A - 2)2 ""d g
~
3.
0 0 0 2
~
] ha
~
(A - 2) andg
~
4.
0
g plus the degree of a must always be five. At this point, one might speculate that g Unfortunately, such is not the case. The matrix
A~U has a(A)
= (A -
2)2 and g
I 2
0 0
0 0
0
2
!]
= 2.
x Theorem 9.11. Let " ana Theorem 9.11. Let A A eE C« ccnxn and [let Ai be be an an eigenvalue eigenvalue of of A A with with corresponding corresponding right right et A., eigenvector jc,-. yj be a left A (A) Xi. Furthermore, let Yj left eigenvector corresponding to any A Aj; eE l\(A) such =£ A.,. Then yfx{Xi = = O. 0. such that that Xj Aj 1= Ai. Then
YY
Proof: Since Ax Proof' Since AXit = A,*,, AiXi, (9.5)
78
Chapter Eigenvalues and and Eigenvectors Chapter 9. 9. Eigenvalues Eigenvectors
yy,
Similarly, since since YY y" A = AjXjyf, Similarly, A = (9.6)
Subtracting (9.6) (9.6) from (9.5), we = (Ai (A.,-- —Aj)YY A y )j^jc,. SinceAiA,,-- —AjA.7- =1=^ 0,0,we wemust musthave have Subtracting from (9.5), we find find 00 = xi. Since yfxt =0.O. YyXi = 0 The of Theorem 9.11 is is very similar to to two two other other fundamental important The proof proof of Theorem 9.11 very similar fundamental and and important results. results.
c
x H Let A A E be Hermitian, Hermitian, i.e., i.e., A A = AH.. Then all eigenvalues eigenvalues of of A A must Theorem 9.12. Let Theorem 9.12. e C"nxn " be =A Then all must be real. real. be
Proof: Suppose (A, (A., x) an arbitrary arbitrary eigenvalue/eigenvector = A.JC. Then x) is is an eigenvalue/eigenvector pair pair such such that that Ax Ax = AX. Then Proof: Suppose (9.7) Taking in (9.7) yields Taking Hermitian Hermitian transposes transposes in (9.7) yields
H Using the fact fact that Hermitian, we have that that IXH XxHxx = = Xx However, since since xx is is an Using the that A A is is Hermitian, we have AXHx. x. However, an H eigenvector, A, i.e., A isisreal. eigenvector, we have xH X Xx =1= /= 0, 0, from from which which we conclude conclude IA. = = A, i.e., A. real. 0D
c
x Let A A eE C"nxn be Hermitian Hermitian and and suppose suppose A iJ- are are distinct Theorem 9.13. Let Theorem 9.13. " be X and and /JL distinct eigenvalues eigenvalues of A with with corresponding right eigenvectors eigenvectors x and and zz must of A corresponding right and z, respectively. respectively. Then Then x and must be be orthogonal. orthogonal. H Proof: the equation equation Ax = A.JC to get get ZH ZH Ax Take the Hermitian Premultiply the Ax = AX by by Z ZH to Ax = = XAZz HH xx.. Take the Hermitian Proof: Premultiply A is Hermitian and A transpose of of this equation equation and use the facts facts that A A.isisreal realtotoget getxXHHAz Az == H H H AxH Az = iJ-Z Az = = iJ-XH AXH Xx z.z. Premultiply the equation equation Az i^z by xXHH to get get xXHHAz /^X ZZ = Xx z.z. Since Since A,A ^=1= /z, that X = 0, 0, i.e., two vectors vectors must be orthogonal. iJ-, we we must must have have that x HHzz = i.e., the the two must be orthogonal. D 0
Let us now the general case. Let us now return return to to the general case. nxn Theorem 9.14. €. c Cnxn have distinct distinct eigenvalues eigenvalues A , 1 ?... . . . ,, A. corresponding Theorem 9.14. Let Let A A E have AI, An n with with corresponding right Then {XI, [x\,..., linearly independent independent set. The same same XI, ... ,,xxnn. • Then ... , x xn}} is is a a linearly set. The right eigenvectors eigenvectors x\,... result holds for corresponding left left eigenvectors. eigenvectors. for the the corresponding result holds
Proof: For the proof see, Proof: the proof see, for for example, [21, [21, p. p. 118]. 118].
0
nxn If e c C nx " has distinct eigenvalues, eigenvalues, and and if if Ai A., Ee A(A), Theorem 9.11, 9.11, jc, If A A E has distinct A(A), then then by by Theorem Xi is is H orthogonal to to all all yj's for which i. However, However, it cannot be the case case that yf*x = 00 as orthogonal y/s for which jj ^=1= i. it cannot be the that Yi Xi as t = would be be orthogonal to nn linearly vectors (by Theorem 9.14) well, or well, or else else xXif would orthogonal to linearly independent independent vectors (by Theorem 9.14) and would thus have to be 0, contradicting the fact fact that is an an eigenvector. eigenvector. Since Since yf*XiXi =1= ^ 00 and would thus have to be 0, contradicting the that it it is for each each i, i, we can choose choose the the *, 's, or or the y, 's, 's, or or both, so that that Yi ytHHx; = 11 for we can the normalization normalization of of the Xi'S, the Yi both, so Xi = for/ i €E !1. n. for
yr
79
9.1. 9.1. Fundamental Fundamental Definitions Definitions and and Properties
x Theorem Let A A Ee C" en xn AI,, ... Annand Theorem 9.15. 9.15. Let " have have distinct distinct eigenvalues eigenvalues A.I ..., , A. andlet letthe thecorrespondcorresponding right right eigenvectors eigenvectors form matrix X X = [XI, [x\, ... ..., , xxn]. let YY = — [YI,"" [y\, ..., yYn] ing form aa matrix Similarly, let n]. Similarly, n] be Furthermore, suppose suppose that be the the matrix matrix of of corresponding corresponding left left eigenvectors. eigenvectors. Furthermore, that the the left left and and right eigenvectors Xi = Finally, let A == right eigenvectors have have been been normalized normalized so so that that YiH yf1 Xi = 1, 1, i/ Een.!!:: Finally, let A txn diag(AJ, An) ]Rnxn.. Then AXi = = A.,-*/, AiXi, i/ E as diag(A,j, ... . . . ,, X e W Then AJC, e !!, n, can can be be written written in in matrixform matrix form as n) E
(9.8)
AX=XA
while YiH y^XjXj = = oij, 5,;, i/ E!!, en, y' e !!, n, is is expressed expressed by by the equation while j E the equation yHX = I.
(9.9)
These yield the following matrix These matrix matrix equations equations can can be be combined combined to to yield the following matrix factorizations: factorizations: X-lAX
=A =
yRAX
= XAX- I =
XAyH
=
and and
(9.10)
n
A
(9.11)
LAixiyr i=1
Example 9.16. Let Example 9.16. Let 2
5 -3
-3 -2
~
-4
]
.
Then AI) = -(A 4A22 + 9)" Then rr(A) n(X) = det(A det(A -- A./) -(A.33 + 4A. 9 A. + 10) 10) = -()" -(A. + 2)(),,2 2)(A.2 + 2)" 2A,++5), 5),from from which we find A A(A) find the which we find (A) = = {-2, {—2, -1 — 1 ± 2j}. 2 j } . We We can can now now find the right right and and left left eigenvectors eigenvectors corresponding to eigenvalues. corresponding to these these eigenvalues. For A-i Al = linear system get For = -2, —2, solve solve the the 33 xx 33 linear system (A (A -— (-2)l)xI (—2}I)x\ = = 00 to to get
Note that one component component of of XI ;ci can can be set arbitrarily, arbitrarily, and and this then determines determines the the other other two two be set this then Note that one (since dimN(A (since dimA/XA -— (-2)1) (—2)7) = = 1). 1). To To get get the the corresponding corresponding left left eigenvector eigenvector YI, y\, solve solve the the linear system system y\(A 21) = = 00 to to get get linear (A + 21)
yi
yi
This time we we have arbitrary scale 1. This time have chosen chosen the the arbitrary scale factor factor for for YJ y\ so so that that y f xXI\ = = 1. For A22 = -1 I)x2 get For A —1 + + 2j, 2j, solve solve the the linear linear system system (A (A -— (-1 (—1+ + 2j) 2j)I)x = 00 to to get 2 =
X2
=[
3+ j ] 3 ~/ .
80
Chapter 9. 9. Eigenvalues Eigenvalues and and Eigenvectors Eigenvectors Chapter
Solve the the linear linear system system y" (A -— ((-1 + 227')/) and nonnalize normalize Y2 y>2 so so that that y"x 1 to to get Solve yf (A -I + j) I) = = 00 and yf X2 get 2 = 1
For XT, = -I — 1 -— 2j, 2j, we we could could proceed proceed to to solve solve linear linear systems systems as as for for A2. A.2. However, we For A3 = However, we can also also note note that that x$ =xX2 ' and yi = jj. To see this, use the fact that A, 3 A.2 and simply X3 = and Y3 Y2. To see this, use the fact that A3 = A2 and simply can 2 conjugate AX22 = A2X2 to get Ax^ AX2 = ^2X2A2X2. A similar conjugate the the equation equation A;c — ^.2*2 to get similar argument argument yields yields the the result result for left left eigenvectors. eigenvectors. for Now the matrix right eigenvectors: Now define define the matrix X of of right eigenvectors: 3- j ] 3+j .
3+j 3-j
-2
-2
It that It is is then then easy easy to to verify verify that
.!.=.L
!.±1
l+j
.!.=.L
4
4
4
4
Other results results in in Theorem Theorem 9.15 9.15 can can also also be verified. For For example, Other be verified. example, X-IAX=A=
[
-2 0
0 -1+2j
o
0
Finally, note note that we could could have solved directly directly only only for for *i and xX22 (and (and XT, = xX2). Finally, that we have solved XI and X3 = Then, 2). Then, instead of of detennining determining the j,'s directly, directly, we we could could have have found found them instead by by computing instead the Yi'S them instead computing X-I X~l and reading off its rows. Example 9.17. 9.17. Let Example Let A =
[-~ -~ ~] . o
-3
3 Then Jl"(A) 7r(A.) = det(A A./) = -(A + 8A 8A22+ 19A 19X++ 12) 12)== -(A -(A.++ I)(A 1)(A.++3)(A 3)(A,++4), 4), Then det(A -- AI) _(A 3 + from which which we we find (A) = = {-I, {—1, -3, —3, -4}. —4}.Proceeding Proceedingasasininthe theprevious previousexample, example,ititisis from find A A(A) gtruightforw!U"d comput~ straightforward to to compute
X~[~ and and
x-,~q
1
3 2
I
0 -I
2 0 -2
-i ] 1
-3 ] 2
~ y'
9.1. Fundamental Fundamental Definitions Properties 9.1. Definitions and and Properties
81 81
l We also also have have X~ X-I AX AX = A= = diag( -1, —3, -3, -4), which is is equivalent equivalent to to the the dyadic dyadic expanWe =A diag(—1, —4), which expansion sion
3
A = LAixiyr i=1
~(-I)[ ~
W~ ~l+(-3)[ j ][~
+(-4) [ -; ] [~ ~ (-I) [
I (; I
I
3 2
1 - 3
I
3
3
I (;
3
I
I (;
-~l
~J
I (;
3
0
J+
(-3) [
I 2 0 0 0 I
-2
0
I
I
-2 0 I
2
]+
(-4) [
3 I
-3 I
3
I
-3 I
3 I
-3
I
3 I
-3 I
3
l
Theorem 9.18. Eigenvalues Eigenvalues (but not eigenvectors) eigenvectors) are under a a similarity similarity transtransTheorem 9.18. (but not are invariant invariant under formation T. formation T. X) is is an pair such that Ax Ax = = Xx. AX. Then, since T T Proof: Suppose Proof: Suppose (A, (A, jc) an eigenvalue/eigenvector eigenvalue/eigenvector pair such that Then, since I AT)(T-lx) x) = = XA(Tis nonsingular, we have the equivalent equivalent statement statement (T(T~lIAT)(T~ ( T ~ lIxx), ) , from from which the theorem theorem statement follows. For For left we have have aa similar similar statement, statement, namely the statement follows. left eigenvectors eigenvectors we namely H H H HH 1 AyH if and only if (T = A(THHyf. y)H. DD yyH AA = Xy ifandon\yif(T y) y)H (T~(TAT)1 AT) =X(T x Remark 9.19. 9.19. If analytic function function (e.g., polynomial, or or eeX, or sin*, sinx, Remark If /f is is an an analytic (e.g., ff(x) ( x ) is is aa polynomial, , or fl n or, in general, representable representable by a power series X^^o L~:O anxn), then it is easy to show that n* )> then easy to show that the eigenvalues eigenvalues of f(A) (defined (defined as L~:OanAn) are f(A), but the of /(A) as X^o^-A") are /(A), butf(A) /(A)does does not notnecessarily necessarily have all all the the same same eigenvectors eigenvectors (unless, (unless, say, A is is diagonalizable). diagonalizable). For For example, example, A A = = T [~0 6 have say, A Oj] 2 has only one one right corresponding to has only right eigenvector eigenvector corresponding to the the eigenvalue eigenvalue 0, 0, but but A A2 = = f[~0 0~1]has has two two independent right right eigenvectors eigenvectors associated associated with with the the eigenvalue o. What What is is true true is is that that the the independent eigenvalue 0. eigenvalue/eigenvector pair pair (A, (A, x) x) maps maps to to (f(A), x) but but not not conversely. eigenvalue/eigenvector (/(A), x) conversely.
The following theorem is is useful useful when when solving solving systems of linear linear differential differential equations. The following theorem systems of equations. A etA Ax are Details of how the matrix exponential e' is used to solve solve the system system xi = Ax are the subject of of Chapter Chapter 11. 11. xn 1 Theorem 9.20. Let A Ee R" jRnxn and suppose suppose X~~ X-I AX = A, A, where A A is diagonal. Then Theorem 9.20. AX —
n
= LeA,txiYiH. i=1
82
Chapter 9. 9. Eigenvalues Eigenvalues and and Eigenvectors Chapter Eigenvectors
Proof: Starting from from the definition, we Proof' Starting the definition, we have have
n
=
0
LeA;IXiYiH. i=1
The following following corollary corollary is is immediate immediate from from the the theorem setting tt == I.I. The theorem upon upon setting nx Corollary If A A Ee R ]Rn xn is diagonalizable diagonalizable with Ai, i/' E right Corollary 9.21. 9.21. If " is with eigenvalues eigenvalues A.,-, en,~, and and right AA XA i eigenvectors •, / € n_, then e has eigenvalues e , i € n_, and the same eigenvectors. i E ~, then e has eigenvalues e i E ~, and the same eigenvectors. eigenvectors xXi, " t
There are extensions extensions to to Theorem Theorem 9.20 9.20 and and Corollary Corollary 9.21 9.21for forany anyfunction functionthat thatisis There are analytic A, i.e., i.e., ff(A) ... , f(An))Xanalytic on on the the spectrum spectrum of of A, (A) = = XXf(A)Xf(A)X~l I = = Xdiag(J(AI), Xdiag(/(A.i),..., f ( X t t ) ) X ~ Il .. It course, to have aa version version of which It is is desirable, desirable, of of course, to have of Theorem Theorem 9.20 9.20 and and its its corollary corollary in in which A A is is not not necessarily necessarily diagonalizable. diagonalizable. It It is is necessary necessary first first to to consider consider the the notion notion of of Jordan Jordan canonical form, form, from from which such aa result is then then available available and and presented in this chapter. canonical which such result is presented later later in this chapter.
9.2 9.2
Jordan Canonical Canonical Form Form Jordan
Theorem 9.22. 9.22. Theorem x I. lordan all A A Ee C" c nxn AI, ... , kAnn E C 1. Jordan Canonical Canonical Form Form (JCF): (/CF): For For all " with with eigenvalues eigenvalues X\,..., eC x (not necessarily necessarily distinct), distinct), there there exists exists X € C^ " such (not X E c~xn such that that
X-I AX
= 1 = diag(ll, ... , 1q),
(9.12)
where of the the lordan Jordan block matrices 1/ i1,, .••• . . ,, 1q Jq is is of of the the form form where each each of block matrices
0
1i
o
0
Ai
0
Ai Ai
=
(9.13)
o o
Ai
o
Ai
9.2. Jordan Canonical Canonical Form Form 9.2. Jordan
83 83
and L;=1 ki = n. nx Form: For all A E€ R jRnxn" with eigenvalues AI, 2. Real Jordan Canonical Form: Xi, ... . . .,,An Xn (not (not xn necessarily distinct), there exists X X € E R" lR.~xn such that necessarily
(9.14) J\, ... ..., , J1qq is form where each of of the Jordan block matrices 11, is of of the form
in the case of real eigenvalues A., e A (A), and
where = [[ _»' andhI2 == [6 \0 ~]A ininthe thecase caseof of complex complex conjugate conjugateeigenvalues eigenvalues Mi; = _~; ^~: 1] and where M > ai±jp eA(A ). (Xi ± jfJi E A(A). i Proof: Proof: For the proof proof see, for example, [21, pp. 120-124].
D 0
Transformations T == [I"__,~ -"•{"] allowus usto togo goback back and andforth forthbetween between aareal realJCF JCF Transformations like like T { ] allow and its complex counterpart: T-I [ (X
+ jfJ o
O. ] T (X - JfJ
=[
(X -fJ
fJ ] (X
= M.
complicated. With For nontrivial Jordan blocks, the situation is only a bit more complicated. 1
-j
o
o
-j
1
o
o
~ -~]
o -j
0
1
'
84
Chapter 9. 9. Eigenvalues Eigenvectors Chapter Eigenvalues and and Eigenvectors
it is is easily it easily checked checked that that
T- I
[ "+ jfi 0 0 0
et
0 0
+ jf3 0 0
0 0
0
]T~[~ l h
et - jf3
M
et - jf3
Definition Definition 9.23. 9.23. The The characteristic characteristic polynomials polynomials of of the the Jordan Jordan blocks blocks defined defined in in Theorem Theorem 9.22 are called the elementary or invariant of A. 9.22 are called the elementary divisors divisors or invariant factors factors of A. matrix is product of of its its elementary Theorem 9.24. The characteristic polynomial polynomial of Theorem 9.24. The characteristic of aa matrix is the the product elementary divisors. The minimal of aa matrix divisors of of divisors. The minimal polynomial polynomial of matrix is is the the product product of of the the elementary elementary divisors highest degree corresponding to to distinct distinct eigenvalues. highest degree corresponding eigenvalues.
c
x Theorem 9.25. " with eigenvalues AI, ...," X Then Theorem 9.25. Let Let A A eE C"nxn with eigenvalues AI, .. An. n. Then
n
1. det(A) = nAi. i=1 n
2. Tr(A) =
2,)i. i=1
Proof: Proof: l
1. Theorem 9.22 we have have that A = XXJJXX-I. Thus, 1. From From Theorem 9.22 we that A ~ . Thus, 1 det(A) = ) = det(7) A,-. det(A) = det(XJXdet(X J X-I) det(J) = = ]~[" n7=1 Ai. =l
Theorem 9.22 2. Again, from from Theorem 9.22 we have that A = XXJJXX-I. ~ l . Thus, l 11 Tr(A) = = Tr(XJX~ ) = TrC/X" *) = Tr(A) Tr(X J X-I) Tr(JX- X) = Tr(/) Tr(J) = = £" L7=1 Ai. =1 A.,-.
D 0
Example 9.26. Suppose A e E lR. is known known to to have have 7r(A) :rr(A) = (A Example 9.26. Suppose A E7x7 is (A.- - 1)4(A 1)4(A- - 2)3 2)3and and 2 2 et(A) a (A.) = = (A (A.- —1)2(A I) (A.- —2)2. 2) . Then ThenAAhas hastwo twopossible possibleJCFs JCFs(not (notcounting countingreorderings reorderingsofofthe the diagonal blocks): diagonal blocks): 1
J(l)
=
0 0 0
0 0
0 0 0
0 0 0
1
0 0 1 0 0 0 0
0 0 0
0 0 0 1 0 0 2 0 0
0 0 0 0 1 2
0
0
0
1 0 0 0
0 0 0
0 0 0 2
and
f2)
=
0 0 0 0 0
0
1
I 1 0 0 2
0 0 0 0
0
0
0 0
0
0
0 0 0 0 0 1 0 2 0
0 0 0 0 0 0 0 0 0 0 0 2
(1) has elementary - 1), (A - (A. 1),-(A1), - 2)2, - 2),(A - 2), Note that 7J(l) has elementary divisors divisors(A(A- -1)z, I) 2(A , (A. - 1), (A, -and 2)2(A , and 2) 2 2 2 J(2) has has elementary - -1)2,I)(A, (A - 2)2, (A -(A2). while /( elementarydivisors divisors (A(A- -1)2, I) (A , (A - 2)and , and - 2).
9.3. Determination Determination of JCF 9.3. of the the JCF
85 &5
Example rr(A), l) for Example 9.27. 9.27. Knowing TT (A.), a(A), a (A), and and rank(A rank (A -—Ai A,,7) for distinct distinct Ai A.,isis not not sufficient sufficient to to determine A uniquely. determine the JCF of A uniquely. The matrices
Al=
a 0 0 0 0 0 0
0 a 0 0 0 0 0
a 0 0 0 0
0 0 0 a 0 0 0
0 0 0 a 0 0
0 0 0 0 0 a 0
0 0 0 0 0 1 a
A2 =
a 0 0 0 0 0 0
0 a 0 0 0 0 0
a 0 0 0 0
0 0 0 a 0 0 0
0 0 0 a 0 0
0 0 0 0 a 0
0 0 0 0 0 0 a
a)\ al) both have rr(A) 7r(A.) = = (A (A.- —a)7, a) ,a(A) a(A.)== (A(A.- — a) and , andrank(A rank(A- — al) ==4, 4,i.e., i.e.,three threeeigeneigenvectors.
9.3
Determination of of the the JCF Determination JCF
lxn The first critical item of information in determining the JCF of a matrix A ]R.nxn is its A Ee W number of eigenvectors. For each distinct eigenvalue Ai, A,,, the associated associated number of linearly independent right (or left) eigenvectors eigenvectors is given by dim dimN(A A;l) = n -— rank(A -— A;l). independent right A^(A -— A.,7) A.(7). The straightforward straightforward case case is, of course, course, when when Ai X,- is is simple, simple, i.e., of algebraic algebraic multiplicity 1; it it The is, of i.e., of multiplicity 1; then has precisely one eigenvector. The more interesting (and difficult) case occurs when Ai is of algebraic multiplicity multiplicity greater than one. For example, suppose A, suppose
A =
[3 2 0
o Then Then
A-3I=
3 0
n
U2 I] o o
0 0
has rank 1, so the eigenvalue 3 has two eigenvectors associated associated with it. If If we let [~l [^i ~2 £2 ~3]T &]T denote aa solution solution to to the the linear linear system system (A (A -— 3/)£ 0, we that 2~2 2£2 + +£ = 0O.. Thus, Thus, both both denote 3l)~ = = 0, we find find that ~33=
are eigenvectors eigenvectors (and (and are are independent). independent). To get aa third JC3 such such that X = [Xl [x\ KJ_ XT,] are To get third vector vector X3 that X X2 X3] reduces A to JCF, we need the notion of principal vector.
c
xn x Definition 9.28. A Ee C"nxn (or R" ]R.nxn). principal vector of degree Definition 9.28. Let A "). Then xX is a right principal degree k associated with A A(A) X Ee A (A) ifand if and only only if(A if (A -- ulx XI)kx == 00 and and(A (A -- AI)k-l U}k~lxx i= ^ o. 0.
Remark 9.29.
1. An analogous definition holds for a left principal vector of degree k.
2. The phrase "of grade k" is often used synonymously with "of degree k."
3. Principal vectors are sometimes also called generalized eigenvectors, but the latter term will be assigned a much different meaning in Chapter 12.
4. The case k = 1 corresponds to the "usual" eigenvector.
5. A right (or left) principal vector of degree k is associated with a Jordan block J_i of dimension k or larger.
9.3.1  Theoretical computation
To motivate the development of a procedure for determining principal vectors, consider a 2 × 2 Jordan block with eigenvalue λ. Denote by x^{(1)} and x^{(2)} the two columns of a matrix X ∈ ℝ_2^{2×2} that reduces a matrix A to this JCF. Then the equation AX = XJ can be written
$$A\,[x^{(1)} \;\; x^{(2)}] = [x^{(1)} \;\; x^{(2)}]\begin{bmatrix} \lambda & 1 \\ 0 & \lambda \end{bmatrix}.$$
The first column yields the equation Ax^{(1)} = λx^{(1)}, which simply says that x^{(1)} is a right eigenvector. The second column yields the following equation for x^{(2)}, the principal vector of degree 2:
$$(A - \lambda I)x^{(2)} = x^{(1)}. \tag{9.17}$$
If we premultiply (9.17) by (A - λI), we find (A - λI)^2 x^{(2)} = (A - λI)x^{(1)} = 0. Thus, the definition of principal vector is satisfied.
This suggests a "general" procedure. First, determine all eigenvalues of A ∈ ℝ^{n×n} (or ℂ^{n×n}). Then for each distinct λ ∈ Λ(A) perform the following:
1. Solve (A - λI)x^{(1)} = 0.
This step finds all the eigenvectors (i.e., principal vectors of degree 1) associated with λ. The number of eigenvectors depends on the rank of A - λI. For example, if rank(A - λI) = n - 1, there is only one eigenvector. If the algebraic multiplicity of λ is greater than its geometric multiplicity, principal vectors still need to be computed from succeeding steps.
2. For each independent x^{(1)}, solve (A - λI)x^{(2)} = x^{(1)}.
The number of linearly independent solutions at this step depends on the rank of (A - λI)^2. If, for example, this rank is n - 2, there are two linearly independent solutions to the homogeneous equation (A - λI)^2 x^{(2)} = 0. One of these solutions is, of course, x^{(1)} (≠ 0), since (A - λI)^2 x^{(1)} = (A - λI)·0 = 0. The other solution is the desired principal vector of degree 2. (It may be necessary to take a linear combination of x^{(1)} vectors to get a right-hand side that is in R(A - λI). See, for example, Exercise 7.)
3. For each independent x^{(2)} from step 2, solve (A - λI)x^{(3)} = x^{(2)}.
4. Continue in this way until the total number of independent eigenvectors and principal vectors is equal to the algebraic multiplicity of λ.
Unfortunately, this natural-looking procedure can fail to find all Jordan vectors. For more extensive treatments, see, for example, [20] and [21]. Determination of eigenvectors and principal vectors is obviously very tedious for anything beyond simple problems (n = 2 or 3, say). Attempts to do such calculations in finite-precision floating-point arithmetic generally prove unreliable. There are significant numerical difficulties inherent in attempting to compute a JCF, and the interested student is strongly urged to consult the classical and very readable [8] to learn why. Notice that high-quality mathematical software such as MATLAB does not offer a jcf command, although a jordan command is available in MATLAB's Symbolic Toolbox.
Theorem 9.30. Suppose A ∈ ℂ^{k×k} has an eigenvalue λ of algebraic multiplicity k and suppose further that rank(A - λI) = k - 1. Let X = [x^{(1)}, ..., x^{(k)}], where the chain of vectors x^{(i)} is constructed as above. Then X^{-1}AX is the single k × k Jordan block
$$X^{-1}AX = \begin{bmatrix} \lambda & 1 & & \\ & \lambda & \ddots & \\ & & \ddots & 1 \\ & & & \lambda \end{bmatrix}.$$
Theorem 9.31. {x^{(1)}, ..., x^{(k)}} is a linearly independent set.
Theorem 9.32. Principal vectors associated with different Jordan blocks are linearly independent.
Example 9.33. Let
$$A = \begin{bmatrix} 1 & 2 & -1 \\ 0 & 1 & 3 \\ 0 & 0 & 2 \end{bmatrix}.$$
The eigenvalues of A are λ_1 = 1, λ_2 = 1, and λ_3 = 2. First, find the eigenvectors associated with the distinct eigenvalues 1 and 2.
(A - 2I)x_3^{(1)} = 0 yields
$$x_3^{(1)} = \begin{bmatrix} 5 \\ 3 \\ 1 \end{bmatrix}.$$
(A - 1I)x_1^{(1)} = 0 yields
$$x_1^{(1)} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}.$$
To find a principal vector of degree 2 associated with the multiple eigenvalue 1, solve (A - 1I)x_1^{(2)} = x_1^{(1)} to get
$$x_1^{(2)} = \begin{bmatrix} 0 \\ \tfrac{1}{2} \\ 0 \end{bmatrix}.$$
Now let
$$X = [x_1^{(1)} \;\; x_1^{(2)} \;\; x_3^{(1)}] = \begin{bmatrix} 1 & 0 & 5 \\ 0 & \tfrac{1}{2} & 3 \\ 0 & 0 & 1 \end{bmatrix}.$$
Then it is easy to check that
$$X^{-1} = \begin{bmatrix} 1 & 0 & -5 \\ 0 & 2 & -6 \\ 0 & 0 & 1 \end{bmatrix}
\quad \text{and} \quad
X^{-1}AX = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix}.$$
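A quick numerical check of such a computation is often worthwhile. The following is a minimal NumPy sketch, not the text's procedure verbatim; it assumes the example matrix displayed above, solves the principal-vector equation by least squares, and verifies that X^{-1}AX has the expected Jordan form.

    import numpy as np

    A = np.array([[1.0, 2.0, -1.0],
                  [0.0, 1.0,  3.0],
                  [0.0, 0.0,  2.0]])
    I = np.eye(3)

    x1 = np.array([1.0, 0.0, 0.0])                    # eigenvector for the double eigenvalue 1
    x2, *_ = np.linalg.lstsq(A - I, x1, rcond=None)   # principal vector: (A - I) x2 = x1
    x3 = np.array([5.0, 3.0, 1.0])                    # eigenvector for the eigenvalue 2

    X = np.column_stack([x1, x2, x3])
    J = np.linalg.solve(X, A @ X)                     # should equal diag(J_2(1), J_1(2))
    print(np.round(J, 10))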
9.3.2  On the +1's in JCF blocks
In this subsection we show that the nonzero superdiagonal elements of a JCF need not be 1's but can be arbitrary, so long as they are nonzero. For the sake of definiteness, we consider below the case of a single Jordan block, but the result clearly holds for any JCF. Suppose A ∈ ℝ^{n×n} and
$$X^{-1}AX = J = \begin{bmatrix} \lambda & 1 & & \\ & \lambda & \ddots & \\ & & \ddots & 1 \\ & & & \lambda \end{bmatrix}.$$
Let D = diag(d_1, ..., d_n) be a nonsingular "scaling" matrix. Then
$$D^{-1}(X^{-1}AX)D = D^{-1}JD = \hat{J} =
\begin{bmatrix} \lambda & \tfrac{d_2}{d_1} & & & \\ & \lambda & \tfrac{d_3}{d_2} & & \\ & & \ddots & \ddots & \\ & & & \lambda & \tfrac{d_n}{d_{n-1}} \\ & & & & \lambda \end{bmatrix}.$$
Appropriate choice of the d_i's then yields any desired nonzero superdiagonal elements. This result can also be interpreted in terms of the matrix X = [x_1, ..., x_n] of eigenvectors and principal vectors that reduces A to its JCF. Specifically, Ĵ is obtained from A via the similarity transformation XD = [d_1 x_1, ..., d_n x_n].
In a similar fashion, the reverse-order identity matrix (or exchange matrix)
$$P = P^T = P^{-1} = \begin{bmatrix} & & & 1 \\ & & 1 & \\ & ⋰ & & \\ 1 & & & \end{bmatrix} \tag{9.18}$$
can be used to put the superdiagonal elements in the subdiagonal instead if that is desired:
$$P^{-1}\begin{bmatrix} \lambda & 1 & & \\ & \lambda & \ddots & \\ & & \ddots & 1 \\ & & & \lambda \end{bmatrix}P =
\begin{bmatrix} \lambda & & & \\ 1 & \lambda & & \\ & \ddots & \ddots & \\ & & 1 & \lambda \end{bmatrix}.$$
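Both observations are easy to confirm numerically. The following minimal NumPy sketch (the block size, eigenvalue, and scaling values are arbitrary choices of ours) conjugates a 4 × 4 Jordan block by a diagonal scaling D and by the exchange matrix P.

    import numpy as np

    lam, n = 2.0, 4
    Jb = lam * np.eye(n) + np.eye(n, k=1)   # Jordan block with 1's on the superdiagonal
    D = np.diag([1.0, 2.0, 6.0, 24.0])
    P = np.fliplr(np.eye(n))                # reverse-order identity (exchange) matrix

    print(np.linalg.inv(D) @ Jb @ D)        # superdiagonal becomes d_{i+1}/d_i = 2, 3, 4
    print(P @ Jb @ P)                       # 1's moved to the subdiagonal (P = P^{-1})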
9.4  Geometric Aspects of the JCF
The matrix X that reduces a matrix A ∈ ℝ^{n×n} (or ℂ^{n×n}) to a JCF provides a change of basis with respect to which the matrix is diagonal or block diagonal. It is thus natural to expect an associated direct sum decomposition of ℝ^n. Such a decomposition is given in the following theorem.
Theorem 9.34. Suppose A ∈ ℝ^{n×n} has characteristic polynomial
$$\pi(\lambda) = (\lambda - \lambda_1)^{n_1} \cdots (\lambda - \lambda_m)^{n_m}$$
and minimal polynomial
$$\alpha(\lambda) = (\lambda - \lambda_1)^{\nu_1} \cdots (\lambda - \lambda_m)^{\nu_m}$$
with λ_1, ..., λ_m distinct. Then
$$ℝ^n = N(A - \lambda_1 I)^{n_1} \oplus \cdots \oplus N(A - \lambda_m I)^{n_m} = N(A - \lambda_1 I)^{\nu_1} \oplus \cdots \oplus N(A - \lambda_m I)^{\nu_m}.$$
Note that dim N(A - λ_i I)^{ν_i} = n_i.
Definition 9.35. Let V be a vector space over F and suppose A : V → V is a linear transformation. A subspace S ⊆ V is A-invariant if AS ⊆ S, where AS is defined as the set {As : s ∈ S}.
If V is taken to be ℝ^n over ℝ, and S ∈ ℝ^{n×k} is a matrix whose columns s_1, ..., s_k span a k-dimensional subspace S, i.e., R(S) = S, then S is A-invariant if and only if there exists a matrix M ∈ ℝ^{k×k} such that
$$AS = SM. \tag{9.19}$$
Example 9.36. The equation Ax = λx = xλ defining a right eigenvector x of an eigenvalue λ says that x spans an A-invariant subspace (of dimension one).
Example 9.37. Suppose X block diagonalizes A, i.e.,
$$X^{-1}AX = \begin{bmatrix} J_1 & 0 \\ 0 & J_2 \end{bmatrix}.$$
Rewriting in the form
$$A[X_1 \;\; X_2] = [X_1 \;\; X_2]\begin{bmatrix} J_1 & 0 \\ 0 & J_2 \end{bmatrix},$$
we have that AX_i = X_i J_i, i = 1, 2, so the columns of X_i span an A-invariant subspace.
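Numerically, A-invariance of R(S) can be checked directly from (9.19): solve AS = SM for M in the least-squares sense and test the residual. The sketch below is a minimal NumPy illustration of that test; the function name, tolerance, and test matrices are our own choices, not from the text.

    import numpy as np

    def is_A_invariant(A, S, tol=1e-10):
        # best M in the least-squares sense for A S = S M, then check the residual
        M, *_ = np.linalg.lstsq(S, A @ S, rcond=None)
        return np.linalg.norm(A @ S - S @ M) <= tol * max(1.0, np.linalg.norm(A @ S))

    A = np.array([[3.0, 1.0, 0.0],
                  [0.0, 3.0, 0.0],
                  [0.0, 0.0, 2.0]])
    S1 = np.array([[1.0], [0.0], [0.0]])   # spans an eigenvector direction: invariant
    S2 = np.array([[0.0], [1.0], [0.0]])   # A maps this direction out of its span
    print(is_A_invariant(A, S1), is_A_invariant(A, S2))   # True False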
Theorem 9.38. Suppose A ∈ ℝ^{n×n}.
1. If S is A-invariant, then S is f(A)-invariant for any polynomial f(A) = α_0 I + α_1 A + ··· + α_q A^q.
2. S is A-invariant if and only if S^⊥ is A^T-invariant.
Theorem 9.39. If V is a vector space over F such that V = N_1 ⊕ ··· ⊕ N_m, where each N_i is A-invariant, then a basis for V can be chosen with respect to which A has a block diagonal representation.
The Jordan canonical form is a special case of the above theorem. If A has distinct eigenvalues λ_i as in Theorem 9.34, we could choose bases for N(A - λ_i I)^{n_i} by SVD, for example (note that the power n_i could be replaced by ν_i). We would then get a block diagonal representation for A with full blocks rather than the highly structured Jordan blocks. Other such "canonical" forms are discussed in text that follows.
Suppose X = [X_1, ..., X_m] ∈ ℝ_n^{n×n} is such that X^{-1}AX = diag(J_1, ..., J_m), where each J_i = diag(J_{i1}, ..., J_{ik_i}) and each J_{ik} is a Jordan block corresponding to λ_i ∈ Λ(A). We could also use other block diagonal decompositions (e.g., via SVD), but we restrict our attention here to only the Jordan block case. Note that AX_i = X_i J_i, so by (9.19) the columns of X_i (i.e., the eigenvectors and principal vectors associated with λ_i) span an A-invariant subspace of ℝ^n.
Finally, we return to the problem of developing a formula for e^{tA} in the case that A is not necessarily diagonalizable. Let Y = (X^{-1})^H = [Y_1, ..., Y_m], partitioned
compatibly. Then
$$A = XJX^{-1} = XJY^H = [X_1, \ldots, X_m]\,\mathrm{diag}(J_1, \ldots, J_m)\,[Y_1, \ldots, Y_m]^H = \sum_{i=1}^{m} X_i J_i Y_i^H.$$
In a similar fashion we can compute
$$e^{tA} = \sum_{i=1}^{m} X_i e^{tJ_i} Y_i^H,$$
which is a useful formula when used in conjunction with the result
$$\exp\left(t\begin{bmatrix} \lambda & 1 & & \\ & \lambda & \ddots & \\ & & \ddots & 1 \\ & & & \lambda \end{bmatrix}\right) =
\begin{bmatrix} e^{\lambda t} & t e^{\lambda t} & \tfrac{t^2}{2!}e^{\lambda t} & \cdots & \tfrac{t^{k-1}}{(k-1)!}e^{\lambda t} \\ & e^{\lambda t} & t e^{\lambda t} & \cdots & \tfrac{t^{k-2}}{(k-2)!}e^{\lambda t} \\ & & \ddots & \ddots & \vdots \\ & & & e^{\lambda t} & t e^{\lambda t} \\ & & & & e^{\lambda t} \end{bmatrix}$$
for a k × k Jordan block J_i associated with an eigenvalue λ = λ_i.
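This closed form is easy to cross-check against a general-purpose matrix exponential. The following minimal sketch assumes SciPy is available; the block size, eigenvalue, and t are arbitrary choices of ours.

    import numpy as np
    from scipy.linalg import expm
    from math import factorial

    lam, k, t = -0.5, 4, 1.3
    J = lam * np.eye(k) + np.eye(k, k=1)    # k x k Jordan block

    # Closed form: entry (i, j) is t^(j-i)/(j-i)! * e^(lam*t) on and above the diagonal
    F = np.zeros((k, k))
    for i in range(k):
        for j in range(i, k):
            F[i, j] = t**(j - i) / factorial(j - i) * np.exp(lam * t)

    print(np.allclose(expm(t * J), F))      # True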
9.5  The Matrix Sign Function
In this section we give a very brief introduction to an interesting and useful matrix function called the matrix sign function. It is a generalization of the sign (or signum) of a scalar. A survey of the matrix sign function and some of its applications can be found in [15].
Definition 9.40. Let z ∈ ℂ with Re(z) ≠ 0. Then the sign of z is defined by
$$\mathrm{sgn}(z) = \frac{\mathrm{Re}(z)}{|\mathrm{Re}(z)|} = \begin{cases} +1 & \text{if } \mathrm{Re}(z) > 0, \\ -1 & \text{if } \mathrm{Re}(z) < 0. \end{cases}$$
Definition 9.41. Suppose A ∈ ℂ^{n×n} has no eigenvalues on the imaginary axis, and let
$$X^{-1}AX = \begin{bmatrix} N & 0 \\ 0 & P \end{bmatrix}$$
be a Jordan canonical form for A, with N containing all Jordan blocks corresponding to the eigenvalues of A in the left half-plane and P containing all Jordan blocks corresponding to eigenvalues in the right half-plane. Then the sign of A, denoted sgn(A), is given by
$$\mathrm{sgn}(A) = X\begin{bmatrix} -I & 0 \\ 0 & I \end{bmatrix}X^{-1},$$
where the negative and positive identity matrices are of the same dimensions as N and P, respectively.
There are other equivalent definitions of the matrix sign function, but the one given here is especially useful in deriving many of its key properties. The JCF definition of the matrix sign function does not generally lend itself to reliable computation on a finite-word-length digital computer. In fact, its reliable numerical calculation is an interesting topic in its own right. We state some of the more useful properties of the matrix sign function as theorems. Their straightforward proofs are left to the exercises.
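As an aside on computation: one widely used alternative to the JCF definition is the Newton iteration S_{k+1} = (S_k + S_k^{-1})/2 with S_0 = A, which converges to sgn(A) when A has no eigenvalues on the imaginary axis (see the survey [15] cited above). The sketch below is a minimal, unscaled NumPy illustration of that iteration, not a robust implementation; the function name, stopping test, and example matrix are our own choices.

    import numpy as np

    def sign_newton(A, tol=1e-12, maxit=100):
        # Matrix sign function via the (unscaled) Newton iteration S <- (S + S^{-1})/2
        S = np.array(A, dtype=float)
        for _ in range(maxit):
            S_new = 0.5 * (S + np.linalg.inv(S))
            if np.linalg.norm(S_new - S, 1) <= tol * np.linalg.norm(S_new, 1):
                return S_new
            S = S_new
        return S

    A = np.array([[-3.0, 1.0], [0.0, 2.0]])   # one left-half-plane and one right-half-plane eigenvalue
    S = sign_newton(A)
    print(np.round(S, 8))
    print(np.allclose(S @ S, np.eye(2)), np.allclose(A @ S, S @ A))   # S^2 = I, AS = SA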
Theorem 9.42. Suppose A ∈ ℂ^{n×n} has no eigenvalues on the imaginary axis, and let S = sgn(A). Then the following hold:
1. S is diagonalizable with eigenvalues equal to ±1.
2. S^2 = I.
3. AS = SA.
4. sgn(A^H) = (sgn(A))^H.
5. sgn(T^{-1}AT) = T^{-1} sgn(A) T for all nonsingular T ∈ ℂ^{n×n}.
6. sgn(cA) = sgn(c) sgn(A) for all nonzero real scalars c.
Theorem 9.43. Suppose A ∈ ℂ^{n×n} has no eigenvalues on the imaginary axis, and let S = sgn(A). Then the following hold:
1. R(S - I) is an A-invariant subspace corresponding to the left half-plane eigenvalues of A (the negative invariant subspace).
2. R(S + I) is an A-invariant subspace corresponding to the right half-plane eigenvalues of A (the positive invariant subspace).
3. negA = (I - S)/2 is a projection onto the negative invariant subspace of A.
4. posA = (I + S)/2 is a projection onto the positive invariant subspace of A.
EXERCISES
1. Let A ∈ ℂ^{n×n} have distinct eigenvalues λ_1, ..., λ_n with corresponding right eigenvectors x_1, ..., x_n and left eigenvectors y_1, ..., y_n, respectively. Let v ∈ ℂ^n be an arbitrary vector. Show that v can be expressed (uniquely) as a linear combination of the right eigenvectors. Find the appropriate expression for v as a linear combination of the left eigenvectors as well.
2. Suppose A ∈ ℂ^{n×n} is skew-Hermitian, i.e., A^H = -A. Prove that all eigenvalues of a skew-Hermitian matrix must be pure imaginary.
3. Suppose A ∈ ℂ^{n×n} is Hermitian. Let λ be an eigenvalue of A with corresponding right eigenvector x. Show that x is also a left eigenvector for λ. Prove the same result if A is skew-Hermitian.
4. Suppose a matrix A ∈ ℝ^{5×5} has eigenvalues {2, 2, 2, 2, 3}. Determine all possible JCFs for A.
5. Determine the eigenvalues, right eigenvectors and right principal vectors if necessary, and (real) JCFs of the following matrices: (a)
2 -1 ] 0 ' [ 1
6. Determine the JCFs of the following matrices:
Uj
n
7. Let A =
-2 -1
2
=n
[H -1]· 2
2"
1
Find a nonsingular matrix X such that X^{-1}AX = J, where J is the JCF
$$J = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$
Hint: Use [-1 1 -1]^T as an eigenvector. The vectors [0 1 -1]^T and [1 0 0]^T are both eigenvectors, but then the equation (A - I)x^{(2)} = x^{(1)} can't be solved.
8. Show that all right eigenvectors of the Jordan block matrix in Theorem 9.30 must be multiples of e_1 ∈ ℝ^k. Characterize all left eigenvectors.
9. Let A ∈ ℝ^{n×n} be of the form A = xy^T, where x, y ∈ ℝ^n are nonzero vectors with x^T y = 0. Determine the JCF of A.
10. Let A ∈ ℝ^{n×n} be of the form A = I + xy^T, where x, y ∈ ℝ^n are nonzero vectors with x^T y = 0. Determine the JCF of A.
11. Suppose a matrix A ∈ ℝ^{16×16} has 16 eigenvalues at 0 and its JCF consists of a single Jordan block of the form specified in Theorem 9.22. Suppose the small number 10^{-16} is added to the (16,1) element of J. What are the eigenvalues of this slightly perturbed matrix?
12. Show that every matrix A ∈ ℝ^{n×n} can be factored in the form A = S_1 S_2, where S_1 and S_2 are real symmetric matrices and one of them, say S_1, is nonsingular.
Hint: Suppose A = XJX^{-1} is a reduction of A to JCF and suppose we can construct the "symmetric factorization" of J. Then A = (X S_1 X^T)(X^{-T} S_2 X^{-1}) would be the required symmetric factorization of A. Thus, it suffices to prove the result for the JCF. The transformation P in (9.18) is useful.
14. block upper upper triangular 14. Consider Consider the the block triangular matrix matrix A _ [ All
-
0
Al2 ]
A22
'
xn kxk where A Ee M" jRnxn and All jRkxk with 1 ::s: n. Al2 we and A e R 1 ::s: < k < n. Suppose Suppose A ^ 0 and and that we n E u =1= want to to block diagonalize A via the similarity transformation A via the similarity transformation want block diagonalize
where X X Ee IRkx(n-k), R*x <«-*), i.e., T-IAT = [A011
0
A22
]
.
Find aa matrix matrix equation equation that that X X must must satisfy satisfy for for this this to to be If nn = = 22 and and kk = = 1, Find be possible. possible. If 1, what you say All and A22, what can can you say further, further, in in terms terms of of AU and A 22, about about when when the the equation equation for for X is is solvable? solvable? 15. Prove Theorem 15. Prove Theorem 9.42. 9.42. 16. 16. Prove Prove Theorem Theorem 9.43. 9.43.
en
A Ee C"xn xn has that 17. 17. Suppose Suppose A has all all its its eigenvalues eigenvalues in in the the left left half-plane. half-plane. Prove Prove that sgn(A) sgn(A) = = -1. -/.
Chapter 10 Chapter 10
Canonical Canonical Forms Forms
10.1 10.1
Some Basic Basic Canonical Canonical Forms Some Forms
Problem: Let Let V and W W be be vector vector spaces and suppose suppose A A :: V ---+ W W is is aa linear linear transformation. transformation. Problem: V and spaces and V —>• Find V and "simple form" or "canonical "canonical Find bases bases in in V and W W with with respect respect to to which which Mat Mat A A has has aa "simple form" or mxn n xn form." In In matrix matrix terms, terms, if if A A eE R IR mxn find P eE lR;;:xm and Q lR~xn such that PAQ P AQ has has aa form." ,, find R™ xm and Q eE R such that n "canonical form." form." The The transformation transformation A A M» f--+ PAQ P AQ is is called called an an equivalence; it is called an an "canonical equivalence; it is called orthogonal orthogonal equivalence equivalence if if P P and and Q are are orthogonal orthogonal matrices. matrices. xn Remark 10.1. We can also and Remark 10.1. We can also consider consider the the case case A A eE C emmxn and unitary unitary equivalence equivalence if if P P and and
<2 Q are are unitary. unitary.
of interest: interest: Two special cases are are of Two special cases 1. V and and <2 Q == p1. If If W = V P"11,, the thetransformation transformation AAf--+ H>PAP-I PAP" 1 isiscalled calledaasimilarity. similarity. T T 2. If = VV and and if if Q = P pT is orthogonal, the transformation transformation A A i-» f--+ PAP P ApT is called If W = is orthogonal, the is called an orthogonal orthogonal similarity (or unitary unitary similarity in the the complex complex case). case). an similarity (or similarity in
The achieved under similarity. If The following following results results are are typical typical of of what what can can be be achieved under aa unitary unitary similarity. If A = AHH E has eigenvalues AI, ... An,n, then then there matrix U A = A 6 en C"xxn " has eigenvalues AI, . . . ,, A there exists exists aa unitary unitary matrix £7 such suchthat that UHHAU D, where where D D == diag(AJ, diag(A.j,..., A. n ). This This is is proved proved in in Theorem Theorem 10.2. 10.2. What What other other U AU =— D, ... , An). answer is given in in Theorem matrices are are "diagonalizable" "diagonalizable" under under unitary unitary similarity? matrices similarity? The The answer is given Theorem x 10.9, where C"nxn " is is unitarily similar to 10.9, where it it is is proved proved that that aa general general matrix matrix A A eE e unitarily similar to aa diagonal diagonal H H and only only if if it it is is normal normal (i.e., (i.e., AA AA H = = A AHA). Normal matrices matrices include include Hermitian, Hermitian, matrix if matrix if and A). Normal skew-Hermitian, (and their symmetric, skewskew-Hermitian, and and unitary unitary matrices matrices (and their "real" "real" counterparts: counterparts: symmetric, skewsymmetric, and and orthogonal, orthogonal, respectively), respectively), as as well well as as other other matrices matrices that that merely merely satisfy the symmetric, satisfy the a definition, as A _ b ^1 for for real scalars aa and If aa matrix definition, such such as A= = [[_~ real scalars and b. h. If matrix A A is is not not normal, normal, the the JCF described described in 9. most "diagonal" we can can get is the the JCF most "diagonal" we get is in Chapter Chapter 9.
!]
x Theorem en xn AI, ... Theorem 10.2. 10.2. Let A = = AHH eE C" " have (real) eigenvalues A.I, . . . ,,An. Xn. Then there HH exists aa unitary unitary matrix matrix X X such such that that X X AX AX = D= diag(Al, ... An) (the columns columns ofX of X are are exists = D = diag(A.j, . . . ,, X n) (the orthonormal eigenvectors for orthonormal eigenvectors for A). A).
95 95
96 96
Chapter 10. 10. Canonical Canonical Forms Forms Chapter
Proof: Let x\ eigenvector corresponding corresponding to X\, xf*x\ = Proof' XI be a right eigenvector AI, and normalize it such that x~ XI = 1. Then Then there exist n . . . ,, xXnn such such that that X = (XI, [x\,..., = 1. there exist n -— 11 additional additional vectors vectors xX2, ... , x xn] 2, ... n] = [x\ X22]] is unitary. Now [XI XHAX
=[
xH I XH ] A [XI 2
X 2]
=[ =[ =[
x~Axl
X~AX2 XfAX 2
XfAxl
Al
X~AX2
0
XfAX 2
Al
0
0
XfAX z
]
]
(10.1)
l
(10.2)
In (l0.1) (10.1) we have used fact that = AIXI. k\x\. When When combined combined with with the the fact fact that In we have used the the fact that Ax\ AXI = that x~ Al remaining in the (l,I)-block. (2, I)-block by x"xiXI = = 1, 1, we get A-i (l,l)-block. We also get 0 in the (2,l)-block orthogonal to all vectors in X (l,2)-block by noting that x\ XI is orthogonal Xz. 2. In (10.2), we get 0 in the (l,2)-block H XH AX AX is Hermitian. The proof induction upon noting noting that X proof is completed easily by induction that the (2,2)-block ... , A. An.n . 0 (2,2)-block must have eigenvalues A2, X2,..., D XI Ee JRn, X— = Given a unit vector x\ E", the construction of X2z Ee JRnx(n-l) ]R"X("-1) such that X [XI orthogonal is frequently [x\ X22]] is orthogonal frequently required. The construction can actually be performed quite easily by means of Householder Householder (or Givens) transformations transformations as in the proof proof of the following general general result. following result. nxk 10.3. Let X\ E e C Cnxk have orthonormal orthonormal columns columns and and suppose U is is a unitary have Theorem 10.3. Let XI suppose V a unitary kxk matrix such such that that V UX\ = [\ ~], 1, where is matrix XI = where R R €E Ckxk is upper upper triangular. triangular. Write Write U V HH = [U\ [VI U Vz]] 0
2
nxk with Ui VI E €C Cnxk . Then [XI [Xi V U2]] is unitary.
Proof: Xk]. Construct sequence of of Householder (also known Proof: Let Let X\ X I = [x\,..., [XI, ... ,xd. Construct aa sequence Householder matrices matrices (also known HI, ... , H Hkk in the usual way (see below) such that as elementary reflectors) H\,..., Hk ... HdxI, ... , xd = [
~
l
..., , Xk U= = where R is upper triangular (and nonsingular since x\, XI, ... Xk are orthonormal). Let V H UH = /,-•• H Hk'" HI. Then VH = /HI'" Hkk and and k...H v. Then
H Then x^U = 0 (i (/ E € ~) k) means that xXif is orthogonal to each of the n — U2. X i U2 - kk columns of V2. 2 = But the latter are orthonormal since they are the last n -— kk rows of the unitary matrix U. U. Thus. [XI unitary. 0 Thus, [Xi U2] f/2] is unitary. D
10.3 The construction called called for in Theorem 10.2 is then a special case of Theorem Theorem 10.3 for kk = 1. = 1. 1. We illustrate the construction of the necessary Householder matrix for kk — For simplicity, simplicity, we consider the real case. Let the unit vector x\ [£i, .. . . ,. ,, ~nf. %n]T. XI be denoted by [~I,
10.1. Basic Canonical Canonical Forms 10.1. Some Some Basic Forms
97
Then X^2 is is given given by Then the the necessary necessary Householder Householder matrix matrix needed needed for for the the construction construction of of X by + r TT , U = I -—2uu+ = I +uu where u = [';1 ± 1, ';2, ... , ';nf. It can easily be checked 2uu — u-^UU , where u [t-\ 1, £2, • • •» £«] - It can checked u that U U is symmetric symmetric and U UTTU U = = U U22 = = I, I, so U U is orthogonal. orthogonal. To see that U U effects effects the necessary is easily easily verified = 2± 2£i and = 11 ± necessary compression compression of of jci, Xl, it it is verified that that U u TTU u = ± 2';1 and U u TTX\ Xl = ± £1. ';1. Thus,
Further details on Householder matrices, including the choice of sign and the complex case, consulted in standard numerical linear linear algebra can be consulted standard numerical algebra texts such as [7], [7], [11], [11], [23], [23], [25]. [25]. The real version of Theorem 10.2 10.2isisworth worthstating statingseparately separately since sinceititisisapplied appliedfrefrequently quently in in applications. applications. T nxn Theorem 10.4. Let A A = A AT jRnxn have eigenvalues eigenvalues k\, AI, ... , An. Then there there exists an 10.4. Let eE E have ... ,X exists an n. Then lxn jRn xn (whose orthogonal matrix X eE W (whose columns are orthonormal eigenvectors of of A) such that T XT AX = = D D= = diag(Xi, diag(Al, .... X AX . . , An). X n ).
A (with the obvious analogue Note that Theorem 10.4 implies that a symmetric matrix A from 10.2for forHermitian Hermitian matrices) matrices) can canbe bewritten written from Theorem Theorem 10.2 n
A = XDX
T
= LAiXiXT,
(10.3)
i=1
spectral representation of A. In fact, A in (10.3) is actually a which is often often called the spectral weighted sum of orthogonal projections P, Pi (onto the one-dimensional one-dimensional eigenspaces eigenspaces corresponding 's),i.e., i.e., sponding to to the the A., Ai'S), n
A
= LAiPi, i=l
where = PUM —xxiXt ixf = ixj since where P, Pi = PR(x;) = =xxixT sincexjxTxi Xi — =1.1.
The following pair of theorems form the theoretical theoretical foundation of the double-Francisdouble-FrancisQR algorithm used to compute matrix eigenvalues in a numerically stable and reliable way.
98
Chapter Canonical Forms Chapter 10. 10. Canonical Forms
x Theorem 10.5 Let A A eE C" cnxn Then there there exists exists a a unitary unitary matrix matrix U such that that Theorem 10.5 (Schur). (Schur). Let ". . Then U such H U H AU U AU == T, T, where where TT is is upper upper triangular. triangular.
Proof: The proof of of this this theorem theorem is is essentially essentially the the same same as that of of Theorem lO.2 except except that that Proof: The proof as that Theorem 10.2 in this this case case (using (using the the notation notation U rather than than X) X) the the (l,2)-block AU2 is is not not 0. O. 0 in U rather (l,2)-block wf AU2 D
ur
of A A E IRn xxn it is is thus thus unitarily unitarily similar to an an upper upper triangular triangular matrix, matrix, but but In the the case case of In e R" ",, it similar to if A A has has aa complex complex conjugate conjugate pair pair of of eigenvalues, eigenvalues, then then complex arithmetic is if complex arithmetic is clearly clearly needed needed to place place such such eigenvalues eigenValues on on the the diagonal diagonal of of T. T. However, However, the the next next theorem theorem shows shows that that every every to xn A eE W IRnxn is also also orthogonally orthogonally similar similar (i.e., (i.e., real real arithmetic) arithmetic) to to aa quasi-upper-triangular A is quasi-upper-triangular matrix. A A quasi-upper-triangular matrix is is block block upper upper triangular triangular with with 1 matrix. quasi-upper-triangular matrix 1 xx 11 diagonal diagonal blocks corresponding to corresponding to blocks corresponding to its its real real eigenvalues eigenvalues and and 2x2 2 x 2 diagonal diagonal blocks blocks corresponding to its its complex conjugate conjugate pairs pairs of of eigenvalues. eigenvalues. complex
Theorem 10.6 Let A A E IR n xxn. there exists exists an an orthogonal 10.6 (Murnaghan-Wintner). Let e R" ". Then Then there orthogonal T T matrix U such that that U AU = where S S is is quasi-upper-triangular. matrix U such U AU = S, S, where quasi-upper-triangular. Definition 10.7. triangular matrix matrix T in Theorem Theorem 10.5 is called Schur canonical canonical Definition 10.7. The The triangular T in 10.5 is called aa Schur form The quasi-upper-triangular S in 10.6 is real form or or Schur Schur form. fonn. The quasi-upper-triangular matrix matrix S in Theorem Theorem 10.6 is called called aa real Schur canonical form form or real Schur Schur form fonn (RSF). columns of unitary [orthogonal} Schur canonical or real (RSF). The The columns of aa unitary [orthogonal] matrix U that reduces reduces a a matrix matrix to [real} Schur Schur form fonn are are called called Schur matrix U that to [real] Schur vectors. vectors.
Example 10.8. 10.8. The The matrix matrix
s~ [ -20
4
h[
1
-2
is is in in RSF. RSF. Its Its real real JCF JCF is is
1 -1
5
0
0 0
n n
Note corresponding first Note that that only only the the first first Schur Schur vector vector (and (and then then only only if if the the corresponding first eigenvalue eigenvalue if U orthogonal) is is an an eigenvector. eigenvector. However, However, what what is is true, true, and and sufficient for virtually virtually is real real if is U is is orthogonal) sufficient for all applications applications (see, (see, for for example, example, [17]), is that that the the first first k Schur vectors span span the the same all [17]), is Schur vectors same Ainvariant subspace the eigenvectors corresponding to to the the first first k eigenvalues along the the invariant subspace as as the eigenvectors corresponding eigenvalues along diagonal of of T (or S). diagonal T (or S). While every every matrix matrix can can be be reduced reduced to to Schur Schur form (or RSF), RSF), it it is is of of interest interest to to know While form (or know when we we can go further further and reduce aa matrix matrix via via unitary unitary similarity to diagonal diagonal form. form. The when can go and reduce similarity to The following following theorem theorem answers answers this this question. question. x Theorem 10.9. 10.9. A C"nxn " is is unitarily unitarily similar Theorem A matrix matrix A A eE c similar to to a a diagonal diagonal matrix matrix ifif and and only only if if H H H A is is normal normal (i.e., (i.e., A AHAA = = AA A AA ).).
Proof: Suppose Suppose U is aa unitary unitary matrix matrix such such that that U AU = D, where where D D is is diagonal. diagonal. Then Then Proof: U is UHH AU = D, AAH
so is normal. so A A is normal.
= U VUHU VHU H = U DDHU H == U DH DU H == AH A
10.2. Definite Matrices 10.2. Definite Matrices
99
Conversely, suppose A A is normal and let U A U = T, U be a unitary matrix such that U UHHAU T, where T T is an upper triangular matrix (Theorem (Theorem 10.5). Then
It It is then a routine exercise to show that T T must, in fact, be diagonal.
10.2 10.2
0 D
Definite Matrices Definite Matrices
xn Definition 10.10. A e lR. Wnxn is Definition 10.10. A symmetric symmetric matrix matrix A A E
definite if if and only if ifxxTTAx > 0Qfor all nonzero nonzero xx G Wn1.. We We write write A > 0. 1. positive positive definite and only Ax > for all E lR. A > O.
2. nonnegative definite (or x TT Ax Ax :::: for all (or positive positive semidefinite) if if and and only only if if X > 0 for all n nonzero xx Ee lR. W. • We We write write A > 0. A :::: O. nonzero 3. negative negative definite if - A is positive positive definite. write A A < O. if—A definite. We We write < 0. 4. nonpositive definite (or negative semidefinite) if We (or negative if—-A A is nonnegative nonnegative definite. definite. We write < 0. write A A ~ O. Also, if A and B are symmetric we write write A > B if and only if or Also, if A and B are symmetric matrices, matrices, we A > B if and only if AA -— BB >> 0 or B — - A A < < 0. O. Similarly, Similarly, we we write write A A :::: B ifif and and only only ifA if A — - B>QorB B :::: 0 or B — - A A < ~ 0. O. B > B
e
x nxn Remark If A A Ee C" Remark 10.11. 10.11. " is Hermitian, all the above definitions hold except that superscript s. Indeed, this is generally true for all results in the remainder of of superscript H //ss replace T Ts. this section that may be stated in the real case for simplicity.
Remark 10.12. If If a matrix is neither neither definite nor semidefinite, semidefinite, it is said to be indefinite. indefinite. H nxn Theorem 10.13. Let Let A A = AH with AI{ :::: A22 :::: An.n. Thenfor = A eE e Cnxn with eigenvalues eigenvalues X > A > ... • • • :::: > A Then for all all E en, x eC",
Proof: Let U A as in Theorem 10.2. Proof: U be a unitary matrix that diagonalizes diagonalizes A 10.2. Furthermore, Furthermore, let yv = U UHHx, x, where x is an arbitrary vector in en, CM, and denote the components of y by j]i, ii En. € n. Then Then 11;, n
x HAx = (U HX)H U H AU(U Hx) = yH Dy = LA; 111;12. ;=1
But clearly n
LA; 11'/;12 ~ AlyH Y = AIX HX ;=1
100 100
and and
Chapter 10. 10. Canonical Canonical Forms Forms Chapter
n
LAillJilZ:::
AnyHy = An xHx ,
i=l
from which the theorem follows.
0 D
H nxn nxn Remark 10.14. The ratio ^^ XHHAx for A = AH E eC and Remark = A <= andnonzero nonzerox jcEeen C"isiscalled calledthe the x x of jc. x. Theorem Theorem 1O.l3 provides upper (AO (A 1) and lower (An) Rayleigh quotient of 10.13 provides (A.w) bounds for H x AH enxn x HHAx Ax > the Rayleigh quotient. If A = = A eE C" " is positive definite, X > 0 for all nonzero E C",soO en, so 0 < XAnn <::::: ••• ... < ::::: A.I. AI. x E I
x H Corollary 10.15. Let ". . Then Then IIAII2 \\A\\2 = =^ A}. Corollary Let A A e E C" enxn Ar1ax(AH A). m(A
Proof: For all x €E en Proof: C" we have
I
Let jc Let x be be an an eigenvector eigenvector corresponding corresponding to to X Amax (AHHA). A). Then Then ^pjp 111~~1~22 = ^^(A" Ar1ax (A HA), A), whence whence max(A IIAxll2 ! H IIAliz = max - - = Amax{A A). xfO IIxll2
0
Definition 10.16. A principal submatrix submatrixofan n x n matrix A is the (n — -k) x (n — -k) Definition of an nxn k)x(n k) matrix that remains by deleting k rows and the corresponding k columns. A leading principal submatrix of of order n — - k is obtained obtained by deleting the last k rows and and columns. x ~nxn positive definite definite ififand and only only ififany any of ofthe the Theorem 10.17. A symmetric matrix A eE E" " is positive following three equivalent equivalent conditions hold: following
determinants of principal submatrices of 1. The determinants of all leading principal of A are positive. positive.
positive. 2. All All eigenvalues eigenvalues of of A A are are positive. T 3. A can be written in the form form M MT M, where M eE R" ~n xxn " is nonsingular. x ~n xn definite if and only Theorem 10.18. A symmetric matrix A €E R" " is nonnegative definite if and only if if any of the following following three equivalent equivalent conditions hold: of
1. The determinants of all principal principal submatrices submatrices of of A are nonnegative. of all
2. eigenvalues of nonnegative. 2. All All eigenvalues of A A are are nonnegaTive. T ix 3. A can be written wrirren in [he/orm MT M, where where M M 6 E R IRb ~ rank(A) ranlc(A) "" ranlc(M). 3. A can be in the form M M, " and — rank(M).
Remark 10.19. Note that the determinants of all principal "ubm!ltriC[!!l eubmatrioesmu"t muetbB bQnonnBgmivB nonnogativo R.@mllrk 10.19. Not@th!ltthl!dl!termin!lntl:ofnllprincip!ll in Theorem 10.18.1, not just those of the leading principal submatrices. For example, Theorem 10.18.1, consider 1. The The determinant determinant of submatrix is is 0 consider the the matrix matrix A A — = [[~0 _l~]. of the the 1x1 I x 1 leading leading submatrix 0 and and 2 x 2 leading submatrix is also 0 0 (cf. determinant of the 2x2 the determinant (cf. Theorem Theorem 10.17). 10.17). However, the
101 101
10.2. 10.2. Definite Definite Matrices Matrices
principal principal submatrix submatrix consisting consisting of of the the (2,2) (2,2) element element is, is, in in fact, fact, negative negative and and A is is nonpositive nonpositive definite. Remark Remark 10.20. 10.20. The The factor factor M M in in Theorem Theorem 10.18.3 10.18.3 is is not not unique. unique. For For example, example, if if
then can be be then M M can
[1 0], [
fz -ti
o o
l [~~ 0] 0
v'3
, ...
0
Recall > B B if if the B is definite. The The following Recall that that A A :::: the matrix matrix A A -— B is nonnegative nonnegative definite. following theorem is useful "comparing" symmetric is straightforward straightforward from from in "comparing" symmetric matrices. matrices. Its Its proof proof is theorem is useful in basic basic definitions. definitions. nxn Theorem 10.21. Let A, B eE R jRnxn be symmetric. nxm T Band M E R jRnxm,, then M MT AM :::: 1. 1f If A :::: >BandMe AM > MT MTBM. BM. nxm T 2. If A> Band jR~xm, MT AM> 2. Ifj A >B and M eE R , then M AM > MT M. TBM. BM. m
proof (see, The following standard standard theorem theorem is stated stated without proof (see, for for example, example, [16, [16,p.p. xn nxn 181]). the notion notion of root" of of aa matrix. matrix. That That is, is, if if A E ,,we 181]). It concerns concerns the of the the "square "square root" € lR. E" wesay say nx that S Ee R jRn xn"isisa asquare that squareroot rootofofAAififS2S2 =—A.A. InIngeneral, general,matrices matrices(both (bothsymmetric symmetricand and nonsymmetric) have have infinitely infinitely many many square square roots. roots. For For example, matrix S of of nonsymmetric) example, if if A = = lz, /2, any any matrix c e s 9 . [COSO Sino] " the form [ °* _ ™ ] is a square root. the 10rm ssinOe _ ccosOe IS a square root. x nxn Theorem 10.22. A Ee lR. Theorem 10.22. Let A R" "be benonnegative nonnegativedefinite. definite. Then ThenAAhas hasaaunique uniquenonnegative nonnegative definite = AS = rank A (and hence S S is positive S. Moreover, SA = AS and rankS rankS = rankA definite square root S. definite ifif A is positive positive definite). definite definite).
A stronger form of of the third characterization characterization in available and is A stronger form the third in Theorem Theorem 10.17 10.17 is is available and is known as Cholesky factorization. factorization. It It is is stated stated and for the the more more general general and proved proved below below for known as the the Cholesky Hermitian case. Hermitian case. nxn Theorem 10.23. 10.23. Let A eE c be Hermitian Theorem
and positive positive definite. definite. Then there exists a with positive positive diagonal elements such that
Proof: The The proof proof is is by by induction. induction. The The case = 1 is is trivially true. Write Write the the matrix matrix A A in in Proof: case n = trivially true. the form form the
By our induction induction hypothesis, hypothesis, assume assume the the result result is is true true for for matrices so that that B By our matrices of of order order n -— 11 so B may be written as as B = = L\L^, L1Lf, where L\ Ll eE c(n-l)x(n-l) and lower triangular C1-""1^""^ is nonsingular and
102 102
Chapter Chapter 10. 10. Canonical Canonical Forms Forms
with positive diagonal elements. It It remains to prove that we can write the n x n matrix A in the form in the form
b ] = [Lc J
ann
0 ]
a
[Lf0
c a
J,
multiplication and equating the corwhere a is positive. Performing the indicated matrix multiplication H responding submatrices, we we see we must have L\c L IC = b and ann cH cC + aa22.• Clearly see that we =b and a =C nn = c is given simply by by c = C,lb. L^b. Substituting Substituting in in the the expression involving involving a, we we find find H LIH L11b a22 = = ann ann -— bbHL\ L\lb = = ann ann -— bbHH B-1b B~lb (= the Schur complement of B B in A). A). But But we know that
o < det(A) =
det [
~
b ] = det(B) det(a nn _ b H B-1b). ann
H l Since det(B) ann - bH B-1b > 0. O. Choosing Choosing aa to be be the positive square det(fi) > > 0, we must have a B b > nn —b l H of «„„ ann -— bb B~ B-1b completes the proof. 0 root of b completes D
10.3 10.3
Equivalence Equivalence Transformations Transformations and and Congruence Congruence
71 xm x Theorem 10.24. Let A €E C™* c;,xn. c~xn . Then Then there exist exist matrices P Ee C: C™xm and Q eE C" such n " such that that
PAQ=[~ ~l
(l0.4)
Proof: A classical proof proof can be consulted in, for example, [21, Proof: [21,p.p.131]. 131].Alternatively, Alternatively, suppose A has an SVD of the form (5.2) in its complex version. Then
[
Take P
=[
'f [I ]
S~
H
U
0 ] [ I Uf
S-l
o
and Q
=
]
AV
=
[I0
V to complete the proof.
0 ] 0 .
0
Note that the greater freedom afforded afforded by the equivalence transformation of Theorem 10.24, as opposed to the more restrictive situation of a similarity transformation, yields a far "simpler" canonical form (10.4). However, numerical procedures procedures for computing such an equivalence directly via, say, Gaussian or elementary row and column operations, are generally unreliable. The numerically preferred equivalence is, of course, the unitary unitary equivalence known as the SVD. However, the SVD is relatively expensive to compute and other canonical forms exist that are intermediate between (l0.4) (10.4) and the SVD; see, for example [7, Ch. 5], [4, Ch. 2]. Two such forms are stated here. They are more stably computable than (lOA) (10.4) and more efficiently efficiently computable than a full SVD. Many similar results are also available. available.
10.3. Transformations and Congruence 10.3. Equivalence Equivalence Transformations and Congruence
103 103
x Theorem 10.25 10.25 (Complete (Complete Orthogonal Decomposition). Let Let A A Ee C™ e~xn.". Then exist Theorem Orthogonal Decomposition). Then there there exist mxm nxn mxm nxn unitary matrices U e and V e such that that unitary matrices U eE C and V Ee C such
(10.5)
where R Ee e;xr upper (or lower) triangular triangular with with positive positive diagonal diagonal elements. where R €,rrxr is is upper (or lower) elements.
Proof: For the proof, proof, see Proof: For the see [4]. [4].
0 D x
mxm Let A A eE C™ e~xn.". Then exists a a unitary unitary matrix matrix Q Q Ee C e mxm and and aa Theorem 10.26. 10.26. Let Theorem Then there there exists x permutation matrix IT E en xn such that permutation Fl e C" "
QAIT =
[~ ~
l
(10.6)
r xr rx( r) E C e;xr erx(n-r) arbitrary but in general general nonzero. nonzero. where R E upper triangular and S eE C " is arbitrary r is upper
Proof: For the see [4]. [4]. Proof: For the proof, proof, see
D 0
Remark 10.27. When A has has full column rank rank but but is "near" aa rank rank deficient deficient matrix, Remark 10.27. When A full column is "near" matrix, various rank rank revealing decompositions are can sometimes detect such such various revealing QR QR decompositions are available available that that can sometimes detect phenomena at considerably less less than than aa full Again, see see [4] phenomena at aa cost cost considerably full SVD. SVD. Again, [4] for for details. details. nxn n xn H e nxn and X X e E C e~xn. H- X XH AX is called Definition 10.28. Definition 10.28. Let A eE C The transformation A i-> AX n . The aa congruence. congruence. Note Note that that aa congruence congruence is is aa similarity similarity if if and and only only if ifXX is is unitary. unitary.
Note that that congruence preserves the the property property of of being being Hermitian; Hermitian; i.e., if A A is Note congruence preserves i.e., if is Hermitian, Hermitian, then AX is is also also Hermitian. Hermitian. It of interest to ask ask what what other properties of of aa matrix matrix are are then X XHH AX It is is of interest to other properties preserved under under congruence. congruence. It turns out the principal principal property property so so preserved preserved is is the the sign sign preserved It turns out that that the of of each each eigenvalue. eigenvalue. H x nxn Definition 10.29. Let =A eE C" " and and let the numbers positive, Let A A = AH e let 7t, rr, v, v, and and £ ~ denote denote the numbers of of positive, Definition 10.29. negative, and zero eigenvalues, respectively, of of A. A. Then inertia of of negative, and eigenvalues, respectively, Then the inertia of A is is the the triple of numbers v, n of A is sig(A) = v. numbers In(A) In(A) = (rr, (n, v, £). The The signature signature of is given by sig(A) = nrr -— v.
Example 10.30. Example 10.30.
o 1 o o
0] 00 -10 =(2,1,1).
l.In[!
0
0
x 2. If A = A" AH Ee Ce nnxn if and and only only if In(A) = (n, 0, 0). 2. If A " , ,t hthen e n AA> > 00 if if In (A) = (n,0,0).
In(A) = (rr, v, £), n, then rank(A) = n rr + v. 3. If In(A) (TT, v, then rank(A) v. n xn Theorem 10.31 10.31 (Sylvester's (Sylvester's Law Law of Inertia). Let A = A HHE xn and X e E C e~ nxn.. Then Theorem of Inertia). e en Cnxn H H AX). In(A) In(A) == In(X ln(X AX).
Proof: For For the the proof, proof, see, for example, p. 134]. D Proof: see, for example, [21, [21, p. 134]. D Theorem Theorem 10.31 10.31guarantees guaranteesthat thatrank rankand andsignature signatureofofa amatrix matrixare arepreserved preservedunder under We then then have have the the following. congruence. congruence. We following.
104 104
Chapter 10. Chapter 10. Canonical Canonical Forms Forms
H xn nxn Theorem 10.32. Let A = A AH with In(A) = (jt, (Jr, v, v, O. eE c C" In(A) = £). Then there exists a matrix xn H X E c~xn such that XH AX = diag(1, I, -1,..., -I, ... , -1, -1,0, X e C"n X AX = diag(l, .... . . ,, 1, 0, .... . . ,0), , 0),where wherethe thenumber number of of 1's's is Jr, the number of -I 's is v, and the numberofO's is~. is 7i, the number of — l's is v, the number 0/0 's is (,.
Proof: Let AI AI,, ... Anw denote the eigenvalues of of A and order them such that the first TT Jr are Proof: . . . ,, X O. By Theorem Theorem 10.2 there exists a unitary positive, the next v are negative, and the final £~ are 0. AV = matrix V U such that VH UHAU = diag(AI, diag(Ai, ... . . . ,, An). A w ). Define Define the thenn xx nnmatrix matrix
vv
= diag(I/~, ... , I/~, 1/.f-Arr+I' ... , I/.f-Arr+v, I, ... ,1).
Then it is easy to check that X X =V U VV W yields the desired desired result.
10.3.1 10.3.1
0 D
Block matrices and definiteness
T AT Theorem 10.33. Suppose A = =A and D D= = DT. DT. Then
°
T ifand A> D -- BT A-Il B > D > and A -- BD^B BD- I BT > O. if and only ifeither if either A > 0 and and D BT A~ > 0, 0, or D > 0 and > 0.
Proof: The proof proof follows by considering, for example, the congruence Proof: B ] [I D ~ 0
_A-I B I
JT [
A BT
~ ][ ~
The details are straightforward and are left left to the reader.
0 D
Remark 10.34. Note the symmetric Schur complements of A (or D) in the theorem. Remark T T AT D =D DT. Theorem 10.35. Suppose A = A and D . Then
B ] > D -
°
+ + if A:::: 0, AA AA+B = B, B. and D -- BT A+B:::: o. if and only if ifA>0, B = and D BT A B > 0.
Proof: Consider the congruence with Proof: Consider
proof of Theorem Theorem 10.33. and proceed as in the proof
10.4 10.4
0 D
Rational Form Rational Canonical Canonical Form
rational canonical form. One final canonical form to be mentioned is the rational
10.4. Rational Rational Canonical Canonical Form Form 10.4.
105 105
n x Definition 10.36. A A matrix matrix A A E Xn" is said to be nonderogatory ifits Definition e lR M" is said to be if its minimal minimal polynomial polynomial and characteristic characteristic polynomial polynomial are are the same or; Jordan canonical canonical form and the same or, equivalently, equivalently, if if its its Jordan form has only one block block associated each distinct has only one associated with with each distinct eigenvalue. eigenvalue.
xn Suppose A EE lR is aa nonderogatory nonderogatory matrix characteristic polynoSuppose A Wnxn is matrix and and suppose suppose its its characteristic polynon(A) = A" An -— (ao alA + ... + A + an_IAn-I). a n _iA n ~')- Then Then it it can can be be shown shown (see (see [12]) [12]) that that A mial is 7r(A) (a0 + + «A is similar is similar to to aa matrix matrix of of the the form form
o o
o
o 0
o
(10.7)
o
o
nxn Definition 10.37. 10.37. A " of Definition A matrix matrix A A eE E lRnx of the the form form (10.7) (10.7) is is called called a a companion cornpanion matrix rnatrix or or is to be in companion cornpanion forrn. is said said to be in form.
Companion matrices matrices also also appear appear in in the the literature literature in in several several equivalent equivalent forms. forms. To To Companion illustrate, consider the the companion matrix illustrate, consider companion matrix
(l0.8)
This in lower Hessenberg form. This matrix matrix is is aa special special case case of of aa matrix matrix in lower Hessenberg form. Using Using the the reverse-order reverse-order identity P given by (9.18), (9.18), A A is is easily to be be similar to the the following matrix identity similarity similarity P given by easily seen seen to similar to following matrix in upper Hessenberg Hessenberg form: in upper form: a2
al
o
0
1
0
o
1
6]
o . o
(10.9)
Moreover, since since aa matrix matrix is is similar similar to to its its transpose transpose (see (see exercise exercise 13 13 in in Chapter Chapter 9), 9), the the Moreover, following are also also companion companion matrices matrices similar similar to above: following are to the the above:
:l ~ ! ~01]. ao
0
(10.10)
0
Notice that that in in all cases aa companion companion matrix matrix is is nonsingular nonsingular if and only only if ao i= Notice all cases if and if aO /= O. 0. In fact, the inverse of aa nonsingular nonsingular companion matrix is in companion companion form. form. For In fact, the inverse of companion matrix is again again in For £*Yamr\1j=» example,
o 1
o
-~ ao
1
o o
-~ ao
o o
_!!l
o o
(10.11)
Chapter 10. 10. Canonical Canonical Forms Forms Chapter
106
with with aa similar similar result result for for companion companion matrices matrices of of the the form form (10.10). (10.10). If If a companion matrix of the form (10.7) is singular, singular, i.e., if if ao ao = = 0, then its pseudo1 ... , an-If inverse can still be computed. Let a Ee JRn-1 M"" denote the vector [ai, \a\, a2, 02,..., a n -i] and and let l r . Then it is easily verified that I+~T a' Then it is easily verified that cc = l+ a a
o
o
o
+
o o
o
o
o
o
1- caa T
o
ca
J.
Note that /I -— caa TT = = (I + + aaTT) ) -I ,, and hence the pseudoinverse of a singular companion matrix is not companion matrix matrix unless = 0. O. matrix is not aa companion unless a a= Companion matrices matrices have interesting properties, among which, perCompanion have many many other other interesting properties, among which, and and perhaps surprisingly, surprisingly, is is the the fact singular values found in in closed form; see see haps fact that that their their singular values can can be be found closed form; [14].
Theorem 10.38. 10.38. Let GI > ••• > the singular values of of the companion matrix matrix Theorem Let a\ al > ~ a2 ~ ... ~ a ann be be the singular values the companion A a = Then Leta = a\ + + a\ ai + + •...• • ++ a%_ a;_1{ and and yy = = 1 1+ + «.Q ++ a. a. Then A in in (10.7). (10.7). Let
ar
aJ
2_ 21 ( y + Jy 2- 4ao2) '
al
-
a? = 1
for i = 2, 3, ... , n - 1,
a; = ~ (y - Jy2 - 4aJ) . Ifao ^ 0, the largest largest and and smallest smallest singular also be be written in the the equivalent equivalent form form If ao =1= 0, the singular values values can can also written in
Remark 10.39. Explicit Explicit formulas formulas for for all all the right and left singular singular vectors can Remark 10.39. the associated associated right and left vectors can also be derived easily. easily. also be derived nx If A E JRnxn If A € R " is derogatory, derogatory, i.e., has more than one Jordan block associated associated with at least least one not similar companion matrix matrix of of the at one eigenvalue, eigenvalue, then then it it is is not similar to to aa companion the form form (10.7). (10.7). However, it can be shown that a derogatory matrix is similar to a block diagonal matrix, each of each of whose whose diagonal diagonal blocks blocks is is aa companion companion matrix. matrix. Such Such matrices matrices are are said said to to be be in in rational canonical form (or Frobenius Frobenius canonical form). rational canonical form form). For details, see, for example, [12]. Companion appear frequently control and signal processing literature Companion matrices matrices appear frequently in in the the control and signal processing literature but they are are often often very very difficult difficult to to work work with numerically. Algorithms reduce but unfortunately unfortunately they with numerically. Algorithms to to reduce an companion form form are are numerically an arbitrary arbitrary matrix matrix to to companion numerically unstable. unstable. Moreover, Moreover, companion companion matrices are are known known to possess many many undesirable undesirable numerical properties. For For example, in matrices to possess numerical properties. example, in n increases, their eigenstructure is extremely ill conditioned, general and especially especially as n nonsingular ones nearly singular, unstable, and nonsingular ones are are nearly singular, stable stable ones ones are are nearly nearly unstable, and so so forth forth [14]. [14].
Companion matrices and rational canonical forms are generally to be avoided in floating-point computation.
Remark 10.40. Theorem 10.38 yields some understanding of why difficult numerical behavior might be expected for companion matrices. For example, when solving linear systems of equations of the form (6.2), one measure of numerical sensitivity is κ_p(A) = ||A||_p ||A^{−1}||_p, the so-called condition number of A with respect to inversion and with respect to the matrix p-norm. If this number is large, say O(10^k), one may lose up to k digits of precision. In the 2-norm, this condition number is the ratio of largest to smallest singular values which, by the theorem, can be determined explicitly as
   κ_2(A) = (γ + √(γ^2 − 4a_0^2)) / (2|a_0|).

It is easy to show that γ/(2|a_0|) ≤ κ_2(A) ≤ γ/|a_0|, and when a_0 is small or γ is large (or both), then κ_2(A) ≈ γ/|a_0|. It is not unusual for γ to be large for large n. Note that explicit formulas for κ_1(A) and κ_∞(A) can also be determined easily by using (10.11).
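The closed form in Theorem 10.38 is easy to confirm numerically. The sketch below is a minimal illustration, assuming NumPy and a companion matrix with the coefficients −a_0, ..., −a_{n−1} in its last row (one of several equivalent companion layouts, all of which share the same singular values); the particular coefficient values are arbitrary choices, not from the text.

import numpy as np

a = np.array([0.3, -1.2, 0.5, 2.0])        # a_0, a_1, ..., a_{n-1}, with a_0 != 0
n = len(a)
A = np.zeros((n, n))
A[:-1, 1:] = np.eye(n - 1)                 # identity block above the last row
A[-1, :] = -a                              # last row carries -a_0, ..., -a_{n-1}

gamma = 1 + np.sum(a**2)                   # gamma = 1 + a_0^2 + (a_1^2 + ... + a_{n-1}^2)
disc = np.sqrt(gamma**2 - 4 * a[0]**2)
sigma = np.linalg.svd(A, compute_uv=False)

print(sigma[0]**2, (gamma + disc) / 2)     # largest singular value squared
print(sigma[-1]**2, (gamma - disc) / 2)    # smallest singular value squared
print(sigma[1:-1]**2)                      # the interior singular values are all 1
print(sigma[0] / sigma[-1], (gamma + disc) / (2 * abs(a[0])))   # kappa_2(A)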
EXERCISES

1. Show that if a triangular matrix is normal, then it must be diagonal.

2. Prove that if A ∈ R^{n×n} is normal, then N(A) = N(A^T).

3. Let A ∈ C^{n×n} and define ρ(A) = max_{λ∈Λ(A)} |λ|. Then ρ(A) is called the spectral radius of A. Show that if A is normal, then ρ(A) = ||A||_2. Show that the converse is true if n = 2.

4. Let A ∈ C^{n×n} be normal with eigenvalues λ_1, ..., λ_n and singular values σ_1 ≥ σ_2 ≥ ... ≥ σ_n ≥ 0. Show that σ_i(A) = |λ_i(A)| for i ∈ n.
5. Use the reverse-order identity matrix P introduced in (9.18) and the matrix U in Theorem 10.5 to find a unitary matrix Q that reduces A ∈ C^{n×n} to lower triangular form.

6. Let A ∈ C^{2×2}. Find a unitary matrix U such that U^H A U is upper triangular.
7. If A ∈ R^{n×n} is positive definite, show that A^{−1} must also be positive definite.

8. Suppose A ∈ R^{n×n} is positive definite. Is

   [ A   I      ]
   [ I   A^{−1} ]  ≥ 0?

9. Let R, S ∈ R^{n×n} be symmetric. Show that

   [ R  I ]
   [ I  S ]  > 0

   if and only if S > 0 and R > S^{−1}.
10. Find the inertia of the following matrices:

    (b) [ −2      1 − j ]          (d) [ −1      1 − j ]
        [ 1 + j   −2    ],             [ 1 + j   −1    ].
Chapter 11
Linear Differential and Difference Equations
11.1
Differential Equations
In this section we study solutions of the linear homogeneous system of differential equations

   ẋ(t) = Ax(t);   x(t_0) = x_0 ∈ R^n                                    (11.1)
for t ≥ t_0. This is known as an initial-value problem. We restrict our attention in this chapter only to the so-called time-invariant case, where the matrix A ∈ R^{n×n} is constant and does not depend on t. The solution of (11.1) is then known always to exist and be unique. It can be described conveniently in terms of the matrix exponential.

Definition 11.1. For all A ∈ R^{n×n}, the matrix exponential e^A ∈ R^{n×n} is defined by the power series
   e^A = Σ_{k=0}^{+∞} (1/k!) A^k.                                        (11.2)

The series (11.2) can be shown to converge for all A (has radius of convergence equal to +∞). The solution of (11.1) involves the matrix

   e^{tA} = Σ_{k=0}^{+∞} (t^k/k!) A^k,                                   (11.3)

which thus also converges for all A and uniformly in t.
11.1.1   Properties of the matrix exponential

1. e^0 = I.
   Proof: This follows immediately from Definition 11.1 by setting A = 0.
2. For all A ∈ R^{n×n}, (e^A)^T = e^{A^T}.
   Proof: This follows immediately from Definition 11.1 and linearity of the transpose.
3. For all A ∈ R^{n×n} and for all t, τ ∈ R, e^{(t+τ)A} = e^{tA} e^{τA} = e^{τA} e^{tA}.
   Proof: Note that

   e^{(t+τ)A} = I + (t + τ)A + ((t + τ)^2/2!) A^2 + ...

   and

   e^{tA} e^{τA} = (I + tA + (t^2/2!) A^2 + ...)(I + τA + (τ^2/2!) A^2 + ...).

   Compare like powers of A in the above two equations and use the binomial theorem on (t + τ)^k.
Compare powers of A in the above Compare like like powers of A in the above two two equations equations and and use use the the binomial binomial theorem theorem on(t+T)k. on (t + T)*. xn B 4. For all JRnxn and = all A, B Ee R" and for all all t Ee JR, R, et(A+B) et(A+B) =-etAe =^e'Ae'tB = etBe e'Be'tAA if and and only if A and B commute, AB = BA. and B commute, i.e., i.e., AB =B A. Proof' Note that Proof: Note that 2
et(A+B)
= I
t + teA + B) + -(A + B)2 + ...
2!
and and
while while tB tA
e e
=
(
1+ tB
t2 2 2 2 +... ) . + 2iB +... ) ( 1+ tA + t2!A
Compare like like powers of tt in in the first equation equation and the second second or or third third and the Compare powers of the first and the and use use the binomial theorem on on (A (A + B/ B)k and and the the commutativity commutativityof ofAAand andB.B. binomial theorem x 5. ForaH JRnxn" and For all A Ee R" and for for all all t eE JR, R, (etA)-1 (e'A)~l = ee~'tAA.. Proof" Simply Proof: Simply take take TT = = -t — t in in property property 3. 3.
6. Let L denote the Laplace transform and L^{−1} the inverse Laplace transform. Then for all A ∈ R^{n×n} and for all t ∈ R,

   (a) L{e^{tA}} = (sI − A)^{−1}.
   (b) L^{−1}{(sI − A)^{−1}} = e^{tA}.

   Proof: We prove only (a). Part (b) follows similarly.

   L{e^{tA}} = ∫_0^{+∞} e^{−st} e^{tA} dt
             = ∫_0^{+∞} e^{t(−sI)} e^{tA} dt
             = ∫_0^{+∞} e^{t(A−sI)} dt                      since A and (−sI) commute
             = ∫_0^{+∞} Σ_{i=1}^n e^{(λ_i − s)t} x_i y_i^H dt     assuming A is diagonalizable
             = Σ_{i=1}^n [∫_0^{+∞} e^{(λ_i − s)t} dt] x_i y_i^H
             = Σ_{i=1}^n (1/(s − λ_i)) x_i y_i^H            assuming Re s > Re λ_i for i ∈ n
             = (sI − A)^{−1}.

The matrix (sI − A)^{−1} is called the resolvent of A and is defined for all s not in Λ(A). Notice in the proof that we have assumed, for convenience, that A is diagonalizable. If this is not the case, the scalar dyadic decomposition can be replaced by

   e^{t(A−sI)} = Σ_{i=1}^m X_i e^{t(J_i − sI)} Y_i^H
using the JCF. All succeeding steps in the proof then follow in a straightforward way.

7. For all A ∈ R^{n×n} and for all t ∈ R, (d/dt)(e^{tA}) = A e^{tA} = e^{tA} A.
   Proof: Since the series (11.3) is uniformly convergent, it can be differentiated term-by-term, from which the result follows immediately. Alternatively, the formal definition
   (d/dt)(e^{tA}) = lim_{Δt→0} (e^{(t+Δt)A} − e^{tA}) / Δt

can be employed as follows. For any consistent matrix norm,

   || (e^{(t+Δt)A} − e^{tA})/Δt − A e^{tA} ||
       = || (1/Δt)(e^{ΔtA} e^{tA} − e^{tA}) − A e^{tA} ||
       = || (1/Δt)(e^{ΔtA} − I) e^{tA} − A e^{tA} ||
       = || ((Δt/2!) A^2 + ((Δt)^2/3!) A^3 + ...) e^{tA} ||
       ≤ Δt ||A^2|| ||e^{tA}|| (1/2! + (Δt/3!) ||A|| + ((Δt)^2/4!) ||A||^2 + ...)
       ≤ Δt ||A^2|| ||e^{tA}|| e^{Δt ||A||}.
For fixed t, the right-hand side above clearly goes to 0 as Δt goes to 0. Thus, the limit exists and equals Ae^{tA}. A similar proof yields the limit e^{tA}A, or one can use the fact that A commutes with any polynomial of A of finite degree and hence with e^{tA}.
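These properties are easy to confirm numerically. The following sketch is a minimal illustration, assuming NumPy/SciPy; the matrix A and the scalars t and τ are arbitrary choices, not from the text. It checks properties 3, 5, and 7 using scipy.linalg.expm.

import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
t, tau = 0.7, 1.3

# property 3: e^{(t+tau)A} = e^{tA} e^{tau A}
print(np.allclose(expm((t + tau) * A), expm(t * A) @ expm(tau * A)))

# property 5: (e^{tA})^{-1} = e^{-tA}
print(np.allclose(np.linalg.inv(expm(t * A)), expm(-t * A)))

# property 7: d/dt e^{tA} = A e^{tA}, checked with a central difference
h = 1e-6
deriv = (expm((t + h) * A) - expm((t - h) * A)) / (2 * h)
print(np.allclose(deriv, A @ expm(t * A), atol=1e-4))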
11.1.2   Homogeneous linear differential equations

Theorem 11.2. Let A ∈ R^{n×n}. The solution of the linear homogeneous initial-value problem

   ẋ(t) = Ax(t);   x(t_0) = x_0 ∈ R^n                                    (11.4)

for t ≥ t_0 is given by

   x(t) = e^{(t−t_0)A} x_0.                                              (11.5)
Proof: Differentiate (11.5) and use property 7 of the matrix exponential to get ẋ(t) = Ae^{(t−t_0)A} x_0 = Ax(t). Also, x(t_0) = e^{(t_0−t_0)A} x_0 = x_0 so, by the fundamental existence and uniqueness theorem for ordinary differential equations, (11.5) is the solution of (11.4).   □
11.1.3   Inhomogeneous linear differential equations

Theorem 11.3. Let A ∈ R^{n×n}, B ∈ R^{n×m} and let the vector-valued function u be given and, say, continuous. Then the solution of the linear inhomogeneous initial-value problem

   ẋ(t) = Ax(t) + Bu(t);   x(t_0) = x_0 ∈ R^n                            (11.6)

for t ≥ t_0 is given by the variation of parameters formula

   x(t) = e^{(t−t_0)A} x_0 + ∫_{t_0}^t e^{(t−s)A} Bu(s) ds.              (11.7)
l
q
(t)
pet)
f(x, t) dx =
l
q
af(x t) ' dx pet) at (t)
+
dq(t) dp(t) f(q(t), t ) - - - f(p(t), t ) - dt dt
Ir:
( s)A is used to Ae(t-s)A Bu(s) ds + Bu(t) = Ax(t) + Bu(t). Also, to get get xx(t) (t) = = Ae(t-to)A Ae{'-to)AxXo0 + f'o Ae '- Bu(s) + Bu(t) = Ax(t) = (f fo)/1 x(to} = e(to-tolA Xo + + 0 == XQ Xo so, by the fundamental fundilm()ntill existence ()lI.i~t()Oc() and nnd uniqueness uniqu()Oc:s:s theorem theorem for for *('o) °~ .¥o ordinary 0 ordinary differential differential equations, equations, (11.7) (11.7) is is the the solution solution of of (1l.6). (11.6). D
Remark 11.4. The proof above simply verifies the variation of parameters formula by direct differentiation. The formula can be derived by means of an integrating factor "trick" as follows. Premultiply the equation ẋ − Ax = Bu by e^{−tA} to get

   (d/dt)(e^{−tA} x(t)) = e^{−tA} Bu(t).                                 (11.8)
Now integrate (11.8) over the interval [t_0, t]:

   ∫_{t_0}^t (d/ds)(e^{−sA} x(s)) ds = ∫_{t_0}^t e^{−sA} Bu(s) ds.

Thus,

   e^{−tA} x(t) − e^{−t_0 A} x(t_0) = ∫_{t_0}^t e^{−sA} Bu(s) ds

and hence

   x(t) = e^{(t−t_0)A} x_0 + ∫_{t_0}^t e^{(t−s)A} Bu(s) ds.
11.1.4   Linear matrix differential equations
Matrix-valued initial-value problems also occur frequently. The first is an obvious generalization of Theorem 11.2, and the proof is essentially the same.

Theorem 11.5. Let A ∈ R^{n×n}. The solution of the matrix linear homogeneous initial-value problem

   Ẋ(t) = AX(t);   X(t_0) = C ∈ R^{n×n}                                  (11.9)

for t ≥ t_0 is given by

   X(t) = e^{(t−t_0)A} C.                                                (11.10)
In the matrix case, we can have coefficient matrices on both the right and left. For convenience, the following theorem is stated with initial time t_0 = 0.

Theorem 11.6. Let A ∈ R^{n×n}, B ∈ R^{m×m}, and C ∈ R^{n×m}. Then the matrix initial-value problem

   Ẋ(t) = AX(t) + X(t)B;   X(0) = C                                      (11.11)

has the solution X(t) = e^{tA} C e^{tB}.

Proof: Differentiate e^{tA} C e^{tB} with respect to t and use property 7 of the matrix exponential. The fact that X(t) satisfies the initial condition is trivial.   □
Corollary 11.7. Let A, C ∈ R^{n×n}. Then the matrix initial-value problem

   Ẋ(t) = AX(t) + X(t)A^T;   X(0) = C                                    (11.12)

has the solution X(t) = e^{tA} C e^{tA^T}.
When C is symmetric in (11.12), X(t) is symmetric and (11.12) is known as a Lyapunov differential equation. The initial-value problem (11.11) is known as a Sylvester differential equation.
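A quick numerical check of Theorem 11.6 (a minimal sketch, assuming NumPy/SciPy; the matrices A, B, and C are arbitrary choices) integrates (11.11) directly and compares the result with e^{tA} C e^{tB}.

import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp

A = np.array([[-1.0, 2.0], [0.0, -3.0]])
B = np.array([[0.0, 1.0], [-1.0, 0.0]])
C = np.array([[1.0, 0.0], [2.0, 1.0]])
t = 0.8

X_closed = expm(t * A) @ C @ expm(t * B)                     # closed-form solution

rhs = lambda s, x: (A @ x.reshape(2, 2) + x.reshape(2, 2) @ B).ravel()
sol = solve_ivp(rhs, (0.0, t), C.ravel(), rtol=1e-10, atol=1e-12)
print(np.allclose(X_closed, sol.y[:, -1].reshape(2, 2)))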
11.1.5
Modal decompositions
Let A ∈ R^{n×n} and suppose, for convenience, that it is diagonalizable (if A is not diagonalizable, the rest of this subsection is easily generalized by using the JCF and the decomposition A = Σ_i X_i J_i Y_i^H as discussed in Chapter 9). Then the solution x(t) of (11.4) can be written

   x(t) = e^{(t−t_0)A} x_0
        = (Σ_{i=1}^n e^{λ_i(t−t_0)} x_i y_i^H) x_0
        = Σ_{i=1}^n (y_i^H x_0 e^{λ_i(t−t_0)}) x_i.

The λ_i s are called the modal velocities and the right eigenvectors x_i are called the modal directions. The decomposition above expresses the solution x(t) as a weighted sum of its modal velocities and directions.
This modal decomposition can be expressed in a different looking but identical form if we write the initial condition x_0 as a weighted sum of the right eigenvectors, x_0 = Σ_{i=1}^n α_i x_i. Then

   x(t) = Σ_{i=1}^n (α_i e^{λ_i(t−t_0)}) x_i.

In the last equality we have used the fact that y_i^H x_j = δ_ij.
Similarly, in the inhomogeneous case we can write

   ∫_{t_0}^t e^{(t−s)A} Bu(s) ds = Σ_{i=1}^n (∫_{t_0}^t e^{λ_i(t−s)} y_i^H Bu(s) ds) x_i.
Computation Computation of the matrix exponential exponential
JCF method JCF method x xn 1 Let A eE R" jRnxn" and jR~xn is that X" X-I AX AX = where JJ is JCF for Let A and suppose suppose X X Ee Rn is such such that = J, J, where is aa JCF for A. A. Then Then
etA = etXJX-1 = XetJX- 1 n ,
Le A• X'YiH
~
if A is diagonalizable
1=1
I t,x;e'J,y;H
in geneml.
If A is diagonalizable, it is then easy to compute e^{tA} via the formula e^{tA} = X e^{tJ} X^{−1} since e^{tJ} is simply a diagonal matrix. In the more general case, the problem clearly reduces simply to the computation of the exponential of a Jordan block. To be specific, let J_i ∈ C^{k×k} be a Jordan block of the form
   J_i = [ λ   1   0   ...   0 ]
         [ 0   λ   1   ...   0 ]
         [ .        .   .    . ]
         [ 0   ...       λ   1 ]
         [ 0   0   ...   0   λ ]
       = λI + N.
l's along only For the matrix N defined above, it is easy to check that while N has 1's its first superdiagonal (and (and O's O's elsewhere), elsewhere), N N22 has 1's along along only only its its second second superdiagonal, superdiagonal, has l's its first superdiagonal and so forth. N kk~- lI has a 1 in its (1, forth. Finally, N (1, k) k) element and has O's O's everywhere else, and kk N N = 0. O. Thus, the series expansion of e' e lN finite, i.e., is finite, t2 t k- I e IN =I+tN+-N 2 + ... + N k2! (k - I)!
I
o
o
t 1
o
Thus,
ell;
=
12
At
eAt
teAt
2I e
0
eAt
teAl
0
0
eAt
At
Ik-I
(k-I)!
12
2I e
e
At
teAl
0
0
eAt
In the case when λ is complex, a real version of the above can be worked out.
Example 11.9. Let A = [ −4  4 ; −1  0 ]. Then Λ(A) = {−2, −2} and

   e^{tA} = X e^{tJ} X^{−1}
          = [ 2  1 ; 1  1 ] exp( t [ −2  1 ; 0  −2 ] ) [ 1  −1 ; −1  2 ]
          = [ 2  1 ; 1  1 ] [ e^{−2t}  te^{−2t} ; 0  e^{−2t} ] [ 1  −1 ; −1  2 ]
          = [ (1 − 2t)e^{−2t}   4te^{−2t} ; −te^{−2t}   (1 + 2t)e^{−2t} ].
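The JCF factorization used in Example 11.9 can be checked numerically. The sketch below (assuming NumPy/SciPy, with the factors written out as in the example) compares X e^{tJ} X^{−1} with a direct evaluation of the matrix exponential.

import numpy as np
from scipy.linalg import expm

A = np.array([[-4.0, 4.0], [-1.0, 0.0]])
X = np.array([[2.0, 1.0], [1.0, 1.0]])
Xinv = np.array([[1.0, -1.0], [-1.0, 2.0]])
t = 0.5

etJ = np.array([[np.exp(-2 * t), t * np.exp(-2 * t)],
                [0.0, np.exp(-2 * t)]])        # exponential of the 2 x 2 Jordan block
print(np.allclose(X @ etJ @ Xinv, expm(t * A)))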
Interpolation method
This method is numerically unstable in finite-precision arithmetic but is quite effective for hand calculation in small-order problems. The method is stated and illustrated for the exponential function but applies equally well to other functions.
Given A ∈ R^{n×n} and f(λ) = e^{tλ}, compute f(A) = e^{tA}, where t is a fixed scalar. Suppose the characteristic polynomial of A can be written as π(λ) = Π_{i=1}^m (λ − λ_i)^{n_i}, where the λ_i s are distinct. Define

   g(λ) = α_0 + α_1 λ + ... + α_{n−1} λ^{n−1},

where α_0, ..., α_{n−1} are n constants that are to be determined. They are, in fact, the unique solution of the n equations:

   g^{(k)}(λ_i) = f^{(k)}(λ_i);   k = 0, 1, ..., n_i − 1,   i ∈ m.
Here, the superscript (k) denotes the kth derivative with respect to λ. With the α_i s then known, the function g is known and f(A) = g(A). The motivation for this method is the Cayley-Hamilton Theorem, Theorem 9.3, which says that all powers of A greater than n − 1 can be expressed as linear combinations of A^k for k = 0, 1, ..., n − 1. Thus, all the terms of order greater than n − 1 in the power series for e^{tA} can be written in terms of these lower-order powers as well. The polynomial g gives the appropriate linear combination.
Example 11.10. Let A ∈ R^{3×3} have characteristic polynomial π(λ) = −(λ + 1)^3 and let f(λ) = e^{tλ}. Then m = 1 and n_1 = 3.
Let g(λ) = α_0 + α_1 λ + α_2 λ^2. Then the three equations for the α_i s are given by

   g(−1) = f(−1)    ⟹   α_0 − α_1 + α_2 = e^{−t},
   g'(−1) = f'(−1)   ⟹   α_1 − 2α_2 = te^{−t},
   g''(−1) = f''(−1)  ⟹   2α_2 = t^2 e^{−t}.

Solving for the α_i s, we find

   α_0 = e^{−t} + te^{−t} + (t^2/2) e^{−t},
   α_1 = te^{−t} + t^2 e^{−t},
   α_2 = (t^2/2) e^{−t}.

Thus, f(A) = e^{tA} = g(A) = α_0 I + α_1 A + α_2 A^2.
Example 11.11. Let A = [ −4  4 ; −1  0 ] and f(λ) = e^{tλ}. Then π(λ) = (λ + 2)^2, so m = 1 and n_1 = 2.
Let g(λ) = α_0 + α_1 λ. Then the defining equations for the α_i s are given by

   g(−2) = f(−2)    ⟹   α_0 − 2α_1 = e^{−2t},
   g'(−2) = f'(−2)   ⟹   α_1 = te^{−2t}.

Solving for the α_i s, we find

   α_0 = e^{−2t} + 2te^{−2t},
   α_1 = te^{−2t}.

Thus,

   f(A) = e^{tA} = g(A) = α_0 I + α_1 A
        = (e^{−2t} + 2te^{−2t}) [ 1  0 ; 0  1 ] + te^{−2t} [ −4  4 ; −1  0 ]
        = [ e^{−2t} − 2te^{−2t}   4te^{−2t} ; −te^{−2t}   e^{−2t} + 2te^{−2t} ].
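The interpolation computation in Example 11.11 can be verified numerically. The following sketch (assuming NumPy/SciPy; the value of t is an arbitrary choice) forms g(A) = α_0 I + α_1 A and compares it with scipy.linalg.expm.

import numpy as np
from scipy.linalg import expm

A = np.array([[-4.0, 4.0], [-1.0, 0.0]])
t = 0.3

a0 = np.exp(-2 * t) + 2 * t * np.exp(-2 * t)   # alpha_0 from the example
a1 = t * np.exp(-2 * t)                        # alpha_1 from the example
G = a0 * np.eye(2) + a1 * A                    # g(A)
print(np.allclose(G, expm(t * A)))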
Other methods

1. Use e^{tA} = L^{−1}{(sI − A)^{−1}} and techniques for inverse Laplace transforms. This is quite effective for small-order problems, but general nonsymbolic computational techniques are numerically unstable since the problem is theoretically equivalent to knowing precisely a JCF.
2. Use Padé approximation. There is an extensive literature on approximating certain nonlinear functions by rational functions. The matrix analogue yields e^A ≈ D^{−1}(A) N(A), where D(A) = δ_0 I + δ_1 A + ... + δ_p A^p and N(A) = ν_0 I + ν_1 A + ... + ν_q A^q. Explicit formulas are known for the coefficients of the numerator and denominator polynomials of various orders. Unfortunately, a Padé approximation for the exponential is accurate only in a neighborhood of the origin; in the matrix case this means when ||A|| is sufficiently small. This can be arranged by scaling A, say, by multiplying it by 1/2^k for sufficiently large k and using the fact that e^A = (e^{(1/2^k)A})^{2^k}. Numerical loss of accuracy can occur in this procedure from the successive squarings.
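A minimal sketch of the scaling-and-squaring idea, assuming NumPy/SciPy and using only the low-order [1/1] diagonal Padé approximant (library routines use higher-order approximants together with careful error control), is as follows.

import numpy as np
from scipy.linalg import expm

def expm_pade_ss(A, k=10):
    B = A / 2**k                                   # scale so that ||B|| is small
    I = np.eye(A.shape[0])
    E = np.linalg.solve(I - B / 2, I + B / 2)      # [1/1] Pade approximant of e^B
    for _ in range(k):                             # undo the scaling: e^A = (e^{A/2^k})^{2^k}
        E = E @ E
    return E

A = np.array([[0.0, 2.0], [-2.0, 0.0]])            # an arbitrary illustrative matrix
print(np.allclose(expm_pade_ss(A), expm(A), atol=1e-5))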
3. Reduce A to (real) Schur form S via the unitary similarity U and use e^A = U e^S U^H and successive recursions up the superdiagonals of the (quasi) upper triangular matrix e^S.

4. Many methods are outlined in, for example, [19]. Reliable and efficient computation of matrix functions such as e^A and log(A) remains a fertile area for research.
11.2   Difference Equations

In this section we outline solutions of discrete-time analogues of the linear differential equations of the previous section. Linear discrete-time systems, modeled by systems of difference equations, exhibit many parallels to the continuous-time differential equation case, and this observation is exploited frequently.
11.2.1   Homogeneous linear difference equations

Theorem 11.12. Let A ∈ R^{n×n}. The solution of the linear homogeneous system of difference equations

   x_{k+1} = Ax_k;   x_0 ∈ R^n given,                                    (11.13)

for k ≥ 0 is given by

   x_k = A^k x_0.                                                        (11.14)

Proof: The proof is almost immediate upon substitution of (11.14) into (11.13).   □
Remark 11.13. Again, we restrict our attention only to the so-called time-invariant case, where the matrix A in (11.13) is constant and does not depend on k. We could also consider an arbitrary "initial time" k_0, but since the system is time-invariant, and since we want to keep the formulas "clean" (i.e., no double subscripts), we have chosen k_0 = 0 for convenience.
11.2.2   Inhomogeneous linear difference equations

Theorem 11.14. Let A ∈ R^{n×n}, B ∈ R^{n×m} and suppose {u_k}_{k=0}^{+∞} is a given sequence of m-vectors. Then the solution of the inhomogeneous initial-value problem

   x_{k+1} = Ax_k + Bu_k;   x_0 ∈ R^n given,                             (11.15)

is given by

   x_k = A^k x_0 + Σ_{j=0}^{k−1} A^{k−j−1} B u_j,   k ≥ 0.               (11.16)
Proof: The proof is again almost immediate upon substitution of (11.16) into (11.15).   □

11.2.3   Computation of matrix powers
It is clear that solution of linear systems of difference equations involves computation of A^k. One solution method, which is numerically unstable but sometimes useful for hand calculation, is to use z-transforms, by analogy with the use of Laplace transforms to compute a matrix exponential. One definition of the z-transform of a sequence {g_k} is

   Z({g_k}_{k=0}^{+∞}) = Σ_{k=0}^{+∞} g_k z^{−k}.

Assuming |z| > max_{λ∈Λ(A)} |λ|, the z-transform of the sequence {A^k} is then given by

   Z({A^k}) = Σ_{k=0}^{+∞} z^{−k} A^k = I + (1/z) A + (1/z^2) A^2 + ...
            = (I − z^{−1} A)^{−1}
            = z (zI − A)^{−1}.
Methods based on the JCF are sometimes useful, again mostly for small-order problems. Assume that A ∈ R^{n×n} and let X ∈ R_n^{n×n} be such that X^{−1}AX = J, where J is a JCF for A. Then

   A^k = (XJX^{−1})^k = X J^k X^{−1}
       = Σ_{i=1}^n λ_i^k x_i y_i^H        if A is diagonalizable,
       = Σ_{i=1}^m X_i J_i^k Y_i^H        in general.
k If Akk via the formula A Akk = X Jk If A is diagonalizable, diagonalizable, it is then easy to compute A — XJ XX-Il since /* Jk is simply a diagonal matrix.
In the general case, the problem again reduces to the computation of the power of a Jordan block. To be specific, let J_i ∈ C^{p×p} be a Jordan block of the form

   J_i = [ λ   1   0   ...   0 ]
         [ 0   λ   1   ...   0 ]
         [ .        .   .    . ]
         [ 0   ...       λ   1 ]
         [ 0   0   ...   0   λ ].

Writing J_i = λI + N and noting that λI and the nilpotent matrix N commute, it is then straightforward to apply the binomial theorem to (λI + N)^k and verify that

   J_i^k = [ λ^k   kλ^{k−1}   (k choose 2)λ^{k−2}   ...   (k choose p−1)λ^{k−p+1} ]
           [ 0     λ^k        kλ^{k−1}              ...   (k choose p−2)λ^{k−p+2} ]
           [ .                      .                              .              ]
           [ 0     ...              λ^k                        kλ^{k−1}           ]
           [ 0     0         ...    0                          λ^k                ].

The symbol (k choose q) has the usual definition k!/(q!(k − q)!) and is to be interpreted as 0 if k < q.
In the case when λ is complex, a real version of the above can be worked out.
Example 11.15. Let A = [ −4  4 ; −1  0 ]. Then

   A^k = X J^k X^{−1} = [ 2  1 ; 1  1 ] [ (−2)^k   k(−2)^{k−1} ; 0   (−2)^k ] [ 1  −1 ; −1  2 ]
       = [ (−2)^{k−1}(−2 − 2k)   k(−2)^{k+1} ; −k(−2)^{k−1}   (−2)^{k−1}(2k − 2) ].
Basic analogues of other methods such as those mentioned in Section 11.1.6 can also be derived for the computation of matrix powers, but again no universally "best" method exists. For an erudite discussion of the state of the art, see [11, Ch. 18].
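The closed form in Example 11.15 is easy to check numerically. The following sketch (assuming NumPy) compares it with repeated multiplication for the first few powers.

import numpy as np

A = np.array([[-4.0, 4.0], [-1.0, 0.0]])
for k in range(1, 6):
    Ak_closed = np.array([
        [(-2.0)**(k - 1) * (-2 - 2 * k), k * (-2.0)**(k + 1)],
        [-k * (-2.0)**(k - 1), (-2.0)**(k - 1) * (2 * k - 2)],
    ])
    print(k, np.allclose(Ak_closed, np.linalg.matrix_power(A, k)))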
11.3   Higher-Order Equations

It is well known that a higher-order (scalar) linear differential equation can be converted to a first-order linear system. Consider, for example, the initial-value problem

   y^{(n)}(t) + a_{n−1} y^{(n−1)}(t) + ... + a_1 ẏ(t) + a_0 y(t) = φ(t)      (11.17)
with φ(t) a given function and n initial conditions

   y(0) = c_0,   ẏ(0) = c_1,   ...,   y^{(n−1)}(0) = c_{n−1}.            (11.18)
Here, y^{(m)} denotes the mth derivative of y with respect to t. Define a vector x(t) ∈ R^n with components x_1(t) = y(t), x_2(t) = ẏ(t), ..., x_n(t) = y^{(n−1)}(t). Then

   ẋ_1(t) = x_2(t) = ẏ(t),
   ẋ_2(t) = x_3(t) = ÿ(t),
   ...
   ẋ_{n−1}(t) = x_n(t) = y^{(n−1)}(t),
   ẋ_n(t) = y^{(n)}(t) = −a_0 y(t) − a_1 ẏ(t) − ... − a_{n−1} y^{(n−1)}(t) + φ(t)
          = −a_0 x_1(t) − a_1 x_2(t) − ... − a_{n−1} x_n(t) + φ(t).

These equations can then be rewritten as the first-order linear system
   ẋ(t) = [ 0      1      0     ...    0         ]         [ 0 ]
           [ 0      0      1     ...    0         ]         [ 0 ]
           [ .             .     .      .         ] x(t) +  [ . ] φ(t).      (11.19)
           [ 0      0      0     ...    1         ]         [ 0 ]
           [ −a_0   −a_1   ...          −a_{n−1}  ]         [ 1 ]
The initial conditions take the form x(0) = c = [c_0, c_1, ..., c_{n−1}]^T.
Note that det(λI − A) = λ^n + a_{n−1} λ^{n−1} + ... + a_1 λ + a_0. However, the companion matrix A in (11.19) possesses many nasty numerical properties for even moderately sized n and, as mentioned before, is often well worth avoiding, at least for computational purposes.
A similar procedure holds for the conversion of a higher-order difference equation
with n initial conditions, into a linear first-order difference equation with (vector) initial condition.
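The conversion (11.17)-(11.19) is easy to carry out numerically in the homogeneous case. The sketch below (assuming NumPy/SciPy) builds the companion matrix for the illustrative equation ÿ + 2ẏ + y = 0 with y(0) = 1, ẏ(0) = 0, whose exact solution is y(t) = (1 + t)e^{−t}; the coefficients are an arbitrary choice for illustration.

import numpy as np
from scipy.linalg import expm

a = np.array([1.0, 2.0])                  # a_0, a_1
n = len(a)
A = np.zeros((n, n))
A[:-1, 1:] = np.eye(n - 1)
A[-1, :] = -a                             # companion matrix of (11.19)
x0 = np.array([1.0, 0.0])                 # [y(0), y'(0)]

t = 1.5
y = (expm(t * A) @ x0)[0]                 # first component recovers y(t)
print(y, (1 + t) * np.exp(-t))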
EXERCISES

1. Let P ∈ R^{n×n} be a projection. Show that e^P ≈ I + 1.718P.

2. Suppose x, y ∈ R^n and let A = xy^T. Further, let α = x^T y. Show that e^{tA} = I + g(t, α) xy^T, where

   g(t, α) = (e^{αt} − 1)/α   if α ≠ 0,
           = t                if α = 0.

3. Let

   A = [ I   X ]
       [ 0  −I ],

where X ∈ R^{m×n} is arbitrary. Show that

   e^A = [ eI   (sinh 1) X ]
         [ 0    (1/e) I    ].
4. Let Let K denote denote the the skew-symmetric matrix 4. skew-symmetric matrix
0 [ -In
In ] 0 '
2nx2n In denotes the n x n identity matrix. A A matrix A e E R jR2nx2n is said to be where /„ 1 T l T 1 K -I A ATK K = - A and to be be symplectic symplectic if K -I A ATK K = Hamiltonian if K~ = -A and to K~ - AA--I. .
(a) Suppose Suppose E H is is Hamiltonian Hamiltonian and and let let).. (a) A,be be an aneigenvalue eigenvalueof of H. H. Show Showthat that-).. —A,must must also be an an eigenvalue also be eigenvalue of of H. H. (b) is symplectic symplectic and let).. (b) Suppose Suppose SS is and let A.be bean aneigenvalue eigenvalueof ofS.S. Show Showthat that1/).. 1 /A,must must also also be an eigenValue eigenvalue of of S. H S must be (c) Suppose Suppose that H is Hamiltonian and and S is symplectic. symplectic. Show Show that S-I S~1HS Hamiltonian. Hamiltonian.
(d) (d) Suppose Suppose H is Hamiltonian. Show that eHH must be symplectic.
5. Let R and Let a, a, ft f3 € E lR and
Then show that Then show that ectt
cos f3t sin f3t
_eut
ectctrt
e
sin ~t cos/A
J.
6. Find Find aa general general expression expression for for
M 7. Find Find e etA when A = =
8.5. Let Let
(a) Solve the differential equation (a) Solve the differential equation
i
= Ax ;
x(O)
=[ ~
J.
Exercises Exercises
123
(b) equation (b) Solve Solve the the differential differential equation i
= Ax + b;
x(O)
=[
x(O)
= Xo
~
l
9. Consider Consider the the initial-value initial-value problem 9. problem i(t)
=
Ax(t);
for that for tt ~ > O. 0. Suppose Suppose that that A Ee ~nxn E"x" is is skew-symmetric skew-symmetric and and let let ex a == Ilxol12. \\XQ\\2. Show Show that ||*(OII2 = = ex aforallf > 0. I/X(t)1/2 for all t > O.
10. Consider Consider the the n matrix initial-value initial-value problem 10. n xx nn matrix problem X(t)
=
AX(t) - X(t)A;
X(O)
= c.
Show that the eigenvalues eigenvalues of of the solution XX((t) t ) of of this this problem are the the same same as as those those Show that the the solution problem are of all t. of C Cffor or all?. 11. there are three large Asia (A), (A), 11. The The year year is is 2004 2004 and and there are three large "free "free trade trade zones" zones" in in the the world: world: Asia Europe (E), (E), and and the Americas (R). (R). Suppose Suppose certain certain multinational companies have Europe the Americas multinational companies have total assets of $40 trillion $20 trillion is in in E and $20 $20 trillion is in in R. R. Each total assets of $40 trillion of of which which $20 trillion is E and trillion is Each year half half of of the Americas' money stays home, home, aa quarter quarter goes goes to to Europe, Europe, and and aa quarter quarter year the Americas' money stays goes to to Asia. Asia. For Europe and and Asia, Asia, half stays home and half goes to to the Americas. goes For Europe half stays home and half goes the Americas. (a) the matrix that gives gives (a) Find Find the matrix M M that
[ A] E
R
=M
[A] E
R
year k+1
year k
(b) Find the the eigenvalues (b) Find eigenvalues and and right right eigenvectors eigenvectors of of M. M. (c) Find the the distribution the companies' (c) Find distribution of of the companies' assets assets at at year year k. k. (d) Find the the limiting (d) Find limiting distribution distribution of of the the $40 $40 trillion trillion as as the the universe universe ends, ends, i.e., i.e., as as —»• +00 +00 (i.e., (i.e., around around the the time time the the Cubs Cubs win win aa World World Series). Series). kk ---* (Exercise adapted (Exercise adapted from from Problem Problem 5.3.11 5.3.11 in in [24].) [24].)
12. 12.
(a) Find the solution solution of of the the initial-value initial-value problem problem (a) Find the .Yet)
+ 2y(t) + yet) = 0;
yeO)
=
1, .YeO)
= O.
(b) the difference (b) Consider Consider the difference equation equation Zk+2
+ 2Zk+1 + Zk =
O.
If Zo £0 = = 11 and and ZI z\ = 2, is the value of of ZIOOO? ZIQOO? What What is is the value of of Zk Zk in If 2, what what is the value the value in general? general?
Chapter 12
Generalized Eigenvalue Problems
12.1
The Generalized Eigenvalue/Eigenvector Problem
In this chapter we consider the generalized eigenvalue problem

   Ax = λBx,

where A, B ∈ C^{n×n}. The standard eigenvalue problem considered in Chapter 9 obviously corresponds to the special case that B = I.
Definition 12.1. A nonzero vector x ∈ C^n is a right generalized eigenvector of the pair (A, B) with A, B ∈ C^{n×n} if there exists a scalar λ ∈ C, called a generalized eigenvalue, such that

   Ax = λBx.                                                             (12.1)

Similarly, a nonzero vector y ∈ C^n is a left generalized eigenvector corresponding to an eigenvalue λ if

   y^H A = λ y^H B.                                                      (12.2)
When the context is such that no confusion can arise, the adjective "generalized" is usually dropped. As with the standard eigenvalue problem, if x [y] is a right [left] eigenvector, then so is αx [αy] for any nonzero scalar α ∈ C.
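Generalized eigenvalues and eigenvectors can be computed numerically without forming B^{−1}A. The following sketch is a minimal illustration, assuming NumPy/SciPy; the matrices A and B are arbitrary choices, not from the text.

import numpy as np
from scipy.linalg import eig

A = np.array([[2.0, 1.0], [1.0, 3.0]])
B = np.array([[4.0, 1.0], [1.0, 2.0]])
evals, evecs = eig(A, B)                  # generalized eigenvalues and right eigenvectors
for lam, x in zip(evals, evecs.T):
    print(lam, np.allclose(A @ x, lam * (B @ x)))   # verify Ax = lambda Bx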
Remark 12.5. If = I (or in general when B is nonsingular), nonsingular), then rr(A) n ( X ) is a polynomial Remark 12.5. If B = of degree n, and hence there are n eigenvalues associated with the pencil A -— XB. AB. However, eigenvalues However, when B B =II, in particular, when B is singular, there may be 0, k E !!, or infinitely many = I, B k e n, eigenvalues associated AB. For example, suppose associated with the pencil A -— XB. (12.3) where a and (3 ft are scalars. Then the characteristic polynomial is det(A - AB)
=
(I - AHa - (3A)
and there are several cases to consider. Case 1: aa ^ aretwo twoeigenvalues, eigenvalues, I1and and~.|. Case 1: =I- 0, 0, {3ft ^ =I- 0. O. There There are Case 2: 2: a = = 0, 0, {3 f3 =I/ O. 0. There There are are two eigenvalues, I1 and and O. 0. Case two eigenvalues, Case 3: a = = O. 0. There is only one eigenvalue, 1 Case =I- 0, f3 {3 = I (of multiplicity multiplicity 1). 1). Case 4: = 0, 0, f3 = 0. A Ee C C are are eigenvalues eigenvalues since since det(A det(A -— A.B) Case 4: aa = (3 = O. All All A AB) ===0. O. If del det(A AB) is not not identically zero, zero, the pencil pencil A — - XB AB is said to be Definition 12.6. 12.6. If (A -— XB) regular; is said to be singular. singular. regular; otherwise, it is Note that if AA(A) N(A) n n J\f(B) N(B) ^ =I- 0, the associated matrix pencil is singular singular (as in Case 4 above). Associated any matrix B is and corcorAssociated with with any matrix pencil pencil A -— XAB is aa reciprocal reciprocal pencil pencil B — - n,A /.LA and responding generalized eigenvalue problem. Clearly the reciprocal pencil has eigenvalues responding generalized /.L = (JL = £. It It is instructive to consider the reciprocal reciprocal pencil associated with the example in Remark 12.5. With A and B as in (12.3), the characteristic polynomial is
±.
det(B - /.LA) = (1 - /.L)({3 - a/.L) and there are again four cases to consider. Case 1: 1: a =I^ 0, ^ 0. are two two eigenvalues, eigenvalues, 1 and ~. ^. Case 0, {3ft =IO. There There are I and Case I). Case 2: a = = 0, {3ft =I^ O. 0. There is only one eigenvalue, I1 (of multiplicity 1). Case 3: ^ 0, = O. 0. There There are eigenvalues, 11 and Case 3: a =I0, f3 {3 = are two two eigenvalues, and 0. O. Case 4: = 0, 0, (3 (3 = = 0. 6C C are are eigenvalues eigenvalues since since det(B det(B -— /.LA) uA) == = 0. Case 4: a = O. All All A AE O. At least for the case of regular pencils, it is apparent where the "missing" "missing" eigenvalues have gone in Cases 2 and 3. That is to say, there is a second eigenvalue "at infinity" for Case 3 of of - A.B, AB, with its reciprocal reciprocal eigenvalue being 0 in Case Case 3 of of the reciprocal reciprocal pencil B — - /.LA. A— nA. A similar reciprocal reciprocal symmetry symmetry holds for Case Case 2. A similar holds for 2. While there are applications in system theory and control where singular pencils While appear, only the case of of regular regular pencils is considered considered in in the of this this chapter. chapter. Note Note appear, only the case pencils is the remainder remainder of AB always has that A and/or B may still be singular. If B is singular, the pencil pencil A -— KB
12.2. 12.2. Canonical Canonical Forms Forms
127
B is is nonsingular, nonsingular, the the pencil pencil A A --AAB has precisely precisely n fewer than than n eigenvalues. fewer eigenvalues. If B . f i always always has eigenvalues, since the eigenvalue problem easily seen to be eigenvalues, since the generalized generalized eigenvalue problem is is then then easily seen to be equivalent equivalent to the eigenvalue problem problem B~ B- 1lAx Ax = Xx Ax (or AB- 1lw W = Xw). AW). However, However, this this turns turns to the standard standard eigenvalue (or AB~ be aa very very poor poor numerical numerical procedure procedure for for handling handling the out to to be out the generalized generalized eigenvalue eigenvalue problem problem if B is is even even moderately moderately ill conditioned with with respect respect to to inversion. inversion. Numerical Numerical methods methods that that if ill conditioned A and and B are in standard standard textbooks textbooks on on numerical numerical linear linear algebra; algebra; work directly directly on on A work are discussed discussed in see, see, for for example, example, [7, [7, Sec. 7.7] 7.7] or [25, [25, Sec. Sec. 6.7]. 6.7].
12.2 12.2
Canonical Forms Canonical Forms
Just as for the the standard standard eigenvalue eigenvalue problem, problem, canonical forms are are available available for the generalized Just as for canonical forms for the generalized eigenvalue problem. Since the latter involves aa pair of matrices, matrices, we now deal with equivalencies rather rather than than similarities, the first first theorem theorem deals with what what happens happens to to eigenvalues lencies similarities, and and the deals with eigenvalues and eigenvectors under equivalence. and eigenvectors under equivalence.
c
nxn Let A, A, B, with Q and Z nonsingular. nonsingular. Then Theorem 12.7. 12.7. Let fl, Q, Q, Z eE Cnxn with Q and Then
1. the same two 1. the the eigenvalues eigenvalues of of the the problems problems A A — - XB AB and and QAZ QAZ — - XQBZ AQBZ are are the same (the (the two problems are said to problems to be equivalent). ifx isa A-AB, then Z~ Z-llxx isa righteigenvectorofQAZ-AQB 2. ifx is a right eigenvector of of A—XB, is a right eigenvector of QAZ—XQ B Z. Z. left eigenvector of -AB, then Q-H isa left lefteigenvectorofQAZ 3. ify ify isa is a left of A —KB, Q~Hyy isa eigenvector ofQAZ -AQBZ. — XQBZ. Proof: Proof: 1. det(QAZ - AQBZ) = = det[0(A det[Q(A -- XB)Z] AB)Z] = = det det Q det ZZdet(A det(A -- AB). 1. det(QAZ-XQBZ) gdet XB). Since Sincedet detQ0
and det det Z Z are nonzero, the the result result follows. and are nonzero, follows. l if and only if -AB)Z(Z-l 2. The The result result follows follows by bynoting notingthat that (A (A-AB)x –yB)x =- 0Oif andonly if Q(A Q(A-XB)Z(Z~ x)x) ==
o.0.
H 3. Again, the result follows easily by noting that yyH (A — - XB) AB) — 0 o ifif and and only if (A only if H H (Q-H O. 0 ( Q ~ yy)H ) QQ(A ( A –_X BAB)Z )Z = = Q. D
The first form is an analogue of Schur's Schur's Theorem and forms, forms, in fact, the the The first canonical canonical form is an analogue of Theorem and in fact, QZ algorithm, algorithm, which which is the generally preferred method method for for theoretical foundation foundation for for the the QZ theoretical is the generally preferred or [25, solving the the generalized eigenvalue problem; problem; see, for example, example, [7, solving generalized eigenvalue see, for [7, Sec. Sec. 7.7] 7.7] or [25, Sec. Sec. 6.7]. 6.7]. xn nxn Theorem 12.8. Let A, A, B B eE c Then there exist unitary matrices Q, cnxnxn such such that that 12.8. Let Cn .. Then there exist unitary matrices Q, Z eE Cn
QAZ = Ta ,
QBZ = TfJ ,
where are upper Taa and and Tp TfJ are upper triangular. triangular. where T By Theorem Theorem 12.7, the eigenvalues pencil A A— - XB AB are are then the ratios ratios of the diagBy 12.7, the eigenvalues ofthe of the pencil then the of the diagonal elements of to the the corresponding diagonal elements with the the understanding onal elements of Ta Ta to corresponding diagonal elements of of T Tp, understanding fJ , with to an infinite generalized generalized eigenvalue. that aa zero zero diagonal diagonal element that element of of TfJ Tp corresponds corresponds to an infinite eigenvalue. There is also also an an analogue of the Theorem for for real matrices. There is analogue of the Murnaghan-Wintner Murnaghan-Wintner Theorem real matrices.
Chapter 12. Chapter 12. Generalized Generalized Eigenvalue Eigenvalue Problems Problems
128
nxn xn Theorem 12.9. B eE R jRnxn.. Then there exist orthogonal matrices Q, Z e E R" jRnxn such 12.9. Let A, B thnt that
QAZ = S,
QBZ = T,
where T is upper triangular and S is quasi-upper-triangular. quasi-upper-triangular.
When S has a 2 x 2 diagonal block, the 2 x 2 subpencil formed fonned with the corresponding 2 x 2 diagonal diagonal subblock 2x2 subblock of T has a pair of complex conjugate eigenvalues. eigenvalues. Otherwise, real of S to corresponding eigenvalues are given as above by the ratios of diagonal elements of elements of T. T. There is also an analogue of the Jordan canonical form fonn called the Kronecker Kronecker canonical form (KCF). KCF, including analogues of form (KeF). A full description description of the KeF, of principal vectors and of so forth, is beyond the scope of this book. In this chapter, we present only statements of the basic theorems and some examples. The first theorem pertains only to "square" regular pencils, while the full KeF KCF in all its generality applies also to "rectangular" "rectangular" and singular pencils. nxn B eE C cnxn pencil A -— XB AB is regular. Then there Theorem 12.10. Let A, B and suppose the pencil x nxn exist nonsingular nonsingular matrices P, Q € E c C" "such suchthat that
peA - AB)Q =
[~ ~
] - A
[~ ~
l
form corresponding to the finite eigenvalues of of A -A.fi - AB and where J is a Jordan canonical canonical form nilpotent matrix matrix of ofJordan blocks associated associated with 0 and and corresponding to the infinite N is a nilpotent infinite eigenvalues of of A -— AB. XB.
Example 12.11. 12.11. The matrix pencil
[2oo I
0
0
2 0 0
o o
0 0 0
1 0 0 1 0 0
~ ]-> [~
0 I 0 0 0
0 0 0 0 0
o I
o 0
0] 0 0 0 0
2 (X — with characteristic polynomial (A - 2) 2)2 has a finite eigenvalue 2 2 of multiplicty 2 2 and three infinite eigenvalues. mxn Theorem 12.12 12.12 (Kronecker Canonical Form). Let A, B eE c Cmxn .• Then there exist mxm nxn mxm nxn nonsingular matrices P eE c nonsingular C and Q Q eE c C such that
peA - AB)Q
= diag(LII' ... , L l"
L~, ...• L;'. J - A.I, I - )"N),
12.2. Canonical Canonical Forms Forms 12.2.
129
where is nilpotent, nilpotent, both both N and JJ are in Jordan canonical form, is the the (k (k + + I) 1) xx kk Nand are in Jordan canonical form, and and L^ Lk is where N N is bidiagonal pencil bidiagonal pencil
-A
0
0
-A Lk
=
0
0
-A 0
0
I
The /( are called called the indices while the r, called the the right right minimal indices. The Ii are the left left minimal minimal indices while the ri are are called minimal indices. Left or right minimal indices can take the value O. Left 0. Example 12.13. Consider a 13 x 12 block diagonal matrix whose diagonal blocks are
-A 0] I
o
-A I
.
Such a matrix is in KCF. The first block of zeros actually corresponds Lo, LQ, Lo, LQ, Lo, LQ L6,, corresponds to LQ, LQ, L6, where each LQ Lo has "zero columns" and one row, while each LQ L6 has "zero rows" and one second block L\ while the L\. The next one column. The second block is L\ the third block block is is LInext two two blocks correspond correspond to 21 0 2
J =
[
o
0
while the nilpotent matrix N N in this example is
[ ~6~]. 000
Just as sets of eigenvectors eigenvectors span A-invariant subspaces in the case of the standard eigenproblem eigenproblem (recall Definition 9.35), there is an analogous geometric concept for the generalized eigenproblem. eigenproblem. generalized lxn Definition 12.14. Let A, B eE W ~nxn and suppose suppose the pencil pencil A -— XB AB is regular. Then V is a deflating deflating subspace subspace ifif
dim(AV
+ BV) =
dimV.
(12.4)
eigenvalue case, there is a matrix characterization characterization of deflating Just as in the standard eigenvalue xk subspace. Specifically, suppose S eE Rn* ~nxk is a matrix whose columns span a k-dimensional ^-dimensional subspace S S of ~n, Rn, i.e., i.e., n(S) R ( S ) = S.
(12.5)
130
Chapter 12. Generalized Eigenvalue Problems
If = /, (12.4) becomes dim(AV + V) V) == dim dimV, clearly equivalent equivalent to If B B = I, then then (12.4) becomes dim (A V + V, which which is is clearly to AV V. Similarly, Similarly, (12.5) as before. If the pencil pencil is AV c~ V. (12.5) becomes becomes AS AS = = SM SM as before. lEthe is not not regular, regular, there there is aa concept reducing subspace. is concept analogous analogous to to deflating deflating subspace subspace called called aa reducing subspace.
12.3 12.3
Application the Computation Computation of of System System Zeros Zeros Application to to the
Consider the linear svstem Consider the linear system i y
= Ax + Bu, = Cx + Du
nxn xm pxn pxm jRnxn,, B € E R" jRnxm,, C e E R jRPxn,, and jRPxm.. This with A €E M and D €E R This linear linear time-invariant statespace model model is control theory, is called called the state space is often often used used in in multivariable multivariable control theory, where where x(= x(= x(t)) x(t)) is the state vector, u u is the vector vector of controls, and is the the vector vector, is the of inputs inputs or or controls, and yy is vector of of outputs outputs or or observables. observables. For details, For details, see, see, for for example, example, [26]. [26]. In general, general, the the (finite) (finite) zeros of this system are given by the (finite) (finite) complex complex numbers In zeros of this system are given by the numbers where the the "system pencil" z, where "system pencil"
(12.6) drops rank. rank. In the special special case case p these values values are are the the generalized generalized eigenvalues the drops In the p = = m, m, these eigenvalues of of the (n + + m) (n + m) (n m) x x (n m) pencil. pencil.
Example 12.15. Let Example 12.15. Let A=[
-4
C
2
=
[I 2],
D=O.
Then the transfer (see [26]) [26)) of Then the transfer matrix matrix (see of this this system system is is
+ 14 ' + 3s + 2
55
g(5)=C(sI-A)-'B+D=
2 5
which clearly has aa zero zero at at -2.8. Checking the finite eigenvalues of the the pencil we which clearly has —2.8. Checking the finite eigenvalues of pencil (12.6), (12.6), we find find the the characteristic characteristic polynomial polynomial to to be be det [
A-c M DB] "'" 5A + 14,
which has root at -2.8. which has aa root at —2.8. The method method of of finding via aa generalized generalized eigenvalue problem also works The finding system system zeros zeros via eigenvalue problem also works well for for general multi-output systems. Numerically, however, however, one must be well general mUlti-input, multi-input, multi-output systems. Numerically, one must be careful first first to to "deflate out" the the infinite (infinite eigenvalues of (12.6». This is careful "deflate out" infinite zeros zeros (infinite eigenvalues of (12.6)). This is accomaccomplished computing aa certain certain unitary unitary equivalence equivalence on system pencil that then yields aa by computing on the the system pencil that then yields plished by smaller eigenvalue problem problem with with only only finite finite generalized generalized eigenvalues (the finite finite smaller generalized generalized eigenvalue eigenvalues (the zeros). zeros). The connection between system zeros zeros and and the system pencil is nonThe connection between system the corresponding corresponding system pencil is nonof aa single-input, single-input. trivial. However, However, we we offer some insight insight below below into the special case of trivial. offer some into the special case
12.4. Symmetric Generalized Eigenvalue Eigenvalue Problems Problems 12.4. Symmetric Generalized
131 131
1 lxn single-output system. Specifically, let B = bb E ffi.n, C = c T E ffi.l xn,, and D =d E R e Rn, e R and D e R. r ! T g(s) = cc (s7 (s I -— A)~ A) -1Z? b+ Furthermore, let g(.s) + dd denote the system transfer function function (matrix), and assume that gg(s) ( s ) can in the can be be written written in the form form and assume that
v(s) g(s) = n(s)'
polynomial of A, A, and v(s) relatively prime where n(s) TT(S) is the characteristic polynomial v(s) and n(s) TT(S) are relatively (i.e., there are no "pole/zero "pole/zero cancellations"). cancellations"). Suppose Zz E€ C is is such such that that Suppose [
A - zI cT
b ]
d
is singular. Then there exists a nonzero solution to
or or
+ by =
0,
(12.7)
c T x +dy = O.
(12.8)
(A - zl)x
A (i.e., no pole/zero pole/zero cancellations), then from (12.7) we Assuming z is not an eigenvalue of A get get x = -(A - zl)-lby. (12.9)
Substituting this (12.8), we have Substituting this in in (12.8), we have _c T (A - zl)-lby
+ dy =
0,
or ( z ) y = 00 by definition of of g. ^ 00 (else from (12.9)). or gg(z)y by the the definition g. Now Now _y y 1= (else xx = 00 from (12.9». Hence Hence g(z) g(z) = 0, 0, i.e., zz is a zero of g. g.
12.4 12.4
Symmetric Symmetric Generalized Generalized Eigenvalue Eigenvalue Problems Problems
A very important special case of the generalized eigenvalue problem Ax = ABx
(12.10)
nxn for A, A, B ffi.nxn arises when A A = A AT and B = BT the second-order B Ee R and B B1 > O. 0. For example, the system of differential differential equations
Mx+Kx=O, M is a symmetric positive definite K is a symmetric "stiffness where M definite "mass matrix" and K "stiffness matrix," is a frequently frequently employed model of structures or vibrating systems and yields a generalized eigenvalue problem ofthe of the form (12.10). Since B definite it Thus, the (12.10) is is equivalent Since B is is positive positive definite it is is nonsingular. nonsingular. Thus, the problem problem (12.10) equivalent Ax = AX. However, B~ B-11AA is not necessarily to the standard eigenvalue problem BB~l1Ax = AJC. symmetric.
132 132
Chapter 12. 12. Generalized Eigenvalue Problems Problems Chapter Generalized Eigenvalue
Example 12.16. Let Example 12.16. Let A A
= [~ ;
l = [i ~ J
B~Il = [-~ ~
ThenB~ AA Then
B
J
B~Il A A are always real (and are approximately approximately 2.1926 Nevertheless, the eigenvalues of B and -3.1926 in Example 12.16). nxn T T Theorem 12.17. Let A, A, B B E jRnxn with A A =A AT and B B = B BT > O. Then the eR and > 0. the generalized eigenvalue problem eigenvalue problem Ax = ABx
has n real eigenvalues, and the n corresponding right eigenvectors can be chosen to be orthogonal with respect to the inner product (x, y)_B = x^T B y. Moreover, if A > 0, then the eigenvalues are also all positive.

Proof: Since B > 0, it has a Cholesky factorization B = LL^T, where L is nonsingular (Theorem 10.23). Then the eigenvalue problem

    Ax = λBx = λLL^T x

can be rewritten as the equivalent problem

    L^{-1}AL^{-T}(L^T x) = λ(L^T x).                             (12.11)

Letting C = L^{-1}AL^{-T} and z = L^T x, (12.11) can then be rewritten as

    Cz = λz.                                                     (12.12)

Since C = C^T, the eigenproblem (12.12) has n real eigenvalues, with corresponding eigenvectors z_1, ..., z_n satisfying

    z_i^T z_j = δ_ij.

Then x_i = L^{-T}z_i, for i = 1, ..., n, are eigenvectors of the original generalized eigenvalue problem and satisfy

    (x_i, x_j)_B = x_i^T B x_j = (z_i^T L^{-1})(LL^T)(L^{-T}z_j) = δ_ij.

Finally, if A = A^T > 0, then C = C^T > 0, so the eigenvalues are positive.    □
Example 12.18. The Cholesky factor for the matrix B in Example 12.16 is

    L = [  √2       0   ]
        [ 1/√2    1/√2  ].

Then it is easily checked that

    C = L^{-1}AL^{-T} = [ 0.5    2.5 ]
                        [ 2.5   -1.5 ],

whose eigenvalues are approximately 2.1926 and −3.1926, as expected.

The material of this section can, of course, be generalized easily to the case where A and B are Hermitian, but since real-valued matrices are commonly used in most applications, we have restricted our attention to that case only.
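The Cholesky reduction used in the proof of Theorem 12.17 is easy to carry out numerically. The sketch below is not part of the text; it uses the matrices of Example 12.16 as reconstructed above and compares with a library solver for the symmetric generalized eigenvalue problem (SciPy assumed).

    # Sketch: Theorem 12.17 via the Cholesky reduction, for Example 12.16.
    import numpy as np
    from scipy.linalg import cholesky, eigh

    A = np.array([[1.0, 3.0], [3.0, 2.0]])
    B = np.array([[2.0, 1.0], [1.0, 1.0]])

    L = cholesky(B, lower=True)                          # B = L L^T
    C = np.linalg.solve(L, np.linalg.solve(L, A).T).T    # C = L^{-1} A L^{-T}
    lam, Z = np.linalg.eigh(C)                           # real eigenvalues of C
    X = np.linalg.solve(L.T, Z)                          # x_i = L^{-T} z_i

    print(lam)                             # approx [-3.1926, 2.1926]
    print(X.T @ B @ X)                     # approx I: eigenvectors are B-orthogonal
    print(eigh(A, B, eigvals_only=True))   # same eigenvalues from scipy directly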
12.5    Simultaneous Diagonalization
Recall that many matrices can be diagonalized by a similarity. In particular, normal matrices can be diagonalized by a unitary similarity. It turns out that in some cases a pair of matrices (A, B) can be simultaneously diagonalized by the same matrix. There are many such results and we present only a representative (but important and useful) theorem here. Again, we restrict our attention only to the real case, with the complex case following in a straightforward way.

Theorem 12.19 (Simultaneous Reduction to Diagonal Form). Let A, B ∈ R^{n×n} with A = A^T and B = B^T > 0. Then there exists a nonsingular matrix Q such that

    Q^T A Q = D,    Q^T B Q = I,

where D is diagonal. In fact, the diagonal elements of D are the eigenvalues of B^{-1}A.

Proof: Let B = LL^T be the Cholesky factorization of B and set C = L^{-1}AL^{-T}. Since C is symmetric, there exists an orthogonal matrix P such that P^T C P = D, where D is diagonal. Let Q = L^{-T}P. Then

    Q^T A Q = P^T L^{-1} A L^{-T} P = P^T C P = D

and

    Q^T B Q = P^T L^{-1}(LL^T)L^{-T} P = P^T P = I.

Finally, since QDQ^{-1} = QQ^T A QQ^{-1} = L^{-T}L^{-1}A = B^{-1}A, we have Λ(D) = Λ(B^{-1}A).    □
Note that Q is not in general orthogonal, so it does not preserve eigenvalues of A and B individually. However, it does preserve the eigenvalues of A − λB. This can be seen directly. Let Ã = Q^T A Q and B̃ = Q^T B Q. Then B̃^{-1}Ã = Q^{-1}B^{-1}Q^{-T}Q^T A Q = Q^{-1}B^{-1}AQ.

Theorem 12.19 is very useful for reducing many statements about pairs of symmetric matrices to "the diagonal case." The following is typical.

Theorem 12.20. Let A, B ∈ R^{n×n} be positive definite. Then A ≥ B if and only if B^{-1} ≥ A^{-1}.

Proof: By Theorem 12.19, there exists a nonsingular Q ∈ R^{n×n} such that Q^T A Q = D and Q^T B Q = I, where D is diagonal. Now D > 0 by Theorem 10.31. Also, since A ≥ B, by Theorem 10.21 we have that Q^T A Q ≥ Q^T B Q, i.e., D ≥ I. But then D^{-1} ≤ I (this is trivially true since the two matrices are diagonal). Thus, QD^{-1}Q^T ≤ QQ^T, i.e., A^{-1} ≤ B^{-1}. The converse follows by reversing the roles of A and B.    □
12.5.1    Simultaneous diagonalization via SVD

There are situations in which forming C = L^{-1}AL^{-T} as in the proof of Theorem 12.19 is numerically problematic, e.g., when L is highly ill conditioned with respect to inversion. In such cases, simultaneous reduction can also be accomplished via an SVD. To illustrate, let
us assume that both A and B are positive definite. Further, let A = L_A L_A^T and B = L_B L_B^T be Cholesky factorizations of A and B, respectively. Compute the SVD

    L_B^{-1} L_A = U Σ V^T,                                      (12.13)

where Σ ∈ R^{n×n} is diagonal. Then the matrix Q = L_B^{-T} U performs the simultaneous diagonalization. To check this, note that

    Q^T A Q = U^T L_B^{-1} (L_A L_A^T) L_B^{-T} U
            = U^T U Σ V^T V Σ U^T U
            = Σ^2,

while

    Q^T B Q = U^T L_B^{-1} (L_B L_B^T) L_B^{-T} U = U^T U = I.
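A short numerical sketch of this SVD-based reduction follows. It is not part of the text; the positive definite matrices are made up for illustration, and SciPy is assumed.

    # Sketch: simultaneous diagonalization via the SVD (12.13).
    import numpy as np
    from scipy.linalg import cholesky, svd

    rng = np.random.default_rng(1)
    n = 4
    M1 = rng.standard_normal((n, n)); A = M1 @ M1.T + n * np.eye(n)
    M2 = rng.standard_normal((n, n)); B = M2 @ M2.T + n * np.eye(n)

    LA = cholesky(A, lower=True)                 # A = L_A L_A^T
    LB = cholesky(B, lower=True)                 # B = L_B L_B^T
    U, s, Vt = svd(np.linalg.solve(LB, LA))      # L_B^{-1} L_A = U diag(s) V^T
    Q = np.linalg.solve(LB.T, U)                 # Q = L_B^{-T} U

    print(np.allclose(Q.T @ A @ Q, np.diag(s**2)))   # True: Q^T A Q = Sigma^2
    print(np.allclose(Q.T @ B @ Q, np.eye(n)))       # True: Q^T B Q = I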
Remark 12.21. The SVD in (12.13) can be computed without explicitly forming the indicated matrix product or the inverse by using the so-called generalized singular value decomposition (GSVD). Note that

    (L_B^{-1} L_A)(L_B^{-1} L_A)^T = L_B^{-1} L_A L_A^T L_B^{-T},

and thus the singular values of L_B^{-1} L_A can be found from the eigenvalue problem

    L_B^{-1} L_A L_A^T L_B^{-T} z = λ z.                         (12.14)

Letting x = L_B^{-T} z we see that (12.14) can be rewritten in the form L_A L_A^T x = λ L_B z = λ L_B L_B^T L_B^{-T} z, which is thus equivalent to the generalized eigenvalue problem

    L_A L_A^T x = λ L_B L_B^T x.                                 (12.15)

The problem (12.15) is called a generalized singular value problem and algorithms exist to solve it (and hence equivalently (12.13)) via arithmetic operations performed only on L_A and L_B separately, i.e., without forming the products L_A L_A^T or L_B L_B^T explicitly; see, for example, [7, Sec. 8.7.3]. This is analogous to finding the singular values of a matrix M by operations performed directly on M rather than by forming the matrix M^T M and solving the eigenproblem M^T M x = λ x.
Remark 12.22. Various generalizations of the results in Remark 12.21 are possible, for example, when A = A^T ≥ 0. The case when A is symmetric but indefinite is not so straightforward, at least in real arithmetic. For example, A can be written as A = PDP^T, where D is diagonal and P is orthogonal, but in writing A = PD^{1/2}D^{1/2}P^T = (PD^{1/2})(PD^{1/2})^T with D^{1/2} diagonal, D^{1/2} may have pure imaginary elements.
12.6    Higher-Order Eigenvalue Problems
Consider the second-order system of differential equations

    M q̈ + C q̇ + K q = 0,                                        (12.16)

where q(t) ∈ R^n and M, C, K ∈ R^{n×n}. Assume for simplicity that M is nonsingular. Suppose, by analogy with the first-order case, that we try to find a solution of (12.16) of the form q(t) = e^{λt} p, where the n-vector p and scalar λ are to be determined. Substituting in (12.16) we get

    λ^2 e^{λt} M p + λ e^{λt} C p + e^{λt} K p = 0

or, since e^{λt} ≠ 0,

    (λ^2 M + λ C + K) p = 0.

To get a nonzero solution p, we thus seek values of λ for which the matrix λ^2 M + λ C + K is singular. Since the determinantal equation

    0 = det(λ^2 M + λ C + K) = λ^{2n} + ...

yields a polynomial of degree 2n, there are 2n eigenvalues for the second-order (or quadratic) eigenvalue problem λ^2 M + λ C + K.

A special case of (12.16) arises frequently in applications: M = I, C = 0, and K = K^T. Suppose K has eigenvalues

    μ_1 ≥ ... ≥ μ_r ≥ 0 > μ_{r+1} ≥ ... ≥ μ_n.

Let ω_k = |μ_k|^{1/2}. Then the 2n eigenvalues of the second-order eigenvalue problem λ^2 I + K are

    ± j ω_k ,    k = 1, ..., r,
    ± ω_k ,      k = r + 1, ..., n.

If r = n (i.e., K = K^T ≥ 0), then all solutions of q̈ + K q = 0 are oscillatory.
12.6.1    Conversion to first-order form

Let x_1 = q and x_2 = q̇. Then (12.16) can be written as a first-order system (with block companion matrix)

    ẋ = [      0             I      ] x,
        [ -M^{-1}K      -M^{-1}C    ]

where x(t) ∈ R^{2n}. If M is singular, or if it is desired to avoid the calculation of M^{-1} because M is too ill conditioned with respect to inversion, the second-order problem (12.16) can still be converted to the first-order generalized linear system

    [ I   0 ]         [  0     I ]
    [ 0   M ] ẋ  =    [ -K    -C ] x.
Many other first-order realizations are possible. Some can be useful when M, C, and/or K have special symmetry or skew-symmetry properties that can be exploited.

Higher-order analogues of (12.16) involving, say, the kth derivative of q, lead naturally to higher-order eigenvalue problems that can be converted to first-order form using a kn × kn block companion matrix analogue of (11.19). Similar procedures hold for the general kth-order difference equation, which can be converted to various first-order systems of dimension kn.
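The block companion linearization above is straightforward to exercise numerically. The following sketch is not part of the text; M, C, and K are made up. It forms the first-order matrix and checks that each of its eigenvalues makes λ^2 M + λC + K (nearly) singular.

    # Sketch: linearizing the quadratic eigenvalue problem det(l^2 M + l C + K) = 0.
    import numpy as np

    rng = np.random.default_rng(2)
    n = 3
    M = np.eye(n)
    C = rng.standard_normal((n, n))
    K = rng.standard_normal((n, n))

    F = np.block([[np.zeros((n, n)), np.eye(n)],
                  [-np.linalg.solve(M, K), -np.linalg.solve(M, C)]])
    lams = np.linalg.eigvals(F)                  # the 2n quadratic eigenvalues

    smallest_sv = [np.linalg.svd(l**2 * M + l * C + K, compute_uv=False)[-1]
                   for l in lams]
    print(max(smallest_sv))                      # tiny: each eigenvalue makes the matrix singular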
EXERCISES

1. Suppose A ∈ R^{n×n} and D ∈ R^{m×m} is nonsingular. Show that the finite generalized eigenvalues of the pencil

       [ A   B ]       [ I   0 ]
       [ C   D ]  −  λ [ 0   0 ]

   are the eigenvalues of the matrix A − B D^{-1} C.

2. Let F, G ∈ C^{n×n}. Show that the nonzero eigenvalues of FG and GF are the same.
   Hint: An easy "trick proof" is to verify that the matrices

       [ FG   0 ]        [ 0    0  ]
       [  G   0 ]  and   [ G   GF  ]

   are similar via the similarity transformation

       [ I   F ]
       [ 0   I ].

3. Let F ∈ C^{n×m}, G ∈ C^{m×n}. Are the nonzero singular values of FG and GF the same?

4. Suppose A ∈ R^{n×n}, B ∈ R^{n×m}, and C ∈ R^{m×n}. Show that the generalized eigenvalues of the pencils

       [ A   B ]       [ I   0 ]
       [ C   0 ]  −  λ [ 0   0 ]

   and

       [ A + BF + GC   B ]       [ I   0 ]
       [      C        0 ]  −  λ [ 0   0 ]

   are identical for all F ∈ R^{m×n} and all G ∈ R^{n×m}.
   Hint: Consider the equivalence

       [ I   G ] [ A − λI   B ] [ I   0 ]
       [ 0   I ] [    C     0 ] [ F   I ].

   (A similar result is also true for "nonsquare" pencils. In the parlance of control theory, such results show that zeros are invariant under state feedback or output injection.)
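As a quick numerical sanity check of Exercise 2 (a sketch, not a solution from the text), the eigenvalues of FG and GF can be compared for random complex F and G:

    # Sketch: eigenvalues of FG and GF coincide (Exercise 2).
    import numpy as np

    rng = np.random.default_rng(3)
    F = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
    G = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))

    eig_fg = np.linalg.eigvals(F @ G)
    eig_gf = np.linalg.eigvals(G @ F)
    gaps = [min(abs(eig_gf - z)) for z in eig_fg]
    print(max(gaps))                       # tiny: the two spectra agree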
5. Another family of simultaneous diagonalization problems arises when it is desired that the simultaneous diagonalizing transformation Q operates on matrices A, B ∈ R^{n×n} in such a way that Q^{-1}AQ^{-T} and Q^T B Q are simultaneously diagonal. Such a transformation is called contragredient. Consider the case where both A and B are positive definite with Cholesky factorizations A = L_A L_A^T and B = L_B L_B^T, respectively, and let U Σ V^T be an SVD of L_B^T L_A.

   (a) Show that Q = L_A V Σ^{-1/2} is a contragredient transformation that reduces both A and B to the same diagonal matrix.

   (b) Show that Q^{-1} = Σ^{-1/2} U^T L_B^T.

   (c) Show that the eigenvalues of AB are the same as those of Σ^2 and hence are positive.
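A numerical illustration of the contragredient construction in part (a) is sketched below. It is not part of the text and is not a substitute for the proof asked for in the exercise; the matrices are made up, and SciPy is assumed.

    # Sketch: contragredient transformation Q = L_A V Sigma^{-1/2} (Exercise 5).
    import numpy as np
    from scipy.linalg import cholesky, svd

    rng = np.random.default_rng(4)
    n = 4
    M1 = rng.standard_normal((n, n)); A = M1 @ M1.T + n * np.eye(n)
    M2 = rng.standard_normal((n, n)); B = M2 @ M2.T + n * np.eye(n)

    LA = cholesky(A, lower=True)
    LB = cholesky(B, lower=True)
    U, s, Vt = svd(LB.T @ LA)                    # L_B^T L_A = U diag(s) V^T
    Q = LA @ Vt.T @ np.diag(s**-0.5)             # Q = L_A V Sigma^{-1/2}

    D1 = np.linalg.inv(Q) @ A @ np.linalg.inv(Q).T    # Q^{-1} A Q^{-T}
    D2 = Q.T @ B @ Q                                   # Q^T B Q
    print(np.allclose(D1, np.diag(s)), np.allclose(D2, np.diag(s)))   # True True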
Chapter 13

Kronecker Products

13.1    Definition and Examples
Definition 13.1. Let A ∈ R^{m×n}, B ∈ R^{p×q}. Then the Kronecker product (or tensor product) of A and B is defined as the matrix

    A ⊗ B = [ a_11 B   ...   a_1n B ]
            [    .               .  ]
            [    .               .  ]    ∈ R^{mp×nq}.            (13.1)
            [ a_m1 B   ...   a_mn B ]
Obviously, the same definition holds if A and B are complex-valued matrices. We restrict our attention in this chapter primarily to real-valued matrices, pointing out the extension to the complex case only where it is not obvious.

Example 13.2.

1. Let A = [ 1  2  3 ]  and  B = [ 2  1 ].  Then
           [ 3  2  1 ]           [ 2  3 ]

       A ⊗ B = [  B   2B   3B ]  =  [ 2  1  4  2  6  3 ]
               [ 3B   2B    B ]     [ 2  3  4  6  6  9 ]
                                    [ 6  3  4  2  2  1 ]
                                    [ 6  9  4  6  2  3 ].

   Note that B ⊗ A ≠ A ⊗ B.
2. For any B ∈ R^{p×q},

       I_2 ⊗ B = [ B   0 ]
                 [ 0   B ].

   Replacing I_2 by I_n yields a block diagonal matrix with n copies of B along the diagonal.

3. Let B be an arbitrary 2 × 2 matrix. Then

       B ⊗ I_2 = [ b_11    0    b_12    0   ]
                 [   0   b_11     0   b_12  ]
                 [ b_21    0    b_22    0   ]
                 [   0   b_21     0   b_22  ].
   The extension to arbitrary B and I_n is obvious.

4. Let x ∈ R^m, y ∈ R^n. Then

       x ⊗ y = [x_1 y^T, ..., x_m y^T]^T
             = [x_1 y_1, ..., x_1 y_n, x_2 y_1, ..., x_m y_n]^T ∈ R^{mn}.

5. Let x ∈ R^m, y ∈ R^n. Then

       x ⊗ y^T = [ x_1 y^T ]
                 [    .    ]
                 [    .    ]   = x y^T ∈ R^{m×n}.
                 [ x_m y^T ]
13.2    Properties of the Kronecker Product

Theorem 13.3. Let A ∈ R^{m×n}, B ∈ R^{r×s}, C ∈ R^{n×p}, and D ∈ R^{s×t}. Then

    (A ⊗ B)(C ⊗ D) = AC ⊗ BD    (∈ R^{mr×pt}).                   (13.2)

Proof: Simply verify that

    (A ⊗ B)(C ⊗ D) = [ Σ_{k=1}^n a_1k c_k1 BD   ...   Σ_{k=1}^n a_1k c_kp BD ]
                     [            .                              .           ]
                     [ Σ_{k=1}^n a_mk c_k1 BD   ...   Σ_{k=1}^n a_mk c_kp BD ]
                   = AC ⊗ BD.    □
Theorem 13.4. For all A and B, (A ⊗ B)^T = A^T ⊗ B^T.

Proof: For the proof, simply verify using the definitions of transpose and Kronecker product.    □

Corollary 13.5. If A ∈ R^{n×n} and B ∈ R^{m×m} are symmetric, then A ⊗ B is symmetric.

Theorem 13.6. If A and B are nonsingular, (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}.

Proof: Using Theorem 13.3, simply note that (A ⊗ B)(A^{-1} ⊗ B^{-1}) = I ⊗ I = I.    □
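These algebraic properties are easy to confirm numerically with np.kron. The sketch below is not part of the text; the matrices are random and serve only to check the mixed-product rule (13.2), Theorem 13.4, and Theorem 13.6.

    # Sketch: checks of (A kron B)(C kron D) = AC kron BD, the transpose rule,
    # and the inverse rule.
    import numpy as np

    rng = np.random.default_rng(5)
    A = rng.standard_normal((2, 3)); B = rng.standard_normal((4, 2))
    C = rng.standard_normal((3, 5)); D = rng.standard_normal((2, 3))

    print(np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D)))  # (13.2)
    print(np.allclose(np.kron(A, B).T, np.kron(A.T, B.T)))                    # Thm 13.4

    A2 = rng.standard_normal((3, 3)); B2 = rng.standard_normal((2, 2))
    print(np.allclose(np.linalg.inv(np.kron(A2, B2)),
                      np.kron(np.linalg.inv(A2), np.linalg.inv(B2))))         # Thm 13.6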
Theorem 13.7. If A ∈ R^{n×n} and B ∈ R^{m×m} are normal, then A ⊗ B is normal.

Proof:

    (A ⊗ B)^T (A ⊗ B) = (A^T ⊗ B^T)(A ⊗ B)      by Theorem 13.4
                       = A^T A ⊗ B^T B           by Theorem 13.3
                       = A A^T ⊗ B B^T           since A and B are normal
                       = (A ⊗ B)(A ⊗ B)^T        by Theorem 13.3.    □

Corollary 13.8. If A ∈ R^{n×n} is orthogonal and B ∈ R^{m×m} is orthogonal, then A ⊗ B is orthogonal.
vI
mx Theorem 13.10. Lgf " have l/^E^Vj an^ /ef Theorem 13.10. Let A A EG E IR mxn have aa singular singular value value decomposition decomposition VA ~A and let pxq pxq IR singular value decomposition decomposition V B ~B VI. Then B E fi e^ have a singular UB^B^B-
yields of A (after aasimple simplereordering reorderingof ofthe thediagonal diagonal yields aa singular singular value value decomposition decomposition of A <8> 0 BB (after elements of O/£A <8>~B £5 and andthe thecorresponding correspondingright rightand andleft left singular singularvectors). vectors). ~A 0 elements q Corollary Corollary 13.11. Let A E e lR;"xn R™x" have singular singular values UI a\ :::: > ... • • • :::: > Uarr > > 0 and let B Ee IRfx have singular singular values values ... • • • :::: > T 0. Then Then A (or BB 0<8>A)A)has hasrsrssingular singularvalues values have > O. A ... • • • :::: > UffrrT 0 Qand rank(A 0 B)
= (rankA)(rankB) = rank(B 0
A) .
mmxw IR nnx xn"have xm have Theorem 13.12. Let A E e R haveeigenvalues eigenvaluesAi,A.,-,i / E e!!,n,and andletletBB E e IRR /zave eigenvalues € m. TTzen the ?/ze mn mn eigenvalues eigenvalues of of A® are eigenvalues jJij, JL j, 7j E m. Then A0 B Bare
Moreover, if x\, ...,, xxp are linearly independent right right eigenvectors eigenvectors of of A corresponding Moreover, if Xl, ••. linearly independent A corresponding p are AI, App (p (p < ::::: n), and and zi, ZI, ... independent right eigenvectors of of B to A - i ... , . . . ,, A. • • •,,Zq zq are linearly independent mnm"are corresponding to to JJL\ ...,,JLq \Juq (q (q < then ;c, <8>ZjZj E€ IR ffi. are linearly linearlyindependent independent right right corresponding JLI,, ... ::::: m), m), then Xi 0 eigenvectors of of A® to A.,/u, e l!! /?, 7j Ee 1· q. A0 B B corresponding corresponding to Ai JL j,7, ii E eigenvectors
Proof: The basic idea of the proof proof is as follows: Proof: follows: (A 0 B)(x 0 z) = Ax 0 Bz
=AX 0
JLZ
= AJL(X 0 z).
0
If and Bare B are diagonalizable diagonalizable in in Theorem Theorem 13.12, 13.12, we can take n and q —m If A A and we can take p p = nand q = m and and thus get the <8>B. B. InIngeneral, general,ififAAand and Bfi have haveJordan Jordan form form thus get the complete complete eigenstructure eigenstructure of of A A0
142 142
Chapter 1 13. Chapter 3. Kronecker Kronecker Products Products
decompositions given given by by P~ p-lI AP AP = JJA and Q-l BQ = JB, J B , respectively, respectively, then then we we get the decompositions Q~] BQ get the A and following Jordan-like Jordan-like structure: following structure: (P ® Q)-I(A ® B)(P ® Q) = (P- I ® Q-l)(A ® B)(P ® Q) = (P- 1 AP) ® (Q-l BQ)
= JA ® JB · Note that that JA® h ® JB, JR, while while upper upper triangular, triangular, is generally not not quite quite in and needs Note is generally in Jordan Jordan form form and needs further reduction reduction (to (to an ultimate Jordan form that that also depends on on whether whether or or not not certain further an ultimate Jordan form also depends certain eigenvalues are are zero zero or or nonzero). eigenvalues nonzero). A Schur Schur form form for for A ®B B can can be derived similarly. suppose P A ® be derived similarly. For For example, example, suppose P and and Q are i.e., are unitary unitary matrices matrices that that reduce reduce A A and and B, 5, respectively, respectively, to to Schur Schur (triangular) (triangular) form, form, i.e., H H pH AP = = T TAA and and Q QH BQ = = T TBB (and (and similarly similarly if if P and and Q are are orthogonal orthogonal similarities similarities P AP BQ reducing Schur form). Then reducing A A and and B B to to real real Schur form). Then (P ® Q)H (A ® B)(P ® Q) = (pH ® QH)(A ® B)(P ® Q)
= (pH AP) ® (QH BQ) = TA ® TR . IRnnxn xn and B e E R IR rnmxm xm.. Then Corollary 13.13. 13.13. Let A eE R 1. Tr(A ® B) = (TrA)(TrB) = Tr(B ® A). 2. det(A ® B) = (det A)m(det Bt = det(B ® A). mxm Definition 13.14. IR nnxn Xn and B e E R IRm xrn.. Then the Kronecker Kronecker sum (or tensor sum) Definition 13.14. Let A eE R of A and B, B, denoted is the (Im A)++ (B (B ®®In). /„).Note Note that, that,inin of A and denoted A A © EEl B, B, is the mn mn x mn mn matrix matrix Urn ® A) general, ^ B B© general, A A® EEl B B i= EEl A. A.
Example Example 13.15. 13.15. 1. 1. Let Let
A~U
2 2
!]andB~[ ; ~l
Then Then
3 AfflB = (h®A)+(B®h) =
1
2 2 1
3 0 1 0 4 0
0 0 0 0 0 0 0 0 0
3
0 0 0 2 2
0 0 0 3 4
+
2 0
0
0
2
0 0
0 0
0
2
2
0 0 2 0 0 2
0 0 0
1
0 0
0 3 0 0 0 3 0 0 0 3
The B0 (A 0h) /2)and andnote notethe thedifference difference The reader reader is is invited invited to to compute compute B EEl A A = = (/3 (h ® ® B) B) + (A with B. with A A © EEl B.
13.2. Properties Kronecker Product Product 13.2. Properties of of the the Kronecker
143 143
2. Recall Recall the the real real JCF JCF 2. M
I
0
o
M
I
0
0
M
1=
E jR2kx2k,
a f3 -f3 a
M
I
J. Define 0 0
Ek
0
o M
0 where M == [ where M
I
o
0 0
=
o o
0
Then can be be written in the (I} <8> ® M) ® h) = Then 1J can written in the very very compact compact form form 1J = (4 M)+ +(Ek (E^®l2) =M M$0 EEk. k. x mx Theorem 13.16. Let A Ee E" jRnxn jRmxm " have eigenvalues eigenvalues Ai, A,-,ii Ee !!. n, and let B Ee R '" have eigenvalues ra. Then TTzen the r/ze Kronecker sum A® B = (1m (Im (g> A)++ (B (B®In)/„)has /za^ fJ-j, j eE I!!. Kronecker sum A$ B ® A) mnran eigenvalues /z ;, 7 eigenvalues e/genva/wes
Al
+ fJ-t, ... , AI + fJ-m, A2 + fJ-t,···, A2 + fJ-m, ... , An + fJ-m'
Moreover, if x\,... linearly independent independent right right eigenvectors corresponding Moreover, if XI, .•• ,x , xp are linearly eigenvectors of of A A corresponding p are to AI, AI, ... . . . ,, X App (p (p ::s: < n), and and z\, ZI, ... ..., , Zq zq are are linearly linearly independent independent right eigenvectors eigenvectors of of B corresponding fJ-qq (q m), then Zj Zj ® corresponding to fJ-t, f j i \ , ... . . . ,, f^ (q ::s: < ra), <8>XiXiE€ jRmn W1" are arelinearly linearly independent independent right right eigenvectors of A® corresponding to A.,+ + [ij, € E, p, jj Ee fl· q. A$ B B corresponding to Ai fJ-j' ii E eigenvectors of Proof: The basic idea the proof Proof: The basic idea of of the proof is is as as follows: follows: [(1m ® A)
+ (B
= (Z
+ (Bz ® X) ® Ax) + (fJ-Z ® X)
=
+ fJ-)(Z ® X).
® In)](Z ® X) = (Z ® Ax) (A
0
If A A and Bare we can nand and If and B are diagonalizable diagonalizable in in Theorem Theorem 13.16, 13.16, we can take take p p =n and qq = m and thus get get the the complete complete eigenstructure eigenstructure of of A A 0 $ B. In In general, general, if if A A and and B have have Jordan Jordan form thus form p-I1AP = lA Q-t1 BQ BQ = JB, l B , respectively, respectively, then decompositions decompositions given given by P~ JA and and Q" then [(Q ® In)(lm ® p)rt[(lm ® A)
+ (B ® In)][CQ ® In)(lm ® P)] + (B ® In)][(Q ® In)(/m ®
= [(1m ® p)-I(Q ® In)-I][(lm ® A)
= [(1m ® p-I)(Q-I ® In)][(lm ® A)
= (1m ® lA)
+ (JB ® In)
is Jordan-like structure structure for A $© B. is aa Jordan-like for A B.
+ (B ®
P)]
In)][CQ ® In)(/m <:9 P)]
Chapter 13. 13. Kronecker Kronecker Products Products Chapter
144
A Schur Schur form fonn for for A A© EB B B can be derived derived similarly. Again, suppose P and unitary A can be similarly. Again, suppose P and Q are are unitary H fonn, i.e., pH AP = TAA matrices that that reduce reduce A and B, respectively, to to Schur Schur (triangular) form, i.e., P AP = T and QH QHBQ TB (and similarly if orthogonal similarities similarities reducing and B and BQ = = TB (and similarly if P P and and Q Q are are orthogonal reducing A A and B to real real Schur Schur fonn). to form). Then Then
((Q ® /„)(/« ® P)]"[(/m <8> A) + (B ® /B)][(e (g) /„)(/„, ® P)] = (/m <8> rA) + (7* (g) /„), [(Q <8> ® In)(lm where [(Q /„)(/«®®P)] P)] = = (Q (<2®®P) P) isisunitary unitaryby byTheorem Theorem 13.3 13.3and andCorollary Corollary 13.8. 13.8.
13.3 13.3
Application to Sylvester Sylvester and Lyapunov Equations Application to and Lyapunov Equations
In study the linear matrix In this this section section we we study the linear matrix equation equation (13.3)
AX+XB=C,
x mxm xm IRnxn IRmxm IRnxm. now often Sylvester where A eE R" ", , B eE R ,, and C eE M" . This equation equation is is now often called a Sylvester equation of 1.1. J.J. Sylvester Sylvester who studied general general linear linear matrix of the equation in in honor honor of who studied matrix equations equations of the form fonn k
LA;XB; =C. ;=1
A special case of (13.3) is the symmetric equation AX +XAT = C
(13.4)
T obtained by taking taking B B = AT. When C is symmetric, IRnx"xn is easily shown =A . When symmetric, the solution solution X E eW also to to be be symmetric is known as aa Lyapunov Lyapunov equation. also symmetric and and (13.4) (13.4) is known as equation. Lyapunovequations Lyapunov equations arise arise naturally naturally in in stability stability theory. theory. The first important question ask regarding (13.3) is, The first important question to to ask regarding (13.3) is, When When does does aa solution solution exist? exist? By writing writing the matrices in (13.3) (13.3) in in tenns terms of of their their columns, it is easily easily seen seen by equating the z'th ith columns columns that that
m
AXi
+ Xb; = C; = AXi + l:~>j;Xj. j=1
These equations as the These equations can can then then be be rewritten rewritten as the mn x x mn linear linear system system b 21 1
A+blll bl21
A
+ b 2Z 1
(13.5)
[ blml
b2ml
The in (13.5) (13.5) clearly as the sum (1m (Im 0* A) + The coefficient coefficient matrix matrix in clearly can can be be written written as the Kronecker Kronecker sum A) + (BTT 0® /„). very helpful in completing the writing (B In). The The following following definition definition is is very helpful in completing the writing of of (13.5) (13.5) as as an "ordinary" "ordinary" linear an linear system. system.
13.3. Application Lyapunov Equations Equations 13.3. Application to to Sylvester Sylvester and and Lyapunov
145 145
n nxm Definition 13.17. E E. jRn denote the ofC E jRnxm so that C = [ [CI, ]. Definition 13.17. Let Ci c( € the columns ofC e R n , ... . . . ,, Ccm}. Then vec(C) is defined to be the mn-vector formed by stacking the columns ofC on top of by C
::~~::~: ::d~~:::O:[]::::fonned
"ocking the colunuu of on top of
one another, i.e., vec(C) = Using Definition 13.17, 13.17, the can be Using Definition the linear linear system system (13.5) (13.5) can be rewritten rewritten in in the the form form [(1m ® A)
+ (B T
(13.6)
® In)]vec(X) = vec(C).
There exists aa unique and only + (B (BTT ® if and only if if [(I [(1m ® A) A) + ® /„)] In)] is is nonsingular. nonsingular. There exists unique solution solution to to (13.6) (13.6) if m ® T T (g) /„)] is nonsingular if and only if it has no zero eigenvalues. But [(I <8> A) + (B But [(1m ® A) + (B ® In)] is nonsingular if and only if it has no zero eigenvalues. m From A) ++ (BT (BT ®<8>In)] /„)] where From Theorem Theorem 13.16, 13.16, the the eigenvalues eigenvalues of of [(/ [(1mm ® A) areareAi A., ++Mj,IJLJ,where A,,(A), ii eE!!, n_,and andMj ^j Ee A(B), A(fi),j j E!!!.. e m.We Wethus thushave havethe thefollowing followingtheorem. theorem. Ai eE A A(A), mxm xm Theorem 13.1S. jRmxm,, and C e E R" jRnxm.. Then Theorem 13.18. Let A eE lR Rnxn,, B E GR 77ie/i the Sylvester equation
(13.7)
AX+XB=C
has aa unique if and only ifif A and —B have no eigenvalues in has unique solution solution if and only A and - B have no eigenvalues in common. common. Sylvester equations equations of the form (13.3) (or (or symmetric equations of of the the form form Sylvester of the form (13.3) symmetric Lyapunov Lyapunov equations (13.4)) are generally generally not mn "vee" "vec"formulation formulation(13.6). (13.6). The Themost most (13.4» are not solved solved using using the the mn mn x x mn commonly preferred in [2]. [2]. First First A to commonly preferred numerical numerical algorithm algorithm is is described described in A and and B B are are reduced reduced to (real) Schur Schur form. (real) form. An equivalent equivalent linear linear system system is is then then solved solved in in which which the the triangular triangular form form of the can be for the of aa suitably of the reduced reduced A and and B can be exploited exploited to to solve solve successively successively for the columns columns of suitably 3 transformed solution matrix say, n only 0O(n transformed solution matrix X. X. Assuming Assuming that, that, say, n > :::: m, m, this this algorithm algorithm takes takes only (n 3)) 66 operations rather than than the that would would be be required required by by solving (13.6) directly with operations rather the O(n O(n )) that solving (13.6) directly with Gaussian elimination. A further enhancement to is available available in in [6] [6] whereby Gaussian elimination. A further enhancement to this this algorithm algorithm is whereby the only to triangular the larger larger of of A A or or B B is is initially initially reduced reduced only to upper upper Hessenberg Hessenberg rather rather than than triangular Schur form. Schur form. The next 13.24, one one of The next few few theorems theorems are are classical. classical. They They culminate culminate in in Theorem Theorem 13.24, of many many elegant connections stability theory differential equations. equations. elegant connections between between matrix matrix theory theory and and stability theory for for differential mxm nxm Theorem jRmxm,, and C jRnxm.. Suppose Suppose further further that A and B Theorem 13.19. Let A eE jRnxn, Rnxn, B eE R C eE R are asymptotically stable (a (a matrix all its asymptotically stable matrix is is asymptotically asymptotically stable stable ifif all its eigenvalues eigenvalues have have real real are parts of the the Sylvester Sylvester equation equation parts in in the the open open left left half-plane). half-plane). Then Then the the (unique) (unique) solution solution of
(13.8)
AX+XB=C
can as can be be written written as (13.9)
Proof: are stable, (A)+ + Aj(B) A;-(B) =I^ 00 for for all alli,i, j j so sothere there exists exists aaunique unique Proof: Since Since A A and and B B are stable, A., Aj(A) solution 13.18. Now equation X XB solution to(13.8)by to (13.8) by Theorem Theorem 13.18. Now integrate integrate the the differential differential equation X = AX AX + X B (with X(0) X(O) = C) C) on on [0, (with [0, +00): +00): lim XU) - X(O) = A
I-Hoo
roo X(t)dt + ([+00 X(t)dt)
10
10
B.
(13.10)
146 146
Chapter Chapter 13. 13. Kronecker Kronecker Products Products
Using the results results of Section 11.1.6, it can be shown easily that lim elA = lim elB = = O.0. 1-->+00 1 .... +00 r—>+oo t—v+oo lB from Theorem = elACe O. Hence, using the solution XX((t) t) = etACetB Theorem 11.6, we have that lim XX((t) t) = — 0. t~+x /—<-+3C
Substituting in (13.10) we have -C
and so so X and X
=
{+oo
-1o
=
A
(1+
00
elACe lB dt)
+
elACe lB dt satisfies (13.8).
(1+
00
elACe lB dt) B
o
Remark AX + Remark 13.20. An equivalent condition for the existence of a unique solution to AX + XB = C is is that that [~ [ J __CcBfi ]] be similar to ] (via the similarity [ J _* ]). XB = be similar to [[~J _° _OB] (via the similarity [~ _~ ]). B x Let A, C E jRnxn. Theorem 13.21. Lef e R" ". Then TTzen the r/ze Lyapunov equation
AX+XAT
=C
(13.11)
has and only A TT have no eigenvalues has a unique unique solution if if and only if if A and and -—A eigenvalues in in common. common. If If C is symmetric and 13.11) has unique solution, solution, then that solution solution is is symmetric. symmetric and ((13.11) has aa unique then that symmetric. xn T If the matrix matrix A A E jRn xn has eigenvalues eigenvalues A.I )"" ,...,!„, ... , An, then -— A AT Remark 13.22. If e W has eigeneigenT values -AI, - An. Thus, a sufficient condition A and A T —A.], ... . . . ,, —k . sufficient that guarantees that A — A have n common eigenvalues eigenvalues is that A A be asymptotically asymptotically stable. Many useful results exist conno common cerning the relationship between stability and Lyapunov equations. Two basic results due to Lyapunov are the following, the first of which follows immediately from Theorem Theorem 13.19. 13.19. x Theorem 13.23. Let A,C A, C E jRnxn further that A is asymptotically stable. e R" " and suppose further asymptotically stable. Then the (unique) solution o/the of the Lyapunov equation
AX+XAT=C
can be written as can be written as
(13.12) x Theorem 13.24. A matrix A E jRnxn asymptotically stable if only if e R" " is asymptotically if and only if there exists a positive definite definite solution solution to to the the Lyapunov Lyapunov equation positive equation
AX +XAT = C,
(13.13)
where where C C -= C T < O. Proof: Suppose A is asymptotically asymptotically stable. By Theorems 13.21 l3.21 and 13.23 l3.23 a solution Proof: solution to (13.13) exists and takes the form (13.12). Now let vv be an arbitrary nonzero vector in jRn. E". Then Then
13.3. Application Sylvester and and Lyapunov Lyapunov Equations Equations 13.3. Application to to Sylvester
147 147
Since -C —C > > 00 and and etA etA is all t, the is positive. Hence Since is nonsingular nonsingular for for all the integrand integrand above above is positive. Hence T T > 00 and and thus thus X X is is positive positive definite. vv Xv Xv > definite. T XT > and let A(A) with corresponding Conversely, Conversely, suppose suppose X X = = X > 00 and let A A. Ee A (A) with corresponding left left eigeneigenvector vector y. y. Then Then 0> yHCy
=
yH AXy
= (A
+ yHXAT Y
+ I)yH Xy.
H Since yyH Xy > 0, 0, we + IA == 22 Re R eAA << 0O.. Since Since A A was Since Xy > we must must have have A A+ was arbitrary, arbitrary, A A must must be be asymptotically stable. D asymptotically stable. D
Remark 13.25. Lyapunov equation AX + XA X ATT = = C can also written using using the the Remark 13.25. The The Lyapunov equation AX C can also be be written vec in the vec notation notation in the equivalent equivalent form form [(/ ® A)
+ (A ® l)]vec(X) = vec(C).
X A = C. A subtle when dealing A TTXX + XA A subtle point point arises arises when dealing with with the the "dual" "dual" Lyapunov Lyapunov equation equation A C. The equivalent equivalent "vec "vec form" of this is The form" of this equation equation is [(/ ® AT)
+ (AT ® l)]vec(X) =
vec(C).
However, the the complex-valued XA = is equivalent to However, complex-valued equation equation AHHXX + XA =C C is equivalent to [(/ ® AH)
+ (AT ® l)]vec(X) =
vec(C).
The vec operator has most of of which which derive derive from from one one key key The vec operator has many many useful useful properties, properties, most result. result. Theorem 13.26. 13.26. For any three and C Theorem For any three matrices matrices A, A, B, B, and C for for which which the the matrix matrix product product ABC ABC is is defined, defined, vec(ABC) = (C T ® A)vec(B). Proof: The proof follows in in aa fairly fairly straightforward straightforward fashion fashion either either directly directly from the definidefiniProof: The proof follows from the the fact fact that tions or tions or from from the that vec(xyT) vec(;t;yr) = = y® <8>x.x. D D An application is existence and and uniqueness conditions of existence uniqueness conditions An immediate immediate application is to to the the derivation derivation of for the of the simple Sylvester-like Sylvester-like equation equation introduced introduced in in Theorem Theorem 6.11. 6.11. for the solution solution of the simple mxn px(} mxq Theorem 13.27. jRrnxn,, B B E jRPxq, jRrnxq.. Then the 13.27. Let A Ee R eR , and C Ee R the equation
AXB =C
(13.14)
nxp + jRn x p if A A++CB C B+ has has aa solution X eE R. if and only only if ifAA BB = C, C, in in which which case the the general solution solution is of the the form form is of (13.15) nxp + + jRnxp is of (13. 14) is BB+ ®A A+ where where Y Y eE R is arbitrary. arbitrary. The The solution of (13.14) is unique unique if if BB ® AA = = [. I.
Proof: (13.14) as as Proof: Write Write (13.14) (B T ® A)vec(X) = vec(C)
(13.16)
148 148
Chapter 3. Kronecker Chapter 113. Kronecker Products Products
by Theorem if by Theorem 13.26. 13.26. This This "vector "vector equation" equation" has has aa solution solution if if and and only only if (B T ® A)(B T ® A)+ vec(C)
= vec(C).
+
+
+
It that (M It is is aa straightforward straightforward exercise exercise to to show show that (M ® ® N) N) + = = M+ M ® <8>N+. N . Thus, Thus,(13.16) (13.16)has hasaa
if solution solution if if and and only only if vec(C)
=
(B T ® A)«B+{ ® A+)vec(C)
= [(B+ B{ ® AA+]vec(C) = vec(AA +C B+ B) + + and hence if C B+ B = C. and hence if and and only only if if AA AA+ CB B C. The general solution of (13 .16) by The general solution of (13.16) is is then then given given by
vec(X) = (B T ® A) + vec(C)
+ [I -
(B T ® A) + (B T ® A)]vec(Y),
where be rewritten the form where YY is is arbitrary. arbitrary. This This equation equation can can then then be rewritten in in the form vec(X)
= «B+{
® A+)vec(C)
+ [I
- (BB+{ ® A+ A]vec(y)
or, or, using using Theorem Theorem 13.26, 13.26,
+ ® The The solution solution is is clearly clearly unique unique if if B BBB+ <8>AA++A A ==I.I.
0D
EXERCISES EXERCISES I. A and 1. For For any any two two matrices matrices A and B B for for which which the the indicated indicated matrix matrix product product is is defined, defined, xn (vec(B» == Tr(A lR nxn show show that that (vec(A»T (vec(A)) r (vec(fl)) Tr(ATr B). £). In In particular, particular, if if B B Ee Rn ,, then then Tr(B) Tr(fl) == r vec(/J vec(fl). vec(Inl vec(B). 2. matrices A A and 2. Prove Prove that that for for all all matrices and B, B, (A (A ® ® B)+ B)+ = = A+ A+ ® ® B+. B+.
3. Show Show that that the the equation equation AX B == C C has has aa solution solution for for all all C C if if A full row row rank and 3. AX B A has has full rank and B has full full column column rank. rank. Also, Also, show show that that aa solution, solution, if it exists, exists, is unique if if A A has has full B has if it is unique full column rank and row rank. column rank and B B has has full full row rank. What What is is the the solution solution in in this this case? case? 4. Show Show that that the general linear linear equation 4. the general equation k
LAiXBi =C i=1
can can be be written written in in the the form form [BT ® AI
+ ... + B[ ® Ak]vec(X) =
vec(C).
Exercises Exercises
149 149
T 5. Let x E ]Rn. Show that * x rT ® T •. € ]Rm Mm and y Ee E". <8>yy==y Xyx
6. Let A e R" xn and £ e M m x m . (a) Show that ||A ® <8>BII2 B||2 = = IIAII2I1Blb. ||A||2||£||2. (a) Show that IIA (b) A ® II FF in terms of the Frobenius norms of A and your (b) What What is is II||A ®B B\\ in terms of the Frobenius norms of A and B? B? Justify Justify your answer carefully. carefully. answer (c) What is the spectral radius of A ® of A <8>BBininterms termsof ofthe thespectral spectralradii radiiof ofAAand and B? B? Justify your answer carefully. carefully. Justify your answer x 7. Let A, B eR" E ]Rnxn. 7. Let A, 5 ".
A)k = = I/ ® l ==BkBfc ®®I /forforallallintegers (a) Show that (l (/ ® A)* <8>Ak A*and and(B (fl®I /)* integersk.&. l A A 5 7 B A and eB®1 = e B ® I. (b) Show that el®A = I ® e e®
(c) Show that the ® AAand (c) Show that the matrices matrices /I (8) andBB®®I /commute. commute. (d) that (d) Show Show that e AEIlB
= eU®A)+(B®l) = e B ® e A .
(Note: This result would would look look aa little little "nicer" "nicer" had defined our our Kronecker Kronecker (Note: This result had we we defined sum the other way around. However, Definition 13.14 13.14 is conventional in the literature.) 8. Consider the Lyapunov matrix equation (13.11) with
[~ _~ ]
A = and C the symmetric and C the symmetric matrix matrix
[~
Clearly Clearly
Xs
=
[~ ~ ]
is the equation. Verify that that is aa symmetric symmetric solution solution of of the equation. Verify
Xns =
[_~ ~
]
is also aa solution solution and and is is nonsymmetric. in light light of of Theorem Theorem 13.21. 13.21. is also nonsymmetric. Explain Explain in 9. Block 9. Block Triangularization: Triangularization: Let Let
xn A eE Rn ]Rn xn and D E xm.. It is desired to find find a similarity where A e ]Rm Rmxm similarity transformation of form of the the form
T=[~ ~J
such that TST is is block block upper upper triangular. triangular. such that T l1ST
150 150
Chapter Products Chapter 13. 13. Kronecker Kronecker Products
(a) Show that S is similar to
[
A +OBX
B ] D-XB
if X X satisfies satisfies the so-called matrix matrix Riccati Riccati equation equation if the so-called
C-XA+DX-XBX=O. (b) Fonnulate Formulate a similar result for block lower triangularization of S. S.
to. Block Block Diagonalization: Let 10. S=
[~ ~
l
xn mxm where A Ee Rn jRnxn and D E jRmxm.. It is desired to find a similarity transfonnation of ER transformation of the fonn form
T=[~ ~]
such that TST is block block diagonal, diagonal. T l1ST (a) Show that S is similar to
if YY satisfies the Sylvester equation AY - YD = -B.
(b) Formulate Fonnulate a similar result for block diagonalization of of
Bibliography [1] [1] Albert, A., Regression and the Moore-Penrose Pseudoinverse, Academic Press, New York, NY, NY, 1972. 1972. York, [2] [2] Bartels, Bartels, RH., R.H., and and G.w. G.W. Stewart, Stewart, "Algorithm "Algorithm 432. 432. Solution Solution of the Matrix Equation Equation AX + XB = AX + XB = C," C," Comm. Cornm. ACM, 15(1972),820-826. 15(1972), 820-826. [3] New [3] Bellman, Bellman, R, R., Introduction to to Matrix Analysis, Second Second Edition, Edition, McGraw-Hill, McGraw-Hill, New York, NY, NY, 1970. York, 1970.
[4] Bjorck, for Least Squares Problems, SIAM, Philadelphia, [4] Bjorck, A., Numerical Numerical Methods Methodsfor Least Squares Problems, SIAM, Philadelphia, PA, PA, 1996. 1996. [5] on the the Generalized of the the Product Rev., [5] Cline, Cline, R.E., R.E., "Note "Note on Generalized Inverse Inverse of Product of of Matrices," Matrices," SIAM SIAM Rev., 6(1964),57-58. 6(1964), 57–58. [6] Nash, and the Problem [6] Golub, Golub, G.H., G.H., S. S. Nash, and C. C. Van Van Loan, Loan, "A "A Hessenberg-Schur Hessenberg-Schur Method Method for for the Problem AX + X B = C," IEEE Trans. Autom. Control, AC-24(1979), 909-913. AX XB = C," IEEE AC-24(1979), [7] [7] Golub, Golub, G.H., G.H., and and c.F. C.F. Van VanLoan, Loan,Matrix Matrix Computations, Computations, Third Third Edition, Edition, Johns JohnsHopkins Hopkins Univ. Press, Press, Baltimore, Baltimore, MD, 1996. Univ. MD, 1996. [8] [8] Golub, Golub, G.H., G.H., and and lH. J.H. Wilkinson, Wilkinson, "Ill-Conditioned "Ill-Conditioned Eigensystems Eigensystems and and the Computation Computation ofthe of the Jordan Canonical Form," SIAM SIAM Rev., 18(1976),578-619. 18(1976), 578-619. [9] T.N.E., "Note Inverse of of aa Matrix Product," SIAM Rev., [9] Greville, Greville, T.N.E., "Note on on the the Generalized Generalized Inverse Matrix Product," SIAM Rev., 8(1966),518-521 249]. 8(1966), 518–521 [Erratum, [Erratum, SIAM SIAM Rev., 9(1967), 9(1967), 249].
[10] Halmos, Halmos, P.R, PR., Finite-Dimensional Finite-Dimensional Vector Vector Spaces, Second Edition, Edition, Van Van Nostrand, Nostrand, [10] Spaces, Second Princeton, NJ, NJ, 1958. 1958. Princeton, Numerical Algorithms, Algorithms, Second [11] Higham, N.J., N.1., Accuracy Accuracy and Stability of [11] Higham, of'Numerical Second Edition, Edition, SIAM, SIAM, Philadelphia, Philadelphia, PA, 2002. 2002. [12] Hom, Horn, RA., R.A.,and andC.R. C.R.Johnson, Johnson,Matrix MatrixAnalysis, Analysis, Cambridge Cambridge Univ. Univ.Press, Press,Cambridge, Cambridge, UK, 1985. 1985. UK, [13] Hom, RA., and C.R. C.R. Johnson, Topics in Matrix Analysis, Analysis, Cambridge Univ. Univ. Press, Horn, R.A., Cambridge, UK, 1991. 1991. Cambridge, UK, 151 151
152 152
Bibliography Bibliography
[14] Kenney, C, C., and and A.J. AJ. Laub, Laub, "Controllability Stability Radii Radii for [14] Kenney, "Controllability and and Stability for Companion Companion Fonn Form Systems," Math. of of Control, Systems," Math, Control, Signals, and Systems, 1(1988),361-390. 1(1988), 361-390. [15] Kenney, C.S., [15] Kenney, C.S., andAJ. and A.J.Laub, Laub,"The "TheMatrix MatrixSign SignFunction," Function," IEEE IEEE Trans. Trans.Autom. Autom.Control, Control, 40(1995),1330-1348. 40(1995), 1330–1348. [16] Lancaster, P., P., and M. Tismenetsky, Tismenetsky, The Theory of Matrices, Second Edition with with [16] Lancaster, and M. Theory of Second Edition Applications, Academic FL, 1985. 1985. Applications, Academic Press, Press, Orlando, Orlando, FL, [17] Laub, A.J., AJ., "A Riccati Equations," IEEE Trans .. [17] Laub, "A Schur Schur Method Method for for Solving Solving Algebraic Algebraic Riccati Equations," IEEE Trans.. 913-921. Autom. Control, AC-24( 1979), 1979), 913–921.
Analysis and Applied Applied Linear Linear Algebra, SIAM, Philadelphia, PA, PA, [18] Meyer, C.D., [18] Meyer, C.D., Matrix Analysis SIAM, Philadelphia, 2000. 2000. [19] Moler, C.B.,and andc.P. C.F.Van VanLoan, Loan,"Nineteen "NineteenDubious DubiousWays WaystotoCompute Computethe theExponential Exponential [19] Moler, c.B., of of aa Matrix," Matrix," SIAM SIAM Rev., 20(1978),801-836. 20(1978), 801-836. [20] Noble, B., and Daniel, Applied Applied Linear Linear Algebra, Third Third Edition, [20] Noble, and J.w. J.W. Daniel, Edition, Prentice-Hall, Prentice-Hall, Englewood Cliffs, NJ, NJ, 1988. 1988. Englewood Cliffs, Plenum, New York, NY, NY, 1987. 1987. [21] [21] Ortega, Ortega, J., Matrix Theory. A Second Course, Plenum,
Proc. Cambridge Philos. Soc., Soc., [22] R., "A Inverse for for Matrices," [22] Pemose, Penrose, R., "A Generalized Generalized Inverse Matrices," Proc. 51(1955),406-413. 51(1955), 406–413. [23] NY, [23] Stewart, Stewart, G.W., G. W., Introduction to to Matrix Computations, Academic Academic Press, Press, New New York, York, NY, 1973. 1973.
[24] Strang, Strang, G., and Its Edition, Harcourt Brace [24] G., Linear Linear Algebra Algebra and Its Applications, Applications, Third Third Edition, Harcourt Brace Jovanovich, San San Diego, CA, 1988. Jovanovich, Diego, CA, 1988. of Matrix Computations, Second [25] [25] Watkins, D.S., D.S., Fundamentals of Second Edition, Edition, WileyInterscience, New York, York, 2002. 2002. Interscience, New [26] Wonham, W.M., W.M., Linear Multivariable Control. [26] Wonham, Control. A Geometric Approach, Third Third Edition, Edition, NY, 1985. 1985. Springer-Verlag, New York, York, NY, Springer-Verlag, New
Index Index A-invariant subspace, A–invariant subspace, 89 89 matrix matrix characterization characterization of, of, 90 90 algebraic multiplicity, multiplicity, 76 76 algebraic angle between between vectors, vectors, 58 58 angle
congruence, 103 congruence, 103 conjugate transpose, conjugate transpose, 22 contragredient transformation, transformation, 137 contragredient 137 controllability, 46 controllability, 46
11 basis, basis, 11 natural, 12 12 natural, block block matrix, matrix, 22 definiteness definiteness of, of, 104 104 diagonalization, 150 diagonalization, 150 inverse inverse of, of, 48 48 LV LU factorization, factorization, 55 triangularization, 149
defective, defective, 76 76 degree degree of 85 of aa principal principal vector, vector, 85 determinant, 4 determinant, 4 of block matrix, of aa block matrix, 55 properties properties of, of, 4-6 4–6 dimension, 12 dimension, 12 direct sum direct sum of subspaces, subspaces, 13 of 13 domain, 17 domain, 17
en, C",
e
1
(pmxn mxn
,
i
1
eigenvalue, eigenvalue, 75 75 invariance transforinvariance under under similarity similarity transformation, 81 mation,81 elementary 84 elementary divisors, divisors, 84 equivalence transformation, 95 equivalence transformation, 95 orthogonal, 95 unitary, unitary, 95 95 equivalent generalized generalized eigenvalue eigenvalue probprobequivalent lems, lems, 127 equivalent pencils, 127 127 equivalent matrix matrix pencils, exchange 89 exchange matrix, matrix, 39, 39, 89 exponential of aa Jordan block, 91, 91, 115 115 exponential of Jordan block, exponential 109 exponential of of aa matrix, matrix, 81, 81, 109 computation of, of, 114-118 114–118 computation inverse inverse of, of, 110 110 properties of, of, 109-112 109–112 properties
(p/nxn 1 e~xn, 1
Cauchy-Bunyakovsky-Schwarz InequalCauchy–Bunyakovsky–Schwarz Inequality,58 ity, 58 Cayley-Hamilton 75 Cayley–Hamilton Theorem, Theorem, 75 chain chain of eigenvectors, eigenvectors, 87 of 87 characteristic polynomial polynomial characteristic of 75 of aa matrix, matrix, 75 of 125 of aa matrix matrix pencil, pencil, 125 Cholesky factorization, factorization, 101 Cholesky 101 co-domain, 17 co–domain, 17 column column rank, rank, 23 23 vector, 11 vector, companion matrix companion matrix inverse 105 inverse of, of, 105 pseudoinverse of, 106 pseudoinverse of, 106 singular values values of, of, 106 singular 106 singular 106 singular vectors vectors of, of, 106 complement complement of aa subspace, subspace, 13 of 13 orthogonal, orthogonal, 21 21
field, field, 7 four four fundamental fundamental subspaces, subspaces, 23 23 function function of of aa matrix, matrix, 81 81 generalized 125 generalized eigenvalue, eigenvalue, 125 generalized real real Schur Schur form, form, 128 generalized 128
153
Index Index
154 generalized generalized Schur form, 127 generalized generalized singular value decomposition, decomposition, 134 134 geometric multiplicity, 76 geometric Holder Inequality, 58 Hermitian transpose, transpose, 2 higher-order difference equations higher–order conversion first-order form, 121 conversion to first–order higher–order higher-order differential equations conversion to first–order first-order form, 120 higher-order higher–order eigenvalue problems problems conversion to first–order first-order form, 136 i,2 i, 2 idempotent, idempotent, 6, 51 51 identity matrix, 4 inertia, 103 initial-value initial–value problem, 109 for higher-order higher–order equations, 120 for homogeneous homogeneous linear difference equations, 118 for homogeneous homogeneous linear differential equations, 112 for inhomogeneous inhomogeneous linear linear difference for difference equations, 119 for inhomogeneous inhomogeneous linear differendifferential equations, equations, 112 inner product product inner complex, 55 complex Euclidean, Euclidean, 44 complex Euclidean, 4, 54 real, 54 usual, 54 weighted, 54 invariant factors, 84 inverses of block matrices, 47
j,22 7, Jordan block, 82 Jordan canonical form (JCF), 82 Kronecker Kronecker canonical canonical form (KCF), 129 Kronecker Kronecker delta, 20
Kronecker product, 139 determinant determinant of, 142 eigenvalues of, 141 eigenvectors eigenvectors of, 141 products of, 140 pseudoinverse of, 148 singUlar singular values of, 141 trace of, 142 transpose of, 140 Kronecker sum, 142 eigenvalues of, 143 eigenvectors of, of, 143 143 eigenvectors exponential of, 149 leading principal submatrix, 100 left eigenvector, 75 left generalized eigenvector, 125 left invertible. invertible, 26 left left nullspace, 22 left principal vector, 85 linear dependence, 10 linear equations equations linear characterization of of all all solutions, solutions, 44 44 characterization existence of of solutions, solutions, 44 44 existence uniqueness of solutions, solutions, 45 45 uniqueness of linear independence, independence, 10 10 linear linear least squares problem, problem, 65 general solution of, 66 geometric solution of, 67 residual of, 65 solution via QR factorization, 71 71 decomsolution via singular value decomposition, 70 statement of, 65 uniqueness of solution, 66 linear regression, 67 linear transformation, 17 co–domain of, 17 co-domain composition of, 19 domain of, 17 invertible, 25 left invertible. invertible, 26 matrix representation of, 18 nonsingular, 25 nulls pace of, 20 nullspace
Index Index range of, 20 right invertible, 26 LV factorization, 6 LU block,55 block, Lyapunov differential equation, Lyapunov differential equation, 113 113 Lyapunov equation, equation, 144 and asymptotic stability, 146 integral form of solution, integral form of solution, 146 146 symmetry of solution, solution, 146 symmetry of 146 uniqueness of of solution, 146
matrix matrix asymptotically stable, 145 best rank k approximation to, 67 companion, 105 defective, 76 definite, 99 derogatory, 106 diagonal,2 diagonal, 2 exponential, 109 Hamiltonian, 122 Hermitian, 2 Householder, Householder, 97 97 indefinite, 99 lower Hessenberg, 2 lower triangular, 2 nearest singular matrix to, 67 nilpotent, 115 nonderogatory, 105 normal, 33, 95 orthogonal, 4 pentadiagonal, 2 quasi-upper-triangular, 98 quasi–upper–triangular, sign of of a, 91 square root of 10 1 of a, 101 symmetric, 2 symplectic, 122 tridiagonal, 2 unitary, unitary, 4 4 upper upper Hessenberg, Hessenberg, 22 upper triangular, 2 matrix matrix exponential, exponential, 81, 81, 91, 91, 109 109 matrix norm, 59 1-,60 1–.60 2-,60 2–, 60 00-,60 oo–, 60
155
p-,60 /?–, 60 61 consistent, 61 Frobenius, 60 induced by a vector norm, 61 mixed, 60 mixed,60 mutually consistent, 61 61 relations among, 61 Schatten,60 Schatten, 60 spectral, spectral, 60 subordinate subordinate to a vector norm, 61 unitarily unitarily invariant, invariant, 62 62 matrix pencil, 125 equivalent, 127 reciprocal, 126 regular, 126 singUlar, 126 singular, matrix sign function, 91 minimal polynomial, 76 monic polynomial, 76 Moore-Penrose pseudoinverse, 29 Moore–Penrose multiplication multiplication matrix-matrix, 3 matrix–matrix, matrix-vector, 3 matrix–vector, Mumaghan-Wintner Theorem, 98 Murnaghan–Wintner negative definite, 99 negative invariant subspace, 92 92 nonnegative definite, definite, 99 99 criteria for, 100 nonpositive definite, 99 norm norm induced, 56 induced,56 natural,56 natural, 56 normal equations, 65 normed normed linear linear space, space, 57 57 nullity, 24 nullspace,20 nullspace, 20 left, 22 22 right, 22 observability, 46 observability, 46 one-to-one (1–1), (1-1), 23 one–to–one conditions for, 25 onto, 23 for, 25 conditions for,
Index Index
156 156 orthogonal complement, 21 21 matrix, 4 4 matrix, projection, 52 52 projection, subspaces, 14 vectors, 4, 4, 20 20 vectors, orthonormal orthonormal vectors, 4, 20 outer product, product, 19 19 outer and Kronecker Kronecker product, 140 121 exponential of, 121 pseudoinverse of, 33 singular value decomposition decomposition of, 41 41 various matrix norms of, 63 pencil equivalent, 127 equivalent, 127 of of matrices, 125 reciprocal, 126 regular, 126 singular, 126 126 singular, Penrose theorem, theorem, 30 30 Penrose polar factorization, factorization, 41 41 polar polarization polarization identity, 57 positive definite, definite, 99 criteria for, 100 positive invariant invariant subspace, subspace, 92 92 positive power (kth) (Kth) of a Jordan block, 120 powers of a matrix computation of, 119-120 119–120 principal submatrix, 100 projection projection 51 oblique, 51 on four fundamental fundamental subspaces, 52 orthogonal, 52 pseudoinverse, 29 four Penrose conditions for, 30 of a full-column-rank full–column–rank matrix, 30 of a full-row-rank full–row–rank matrix, matrix, 30 of aa matrix matrix product, product, 32 32 of of aa scalar, scalar, 31 31 of of aa vector, vector, 31 31 of uniqueness, 30 via singular value decomposition, 38 Pythagorean Identity, 59
Q-orthogonality, Q –orthogonality, 55 QR factorization, factorization, 72 72 QR TO" JR.n,, 11I IK mxn i MJR.mxn,1 , 1 mxn 11 MlR.~xn, r '
Mnxn JR.~xn,1 I n ' '
range, 20 20 range, range inclusion range inclusion characterized by pseudoinverses, 33 rank, 23 column, 23 row, 23 row, 23 rank–one matrix, 19 rank-one matrix, 19 rational canonical form, 104 Rayleigh quotient, 100 reachability, 46 real Schur canonical form, form, 98 real Schur Schur form, form, 98 98 real reciprocal matrix pencil, 126 reconstructibility, 46 regular matrix pencil, 126 residual, 65 III resolvent, 111 reverse–order identity matrix, matrix, 39, 39, 89 89 reverse-order identity right eigenvector, 75 right generalized eigenvector, 125 right invertible, 26 right nullspace, 22 right principal vector, vector, 85 right principal 85 row row rank, 23 vector, 1I vector,
Schur canonical form, form, 98 generalized, 127 Schur complement, 6, 48, 102, 104 Schur Theorem, 98 Schur vectors, 98 second–order eigenvalue eigenvalue problem, problem, 135 second-order 135 conversion to to first–order form, 135 conversion first-order form, 135 Sherman-Morrison-Woodbury Sherman–Morrison–Woodbury formula, formula, 48 48 signature, 103 signature, 103 similarity transformation, transformation, 95 and invariance invariance of eigenvalues, eigenvalues, 81h
Index Index orthogonal, 95 orthogonal, unitary, 95 simple eigenvalue, 85 simultaneous diagonalization, 133 decomposition, 134 via singular value decomposition, singular matrix pencil, 126 singular value decomposition decomposition (SVD), 35 and bases for four fundamental subspaces, 38 pseudoinverse, 38 and pseudoinverse, and rank, 38 characterization of a matrix factorcharacterization ization as, 37 dyadic expansion, 38 examples, 37 compact, 37 full vs. compact, fundamental theorem, 35 nonuniqueness, 36 singular values, 36 singular vectors left, 36 right, 36 span, 11 spectral radius, 62, 107 spectral representation, 97 spectral representation, spectrum, 76 subordinate norm, 61 61 subspace, 9 A-invariant, 89 A–invariant, deflating, 129 reducing, 130 subspaces complements of, 13 complements direct sum of, 13 direct equality of, 10 four fundamental, 23 intersection of, 13 orthogonal, 14 sum of, 13 Sylvester differential differential equation, 113 Sylvester Sylvester equation, 144 of solution, 145 integral form of uniqueness of solution, 145
157 157 Sylvester's Law of Inertia, 103 Sylvester's symmetric generalized generalized eigenvalue problem,131 lem, 131 squares, 68 total least squares, trace, 6 transpose, 2 characterization by inner product, 54 characterization of a block matrix, 2 of triangle inequality for matrix norms, 59 for vector norms, 57 unitarily invariant matrix norm, 62 vector norm, 58 variation of of parameters, 112 vec vec of a matrix, 145 of of a matrix product, 147 of vector norm, 57 1-,57 l–, 57 2-,57 2–, 57 00-,57 oo–, 57 p-,57 P–, 51 equivalent, 59 Euclidean, 57 Euclidean, Manhattan, 57 relations among, 59 unitarily invariant, 58 weighted, 58 p-, 58 weighted p–, vector space, 8 dimension dimension of, 12 vectors, 1 column, 1 linearly dependent, 10 linearly independent, independent, 10 linearly orthogonal, 4, 20 orthonormal, 4, 20 row, 11 row, of a set of, 11 span of zeros of a linear dynamical system, 130 of