{{Short description|Real square matrix whose columns and rows are orthogonal unit vectors}}
{{for|matrices with orthogonality over the [[complex number]] field|unitary matrix}}
{{More footnotes needed|date=May 2023}}
In [[linear algebra]], an '''orthogonal matrix''', or '''orthonormal matrix''', is a real [[square matrix]] whose columns and rows are [[Orthonormality|orthonormal]] [[Vector (mathematics and physics)|vectors]].
One way to express this is
<math display="block">Q^\mathrm{T} Q = Q Q^\mathrm{T} = I,</math>
where {{math|''Q''<sup>T</sup>}} is the [[transpose]] of {{mvar|Q}} and {{mvar|I}} is the [[identity matrix]].
This leads to the equivalent characterization: a matrix {{mvar|Q}} is orthogonal if its transpose is equal to its [[Invertible matrix|inverse]]:
<math display="block">Q^\mathrm{T}=Q^{-1},</math>
where {{math|''Q''<sup>−1</sup>}} is the inverse of {{mvar|Q}}.
An orthogonal matrix {{mvar|Q}} is necessarily invertible (with inverse {{math|1=''Q''<sup>−1</sup> = ''Q''<sup>T</sup>}}), [[unitary matrix|unitary]] ({{math|1=''Q''<sup>−1</sup> = ''Q''<sup>∗</sup>}}), where {{math|''Q''<sup>∗</sup>}} is the [[conjugate transpose|Hermitian adjoint]] of {{mvar|Q}}, and therefore [[normal matrix|normal]] ({{math|1=''Q''<sup>∗</sup>''Q'' = ''QQ''<sup>∗</sup>}}) over the [[real number]]s. The determinant of any orthogonal matrix is either +1 or −1. As a [[linear transformation]], an orthogonal matrix preserves the [[inner product]] of vectors, and therefore acts as an [[isometry]] of [[Euclidean space]], such as a [[rotation (mathematics)|rotation]], [[reflection (mathematics)|reflection]] or [[improper rotation|rotoreflection]]. In other words, it is a [[unitary transformation]].
The set of {{math|''n'' × ''n''}} orthogonal matrices, under multiplication, forms the [[group (mathematics)|group]] {{math|O(''n'')}}, known as the [[orthogonal group]]. The [[subgroup]] {{math|SO(''n'')}} consisting of orthogonal matrices with determinant +1 is called the [[special orthogonal group]], and each of its elements is a '''special orthogonal matrix'''. As a linear transformation, every special orthogonal matrix acts as a rotation.
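These defining identities are easy to check numerically; the following is a minimal sketch in Python with NumPy, using a {{nowrap|2 × 2}} rotation matrix as a convenient test case.

<syntaxhighlight lang="python">
import numpy as np

theta = 0.3
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # a rotation, hence orthogonal

# Columns and rows are orthonormal: Q^T Q = Q Q^T = I
assert np.allclose(Q.T @ Q, np.eye(2))
assert np.allclose(Q @ Q.T, np.eye(2))

# Equivalently, the transpose equals the inverse
assert np.allclose(Q.T, np.linalg.inv(Q))
</syntaxhighlight>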
==Overview==
[[File:Matrix multiplication transpose.svg|thumb|275px|Visual understanding of multiplication by the transpose of a matrix. If {{mvar|A}} is an orthogonal matrix, the {{mvar|ij}}-th element of the product {{math|''AA''<sup>T</sup>}} vanishes for {{math|''i'' ≠ ''j''}}, because the {{mvar|i}}-th row of {{mvar|A}} is orthogonal to the {{mvar|j}}-th row of {{mvar|A}}.]]
An orthogonal matrix is the real specialization of a unitary matrix, and thus always a [[normal matrix]]. Although we consider only real matrices here, the definition can be used for matrices with entries from any [[field (mathematics)|field]]. However, orthogonal matrices arise naturally from [[dot product]]s, and for matrices of complex numbers that leads instead to the unitary requirement. Orthogonal matrices preserve the dot product,<ref>[http://tutorial.math.lamar.edu/Classes/LinAlg/OrthogonalMatrix.aspx "Paul's online math notes"]{{Full citation needed|date=January 2013|note=See talk page.}}, Paul Dawkins, [[Lamar University]], 2008. Theorem 3(c)</ref> so, for vectors {{math|'''u'''}} and {{math|'''v'''}} in an {{mvar|n}}-dimensional real [[Euclidean space]]
<math display="block">{\mathbf u} \cdot {\mathbf v} = \left(Q {\mathbf u}\right) \cdot \left(Q {\mathbf v}\right) </math>
where {{mvar|Q}} is an orthogonal matrix. To see the inner product connection, consider a vector {{math|'''v'''}} in an {{mvar|n}}-dimensional real [[Euclidean space]]. Written with respect to an orthonormal basis, the squared length of {{math|'''v'''}} is {{math|'''v'''<sup>T</sup>'''v'''}}. If a linear transformation, in matrix form {{math|''Q'''''v'''}}, preserves vector lengths, then
<math display="block">{\mathbf v}^\mathrm{T}{\mathbf v} = (Q{\mathbf v})^\mathrm{T}(Q{\mathbf v}) = {\mathbf v}^\mathrm{T} Q^\mathrm{T} Q {\mathbf v} .</math>
Thus [[dimension (vector space)|finite-dimensional]] linear isometries—rotations, reflections, and their combinations—produce orthogonal matrices. The converse is also true: orthogonal matrices imply orthogonal transformations. However, linear algebra includes orthogonal transformations between spaces which may be neither finite-dimensional nor of the same dimension, and these have no orthogonal matrix equivalent.
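The preservation of dot products (and hence of lengths and angles) can likewise be verified numerically; a short NumPy sketch, using the QR factor of a random Gaussian matrix as a sample orthogonal {{mvar|Q}}:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))  # Q has orthonormal columns

u = rng.standard_normal(3)
v = rng.standard_normal(3)
assert np.isclose(u @ v, (Q @ u) @ (Q @ v))                  # dot product preserved
assert np.isclose(np.linalg.norm(v), np.linalg.norm(Q @ v))  # length preserved
</syntaxhighlight>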
Orthogonal matrices are important for a number of reasons, both theoretical and practical. The {{math|''n'' × ''n''}} orthogonal matrices form a [[group (mathematics)|group]] under matrix multiplication, the [[orthogonal group]] denoted by {{math|O(''n'')}}, which—with its subgroups—is widely used in mathematics and the physical sciences. For example, the [[point group]] of a molecule is a subgroup of O(3). Because floating point versions of orthogonal matrices have advantageous properties, they are key to many algorithms in numerical linear algebra, such as [[QR decomposition|{{mvar|QR}} decomposition]].
==Examples==
Below are a few examples of small orthogonal matrices and possible interpretations.
* <math>\begin{bmatrix}
\cos \theta & -\sin \theta \\
\sin \theta & \cos \theta \\
\end{bmatrix}</math> (rotation about the origin)

==Elementary constructions==

===Lower dimensions===
The simplest orthogonal matrices are the {{nowrap|1 × 1}} matrices [1] and [−1], which we can interpret as the identity and a reflection of the real line across the origin, respectively.
The {{nowrap|2 × 2}} matrices have the form
<math display="block">\begin{bmatrix}
p & t\\
q & u
\end{bmatrix},</math>
which orthogonality demands satisfy the three equations
<math display="block">\begin{align}
1 & = p^2+t^2, \\
1 & = q^2+u^2, \\
0 & = pq+tu.
\end{align}</math>
In consideration of the first equation, without loss of generality let {{math|1=''p'' = cos ''θ''}}, {{math|1=''q'' = sin ''θ''}}; then either {{math|1=''t'' = −''q''}}, {{math|1=''u'' = ''p''}} or {{math|1=''t'' = ''q''}}, {{math|1=''u'' = −''p''}}. We can interpret the first case as a rotation by {{mvar|θ}} (where {{math|1=''θ'' = 0}} is the identity), and the second as a reflection across a line at an angle of {{math|{{sfrac|''θ''|2}}}}.
<math display="block">\begin{bmatrix}
\cos \theta & -\sin \theta \\
\sin \theta & \cos \theta \\
\end{bmatrix}\text{ (rotation), }\qquad
\begin{bmatrix}
\cos \theta & \sin \theta \\
\sin \theta & -\cos \theta \\
\end{bmatrix}\text{ (reflection)}</math>
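The two families can be constructed directly from {{mvar|θ}}; a NumPy sketch, where <code>rotation</code> and <code>reflection</code> are illustrative helper names, not standard library functions:

<syntaxhighlight lang="python">
import numpy as np

def rotation(theta):
    """Rotation by theta about the origin."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def reflection(theta):
    """Reflection across the line through the origin at angle theta/2."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, s], [s, -c]])

R, F = rotation(0.7), reflection(0.7)
assert np.isclose(np.linalg.det(R), 1.0)    # rotations have determinant +1
assert np.isclose(np.linalg.det(F), -1.0)   # reflections have determinant -1
assert np.allclose(F @ F, np.eye(2))        # a reflection is its own inverse
</syntaxhighlight>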
The special case of the reflection matrix with {{math|1=''θ'' = 90°}} generates a reflection about the line at 45° given by {{math|1=''y'' = ''x''}} and therefore exchanges {{mvar|x}} and {{mvar|y}}; it is a [[permutation matrix]], with a single 1 in each column and row (and otherwise 0):
<math display="block">\begin{bmatrix}
0 & 1\\
1 & 0
\end{bmatrix}.</math>
The identity is also a permutation matrix.
A reflection is [[Involutory matrix|its own inverse]], which implies that a reflection matrix is [[symmetric matrix|symmetric]] (equal to its transpose) as well as orthogonal. The product of two rotation matrices is a [[rotation matrix]], and the product of two reflection matrices is also a rotation matrix.
===Higher dimensions===
Regardless of the dimension, it is always possible to classify orthogonal matrices as purely rotational or not, but for {{nowrap|3 × 3}} matrices and larger the non-rotational matrices can be more complicated than reflections. For example,
<math display="block">
\begin{bmatrix}
-1 & 0 & 0\\
0 & -1 & 0\\
0 & 0 & -1
\end{bmatrix}\text{ and }
\begin{bmatrix}
0 & -1 & 0\\
1 & 0 & 0\\
0 & 0 & -1
\end{bmatrix}</math>
represent an ''[[Inversion in a point|inversion]]'' through the origin and a ''[[improper rotation|rotoinversion]]'', respectively, about the {{math|z}}-axis.
Rotations become more complicated in higher dimensions; they can no longer be completely characterized by one angle, and may affect more than one planar subspace. It is common to describe a {{nowrap|3 × 3}} rotation matrix in terms of an [[axis and angle]], but this only works in three dimensions. Above three dimensions two or more angles are needed, each associated with a [[plane of rotation]].
===Primitives===
The most elementary permutation is a transposition, obtained from the identity matrix by exchanging two rows. Any {{math|''n'' × ''n''}} permutation matrix can be constructed as a product of no more than {{math|''n'' − 1}} transpositions.
A [[Householder reflection]] is constructed from a non-null vector {{math|'''v'''}} as
<math display="block">Q = I - 2 \frac{{\mathbf v}{\mathbf v}^\mathrm{T}}{{\mathbf v}^\mathrm{T}{\mathbf v}} .</math>
Here the numerator is a symmetric matrix while the denominator is a number, the squared magnitude of {{math|'''v'''}}. This is a reflection in the hyperplane perpendicular to {{math|'''v'''}} (negating any vector component parallel to {{math|'''v'''}}). If {{math|'''v'''}} is a unit vector, then {{math|1=''Q'' = ''I'' − 2'''vv'''<sup>T</sup>}} suffices. A Householder reflection is typically used to simultaneously zero the lower part of a column. Any orthogonal matrix of size {{nowrap|''n'' × ''n''}} can be constructed as a product of at most {{mvar|n}} such reflections.
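The construction is straightforward to implement; a minimal NumPy sketch (<code>householder</code> is an illustrative name):

<syntaxhighlight lang="python">
import numpy as np

def householder(v):
    """Reflection in the hyperplane perpendicular to v: Q = I - 2 v v^T / (v^T v)."""
    v = np.asarray(v, dtype=float)
    return np.eye(len(v)) - 2.0 * np.outer(v, v) / (v @ v)

v = np.array([1.0, 2.0, 2.0])
Q = householder(v)
assert np.allclose(Q.T @ Q, np.eye(3))  # orthogonal
assert np.allclose(Q @ v, -v)           # negates the component along v
assert np.allclose(Q, Q.T)              # symmetric, hence its own inverse
</syntaxhighlight>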
A [[Givens rotation]] acts on a two-dimensional (planar) subspace spanned by two coordinate axes, rotating by a chosen angle. It is typically used to zero a single subdiagonal entry. Any rotation matrix of size {{math|''n'' × ''n''}} can be constructed as a product of at most {{math|{{sfrac|''n''(''n'' − 1)|2}}}} such rotations.

==Properties==
===Matrix properties===
A real square matrix is orthogonal [[if and only if]] its columns form an [[orthonormal basis]] of the [[Euclidean space]] {{math|'''R'''<sup>''n''</sup>}} with the ordinary Euclidean [[dot product]], which is the case if and only if its rows form an orthonormal basis of {{math|'''R'''<sup>''n''</sup>}}.
The [[determinant]] of any orthogonal matrix is +1 or −1. This follows from basic facts about determinants, as follows:
<math display="block">1=\det(I)=\det\left(Q^\mathrm{T}Q\right)=\det\left(Q^\mathrm{T}\right)\det(Q)=\bigl(\det(Q)\bigr)^2 .</math>
The converse is not true; having a determinant of ±1 is no guarantee of orthogonality, even with orthogonal columns, as shown by the following counterexample.
<math display="block">\begin{bmatrix}
2 & 0 \\
0 & \frac{1}{2}
\end{bmatrix}.</math>
The inverse of every orthogonal matrix is again orthogonal, as is the matrix product of two orthogonal matrices. In fact, the set of all {{math|''n'' × ''n''}} orthogonal matrices satisfies all the axioms of a [[group (mathematics)|group]]. It is a [[compact space|compact]] [[Lie group]] of dimension {{math|{{sfrac|''n''(''n'' − 1)|2}}}}, called the [[orthogonal group]] and denoted by {{math|O(''n'')}}.
The orthogonal matrices whose determinant is +1 form a [[connected space|path-connected]] [[normal subgroup]] of {{math|O(''n'')}} of [[index of a subgroup|index]] 2, the [[special orthogonal group]] {{math|SO(''n'')}} of rotations.
Now consider {{math|(''n'' + 1) × (''n'' + 1)}} orthogonal matrices with bottom right entry equal to 1. The remainder of the last column (and last row) must be zeros, and the product of any two such matrices has the same form. The rest of the matrix is an {{math|''n'' × ''n''}} orthogonal matrix; thus {{math|O(''n'')}} is a subgroup of {{math|O(''n'' + 1)}} (and of all higher groups).
<math display="block">\begin{bmatrix}
 & & & 0\\
& \mathrm{O}(n) & & \vdots\\
& & & 0\\
0 & \cdots & 0 & 1
\end{bmatrix}</math>
Since an elementary reflection in the form of a [[Householder matrix]] can reduce any orthogonal matrix to this constrained form, a series of such reflections can bring any orthogonal matrix to the identity; thus an orthogonal group is a [[reflection group]]. The last column can be fixed to any unit vector, and each choice gives a different copy of {{math|O(''n'')}} in {{math|O(''n'' + 1)}}; in this way {{math|O(''n'' + 1)}} is a [[fiber bundle|bundle]] over the unit sphere {{math|''S''<sup>''n''</sup>}} with fiber {{math|O(''n'')}}.
Similarly, {{math|SO(''n'')}} is a subgroup of {{math|SO(''n'' + 1)}}; and any special orthogonal matrix can be generated by [[Givens rotation|Givens plane rotations]] using an analogous procedure. The bundle structure persists: {{math|SO(''n'') ↪ SO(''n'' + 1) → ''S''<sup>''n''</sup>}}. A single rotation can produce a zero in the first row of the last column, and a series of {{math|''n'' − 1}} rotations will zero all but the last row of the last column of an {{math|''n'' × ''n''}} rotation matrix. Since the planes are fixed, each rotation has only one degree of freedom, its angle. By induction, {{math|SO(''n'')}} therefore has
<math display="block">(n-1) + (n-2) + \cdots + 1 = \frac{n(n-1)}{2}</math>
degrees of freedom, and so does {{math|O(''n'')}}.
Permutation matrices are simpler still; they form, not a Lie group, but only a finite group, the order [[factorial|{{math|''n''!}}]] [[symmetric group]] {{math|S<sub>''n''</sub>}}. By the same kind of argument, {{math|S<sub>''n''</sub>}} is a subgroup of {{math|S<sub>''n'' + 1</sub>}}. The even permutations produce the subgroup of permutation matrices of determinant +1, the order {{math|{{sfrac|''n''!|2}}}} [[alternating group]].
===Canonical form===
More broadly, the effect of any orthogonal matrix separates into independent actions on orthogonal two-dimensional subspaces. That is, if {{mvar|Q}} is special orthogonal then one can always find an orthogonal matrix {{mvar|P}}, a (rotational) [[change of basis]], that brings {{mvar|Q}} into block diagonal form:
<math display="block">P^\mathrm{T}QP = \begin{bmatrix}
R_1 & & \\ & \ddots & \\ & & R_k
\end{bmatrix}\ (n\text{ even}),
\ P^\mathrm{T}QP = \begin{bmatrix} R_1 & & & \\ & \ddots & & \\ & & R_k & \\ & & & 1
\end{bmatrix}\ (n\text{ odd}).</math>
Here the matrices {{math|''R''<sub>1</sub>, ..., ''R''<sub>''k''</sub>}} are {{nowrap|2 × 2}} rotation matrices, and the remaining entries are zero. Exceptionally, a rotation block may be diagonal, {{math|±''I''}}. Thus, negating one column if necessary, and noting that a {{nowrap|2 × 2}} reflection diagonalizes to a +1 and −1, any orthogonal matrix can be brought to the form
<math display="block">P^\mathrm{T}QP = \begin{bmatrix}
\begin{matrix}R_1 & & \\ & \ddots & \\ & & R_k\end{matrix} & 0 \\
0 & \begin{matrix}\pm 1 & & \\ & \ddots & \\ & & \pm 1\end{matrix} \\
\end{bmatrix}.</math>
===Lie algebra===
Suppose the entries of {{mvar|Q}} are differentiable functions of {{mvar|t}}, and that {{math|1=''t'' = 0}} gives {{math|1=''Q'' = ''I''}}. Differentiating the orthogonality condition
<math display="block">Q^\mathrm{T} Q = I </math>
yields
<math display="block">\dot{Q}^\mathrm{T} Q + Q^\mathrm{T} \dot{Q} = 0</math>
Evaluation at {{math|1=''t'' = 0}} ({{math|1=''Q'' = ''I''}}) then implies
<math display="block">\dot{Q}^\mathrm{T} = -\dot{Q} .</math>
In Lie group terms, this means that the [[Lie algebra]] of an orthogonal matrix group consists of [[skew-symmetric matrix|skew-symmetric matrices]]. Going the other direction, the [[matrix exponential]] of any skew-symmetric matrix is an orthogonal matrix (in fact, special orthogonal).
For example, the three-dimensional object physics calls [[angular velocity]] is a differential rotation, thus a vector in the Lie algebra <math>\mathfrak{so}(3)</math> tangent to {{math|SO(3)}}. Given {{math|1='''ω''' = (''xθ'', ''yθ'', ''zθ'')}}, with {{math|1='''v''' = (''x'', ''y'', ''z'')}} a unit vector, the correct skew-symmetric matrix form of {{math|'''ω'''}} is
<math display="block">
\Omega = \begin{bmatrix}
0 & -z\theta & y\theta \\
z\theta & 0 & -x\theta \\
-y\theta & x\theta & 0
\end{bmatrix}.</math>
The exponential of this is the orthogonal matrix for rotation around axis {{math|'''v'''}} by angle {{mvar|θ}}; setting {{math|1=''c'' = cos {{sfrac|''θ''|2}}}}, {{math|1=''s'' = sin {{sfrac|''θ''|2}}}},
<math display="block">\exp(\Omega) = \begin{bmatrix}
1 - 2s^2 + 2x^2 s^2 & 2xy s^2 - 2z sc & 2xz s^2 + 2y sc\\
2xy s^2 + 2z sc & 1 - 2s^2 + 2y^2 s^2 & 2yz s^2 - 2x sc\\
2xz s^2 - 2y sc & 2yz s^2 + 2x sc & 1 - 2s^2 + 2z^2 s^2
\end{bmatrix}.</math>
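The correspondence between skew-symmetric matrices and rotations can be checked numerically; a sketch assuming SciPy's <code>scipy.linalg.expm</code> for the matrix exponential:

<syntaxhighlight lang="python">
import numpy as np
from scipy.linalg import expm

x, y, z = 0.0, 0.0, 1.0    # unit axis v, here the z-axis
theta = np.pi / 2
Omega = theta * np.array([[ 0, -z,  y],
                          [ z,  0, -x],
                          [-y,  x,  0]])

R = expm(Omega)  # exponential of a skew-symmetric matrix is special orthogonal
assert np.allclose(R.T @ R, np.eye(3))
assert np.isclose(np.linalg.det(R), 1.0)
# Rotation by 90 degrees about the z-axis sends e_x to e_y
assert np.allclose(R @ np.array([1.0, 0.0, 0.0]), [0.0, 1.0, 0.0])
</syntaxhighlight>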
==Numerical linear algebra==
===Benefits===
[[Numerical analysis]] takes advantage of many of the properties of orthogonal matrices for numerical linear algebra, and they arise naturally. For example, it is often desirable to compute an orthonormal basis for a space, or an orthogonal change of bases; both take the form of orthogonal matrices. Having determinant ±1 and all eigenvalues of magnitude 1 is of great benefit for [[numerical stability|numeric stability]].
Permutations are essential to the success of many algorithms, including the workhorse [[Gaussian elimination]] with [[Pivot element#Partial and complete pivoting|partial pivoting]] (where permutations do the pivoting). However, they rarely appear explicitly as matrices; their special form allows more efficient representation, such as a list of {{mvar|n}} indices.
Consider an [[overdetermined system of linear equations]], as might occur with repeated measurements of a physical phenomenon to compensate for experimental errors. Write {{math|1=''A'''''x''' = '''b'''}}, where {{mvar|A}} is {{math|''m'' × ''n''}}, {{math|''m'' > ''n''}}.
A {{mvar|QR}} decomposition reduces {{mvar|A}} to upper triangular {{mvar|R}}. For example, if {{mvar|A}} is {{nowrap|5 × 3}} then {{mvar|R}} has the form
<math display="block">R = \begin{bmatrix}
\cdot & \cdot & \cdot \\
0 & \cdot & \cdot \\
0 & 0 & \cdot \\
0 & 0 & 0 \\
0 & 0 & 0
\end{bmatrix}.</math>
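In code, the factorization and the triangular solve take only a few lines; a NumPy sketch of least squares via {{mvar|QR}} (NumPy's <code>qr</code> returns the reduced factorization by default):

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))   # overdetermined: m = 5 > n = 3
b = rng.standard_normal(5)

Q, R = np.linalg.qr(A)            # Q is 5x3 with orthonormal columns, R is 3x3
x = np.linalg.solve(R, Q.T @ b)   # solve the triangular system R x = Q^T b

x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x, x_ls)       # agrees with the least-squares solution
</syntaxhighlight>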
In the case of a linear system which is underdetermined, or an otherwise non-[[invertible matrix]], singular value decomposition (SVD) is equally useful. With {{mvar|A}} factored as {{math|''U''Σ''V''<sup>T</sup>}}, a satisfactory solution uses the Moore–Penrose [[pseudoinverse]], {{math|''V''Σ<sup>+</sup>''U''<sup>T</sup>}}, where {{math|Σ<sup>+</sup>}} merely replaces each non-zero diagonal entry with its reciprocal. Set {{math|'''x'''}} to {{math|''V''Σ<sup>+</sup>''U''<sup>T</sup>'''b'''}}.
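A sketch of the same recipe in NumPy, compared against the built-in pseudoinverse:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 5))   # underdetermined: more unknowns than equations
b = rng.standard_normal(3)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
s_plus = np.where(s > 1e-12, 1.0 / s, 0.0)  # reciprocals of non-zero singular values
x = Vt.T @ (s_plus * (U.T @ b))             # x = V Sigma^+ U^T b

assert np.allclose(x, np.linalg.pinv(A) @ b)
</syntaxhighlight>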
The case of a square invertible matrix also holds interest. Suppose, for example, that {{mvar|A}} is a {{nowrap|3 × 3}} rotation matrix which has been computed as the composition of numerous twists and turns. Floating point does not match the mathematical ideal of real numbers, so {{mvar|A}} has gradually lost its true orthogonality. A [[Gram–Schmidt process]] could [[orthogonalization|orthogonalize]] the columns, but it is not the most reliable, nor the most efficient, nor the most invariant method. The [[polar decomposition]] factors a matrix into a pair, one of which is the unique ''closest'' orthogonal matrix to the given matrix, or one of the closest if the given matrix is singular. (Closeness can be measured by any [[matrix norm]] invariant under an orthogonal change of basis, such as the spectral norm or the Frobenius norm.) For a near-orthogonal matrix, rapid convergence to the orthogonal factor can be achieved by a "[[Newton's method]]" approach due to {{harvtxt|Higham|1986}} ([[#CITEREFHigham1990|1990]]), repeatedly averaging the matrix with its inverse transpose. {{harvtxt|Dubrulle|1999}} has published an accelerated method with a convenient convergence test.
For example, consider a non-orthogonal matrix for which the simple averaging algorithm takes seven steps
<math display="block">\begin{bmatrix}3 & 1\\7 & 5\end{bmatrix}
\rightarrow
\begin{bmatrix}1.8125 & 0.0625\\3.4375 & 2.6875\end{bmatrix}
\rightarrow \cdots \rightarrow
\begin{bmatrix}0.8 & -0.6\\0.6 & 0.8\end{bmatrix}</math>
and which acceleration trims to two steps (with {{mvar|γ}} = 0.353553, 0.565685).
<math display="block">\begin{bmatrix}3 & 1\\7 & 5\end{bmatrix}
\rightarrow
\begin{bmatrix}1.41421 & -1.06066\\1.06066 & 1.41421\end{bmatrix}
\rightarrow
\begin{bmatrix}0.8 & -0.6\\0.6 & 0.8\end{bmatrix}</math>
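The plain averaging iteration is only a few lines of code; a NumPy sketch (<code>orthogonalize</code> is an illustrative name) reproducing the example above:

<syntaxhighlight lang="python">
import numpy as np

def orthogonalize(M, tol=1e-12, max_iter=100):
    """Averaging iteration: repeatedly replace Q by (Q + (Q^-1)^T) / 2."""
    Q = M.astype(float)
    for _ in range(max_iter):
        Q_next = 0.5 * (Q + np.linalg.inv(Q).T)
        if np.allclose(Q_next, Q, atol=tol):
            break
        Q = Q_next
    return Q

M = np.array([[3.0, 1.0], [7.0, 5.0]])
Q = orthogonalize(M)
assert np.allclose(Q, [[0.8, -0.6], [0.6, 0.8]])
</syntaxhighlight>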
Gram–Schmidt yields an inferior solution, shown by a Frobenius distance of 8.28659 instead of the minimum 8.12404.
<math display="block">\begin{bmatrix}3 & 1\\7 & 5\end{bmatrix}
\rightarrow
\begin{bmatrix}0.393919 & -0.919145\\0.919145 & 0.393919\end{bmatrix}</math>
===Randomization===
Some numerical applications, such as [[Monte Carlo method]]s and exploration of high-dimensional data spaces, require generation of [[uniform distribution (continuous)|uniformly distributed]] random orthogonal matrices. In this context, "uniform" is defined in terms of [[Haar measure]], which essentially requires that the distribution not change if multiplied by any freely chosen orthogonal matrix. Orthogonalizing matrices with [[statistical independence|independent]] uniformly distributed random entries does not result in uniformly distributed orthogonal matrices{{Citation needed|date=June 2009}}, but the [[QR decomposition|{{mvar|QR}} decomposition]] of independent [[normal distribution|normally distributed]] random entries does, as long as the diagonal of {{mvar|R}} contains only positive entries {{harv|Mezzadri|2006}}. {{harvtxt|Stewart|1980}} replaced this with a more efficient idea that {{harvtxt|Diaconis|Shahshahani|1987}} later generalized as the "subgroup algorithm" (in which form it works just as well for permutations and rotations). To generate an {{math|(''n'' + 1) × (''n'' + 1)}} orthogonal matrix, take an {{math|''n'' × ''n''}} one and a uniformly distributed unit vector of dimension {{nowrap|''n'' + 1}}. Construct a Householder reflection from the vector, then apply it to the smaller matrix (embedded in the larger size with a 1 at the bottom right corner).
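A sketch of the {{mvar|QR}}-based recipe in NumPy, with the sign fix on the diagonal of {{mvar|R}}:

<syntaxhighlight lang="python">
import numpy as np

def haar_orthogonal(n, seed=None):
    """Random orthogonal matrix, Haar-distributed (Mezzadri 2006)."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n, n))  # i.i.d. Gaussian entries
    Q, R = np.linalg.qr(Z)
    # Scale each column by the sign of the corresponding diagonal entry of R,
    # so that R effectively has a positive diagonal
    return Q * np.sign(np.diag(R))

Q = haar_orthogonal(4, seed=0)
assert np.allclose(Q.T @ Q, np.eye(4))
</syntaxhighlight>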
===Nearest orthogonal matrix===
The problem of finding the orthogonal matrix {{mvar|Q}} nearest a given matrix {{mvar|M}} is related to the [[Orthogonal Procrustes problem]]. There are several different ways to get the unique solution, the simplest of which is taking the [[singular value decomposition]] of {{mvar|M}} and replacing the singular values with ones. Another method expresses {{mvar|Q}} explicitly but requires the use of a [[matrix square root]]:<ref>[http://people.csail.mit.edu/bkph/articles/Nearest_Orthonormal_Matrix.pdf "Finding the Nearest Orthonormal Matrix"], [[Berthold K.P. Horn]], [[MIT]].</ref>
<math display="block">Q = M \left(M^\mathrm{T} M\right)^{-\frac 1 2}</math>
This may be combined with the Babylonian method for extracting the square root of a matrix to give a recurrence which converges to an orthogonal matrix quadratically:
<math display="block">Q_{n + 1} = 2 M \left(Q_n^{-1} M + M^\mathrm{T} Q_n\right)^{-1}</math>
where {{math|1=''Q''<sub>0</sub> = ''M''}}.
These iterations are stable provided the [[condition number]] of {{mvar|M}} is less than three.<ref>[http://www.maths.manchester.ac.uk/~nareports/narep91.pdf "Newton's Method for the Matrix Square Root"] {{Webarchive|url=https://web.archive.org/web/20110929131330/http://www.maths.manchester.ac.uk/~nareports/narep91.pdf |date=2011-09-29 }}, Nicholas J. Higham, Mathematics of Computation, Volume 46, Number 174, 1986.</ref>
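A direct transcription of this recurrence in NumPy, on a well-conditioned test matrix, checked against the SVD construction (replace the singular values of {{mvar|M}} with ones); <code>nearest_orthogonal</code> is an illustrative name:

<syntaxhighlight lang="python">
import numpy as np

def nearest_orthogonal(M, steps=50):
    """Iterate Q_{n+1} = 2 M (Q_n^{-1} M + M^T Q_n)^{-1}, with Q_0 = M."""
    Q = M.astype(float)
    for _ in range(steps):
        Q = 2.0 * M @ np.linalg.inv(np.linalg.inv(Q) @ M + M.T @ Q)
    return Q

M = np.array([[1.0, 0.2], [-0.1, 1.1]])  # condition number well below 3
Q = nearest_orthogonal(M)
assert np.allclose(Q.T @ Q, np.eye(2))   # orthogonal

U, _, Vt = np.linalg.svd(M)              # nearest orthogonal matrix is U V^T
assert np.allclose(Q, U @ Vt)
</syntaxhighlight>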
Using a first-order approximation of the inverse and the same initialization results in a modified iteration.
==Spin and pin==
A subtle technical problem afflicts some uses of orthogonal matrices. Not only are the group components with determinant +1 and −1 not [[Connected space|connected]] to each other, even the +1 component, {{math|SO(''n'')}}, is not [[simply connected space|simply connected]] (except for SO(1), which is trivial). Thus it is sometimes advantageous, or even necessary, to work with a [[covering map|covering group]] of SO(''n''), the [[spinor group|spin group]], {{math|Spin(''n'')}}. Likewise, {{math|O(''n'')}} has covering groups, the [[pin group]]s, Pin(''n''). For {{math|''n'' > 2}}, {{math|Spin(''n'')}} is simply connected and thus the universal covering group for {{math|SO(''n'')}}. By far the most famous example of a spin group is {{math|Spin(3)}}, which is nothing but {{math|[[SU(2)]]}}, or the group of unit [[quaternion]]s.
The Pin and Spin groups are found within [[Clifford algebra]]s, which themselves can be built from orthogonal matrices.
==Rectangular matrices==
If {{mvar|Q}} is not a square matrix, then the conditions {{math|1=''Q''<sup>T</sup>''Q'' = ''I''}} and {{math|1=''QQ''<sup>T</sup> = ''I''}} are not equivalent. The condition {{math|1=''Q''<sup>T</sup>''Q'' = ''I''}} says that the columns of {{mvar|Q}} are orthonormal. This can only happen if {{mvar|Q}} is an {{math|''m'' × ''n''}} matrix with {{math|''n'' ≤ ''m''}} (due to linear dependence). Similarly, {{math|1=''QQ''<sup>T</sup> = ''I''}} says that the rows of {{mvar|Q}} are orthonormal, which requires {{math|''n'' ≥ ''m''}}.
There is no standard terminology for these matrices. They are variously called "[[semi-orthogonal matrix|semi-orthogonal matrices]]", "orthonormal matrices", "orthogonal matrices", and sometimes simply "matrices with orthonormal rows/columns".
For the case {{math|''n'' ≤ ''m''}}, matrices with orthonormal columns may be referred to as [[k-frame|orthogonal k-frames]] and they are elements of the [[Stiefel manifold]].
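The asymmetry between the two conditions is easy to see numerically; a NumPy sketch with a tall matrix whose columns are orthonormal:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(3)
Q, _ = np.linalg.qr(rng.standard_normal((5, 3)))  # 5x3, orthonormal columns

assert np.allclose(Q.T @ Q, np.eye(3))      # Q^T Q = I holds, but ...
assert not np.allclose(Q @ Q.T, np.eye(5))  # ... Q Q^T is a projection, not I
</syntaxhighlight>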
==See also==
* [[Biorthogonal system]]
==Notes==
{{reflist}}

==References==
* {{citation
| last1 = Diaconis | first1 = Persi
| author-link1 = Persi Diaconis
| first2 = Mehrdad |last2= Shahshahani
| title = The subgroup algorithm for generating uniform random variables
| journal = Probability in the Engineering and Informational Sciences
| volume = 1
| pages = 15–32
| year = 1987
| issn = 0269-9648
| doi = 10.1017/S0269964800000255
}}
* {{citation
| last= Dubrulle
| first=Augustin A.
| title = An Optimum Iteration for the Matrix Polar Decomposition
| journal = Electronic Transactions on Numerical Analysis
| volume = 8
| pages = 21–25
| year = 1999
| issn = 1068-9613
}}
* {{citation
| last1 = Higham | first1 = Nicholas
| author-link1 = Nicholas Higham
| title = Computing the polar decomposition—with applications
| journal = SIAM Journal on Scientific and Statistical Computing
| volume = 7
| issue = 4
| pages = 1160–1174
| year = 1986
| issn = 0196-5204
| doi = 10.1137/0907079
}}
* {{citation
| last1 = Higham | first1 = Nicholas
| author-link1 = Nicholas Higham
| last2 = Schreiber | first2 = Robert
| title = Fast polar decomposition of an arbitrary matrix
| journal = SIAM Journal on Scientific and Statistical Computing
| volume = 11
| pages = 648–655
| year = 1990
| issn = 0196-5204
| doi = 10.1137/0911038 | citeseerx = 10.1.1.230.4322
| s2cid = 14268409
}} [https://web.archive.org/web/20051016153437/http://www.ma.man.ac.uk/~higham/pap-mf.html]
* {{citation
| last = Stewart | first = G. W.
| author-link = G. W. Stewart
| title = The economical storage of plane rotations
| journal = Numerische Mathematik
| volume = 25
| pages = 137–138
| year = 1976
| doi = 10.1007/BF01462266
| s2cid = 120372682
| issn = 0029-599X }}
* {{citation
| last = Stewart
| first = G. W.
| author-link = G. W. Stewart
| title = The Efficient Generation of Random Orthogonal Matrices with an Application to Condition Estimators
| journal = SIAM Journal on Numerical Analysis
| volume = 17
| issue = 3
| pages = 403–409
| year = 1980
| issn = 0036-1429
| doi = 10.1137/0717034
}}
* {{citation
| last = Mezzadri
| first = Francesco
| title = How to generate random matrices from the classical compact groups
| journal = Notices of the American Mathematical Society
| volume = 54
| year = 2006
| bibcode = 2006math.ph...9050M
}}
==External links==
{{sister project|project=Wikiversity
|text=[[v:Linear algebra/Orthogonal matrix|Wikiversity introduces the '''orthogonal matrix'''.]]}}
* {{springer|title=Orthogonal matrix|id=p/o070320}}
* [http://people.revoledu.com/kardi/tutorial/LinearAlgebra/MatrixOrthogonal.html Tutorial and Interactive Program on Orthogonal Matrix]