Reanimator Ltd

High-performance coding by Eddie Edwards

Matrix Decomposition for Dummies

27 Jan 2014, 09:55 UTC

A lot of problems involve - or can involve - matrices. To solve school problems, matrices are inverted using Gaussian Elimination. But real-world matrices are often singular, which presents problems. You can buy in code that does some matrix decompositions (e.g. the Singular Value Decomposition) but there are always some inaccuracies, even if the SVD code is robust to them. And some problems have integer matrices, where it seems weird to start getting round-off errors when you could do exact operations on those integers using, for instance, a rational number library.

It's not so easy (or doesn't appear so easy) to find working C++ code that solves a degenerate matrix equation in modular arithmetic, or working Lisp code (say) that solves one using rational arithmetic. (Since Lisp contains perfect rational numbers as primitive types, it's a great place to play with perfect rational matrix decompositions.)

So here's a dummy's guide to hacking your own matrix decomposition codes.

Row & Column Matrices

Matrices have this kind of wave-particle duality about them. They are both a set of rows, and a set of columns. The transpose operation switches between these two views, which is why it's so important. Let's recap on matrix multiply:

(a b c) (d x x)   (ad + be + cf  x x)
(x x x) (e x x) = (     x        x x)
(x x x) (f x x)   (     x        x x)

This is the way I was taught matrix multiply (read: this is the only way I can ever remember how it works). To compute an element you go along that element's row in the first matrix and down that element's column in the second matrix, and do a kind of dot product.
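
If you prefer code to ASCII art, here's that rule written out. This is a minimal sketch using a bare-bones Mat3 type built on std::array; the Mat3 and Mul names are mine, purely for illustration, not from any particular library.

#include <array>
#include <cassert>

using Mat3 = std::array<std::array<double, 3>, 3>;

// Element (i,j) of the product: go along row i of A, down column j of B,
// and accumulate the "dot product" of the two.
Mat3 Mul(const Mat3& A, const Mat3& B)
{
    Mat3 out{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k)
                out[i][j] += A[i][k] * B[k][j];
    return out;
}

int main()
{
    Mat3 A = {{{1, 2, 3}, {4, 5, 6}, {7, 8, 9}}};
    Mat3 B = {{{1, 0, 2}, {0, 1, 3}, {4, 5, 6}}};
    Mat3 P = Mul(A, B);
    // Element (0,0) is row 0 of A dotted with column 0 of B: 1*1 + 2*0 + 3*4 = 13.
    assert(P[0][0] == 13);
}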

From this you can derive some important facts which give far more insight into what matrix multiplication actually means. These insights have to do with considering the matrices as sets of rows or sets of columns, not as individual elements.

If we consider the right-hand (RHS) matrix to be a set of row vectors, then the output matrix is also a set of row vectors:

(a b c) ( R0 )   (a * R0 + b * R1 + c * R2)
(d e f) ( R1 ) = (d * R0 + e * R1 + f * R2)
(g h k) ( R2 )   (g * R0 + h * R1 + k * R2)

The first row in the left-hand (LHS) matrix specifies how the first row in the result matrix is constructed, as a linear combination of all the rows in the RHS matrix. You can interpret this in terms of 3D coordinate systems, but I won't.

OK, so consider an operation on the RHS matrix - say we swap two of its rows around.

( R0 )    ( R0 )
( R1 ) -> ( R2 )
( R2 )    ( R1 )

Since the output rows are combinations of the input rows, this transformation can be represented by pre-multiplying the matrix by some other matrix. And we can pretty much just write this other matrix down:

(1 0 0) ( R0 )   ( R0 )
(0 0 1) ( R1 ) = ( R2 )
(0 1 0) ( R2 )   ( R1 )

So the take-home here is that an operation on a matrix that combines rows can be expressed as pre-multiplication by some matrix, which we'll call R.

Now let's think about the converse - column matrices. It's quite challenging to write this down using ASCII art, but fundamentally we have a similar result:

(C0 | C1 | C2) (a b c) = (a*C0 + d*C1 + g*C2 | b*C0 + ... | c*C0 + ...)
               (d e f)
               (g h i)

The first column in the RHS matrix specifies how the first column in the result matrix is constructed, as a linear combination of all the columns in the LHS matrix.

This means that an operation on a matrix that combines columns can be expressed as post-multiplication by some matrix, which we'll call C.
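
Here's a quick sanity check of both facts: a row swap done directly matches pre-multiplication by the permutation matrix we wrote down above, and a column swap matches post-multiplication by a similar matrix. Again this is a minimal sketch with throwaway Mat3/Mul helpers, not code from any real library.

#include <array>
#include <cassert>
#include <utility>

using Mat3 = std::array<std::array<double, 3>, 3>;

Mat3 Mul(const Mat3& A, const Mat3& B)
{
    Mat3 out{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k)
                out[i][j] += A[i][k] * B[k][j];
    return out;
}

int main()
{
    Mat3 M = {{{1, 2, 3}, {4, 5, 6}, {7, 8, 9}}};

    // Swap rows 1 and 2 directly.
    Mat3 rowSwapped = M;
    std::swap(rowSwapped[1], rowSwapped[2]);

    // The same operation as pre-multiplication by a permutation matrix R.
    Mat3 R = {{{1, 0, 0}, {0, 0, 1}, {0, 1, 0}}};
    assert(Mul(R, M) == rowSwapped);

    // Swap columns 1 and 2 directly.
    Mat3 colSwapped = M;
    for (int i = 0; i < 3; ++i)
        std::swap(colSwapped[i][1], colSwapped[i][2]);

    // The same operation as post-multiplication by a matrix C.
    Mat3 C = {{{1, 0, 0}, {0, 0, 1}, {0, 1, 0}}};
    assert(Mul(M, C) == colSwapped);
}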

R-C Decomposition

So we have a matrix M which is given to us, and we have to solve some equation involving M (typically aM = b or Ma = b). What's the simplest equation involving M?

M = Ir * M

Here, Ir is an identity matrix. If M has r rows, then Ir is an rxr matrix.

What we want to do is perform operations on M to reduce it to a simpler form. This usually involves making some entries of M be zero. The basic strategies are:

  1. Subtract a multiple of one row from another row, or one column from another column.
  2. Apply a 2D rotation to two rows, or two columns.
  3. As (1), but aim to make the rows/columns orthogonal instead of introducing zeroes.

The first case is called "Gaussian elimination"; it can introduce zeroes like this:

(x 4 x) - 4 * (x 1 x) = (x 0 x)

The second case is called "Givens rotations"; it can introduce zeroes like this:

(x 4 x) * cos(90) + (x 0 x) * sin(90) = (x 0 x)

The third case is called "Gram-Schmidt orthogonalization"; it doesn't necessarily introduce zeros but it makes rows/columns be at right-angles:

(1 1 1) - 1 * (1 0 0) = (0 1 1)
where (0 1 1).(1 0 0) = 0

In all cases we are only operating on two rows at a time, which keeps the algorithms simple. More importantly, in all cases it's possible to "undo" the row operation and recover the original rows (mathematically if not numerically).
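
In code, each of those strategies is just a small function acting on two rows at a time. Here's a hedged sketch of what those steps might look like on plain 3-element rows; the names and types are mine, for illustration only.

#include <array>
#include <cmath>

using Row = std::array<double, 3>;

// Gaussian elimination step: row b -= k * row a.
// Choosing k = b[col] / a[col] zeroes out b[col] (assuming a[col] != 0).
void GaussStep(const Row& a, Row& b, int col)
{
    double k = b[col] / a[col];
    for (int j = 0; j < 3; ++j) b[j] -= k * a[j];
}

// Givens rotation step: rotate the (a, b) pair of rows so that b[col] becomes zero.
void GivensStep(Row& a, Row& b, int col)
{
    double r = std::hypot(a[col], b[col]);
    if (r == 0.0) return;
    double c = a[col] / r, s = b[col] / r;
    for (int j = 0; j < 3; ++j) {
        double aj = a[j], bj = b[j];
        a[j] =  c * aj + s * bj;
        b[j] = -s * aj + c * bj;
    }
}

// Gram-Schmidt step: subtract from b its component along a, so that a.b == 0 afterwards.
void GramSchmidtStep(const Row& a, Row& b)
{
    double ab = 0.0, aa = 0.0;
    for (int j = 0; j < 3; ++j) { ab += a[j] * b[j]; aa += a[j] * a[j]; }
    if (aa == 0.0) return;
    for (int j = 0; j < 3; ++j) b[j] -= (ab / aa) * a[j];
}

int main()
{
    Row r0 = {1, 1, 2}, r1 = {3, 4, 5};
    GaussStep(r0, r1, 0);   // r1 becomes (0, 1, -1)
}

Note that each step can be undone: add back what you subtracted, or rotate by the opposite angle, and you recover the original rows.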

Applying our row operation is equivalent to pre-multiplying by an (invertible) matrix R, so our identity becomes:

Before: M = Ir * M
After:  R * M = R * Ir * M

Now the trick: let W be our "work matrix", initialized to M. If we apply row operations directly to W, then W = R * M. Let LEFT be another work matrix, initialized to Ir; if we apply the same row operations directly to LEFT, then LEFT = R * Ir. That means LEFT = R, so we never have to calculate R explicitly - we only have to mirror the row operations on W as row operations on LEFT, and R pops out at the end.

What about applying a sequence of row operations? Since we pre-multiply each time we have:

Rn * ... * R2 * R1 * M = Rn * ... * R2 * R1 * Ir * M

If we apply the row operations sequentially to W and LEFT, we still get W = Rn * ... * R2 * R1 * M, and LEFT = Rn * ... * R2 * R1 * Ir.

So the code is like this:

W = M;
R = Ir;
assert(W == R * M);
assert(R.IsInvertible());
for (...)
{
    ApplyRowOperation(W);
    ApplyRowOperation(R);
    assert(W == R * M);
    assert(R.IsInvertible());
}

The first two asserts are trivially true; the two inside the loop are consequences of the work we've done.

So a general row-based matrix decomposition can be done using code like the above, and we get

W = R * M
where W is in some simpler form
and R is invertible

Since R is invertible we can invert it to get RI, and then we have a "proper" matrix decomposition, that expresses M as the product of some matrices

M = RI * W
where W is in some simpler form
and RI is invertible
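
To make this concrete, here's the loop above instantiated with Gaussian elimination on a small integer matrix. Doubles are fine in this particular example because every intermediate value stays an integer, so the asserts can use exact comparison; the Mat3, Mul and RowOp names are illustrative only, not part of any library.

#include <array>
#include <cassert>

using Mat3 = std::array<std::array<double, 3>, 3>;

Mat3 Mul(const Mat3& A, const Mat3& B)
{
    Mat3 out{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k)
                out[i][j] += A[i][k] * B[k][j];
    return out;
}

// Row operation: row b -= k * row a.
void RowOp(Mat3& X, int b, int a, double k)
{
    for (int j = 0; j < 3; ++j) X[b][j] -= k * X[a][j];
}

int main()
{
    const Mat3 I = {{{1, 0, 0}, {0, 1, 0}, {0, 0, 1}}};
    const Mat3 M = {{{2, 1, 1}, {4, 3, 3}, {8, 7, 9}}};

    Mat3 W = M;   // work matrix, reduced towards upper-triangular form
    Mat3 R = I;   // accumulates the row operations

    struct { int b, a; } steps[] = {{1, 0}, {2, 0}, {2, 1}};
    for (auto s : steps)
    {
        double k = W[s.b][s.a] / W[s.a][s.a];  // chosen so W[b][a] becomes zero
        RowOp(W, s.b, s.a, k);                 // apply to the work matrix...
        RowOp(R, s.b, s.a, k);                 // ...and mirror it on R
        assert(Mul(R, M) == W);                // the invariant W == R * M holds throughout
    }
    // W is now upper triangular, R is invertible, and M = RI * W with RI = inverse(R).
}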

Adding Column Operations

Column operations can be added to the code in a similar way. The initial equation is now:

M = Ir * M * Ic

where Ir is an rxr identity matrix (M has r rows) and Ic is a cxc identity matrix (M has c columns).

After a column operation this changes to:

M * C = Ir * M * Ic * C

So the code is like this:

W = M;
R = Ir;
C = Ic;
assert(W == R * M * C);
assert(R.IsInvertible());
assert(C.IsInvertible());
for (...)
{
    if (RowOperation) {
        ApplyRowOperation(W);
        ApplyRowOperation(R);
        assert(W == R * M * C);
        assert(R.IsInvertible());
    } else {
        ApplyColOperation(W);
        ApplyColOperation(C);
        assert(W == R * M * C);
        assert(C.IsInvertible());
    }
}

And the new matrix decomposition, once we know CI, the inverse of C, is:

M = RI * W * CI
where W is in some simpler form
and RI and CI are invertible

Optimization

The above assumes we can calculate RI and CI after the fact, accurately and cheaply. This often isn't really the case, but we can get RI and CI directly by applying the inverse of the row/column operations to another working matrix. It's tricky to get right (a bit of trial and error tends to be involved) but the code looks a bit like this:

W = M;
R = Ir;
C = Ic;
RI = Ir;
CI = Ic;
assert(W == R * M * C);
assert(R * RI == I);
assert(RI * R == I);
assert(C * CI == I);
assert(CI * C == I);
for (...)
{
    if (RowOperation) {
        ApplyRowOperation(W);
        ApplyRowOperation(R);
        ApplyInverseOperationToColumns(RI);
        assert(W == R * M * C);
        assert(R * RI == I);
        assert(RI * R == I);
    } else {
        ApplyColOperation(W);
        ApplyColOperation(C);
        ApplyInverseOperationToRows(CI);
        assert(W == R * M * C);
        assert(C * CI == I);
        assert(CI * C == I);
    }
}
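
For the Gaussian elimination case, the "inverse operation" is easy to state: the inverse of "row b -= k * row a" is "row b += k * row a", and mirroring that onto RI (which sits on the other side of the product) means applying it as the column operation "column a += k * column b". Here's a minimal sketch of one such step, with the same sort of illustrative Mat3 helpers as before; this is my worked example, not the article's own code.

#include <array>
#include <cassert>

using Mat3 = std::array<std::array<double, 3>, 3>;

Mat3 Mul(const Mat3& A, const Mat3& B)
{
    Mat3 out{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k)
                out[i][j] += A[i][k] * B[k][j];
    return out;
}

int main()
{
    const Mat3 I = {{{1, 0, 0}, {0, 1, 0}, {0, 0, 1}}};
    const Mat3 M = {{{2, 1, 1}, {4, 3, 3}, {8, 7, 9}}};

    Mat3 W = M, R = I, RI = I;

    // Row operation "row 1 -= 2 * row 0", applied to both W and R...
    int a = 0, b = 1;
    double k = 2;
    for (int j = 0; j < 3; ++j) { W[b][j] -= k * W[a][j]; R[b][j] -= k * R[a][j]; }

    // ...and its inverse, applied to RI as a *column* operation:
    // "column a += k * column b", i.e. post-multiplication by the inverse elementary matrix.
    for (int i = 0; i < 3; ++i) RI[i][a] += k * RI[i][b];

    assert(Mul(R, M) == W);   // the usual invariant
    assert(Mul(R, RI) == I);  // RI really is the inverse of R...
    assert(Mul(RI, R) == I);  // ...on both sides
}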

Also, it's almost always cheaper to do row operations than column operations, so it's better to construct RIt and Ct than RI and C, because they can be constructed using row operations:

W = M;
R = Ir;
Ct = Ic;
RIt = Ir;
CI = Ic;
assert(W == R * M * Ct.Transpose());
assert(R * RIt.Transpose() == I);
assert(RIt.Transpose() * R == I);
assert(Ct.Transpose() * CI == I);
assert(CI * Ct.Transpose() == I);
for (...)
{
    if (RowOperation) {
        ApplyRowOperation(W);
        ApplyRowOperation(R);
        ApplyInverseOperationToRows(RIt);
        assert(W == R * M * Ct.Transpose());
        assert(R * RIt.Transpose() == I);
        assert(RIt.Transpose() * R == I);
    } else {
        ApplyColOperation(W);
        ApplyRowOperation(Ct);
        ApplyInverseOperationToRows(CI);
        assert(W == R * M * Ct.Transpose());
        assert(Ct.Transpose() * CI == I);
        assert(CI * Ct.Transpose() == I);
    }
}

Now, only column operations on W are actually done as column operations; everything else works as row operations.

Since row operations are faster than column operations, if you have a batch of column operations to do on W, it's better to do them as row operations. This can be done by carefully transposing matrices. Say your decomposition works like this:

W = M;
R = Ir;
C = Ic;
RI = Ir;
CI = Ic;
DoRowOperationSet(W,R,RI,C,CI);
DoColumnOperationSet(W,R,RI,C,CI);

Then it's better to do it like this:

W = M;
R = Ir;
C = Ic;
RI = Ir;
CI = Ic;
DoRowOperationSet(W,R,RI,C,CI);
W = W.Transpose();
R = R.Transpose();
RI = RI.Transpose();
C = C.Transpose();
CI = CI.Transpose();
// NB: now Wt = Ct * Mt * Rt
DoRowOperationSetInstead(W,C,CI,R,RI);
W = W.Transpose();
R = R.Transpose();
RI = RI.Transpose();
C = C.Transpose();
CI = CI.Transpose();

Wow, that's a lot of transposes! But look what happens if we use RIt and Ct instead:

W = M;
R = Ir;
Ct = Ic;
RIt = Ir;
CI = Ic;
DoRowOperationSet(W,R,RIt,Ct,CI);
DoColumnOperationSet(W,R,RIt,Ct,CI);

this becomes simply:

W = M;
R = Ir;
Ct = Ic;
RIt = Ir;
CI = Ic;
DoRowOperationSet(W,R,RIt,Ct,CI);
W = W.Transpose();
DoRowOperationSetInstead(W,Ct,CI,R,RIt);
W = W.Transpose();

(Can you see why?)

The latter is therefore the preferred form, if a batch of column operations is to be done.
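
As a final sanity check on the transposing trick itself, here's a tiny sketch showing that a column operation on W gives the same result as transposing, doing the corresponding row operation, and transposing back. The Mat3 type is a throwaway for illustration, as above.

#include <array>
#include <cassert>

using Mat3 = std::array<std::array<double, 3>, 3>;

Mat3 Transpose(const Mat3& A)
{
    Mat3 out{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            out[j][i] = A[i][j];
    return out;
}

int main()
{
    Mat3 W = {{{1, 2, 3}, {4, 5, 6}, {7, 8, 9}}};

    // Column operation "column 2 -= 3 * column 0", done directly...
    Mat3 direct = W;
    for (int i = 0; i < 3; ++i) direct[i][2] -= 3 * direct[i][0];

    // ...and done as the corresponding row operation on the transpose.
    Mat3 Wt = Transpose(W);
    for (int j = 0; j < 3; ++j) Wt[2][j] -= 3 * Wt[0][j];

    assert(Transpose(Wt) == direct);   // same result either way
}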
