Saturday, November 26, 2011

Affine transformations and their inverse

When you're programming games or other 3d applications using OpenGL or DirectX, it is often required to use affine transformation matrices to describe transformations in space (4x4 matrices and the like). There is not a lot of documentation about this online, and most implementations I've seen, have some sort of hack. In this tutorial, I'll describe what affine transformations are, and more importantly, how to invert them correctly and efficiently. It is assumed that the reader knows what a matrices are and how to multiply them.

The affine transformation
Imagine you have a ball lying at (1,0) in your coordinate system. You want to move this ball to (0,2) by first rotating the ball 90 degrees to (0,1) and then moving it upwards with 1. This transformation is described by a rotation and translation. The rotation is: $$ \left[\begin{array}{cc} 0 & -1\\ 1 & 0\\ \end{array}\right] $$ and the translation is (0,1). To apply this transformation to a vector $\vec{x}$, we do: $$\vec{x}^\prime = R \vec{x} + \vec{T}$$ where R is a rotation matrix, and T is a translation vector. This is called an affine transformation.

If you're in 2d space, there is no 2x2 matrix that will do this transformation for all points. However, if we go one dimension higher, to a 3x3 matrix, you can! That's why OpenGL uses 4x4 matrices to describe 3d transformations, as we'll see later. 

The matrix representation
The best way to explain how to make this matrix, is to give the matrix for the example above.

$$ \left[\begin{array}{ccc} 0 & -1 & 0 \\ 1 & 0 & 1 \\ 0 & 0 & 1 \\ \end{array}\right]  $$

As you can see, the left upper block is the rotation matrix, and to the right going downwards we have our translation. Because it's a 3x3 matrix now, we have to apply it to a 3d vector. We take the vector we had, $(1,0)$, and set the third component to 1. See what happens:

$$ \left[\begin{array}{ccc} 0 & -1 & 0 \\ 1 & 0 & 1 \\ 0 & 0 & 1 \\ \end{array}\right] \begin{pmatrix} 1\\0\\1 \end{pmatrix} = \begin{pmatrix} 0\\2\\1 \end{pmatrix} $$

This gives the result we wanted! If you look closer to the matrix multiplication, we see why this works. The trick is to set the last component of the vector to 1, so that the translation just gets added. In a short-hand notation the matrix and vector look like this:

$$ \left[\begin{array}{c|c} R & T \\ \hline 0 & 1 \\ \end{array}\right] \begin{pmatrix} x\\ \hline 1 \end{pmatrix}=\begin{pmatrix} x^\prime\\ \hline 1 \end{pmatrix}$$

Inverting an affine transformation matrix
Sometimes it is very imporant to invert an affine transformation, for example to transform back from world space to object space. A naive approach is to just write a function that inverts 3x3 or 4x4 matrices. This is very inefficient,  because there are some nice properties we can use.


If we think about what happens when we apply the affine transformation matrix, we rotate first over an angle $\alpha$, and then translate over $(T_x, T_y)$. So the inverse should translate first with $(-T_x, -T_y)$, and then rotate over $-\alpha$. Unfortunately, that's not what happens. What does happen is that the rotation is always applied first. So we have to correct for that by modifying our translation. A derivation:

$$
\begin{array}
\vec{x}^\prime = R\vec{x} + T \\
\vec{x}^\prime - T = R\vec{x} \\
R^{-1}(\vec{x}^\prime - T) = \vec{x}\\
\end{array}
$$

So to get back the original vector $\vec{x}$, we have a new affine transformation:

$$ \vec{x} = R^{-1}\vec{x}^\prime  - (R^{-1}T) $$

What we have to do now is calculate the inverse of a rotation matrix and using that result, calculate our new translation.


Let's recall how a general rotation matrix in 2d looks like:

$$ \left[\begin{array}{cc} \cos(\alpha) & - \sin(\alpha) \\ \sin(\alpha) & \cos(\alpha) \\ \end{array}\right] $$

Because a rotation matrix is unitary, the inverse of a rotation matrix is equal to its transpose, so inverting can be done very quickly:

$$ \left[\begin{array}{cc} \cos(\alpha) & \sin(\alpha) \\ -\sin(\alpha) & \cos(\alpha) \\ \end{array}\right] $$

Now all we have to do is apply this to T, to get all the components for our inverse matrix:

$$ \left[\begin{array}{c|c} R^{-1} & R^{-1}T \\ \hline 0 & 1 \\ \end{array}\right] \begin{pmatrix} x^\prime\\ \hline 1 \end{pmatrix}$$

Putting it together

As we've seen, general 2d affine transformation matrices look like

$$ \left[\begin{array}{ccc} \cos(\alpha) & - \sin(\alpha) & T_x \\ \sin(\alpha) & \cos(\alpha) & T_y \\ 0 & 0 & 1 \\ \end{array}\right] $$

Applying the strategy we've derived above, the inverse is:

$$ \left[\begin{array}{ccc} \cos(\alpha) & \sin(\alpha) & -T_x \cos(\alpha) - T_y \sin(\alpha) \\ -\sin(\alpha) & \cos(\alpha) & -T_y \cos(\alpha) + T_x \sin(\alpha) \\ 0 & 0 & 1 \\ \end{array}\right] $$


Expanding to 3d is trivial, since the same rules hold. For example, let's pick the rotation of $\theta$ over the axis $(0,1,0)$ and a translation $(1,2,3)$.

The matrix becomes:
$$ \left[\begin{array}{cccc} \cos(\theta) & 0 &  \sin(\theta) &1 \\ 0 & 1 & 0 & 2\\ -\sin(\theta) & 0 & \cos(\theta) & 3 \\ 0 & 0 & 0 & 1 \\ \end{array}\right] $$

And the inverse is:
$$
\left[


\begin{array}{cccc}
 \cos (\theta ) & 0 & -\sin (\theta ) & 3 \sin (\theta )-\cos (\theta ) \\
 0 & 1 & 0 & -2 \\
 \sin (\theta ) & 0 & \cos (\theta ) & -3 \cos (\theta )-\sin (\theta ) \\
 0 & 0 & 0 & 1
\end{array}

\right]

$$

These 4x4 matrices are the ones that OpenGL expects in functions like glMultMatrixf!

In order to use this knowledge in your code, you should write a matrix class that can 1) create a rotation matrix from an angle and axis 2) transpose a matrix and 3) be applied to a vector.


Conclusion
Hopefully this tutorial has helped you better grasp the concepts of affine transformations. We've seen the definition of these transformations and how to use those to find a shortcut for a quick inversion algorithm. If you have any questions, please let me know!

5 comments:

  1. You have very intellectual to express anything as you discussed affine transformation, an affine transformation is any transformation that preserves col-linearity and ratios of distances. In this sense, affine indicates a special class of projective transformations that do not move any objects from the affine space to the plane at infinity or conversely. An affine transformation is also called an affinity.
    How to Construct a Trapezium

    ReplyDelete
  2. Although this is right as far as it goes, the inversion only covers a rotation+translation transform, not a (rotation+scale)+translation transform, where the R-1 needs to be scaled by it's determinant, which is not unity for a scale-change. This makes the inverse matrix a bit more complicated :-)

    ReplyDelete
  3. Thanks Ben, this would have saved me a lot of time back in the day. I hate to be this guy, but...

    Although @dgk is right as far as it goes, the last missing component is a shear transformation (Assuming we're discussing 2D, I am not sure about higher dimensions). This gives us the sixth degree of freedom, assuming the scaling mentioned earlier is non-uniform.

    With scaling, it's possible for the transform to be non-invertible, giving infinite solutions. With non-uniform scaling or shearing, the quality of an invertible transformation can become unstable (Condition Number). This makes the matrix inversion a bit more complicated.

    Thanks Manoj for your insightful MathWorld quote.

    ReplyDelete
    Replies
    1. In the more general case of an arbitrary 2D or 3D affine transform, I would just use formulas 3 and 6 from MathWorld : Matrix Inverse for the "R" matrix.

      If the determinant is zero, you have no inverse. Be aware of stability concerns and finite precision.

      Delete
    2. This comment has been removed by the author.

      Delete