Chapter 8
Rotation in Three Dimensions
If you do not change direction,
you may end up where you are heading.
— Lao Tzu (600–531 BCE)
This chapter tackles the difficult problem of describing the orientation of an object in 3D. It
also discusses the closely related concepts of rotation and angular displacement. There are
several different ways we can express orientation and angular displacement in 3D. Here we
discuss the three most important methods—matrices, Euler angles, and quaternions—as well as
two lesser-known forms—axis-angle and exponential map. For each method, we define precisely
how the representation method works, and discuss the peculiarities, advantages, and disadvantages
of the method.
Different techniques are needed in different circumstances, and each technique has its advantages
and disadvantages. It is important to know not only how each method works, but also which
technique is most appropriate for a particular situation and how to convert between
representations.
The discussion of orientation in 3D is divided into the following sections:

Section 8.1 discusses the subtle differences
between terms like “orientation,” “direction,” and “angular
displacement.”

Section 8.2 describes how to express orientation
using a matrix.

Section 8.3 describes how to express angular
displacement using Euler angles.

Section 8.4 describes the axis-angle and exponential map
forms.

Section 8.5 describes how to express angular
displacement using a quaternion.

Section 8.6 compares and contrasts the different methods.

Section 8.7 explains how to convert
an orientation from one form to another.
This chapter makes extensive use of the terms
object space and
upright space. If you aren't familiar with these terms, you should flip back to
Section 3.2, where the terms were first introduced.
8.1 What Exactly Is “Orientation”?
Before we can begin to discuss how to describe orientation in 3D, let us first define exactly
what it is that we are attempting to describe. The term orientation is related to other
similar terms, such as

direction

angular displacement

rotation.
Intuitively, we know that the “orientation” of an object basically tells us what direction the
object is facing. However, “orientation” is not exactly the same as “direction.”
For example, a vector has a direction, but not an orientation. The difference is that when a
vector points in a certain direction, you can twist the vector along its length (see
Figure 8.1), and there is no real change to the vector, since a vector has no
thickness or dimension other than its length.
In contrast to a simple vector, consider an object, such as a jet, facing a certain direction. If
we twist the jet (see Figure 8.2) in the same way that we twisted the vector, we
will change the orientation of the jet. In Section 8.3, we refer to
this twisting component of an object's orientation as bank.
The fundamental difference between direction and orientation is seen concretely by the fact that
we can parameterize a direction in 3D with just two numbers (the spherical coordinate
angles—see Section 7.3.2), whereas an orientation requires a
minimum of three numbers (Euler angles—see Section 8.3).
Section 2.4.1 discussed that it's impossible to describe the position
of an object in absolute terms—we must always do so within the context of a specific reference
frame. When we investigated the relationship between “points” and “vectors,” we noticed
that specifying a position is actually the same as specifying an amount of translation from some
other given reference point (usually the origin of some coordinate system).
In the same way, orientation cannot be described in absolute terms. Just as a position is given
by a translation from some known point, an orientation is given by a rotation from some
known reference orientation (often called the “identity” or “home” orientation). The amount
of rotation is known as an angular displacement. In other words, describing an orientation
is mathematically equivalent to describing an angular displacement.
We say “mathematically equivalent” because in this book, we make a subtle distinction between
“orientation” and terms such as “angular displacement” and “rotation.” It is helpful to
think of an “angular displacement” as an operator that accepts an input and produces an output.
A particular direction of transformation is implied; for example, the angular displacement
from the old orientation to the new orientation, or from upright space
to object space. An example of an angular displacement is, “Rotate $90^{\circ}$ about the $z$ axis.” It's an action that we can perform on a vector.
However, we frequently encounter state variables and other situations in which this operator
framework of input/output is not helpful and a parent/child relationship is more natural. We
tend to use the word “orientation” in those situations. An example of an orientation is,
“Standing upright and facing east.” It describes a state of affairs.
Of course, we can describe the orientation “standing upright and facing east” as an angular
displacement by saying, “Stand upright, facing north, and then rotate $90^{\circ}$ about the
$z$ axis.” This distinction between orientation and angular displacement is similar to the
distinction between points and vectors, which are two other terms that are equivalent
mathematically but not identical conceptually. In both cases, the first term is used primarily
to describe a single state, and the second term primarily used to describe a difference between
two states. Of course, these conventions are purely a matter of preference, but they can be
helpful.
You might also hear the word “attitude” used to refer to the orientation of an object, especially
if that object is an aircraft.
One way to describe the orientation of a coordinate space in 3D is to tell which way the basis
vectors of that coordinate space (the $+x$, $+y$, and $+z$ axes) point. Of course, we don't
measure these vectors in the coordinate space we are attempting to describe—by definition, they
are $[1,0,0]$, $[0,1,0]$, and $[0,0,1]$ no matter what orientation the coordinate space is in.
We must describe the basis vectors using some other coordinate space. By doing so, we've
established the relative orientation of the two coordinate spaces.
When these basis vectors are used to form the rows of a $3\times 3$ matrix, we have expressed
the orientation in matrix form. Another way of saying all this is that we can express the
relative orientation of two coordinate spaces by giving a rotation matrix that can be used to
transform vectors from one coordinate space to the other.
We have already seen how a matrix can be used to transform points from one coordinate space to
another. In Figure 8.3, the matrix in the upper right-hand corner can be used
to rotate points from the object space of the jet into upright space. We've pulled out the rows
of this matrix to emphasize their direct relationship to the coordinates for the jet's body axes.
The rotation matrix contains the object axes, expressed in upright space. Simultaneously, it is
a rotation matrix: we can multiply row vectors by this matrix to transform those vectors from
object-space coordinates to upright-space coordinates.
Legitimate questions to ask are: Why does the matrix contain the body axes expressed using
upright-space coordinates? Why not the upright axes expressed in object-space coordinates?
Another way to phrase this is, Why did we choose to give a rotation matrix that transforms
vectors from object space to upright space? Why not from upright space to object space?
From a mathematical perspective, this question is a bit ridiculous. Because rotation matrices
are orthogonal, their inverse is the same as their transpose (see
Section 6.3.2). Thus, the decision is entirely a
cosmetic one.
But practically speaking, in our opinion, it is quite important. At issue is whether you can
write code that is intuitive to read and works the first time, or whether it requires a lot of
work to decipher, or a knowledge of conventions that are not stated because they are “obvious”
to everyone but you. So please allow us a brief digression to continue a line of thought begun
when we introduced the term “upright space” in Section 3.2.4
concerning the practical aspects of what happens when the math of coordinate space
transformations gets translated into code. Also please allow some latitude to express some
opinions based on our observations watching programmers grapple with rotation matrices. We don't
expect that everyone will agree with our assertions, but we hope that every reader will at least
appreciate the value in considering these issues.
Certainly every good math library will have a $3\times 3$ matrix class that can represent any
arbitrary transformation, which is to say that it makes no assumptions about the value of the
matrix elements. (Or perhaps it is a $4\times 4$ matrix that can do projection, or a $4\times 3$
that can do translation but not projection—those distinctions are not important here.) For
a matrix like this, the operations inherently are in terms of some input coordinate space and an
output coordinate space. This is just implicit in the idea of matrix multiplication. If you
need to go from output to input, then you must obtain the inverse of the matrix.
It is common practice to use the generic transform matrix class to describe the orientation of an
object. In this case, rotation is treated just like any other transformation. The interface
remains in terms of a source and destination space. Unfortunately, it is our experience that the
following two matrix operations are by far the most commonly used:

Take an object-space vector and express it in upright coordinates.

Take an upright-space vector and express it in object coordinates.
Notice that we need to be able to go in both directions. We have no experience or evidence that
either direction is significantly more common than the other. But more important, the very
nature of the operations and the way programmers think about the operations is in terms of “object
space” and “upright space” (or some other equivalent terminology, such as “parent space” and
“child space”). We do not think of them in terms of a source space and a destination
space. It is in this context that we wish to consider the question posed at the beginning of
this section: Which matrix should we use?
First, we should back up a bit and remind ourselves of the mathematically moot but yet
conceptually important distinction between orientation and angular displacement. (See the notes
on terminology at the end of Section 8.1.) If your purpose is to create a matrix
that performs a specific angular displacement (for example, “rotate 30 degrees about the
$x$ axis”), then the two operations above are not really the ones you probably have in your
head, and using a generic transform matrix with its implied direction of transformation is no
problem, and so this discussion does not apply. Right now, we are focusing on the situation in
which the orientation of some object is stored as a state variable.
Let's assume that we adopt the common policy and store orientation using the generic
transformation matrix. We are forced to arbitrarily pick a convention, so let's decide that
multiplication by this matrix will transform from object to upright space. If we have a vector
in upright space and we need to express it in object-space coordinates, we must multiply this
vector by the inverse of the matrix.
Now let's see how our policy affects the code that is written and read hundreds of times by
average game programmers.

“Rotate some vector from object space to upright space” is translated into code
as multiplication by the matrix.

“Rotate a vector from upright space to object space” is translated into code
as multiplication by the inverse (or transpose) of the matrix.
Notice that the code does not match one-to-one with the high-level intentions of the programmer.
It forces every user to remember what the conventions are every time they use the matrix. It is
our experience that this coding style is a contributing factor to the difficulty that beginning
programmers have in learning how to use matrices; they often end up transposing and negating
things randomly when things don't look right.
We have found it helpful to have a special $3\times 3$ matrix class that is used exclusively for
storing the orientation of an object, not for arbitrary transforms. The class assumes, as an
invariant, that the matrix is orthogonal, meaning it contains only rotation. (We also would
probably assume that the matrix does not contain a reflection, even though that is possible in an
orthogonal matrix.) With these assumptions in place, we are now free to perform rotations using
the matrix at a higher level of abstraction. Our interface functions match exactly the
high-level intentions of the programmer. Furthermore, we have removed the confusing linear
algebra details having to do with row vectors versus column vectors, which space is on the left
or right, which way is the regular way and which is the inverse, and so forth. Or rather, we
have confined such details to the class internals—the person implementing the class certainly
needs to pick a convention (and hopefully document it). In fact, in this specialized matrix
class, the operations of “multiply a vector” and “invert this matrix” really are not that
useful. We would advocate keeping this dedicated matrix class confined to operations expressed in
terms of upright space and object space, rather than generic operations such as “multiply a vector.”
So, back to the question posed at the start of this section: Which matrix should we use? Our
answer is, “It shouldn't matter.” By that we mean there is a way to design your matrix code in
such a way that it can be used without knowing what choice was made. As far as the C++ code
goes, this is purely a cosmetic change. For example, perhaps we just replace the function name
multiply() with objectToUpright(), and likewise we replace
multiplyByTranspose() with uprightToObject(). The version of the code
with descriptive, named coordinate spaces is easier to read and write.
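To make this concrete, here is a minimal sketch of such a dedicated class, assuming row vectors on the left and the internal (documented, but hidden from callers) convention that plain multiplication transforms from object space to upright space. The names `objectToUpright()` and `uprightToObject()` come from the discussion above; everything else (the struct layout, `Vector3`) is illustrative, not a definitive implementation.

```cpp
#include <cassert>
#include <cmath>

struct Vector3 { float x, y, z; };

// Dedicated orientation class: assumes as an invariant that m is orthogonal
// (rows are mutually perpendicular unit vectors), i.e., a pure rotation.
struct RotationMatrix {
    float m[3][3];

    // Rotate an object-space vector into upright space.
    // Internal convention: row vector times matrix.
    Vector3 objectToUpright(const Vector3 &v) const {
        return { v.x*m[0][0] + v.y*m[1][0] + v.z*m[2][0],
                 v.x*m[0][1] + v.y*m[1][1] + v.z*m[2][1],
                 v.x*m[0][2] + v.y*m[1][2] + v.z*m[2][2] };
    }

    // Rotate an upright-space vector into object space. Because the matrix
    // is orthogonal, multiplying by the transpose is multiplying by the
    // inverse -- but the caller never needs to know that detail.
    Vector3 uprightToObject(const Vector3 &v) const {
        return { v.x*m[0][0] + v.y*m[0][1] + v.z*m[0][2],
                 v.x*m[1][0] + v.y*m[1][1] + v.z*m[1][2],
                 v.x*m[2][0] + v.y*m[2][1] + v.z*m[2][2] };
    }
};
```

Callers write `r.objectToUpright(v)` or `r.uprightToObject(v)`, matching their high-level intent directly, and never reason about transposes or multiplication order at the call site.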
8.2.2 Direction Cosines Matrix
You might come across the (very old school) term direction cosines in the context of using
a matrix to describe orientation. A direction cosines matrix is the same thing as a rotation
matrix; the term just refers to a special way to interpret (or construct) the matrix, and this
interpretation is interesting and educational, so let's pause for a moment to take a closer look.
Each element in a rotation matrix is equal to the dot product of a cardinal axis in one space
with a cardinal axis in the other space. For example, the center element ${m}_{22}$ in a
$3\times 3$ matrix gives the dot product that the $y$ axis in one space makes with the $y$ axis
in the other space.
More generally, let's say that the basis vectors of a coordinate space are the mutually
orthogonal unit vectors $\mathbf{p}$, $\mathbf{q}$, and $\mathbf{r}$, while a second coordinate
space with the same origin has as its basis a different (but also orthonormal) basis
${\mathbf{p}}^{\prime}$, ${\mathbf{q}}^{\prime}$, and ${\mathbf{r}}^{\prime}$. (Please allow us
to break from convention by dropping all the hats from the unit vectors in this section, to
avoid distracting clutter in the equations.) The rotation matrix that rotates row vectors from
the first space to the second can be constructed from the cosines of the angles between each
pair of basis vectors. Of course, the dot product of two unit vectors is exactly equal to the
cosine of the angle between them, so the matrix product is
$$\begin{array}{}\text{(8.1)}& \mathbf{v}\left[\text{}\begin{array}{ccc}\mathbf{p}\cdot {\mathbf{p}}^{\prime}& \mathbf{q}\cdot {\mathbf{p}}^{\prime}& \mathbf{r}\cdot {\mathbf{p}}^{\prime}\\ \mathbf{p}\cdot {\mathbf{q}}^{\prime}& \mathbf{q}\cdot {\mathbf{q}}^{\prime}& \mathbf{r}\cdot {\mathbf{q}}^{\prime}\\ \mathbf{p}\cdot {\mathbf{r}}^{\prime}& \mathbf{q}\cdot {\mathbf{r}}^{\prime}& \mathbf{r}\cdot {\mathbf{r}}^{\prime}\end{array}\right]={\mathbf{v}}^{\prime}.\end{array}$$
These axes can be interpreted as geometric rather than numeric entities, so it really does not
matter what coordinates are used to describe the axes (provided we use the same coordinate space to
describe all of them), the rotation matrix will be the same.
For example, let's say that our axes are described using coordinates relative to the first basis.
Then $\mathbf{p}$, $\mathbf{q}$, and $\mathbf{r}$ have the trivial forms $[1,0,0]$, $[0,1,0]$,
and $[0,0,1]$, respectively. The basis vectors of the second space, ${\mathbf{p}}^{\prime}$,
${\mathbf{q}}^{\prime}$, and ${\mathbf{r}}^{\prime}$, have arbitrary coordinates. When we
substitute the trivial vectors $\mathbf{p}$, $\mathbf{q}$, and $\mathbf{r}$ into the matrix in
Equation (8.1) and expand the dot products, we get
$$\left[\text{}\begin{array}{ccc}[1,0,0]\cdot {\mathbf{p}}^{\prime}& [0,1,0]\cdot {\mathbf{p}}^{\prime}& [0,0,1]\cdot {\mathbf{p}}^{\prime}\\ [1,0,0]\cdot {\mathbf{q}}^{\prime}& [0,1,0]\cdot {\mathbf{q}}^{\prime}& [0,0,1]\cdot {\mathbf{q}}^{\prime}\\ [1,0,0]\cdot {\mathbf{r}}^{\prime}& [0,1,0]\cdot {\mathbf{r}}^{\prime}& [0,0,1]\cdot {\mathbf{r}}^{\prime}\end{array}\right]=\left[\text{}\begin{array}{ccc}{p}_{x}^{\prime}& {p}_{y}^{\prime}& {p}_{z}^{\prime}\\ {q}_{x}^{\prime}& {q}_{y}^{\prime}& {q}_{z}^{\prime}\\ {r}_{x}^{\prime}& {r}_{y}^{\prime}& {r}_{z}^{\prime}\end{array}\right]=\left[\begin{array}{c}{\mathbf{p}}^{\prime}\\ {\mathbf{q}}^{\prime}\\ {\mathbf{r}}^{\prime}\end{array}\right].$$
In other words, the rows of the rotation matrix are the basis vectors of the output coordinate
space, expressed by using the coordinates of the input coordinate space. Of course, this fact is
not just true for rotation matrices, it's true for all transformation matrices. This is
the central idea of why a transformation matrix works, which was developed in
Section 4.2.
Now let's look at the other case. Instead of using coordinates relative to the first basis,
we'll measure everything using the second coordinate space (the output space). This time,
${\mathbf{p}}^{\prime}$, ${\mathbf{q}}^{\prime}$, and ${\mathbf{r}}^{\prime}$ have trivial
forms, and $\mathbf{p}$, $\mathbf{q}$, and $\mathbf{r}$ are arbitrary. Putting these into the
direction cosines matrix produces
$$\left[\text{}\begin{array}{ccc}\mathbf{p}\cdot [1,0,0]& \mathbf{q}\cdot [1,0,0]& \mathbf{r}\cdot [1,0,0]\\ \mathbf{p}\cdot [0,1,0]& \mathbf{q}\cdot [0,1,0]& \mathbf{r}\cdot [0,1,0]\\ \mathbf{p}\cdot [0,0,1]& \mathbf{q}\cdot [0,0,1]& \mathbf{r}\cdot [0,0,1]\end{array}\right]=\left[\text{}\begin{array}{ccc}{p}_{x}& {q}_{x}& {r}_{x}\\ {p}_{y}& {q}_{y}& {r}_{y}\\ {p}_{z}& {q}_{z}& {r}_{z}\end{array}\right]=\left[\begin{array}{ccc}& & \\ {\mathbf{p}}^{\mathrm{T}}& {\mathbf{q}}^{\mathrm{T}}& {\mathbf{r}}^{\mathrm{T}}\\ & & \end{array}\right].$$
This says that the columns of the rotation matrix are formed from the basis vectors of the
input space, expressed using the coordinates of the output space. This is not true of
transformation matrices in general; it applies only to orthogonal matrices such as rotation
matrices.
Also, remember that our convention is to use row vectors on the left. If you are using column
vectors on the right, things will be transposed.
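The row interpretation is easy to verify numerically: with row vectors on the left, multiplying the $i$th standard basis vector by any matrix simply extracts the $i$th row, which is why the rows can be read off as basis vectors. The little helper below is an illustrative sketch (not from the book) that demonstrates this.

```cpp
#include <cassert>
#include <cmath>

// Multiply a row vector v by a 3x3 matrix m, with the row vector on the left.
// Feeding in the standard basis vector [1,0,0] returns row 0 of the matrix,
// [0,1,0] returns row 1, and so on.
static void mulRowVec(const double v[3], const double m[3][3], double out[3]) {
    for (int j = 0; j < 3; ++j)
        out[j] = v[0]*m[0][j] + v[1]*m[1][j] + v[2]*m[2][j];
}
```

For example, multiplying $[1,0,0]$ by a rotation matrix returns exactly the first row, i.e., where the first basis vector of the input space lands.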
Matrix form is a very explicit form of representing orientation. This explicit
nature provides some benefits.

Rotation of vectors is immediately available. The most
important property of matrix form is that you can use a matrix to rotate
vectors between object and upright space. No other representation of
orientation allows this—to
rotate vectors, we must convert the orientation to matrix form.

Format used by graphics APIs. Partly due to reasons in
the previous item, graphics APIs use matrices to express orientation.
(API stands for Application Programming Interface. Basically, this is
the code we use to communicate with the graphics hardware.)
When we are communicating with the API, we are going to have to
express our transformations as matrices. How we store
transformations internally in our program is up to us, but if we
choose another representation, we are going to have to convert them
into matrices at some point in the graphics pipeline.

Concatenation of multiple angular displacements. A third advantage of
matrices is that it is possible to “collapse” nested coordinate space
relationships. For example, if we know the orientation of object A relative to
object B, and we know the orientation of object B relative to object C, then by using
matrices, we can determine the orientation of object A relative to object C. We
encountered these concepts before when we discussed nested coordinate spaces in
Chapter 3, and then we discussed how matrices could be concatenated
in Section 5.6.

Matrix inversion. When an angular displacement is represented in matrix
form, it is possible to compute the “opposite” angular displacement by using matrix
inversion. What's more, since rotation matrices are orthogonal, this computation is
a trivial matter of transposing the matrix.
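The last advantage is easy to see in code. The sketch below (illustrative helpers, not from the book) inverts a rotation matrix simply by transposing it, and the product of the matrix with its transpose confirms that nothing more exotic is needed.

```cpp
#include <cassert>
#include <cmath>

// Transpose of a 3x3 matrix. For a rotation (orthogonal) matrix the
// transpose is also the inverse, so the "opposite" angular displacement
// costs only a few element swaps instead of a general matrix inversion.
static void transpose3(const double m[3][3], double t[3][3]) {
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            t[i][j] = m[j][i];
}

// General 3x3 multiply, used here only to verify m * transpose(m) = identity.
static void mul3(const double a[3][3], const double b[3][3], double r[3][3]) {
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j) {
            r[i][j] = 0.0;
            for (int k = 0; k < 3; ++k)
                r[i][j] += a[i][k] * b[k][j];
        }
}
```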
The explicit nature of a matrix provides some advantages, as we have just discussed. However, a
matrix uses nine numbers to store an orientation, and it is possible to parameterize orientation
with only three numbers. The “extra” numbers in a matrix can cause some problems.

Matrices take more memory. If we need to store many orientations (for
example, keyframes in an animation sequence), that extra space for nine numbers
instead of three can really add up. Let's take a modest example. Let's say we are
animating a model of a human that is broken up into 15 pieces for different body
parts. Animation is accomplished strictly by controlling the orientation of each
part relative to its parent part. Assume we are storing one orientation for each
part, per frame, and our animation data is stored at a modest rate, say, 15 Hz. This
means we will have 225 orientations per second. Using matrices of 32-bit floating-point
numbers, each second of animation data will take 8,100 bytes. Using Euler angles (which we will
meet next in Section 8.3), the same data would take only 2,700 bytes per second.
For a mere 30 seconds of animation data, matrices would take 162K more than the same
data stored using Euler angles!

Difficult for humans to use. Matrices are not intuitive for humans to work
with directly. There are just too many numbers, and they are all between $-1$ and
$+1$. What's more, humans naturally think about orientation in terms of angles, but a
matrix is expressed in terms of vectors. With practice, we can learn how to decipher the
orientation from a given matrix. (The techniques from
Section 4.2 for visualizing a matrix help a lot for
this.) But still, this is much more difficult than Euler angles. And going the other
way is much more difficult—it would take forever to construct the matrix for
a nontrivial orientation by hand. In general, matrices just aren't the way people
naturally think about orientation.

Matrices can be ill-formed. As we have said, a matrix uses nine numbers,
when only three are necessary. In other words, a matrix contains six degrees of
redundancy. There are six constraints that must be satisfied for a matrix to be
“valid” for representing an orientation. The rows must be unit vectors, and they
must be mutually perpendicular (see
Section 6.3.2).
Let's consider this last point in more detail. If we take any nine numbers at random and create a
$3\times 3$ matrix, it is very unlikely that these six constraints will be satisfied, and thus
the nine numbers will not form a valid rotation matrix. In other words, matrices can be
ill-formed, at least for purposes of representing an orientation. Ill-formed matrices can be a
problem because they can lead to numerical exceptions, weird stretched graphics, and other
unexpected behavior.
How could we ever end up with a bad matrix? There are several ways:

We may have a matrix that contains scale, skew, reflection, or projection.
What is the “orientation” of an object that has been affected by such
operations? There really isn't a clear definition for this. Any
nonorthogonal matrix is not a well-defined rotation matrix. (See
Section 6.3 for a complete discussion
on orthogonal matrices.) And reflection matrices (which are orthogonal)
are not valid rotation matrices, either.

We may just get bad data from an external source. For
example, if we are using a physical data acquisition system, such as
motion capture, there could be errors due to the capturing process.
Many modeling packages are notorious for producing ill-formed
matrices.

We can actually create bad data due to floating-point round-off error. For
example, suppose we apply a large number of incremental changes to an orientation,
which could routinely happen in a game or simulation that allows a human to
interactively control the orientation of an object. The large number of matrix
multiplications, which are subject to limited floating-point precision, can result in
an ill-formed matrix. This phenomenon is known as matrix creep. We can
combat matrix creep by orthogonalizing the matrix, as we already discussed in
Section 6.3.3.
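As a sketch of what that repair looks like, the routine below (illustrative, not from the book) re-orthonormalizes the rows of a drifted matrix with one pass of Gram-Schmidt, rebuilding the third row as a cross product of the first two. This simple version favors the first row; Section 6.3.3 discusses less biased alternatives.

```cpp
#include <cassert>
#include <cmath>

// Combat matrix creep: force the rows of a drifted rotation matrix back to
// an orthonormal set. Row 0 is normalized, row 1 is made perpendicular to
// row 0 and normalized, and row 2 is rebuilt as the cross product of the
// first two rows (automatically unit length and perpendicular to both).
static void orthonormalizeRows(double m[3][3]) {
    // Normalize row 0.
    double len = std::sqrt(m[0][0]*m[0][0] + m[0][1]*m[0][1] + m[0][2]*m[0][2]);
    for (int j = 0; j < 3; ++j) m[0][j] /= len;

    // Remove row 0's component from row 1, then normalize row 1.
    double d = m[1][0]*m[0][0] + m[1][1]*m[0][1] + m[1][2]*m[0][2];
    for (int j = 0; j < 3; ++j) m[1][j] -= d * m[0][j];
    len = std::sqrt(m[1][0]*m[1][0] + m[1][1]*m[1][1] + m[1][2]*m[1][2]);
    for (int j = 0; j < 3; ++j) m[1][j] /= len;

    // Row 2 = row 0 x row 1.
    m[2][0] = m[0][1]*m[1][2] - m[0][2]*m[1][1];
    m[2][1] = m[0][2]*m[1][0] - m[0][0]*m[1][2];
    m[2][2] = m[0][0]*m[1][1] - m[0][1]*m[1][0];
}
```

In a game or simulation that accumulates many incremental rotations, a routine like this might be run occasionally (say, once per frame or less) to keep the stored orientation well-formed.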
Let's summarize what Section 8.2 has said about matrices.

Matrices are a “brute force” method of expressing orientation: we
explicitly list the basis vectors of one space in the coordinates
of some different space.

The term direction cosines matrix alludes to the fact that
each element in a rotation matrix is equal to the dot product of one
input basis vector with one output basis vector. Like all
transformation matrices, the rows of the matrix are the outputspace coordinates
of the inputspace basis vectors. Furthermore, the columns
of a rotation matrix are the inputspace coordinates of the outputspace
basis vectors, a fact that is only true by virtue of the orthogonality of
a rotation matrix.

The matrix form of representing orientation is useful primarily
because it allows us to rotate vectors between coordinate spaces.

Modern graphics APIs express orientation by using matrices.

We can use matrix multiplication to collapse matrices for nested
coordinate spaces into a single matrix.

Matrix inversion provides a mechanism for determining the
“opposite” angular displacement.

Matrices can take two to three times as much memory as other
techniques. This can become significant when storing
large numbers of orientations, such as animation data.

The numbers in a matrix aren't intuitive for humans to work with.

Not all matrices are valid for describing an orientation. Some matrices
contain mirroring or skew. We can end up with an ill-formed matrix
either by getting bad data from an external source or through matrix
creep.
8.3 Euler Angles
Another common method of representing orientation is known as Euler angles. (Remember,
Euler is pronounced “oiler,” not “yoolur.”) The technique is named after the famous
mathematician who developed it,
Leonhard Euler (1707–1783).
Section 8.3.1 describes how Euler angles work and discusses the most
common conventions used for Euler angles. Section 8.3.2 discusses
other conventions for Euler angles, including the important fixed-axis system. We consider
the advantages and disadvantages of Euler angles in Section 8.3.3 and
Section 8.3.4. Section 8.3.5 summarizes the most
important concepts concerning Euler angles.
This section utilizes many ideas, terms, and conventions
from
Section 7.3.2 concerning
spherical coordinates.
8.3.1 What Are Euler Angles?
The basic idea behind Euler angles is to define an angular displacement as a sequence of three
rotations about three mutually perpendicular axes. This sounds complicated, but actually it is
quite intuitive. (In fact, its ease of use by humans is one of its primary advantages.)
So Euler angles describe orientation as three rotations about three mutually perpendicular axes.
But which axes? And in what order? As it turns out, any three axes in any order will work, but
most people have found it practical to use the cardinal axes in a particular order. The most
common convention, and the one we use in this book, is the so-called “heading-pitch-bank”
convention for Euler angles. In this system, an orientation is defined by a heading
angle, a pitch angle, and a bank angle.
Before we define the terms heading, pitch, and bank precisely, let us briefly review the
coordinate space conventions we use in this book. We use a left-handed system, where $+x$ is to
the right, $+y$ is up, and $+z$ is forward. (Check out Figure 1.15 for an illustration.) Also,
if you have forgotten how positive rotation is defined according to the left-hand rule, you
might want to flip back to Figure 1.14 to refresh your memory.
Given heading, pitch, and bank angles, we can determine the orientation described by these Euler
angles using a simple four-step process.
Step 1. Begin in the “identity” orientation—that is, with the
object-space axes aligned with the upright axes.
Step 2. Perform the heading rotation, rotating about the $y$ axis, as shown in
Figure 8.5. Positive rotation rotates to the right (clockwise when viewed from above).
Step 3. After heading has been applied, pitch measures the amount of rotation about the
$x$ axis. This is the object-space $x$ axis, not the upright $x$ axis. Staying consistent
with the left-hand rule, positive rotation rotates downward. In other words, pitch actually
measures the angle of declination. This is illustrated in Figure 8.6.
Step 4. After heading and pitch angles have been applied, bank measures the amount of
rotation about the $z$ axis. Again, this is the object-space $z$ axis, not the original
upright-space $z$ axis. The left-hand rule dictates that positive bank rotates
counterclockwise when viewed from the origin looking towards $+z$. This is illustrated in
Figure 8.7.
It may seem contradictory that positive bank is counterclockwise, since positive heading is
clockwise. But notice that positive heading is clockwise when viewed from the positive end of
the axis towards the origin, the opposite perspective from the one used when judging
clockwise/counterclockwise for bank. If we look from the origin to the positive end of the
$y$ axis, then positive heading does rotate counterclockwise. Or if we look from the positive
end of the $z$ axis towards the origin (looking backward from in front of the object), then
positive bank appears to rotate the object clockwise. In either case, the left-hand rule
prevails.
Now you have reached the orientation described by the Euler angles. Notice the similarity of
Steps 1–3 to the procedure used in Section 7.3.2 to locate the
direction described by the spherical coordinate angles. In other words, we can think of heading
and pitch as defining the basic direction that the object is facing, and bank defining the amount
of twist.
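The four steps above translate into matrix arithmetic as a composition of one canonical rotation per axis, applied in heading-pitch-bank order. The sketch below is illustrative only: with row vectors, the leftmost factor in a product applies first, and the exact element signs depend on the handedness conventions of the previous paragraphs (the full, careful conversion appears in Section 8.7). The structural points it demonstrates are the fixed order of composition and the fact that a product of rotation matrices is itself a rotation (orthogonal) matrix.

```cpp
#include <cassert>
#include <cmath>

struct Mat3 { double m[3][3]; };

static Mat3 mul(const Mat3 &a, const Mat3 &b) {
    Mat3 r{};  // value-initialized to zeros
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k)
                r.m[i][j] += a.m[i][k] * b.m[k][j];
    return r;
}

// Canonical single-axis rotations. Signs here are illustrative; the correct
// signs for the book's left-handed conventions are derived in Section 8.7.
static Mat3 rotY(double a) {  // heading
    double c = std::cos(a), s = std::sin(a);
    return {{{c, 0.0, -s}, {0.0, 1.0, 0.0}, {s, 0.0, c}}};
}
static Mat3 rotX(double a) {  // pitch
    double c = std::cos(a), s = std::sin(a);
    return {{{1.0, 0.0, 0.0}, {0.0, c, s}, {0.0, -s, c}}};
}
static Mat3 rotZ(double a) {  // bank
    double c = std::cos(a), s = std::sin(a);
    return {{{c, s, 0.0}, {-s, c, 0.0}, {0.0, 0.0, 1.0}}};
}

// With row vectors, v * headingPitchBank(h,p,b) applies the leftmost factor
// first: heading, then pitch, then bank.
static Mat3 headingPitchBank(double h, double p, double b) {
    return mul(mul(rotY(h), rotX(p)), rotZ(b));
}
```

Whatever angles are supplied, the result is always a valid rotation matrix, which is easy to check by verifying that its rows remain orthonormal.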
8.3.2 Other Euler Angle Conventions
The heading-pitch-bank system described in the previous section isn't the only way to define a
rotation using three angles about mutually perpendicular axes. There are many variations on this
theme. Some of these differences turn out to be purely nomenclature; others are more meaningful.
Even if you like our conventions, we encourage you to not skip this section, as some very
important concepts are discussed; these topics are the source of much confusion, which we hope to
dispel.
First of all, there is the trivial issue of naming. The most common variation you will find was
made popular by the field of aerospace: the yaw-pitch-roll method. The term “roll” is completely
synonymous with bank, and for all purposes they are identical. Similarly, within the limited
context of yaw-pitch-roll, the term “yaw” is practically identical to the term heading. (However,
in a broader sense, the word “yaw” actually has a subtly different meaning, and it is this subtle
difference that drives our preference for the term heading. We discuss this rather nitpicky
distinction in just a moment, but for the moment yaw and heading are the same.) So essentially
yaw-pitch-roll is the same system as heading-pitch-bank.
Other less common terms are often used. Heading also goes by the name azimuth. The vertical
angle that we call pitch is also called attitude or elevation. The final angle of rotation,
which we call “bank,” is sometimes called tilt or twist.
And, of course, there are those perverse mathematicians who (motivated by the need to save space
when writing on a chalkboard?) insist on assaulting your eyeballs with a slew of Greek letters.
You may see any of the following:
It's all Greek to us
$(\varphi ,\theta ,\psi )$, $(\psi ,\theta ,\varphi )$, $(\mathrm{\Omega},i,\omega )$, $(\alpha ,\beta ,\gamma )$.
Of course, these are cosmetic differences. Perhaps more interesting is the fact that you will
often hear these same three words listed in the opposite order: roll-pitch-yaw. (A quick Google
search for “roll pitch yaw” or “yaw pitch roll” yields plenty of results for both forms, with
neither appearing clearly predominant.) Considering how critical the order of rotations is, are
people really that perverse that they choose to list them in the reverse order? We're not just
dwelling on terminology here; the distinctions in thinking hinted at by the differences in
terminology will actually become useful when we consider how to convert Euler angles to a
rotation matrix. As it turns out, there is a perfectly reasonable explanation for this
“backwards” convention: it's the order in which we actually do the rotations inside a computer!
The fixed-axis system is very closely related to the Euler angle system. In an Euler
angle system, the rotation occurs about the body axes, which change after each rotation.
Thus, for example, the physical axis for the bank angle is always the longitudinal body-space
axis, but in general it is arbitrarily oriented in upright space. In a fixed-axis system, in
contrast, the axes of rotation are always the fixed, upright axes. But as it turns out,
the fixed-axis system and the Euler angle system are actually equivalent, provided that we
take the rotations in the opposite order.
You should visualize the following example to convince yourself this is true. Let's say we have
a heading (yaw) of $h$ and a pitch of $p$. (We'll ignore bank/roll for the moment.) According to
the Euler angle convention, we first do the heading rotation and rotate about the vertical axis
(the $y$ axis) by $h$. Then we rotate about the object-space lateral axis (the $x$ axis) by the
angle $p$. Using a fixed-axis scheme, we arrive at this same ending orientation by doing the
rotations in the opposite order. First, we do the pitch, rotating about the upright $x$ axis by
$p$. Then, we perform the heading rotation, rotating about the upright $y$ axis by $h$. Although
we might visualize Euler angles, inside a computer when rotating vectors from upright space to
object space, we actually use a fixed-axis system. We discuss this in greater detail in
Section 8.7.1, when we show how to convert Euler angles to a rotation matrix. The fixed-axis
conventions are also called extrinsic, the typical Euler angle conventions being referred to as
intrinsic.
Euler angles rotate about the body axes, so the axis of rotation for a given step depends
on the angles used in prior rotations. In the fixed-axis system, the axes of rotation are
always the same—the upright axes. The two systems are equivalent, provided that the
rotations are performed in the opposite order.
Now we'd like to make a brief but humble campaign for a more precise use of the term “yaw.” A
lot of aeronautical terminology is inherited nautical terminology. In a nautical context, the
original meaning of the word “yaw” was essentially the same thing as heading, both in terms of
an absolute angle and also a change in that angle. In the context of airplanes and other freely
rotating bodies, however, we don't feel that yaw and heading are the same thing. A yawing motion
produces a rotation about the object $y$ axis, whereas a change in heading produces a rotation
about the upright $y$ axis. For example, when the pilot of an airplane uses the pedals to
control the rudder, he is performing a yaw rotation, because the rotation caused by the rudder
is always about the object-space $y$ axis of the plane. Imagine a plane diving straight down.
If the pilot performs a $90{}^{\mathrm{o}}$ yaw, the plane will end up “on its ear,” no longer
looking downward, but looking towards the horizon, banked $90{}^{\mathrm{o}}$. This is
illustrated in Figure 8.8.
In contrast, when players navigating a first-person shooter move the mouse from left to right,
they are performing a heading rotation. The rotation is always about the vertical axis (the
upright $y$ axis). If players are looking downward and move the mouse horizontally to perform a
heading rotation, they continue to look downward and spin in place. The point is certainly not
that heading is better than yaw because that's what we do in first-person shooters. The point
is that a yawing motion cannot be accomplished by adjusting a single Euler angle, but a heading
motion can. That's why we think “heading” is a better term: it's the action that results when
you make an incremental change to the first Euler angle. Alas, the same argument can be leveled
against the term “pitch.” If bank is nonzero, an incremental change to the middle Euler angle
does not produce a rotation about the object's lateral axis. But then, there isn't really a
simple, good word to describe the angle that the object's longitudinal axis makes with the
horizontal, which is what the middle Euler angle really specifies. (“Inclination” is no good,
as it is specific to the right-handed conventions.)
We hope you have read our opinions with the humility we intended, and also have received the more
important message: investigating (seemingly cosmetic) differences in convention can sometimes
lead us to a deeper understanding of the finer points. And then sometimes it's just plain
nitpicking. Generations of aerospace engineers have been putting men on the moon and robots on
Mars, and building airplanes that safely shuttle the authors to and from distant cities, all the
while using the terms yaw and roll. Would you believe that some of these guys don't even know
who we are!? Given the choice to pick your own terminology, we say to use the word “heading” when
you can, but if you hear the word “yaw,” then for goodness sake don't make as big of a deal out of
it as we have in these pages, especially if the person you are talking to is smarter than you.
Although in this book we do not follow the right-handed aerospace coordinate conventions (and we
have a minor quibble about terminology), when it comes to the basic strategy of Euler angles, in
a physical sense, we believe complete compliance with the wisdom of the aerospace forefathers is
the only way to go, at least if your universe has some notion of “ground.” Remember that, in
theory, any three axes can be used as the axes of rotation, in any order. But really, the
conventions they chose are the only ones that make any practical sense, if you want the
individual angles to be useful and meaningful. No matter how you label your axes, the first
angle needs to rotate about the vertical, the second about the body lateral axis, and the third
about the body longitudinal axis.
As if these weren't enough complications, let us throw in a few more. In the system we have been
describing, each rotation occurs about a different body axis.
However, Euler's own original system was a “symmetric” system in which the
first and last rotations are performed around the same axis. These methods are more
convenient in certain situations, such as describing the motion of a top, where the three angles
correspond to precession, nutation, and spin. You may encounter some purists who object to the
name “Euler angles” being attached to an asymmetric system, but this usage is widespread in
many fields, so rest assured that you outnumber them. To distinguish between the two systems, the
symmetric Euler angles are sometimes called “proper” Euler angles, with the more common
conventions being called Tait-Bryan angles, first documented by the aerospace forefathers
we mentioned [1]. O'Reilly [10]
discusses proper Euler angles and even more methods of describing rotation, such as the
Rodrigues vector and Cayley-Klein parameters, along with interesting historical remarks. James
Diebel's summary [3] compares different Euler angle conventions and the
other major methods for describing rotation, much as this chapter does, but assumes a higher
level of mathematical sophistication.
If you have to deal with Euler angles that use a different convention from the one you prefer, we
offer two pieces of advice:

First, make sure you understand exactly how the other Euler angle
system works. Little details such as the definition of positive rotation
and order of rotations make a big difference.

Second, the easiest way to convert the Euler angles to your format is to
convert them to matrix form and then convert the matrix back to your
style of Euler angles. We will learn how to perform these conversions
in Section 8.7. Fiddling with the angles
directly is much more difficult than it would seem. See
[12]
for more information.
8.3.3 Advantages of Euler Angles
Euler angles parameterize orientation using only three numbers, and these numbers are angles.
These two characteristics of Euler angles provide certain advantages over other forms of
representing orientation.

Euler angles are easy for humans to use—considerably easier than matrices
or quaternions. Perhaps this is because the numbers in an Euler angle triple are
angles, which is naturally how people think about orientation. If the
conventions most appropriate for the situation are chosen, then the most
practical angles can be expressed directly. For example, the angle of declination is
expressed directly by the heading-pitch-bank system. This ease of use is a serious
advantage. When an orientation needs to be displayed numerically or entered at the
keyboard, Euler angles are really the only choice.

Euler angles use the smallest possible representation. Euler angles use
three numbers to describe an orientation. No system can parameterize 3D orientation
using fewer than three numbers. If memory is at a premium, then Euler angles are the
most economical way to represent an orientation.
Another reason to choose Euler angles when you need to save space is that the numbers
you are storing are more easily compressed. It's relatively easy to pack Euler angles
into a smaller number of bits using a trivial fixed-precision system. Because Euler
angles are angles, the data loss due to quantization is spread evenly. Matrices
and quaternions require storing very small numbers, because the values stored are sines
and cosines of the angles. The absolute numeric difference between two values is not
proportionate to the perceived difference, however, as it is with Euler angles. In
general, matrices and quaternions don't pack into a fixed-point system easily.
Bottom line: if you need to store a lot of 3D rotational data in as little memory as
possible, as is very common when handling animation data, Euler angles (or the
exponential map format—to be discussed in Section 8.4)
are the best choices.
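As an illustration of the kind of trivial fixed-precision packing just described, here is a hypothetical scheme (the function names and the 16-bit width are our choices, not a standard) that quantizes an angle already wrapped into $[-\pi ,\pi )$ down to 16 bits, with a uniform worst-case error of half a quantization step.

```c
#include <math.h>
#include <stdint.h>

#define PI 3.14159265f

/* Pack an angle in [-PI, PI) into 16 bits.  Assumes the angle has
   already been wrapped into range; quantization error is at most
   half of (2*PI / 65536) radians, spread evenly across the circle. */
static uint16_t packAngle(float theta) {
    /* scale [-PI, PI) onto [0, 65536) and round to nearest */
    float t = (theta + PI) * (65536.0f / (2.0f * PI));
    return (uint16_t)((uint32_t)(t + 0.5f) & 0xFFFFu);
}

/* Recover the (quantized) angle in [-PI, PI) */
static float unpackAngle(uint16_t bits) {
    return (float)bits * ((2.0f * PI) / 65536.0f) - PI;
}
```

Note how badly the same idea fits a quaternion: its components are sines and cosines in $[-1,1]$, where equal numeric steps do not correspond to equal perceived rotation, which is exactly the objection raised above.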

Any set of three numbers is valid. If we take any three numbers at random,
they form a valid set of Euler angles that we can interpret as an expression of an
orientation. In other words, there is no such thing as an invalid set of Euler angles.
Of course, the numbers may not be correct, but at least they are valid.
This is not the case with matrices and quaternions.
8.3.4 Disadvantages of Euler Angles
This section discusses some disadvantages of the Euler angle method of representing orientation;
primarily,

The representation for a given orientation is not unique.

Interpolating between two orientations is problematic.
Let's address these points in detail. First, we have the problem that for a given orientation,
there are many different Euler angle triples that can be used to describe that orientation. This
is known as aliasing and can be somewhat of an inconvenience. These irritating problems are
very similar to those we met dealing with spherical coordinates in
Section 7.3.4. Basic questions such as “Do two Euler angle triples
represent the same angular displacement?” are difficult to answer due to aliasing.
We've seen one trivial type of aliasing before with polar coordinates: adding a multiple of
$360{}^{\mathrm{o}}$ does not change the orientation expressed, even though the numbers are
different.
A second and more troublesome form of aliasing occurs because the
three angles are not completely independent of each other. For
example, pitching down 135° is the same as heading
180°, pitching down 45°, and then banking 180°.
To deal with aliasing of spherical coordinates, we found it useful to establish a canonical
set; any given point has a unique representation in the canonical set that is the “official” way
to describe that point using polar coordinates. We use a similar technique for Euler angles. In
order to guarantee a unique Euler angle representation for any given orientation, we restrict the
ranges of the angles. One common technique is to limit heading and bank to
$(-180{}^{\mathrm{o}},+180{}^{\mathrm{o}}]$ and to limit pitch to
$[-90{}^{\mathrm{o}},+90{}^{\mathrm{o}}]$. For any orientation, there
is only one Euler angle triple in the canonical set that represents that orientation. (Actually,
there is one more irritating singularity that must be handled, which we describe in just a
moment.)
Using canonical Euler angles simplifies many basic tests such as “am I facing approximately
east?”
The most famous (and irritating) type of aliasing problem suffered by Euler angles is
illustrated by this example: if we head right $45{}^{\mathrm{o}}$ and then pitch down
$90{}^{\mathrm{o}}$, this is the same as pitching down $90{}^{\mathrm{o}}$ and then banking
$45{}^{\mathrm{o}}$. In fact, once we choose $\pm 90{}^{\mathrm{o}}$ as the pitch angle, we are
restricted to rotating about the vertical axis. This phenomenon, in which an angle of
$\pm 90{}^{\mathrm{o}}$ for the second rotation can cause the first and third rotations to
rotate about the same axis, is known as Gimbal lock. To remove this aliasing from the canonical
set of Euler angle triples, we assign all rotation about the vertical axis to heading in the
Gimbal lock case. In other words, in the canonical set, if pitch is $\pm 90{}^{\mathrm{o}}$,
then bank is zero.
This last rule for Gimbal lock completes the rules for the canonical
set of Euler angles:
Conditions satisfied by Euler angles in the canonical set
$$\begin{array}{c}-180{}^{\mathrm{o}}<h\le 180{}^{\mathrm{o}}\\ -90{}^{\mathrm{o}}\le p\le 90{}^{\mathrm{o}}\\ -180{}^{\mathrm{o}}<b\le 180{}^{\mathrm{o}}\\ p=\pm 90{}^{\mathrm{o}}\text{}\text{}\Rightarrow \text{}\text{}b=0.\end{array}$$
When writing C++ that accepts Euler angle arguments, it's usually best to ensure that they work
given Euler angles in any range. Luckily this is usually easy; things frequently just work without
taking any extra precaution, especially if the angles are fed into trig functions. However, when
writing code that computes or returns Euler angles, it's good practice to return the
canonical Euler angle triple. The conversion methods shown in
Section 8.7 demonstrate these principles.
A common misconception is that, because of Gimbal lock, certain orientations cannot be described
using Euler angles. Actually, for the purposes of describing an orientation, aliasing
doesn't pose any problems. To be clear, any orientation in 3D can be described by using
Euler angles, and that representation is unique within the canonical set. Also, as we mentioned in
the previous section, there is no such thing as an “invalid” set of Euler angles. Even if the
angles are outside the usual range, we can always agree on what orientation is described by the
Euler angles.
So for purposes of simply describing orientation, aliasing isn't a huge problem, especially
when canonical Euler angles are used. So what's so bad about aliasing and Gimbal lock? Let's
say we wish to interpolate between two orientations ${\mathbf{R}}_{0}$ and ${\mathbf{R}}_{1}$.
In other words, for a given parameter $t$, $0\le t\le 1$, we wish to compute an intermediate
orientation $\mathbf{R}(t)$ that interpolates smoothly from ${\mathbf{R}}_{0}$ to
${\mathbf{R}}_{1}$ as $t$ varies from $0$ to $1$. This is an extremely useful operation for
character animation and camera control, for example.
The naïve approach to this problem is to apply the standard linear interpolation formula
(“lerp”) to each of the three angles independently:
Simple linear interpolation between two angles
$$\begin{array}{rl}\mathrm{\Delta}\theta & ={\theta}_{1}-{\theta}_{0},\\ {\theta}_{t}& ={\theta}_{0}+t{\textstyle \phantom{\rule{thinmathspace}{0ex}}}\mathrm{\Delta}\theta .\end{array}$$
This is fraught with problems. First, if canonical Euler angles are not used, we may have large
angle values. For example, imagine the heading of ${\mathbf{R}}_{0}$, denoted ${h}_{0}$, is
$720{}^{\mathrm{o}}$. Assume ${h}_{1}=45{}^{\mathrm{o}}$. Now,
$720{}^{\mathrm{o}}=2\times 360{}^{\mathrm{o}}$, which is the same as $0{}^{\mathrm{o}}$, so
${h}_{0}$ and ${h}_{1}$ are only $45{}^{\mathrm{o}}$ apart. However, naïve interpolation will
spin around nearly twice in the wrong direction, as shown in Figure 8.9.
Of course, the solution to this problem is to use canonical Euler angles. We could assume that
we will always be interpolating between two sets of canonical Euler angles. Or we could attempt
to enforce this by converting to canonical values inside our interpolation routine. (Simply
wrapping angles within the $(-180{}^{\mathrm{o}},+180{}^{\mathrm{o}}]$ range is easy, but
dealing with pitch values outside the $[-90{}^{\mathrm{o}},+90{}^{\mathrm{o}}]$ range is more
challenging.)
However, even using canonical angles doesn't completely solve the problem. A second type of
interpolation problem can occur because of the cyclic nature of rotation angles. Suppose
${h}_{0}=-170{}^{\mathrm{o}}$ and ${h}_{1}=170{}^{\mathrm{o}}$. Notice that these are canonical
values for heading, both in the range $(-180{}^{\mathrm{o}},+180{}^{\mathrm{o}}]$. The two
heading values are only $20{}^{\mathrm{o}}$ apart, but again, naïve interpolation will not
behave correctly, rotating the long way around by a clockwise rotation of
$340{}^{\mathrm{o}}$ instead of taking the shorter counterclockwise path of
$20{}^{\mathrm{o}}$, as shown in Figure 8.10.
The solution to this second type of problem is to wrap the differences between angles used in
the interpolation equation in the range $(-180{}^{\mathrm{o}},+180{}^{\mathrm{o}}]$ in order to
find the shortest arc. To do this, we introduce the notation
Wrapping an angle between
$\mathbf{\pm}\mathbf{180}{\mathbf{}}^{\mathbf{o}}$
$$\mathrm{w}\mathrm{r}\mathrm{a}\mathrm{p}\mathrm{P}\mathrm{i}(x)=x-360{}^{\mathrm{o}}\lfloor (x+180{}^{\mathrm{o}})/360{}^{\mathrm{o}}\rfloor ,$$
where
$\lfloor \cdot \rfloor $
denotes the floor function.
The
$\mathrm{w}\mathrm{r}\mathrm{a}\mathrm{p}\mathrm{P}\mathrm{i}$
function is a small, sharp tool that every game programmer should have in their
toolbox. It elegantly handles common situations in which we must account for the cyclic nature
of angles. It works by adding or subtracting the appropriate multiple of
$360{}^{\mathrm{o}}$
.
Listing 8.1 shows how it would be implemented in C.
float wrapPi(float theta) {
    // Check if already in range. This is not strictly necessary,
    // but it will be a very common situation. We don't want to
    // incur a speed hit and perhaps floating point precision loss
    // if it's not necessary
    if (fabs(theta) > PI) {
        // One revolution is 2 PI.
        const float TWOPI = 2.0f*PI;
        // Out of range. Determine how many "revolutions"
        // we need to add.
        float revolutions = floor((theta + PI) * (1.0f/TWOPI));
        // Subtract it off
        theta -= revolutions*TWOPI;
    }
    return theta;
}
Let's go back to Euler angles. As expected, using
$\mathrm{w}\mathrm{r}\mathrm{a}\mathrm{p}\mathrm{P}\mathrm{i}()$
makes it easy to take the shortest
arc when interpolating between two angles:
Taking the shortest arc when interpolating between two angles
$$\begin{array}{rl}\mathrm{\Delta}\theta & =\mathrm{w}\mathrm{r}\mathrm{a}\mathrm{p}\mathrm{P}\mathrm{i}({\theta}_{1}-{\theta}_{0}),\\ {\theta}_{t}& ={\theta}_{0}+t{\textstyle \phantom{\rule{thinmathspace}{0ex}}}\mathrm{\Delta}\theta .\end{array}$$
But even with these two Band-Aids, Euler angle interpolation still
suffers from Gimbal lock, which in many situations causes a jerky, unnatural course. The object
whips around suddenly and appears to be hung somewhere. The basic problem is that the angular
velocity is not constant during the interpolation. If you have never experienced what Gimbal
lock looks like, you may be wondering what all the fuss is about. Unfortunately, it is very
difficult to fully appreciate the problem from illustrations in a book—you need to experience
it in real time. Fortunately, though, it's easy to find an animation demonstrating the problem:
just do a youtube.com search for “gimbal lock.”
The first two problems with Euler angle interpolation were irritating, but certainly not
insurmountable. Canonical Euler angles and
$\mathrm{w}\mathrm{r}\mathrm{a}\mathrm{p}\mathrm{P}\mathrm{i}$
provide relatively simple workarounds.
Gimbal lock, unfortunately, is more than a minor nuisance; it's a fundamental problem. Could we
perhaps reformulate our rotations and devise a system that does not suffer from these problems?
Unfortunately, this is not possible. There is simply an inherent problem with using three
numbers to describe 3D orientation. We could change our problems, but not eliminate them. Any
system that parameterizes 3D orientation using three numbers is guaranteed to have
singularities in the parameterization space and therefore be subject to problems such as Gimbal
lock. The exponential map form (see Section 8.4),
a different scheme for parameterizing 3D rotation with three numbers, manages to consolidate
the singularities to a single point: the antipodes. This behavior is more benign for certain
practical situations, but it does not remove the singularities completely. To do that, we must
use quaternions, which are discussed in Section 8.5.
8.3.5 Summary of Euler Angles
Let's summarize our findings from Section 8.3 about Euler angles.

Euler angles store orientation by using three angles. These angles are ordered
rotations about the three objectspace axes.

The most common system of Euler angles is the heading-pitch-bank system. Heading and
pitch tell which way the object is facing—heading gives a “compass reading” and
pitch measures the angle of declination. Bank measures the amount of “twist.”

In a fixed-axis system, the rotations occur about the upright axes
rather than the moving body axes. This system is equivalent to Euler
angles, provided that we perform the rotations in the opposite order.

Lots of smart people use lots of different terms for Euler angles, and they can have
good reasons for using different conventions. It's best not to rely on terminology when using Euler angles.
Always make sure you get a precise working definition, or you're likely to get very
confused.

In most situations, Euler angles are more intuitive for humans to work
with compared to other methods of representing orientation.

When memory is at a premium, Euler angles use the minimum amount
of data possible for storing an orientation in 3D, and Euler
angles are more easily compressed than quaternions.

There is no such thing as an invalid set of Euler angles. Any three numbers have a
meaningful interpretation.

Euler angles suffer from aliasing problems due to the cyclic nature of rotation
angles and because the rotations are not completely independent of one another.

Using canonical Euler angles can simplify many basic queries on Euler angles. An
Euler angle triple is in the canonical set if heading and bank are in the range
$(-180{}^{\mathrm{o}},+180{}^{\mathrm{o}}]$ and pitch is in the range
$[-90{}^{\mathrm{o}},+90{}^{\mathrm{o}}]$. What's more, if pitch is
$\pm 90{}^{\mathrm{o}}$, then bank is zero.

Gimbal lock occurs when pitch is
$\pm 90{}^{\mathrm{o}}$
. In this case,
one degree of freedom is lost because heading and bank both rotate
about the vertical axis.

Contrary to popular myth, any orientation in 3D can be represented by using
Euler angles, and we can agree on a unique representation for that orientation within
the canonical set.

The
$\mathrm{w}\mathrm{r}\mathrm{a}\mathrm{p}\mathrm{P}\mathrm{i}$
function is a very handy tool that simplifies situations in which we
have to deal with the cyclic nature of angles. Such situations arise frequently in
practice, especially in the context of Euler angles, but at other times as well.

Simple forms of aliasing are irritating, but there are workarounds. Gimbal lock is a
more fundamental problem with no easy solution. Gimbal lock is a problem because the
parameter space of orientation has a discontinuity. This means small changes in
orientation can result in large changes in the individual angles. Interpolation between
orientations using Euler angles can freak out or take a wobbly path.
8.4 Axis-Angle and Exponential Map Representations
Euler's name is attached to all sorts of stuff related to rotation (we just
discussed Euler angles in Section 8.3). His name is also attached to Euler's
rotation theorem, which basically says that any 3D angular displacement can be accomplished via
a single rotation about a carefully chosen axis. To be more precise, given any two
orientations
${\mathbf{R}}_{1}$
and
${\mathbf{R}}_{2}$
, there exists an axis
$\hat{\mathbf{n}}$
such that we
can get from
${\mathbf{R}}_{1}$
to
${\mathbf{R}}_{2}$
by performing one rotation about
$\hat{\mathbf{n}}$
. With Euler angles, we need three rotations to describe any orientation, since we
are restricted to rotate about the cardinal axis. However, when we are free to choose the axis
of rotation, its possible to find one such that only one rotation is needed. Furthermore, as we
will show in this section, except for a few minor details, this axis of rotation is uniquely
determined.
Euler's rotation theorem leads to two closely related methods for describing orientation. Let's
begin with some notation. Assume we have chosen a rotation angle
$\theta $
and an axis of rotation
that passes through the origin and is parallel to the unit vector
$\hat{\mathbf{n}}$
. (In this book,
positive rotation is defined according to the left-hand rule; see
Section 1.3.3.)
Taking the two values
$\hat{\mathbf{n}}$
and
$\theta $
as is, we have described an angular displacement
in the axis-angle form. Alternatively, since
$\hat{\mathbf{n}}$
has unit length, we can
multiply it by
$\theta $
without loss of information, yielding the single vector
$\mathbf{e}=\theta \hat{\mathbf{n}}$
. This scheme for describing rotation goes by the rather
intimidating and obscure name of exponential map. The
rotation angle can be deduced from the length of
$\mathbf{e}$
; in other words,
$\theta =\parallel \mathbf{e}\parallel $
, and the axis is obtained by normalizing
$\mathbf{e}$
. The
exponential map is not only more compact than the axis-angle form (three numbers instead of
four), it elegantly avoids certain singularities and has better interpolation and
differentiation properties.
We're not going to discuss the axis-angle and exponential map forms in quite as much detail as
the other methods of representing orientation because in practice their use is a bit
specialized. The axis-angle format is primarily a conceptual tool. It's important to
understand, but the method gets relatively little direct use compared to the other formats. Its
one notable capability is that we can directly obtain an arbitrary multiple of the
displacement. For example, given a rotation in axis-angle form, we can obtain a rotation that
represents one third of the rotation or 2.65 times the rotation, simply by multiplying
$\theta $ by the appropriate amount. Of course, we can do this same operation with the
exponential map just as easily. Quaternions can do this through exponentiation, but an
inspection of the math reveals that it's really using the axis-angle format under the hood.
(Even though quaternions claim to be using the exponential map under the hood!) Quaternions can
also do a similar operation using slerp, but in a more roundabout way and without the ability
for intermediate results to store rotations beyond 180 degrees. We look at quaternions in
Section 8.5.
The exponential map gets more use than the axis-angle form. First of all, its interpolation
properties are nicer than those of Euler angles. Although it does have singularities (discussed
next), they are not as troublesome as for Euler angles. Usually, when one thinks of interpolating
rotations, one immediately thinks of quaternions; however, for some applications, such as storage
of animation data, the underappreciated exponential map can be a viable alternative
[5].
But the most important and frequent use of the exponential map is to store not angular
displacement, but rather angular velocity. This is because the exponential map
differentiates nicely (which is somewhat related to its nicer interpolation properties) and can
represent multiple rotations easily.
Like Euler angles, the axis-angle and exponential map forms exhibit aliasing and singularities,
although in a slightly more restricted and benign manner. There is an obvious singularity at the
identity orientation, or the quantity “no angular displacement.” In this case,
$\theta =0$
, and
our choice of axis is irrelevant—any axis may be used. Notice, however, that the exponential map
nicely tucks this singularity away, since multiplication by
$\theta $
causes
$\mathbf{e}$
to vanish,
no matter what rotation axis
$\hat{\mathbf{n}}$
is chosen. Another trivial form of aliasing in
axis-angle space can be produced by negating both
$\theta $
and
$\hat{\mathbf{n}}$
. However, the
exponential map dodges this issue as well, since negating both
$\theta $
and
$\hat{\mathbf{n}}$
leaves
$\mathbf{e}=\theta \hat{\mathbf{n}}$
unchanged!
The other aliases cannot be dispatched so easily. As with Euler angles, adding a multiple of
$360{}^{\mathrm{o}}$
to
$\theta $
produces an angular displacement that results in the same ending
orientation, and this form of aliasing affects both the axis-angle and exponential map forms. However,
this is not always a shortcoming—for describing angular velocity, this ability to represent such
“extra” rotation is an important and useful property. For example, it's quite important to be
able to distinguish between rotation about the
$x$
axis at a rate of
$720{}^{\mathrm{o}}$
per second
versus rotation about the same axis at a rate of
$1080{}^{\mathrm{o}}$
per second, even though these
displacements result in the same ending orientation if applied for an integral number of seconds.
It is not possible to capture this distinction in quaternion format.
As it turns out, given any angular displacement that can be described by a rotation matrix, the
exponential map representation is uniquely determined. Although more than one exponential map
may produce the same rotation matrix, it is possible to take a subset of the exponential maps
(those for which
$\parallel \mathbf{e}\parallel <2\pi $
) and form a onetoone correspondence with the rotation
matrices. This is the essence of Euler's rotation theorem.
Now let's consider concatenating multiple rotations. Let's say
${\mathbf{e}}_{1}$
and
${\mathbf{e}}_{2}$
are two rotations in exponential map format. The result of performing the rotations in sequence,
for example,
${\mathbf{e}}_{1}$
and then
${\mathbf{e}}_{2}$
, is not the same as performing a
single rotation
${\mathbf{e}}_{1}+{\mathbf{e}}_{2}$
. We know this cannot be true, because ordinary
vector addition is commutative, but three-space rotations are not. Assume that
${\mathbf{e}}_{1}=[90^\circ,0,0]$
, and
${\mathbf{e}}_{2}=[0,90^\circ,0]$
. With our conventions, this
is a
$90^\circ$
downward pitch rotation, and a
$90^\circ$
heading rotation to the east.
Performing
${\mathbf{e}}_{1}$
followed by
${\mathbf{e}}_{2}$
, we would end up looking downward with our
head pointing east, but doing them in the opposite order, we end up “on our ear” facing east.
But what if the angles were much smaller, say
$2^\circ$
instead of
$90^\circ$
? Now the ending
rotations are more similar. As we take the magnitude of the rotation angles down, the importance
of the order decreases, and at the extreme, for
“infinitesimal” rotations, the order is completely irrelevant. In other words, for
infinitesimal rotations, exponential maps can be added vectorially. Infinitesimals are
an important topic in calculus, and they lie at the heart of defining rates of change. We look at
these topics in Chapter 11, but for now, the basic idea is that exponential
maps do not add vectorially when used to define an amount of rotation (an angular displacement or
an orientation), but they do properly add vectorially when they describe a rate of
rotation. This is why exponential maps are perfectly suited for describing angular velocity.
Before we leave this topic, a regrettable word of warning regarding terminology. Alternative names
for these two simple concepts abound. We have tried to choose the most standard names possible,
but it was difficult to find strong consensus. Some authors use the term “axis-angle” to
describe both of these (closely related) methods and don't really distinguish between
them. Even more confusing is the use of the term
“Euler axis” to refer to either form (but not to Euler angles!).
“Rotation vector” is another term you might see attached to what we are calling exponential
map. Finally, the term “exponential map,” in the broader context of
Lie algebra, from whence the term originates, actually refers to an operation (a “map”)
rather than a quantity. We apologize for the confusion, but it's not our fault.
8.5 Quaternions
The term quaternion is somewhat of a buzzword in 3D math. Quaternions carry a certain
mystique—which is a euphemistic way of saying that many people find quaternions complicated
and confusing. We think the way quaternions are presented in most texts contributes to their
confusion, and we hope that our slightly different approach will help dispel quaternions'
“mystique.”
There is a mathematical reason why using only three numbers to represent a 3-space orientation is
guaranteed to cause the problems we discussed with Euler angles, such as gimbal lock. It has
something to do with some fairly advanced math terms such as
“manifolds.” A quaternion avoids these problems by using four numbers to express an
orientation (hence the name quaternion).
This section describes how to use a quaternion to define an angular displacement. We're going to
deviate somewhat from the traditional presentation, which emphasizes the interesting (but, in our
opinion, nonessential) interpretation of quaternions as complex numbers. Instead, we will be
developing quaternions from a primarily geometric perspective. Here's what's in store: First,
Section 8.5.1 introduces some basic notation. Section 8.5.2
is probably the most important section—it explains how a quaternion may be interpreted
geometrically. Sections 8.5.3 through
8.5.11 review the basic quaternion properties and operations,
examining each from a geometric perspective. Section 8.5.12 discusses the
important slerp operation, which is used to interpolate between two quaternions and is one of the
primary advantages of quaternions.
Section 8.5.13 discusses the advantages and disadvantages of
quaternions. Section 8.5.14 is an optional digression into how
quaternions may be interpreted as 4D complex numbers. Section 8.5.15 summarizes
the properties of quaternions.
Don't be scared off by what seems like a lot of hairy math in this section. The most important
things to remember about quaternions are the high-level concepts that are summarized in
Section 8.5.15. The nuts and bolts of quaternions are given here to show that
everything about quaternions can be derived, and you don't have to take our word for it. A
detailed understanding of quaternions is not really needed in order to use them,
but you need to understand what quaternions can do.
8.5.1 Quaternion Notation
A quaternion contains a scalar component and a 3D vector component. We usually refer to the
scalar component as
$w$
. We may refer to the vector component as a single entity
$\mathbf{v}$
or
as individual components
$x$
,
$y$
, and
$z$
. Here are examples of both notations:
Two types of quaternion notation
$$\left[\begin{array}{cc}w& \mathbf{v}\end{array}\right],\phantom{\rule{.5in}{0ex}}\left[\begin{array}{cc}w& \left(\begin{array}{ccc}x& y& z\end{array}\right)\end{array}\right].$$
In some cases it will be convenient to use the shorter notation, using
$\mathbf{v}$
, and in some
cases the “expanded” version is clearer. This chapter presents most equations in both forms.
We also may write expanded quaternions vertically:
$$\left[\begin{array}{c}w\\ \left(\begin{array}{c}x\\ y\\ z\end{array}\right)\end{array}\right].$$
Unlike regular vectors, there is no significant distinction between “row” and “column”
quaternions. We are free to make the choice strictly for aesthetic purposes.
We denote quaternion variables with the same typeface conventions used for vectors: lowercase
letters in bold (e.g.,
$\mathbf{q}$
). When vectors and quaternions appear together, the context
(and the letters chosen for the variables!) usually make clear which are which.
8.5.2 What Do Those Four Numbers Mean?
The quaternion form is closely related to the axis-angle and exponential map forms from
Section 8.4. Let's briefly review the notation from that section, as
the same notation will be used here. The unit vector
$\hat{\mathbf{n}}$
defines an axis of rotation,
and the scalar
$\theta $
is the amount of rotation about this axis. Thus, the pair
$(\theta ,\hat{\mathbf{n}})$
defines an angular displacement using the axis-angle system. A left-hand or right-hand rule
convention determines which direction of rotation about the axis
is considered positive.
A quaternion also contains an axis and angle, but
$\hat{\mathbf{n}}$
and
$\theta $
aren't simply stored
in the four numbers of the quaternion directly, as they are in the axis-angle format (that would be too
easy!). Instead, they are encoded in a way that at first might seem weird, but turns out to be
highly practical. Equation (8.2) shows how the values of a quaternion are
related to
$\theta $
and
$\hat{\mathbf{n}}$
, using both forms of quaternion notation:
Geometric meaning of the four values of a quaternion
$$\begin{array}{rl}\text{(8.2)}\quad \left[\begin{array}{cc}w& \mathbf{v}\end{array}\right]& =\left[\begin{array}{cc}\mathrm{cos}(\theta /2)& \mathrm{sin}(\theta /2)\,\hat{\mathbf{n}}\end{array}\right],\\ \left[\begin{array}{cc}w& \left(\begin{array}{ccc}x& y& z\end{array}\right)\end{array}\right]& =\left[\begin{array}{cc}\mathrm{cos}(\theta /2)& \left(\begin{array}{ccc}\mathrm{sin}(\theta /2)\,{n}_{x}& \mathrm{sin}(\theta /2)\,{n}_{y}& \mathrm{sin}(\theta /2)\,{n}_{z}\end{array}\right)\end{array}\right].\end{array}$$
Keep in mind that
$w$
is related to
$\theta $
, but they are not the same thing. Likewise,
$\mathbf{v}$
and
$\hat{\mathbf{n}}$
are related, but not identical.
The next several sections discuss a number of quaternion operations from mathematical and geometric
perspectives.
8.5.3 Quaternion Negation
Quaternions can be negated. This is done in the obvious way of negating each component:
Quaternion negation
$$\begin{array}{rl}-\mathbf{q}& =-\left[\begin{array}{cc}w& \left(\begin{array}{ccc}x& y& z\end{array}\right)\end{array}\right]=\left[\begin{array}{cc}-w& \left(\begin{array}{ccc}-x& -y& -z\end{array}\right)\end{array}\right]\\ & =-\left[\begin{array}{cc}w& \mathbf{v}\end{array}\right]=\left[\begin{array}{cc}-w& -\mathbf{v}\end{array}\right].\end{array}$$
The surprising fact about negating a quaternion is that it really doesn't do anything, at least
in the context of angular displacement.
The quaternions
$\mathbf{q}$
and
$-\mathbf{q}$
describe
the same angular displacement. Any angular displacement
in 3D has exactly two distinct representations in
quaternion format, and they are negatives of each other.
It's not too difficult to see why this is true. If we add
$360^\circ$
to
$\theta $
, it doesn't
change the angular displacement represented by
$\mathbf{q}$
, but it negates all four components
of
$\mathbf{q}$
.
8.5.4 Identity Quaternion(s)
Geometrically, there are two “identity” quaternions that represent “no angular displacement.”
They are
Identity quaternions
$$\left[\begin{array}{cc}1& \mathbf{0}\end{array}\right]\phantom{\rule{.25in}{0ex}}\text{and}\phantom{\rule{.25in}{0ex}}\left[\begin{array}{cc}-1& \mathbf{0}\end{array}\right].$$
(Note the boldface zero, which indicates the zero vector.) When
$\theta $
is an even multiple of
360°, then
$\mathrm{cos}(\theta /2)=1$
, and we have the first form. If
$\theta $
is an odd multiple
of 360°, then
$\mathrm{cos}(\theta /2)=-1$
, and we have the second form. In both cases,
$\mathrm{sin}(\theta /2)$
= 0, so the value of
$\hat{\mathbf{n}}$
is irrelevant. This makes intuitive sense; if
the rotation angle
$\theta $
is a whole number of complete revolutions about any axis, then
no real change is made to the orientation.
Algebraically, there is really only one identity quaternion:
$[1,\mathbf{0}]$
. When we multiply
any quaternion
$\mathbf{q}$
by the identity quaternion, the result is
$\mathbf{q}$
. (We present
quaternion multiplication in Section 8.5.7.) When we multiply a quaternion
$\mathbf{q}$
by the other “geometric identity” quaternion
$[-1,\mathbf{0}]$
, we get
$-\mathbf{q}$
. Geometrically, this results in the same orientation, since
$\mathbf{q}$
and
$-\mathbf{q}$
represent the same angular displacement. Mathematically, however,
$\mathbf{q}$
and
$-\mathbf{q}$
are not equal, so
$[-1,\mathbf{0}]$
is not a “true” identity quaternion.
8.5.5 Quaternion Magnitude
We can compute the magnitude of a quaternion, just as we can for vectors and complex numbers. The
notation and formula shown in Equation (8.4) are similar to those used for
vectors:
Quaternion magnitude
$$\begin{array}{}\text{(8.4)}& \begin{array}{rl}\parallel \mathbf{q}\parallel & =\parallel \left[\begin{array}{cc}w& \left(\begin{array}{ccc}x& y& z\end{array}\right)\end{array}\right]\parallel =\sqrt{{w}^{2}+{x}^{2}+{y}^{2}+{z}^{2}}\\ & =\parallel \left[\begin{array}{cc}w& \mathbf{v}\end{array}\right]\parallel =\sqrt{{w}^{2}+{\parallel \mathbf{v}\parallel}^{2}}.\end{array}\end{array}$$
Let's see what this means geometrically for a rotation quaternion:
Rotation quaternions have unit magnitude
$$\begin{array}{rl}\parallel \mathbf{q}\parallel & =\parallel \left[\begin{array}{cc}w& \mathbf{v}\end{array}\right]\parallel =\sqrt{{w}^{2}+{\parallel \mathbf{v}\parallel}^{2}}\\ & =\sqrt{{\mathrm{cos}}^{2}(\theta /2)+{(\mathrm{sin}(\theta /2)\parallel \hat{\mathbf{n}}\parallel )}^{2}}& & \text{(substituting using }\theta \text{ and }\hat{\mathbf{n}}\text{)}\\ & =\sqrt{{\mathrm{cos}}^{2}(\theta /2)+{\mathrm{sin}}^{2}(\theta /2){\parallel \hat{\mathbf{n}}\parallel}^{2}}\\ & =\sqrt{{\mathrm{cos}}^{2}(\theta /2)+{\mathrm{sin}}^{2}(\theta /2)(1)}& & \text{(}\hat{\mathbf{n}}\text{ is a unit vector)}\\ & =\sqrt{1}& & ({\mathrm{sin}}^{2}x+{\mathrm{cos}}^{2}x=1)\\ & =1.\end{array}$$
This is an important observation.
For our purposes of using quaternions to represent orientation,
all quaternions are so-called unit quaternions, which
have a magnitude equal to unity.
For information concerning non-normalized quaternions, see the technical report by Dam et al.
[2].
8.5.6 Quaternion Conjugate and Inverse
The conjugate of a quaternion, denoted
${\mathbf{q}}^{\ast}$
, is obtained by negating the vector
portion of the quaternion:
Quaternion conjugate
$$\begin{array}{rl}\text{(8.5)}\quad {\mathbf{q}}^{\ast}& ={\left[\begin{array}{cc}w& \mathbf{v}\end{array}\right]}^{\ast}=\left[\begin{array}{cc}w& -\mathbf{v}\end{array}\right]\\ & ={\left[\begin{array}{cc}w& \left(\begin{array}{ccc}x& y& z\end{array}\right)\end{array}\right]}^{\ast}=\left[\begin{array}{cc}w& \left(\begin{array}{ccc}-x& -y& -z\end{array}\right)\end{array}\right].\end{array}$$
The term “conjugate” is inherited from the interpretation of a quaternion as a complex number. We
look at this interpretation in more detail in Section 8.5.14.
The inverse of a quaternion, denoted
${\mathbf{q}}^{-1}$
, is defined as the conjugate of a
quaternion divided by its squared magnitude:
Quaternion inverse
$$\begin{array}{}\text{(8.6)}& {\mathbf{q}}^{-1}=\frac{{\mathbf{q}}^{\ast}}{{\parallel \mathbf{q}\parallel}^{2}}.\end{array}$$
The quaternion inverse has an interesting correspondence with the multiplicative inverse for real
numbers (scalars). For real numbers, the multiplicative inverse
${a}^{-1}$
is
$1/a$
. In other
words,
$a({a}^{-1})={a}^{-1}a=1$
. The same applies to quaternions. When we multiply a quaternion
$\mathbf{q}$
by its inverse
${\mathbf{q}}^{-1}$
, we get the identity quaternion
$[1,\mathbf{0}]$
.
(We discuss quaternion multiplication in Section 8.5.7.)
Equation (8.6) is the official definition of quaternion inverse. However,
if you are interested only in quaternions that represent pure rotations, like we are in this book,
then all the quaternions are unit quaternions and so the conjugate and inverse are equivalent.
The conjugate (inverse) is interesting because
$\mathbf{q}$
and
${\mathbf{q}}^{\ast}$
represent opposite
angular displacements. It is easy to see why this is the case. By negating
$\mathbf{v}$
, we are
negating the axis of rotation
$\hat{\mathbf{n}}$
. This doesn't change the axis in the physical sense,
since
$\hat{\mathbf{n}}$
and
$-\hat{\mathbf{n}}$
are parallel. However, it does flip the direction that we
consider to be positive rotation. Thus,
$\mathbf{q}$
rotates about an axis by an amount
$\theta $
,
and
${\mathbf{q}}^{\ast}$
rotates in the opposite direction by the same amount.
For our purposes, an alternative definition of quaternion conjugate could have been to negate
$w$
, leaving
$\mathbf{v}$
(and thus
$\hat{\mathbf{n}}$
) unchanged. This would negate the amount of
rotation
$\theta $
, rather than reversing what is considered positive rotation by flipping the
axis of rotation. This would have been equivalent to the definition given in
Equation (8.5) (for our geometric purposes, at least) and provided for a
slightly more intuitive geometric interpretation. However, the term conjugate has a
special significance in the context of complex numbers, so let's stick with the original
definition.
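In code, the conjugate is a one-liner, and for the unit quaternions used in this book it doubles as the inverse. A C sketch, with struct and names of our own invention:

```c
// Hypothetical quaternion type: scalar part w, vector part (x, y, z).
typedef struct { float w, x, y, z; } Quaternion;

// Conjugate, per Equation (8.5): negate the vector part.
// For unit quaternions this is also the inverse.
Quaternion quatConjugate(Quaternion q) {
    Quaternion r = { q.w, -q.x, -q.y, -q.z };
    return r;
}
```

Note that conjugating twice returns the original quaternion, just as negating a rotation angle twice returns the original angle.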
8.5.7 Quaternion Multiplication
Quaternions can be multiplied. The result is similar to the cross product for vectors, in that
it yields another quaternion (not a scalar), and it is not commutative. However, the notation is
different: we denote quaternion multiplication simply by placing the two operands side by side.
The formula for quaternion multiplication can be easily derived based upon the definition of
quaternions as complex numbers (see Exercise 6), but we state it
here without development, using both quaternion notations:
Quaternion product
$$\begin{array}{rl}{\mathbf{q}}_{1}{\mathbf{q}}_{2}& =\left[\begin{array}{cc}{w}_{1}& \left(\begin{array}{ccc}{x}_{1}& {y}_{1}& {z}_{1}\end{array}\right)\end{array}\right]\left[\begin{array}{cc}{w}_{2}& \left(\begin{array}{ccc}{x}_{2}& {y}_{2}& {z}_{2}\end{array}\right)\end{array}\right]\\ & =\left[\begin{array}{c}{w}_{1}{w}_{2}-{x}_{1}{x}_{2}-{y}_{1}{y}_{2}-{z}_{1}{z}_{2}\\ \left(\begin{array}{c}{w}_{1}{x}_{2}+{x}_{1}{w}_{2}+{y}_{1}{z}_{2}-{z}_{1}{y}_{2}\\ {w}_{1}{y}_{2}+{y}_{1}{w}_{2}+{z}_{1}{x}_{2}-{x}_{1}{z}_{2}\\ {w}_{1}{z}_{2}+{z}_{1}{w}_{2}+{x}_{1}{y}_{2}-{y}_{1}{x}_{2}\end{array}\right)\end{array}\right]\\ & =\left[\begin{array}{cc}{w}_{1}& {\mathbf{v}}_{1}\end{array}\right]\left[\begin{array}{cc}{w}_{2}& {\mathbf{v}}_{2}\end{array}\right]\\ & =\left[\begin{array}{cc}{w}_{1}{w}_{2}-{\mathbf{v}}_{1}\cdot {\mathbf{v}}_{2}& {w}_{1}{\mathbf{v}}_{2}+{w}_{2}{\mathbf{v}}_{1}+{\mathbf{v}}_{1}\times {\mathbf{v}}_{2}\end{array}\right].\end{array}$$
The quaternion product is also known as the Hamilton product; you'll understand why after
reading about the history of quaternions in Section 8.5.14.
Let's quickly mention three properties of quaternion multiplication, all of which can be easily
shown by using the definition given above. First, quaternion multiplication is associative, but not
commutative:
Quaternion multiplication is associative, but not commutative
$$\begin{array}{rl}(\mathbf{a}\mathbf{b})\mathbf{c}& =\mathbf{a}(\mathbf{b}\mathbf{c}),\\ \mathbf{a}\mathbf{b}& \ne \mathbf{b}\mathbf{a}.\end{array}$$
Second, the magnitude of a quaternion product is equal to the product of the magnitudes (see
Exercise 9):
Magnitude of quaternion product
$$\parallel {\mathbf{q}}_{1}{\mathbf{q}}_{2}\parallel =\parallel {\mathbf{q}}_{1}\parallel \parallel {\mathbf{q}}_{2}\parallel .$$
This is very significant because it guarantees us that when we multiply two unit quaternions, the
result is a unit quaternion.
Finally, the inverse of a quaternion product is equal to the product of the inverses taken in
reverse order:
Inverse of quaternion product
$$\begin{array}{rl}(\mathbf{a}\mathbf{b}{)}^{-1}& ={\mathbf{b}}^{-1}{\mathbf{a}}^{-1},\\ ({\mathbf{q}}_{1}{\mathbf{q}}_{2}\cdots {\mathbf{q}}_{n-1}{\mathbf{q}}_{n}{)}^{-1}& ={{\mathbf{q}}_{n}}^{-1}{{\mathbf{q}}_{n-1}}^{-1}\cdots {{\mathbf{q}}_{2}}^{-1}{{\mathbf{q}}_{1}}^{-1}.\end{array}$$
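The component form of the product translates directly into code. The following C sketch (struct and function names ours) implements it. Note that swapping the operands flips the sign of the cross-product terms, which is exactly why the product is not commutative:

```c
// Hypothetical quaternion type: scalar part w, vector part (x, y, z).
typedef struct { float w, x, y, z; } Quaternion;

// The Hamilton product, in component form.
Quaternion quatMultiply(Quaternion q1, Quaternion q2) {
    Quaternion r;
    r.w = q1.w*q2.w - q1.x*q2.x - q1.y*q2.y - q1.z*q2.z;
    r.x = q1.w*q2.x + q1.x*q2.w + q1.y*q2.z - q1.z*q2.y;
    r.y = q1.w*q2.y + q1.y*q2.w + q1.z*q2.x - q1.x*q2.z;
    r.z = q1.w*q2.z + q1.z*q2.w + q1.x*q2.y - q1.y*q2.x;
    return r;
}
```

As a sanity check, the basis quaternions obey $\mathbf{i}\mathbf{j}=\mathbf{k}$ but $\mathbf{j}\mathbf{i}=-\mathbf{k}$, and multiplying by the identity $[1,\mathbf{0}]$ leaves any quaternion unchanged.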
Now that we know some basic properties of quaternion multiplication, let's talk about why the
operation is actually useful. Let us “extend” a standard 3D point
$(x,y,z)$
into quaternion
space by defining the quaternion
$\mathbf{p}=[0,(x,y,z)]$
. In general,
$\mathbf{p}$
is not
a valid rotation quaternion, since it can have any magnitude. Let
$\mathbf{q}$
be a
rotation quaternion in the form we have been discussing,
$[\mathrm{cos}(\theta /2),\hat{\mathbf{n}}\,\mathrm{sin}(\theta /2)]$
, where
$\hat{\mathbf{n}}$
is a unit vector axis of rotation, and
$\theta $
is the rotation
angle. It is surprising to realize that we can rotate the 3D point
$\mathbf{p}$
about
$\hat{\mathbf{n}}$
by performing the rather odd-looking
quaternion multiplication
Using quaternion multiplication to rotate a 3D vector
$$\begin{array}{}\text{(8.7)}& {\mathbf{p}}^{\prime}=\mathbf{q}\mathbf{p}{\mathbf{q}}^{-1}.\end{array}$$
We could prove this by expanding the multiplication, substituting in
$\hat{\mathbf{n}}$
and
$\theta $
,
and comparing the result to the matrix we derived to rotate about an arbitrary axis
(Equation (5.1.3)), and indeed this is the approach taken
in most texts on quaternions. While this certainly is an effective way to verify that the trick
works, it leaves us wondering how the heck somebody could have ever stumbled upon it in the first
place. In Section 8.7.3, we derive the conversion from quaternion to matrix
form in a straightforward way, solely from the geometry of the rotations and without referring to
$\mathbf{q}\mathbf{p}{\mathbf{q}}^{-1}$
. As for how the association was discovered, we cannot
say for sure, but we will offer a train of thought that can lead a person to discover the
connection between this strange product and rotations in Section 8.5.14.
This discussion also explains how a person might have discovered that it would be fruitful to use
half of the rotation angle for the components.
As it turns out, the correspondence between quaternion multiplication and 3D vector rotations is
more of a theoretical interest than a practical one. Some people (“quaterniophiles?”) like to
credit quaternions with the useful property that vector rotations are immediately accessible by
using Equation (8.7). To the quaternion lovers, we admit that this compact
notation is an advantage of sorts, but its practical benefit in computations is dubious. If you
actually work through this math, you will find that it is just about the same number of operations
involved as converting the quaternion to the equivalent rotation matrix (by using
Equation (8.20), which is developed in Section 8.7.3) and then
multiplying the vector by this matrix. Because of this, we don't consider quaternions to possess
any direct ability to rotate vectors, at least for practical purposes in a computer.
Although the correspondence between
$\mathbf{q}\mathbf{p}{\mathbf{q}}^{-1}$
and rotation is
not of direct practical importance, it is of supreme theoretical importance. It leads us to a
slightly different use of quaternion multiplication, and this use is highly practical in
programming. Examine what happens when multiple rotations are applied to a vector. We'll rotate
the vector
$\mathbf{p}$
by the quaternion
$\mathbf{a}$
, and then rotate that result by another
quaternion
$\mathbf{b}$
:
Concatenating multiple rotations with quaternion algebra
$$\begin{array}{rl}{\mathbf{p}}^{\prime}& =\mathbf{b}(\mathbf{a}\mathbf{p}{\mathbf{a}}^{-1}){\mathbf{b}}^{-1}\\ & =(\mathbf{b}\mathbf{a})\mathbf{p}({\mathbf{a}}^{-1}{\mathbf{b}}^{-1})\\ & =(\mathbf{b}\mathbf{a})\mathbf{p}(\mathbf{b}\mathbf{a}{)}^{-1}.\end{array}$$
Notice that rotating by
$\mathbf{a}$
and then by
$\mathbf{b}$
is equivalent to performing a
single rotation by the quaternion product
$\mathbf{b}\mathbf{a}$
. This is a key
observation.
Quaternion multiplication can be used to concatenate multiple
rotations, just like matrix multiplication.
We say “just like matrix multiplication,” but in fact there is a slightly irritating
difference. With matrix multiplication, our preference to use row vectors puts the vectors on
the left, resulting in the nice property that concatenated rotations read left-to-right in the
order of transformation. With quaternions, we don't have this flexibility: concatenation of
multiple rotations will always read “inside out” from right to left.
8.5.8 Quaternion “Difference”
Using the quaternion multiplication and inverse, we can compute the difference between two
quaternions, with “difference” meaning the angular displacement from one orientation to another.
In other words, given orientations
$\mathbf{a}$
and
$\mathbf{b}$
, we can compute the angular
displacement
$\mathbf{d}$
that rotates from
$\mathbf{a}$
to
$\mathbf{b}$
. This can be expressed
compactly as
$$\mathbf{d}\mathbf{a}=\mathbf{b}.$$
(Remember that quaternion multiplication performs the rotations from right to left.)
Let's solve for
$\mathbf{d}$
. If the variables in the equation represented scalars, we could
simply divide by
$\mathbf{a}$
. However, we can't divide quaternions; we can only multiply them.
Perhaps multiplication by the inverse will achieve the desired effect? Multiplying both sides by
${\mathbf{a}}^{-1}$
on the right (we have to be careful since quaternion multiplication is not
commutative) gives us
The quaternion “difference”
$$\begin{array}{rl}(\mathbf{d}\mathbf{a}){\mathbf{a}}^{-1}& =\mathbf{b}{\mathbf{a}}^{-1},\\ \mathbf{d}(\mathbf{a}{\mathbf{a}}^{-1})& =\mathbf{b}{\mathbf{a}}^{-1},\\ \mathbf{d}{\textstyle \phantom{\rule{thinmathspace}{0ex}}}\left[\begin{array}{cc}1& \mathbf{0}\end{array}\right]& =\mathbf{b}{\mathbf{a}}^{-1},\\ \mathbf{d}& =\mathbf{b}{\mathbf{a}}^{-1}.\end{array}$$
Now we have a way to generate a quaternion that represents the angular displacement from one
orientation to another. We use this in Section 8.5.12, when we explore slerp.
Mathematically, the angular difference between two quaternions is actually more similar to a
division than a true difference (subtraction).
8.5.9 Quaternion Dot Product
The dot product operation is defined for quaternions. The notation and definition for this
operation is very similar to the vector dot product:
Quaternion dot product
$$\begin{array}{r}\begin{array}{rl}{\mathbf{q}}_{1}\cdot {\mathbf{q}}_{2}& =\left[\begin{array}{cc}{w}_{1}& {\mathbf{v}}_{1}\end{array}\right]\cdot \left[\begin{array}{cc}{w}_{2}& {\mathbf{v}}_{2}\end{array}\right]\\ & ={w}_{1}{w}_{2}+{\mathbf{v}}_{1}\cdot {\mathbf{v}}_{2}\\ & =\left[\begin{array}{cc}{w}_{1}& \left(\begin{array}{ccc}{x}_{1}& {y}_{1}& {z}_{1}\end{array}\right)\end{array}\right]\cdot \left[\begin{array}{cc}{w}_{2}& \left(\begin{array}{ccc}{x}_{2}& {y}_{2}& {z}_{2}\end{array}\right)\end{array}\right]\\ & ={w}_{1}{w}_{2}+{x}_{1}{x}_{2}+{y}_{1}{y}_{2}+{z}_{1}{z}_{2}.\end{array}\end{array}$$
Like the vector dot product, the result is a scalar. For unit quaternions
$\mathbf{a}$
and
$\mathbf{b}$
,
$-1\le \mathbf{a}\cdot \mathbf{b}\le 1$
.
The dot product is perhaps not one of the most frequently used quaternion operators, at least in
video game programming, but it does have an interesting geometric interpretation. In
Section 8.5.8, we considered the difference quaternion
$\mathbf{d}=\mathbf{b}{\mathbf{a}}^{\ast}$
, which describes the angular displacement from orientation
$\mathbf{a}$
to orientation
$\mathbf{b}$
. (We assume unit quaternions and replace the quaternion inverse with
the conjugate.) If we expand the product and examine the contents of
$\mathbf{d}$
, we find that
the
$w$
component is equal to the dot product
$\mathbf{a}\cdot \mathbf{b}$
!
What does this mean geometrically? Remember Euler's rotation theorem: we can rotate from the
orientation
$\mathbf{a}$
into the orientation
$\mathbf{b}$
via a single rotation about a
carefully chosen axis. This uniquely determined (up to a reversal of sign) axis and angle are
precisely the ones encoded in
$\mathbf{d}$
. Remembering the relationship between the
$w$
component
and the rotation angle
$\theta $
, we see that
$\mathbf{a}\cdot \mathbf{b}=\mathrm{cos}(\theta /2)$
,
where
$\theta $
is the amount of rotation needed to go from the orientation
$\mathbf{a}$
to the
orientation
$\mathbf{b}$
.
In summary, the quaternion dot product has an interpretation similar to the vector dot product. The
larger the absolute value of the quaternion dot product
$\mathbf{a}\cdot \mathbf{b}$
, the more
“similar” are the angular displacements represented by
$\mathbf{a}$
and
$\mathbf{b}$
. While
the vector dot product gives the cosine of the angle between vectors, the quaternion dot product
gives the cosine of half of the angle needed to rotate one quaternion into the other. For
the purpose of measuring similarity, usually we are interested only in the absolute value of
$\mathbf{a}\cdot \mathbf{b}$
, since
$\mathbf{a}\cdot (-\mathbf{b})=-(\mathbf{a}\cdot \mathbf{b})$
, even though
$\mathbf{b}$
and
$-\mathbf{b}$
represent the same angular
displacement.
Although direct use of the dot product is infrequent in most video game code, the dot product is
the first step in the calculation of the slerp function, which we discuss in
Section 8.5.12.
8.5.10 Quaternion log, exp, and Multiplication by a Scalar
This section discusses three operations on quaternions that, although they are seldom used
directly, are the basis for several important quaternion operations. These operations are the
quaternion logarithm, exponential, and multiplication by a scalar.
First, let us reformulate our definition of a quaternion by introducing a variable
$\alpha $
to
equal the half-angle,
$\theta /2$
:
Defining a quaternion in terms of the half-angle
$\mathit{\alpha}$
$$\begin{array}{rlrl}\alpha & =\theta /2,& \mathbf{q}& =\left[\begin{array}{cc}\mathrm{cos}\alpha & \hat{\mathbf{n}}\mathrm{sin}\alpha \end{array}\right].\end{array}$$
The logarithm of
$\mathbf{q}$
is defined as
The logarithm of a quaternion
$$\begin{array}{rl}\mathrm{log}\mathbf{q}& =\mathrm{log}\left(\left[\begin{array}{cc}\mathrm{cos}\alpha & \hat{\mathbf{n}}\mathrm{sin}\alpha \end{array}\right]\right)\equiv \left[\begin{array}{cc}0& \alpha \hat{\mathbf{n}}\end{array}\right].\end{array}$$
We use the notation
$\equiv $
to mean equal by definition. In general,
$\mathrm{log}\mathbf{q}$
is not
a unit quaternion. Note the similarity between taking the logarithm of a quaternion, and the
exponential map format (see Section 8.4).
The exponential function is defined in the exact opposite manner. First we define the quaternion
$\mathbf{p}$
to be of the form
$[0,\alpha \hat{\mathbf{n}}]$
, with
$\hat{\mathbf{n}}$
a unit vector:
$$\begin{array}{rlrl}\mathbf{p}& =\left[\begin{array}{cc}0& \alpha \hat{\mathbf{n}}\end{array}\right],& & (\parallel \hat{\mathbf{n}}\parallel =1).\end{array}$$
Then the exponential function is defined as
The exponential function of a quaternion
$$\begin{array}{rl}\mathrm{exp}\mathbf{p}& =\mathrm{exp}\left(\left[\begin{array}{cc}0& \alpha \hat{\mathbf{n}}\end{array}\right]\right)\equiv \left[\begin{array}{cc}\mathrm{cos}\alpha & \hat{\mathbf{n}}\mathrm{sin}\alpha \end{array}\right].\end{array}$$
Note that, by definition,
$\mathrm{exp}\mathbf{p}$
always returns a unit quaternion.
The quaternion logarithm and exponential are related to their scalar analogs. For any scalar
$a$
,
$${e}^{\mathrm{ln}a}=a.$$
In the same way, the quaternion exp function is defined to be the inverse of the quaternion log
function:
$$\mathrm{exp}({\textstyle \phantom{\rule{thinmathspace}{0ex}}}\mathrm{log}\mathbf{q}{\textstyle \phantom{\rule{thinmathspace}{0ex}}})=\mathbf{q}.$$
Finally, quaternions can be multiplied by a scalar, with the result computed in the obvious way
of multiplying each component by the scalar. Given a scalar
$k$
and a quaternion
$\mathbf{q}$
,
Multiplying a quaternion by a scalar
$$\begin{array}{rl}k\mathbf{q}& =k\left[\begin{array}{cc}w& \mathbf{v}\end{array}\right]=\left[\begin{array}{cc}kw& k\mathbf{v}\end{array}\right].\end{array}$$
This will not usually result in a unit quaternion, which is why multiplication by a scalar is not a
very useful operation in the context of representing angular displacement. (But we will find a use
for it in Section 8.5.11.)
8.5.11 Quaternion Exponentiation
Quaternions can be exponentiated, which means that we can raise a quaternion to a scalar
power. Quaternion exponentiation, denoted
${\mathbf{q}}^{t}$
, should not be confused with the
exponential function
$\mathrm{exp}\mathbf{q}$
. The exponential function accepts only one argument: a
quaternion. Quaternion exponentiation has two arguments: the quaternion
$\mathbf{q}$
and the
scalar exponent
$t$
.
The meaning of quaternion exponentiation is similar to that of real numbers. Recall that for any
scalar
$a$
, besides zero,
${a}^{0}=1$
and
${a}^{1}=a$
. As the exponent
$t$
varies from
$0$
to
$1$
, the
value of
${a}^{t}$
varies from
$1$
to
$a$
. A similar statement holds for quaternion exponentiation: as
$t$
varies from
$0$
to
$1$
, the quaternion exponentiation
${\mathbf{q}}^{t}$
varies from
$[1,\mathbf{0}]$
to
$\mathbf{q}$
.
Quaternion exponentiation is useful because it allows us to extract a “fraction” of an angular
displacement. For example, to compute a quaternion that represents one third of the angular
displacement represented by the quaternion
$\mathbf{q}$
, we would compute
${\mathbf{q}}^{1/3}$
.
Exponents outside the
$[0,1]$
range behave mostly as expected—with one major caveat. For example,
${\mathbf{q}}^{2}$
represents twice the angular displacement as
$\mathbf{q}$
. If
$\mathbf{q}$
represents a clockwise rotation of 30° about the
$x$
axis, then
${\mathbf{q}}^{2}$
represents
a clockwise rotation of 60° about the
$x$
axis, and
${\mathbf{q}}^{-1/3}$
represents a
counterclockwise rotation of 10° about the
$x$
axis. Notice in particular that the inverse
notation
${\mathbf{q}}^{-1}$
can also be interpreted in this context and the result is the same: the
quaternion that performs the opposite rotation.
The caveat we mentioned is this: a quaternion represents angular displacements using the shortest
arc. Multiple spins cannot be represented. Continuing our example above,
${\mathbf{q}}^{8}$
is not a
240° clockwise rotation about the
$x$
axis as expected; it is a 120°\
counterclockwise rotation. Of course, rotating 240° in one direction produces the same end
result as rotating 120° in the opposite direction, and this is the point: quaternions
really capture only the end result. In general, many of the algebraic identities concerning
exponentiation of scalars, such as
$({a}^{s}{)}^{t}={a}^{st}$
, do not apply to quaternions.
In some situations, we do care about the total amount of rotation, not just the end
result. (The most important example is that of angular velocity.) In these situations,
quaternions are not the correct tool for the job; use the exponential map (or its cousin, the
axis-angle format) instead.
Now that we understand what quaternion exponentiation is used for, let's see how it is
mathematically defined. Quaternion exponentiation is defined in terms of the “utility”
operations we learned in the previous section. The definition is given by
Raising a quaternion to a power
$$\begin{array}{}\text{(8.8)}& {\mathbf{q}}^{t}=\mathrm{exp}(t\mathrm{log}\mathbf{q}).\end{array}$$
Notice that a similar statement is true regarding exponentiation of a scalar:
$${a}^{t}={e}^{(t\mathrm{ln}a)}.$$
It is not too difficult to understand why
${\mathbf{q}}^{t}$
interpolates from identity to
$\mathbf{q}$
as
$t$
varies from
$0$
to
$1$
. Notice that the
$\mathrm{log}$
operation essentially
converts the quaternion to exponential map format (except for a factor of 2). Then, when we
perform the scalar multiplication by the exponent
$t$
, the effect is to multiply the angle
by
$t$
. Finally, the
$\mathrm{exp}$
“undoes” what the
$\mathrm{log}$
operation did,
recalculating the new
$w$
and
$\mathbf{v}$
from the exponential vector. At least this is how it
works academically in an equation. Although Equation (8.8) is the official
mathematical definition and works elegantly in theory, direct translation into code is more
complicated than necessary. Listing 8.2 shows how we could compute the
value of
${\mathbf{q}}^{t}$
in C. Essentially, instead of working with a single exponential-map-like
quantity as the formula tells us to, we break out the axis and halfangle separately.
// Quaternion (input and output)
float w,x,y,z;
// Input exponent
float exponent;
// Check for the case of an identity quaternion.
// This will protect against divide by zero
if (fabs(w) < .9999f) {
// Extract the half angle alpha (alpha = theta/2)
float alpha = acos(w);
// Compute new alpha value
float newAlpha = alpha * exponent;
// Compute new w value
w = cos(newAlpha);
// Compute new xyz values
float mult = sin(newAlpha) / sin(alpha);
x *= mult;
y *= mult;
z *= mult;
}
There are a few points to notice about this code. First, the check for the identity quaternion is
necessary since a value of
$w=\pm 1$
would cause the computation of mult to divide
by zero. Raising an identity quaternion to any power results in the identity quaternion, so if we
detect an identity quaternion on input, we simply ignore the exponent and return the original
quaternion.
Second, when we compute alpha, we use the
$\mathrm{arccos}$
function, which always returns a
positive angle. This does not create a loss of generality. Any quaternion can be interpreted as
having a positive angle of rotation, since negative rotation about an axis is the same as positive
rotation about the axis pointing in the opposite direction.
8.5.12 Quaternion Interpolation, a.k.a. Slerp
The raison d'être of quaternions in games and graphics today is an operation known as
slerp, which stands for Spherical Linear interpolation. The slerp
operation is useful because it allows us to smoothly interpolate between two orientations. Slerp
avoids all the problems that plagued interpolation of Euler angles (see
Section 8.3.4).
Slerp is a ternary operator, meaning it accepts three operands. The first two operands to slerp are
the two quaternions between which we wish to interpolate. We'll assign these starting and ending
orientations to the variables
${\mathbf{q}}_{0}$
and
${\mathbf{q}}_{1}$
, respectively. The interpolation
parameter will be assigned to the variable
$t$
, and as
$t$
varies from
$0$
to
$1$
, the slerp
function
$\mathrm{s}\mathrm{l}\mathrm{e}\mathrm{r}\mathrm{p}({\mathbf{q}}_{0},{\mathbf{q}}_{1},t)$
returns an orientation that interpolates
from
${\mathbf{q}}_{0}$
to
${\mathbf{q}}_{1}$
.
Let's see if we can't derive the slerp formula by using the tools we have so far. If we were
interpolating between two scalar values
${a}_{0}$
and
${a}_{1}$
, we could use the standard linear
interpolation (lerp) formula:
Simple linear interpolation
$$\begin{array}{rl}\mathrm{\Delta}a& ={a}_{1}-{a}_{0},\\ \mathrm{l}\mathrm{e}\mathrm{r}\mathrm{p}({a}_{0},{a}_{1},t)& ={a}_{0}+t{\textstyle \phantom{\rule{thinmathspace}{0ex}}}\mathrm{\Delta}a.\end{array}$$
The standard linear interpolation formula works by starting at
${a}_{0}$
and adding the fraction
$t$
of
the difference between
${a}_{1}$
and
${a}_{0}$
. This requires three basic steps:

Compute the difference between the two values.

Take a fraction of this difference.

Take the original value and adjust it by this fraction of the difference.
We can use the same basic idea to interpolate between orientations. (Again, remember that
quaternion multiplication reads right-to-left.)

Compute the difference between the two values. We showed how to do this in
Section 8.5.8. The angular displacement from
${\mathbf{q}}_{0}$
to
${\mathbf{q}}_{1}$
is given by
$$\mathrm{\Delta}\mathbf{q}={\mathbf{q}}_{1}{{\mathbf{q}}_{0}}^{-1}.$$

Take a fraction of this difference. To do this, we use quaternion
exponentiation, which we discussed in Section 8.5.11. The
fraction of the difference is given by
$$(\mathrm{\Delta}\mathbf{q}{)}^{t}.$$

Take the original value and adjust it by this
fraction of the difference.
We “adjust” the initial value by
composing the angular displacements via quaternion multiplication:
$$(\mathrm{\Delta}\mathbf{q}{)}^{t}{\mathbf{q}}_{0}.$$
Thus, the equation for slerp is given by
Quaternion slerp in theory
$$\mathrm{s}\mathrm{l}\mathrm{e}\mathrm{r}\mathrm{p}({\mathbf{q}}_{0},{\mathbf{q}}_{1},t)=({\mathbf{q}}_{1}{{\mathbf{q}}_{0}}^{-1}{)}^{t}{\mathbf{q}}_{0}.$$
This algebraic form is how slerp is computed in theory. In practice, we use a formulation that
is mathematically equivalent, but computationally more efficient. To derive this alternative
formula, we start by interpreting the quaternions as existing in a 4D Euclidean space. Since all
of the quaternions of interest are unit quaternions, they “live” on the surface of a
4D
hypersphere. The basic idea is to interpolate around the arc that connects the two quaternions,
along the surface of the 4D hypersphere. (Hence the name spherical linear interpolation.)
We can visualize this in the plane (see Figure 8.11). Imagine two 2D vectors
${\mathbf{v}}_{0}$
and
${\mathbf{v}}_{1}$
, both of unit length. We wish to compute the value of
${\mathbf{v}}_{t}$
, which is the result of smoothly interpolating around the arc by a fraction
$t$
of
the distance from
${\mathbf{v}}_{0}$
to
${\mathbf{v}}_{1}$
. If we let
$\omega $
be the angle intercepted by the arc from
${\mathbf{v}}_{0}$
to
${\mathbf{v}}_{1}$
, then
${\mathbf{v}}_{t}$
is the result of rotating
${\mathbf{v}}_{0}$
around this arc by
an angle of
$t\omega $
.
We can express
${\mathbf{v}}_{t}$
as a linear combination of
${\mathbf{v}}_{0}$
and
${\mathbf{v}}_{1}$
. In
other words, there exist nonnegative constants
${k}_{0}$
and
${k}_{1}$
such that
${\mathbf{v}}_{t}={k}_{0}{\mathbf{v}}_{0}+{k}_{1}{\mathbf{v}}_{1}$
. We can use elementary geometry to determine the values of
${k}_{0}$
and
${k}_{1}$
. Figure 8.12 shows how this can be done.
Applying some trig to the right triangle with
${k}_{1}{\mathbf{v}}_{1}$
as the hypotenuse (and recalling
that
${\mathbf{v}}_{1}$
is a unit vector), we see that
$$\begin{array}{rl}\mathrm{sin}\omega & =\frac{\mathrm{sin}t\omega}{{k}_{1}},\\ {k}_{1}& =\frac{\mathrm{sin}t\omega}{\mathrm{sin}\omega}.\end{array}$$
A similar technique to solve for
${k}_{0}$
yields the following result:
$${k}_{0}=\frac{\mathrm{sin}(1-t)\omega}{\mathrm{sin}\omega}.$$
Thus,
${\mathbf{v}}_{t}$
can be expressed as
$${\mathbf{v}}_{t}={k}_{0}{\mathbf{v}}_{0}+{k}_{1}{\mathbf{v}}_{1}=\frac{\mathrm{sin}(1-t)\omega}{\mathrm{sin}\omega}{\mathbf{v}}_{0}+\frac{\mathrm{sin}t\omega}{\mathrm{sin}\omega}{\mathbf{v}}_{1}.$$
The same basic idea can be extended into quaternion space, and we can reformulate the slerp as
Quaternion slerp in practice
$$\mathrm{s}\mathrm{l}\mathrm{e}\mathrm{r}\mathrm{p}({\mathbf{q}}_{0},{\mathbf{q}}_{1},t)=\frac{\mathrm{sin}(1-t)\omega}{\mathrm{sin}\omega}{\mathbf{q}}_{0}+\frac{\mathrm{sin}t\omega}{\mathrm{sin}\omega}{\mathbf{q}}_{1}.$$
We just
need a way to compute
$\omega $
, the “angle” between the two quaternions. As it turns out, an
analogy from 2D vector math can be carried into quaternion space; we can think of the quaternion
dot product as returning
$\mathrm{cos}\omega $
.
There are two slight complications. First, the two quaternions
$\mathbf{q}$
and
$-\mathbf{q}$
represent the same orientation, but may produce different results when used as an argument to
slerp. This problem doesn't happen in 2D or 3D, but the surface of a 4D hypersphere has a different
topology than Euclidean space. The solution is to choose the signs of
${\mathbf{q}}_{0}$
and
${\mathbf{q}}_{1}$
such that the dot product
${\mathbf{q}}_{0}\cdot {\mathbf{q}}_{1}$
is nonnegative. This
has the effect of always selecting the shortest rotational arc from
${\mathbf{q}}_{0}$
to
${\mathbf{q}}_{1}$
. The second complication is that if
${\mathbf{q}}_{0}$
and
${\mathbf{q}}_{1}$
are very
close, then
$\omega $
is very small, and thus
$\mathrm{sin}\omega $
is also very small, which will cause
problems with the division. To avoid this, if
$\mathrm{sin}\omega $
is very small, we will use simple
linear interpolation. The code snippet in Listing 8.3 applies all of this advice
to compute the quaternion slerp.
// The two input quaternions
float w0,x0,y0,z0;
float w1,x1,y1,z1;
// The interpolation parameter
float t;
// The output quaternion will be computed here
float w,x,y,z;
// Compute the "cosine of the angle" between the
// quaternions, using the dot product
float cosOmega = w0*w1 + x0*x1 + y0*y1 + z0*z1;
// If negative dot, negate one of the input
// quaternions, to take the shorter 4D "arc"
if (cosOmega < 0.0f) {
w1 = -w1;
x1 = -x1;
y1 = -y1;
z1 = -z1;
cosOmega = -cosOmega;
}
// Check if they are very close together, to protect
// against divide-by-zero
float k0, k1;
if (cosOmega > 0.9999f) {
// Very close - just use linear interpolation
k0 = 1.0f - t;
k1 = t;
} else {
// Compute the sin of the angle using the
// trig identity sin^2(omega) + cos^2(omega) = 1
float sinOmega = sqrt(1.0f - cosOmega*cosOmega);
// Compute the angle from its sine and cosine
float omega = atan2(sinOmega, cosOmega);
// Compute inverse of denominator, so we only have
// to divide once
float oneOverSinOmega = 1.0f / sinOmega;
// Compute interpolation parameters
k0 = sin((1.0f  t) * omega) * oneOverSinOmega;
k1 = sin(t * omega) * oneOverSinOmega;
}
// Interpolate
w = w0*k0 + w1*k1;
x = x0*k0 + x1*k1;
y = y0*k0 + y1*k1;
z = z0*k0 + z1*k1;
8.5.13 Advantages and Disadvantages of Quaternions
Quaternions offer a number of advantages over other methods of representing angular displacement:

Smooth interpolation. The interpolation provided by
slerp
provides smooth interpolation between
orientations. No other representation method provides for
smooth interpolation.

Fast concatenation and inversion of angular displacements. We can
concatenate a sequence of angular displacements into a single angular displacement by
using the quaternion cross product. The same operation using matrices involves more
scalar operations, although which one is actually faster on a given architecture is
not so clean-cut: single instruction multiple data (SIMD) vector operations can make
very quick work of matrix multiplication. The quaternion conjugate provides a way to
compute the opposite angular displacement very efficiently. This can be done by
transposing a rotation matrix, but is not easy with Euler angles.

Fast conversion to and from matrix form. As we see in
Section 8.7, quaternions can be converted to and from matrix
form a bit faster than Euler angles.

Only four numbers. Since a quaternion contains four scalar
values, it is considerably more economical than a matrix, which uses
nine numbers. (However, it is still 33% larger than Euler angles.)
These advantages do come at some cost, however. Quaternions suffer from a few of the problems that
affect matrices, only to a lesser degree:

Slightly bigger than Euler angles. That one additional number may not seem
like much, but an extra 33% can make a difference when large amounts of angular
displacements are needed, for example, when storing animation data. And the values
inside a quaternion are not “evenly spaced” along the
$[-1,+1]$
interval; the
component values do not interpolate smoothly, even if the orientation does. This makes
quaternions more difficult to pack into a fixed-point number than Euler angles or an
exponential map.

Can become invalid. This can happen either through bad input data, or from
accumulated floating point roundoff error. (We can address this problem by normalizing
the quaternion to ensure that it has unit magnitude.)

Difficult for humans to work with. Of the three
representation methods, quaternions are the most difficult for
humans to work with directly.
8.5.14 Quaternions as Complex Numbers
We end our discussion on quaternions in the place that most texts begin: a discussion of their
interpretation as complex numbers. If you are interested in quaternions solely for rotations, you
can safely skip this section. If you want a bit deeper understanding or are interested in the
mathematical heritage of quaternions and the circumstances that surrounded their invention, this
section will be interesting. We will be following an approach due to John McDonald of DePaul
University [9]. Among other things, this method is able to
explain two peculiarities of quaternions: the appearance of
$\theta /2$
rather than
$\theta $
and the
unusual mathematical form
$\mathbf{q}\mathbf{v}{\mathbf{q}}^{-1}$
.
We begin by considering how we can embed the set of real numbers in the set of
$2\times 2$
matrices. For any given scalar
$a$
, we associate it with exactly one
$2\times 2$
matrix, namely
the matrix that has
$a$
on both of the diagonal elements:
Each real scalar maps to a
$\mathbf{2}\mathbf{\times}\mathbf{2}$
matrix
$$a\equiv \left[\begin{array}{cc}a& 0\\ 0& a\end{array}\right].$$
We have chosen a subset of the
$2\times 2$
matrices, and established a one-to-one correspondence
between this smaller set of matrices and the set of all real numbers. We could have established
this one-to-one relationship in other ways, but this particular way of doing it is important
because it preserves all the ordinary algebra laws of addition, subtraction, and multiplication:
the associative property, distributive property, nonfactorability of zero, and so on. (We can even
include division if we treat division as multiplication by the inverse.) For example,
Addition, subtraction, and multiplication work the same
$$\begin{array}{rl}\left[\begin{array}{cc}a& 0\\ 0& a\end{array}\right]+\left[\begin{array}{cc}b& 0\\ 0& b\end{array}\right]& =\left[\begin{array}{cc}a+b& 0\\ 0& a+b\end{array}\right],\\ \left[\begin{array}{cc}a& 0\\ 0& a\end{array}\right]-\left[\begin{array}{cc}b& 0\\ 0& b\end{array}\right]& =\left[\begin{array}{cc}a-b& 0\\ 0& a-b\end{array}\right],\\ \left[\begin{array}{cc}a& 0\\ 0& a\end{array}\right]\left[\begin{array}{cc}b& 0\\ 0& b\end{array}\right]& =\left[\begin{array}{cc}ab& 0\\ 0& ab\end{array}\right].\end{array}$$
Now let's see if we can create a similar mapping for the set of complex numbers. You probably
already have been introduced to complex numbers; if so, you should remember that the complex pair
$(a,b)$
defines the number
$a+bi$
. The number
$i$
is a special number such that
${i}^{2}=-1$
.
It's often called the imaginary number because no ordinary scalar (a “real” number) can
have this property. The word “imaginary” gives one the impression that the number doesn't
really exist; we're going to avoid this term and instead stick with the more descriptive one:
“complex.”
Complex numbers can be added, subtracted, and multiplied. All we need to do is follow the
ordinary rules for arithmetic, and replace
${i}^{2}$
with
$-1$
whenever it appears. This results in
the following identities:
Adding, subtracting, and multiplying complex numbers
$$\begin{array}{rl}(a+bi)+(c+di)& =(a+c)+(b+d)i,\\ (a+bi)-(c+di)& =(a-c)+(b-d)i,\\ (a+bi)(c+di)& =ac+adi+bci+bd{i}^{2}\\ & =ac+(ad+bc)i+bd(-1)\\ & =(ac-bd)+(ad+bc)i.\end{array}$$
Now, how can we extend our system of embedding numbers in the space of
$2\times 2$
matrices to
include complex numbers? Before, we only had one degree of freedom,
$a$
, and now we have two,
$a$
and
$b$
. The mapping we use is
Mapping each complex number to a
$\mathbf{2}\mathbf{\times}\mathbf{2}$
matrix
$$\begin{array}{}\text{(8.9)}& a+bi\equiv \left[\begin{array}{cc}a& -b\\ b& a\end{array}\right].\end{array}$$
We can easily verify that the complex number on the left behaves exactly the same as the matrix
on the right. In a certain sense, they are just two notations for writing the same quantity:
Addition, subtraction, and multiplication in standard notation and our
$\mathbf{2}\mathbf{\times}\mathbf{2}$
form
$$\begin{array}{rl}(a+bi)+(c+di)& \equiv \left[\begin{array}{cc}a& -b\\ b& a\end{array}\right]+\left[\begin{array}{cc}c& -d\\ d& c\end{array}\right]=\left[\begin{array}{cc}a+c& -(b+d)\\ b+d& a+c\end{array}\right]\\ & \equiv (a+c)+(b+d)i,\\ (a+bi)-(c+di)& \equiv \left[\begin{array}{cc}a& -b\\ b& a\end{array}\right]-\left[\begin{array}{cc}c& -d\\ d& c\end{array}\right]=\left[\begin{array}{cc}a-c& -(b-d)\\ b-d& a-c\end{array}\right]\\ & \equiv (a-c)+(b-d)i,\\ (a+bi)(c+di)& \equiv \left[\begin{array}{cc}a& -b\\ b& a\end{array}\right]\left[\begin{array}{cc}c& -d\\ d& c\end{array}\right]=\left[\begin{array}{cc}ac-bd& -(ad+bc)\\ ad+bc& ac-bd\end{array}\right]\\ & \equiv (ac-bd)+(ad+bc)i.\end{array}$$
We also verify that the equation
${i}^{2}=-1$
still holds:
$\mathit{i}$
doesn't seem quite as “imaginary” in
$\mathbf{2}\mathbf{\times}\mathbf{2}$
form
$${i}^{2}\equiv {\left[\begin{array}{cc}0& -1\\ 1& 0\end{array}\right]}^{2}=\left[\begin{array}{cc}0& -1\\ 1& 0\end{array}\right]\left[\begin{array}{cc}0& -1\\ 1& 0\end{array}\right]=\left[\begin{array}{cc}-1& 0\\ 0& -1\end{array}\right]\equiv -1.$$
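This bookkeeping is easy to check numerically. The sketch below (the helper names `complexToMatrix` and `matMul2` are our own, not from the text) builds the $2\times 2$ form of a complex number and verifies that matrix multiplication reproduces the complex product rule:

```c
#include <math.h>

/* Build the 2x2 matrix form of the complex number a + bi.
   Helper names are ours, chosen for this illustration. */
void complexToMatrix(float a, float b, float m[2][2])
{
    m[0][0] = a;  m[0][1] = -b;
    m[1][0] = b;  m[1][1] = a;
}

/* Plain 2x2 matrix multiplication r = p * q. */
void matMul2(float p[2][2], float q[2][2], float r[2][2])
{
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j)
            r[i][j] = p[i][0]*q[0][j] + p[i][1]*q[1][j];
}
```

For example, $(1+2i)(3+4i)=-5+10i$, and multiplying the matrix forms of $1+2i$ and $3+4i$ produces exactly the matrix form of $-5+10i$.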
Let's apply the geometric perspective from Chapter 5. Interpreting the
columns
$[0,1]$
and
$[-1,0]$
as the basis vectors of a coordinate space, we see that this matrix performs a
$90{}^{\mathrm{o}}$
rotation.
We can interpret multiplication by
$i$
as a
$90{}^{\mathrm{o}}$
rotation.
There's nothing “imaginary” about this. Instead of thinking of
$i$
as the square root of
$-1$
,
think instead of the complex number
$a+bi$
as a mathematical entity with two degrees of freedom
that behaves in particular ways when multiplied. The part we usually call the “real” part,
$a$
,
is the main degree of freedom, and
$b$
measures some secondary degree of freedom. The two degrees
of freedom are in some sense “orthogonal” to one another.
Continuing this further, we see that we can represent rotations by any arbitrary angle
$\theta $
using this scheme. The basic
$2\times 2$
rotation matrix derived in
Section 5.1.1 happens to be in this special set of matrices that we
are mapping to the complex numbers. It maps to the complex number
$\mathrm{cos}\theta +i\mathrm{sin}\theta $
:
Unit complex numbers as rotations
$$\mathrm{cos}\theta +i\mathrm{sin}\theta \equiv \left[\begin{array}{cc}\mathrm{cos}\theta & -\mathrm{sin}\theta \\ \mathrm{sin}\theta & \mathrm{cos}\theta \end{array}\right].$$
Notice how complex conjugation (negating the complex part) corresponds to matrix transposition.
This is particularly pleasing. Remember that the conjugate of a quaternion expresses the inverse
angular displacement. A corresponding fact is true for transposing rotation matrices: since they
are orthogonal, their transpose is equal to their inverse.
How do ordinary 2D vectors fit into this scheme? We interpret the vector
$[x,y]$
as the complex
number
$x+iy$
, and then we can interpret the multiplication of two complex numbers
$$\begin{array}{rl}(\mathrm{cos}\theta +i\mathrm{sin}\theta )(x+iy)& =x\mathrm{cos}\theta +iy\mathrm{cos}\theta +ix\mathrm{sin}\theta +{i}^{2}y\mathrm{sin}\theta \\ & =(x\mathrm{cos}\theta -y\mathrm{sin}\theta )+i(x\mathrm{sin}\theta +y\mathrm{cos}\theta )\end{array}$$
as performing a rotation. This is equivalent to the matrix multiplication
$$\left[\begin{array}{cc}\mathrm{cos}\theta & -\mathrm{sin}\theta \\ \mathrm{sin}\theta & \mathrm{cos}\theta \end{array}\right]\left[\begin{array}{c}x\\ y\end{array}\right]=\left[\begin{array}{c}x\mathrm{cos}\theta -y\mathrm{sin}\theta \\ x\mathrm{sin}\theta +y\mathrm{cos}\theta \end{array}\right].$$
While this is not much more than mathematical trivia so far, our goal is to build up some
parallels that we can carry forward to quaternions, so let's repeat the key result.
In 2D, we can interpret the vector
$[x,y]$
as a complex number
$x+yi$
and rotate
it by using the complex multiplication
$(\mathrm{cos}\theta +i\mathrm{sin}\theta )(x+iy)$
.
A similar conversion from ordinary vectors to complex numbers is necessary in order to multiply
quaternions and 3D vectors.
Before we leave 2D, let's summarize what we've learned so far. Complex numbers are mathematical
objects with two degrees of freedom that obey certain rules when we multiply them. These objects
are usually written as
$a+bi$
, but can equivalently be written as a
$2\times 2$
matrix. When we
write complex numbers as matrices, it begs the geometric interpretation of multiplication by
$i$
as
a
$90{}^{\mathrm{o}}$
rotation. The rule
${i}^{2}=-1$
has the interpretation that combining two
$90{}^{\mathrm{o}}$
rotations yields a
$180{}^{\mathrm{o}}$
rotation, and that leaves us with a warm fuzzy feeling. More
generally, any complex number with unit length can be written as
$\mathrm{cos}\theta +i\mathrm{sin}\theta $
and
interpreted as a rotation by the angle
$\theta $
. If we convert a 2D vector into complex form and
multiply it by
$\mathrm{cos}\theta +i\mathrm{sin}\theta $
, it has the effect of performing the rotation.
It's very tempting to extend this trick from 2D into 3D. Tempting, but alas not possible in the
straightforward way. The Irish mathematician William Hamilton (1805–1865) apparently fell victim
to just this temptation, and had looked for a way to extend complex numbers from 2D to 3D for
years. This new type of complex number, he thought, would have one real part and two imaginary
parts. However, Hamilton was unable to create a useful type of complex number with two imaginary
parts. Then, as the story goes, in 1843, on his way to a speech at the Royal Irish Academy, he
suddenly realized that three imaginary parts were needed rather than two. He carved the
equations that define the properties of this new type of complex number on the
Broom Bridge. His original marks have faded into legend, but a commemorative plaque holds their
place. Thus quaternions were invented.
Since we weren't on the Broom Bridge in 1843, we can't say for certain what made Hamilton realize
that a 3D system of complex numbers was no good, but we can show how such a set could not be
easily mapped to
$3\times 3$
matrices and rotations. A 3D complex number would have two complex
parts,
$i$
and
$j$
, with the properties
${i}^{2}={j}^{2}=-1$. (We would also need to define the values of
. (We would also need to define the values of
the products
$ij$
and
$ji$
. Exactly what these rules should be, we're not sure; perhaps Hamilton
realized this was a dead end. In any case, it doesn't matter for the present discussion.) Now, a
straightforward extension of the ideas from 2D would mean that we could somehow associate the
numbers 1,
$i$
, and
$j$
with the set of
$3\times 3$
matrices, such that all the usual algebra
laws hold. The number 1 must obviously map to the 3D identity matrix
${\mathbf{I}}_{3}$
. The number
$-1$
should map to its negative,
$-{\mathbf{I}}_{3}$
, which has
$-1$
s on the diagonal. But now we run
into a problem trying to find matrices for
$i$
and
$j$
whose square is
$-{\mathbf{I}}_{3}$
. We can
quickly see that this is not possible because the
determinant of
$-{\mathbf{I}}_{3}$
is
$1$
$-1$. To be a root of this matrix,
$i$
or
$j$
must have a
determinant that is the square root of
$-1$
because the determinant of a matrix product is the
product of the determinants. The only way this can work is for
$i$
and
$j$
to contain entries
that are complex. In short, there doesn't seem to be a coherent system of 3D complex numbers;
certainly there isn't one that maps elegantly to rotations analogously to standard complex
numbers and 2D rotations. For that, we need quaternions.
Quaternions extend the complex number system by having three imaginary numbers,
$i$
,
$j$
, and
$k$
, which are related by Hamilton's famous equations:
The rules for 4D complex numbers that Hamilton wrote on the Broom Bridge
$$\begin{array}{}\text{(8.10)}& \begin{array}{rl}& {i}^{2}={j}^{2}={k}^{2}=-1\\ & ij=k,{\textstyle \phantom{\rule{1em}{0ex}}}ji=-k,\\ & jk=i,{\textstyle \phantom{\rule{1em}{0ex}}}kj=-i,\\ & ki=j,{\textstyle \phantom{\rule{1em}{0ex}}}ik=-j.\end{array}\end{array}$$
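These rules are enough to pin down the product of any two quaternions: expand $({w}_{1}+{x}_{1}i+{y}_{1}j+{z}_{1}k)({w}_{2}+{x}_{2}i+{y}_{2}j+{z}_{2}k)$ term by term and substitute. A sketch in C (the helper and its storage convention, a `float[4]` in $w,x,y,z$ order, are our choices, not the book's):

```c
/* Multiply two quaternions r = p * q, where each quaternion
   w + xi + yj + zk is stored as {w, x, y, z}.  The signs below fall
   directly out of Hamilton's rules (i^2 = -1, ij = k, ji = -k, ...). */
void quatMul(const float p[4], const float q[4], float r[4])
{
    r[0] = p[0]*q[0] - p[1]*q[1] - p[2]*q[2] - p[3]*q[3]; /* w */
    r[1] = p[0]*q[1] + p[1]*q[0] + p[2]*q[3] - p[3]*q[2]; /* x (i) */
    r[2] = p[0]*q[2] - p[1]*q[3] + p[2]*q[0] + p[3]*q[1]; /* y (j) */
    r[3] = p[0]*q[3] + p[1]*q[2] - p[2]*q[1] + p[3]*q[0]; /* z (k) */
}
```

With $i=\{0,1,0,0\}$ and $j=\{0,0,1,0\}$, this helper reproduces $ij=k$, $ji=-k$, and ${i}^{2}=-1$ exactly.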
The quaternion we have been denoting
$[w,(x,y,z)]$
corresponds to the complex number
$w+xi+yj+zk$
. The definition of the quaternion product given in
Section 8.5.7 follows from these rules. (Also see
Exercise 6.) The dot product, however, basically ignores all of
this complex
$i,j,k$
business and treats the operands as simple 4D vectors.
Now we return to matrices. Can we embed the set of quaternions into the set of matrices such that
Hamilton's rules in Equation (8.10) still hold? Yes, we can, although,
as you might expect, we map them to
$4\times 4$
matrices. Real numbers are mapped to a matrix
with the number on each entry of the diagonal as before,
$$a\equiv \left[\begin{array}{cccc}a& 0& 0& 0\\ 0& a& 0& 0\\ 0& 0& a& 0\\ 0& 0& 0& a\end{array}\right],$$
and the complex quantities are mapped to the matrices
Mapping the three complex quantities to
$\mathbf{4}\mathbf{\times}\mathbf{4}$
matrices
$$\begin{array}{}\text{(8.11)}& i& \equiv \left[\begin{array}{cccc}0& 0& 0& 1\\ 0& 0& 1& 0\\ 0& -1& 0& 0\\ -1& 0& 0& 0\end{array}\right],& j& \equiv \left[\begin{array}{cccc}0& 0& -1& 0\\ 0& 0& 0& 1\\ 1& 0& 0& 0\\ 0& -1& 0& 0\end{array}\right],& k& \equiv \left[\begin{array}{cccc}0& -1& 0& 0\\ 1& 0& 0& 0\\ 0& 0& 0& -1\\ 0& 0& 1& 0\end{array}\right].\end{array}$$
We encourage you to convince yourself that these mappings do preserve all of Hamilton's rules
before moving on.
Combining the above associations, we can map an arbitrary quaternion to a
$4\times 4$
matrix as
Encoding quaternions as
$\mathbf{4}\mathbf{\times}\mathbf{4}$
matrices
$$\begin{array}{}\text{(8.12)}& w+xi+yj+zk\equiv \left[\begin{array}{cccc}w& -z& -y& x\\ z& w& x& y\\ y& -x& w& -z\\ -x& -y& z& w\end{array}\right].\end{array}$$
Once again, we notice how complex conjugation (negating
$x$
,
$y$
, and
$z$
) corresponds to matrix
transposition.
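Since a different embedding would flip some of these signs, it is worth letting the machine do the convincing. The sketch below (typedef, constants, and helper names are all ours) encodes $4\times 4$ matrices for $i$, $j$, and $k$ with the signs placed so that Hamilton's rules hold, and the accompanying check confirms ${i}^{2}=-\mathbf{I}$ and $ij=k$:

```c
/* 4x4 matrix forms of i, j, k, with signs chosen so that Hamilton's
   rules hold.  All names here are our own for this illustration. */
typedef float Mat4[4][4];

static const Mat4 MAT_I = {{0,0,0,1},{0,0,1,0},{0,-1,0,0},{-1,0,0,0}};
static const Mat4 MAT_J = {{0,0,-1,0},{0,0,0,1},{1,0,0,0},{0,-1,0,0}};
static const Mat4 MAT_K = {{0,-1,0,0},{1,0,0,0},{0,0,0,-1},{0,0,1,0}};

/* Plain 4x4 matrix multiplication r = a * b. */
void mat4Mul(const Mat4 a, const Mat4 b, Mat4 r)
{
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j) {
            r[i][j] = 0.0f;
            for (int n = 0; n < 4; ++n)
                r[i][j] += a[i][n] * b[n][j];
        }
}
```

Multiplying `MAT_I` by `MAT_J` yields `MAT_K`, and `MAT_I` times itself yields the negated identity, just as the rules require.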
Everything we've said so far applies to quaternions of any length. Now let's get back to
rotations. We can see that the
$i$
,
$j$
,
$k$
matrices in Equation (8.11) permute
and negate axes, so they bear some similarity to
$90{}^{\mathrm{o}}$
rotations or reflections. Let's see
if we can carry the simple ideas from 2D forward with these matrices. Notice how the upper left
$2\times 2$
portion of the
$k$
matrix is the same as the very first
$2\times 2$
matrix for
$i$
; in
other words, part of
$k$
is a
$90{}^{\mathrm{o}}$
rotation about
$z$
. By analogy with the 2D case, we
might reasonably expect the quaternion
$\mathrm{cos}\theta +k\mathrm{sin}\theta $
to represent a rotation about
the
$z$
axis by an arbitrary angle
$\theta $
. Let's multiply it by the vector
$[1,0,0]$
and see
what happens. As in the 2D case, we need to “promote” the vector into the complex domain; what's
different here is that quaternions have an extra number. We'll map
$[x,y,z]$
to the complex number
$0+xi+yj+zk$
, so the vector
$[1,0,0]$
is simply
$i$
. Expanding the multiplication, we have
$$\begin{array}{rl}(\mathrm{cos}\theta +k\mathrm{sin}\theta )i& =i\mathrm{cos}\theta +ki\mathrm{sin}\theta ,\\ & =i\mathrm{cos}\theta +j\mathrm{sin}\theta ,\end{array}$$
which corresponds to
$[\mathrm{cos}\theta ,\mathrm{sin}\theta ,0]$
, exactly what we would expect when rotating
the
$x$
axis about the
$z$
axis. So far, all is good. Let's try a slightly more general vector
$[1,0,1]$
, which is represented in the complex domain as
$i+k$
:
$$\begin{array}{rl}(\mathrm{cos}\theta +k\mathrm{sin}\theta )(i+k)& =i\mathrm{cos}\theta +k\mathrm{cos}\theta +ki\mathrm{sin}\theta +{k}^{2}\mathrm{sin}\theta \\ \text{(8.13)}& & =i\mathrm{cos}\theta +j\mathrm{sin}\theta +k\mathrm{cos}\theta -\mathrm{sin}\theta .\end{array}$$
This result does not correspond to a vector at all, since it has a nonzero value for
$w$
. The
rotation in the
$xy$
plane worked as expected, but unfortunately, the
$z$
component did not come
out right. There is unwanted rotation in the
$zw$
hyperplane. This is made perfectly clear by
looking at how
$(\mathrm{cos}\theta +k\mathrm{sin}\theta )$
is represented as a
$4\times 4$
matrix:
$$\begin{array}{r}\mathrm{cos}\theta +k\mathrm{sin}\theta \equiv \left[\begin{array}{cccc}\mathrm{cos}\theta & -\mathrm{sin}\theta & 0& 0\\ \mathrm{sin}\theta & \mathrm{cos}\theta & 0& 0\\ 0& 0& \mathrm{cos}\theta & -\mathrm{sin}\theta \\ 0& 0& \mathrm{sin}\theta & \mathrm{cos}\theta \end{array}\right].\end{array}$$
The upperleft
$2\times 2$
rotation matrix is the one we want; the lowerright
$2\times 2$
rotation matrix is not wanted.
Now we are left wondering if maybe we did something wrong. Perhaps there are other
$4\times 4$
roots of
$1$
we could use for
$i$
,
$j$
, and
$k$
—alternative ways that we could embed the
quaternion set within the set of
$4\times 4$
matrices. In fact, there are other alternatives, and
this is a hint that something is a bit different from the 2D case. Unfortunately, all of these
alternatives exhibit variations of what is essentially the same behavior we are seeing here.
Perhaps, instead, our problem is that we did the multiplication in the wrong order. (After all,
multiplication of
$i$
,
$j$
, and
$k$
is not commutative.) Let's try putting the vector on the left
and the rotation quaternion on the right:
$$\begin{array}{rl}(i+k)(\mathrm{cos}\theta +k\mathrm{sin}\theta )& =i\mathrm{cos}\theta +ik\mathrm{sin}\theta +k\mathrm{cos}\theta +{k}^{2}\mathrm{sin}\theta \\ & =i\mathrm{cos}\theta -j\mathrm{sin}\theta +k\mathrm{cos}\theta -\mathrm{sin}\theta .\end{array}$$
Comparing this to Equation (8.13), when the operands were in the opposite
order, we see that the only difference is the sign of the
$y$
coordinate. At first glance, it looks
like this is actually worse. The rotation in the
$xz$
plane that we want got inverted; now we have
a rotation by
$-\theta $
. Meanwhile, the extra rotation we didn't want is exactly the same as
before. But perhaps you can already see the solution. If we use the opposite rotation, which
corresponds to using the conjugate of the quaternion, we fix both problems:
$$(i+k)(\mathrm{cos}\theta -k\mathrm{sin}\theta )=i\mathrm{cos}\theta +j\mathrm{sin}\theta +k\mathrm{cos}\theta +\mathrm{sin}\theta .$$
So, multiplying on the left by
$(\mathrm{cos}\theta +k\mathrm{sin}\theta )$
produced the rotation we wanted,
plus some extra rotation we didn't want, and multiplication on the right by the conjugate achieved
the same rotation that was desired, with the opposite undesired rotation. If we combine these two
steps, the unwanted rotation is canceled out, and we are left with only the rotation we want. Well,
not quite: we are left with twice the rotation we want, but this is easily fixed by using
$\theta /2$
instead of
$\theta $
. Of course, we knew that
$\theta /2$
would appear somewhere, but now
we see the reason. Let's summarize our findings from the preceding paragraphs.
To extend the ideas about complex numbers and rotations from 2D to quaternions, we first
convert the vector
$[x,y,z]$
to quaternion form as
$\mathbf{v}=[0,(x,y,z)]$
.
A straightforward approach to rotate the vector by the angle
$\theta $
about the
axis
$\hat{\mathbf{n}}$
would be to create the quaternion
$\mathbf{q}=[\mathrm{cos}\theta ,\ \mathrm{sin}\theta \,\hat{\mathbf{n}}]$
and then perform the multiplication
$\mathbf{q}\mathbf{v}$
. This, however,
does not work; while the result contains the rotation we want, it also contains an
unwanted rotation into
$w$
. The multiplication
$\mathbf{v}{\mathbf{q}}^{\ast}$
also
produces the rotation we want plus some unwanted rotation, but in this case
the unwanted rotation is exactly opposite of that produced by
$\mathbf{q}\mathbf{v}$
.
The solution is to use the half angle and set
$\mathbf{q}=[\mathrm{cos}(\theta /2),\ \mathrm{sin}(\theta /2)\,\hat{\mathbf{n}}]$
, and the rotation is accomplished by performing both multiplications:
$\mathbf{q}\mathbf{v}{\mathbf{q}}^{\ast}$
. The first rotation rotates halfway to the goal,
plus some unwanted rotation involving
$w$
. The second rotation completes the desired
rotation while also canceling the unwanted rotation.
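This recipe is easy to check numerically. The following Python sketch is our own illustration, not code from this book; the $(w, x, y, z)$ component order and the helper names `quat_mul` and `rotate` are our choices:

```python
# Sketch: rotating a 3D vector with q v q*, using the half-angle
# quaternion described above. Quaternions are stored as (w, x, y, z).
import math

def quat_mul(a, b):
    # Hamilton product of two quaternions.
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw*bw - ax*bx - ay*by - az*bz,
            aw*bx + ax*bw + ay*bz - az*by,
            aw*by - ax*bz + ay*bw + az*bx,
            aw*bz + ax*by - ay*bx + az*bw)

def rotate(v, axis, theta):
    # q = [cos(theta/2), sin(theta/2) n], v embedded as [0, (x, y, z)].
    s = math.sin(theta / 2)
    q = (math.cos(theta / 2), axis[0]*s, axis[1]*s, axis[2]*s)
    q_conj = (q[0], -q[1], -q[2], -q[3])
    w, x, y, z = quat_mul(quat_mul(q, (0.0,) + tuple(v)), q_conj)
    return (x, y, z)   # w comes back zero (up to roundoff)

# Rotate the x-axis 90 degrees about z: expect (0, 1, 0), up to roundoff.
print(rotate((1, 0, 0), (0, 0, 1), math.pi / 2))
```

Note that the unwanted rotation into $w$ really does cancel: the $w$ component of the result is always zero for a pure-vector input.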
Before we leave this section, let us go back and clear up one last fine point. We mentioned that
there are other ways we could embed the set of quaternions within the set of
$4\times 4$
matrices.
(Equations (8.11) and (8.12) aren't the
only way to do it.) McDonald [9] explores this idea in more
detail; here we merely want to note that this is another underlying cause of the need for
$\mathbf{q}\mathbf{v}{\mathbf{q}}^{-1}$. Using just a single multiplication, the variations in the
. Using just a single multiplication, the variations in the
embedding would produce variations in the rotated result. When both multiplications are present,
the change from one style to another produces a change on the left that is exactly canceled by the
matching change on the right.
8.5.15 Summary of Quaternions
Section 8.5 has covered a lot of math, and most of it isn't important to remember. The
facts that are important to remember about quaternions are summarized here.
A lot more has been written about quaternions than we have had the space to discuss here. The
technical report by Dam et al. [2] is a good mathematical summary.
Kuipers's book [8] is written from an aerospace perspective and also does a
good job of connecting quaternions and Euler angles. Hanson's modestly titled Visualizing
Quaternions [6] analyzes quaternions using tools from several
different disciplines (Riemannian geometry, complex numbers, Lie algebra, moving frames) and is
sprinkled with interesting engineering and mathematical lore; it also discusses how to visualize
quaternions. A shorter presentation on visualizing quaternions is given by Hart et al.
[7].
8.6 Comparison of Methods
Let's review the most important discoveries from the previous sections. Table 8.1
summarizes the differences among the four representation methods.

Rotating points between coordinate spaces (object and upright)
 Matrix: Possible; can often be highly optimized by SIMD instructions.
 Euler Angles: Impossible (must convert to rotation matrix).
 Exponential Map: Impossible (must convert to rotation matrix).

Quaternion: On a chalkboard, yes. Practically, in a computer, not really. You
might as well convert to rotation matrix.

Concatenation of multiple rotations

Matrix: Possible; can often be highly optimized by SIMD instructions.
Watch out for matrix creep.
 Euler Angles: Impossible.
 Exponential Map: Impossible.

Quaternion: Possible. Fewer scalar operations than matrix multiplication, but maybe not as easy
to take advantage of SIMD instructions.
Watch out for error creep.

Inversion of rotations
 Matrix: Easy and fast, using matrix transpose.
 Euler Angles: Not easy.
 Exponential Map: Easy and fast, using vector negation.
 Quaternion: Easy and fast, using quaternion conjugate.

Interpolation
 Matrix: Extremely problematic.
 Euler Angles: Possible, but gimbal lock causes quirkiness.
 Exponential Map: Possible, with some singularities, but not as troublesome as Euler angles.
 Quaternion: Slerp provides smooth interpolation.

Direct human interpretation
 Matrix: Difficult.
 Euler Angles: Easiest.
 Exponential Map: Very difficult.
 Quaternion: Very difficult.

Storage efficiency in memory or in a file
 Matrix: Nine numbers.
 Euler Angles: Three numbers that can be easily quantized.
 Exponential Map: Three numbers that can be easily quantized.

Quaternion: Four numbers that do not quantize well; can be reduced to three
by assuming fourth component is always nonnegative and quaternion
has unit length.

Unique representation for a given rotation
 Matrix: Yes.
 Euler Angles: No, due to aliasing.

Exponential Map: No, due to aliasing, but not as complicated as Euler angles.

Quaternion: Exactly two distinct representations for any angular displacement,
and they are negatives of each other.

Possible to become invalid
 Matrix: Six degrees of redundancy inherent in an orthogonal matrix. Matrix creep can occur.
 Euler Angles: Any three numbers can be interpreted unambiguously.

Exponential Map: Any three numbers can be interpreted unambiguously.

Quaternion: Error creep can occur.
Table 8.1 Comparison of matrices, Euler angles, exponential maps, and quaternions
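The three-number storage trick mentioned in the table's storage row can be sketched in a few lines of Python. This is our own illustration, not code from this book; the $(w, x, y, z)$ component order and the function names are assumptions:

```python
# Sketch: store a unit quaternion in three numbers by forcing w >= 0
# (negating the quaternion if needed), dropping w, and recovering it
# later from the unit-length constraint.
import math

def compress(q):
    w, x, y, z = q
    if w < 0:                       # q and -q represent the same rotation
        x, y, z = -x, -y, -z
    return (x, y, z)                # w is discarded

def decompress(xyz):
    x, y, z = xyz
    # w was nonnegative by construction; recover it from unit length.
    w = math.sqrt(max(0.0, 1.0 - (x*x + y*y + z*z)))
    return (w, x, y, z)

# Round-trip a unit quaternion with negative w.
q = (-0.5, 0.5, 0.5, 0.5)
print(decompress(compress(q)))      # (0.5, -0.5, -0.5, -0.5), i.e. -q
```

The round trip recovers $-\mathbf{q}$ rather than $\mathbf{q}$, which is fine: both represent the same angular displacement.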
Some situations are better suited for one orientation format or another. The following advice
should aid you in selecting the best format:

Euler angles are easiest for humans to work with. Using Euler angles greatly
simplifies human interaction when specifying the orientation of objects in the world.
This includes direct keyboard entry of an orientation, specifying orientations directly
in the code (e.g., positioning the camera for rendering), and examination in the
debugger. This advantage should not be underestimated. Certainly don't sacrifice ease
of use in the name of “optimization” until you are certain that it will make a
difference.

Matrix form must eventually be used if vector coordinate space transformations are
needed. However, this doesn't mean you can't store the orientation in another format
and then generate a rotation matrix when you need it. A common strategy is to store
the “main copy” of an orientation in Euler angle or quaternion form, but also to
maintain a matrix for rotations, recomputing this matrix any time the Euler angles or
quaternion change.

For storage of large numbers of orientations (e.g., animation data), Euler angles,
exponential maps, and quaternions offer various tradeoffs. In general, the
components of Euler angles and exponential maps quantize better than quaternions. It
is possible to store a rotation quaternion in only three numbers. Before discarding
the fourth component, we check its sign; if it's negative, we negate the quaternion.
Then the discarded component can be recovered by assuming the quaternion has unit
length.

Reliable quality interpolation can be accomplished only by using quaternions. Even if
you are using a different form, you can always convert to quaternions, perform the
interpolation, and then convert back to the original form. Direct interpolation using
exponential maps might be a viable alternative in some cases, as the points of
singularity are at very extreme orientations and in practice are often easily avoided.

For angular velocity or any other situation where “extra spins” need to be
represented, use the exponential map or axis-angle.
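For the interpolation advice above, here is a minimal slerp sketch. This is our own code, not this book's implementation; it assumes unit quaternions stored as $(w, x, y, z)$:

```python
# Sketch: spherical linear interpolation (slerp) between unit quaternions.
import math

def slerp(q0, q1, t):
    dot = sum(a*b for a, b in zip(q0, q1))
    if dot < 0:                      # q1 and -q1 are the same rotation;
        q1 = tuple(-c for c in q1)   # flip to take the shorter arc
        dot = -dot
    if dot > 0.9995:                 # nearly parallel: fall back to lerp
        out = tuple(a + t*(b - a) for a, b in zip(q0, q1))
        n = math.sqrt(sum(c*c for c in out))
        return tuple(c / n for c in out)
    theta = math.acos(dot)           # angle between the two quaternions
    s = math.sin(theta)
    k0 = math.sin((1 - t)*theta) / s
    k1 = math.sin(t*theta) / s
    return tuple(k0*a + k1*b for a, b in zip(q0, q1))

# Halfway between identity and a 90-degree rotation about z
# is a 45-degree rotation about z.
q0 = (1.0, 0.0, 0.0, 0.0)
q1 = (math.cos(math.pi/4), 0.0, 0.0, math.sin(math.pi/4))
print(slerp(q0, q1, 0.5))
```

The dot-product sign flip is the practical consequence of the "exactly two distinct representations" row in the table: without it, slerp may take the long way around.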
8.7 Converting between Representations
We have established that different methods of representing orientation are appropriate in
different situations, and have also provided some guidelines for choosing the most appropriate
method. This section discusses how to convert an angular displacement from one format to another.
It is divided into six subsections.
For more on converting between representation forms, see the paper by James Diebel
[3].
8.7.1 Converting Euler Angles to a Matrix
Euler angles define a sequence of three rotations. Each of these three rotations is a simple
rotation about a cardinal axis, so each is easy to convert to matrix form individually. We can
compute the matrix that defines the total angular displacement by concatenating the matrices for
each individual rotation. This exercise is carried out in numerous books and websites. If you've
ever tried to use one of these references, you may have been left wondering, “Exactly what
happens if I multiply a vector by this matrix?” The reason for your confusion is that they
forgot to mention whether the matrix rotates from object space to upright space or from upright
space to object space. In other words, there are actually two different matrices, not
just one. (Of course, they are transposes of each other, so, in a sense, there really is only
one matrix.) This section shows how to compute both.
Some readers might think that we are belaboring the point. Maybe you figured out how to use that
rotation matrix from that book or website, and now it's totally obvious to you. But we've seen
this be a stumbling block for too many programmers, so we have chosen to dwell on this point. One
common example will illustrate the confusion we've seen.
Consider a typical real-time situation with objects moving around. Assume that the orientation
of each object is maintained in Euler angle format as a state variable. One of these objects is
the camera, and naturally we use the same system of Euler angles to describe the orientation of
the camera as we do for any other object. Now, at some point we are going to need to communicate
those reference frames to the graphics API. This is where the confusion occurs: the matrix we
use to describe the orientation of the objects is not the same matrix we use to describe
the orientation of the camera! The graphics API needs two types of matrices. (We'll discuss
them in more detail in Section 10.3.1.)
The model transform is a matrix that transforms vectors from object space to world space.
The view transform transforms vectors from world space to the camera's object space. The
rotation portion of the model transform matrix is an object-to-upright matrix, but the rotation
portion of the view transform matrix is an upright-to-object matrix. So to speak of the
Euler rotation matrix leaves out some important practical details.
Now to derive the matrices. We start by deriving the object-to-upright matrix, which rotates
points from object space to upright space. We will be using the simple rotation matrices
developed in Section 5.1.2, and those were developed
using the perspective of active transformation (see
Section 3.3.1 if you don't remember the difference between
active and passive transformations). Thus, to visualize the task at hand, imagine an arbitrary
point on our object. The object starts out in the “identity” orientation, and the body
coordinates for our point, which are the coordinates we know, also happen to be the upright
coordinates at this time because the two spaces are aligned. We perform the sequence of Euler
rotations on the object, and the point moves in space until, after the third rotation, the object
has arrived in the orientation described by the Euler angles. All the while, our upright
coordinate space used to measure the coordinates remains fixed. So the final result of these
calculations is the upright coordinates for the point in its arbitrary orientation.
There's one last catch. The elementary rotation matrices that we wish to use as building blocks
each rotate about a cardinal axis. With Euler angles, the axes of rotation are the body axes,
which (after the first rotation) will be arbitrarily oriented. So instead of doing the Euler
rotations about the body axes, we do fixed-axis rotations, where the rotations are about the
upright axes. This means we actually do the rotations in the reverse order: first bank, then
pitch, and finally heading. See Section 8.3.2 if you don't remember
what fixed-axis rotations are.
In summary, the generation of the object-to-upright rotation matrix is a straightforward
concatenation of three simple rotation matrices,
$${\mathbf{M}}_{\mathrm{object}\to \mathrm{upright}}=\mathbf{B}\mathbf{P}\mathbf{H},$$
where
$\mathbf{B}$
,
$\mathbf{P}$
, and
$\mathbf{H}$
are the rotation matrices for bank, pitch, and
heading, which rotate about the
$z$
,
$x$
, and
$y$
axes, respectively. We learned how to compute
these elementary rotation matrices in Section 5.1.2.
Elementary rotation matrices for bank, pitch, and heading
$$\begin{array}{rl}\mathbf{B}={\mathbf{R}}_{z}(b)& =\left[\begin{array}{ccc}\mathrm{cos}b& \mathrm{sin}b& 0\\ -\mathrm{sin}b& \mathrm{cos}b& 0\\ 0& 0& 1\end{array}\right],\\ \mathbf{P}={\mathbf{R}}_{x}(p)& =\left[\begin{array}{ccc}1& 0& 0\\ 0& \mathrm{cos}p& \mathrm{sin}p\\ 0& -\mathrm{sin}p& \mathrm{cos}p\end{array}\right],\\ \mathbf{H}={\mathbf{R}}_{y}(h)& =\left[\begin{array}{ccc}\mathrm{cos}h& 0& -\mathrm{sin}h\\ 0& 1& 0\\ \mathrm{sin}h& 0& \mathrm{cos}h\end{array}\right].\end{array}$$
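The concatenation of these three matrices can be sketched numerically. This is our own illustration, not code from this book; the function names are assumptions, and the row-vector convention of Section 5.1.2 (a row vector v transforms as vM) is assumed:

```python
# Sketch: building the object-to-upright matrix M = B P H from
# Euler angles (h, p, b), row-vector convention.
import math

def mat_mul(a, b):
    # 3x3 matrix product.
    return [[sum(a[i][k]*b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def object_to_upright(h, p, b):
    ch, sh = math.cos(h), math.sin(h)
    cp, sp = math.cos(p), math.sin(p)
    cb, sb = math.cos(b), math.sin(b)
    B = [[cb, sb, 0], [-sb, cb, 0], [0, 0, 1]]     # bank about z
    P = [[1, 0, 0], [0, cp, sp], [0, -sp, cp]]     # pitch about x
    H = [[ch, 0, -sh], [0, 1, 0], [sh, 0, ch]]     # heading about y
    return mat_mul(mat_mul(B, P), H)

# Heading of 90 degrees only: object-space +x ends up at
# upright (0, 0, -1), up to roundoff.
M = object_to_upright(math.pi/2, 0.0, 0.0)
v = (1.0, 0.0, 0.0)
print([sum(v[i]*M[i][j] for i in range(3)) for j in range(3)])
```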
Putting it all together (and leaving out the messy math to actually do the matrix
multiplications), we have
Object-to-upright rotation matrix from Euler angles