Sightations

Q & A: Recovering pose of a calibrated camera - Algebraic vs. Geometric method?

2015-03-29T00:00:00-07:00

This week I received an email with a question about recovering camera pose:

Q: I have images with a known intrinsic matrix, and corresponding points in world and image coordinates. What's the best technique to resolve the extrinsic matrix? Hartley and Zisserman cover geometric and algebraic approaches. What are the tradeoffs between the geometric and algebraic approaches? Under what applications would we choose one or the other?

This topic is covered in Section 7.3 of Multiple View Geometry in Computer Vision, "Restricted camera estimation." The authors describe a method for estimating a subset of camera parameters when the others are known beforehand. One common scenario is recovering pose (position and orientation) given intrinsic parameters.

Assume you have multiple 2D image points whose corresponding 3D positions are known. The authors outline two different error functions for the camera: a geometric error function, which measures the distance between the 3D point's projection and the 2D observation, and an algebraic error function, which is the residual of a homogeneous least-squares problem (constructed in Section 7.1). The choice of error function can be seen as a trade-off between quality and speed. First I will describe why the geometric solution is better for quality and then why the algebraic solution is faster.

Let $X_i$ be a 3D point and $x_i$ be its observation. The plane $w$ contains $X_i$ and is parallel to the image plane. The algebraic error is $\Delta$, the distance between $X_i$ and the backprojection ray in the plane $w$. The geometric error $d$ is the distance between $x_i$ and the projection of $X_i$ onto the image plane, $f$. Note that as the 3D point moves farther from the camera, the algebraic error increases, while the geometric error remains constant.

The geometric solution is generally considered the "right" solution, in the sense that the assumptions about noise are the most sensible in the majority of cases. Penalizing the squared distance between the 2D observation and the projection of the 3D point amounts to assuming noise arises from the imaging process (e.g. due to camera/lens/sensor imperfections) and is i.i.d. Gaussian distributed in the image plane. In contrast, roughly speaking, the algebraic error measures the distance between the known 3D point and the observation’s backprojection ray. This implies errors arise from noise in 3D points as opposed to the camera itself, and tends to overemphasize distant points when finding a solution. For this reason, Hartley and Zisserman call the solution with minimal geometric error the "gold standard" solution.
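To make the depth dependence concrete, here is a minimal sketch that evaluates both errors for two points that project to the same pixel at different depths. The camera values and function names are made up for illustration, and the algebraic residual is written in one common form (the homogeneous equations before dividing by depth), which assumes the third row of the camera matrix is normalized as it is for $K[R \mid t]$.

```python
import math

def project(K, R, t, X):
    """Return the homogeneous image point K (R X + t) as [x', y', w']."""
    cam = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    return [sum(K[i][j] * cam[j] for j in range(3)) for i in range(3)]

def geometric_error(K, R, t, X, obs):
    """Pixel distance between the observation and the projection of X."""
    xh, yh, wh = project(K, R, t, X)
    return math.hypot(xh / wh - obs[0], yh / wh - obs[1])

def algebraic_error(K, R, t, X, obs):
    """Residual of the homogeneous equations, i.e. without dividing by depth."""
    xh, yh, wh = project(K, R, t, X)
    return math.hypot(xh - obs[0] * wh, yh - obs[1] * wh)

# Hypothetical camera: 100-pixel focal length, identity pose.
K = [[100.0, 0.0, 0.0], [0.0, 100.0, 0.0], [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.0]
obs = (11.0, 0.0)           # observation, one pixel away from both projections

near_pt = [0.2, 0.0, 2.0]   # projects to pixel (10, 0) at depth 2
far_pt = [1.0, 0.0, 10.0]   # projects to pixel (10, 0) at depth 10

for X in (near_pt, far_pt):
    print(geometric_error(K, R, t, X, obs), algebraic_error(K, R, t, X, obs))
```

In this setup the geometric error is the same one pixel for both points, while the algebraic residual scales with the point's depth, which is why distant points dominate an algebraic fit.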

The geometric approach also has the advantage of letting you use different cost functions if necessary. For example, if your correspondences include outliers, they could wreak havoc on your calibration under a squared-error cost function. Using the geometric approach, you could swap in a robust cost function (e.g. the Huber function), which will reduce the influence of outliers.
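As a rough illustration of why a robust cost helps, the sketch below compares the squared-error cost with the Huber cost on a handful of made-up reprojection residuals, one of which is an outlier. The threshold `delta` and the residual values are assumptions chosen only for the example.

```python
def squared_cost(r):
    """Standard least-squares cost for a residual of r pixels."""
    return 0.5 * r * r

def huber_cost(r, delta=1.0):
    """Huber cost: quadratic for |r| <= delta, linear beyond it."""
    a = abs(r)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)

# Made-up reprojection residuals in pixels; the last one is an outlier.
residuals = [0.3, -0.5, 0.8, 40.0]

print(sum(squared_cost(r) for r in residuals))
print(sum(huber_cost(r) for r in residuals))
```

Under the squared cost the single outlier accounts for almost the entire total, whereas under the Huber cost its contribution grows only linearly.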

The cost of doing the "right thing" is running time. Both solutions require costly iterative minimization, but the time needed to evaluate the geometric cost function grows linearly with the number of observations, whereas the algebraic cost function can be evaluated in constant time (after an SVD operation in preprocessing). In Hartley and Zisserman's example, the two approaches give very similar results.

If speed isn't a concern (e.g. if calibration is performed off-line), the geometric solution is the way to go. The geometric approach may also be easier to implement: just take an existing bundle adjustment routine like the one provided by Ceres Solver, and hold the 3D points and intrinsic parameters fixed. Also, if the number of observations is small, the algebraic approach loses its advantages, because the SVD required for preprocessing could eclipse the gains of its efficient cost function. So the geometric solution could be preferable, even in real-time scenarios.

If speed is a concern and you have many observations, a two-pass approach might work well. First solve using the algebraic technique, then use it to initialize a few iterations of the geometric approach. Your mileage may vary. Finally, if you are recovering multiple poses of a moving camera, you will likely want to run bundle adjustment as a final step anyway, which jointly minimizes the geometric error of all camera poses and the 3D point locations. In this case, the algebraic solution is almost certainly a "good enough" first pass.

I hope that helps!

Compiling ELSD (Ellipse and Line Segment Detector) on OS X

2014-04-28T00:00:00-07:00

Input image

ELSD results

ELSD is a new program for detecting line segments and elliptical curves in images. It gives very impressive results by using a novel model selection criterion to distinguish noise curves from foreground, as detailed in the author's ECCV 2012 paper. Most impressive, it works out of the box with no parameter tuning.

The authors have generously released their code under Affero GPL, but it requires a few tweaks to compile on OS X.

First, in process_curve.c, replace this line:

#include <clapack.h>

with this:

#ifdef __APPLE__
#include <Accelerate/Accelerate.h>
#else
#include <clapack.h>
#endif

Second, in makefile, change this line

cc -o elsd elsd.c valid_curve.c process_curve.c process_line.c write_svg.c -llapack_LINUX -lblas_LINUX -llibf2c -lm

to this:

cc -o elsd -framework Accelerate elsd.c valid_curve.c process_curve.c process_line.c write_svg.c -lf2c -lm

Thanks to authors Viorica Pătrăucean, Pierre Gurdjos, and Rafael Grompone von Gioi for sharing this valuable new tool!

Update: I've written a Python script to convert ELSD's output into polylines; check out the code page.

Dissecting the Camera Matrix, Part 3: The Intrinsic Matrix

2013-08-13T00:00:00-07:00

Credit: Dave6163 (via Flickr)

Today we'll study the intrinsic camera matrix in our third and final chapter in the trilogy "Dissecting the Camera Matrix." In the first article, we learned how to split the full camera matrix into the intrinsic and extrinsic matrices and how to properly handle ambiguities that arise in that process. The second article examined the extrinsic matrix in greater detail, looking into several different interpretations of its 3D rotations and translations. Today we'll give the same treatment to the intrinsic matrix, examining two equivalent interpretations: as a description of the virtual camera's geometry and as a sequence of simple 2D transformations. Afterward, you'll see an interactive demo illustrating both interpretations.

If you're not interested in delving into the theory and just want to use your intrinsic matrix with OpenGL, check out the articles Calibrated Cameras in OpenGL without glFrustum and Calibrated Cameras and gluPerspective.

All of these articles are part of the series "The Perspective Camera, an Interactive Tour." To read the other entries in the series, head over to the table of contents.

The Pinhole Camera

The intrinsic matrix transforms 3D camera coordinates to 2D homogeneous image coordinates. This perspective projection is modeled by the ideal pinhole camera, illustrated below.

The intrinsic matrix is parameterized by Hartley and Zisserman as

\[ K = \left ( \begin{array}{ c c c} f_x & s & x_0 \\ 0 & f_y & y_0 \\ 0 & 0 & 1 \\ \end{array} \right ) \]
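As a quick sketch of what this matrix does, the snippet below multiplies a camera-space point by $K$ and divides by the homogeneous coordinate to get pixel coordinates. The intrinsic values and the function name are made up for illustration, and the camera is assumed to look down the positive z-axis (the Hartley and Zisserman convention).

```python
def project_with_K(K, point_cam):
    """Map a 3D point in camera coordinates to pixel coordinates."""
    x, y, w = (sum(K[i][j] * point_cam[j] for j in range(3)) for i in range(3))
    return (x / w, y / w)   # divide out the homogeneous coordinate

# Hypothetical intrinsics in pixels: f_x, f_y, skew s, principal point (x_0, y_0).
fx, fy, s, x0, y0 = 800.0, 820.0, 0.0, 320.0, 240.0
K = [[fx, s, x0],
     [0.0, fy, y0],
     [0.0, 0.0, 1.0]]

print(project_with_K(K, [0.0, 0.0, 2.0]))     # a point on the principal axis
print(project_with_K(K, [0.5, -0.25, 2.0]))   # a point off the axis
```

A point on the principal axis lands exactly on the principal point $(x_0, y_0)$, regardless of its depth.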

Each intrinsic parameter describes a geometric property of the camera. Let's examine each of these properties in detail.

Focal Length, $f_x$, $f_y$

The focal length is the distance between the pinhole and the film (a.k.a. image plane). For reasons we'll discuss later, the focal length is measured in pixels. In a true pinhole camera, both $f_x$ and $f_y$ have the same value, which is illustrated as $f$ below.

In practice, $f_x$ and $f_y$ can differ for a number of reasons:

Flaws in the digital camera sensor.
The image has been non-uniformly scaled in post-processing.
The camera's lens introduces unintentional distortion.
The camera uses an anamorphic format, where the lens compresses a widescreen scene into a standard-sized sensor.
Errors in camera calibration.

In all of these cases, the resulting image has non-square pixels.

Having two different focal lengths isn't terribly intuitive, so some texts (e.g. Forsyth and Ponce) use a single focal length and an "aspect ratio" that describes the amount of deviation from a perfectly square pixel. Such a parameterization nicely separates the camera geometry (i.e. focal length) from distortion (aspect ratio).

Principal Point Offset, $x_0$, $y_0$

The camera's "principal axis" is the line perpendicular to the image plane that passes through the pinhole. Its intersection with the image plane is referred to as the "principal point," illustrated below.

The "principal point offset" is the location of the principal point relative to the film's origin. The exact definition depends on which convention is used for the location of the origin; the illustration below assumes it's at the bottom-left of the film.

Increasing $x_0$ shifts the pinhole to the right:

This is equivalent to shifting the film to the left and leaving the pinhole unchanged.

Notice that the box surrounding the camera is irrelevant; only the pinhole's position relative to the film matters.

Axis Skew, $s$

Axis skew causes shear distortion in the projected image. As far as I know, there isn't any analogue to axis skew in a true pinhole camera, but apparently some digitization processes can cause nonzero skew. We'll examine skew more later.

Other Geometric Properties

The focal length and principal point offset amount to simple translations of the film relative to the pinhole. There must be other ways to transform the camera, right? What about rotating or scaling the film?

Rotating the film around the pinhole is equivalent to rotating the camera itself, which is handled by the extrinsic matrix. Rotating the film around any other fixed point $x$ is equivalent to rotating around the pinhole $P$, then translating by $(x-P)$.

What about scaling? It should be obvious that doubling all camera dimensions (film size and focal length) has no effect on the captured scene. If instead, you double the film size and not the focal length, it is equivalent to doubling both (a no-op) and then halving the focal length. Thus, representing the film's scale explicitly would be redundant; it is captured by the focal length.

Focal Length - From Pixels to World Units

This discussion of camera-scaling shows that there are an infinite number of pinhole cameras that produce the same image. The intrinsic matrix is only concerned with the relationship between camera coordinates and image coordinates, so the absolute camera dimensions are irrelevant. Using pixel units for focal length and principal point offset allows us to represent the relative dimensions of the camera, namely, the film's position relative to its size in pixels.

Another way to say this is that the intrinsic camera transformation is invariant to uniform scaling of the camera geometry. By representing dimensions in pixel units, we naturally capture this invariance.

You can use similar triangles to convert pixel units to world units (e.g. mm) if you know at least one camera dimension in world units. For example, if you know the camera's film (or digital sensor) has a width $W$ in millimeters, and the image width in pixels is $w$, you can convert the focal length $f_x$ to world units using:

\[ F_x = f_x \frac{W}{w} \]

The other parameters $f_y$, $x_0$, and $y_0$ can be converted to their world-unit counterparts $F_y$, $X_0$, and $Y_0$ using similar equations, where $H$ is the film height in millimeters and $h$ is the image height in pixels:

\[ \begin{array}{ccc} F_y = f_y \frac{H}{h} \qquad X_0 = x_0 \frac{W}{w} \qquad Y_0 = y_0 \frac{H}{h} \end{array} \]
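Here is a small worked example of these conversions. The sensor size, image size, and focal lengths are hypothetical numbers picked to keep the arithmetic easy to follow, and the helper name is my own.

```python
def pixels_to_world(value_px, sensor_size_mm, image_size_px):
    """Convert a length in pixels to millimeters along one image axis."""
    return value_px * sensor_size_mm / image_size_px

# Hypothetical camera: 36 mm x 24 mm sensor producing a 6000 x 4000 pixel image.
W_mm, H_mm = 36.0, 24.0
w_px, h_px = 6000, 4000
fx_px, fy_px = 8000.0, 8000.0

F_x = pixels_to_world(fx_px, W_mm, w_px)   # F_x = f_x * W / w
F_y = pixels_to_world(fy_px, H_mm, h_px)   # F_y = f_y * H / h
print(F_x, F_y)
```

Both axes give the same focal length in millimeters here because the made-up sensor has square pixels (36/6000 = 24/4000 mm per pixel).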

The Camera Frustum - A Pinhole Camera Made Simple

As we discussed earlier, only the arrangement of the pinhole and the film matter, so the physical box surrounding the camera is irrelevant. For this reason, many discussions of camera geometry use a simpler visual representation: the camera frustum.

The camera's viewable region is pyramid shaped, and is sometimes called the "visibility cone." Let's add some 3D spheres to our scene and show how they fall within the visibility cone and create an image.

Since the camera's "box" is irrelevant, let's remove it. Also, note that the film's image depicts a mirrored version of reality. To fix this, we'll use a "virtual image" instead of the film itself. The virtual image has the same properties as the film image, but unlike the true image, the virtual image appears in front of the camera, and the projected image is unflipped.

Note that the position and size of the virtual image plane is arbitrary — we could have doubled its size as long as we also doubled its distance from the pinhole.

After removing the true image we're left with the "viewing frustum" representation of our pinhole camera.

The pinhole has been replaced by the tip of the visibility cone, and the film is now represented by the virtual image plane. We'll use this representation for our demo later.

Intrinsic parameters as 2D transformations

In the previous sections, we interpreted our incoming 3-vectors as 3D camera coordinates, which are transformed to homogeneous 2D image coordinates. Alternatively, we can interpret these 3-vectors as 2D homogeneous coordinates which are transformed to a new set of 2D points. This gives us a new view of the intrinsic matrix: a sequence of 2D affine transformations.

We can decompose the intrinsic matrix into a sequence of shear, scaling, and translation transformations, corresponding to axis skew, focal length, and principal point offset, respectively:

\[ \begin{align} K &= \left ( \begin{array}{ c c c} f_x & s & x_0 \\ 0 & f_y & y_0 \\ 0 & 0 & 1 \\ \end{array} \right ) \\[0.5em] &= \underbrace{ \left ( \begin{array}{ c c c} 1 & 0 & x_0 \\ 0 & 1 & y_0 \\ 0 & 0 & 1 \end{array} \right ) }_\text{2D Translation} \times \underbrace{ \left ( \begin{array}{ c c c} f_x & 0 & 0 \\ 0 & f_y & 0 \\ 0 & 0 & 1 \end{array} \right ) }_\text{2D Scaling} \times \underbrace{ \left ( \begin{array}{ c c c} 1 & s/f_x & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array} \right ) }_\text{2D Shear} \end{align} \]

An equivalent decomposition places shear after scaling:

\[ \begin{align} K &= \underbrace{ \left ( \begin{array}{ c c c} 1 & 0 & x_0 \\ 0 & 1 & y_0 \\ 0 & 0 & 1 \end{array} \right ) }_\text{2D Translation} \times \underbrace{ \left ( \begin{array}{ c c c} 1 & s/f_y & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array} \right ) }_\text{2D Shear} \times \underbrace{ \left ( \begin{array}{ c c c} f_x & 0 & 0 \\ 0 & f_y & 0 \\ 0 & 0 & 1 \end{array} \right ) }_\text{2D Scaling} \end{align} \]
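Both orderings are easy to check numerically. The sketch below multiplies out the two decompositions for one made-up set of intrinsic parameters and compares each product against $K$; the helper functions and variable names are my own.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def close(A, B, tol=1e-9):
    """True if two matrices agree element-wise within tol."""
    return all(abs(a - b) <= tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))

fx, fy, s, x0, y0 = 800.0, 820.0, 2.0, 320.0, 240.0   # hypothetical values

K = [[fx, s, x0], [0.0, fy, y0], [0.0, 0.0, 1.0]]
translation = [[1.0, 0.0, x0], [0.0, 1.0, y0], [0.0, 0.0, 1.0]]
scaling = [[fx, 0.0, 0.0], [0.0, fy, 0.0], [0.0, 0.0, 1.0]]
shear_before_scaling = [[1.0, s / fx, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
shear_after_scaling = [[1.0, s / fy, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Translation x Scaling x Shear: the shear (s / f_x) acts on the point first.
first = matmul(translation, matmul(scaling, shear_before_scaling))
# Translation x Shear x Scaling: the shear (s / f_y) acts after scaling.
second = matmul(translation, matmul(shear_after_scaling, scaling))

print(close(first, K), close(second, K))
```

The only difference between the two orderings is the shear coefficient, which is divided by $f_x$ when the shear acts before scaling and by $f_y$ when it acts after.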

This interpretation nicely separates the extrinsic and intrinsic parameters into the realms of 3D and 2D, respectively. It also emphasizes that the intrinsic camera transformation occurs post-projection. One notable result of this is that intrinsic parameters cannot affect visibility: occluded objects cannot be revealed by simple 2D transformations in image space.

Demo

The demo below illustrates both interpretations of the intrinsic matrix. On the left is the "camera-geometry" interpretation. Notice how the pinhole moves relative to the image plane as $x_0$ and $y_0$ are adjusted.

On the right is the "2D transformation" interpretation. Notice how changing the focal length causes the projected image to be scaled, and changing the principal point results in pure translation.


Dissecting the Camera Matrix, A Summary

Over the course of this series of articles we've seen how to decompose

the full camera matrix into intrinsic and extrinsic matrices,
the extrinsic matrix into 3D rotation followed by translation, and
the intrinsic matrix into three basic 2D transformations.

We summarize this full decomposition below.

\[ \begin{align} P &= \overbrace{K}^\text{Intrinsic Matrix} \times \overbrace{[R \mid \mathbf{t}]}^\text{Extrinsic Matrix} \\[0.5em] &= \overbrace{ \underbrace{ \left ( \begin{array}{ c c c} 1 & 0 & x_0 \\ 0 & 1 & y_0 \\ 0 & 0 & 1 \end{array} \right ) }_\text{2D Translation} \times \underbrace{ \left ( \begin{array}{ c c c} f_x & 0 & 0 \\ 0 & f_y & 0 \\ 0 & 0 & 1 \end{array} \right ) }_\text{2D Scaling} \times \underbrace{ \left ( \begin{array}{ c c c} 1 & s/f_x & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array} \right ) }_\text{2D Shear} }^\text{Intrinsic Matrix} \times \overbrace{ \underbrace{ \left( \begin{array}{c | c} I & \mathbf{t} \end{array}\right) }_\text{3D Translation} \times \underbrace{ \left( \begin{array}{c | c} R & 0 \\ \hline 0 & 1 \end{array}\right) }_\text{3D Rotation} }^\text{Extrinsic Matrix} \end{align} \]

To see all of these transformations in action, head over to my Perspective Camera Toy page for an interactive demo of the full perspective camera.

Do you have other ways of interpreting the intrinsic camera matrix? Leave a comment or drop me a line!

Next time, we'll show how to prepare your calibrated camera to generate stereo image pairs. See you then!

Calibrated Cameras and gluPerspective

2013-06-18T00:00:00-07:00

After posting my last article relating glFrustum to the intrinsic camera matrix, I received some emails asking how the (now deprecated) gluPerspective function relates to the intrinsic matrix. We can show a similar result with gluPerspective as we did with glFrustum, namely that it is the product of a glOrtho matrix and a (modified) intrinsic camera matrix, but in this case the intrinsic matrix has different constraints. I'll be re-using notation and concepts from the previous article, so if you aren't familiar with them, I recommend reading it first.

Decomposing gluPerspective

The matrix generated by gluPerspective is

\[ \begin{align} \left ( \begin{array}{cccc} \frac{f}{\text{aspect}} & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & C' & D' \\ 0 & 0 & -1 & 0 \end{array} \right ) \end{align} \]

where

\[ \begin{align} f &= \cot(fovy/2) \\ C' &= -\frac{far + near}{far - near} \\ D' &= -\frac{2 \; far \; near}{far - near} \\ \end{align} \]

Like with glFrustum, gluPerspective permits no axis skew, but it also restricts the viewing volume to be centered around the camera's principal (viewing) axis. This means that the principal point offsets $x_0$ and $y_0$ must be zero, and the matrix generated by glOrtho must be centered, i.e. bottom = -top and left = -right. The Persp matrix corresponding to the intrinsic matrix is:

\[ Persp = \left( \begin{array}{cccc} \alpha & 0 & 0 & 0 \\ 0 & \beta & 0 & 0 \\ 0 & 0 & A & B \\ 0 & 0 & -1 & 0 \end{array} \right) \]

where

\[ \begin{align} A &= near + far \\ B &= near * far \end{align} \]

and the NDC matrix is

\[ \begin{align} NDC &= \left( \begin{array}{cccc} \frac{2}{right - left} & 0 & 0 & t_x \\ 0 & \frac{2}{top - bottom} & 0 & t_y \\ 0 & 0 & -\frac{2}{far - near} & t_z \\ 0 & 0 & 0 & 1 \end{array} \right) \\[1.5em] &= \left( \begin{array}{cccc} \frac{2}{width} & 0 & 0 & 0 \\ 0 & \frac{2}{height} & 0 & 0 \\ 0 & 0 & -\frac{2}{far - near} & t_z \\ 0 & 0 & 0 & 1 \end{array} \right) \end{align} \]

where

\[ \begin{align} t_x &= -\frac{right + left}{right - left} \\ t_y &= -\frac{top + bottom}{top - bottom} \\ t_z &= -\frac{far + near}{far - near} \end{align} \]

It is easy to show that the product $(NDC \times Persp)$ is equivalent to the matrix generated by gluPerspective(fovy, aspect, near, far) with

\[ \begin{align} \text{fovy} &= 2 \text{arctan}\left (\frac{\text{height}}{2 \beta} \right ) \\ \text{aspect} &= \frac{\beta}{\alpha} \frac{\text{width}}{\text{height}}. \end{align} \]
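If you'd rather check this numerically than by hand, here is a minimal sketch. It builds the gluPerspective matrix from the formula above (with fovy in degrees, as the function expects), builds the centered $NDC \times Persp$ product, and compares the two. All numeric values and helper names are made up for illustration.

```python
import math

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def glu_perspective(fovy_deg, aspect, near, far):
    """The matrix given above for gluPerspective (fovy in degrees)."""
    f = 1.0 / math.tan(math.radians(fovy_deg) / 2.0)
    return [[f / aspect, 0.0, 0.0, 0.0],
            [0.0, f, 0.0, 0.0],
            [0.0, 0.0, -(far + near) / (far - near), -2.0 * far * near / (far - near)],
            [0.0, 0.0, -1.0, 0.0]]

def ndc_times_persp(alpha, beta, width, height, near, far):
    """Centered NDC matrix times the zero-skew, zero-offset Persp matrix."""
    persp = [[alpha, 0.0, 0.0, 0.0],
             [0.0, beta, 0.0, 0.0],
             [0.0, 0.0, near + far, near * far],
             [0.0, 0.0, -1.0, 0.0]]
    ndc = [[2.0 / width, 0.0, 0.0, 0.0],
           [0.0, 2.0 / height, 0.0, 0.0],
           [0.0, 0.0, -2.0 / (far - near), -(far + near) / (far - near)],
           [0.0, 0.0, 0.0, 1.0]]
    return matmul(ndc, persp)

# Hypothetical calibration values (focal lengths and image size in pixels).
alpha, beta, width, height, near, far = 500.0, 520.0, 640.0, 480.0, 0.1, 100.0

fovy = math.degrees(2.0 * math.atan(height / (2.0 * beta)))
aspect = (beta / alpha) * (width / height)

ours = ndc_times_persp(alpha, beta, width, height, near, far)
theirs = glu_perspective(fovy, aspect, near, far)
max_diff = max(abs(a - b) for ra, rb in zip(ours, theirs) for a, b in zip(ra, rb))
print(max_diff)
```

The largest element-wise difference should be at the level of floating-point round-off.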

glFrustum vs. gluPerspective

In my experience, the zero-skew assumption is usually reasonable, so glFrustum can provide a decent approximation to the full intrinsic matrix. However, there is quite often a non-negligible principal point offset (~ 2% of the image size), even in high-quality cameras, and gluPerspective cannot represent it. For this reason, gluPerspective might be a good choice for quick-and-dirty demos, but for the most accurate simulation, you should use the full camera matrix like I described previously.

Calibrated Cameras in OpenGL without glFrustum

2013-06-03T00:00:00-07:00

Simulating a calibrated camera for augmented reality.

Credit: thp4

You've calibrated your camera. You've decomposed it into intrinsic and extrinsic camera matrices. Now you need to use it to render a synthetic scene in OpenGL. You know the extrinsic matrix corresponds to the modelview matrix and the intrinsic is the projection matrix, but beyond that you're stumped. You remember something about gluPerspective, but it only permits two degrees of freedom, and your intrinsic camera matrix has five. glFrustum looks promising, but the mapping between its parameters and the camera matrix isn't obvious and it looks like you'll have to ignore your camera's axis skew. You may be asking yourself, "I have a matrix, why can't I just use it?"

You can. And you don't have to jettison your axis skew, either. In this article, I'll show how to use your intrinsic camera matrix in OpenGL with minimal modification. For illustration, I'll use OpenGL 2.1 API calls, but the same matrices can be sent to your shaders in modern OpenGL.

glFrustum: Two Transforms in One

To better understand perspective projection in OpenGL, let's examine glFrustum. According to the OpenGL documentation,

glFrustum describes a perspective matrix that produces a perspective projection.

While this is true, it only tells half of the story.

In reality, glFrustum does two things: first it performs perspective projection, and then it converts to normalized device coordinates (NDC). The former is a common operation in projective geometry, while the latter is OpenGL arcana, an implementation detail.

To give us finer-grained control over these operations, we'll separate the projection matrix into two matrices, Persp and NDC:

\[ Proj = NDC \times Persp \]

Our intrinsic camera matrix describes a perspective projection, so it will be the key to the Persp matrix. For the NDC matrix, we'll (ab)use OpenGL's glOrtho routine.

Step 1: Projective Transform

Our 3x3 intrinsic camera matrix K needs two modifications before it's ready to use in OpenGL. First, for proper clipping, the (3,3) element of K must be -1. OpenGL's camera looks down the negative z-axis, so if $K_{33}$ is positive, vertices in front of the camera will have a negative w coordinate after projection. In principle, this is okay, but because of how OpenGL performs clipping, all of these points will be clipped.

If $K_{33}$ isn't -1, your intrinsic and extrinsic matrices need some modifications. Getting the camera decomposition right isn't trivial, so I'll refer the reader to my earlier article on camera decomposition, which will walk you through the steps. Part of the result will be the negation of the third column of the intrinsic matrix, so you'll see those elements negated below.

\[ K = \left( \begin{array}{ccc} \alpha & s & -x_0 \\ 0 & \beta & -y_0 \\ 0 & 0 & -1 \end{array} \right) \]

For the second modification, we need to prevent losing Z-depth information, so we'll add an extra row and column to the intrinsic matrix.

\[ Persp = \left( \begin{array}{cccc} \alpha & s & -x_0 & 0 \\ 0 & \beta & -y_0 & 0 \\ 0 & 0 & A & B \\ 0 & 0 & -1 & 0 \end{array} \right) \]

where

\[ \begin{align} A &= near + far \\ B &= near * far \end{align} \]

The new third row preserves the ordering of Z-values while mapping -near and -far onto themselves (after normalizing by w, proof left as an exercise). The result is that points between the clipping planes remain between clipping planes after multiplication by Persp.
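For readers who want to check the exercise, here is one way to work it, writing $z'$ for the depth after multiplying by Persp and normalizing by $w = -z$:

\[ \begin{align} z' &= \frac{A z + B}{-z} = -(near + far) - \frac{near \cdot far}{z} \\[0.5em] z' \big|_{z = -near} &= -(near + far) + far = -near \\ z' \big|_{z = -far} &= -(near + far) + near = -far \\[0.5em] \frac{dz'}{dz} &= \frac{near \cdot far}{z^2} > 0 \end{align} \]

The positive derivative means $z'$ increases with $z$ everywhere in front of the camera (assuming near and far are both positive), so the depth ordering is preserved.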

Step 2: Transform to NDC

The NDC matrix is (perhaps surprisingly) provided by glOrtho. The Persp matrix converts a frustum-shaped space into a cuboid-shaped space, while glOrtho converts the cuboid space to normalized device coordinates. A call to glOrtho(left, right, bottom, top, near, far) constructs the matrix:

\[ \text{glOrtho} = \left( \begin{array}{cccc} \frac{2}{right - left} & 0 & 0 & t_x \\ 0 & \frac{2}{top - bottom} & 0 & t_y \\ 0 & 0 & -\frac{2}{far - near} & t_z \\ 0 & 0 & 0 & 1 \end{array} \right) \]

where

\[ \begin{align} t_x &= -\frac{right + left}{right - left} \\ t_y &= -\frac{top + bottom}{top - bottom} \\ t_z &= -\frac{far + near}{far - near} \end{align} \]

When calling glOrtho, the near and far parameters should be the same as those used to compute A and B above. The choice of top, bottom, left, and right clipping planes corresponds to the dimensions of the original image and the coordinate conventions used during calibration. For example, if your camera was calibrated from an image with dimensions $W \times H$ and its origin at the top-left, your OpenGL 2.1 code would be

glMatrixMode(GL_PROJECTION);
glLoadIdentity();
glOrtho(0, W, H, 0, near, far);
glMultMatrixd(persp);  /* persp: 16 GLdoubles in column-major order */

Note that H is used as the "bottom" parameter and 0 is the "top," indicating a y-downward axis convention.

If you calibrated using a coordinate system with the y-axis pointing upward and the origin at the center of the image,

glMatrixMode(GL_PROJECTION);
glLoadIdentity();
glOrtho(-W/2, W/2, -H/2, H/2, near, far);
glMultMatrixd(persp);  /* persp: 16 GLdoubles in column-major order */
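If you are sending the matrix to a shader instead of using the fixed-function calls, the same product can be assembled by hand. The sketch below does this for the top-left-origin case and pushes two camera-space points through it; the calibration numbers and helper names are hypothetical. Note that the matrices here are stored as lists of rows, so they would need to be flattened in column-major order (or uploaded with the transpose flag set) before OpenGL sees them.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def persp_matrix(alpha, beta, s, x0, y0, near, far):
    """The 4x4 Persp matrix from Step 1 (third column of K already negated)."""
    return [[alpha, s, -x0, 0.0],
            [0.0, beta, -y0, 0.0],
            [0.0, 0.0, near + far, near * far],
            [0.0, 0.0, -1.0, 0.0]]

def ortho_matrix(left, right, bottom, top, near, far):
    """The matrix constructed by glOrtho(left, right, bottom, top, near, far)."""
    return [[2.0 / (right - left), 0.0, 0.0, -(right + left) / (right - left)],
            [0.0, 2.0 / (top - bottom), 0.0, -(top + bottom) / (top - bottom)],
            [0.0, 0.0, -2.0 / (far - near), -(far + near) / (far - near)],
            [0.0, 0.0, 0.0, 1.0]]

def to_ndc(proj, p):
    """Apply a projection matrix to a homogeneous point and divide by w."""
    clip = [sum(proj[i][j] * p[j] for j in range(4)) for i in range(4)]
    return [c / clip[3] for c in clip[:3]]

# Hypothetical calibration: 640 x 480 image, origin at top-left.
W, H, near, far = 640.0, 480.0, 0.1, 100.0
proj = matmul(ortho_matrix(0.0, W, H, 0.0, near, far),
              persp_matrix(500.0, 500.0, 0.0, 320.0, 240.0, near, far))

print(to_ndc(proj, [0.0, 0.0, -1.0, 1.0]))       # point on the principal axis
print(to_ndc(proj, [-0.64, -0.48, -1.0, 1.0]))   # point that projects to pixel (0, 0)
```

With these numbers the principal point sits at the image center, so the on-axis point lands at the center of NDC space in x and y, and the point projecting to pixel (0, 0) lands on the left and top edges.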

Note that there is a strong relationship between the glOrtho parameters and the perspective matrix. For example, shifting the viewing volume left by X is equivalent to shifting the principal point right by X. Doubling $\alpha$ is equivalent to dividing left and right by two. This is the same relationship that exists in a pinhole camera between the camera's geometry and the geometry of its film--shifting the pinhole right is equivalent to shifting the film left; doubling the focal length is equivalent to halving the dimensions of the film. Clearly the two-matrix representation of projection is redundant, but keeping these matrices separate allows us to maintain the logical separation between the camera geometry and the image geometry.

Equivalence to glFrustum

We can show that the two-matrix approach above reduces to a single call to glFrustum when $\alpha$ and $\beta$ are set to near and $s$, $x_0$ and $y_0$ are zero. The resulting matrix is:

\[ \begin{align} Proj &= NDC * Persp \\[1.5em] &= \left( \begin{array}{cccc} \frac{2}{right - left} & 0 & 0 & t_x \\ 0 & \frac{2}{top - bottom} & 0 & t_y \\ 0 & 0 & -\frac{2}{far - near} & t_z \\ 0 & 0 & 0 & 1 \end{array} \right) * \left( \begin{array}{cccc} near & 0 & 0 & 0 \\ 0 & near & 0 & 0 \\ 0 & 0 & A & B \\ 0 & 0 & -1 & 0 \end{array} \right) \\[1.5em] &= \left( \begin{array}{cccc} \frac{2 near}{right - left} & 0 & A' & 0 \\ 0 & \frac{2 near}{top - bottom} & B' & 0 \\ 0 & 0 & C' & D' \\ 0 & 0 & -1 & 0 \end{array} \right) \end{align} \]

where

\[ \begin{align} A' &= \frac{right + left}{right - left} \\ B' &= \frac{top + bottom}{top - bottom} \\ C' &= -\frac{far + near}{far - near} \\ D' &= -\frac{2 \; far \; near}{far - near} \\ \end{align} \]

This is equivalent to the matrix produced by glFrustum.
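The same claim can be checked numerically. This sketch builds the glFrustum matrix from its documented formula and compares it with the product of the glOrtho matrix and a Persp matrix whose focal lengths equal near and whose skew and offsets are zero. The frame bounds are arbitrary made-up values, deliberately off-center so that the $A'$ and $B'$ terms are exercised.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def gl_frustum(left, right, bottom, top, near, far):
    """The matrix produced by glFrustum, as written out above."""
    return [[2.0 * near / (right - left), 0.0, (right + left) / (right - left), 0.0],
            [0.0, 2.0 * near / (top - bottom), (top + bottom) / (top - bottom), 0.0],
            [0.0, 0.0, -(far + near) / (far - near), -2.0 * far * near / (far - near)],
            [0.0, 0.0, -1.0, 0.0]]

def ortho_matrix(left, right, bottom, top, near, far):
    """The matrix constructed by glOrtho."""
    return [[2.0 / (right - left), 0.0, 0.0, -(right + left) / (right - left)],
            [0.0, 2.0 / (top - bottom), 0.0, -(top + bottom) / (top - bottom)],
            [0.0, 0.0, -2.0 / (far - near), -(far + near) / (far - near)],
            [0.0, 0.0, 0.0, 1.0]]

# Arbitrary, deliberately off-center frame bounds.
left, right, bottom, top, near, far = -0.3, 0.5, -0.2, 0.4, 0.1, 100.0

# Persp with alpha = beta = near and s = x_0 = y_0 = 0.
persp = [[near, 0.0, 0.0, 0.0],
         [0.0, near, 0.0, 0.0],
         [0.0, 0.0, near + far, near * far],
         [0.0, 0.0, -1.0, 0.0]]

two_matrix = matmul(ortho_matrix(left, right, bottom, top, near, far), persp)
single_call = gl_frustum(left, right, bottom, top, near, far)
max_diff = max(abs(a - b) for ra, rb in zip(two_matrix, single_call)
               for a, b in zip(ra, rb))
print(max_diff)
```

The two matrices should agree to within floating-point round-off.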

By tweaking the frame bounds we can relax the constraints imposed above. We can implement focal lengths other than near by scaling the frame:

\[ \begin{align} left' &= \left( \frac{near}{\alpha} \right) left \\ right' &= \left( \frac{near}{\alpha} \right) right \\ top' &= \left( \frac{near}{\beta} \right) top \\ bottom' &= \left( \frac{near}{\beta} \right) bottom \end{align} \]

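Non-zero principal point offsets are achieved by shifting the frame window. Because the frame bounds are measured on the near plane, the pixel offsets $x_0$ and $y_0$ are rescaled by the same factors used above:

\[ \begin{align} left'' &= left' - \left( \frac{near}{\alpha} \right) x_0 \\ right'' &= right' - \left( \frac{near}{\alpha} \right) x_0 \\ top'' &= top' - \left( \frac{near}{\beta} \right) y_0 \\ bottom'' &= bottom' - \left( \frac{near}{\beta} \right) y_0 \end{align} \]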

Thus, with a little massaging, glFrustum can simulate a general intrinsic camera matrix with zero axis skew.

The Extrinsic Matrix

The extrinsic matrix can be used as the modelview matrix without modification: just convert it to a 4x4 matrix by adding an extra row of (0,0,0,1), and pass it to glLoadMatrix or send it to your shader. If lighting or back-face culling are acting strangely, it's likely that your rotation matrix has a determinant of -1. This results in the geometry rendering in the right place, but with normal-vectors reversed so your scene is inside-out. The previous article on camera decomposition should help you prevent this.
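Here is a small sketch of that conversion, including a guard against the determinant problem. The function names and the example extrinsic matrix are hypothetical; the only OpenGL-specific detail is that the 16 values are flattened in column-major order, which is the layout glLoadMatrix expects.

```python
def det3(R):
    """Determinant of a 3x3 matrix stored as a list of rows."""
    return (R[0][0] * (R[1][1] * R[2][2] - R[1][2] * R[2][1])
            - R[0][1] * (R[1][0] * R[2][2] - R[1][2] * R[2][0])
            + R[0][2] * (R[1][0] * R[2][1] - R[1][1] * R[2][0]))

def modelview_from_extrinsic(Rt):
    """Append the row (0,0,0,1) to a 3x4 extrinsic matrix and flatten column-major."""
    R = [row[:3] for row in Rt]
    if det3(R) < 0:
        # A determinant of -1 means a reflection, which turns the scene inside-out.
        raise ValueError("rotation block has negative determinant")
    M = [list(row) for row in Rt] + [[0.0, 0.0, 0.0, 1.0]]
    return [M[r][c] for c in range(4) for r in range(4)]

# Hypothetical extrinsic matrix: identity rotation, translation (0.5, -1, -4).
Rt = [[1.0, 0.0, 0.0, 0.5],
      [0.0, 1.0, 0.0, -1.0],
      [0.0, 0.0, 1.0, -4.0]]

flat = modelview_from_extrinsic(Rt)
print(flat)
```

In column-major order the translation ends up in elements 12 through 14 of the flattened array rather than in every fourth element.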

Alternatively, you can convert your rotation matrix to axis-angle form and use glRotate. Remember that the fourth column of the extrinsic matrix is the translation after rotating, so your call to glTranslate should come before glRotate. Check out this previous article for a longer discussion of the extrinsic matrix, including how to use it with gluLookAt.

Conclusion

We've seen two different ways to simulate a calibrated camera in OpenGL, one using glFrustum and one using the intrinsic camera matrix directly. If you need to implement radial distortion, it should be possible with a vertex shader, but you'll probably want a high poly count so the curved distortions appear smooth--does anyone have experience with this? In a future article, I'll cover how to accomplish stereo and head-tracked rendering using simple modifications to your intrinsic camera parameters.

Dissecting the Camera Matrix, Part 2: The Extrinsic Matrix

2012-08-22T00:00:00-07:00

Welcome to the third post in the series "The Perspective Camera - An Interactive Tour." In the last post, we learned how to decompose the camera matrix into a product of intrinsic and extrinsic matrices. In the next two posts, we'll explore the extrinsic and intrinsic matrices in greater detail. First we'll explore various ways of looking at the extrinsic matrix, with an interactive demo at the end.

The Extrinsic Camera Matrix

The camera's extrinsic matrix describes the camera's location in the world, and what direction it's pointing. Those familiar with OpenGL know this as the "view matrix" (or rolled into the "modelview matrix"). It has two components: a rotation matrix, R, and a translation vector t, but as we'll soon see, these don't exactly correspond to the camera's rotation and translation. First we'll examine the parts of the extrinsic matrix, and later we'll look at alternative ways of describing the camera's pose that are more intuitive.

The extrinsic matrix takes the form of a rigid transformation matrix: a 3x3 rotation matrix in the left-block, and 3x1 translation column-vector in the right:

\[ [ R \, |\, \boldsymbol{t}] = \left[ \begin{array}{ccc|c} r_{1,1} & r_{1,2} & r_{1,3} & t_1 \\ r_{2,1} & r_{2,2} & r_{2,3} & t_2 \\ r_{3,1} & r_{3,2} & r_{3,3} & t_3 \\ \end{array} \right] \]

It's common to see a version of this matrix with an extra row of (0,0,0,1) added to the bottom. This makes the matrix square, which allows us to further decompose this matrix into a rotation followed by translation:

\[ \begin{align} \left [ \begin{array}{c|c} R & \boldsymbol{t} \\ \hline \boldsymbol{0} & 1 \end{array} \right ] &= \left [ \begin{array}{c|c} I & \boldsymbol{t} \\ \hline \boldsymbol{0} & 1 \end{array} \right ] \times \left [ \begin{array}{c|c} R & \boldsymbol{0} \\ \hline \boldsymbol{0} & 1 \end{array} \right ] \\ &= \left[ \begin{array}{ccc|c} 1 & 0 & 0 & t_1 \\ 0 & 1 & 0 & t_2 \\ 0 & 0 & 1 & t_3 \\ \hline 0 & 0 & 0 & 1 \end{array} \right] \times \left[ \begin{array}{ccc|c} r_{1,1} & r_{1,2} & r_{1,3} & 0 \\ r_{2,1} & r_{2,2} & r_{2,3} & 0 \\ r_{3,1} & r_{3,2} & r_{3,3} & 0 \\ \hline 0 & 0 & 0 & 1 \end{array} \right] \end{align} \]

This matrix describes how to transform points in world coordinates to camera coordinates. The vector t can be interpreted as the position of the world origin in camera coordinates, and the columns of R represent the directions of the world-axes in camera coordinates.

The important thing to remember about the extrinsic matrix is that it describes how the world is transformed relative to the camera. This is often counter-intuitive, because we usually want to specify how the camera is transformed relative to the world. Next, we'll examine two alternative ways to describe the camera's extrinsic parameters that are more intuitive and how to convert them into the form of an extrinsic matrix.

Building the Extrinsic Matrix from Camera Pose

It's often more natural to specify the camera's pose directly rather than specifying how world points should transform to camera coordinates. Luckily, building an extrinsic camera matrix this way is easy: just build a rigid transformation matrix that describes the camera's pose and then take its inverse.

Let C be a column vector describing the location of the camera-center in world coordinates, and let $R_c$ be the rotation matrix describing the camera's orientation with respect to the world coordinate axes. The transformation matrix that describes the camera's pose is then $[R_c \,|\, C ]$. Like before, we make the matrix square by adding an extra row of (0,0,0,1). Then the extrinsic matrix is obtained by inverting the camera's pose matrix:

\begin{align} \left[ \begin{array}{c|c} R & \boldsymbol{t} \\ \hline \boldsymbol{0} & 1 \\ \end{array} \right] &= \left[ \begin{array}{c|c} R_c & C \\ \hline \boldsymbol{0} & 1 \\ \end{array} \right]^{-1} \\ &= \left[ \left[ \begin{array}{c|c} I & C \\ \hline \boldsymbol{0} & 1 \\ \end{array} \right] \left[ \begin{array}{c|c} R_c & 0 \\ \hline \boldsymbol{0} & 1 \\ \end{array} \right] \right]^{-1} & \text{(decomposing rigid transform)} \\ &= \left[ \begin{array}{c|c} R_c & 0 \\ \hline \boldsymbol{0} & 1 \\ \end{array} \right]^{-1} \left[ \begin{array}{c|c} I & C \\ \hline \boldsymbol{0} & 1 \\ \end{array} \right]^{-1} & \text{(distributing the inverse)}\\ &= \left[ \begin{array}{c|c} R_c^T & 0 \\ \hline \boldsymbol{0} & 1 \\ \end{array} \right] \left[ \begin{array}{c|c} I & -C \\ \hline \boldsymbol{0} & 1 \\ \end{array} \right] & \text{(applying the inverse)}\\ &= \left[ \begin{array}{c|c} R_c^T & -R_c^TC \\ \hline \boldsymbol{0} & 1 \\ \end{array} \right] & \text{(matrix multiplication)} \end{align}

When applying the inverse, we use the fact that the inverse of a rotation matrix is its transpose, and inverting a translation matrix simply negates the translation vector. Thus, we see that the relationship between the extrinsic matrix parameters and the camera's pose is straightforward:

\[ \begin{align} R &= R_c^T \\ \boldsymbol{t} &= -RC \end{align} \]

Some texts write the extrinsic matrix substituting -RC for t, which mixes a world transform (R) and camera transform notation (C).
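
The conversion from pose to extrinsic parameters can be sketched in a few lines of plain Python. The helper name `extrinsic_from_pose` and the example pose are assumptions made for illustration only.

```python
import math

def transpose(A):
    return [list(row) for row in zip(*A)]

def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def extrinsic_from_pose(R_c, C):
    """Return (R, t) with R = R_c^T and t = -R C."""
    R = transpose(R_c)
    t = [-x for x in mat_vec(R, C)]
    return R, t

# Example pose (arbitrary): camera rotated 90 degrees about the world
# y-axis, with its center at C in world coordinates.
theta = math.pi / 2
c, s = math.cos(theta), math.sin(theta)
R_c = [[c, 0.0, s],
       [0.0, 1.0, 0.0],
       [-s, 0.0, c]]
C = [4.0, 0.0, 0.0]

R, t = extrinsic_from_pose(R_c, C)

# Sanity check: the camera center maps to the origin of camera coordinates.
center_in_cam = [y + ti for y, ti in zip(mat_vec(R, C), t)]
```

Checking that the camera center maps to the camera-frame origin is a cheap way to catch a forgotten transpose or sign.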

The "Look-At" Camera

Readers familiar with OpenGL might prefer a third way of specifying the camera's pose using (a) the camera's position, (b) what it's looking at, and (c) the "up" direction. In legacy OpenGL, this is accomplished by the gluLookAt() function, so we'll call this the "look-at" camera. Let C be the camera center, p be the target point, and u be the up-direction. The algorithm for computing the rotation matrix is (paraphrased from the OpenGL documentation):

Compute L = p - C.
Normalize L.
Compute s = L x u. (cross product)
Normalize s.
Compute u' = s x L.

The extrinsic rotation matrix is then given by:

\[ R = \left[ \begin{array}{ccc} s_1 & s_2 & s_3 \\ u_1' & u_2' & u_3' \\ -L_1 & -L_2 & -L_3 \end{array} \right] \]

(Updated May 21, 2014 -- transposed matrix)

You can get the translation vector the same way as before, t = -RC.
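
The steps above can be sketched as follows. This is a plain-Python illustration, not the gluLookAt() source; `look_at` and the example values are hypothetical, and the sketch assumes the viewing direction is not parallel to the up vector (otherwise the cross product is zero and normalization fails).

```python
import math

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def look_at(C, p, u):
    """Return (R, t) for a camera at C looking at p with up-direction u."""
    L = normalize(sub(p, C))        # viewing direction
    s = normalize(cross(L, u))      # camera "right" direction
    u_prime = cross(s, L)           # up direction orthogonal to s and L
    R = [s, u_prime, [-x for x in L]]
    t = [-sum(r * c for r, c in zip(row, C)) for row in R]   # t = -RC
    return R, t

C = [3.0, 4.0, 5.0]   # example camera center (arbitrary)
p = [1.0, 0.0, 0.0]   # example target (arbitrary)
R, t = look_at(C, p, [0.0, 1.0, 0.0])

# The target in camera coordinates: it should sit on the negative z-axis.
p_cam = [sum(r * x for r, x in zip(row, p)) + ti for row, ti in zip(R, t)]
```

With this convention the target ends up straight ahead on the camera's negative z-axis, at a distance equal to the length of p - C.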

Try it out!

Below is an interactive demonstration of the three different ways of parameterizing a camera's extrinsic parameters. Note how the camera moves differently as you switch between the three parameterizations.

This requires a WebGL-enabled browser with Javascript enabled.

Left: scene with camera and viewing volume. Virtual image plane is shown in yellow. Right: camera's image.

Tabs: Extrinsic (World), Extr. (Camera), Extr. ("Look-at"), Intrinsic

Extrinsic (World) sliders: $\boldsymbol{t}_x$, $\boldsymbol{t}_y$, $\boldsymbol{t}_z$, x-Rotation, y-Rotation, z-Rotation

Adjust extrinsic parameters above.

This is a "world-centric" parameterization. These parameters describe how the world changes relative to the camera. These parameters correspond directly to entries in the extrinsic camera matrix.

As you adjust these parameters, note how the camera moves in the world (left pane) and contrast with the "camera-centric" parameterization:

Rotating affects the camera's position (the blue box).
The direction of camera motion depends on the current rotation.
Positive rotations move the camera clockwise (or equivalently, rotate the world counter-clockwise).

Also note how the image is affected (right pane):

Rotating never moves the world origin (red ball).
Changing $t_x$ always moves the spheres horizontally, regardless of rotation.
Increasing $t_z$ always moves the camera closer to the world origin.
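
To see why rotating moves the camera in this parameterization, recall that t = -RC, so the camera center is C = -R^T t. The sketch below holds t fixed and changes only the rotation; the values and the helper names are hypothetical.

```python
import math

def rot_y(theta):
    """3x3 rotation about the y-axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s],
            [0.0, 1.0, 0.0],
            [-s, 0.0, c]]

def camera_center(R, t):
    """C = -R^T t, obtained by solving t = -R C for C."""
    return [-sum(R[i][j] * t[i] for i in range(3)) for j in range(3)]

t = [0.0, 0.0, 5.0]   # fixed translation (arbitrary example)

C_before = camera_center(rot_y(0.0), t)           # roughly (0, 0, -5)
C_after = camera_center(rot_y(math.pi / 2), t)    # roughly (5, 0, 0)
```

The world origin stays at t in camera coordinates, but the camera itself swings around the origin at a constant distance.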

Extr. (Camera) sliders: $C_x$, $C_y$, $C_z$, x-Rotation, y-Rotation, z-Rotation

Adjust extrinsic parameters above.

This is a "camera-centric" parameterization, which describes how the camera changes relative to the world. These parameters correspond to elements of the inverse extrinsic camera matrix.

As you adjust these parameters, note how the camera moves in the world (left pane) and contrast with the "world-centric" parameterization:

Rotation occurs about the camera's position (the blue box).
The direction of camera motion is independent of the current rotation.
A positive rotation rotates the camera counter-clockwise (or equivalently, rotates the world clockwise).
Increasing $C_y$ always moves the camera toward the sky, regardless of rotation.

Also note how the image is affected (right pane):

Rotating around y moves both spheres horizontally.
With different rotations, changing $C_x$ moves the spheres in different directions.

Extr. ("Look-at") sliders: $C_x$, $C_y$, $C_z$, $p_x$, $p_y$, $p_z$

Adjust extrinsic parameters above.

This is a "look-at" parameterization, which describes the camera's orientation in terms of what it is looking at. Adjust $p_x$, $p_y$, and $p_z$ to change where the camera is looking (orange dot). The up vector is fixed at (0,1,0)'. Notice that moving the camera center, *C*, causes the camera to rotate.

Intrinsic sliders: Focal Length, Axis Skew, $x_0$, $y_0$

Adjust intrinsic parameters above. As you adjust these parameters, observe how the viewing volume changes in the left pane:

Changing the focal length moves the yellow focal plane, which changes the field-of-view angle of the viewing volume.
Changing the principal point affects where the green center-line intersects the focal plane.
Setting skew to non-zero causes the focal plane to be non-rectangular.

Intrinsic parameters result in 2D transformations only; the depth of objects is ignored. To see this, observe how the image in the right pane is affected by changing intrinsic parameters:

Changing the focal length scales the near sphere and the far sphere equally.
Changing the principal point has no effect on parallax.
No combination of intrinsic parameters will reveal occluded parts of an object.
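
The first observation can be checked numerically. This sketch assumes a zero-skew pinhole model with the camera looking down the positive z-axis; the points and focal lengths are made-up values.

```python
def project(f, x0, y0, X):
    """Project camera-frame point X = (x, y, z) with a zero-skew pinhole model."""
    x, y, z = X
    return (f * x / z + x0, f * y / z + y0)

def offset_from_principal_point(f, x0, y0, X):
    """Image position of X measured relative to the principal point."""
    u, v = project(f, x0, y0, X)
    return (u - x0, v - y0)

x0, y0 = 320.0, 240.0     # hypothetical principal point
near = (1.0, 1.0, 2.0)    # hypothetical near point
far = (1.0, 1.0, 10.0)    # hypothetical far point

# Doubling the focal length doubles the image offset of both points,
# no matter how far away they are.
near_scale = (offset_from_principal_point(200.0, x0, y0, near)[0]
              / offset_from_principal_point(100.0, x0, y0, near)[0])
far_scale = (offset_from_principal_point(200.0, x0, y0, far)[0]
             / offset_from_principal_point(100.0, x0, y0, far)[0])
```

Both ratios come out the same, which is why zooming behaves like a 2D scaling of the image and cannot change what occludes what.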

Conclusion

We've just explored three different ways of parameterizing a camera's extrinsic state. Which parameterization you prefer to use will depend on your application. If you're writing a Wolfenstein-style FPS, you might like the world-centric parameterization, because moving along $t_z$ always corresponds to walking forward. Or you might be interpolating a camera through waypoints in your scene, in which case, the camera-centric parameterization is preferred, since you can specify the position of your camera directly. If you aren't sure which you prefer, play with the tool above and decide which approach feels the most natural.

Join us next time when we explore the intrinsic matrix, and we'll learn why hidden parts of your scene can never be revealed by zooming your camera. See you then!

Dissecting the Camera Matrix, Part 1: Extrinsic/Intrinsic Decomposition

2012-08-14T00:00:00-07:00

Not this kind of decomposition.

Credit: Daniel Hollister

So, you've been playing around with a new computer vision library, and you've managed to calibrate your camera... now what do you do with it? It would be a lot more useful if you could get at the camera's position or find out its field-of-view. You crack open your trusty copy of Hartley and Zisserman, which tells you how to decompose your camera into an intrinsic and extrinsic matrix --- great! But when you look at the results, something isn't quite right. Maybe your rotation matrix has a determinant of -1, causing your matrix-to-quaternion function to barf. Maybe your focal-length is negative, and you can't understand why. Maybe your translation vector mistakenly claims that the world origin is behind the camera. Or worst of all, everything looks fine, but when you plug it into OpenGL, you just don't see anything.

Today we'll cover the process of decomposing a camera matrix into intrinsic and extrinsic matrices, and we'll try to untangle the issues that can crop-up with different coordinate conventions. In later articles, we'll study the intrinsic and extrinsic matrices in more detail, and I'll cover how to convert them into a form usable by OpenGL.

This is the second article in the series, "The Perspective Camera, an Interactive Tour." To read other articles in this series, head over to the introduction page.

Prologue: Getting a Camera Matrix

I'll assume you've already obtained your camera matrix beforehand, but if you're looking for help with camera calibration, I recommend looking into the Camera Calibration Toolbox for Matlab. OpenCV also seems to have some useful routines for automatic camera calibration from a sequence of chessboard images, although I haven't personally used them. As usual, Hartley and Zisserman has a nice treatment of the topic.

Cut 'em Up: Camera Decomposition

To start, we'll assume your camera matrix is 3x4, which transforms homogeneous 3D world coordinates to homogeneous 2D image coordinates. Following Hartley and Zisserman, we'll denote the matrix as P, and occasionally it will be useful to use the block-form:

\[ P = [M \,| -MC] \]

where M is an invertible 3x3 matrix, and C is a column-vector representing the camera's position in world coordinates. Some calibration software provides a 4x4 matrix, which adds an extra row to preserve the z-coordinate. In this case, just drop the third row to get a 3x4 matrix.

The camera matrix by itself is useful for projecting 3D points into 2D, but it has several drawbacks:

It doesn't tell you the camera's pose.
It doesn't tell you about the camera's internal geometry.
Specular lighting isn't possible, since you can't get surface normals in camera coordinates.

To address these drawbacks, a camera matrix can be decomposed into the product of two matrices: an intrinsic matrix, K, and an extrinsic matrix, $[R \, |\, -RC ]$:

\[P = K [R \,| -RC ] \]

The matrix K is a 3x3 upper-triangular matrix that describes the camera's internal parameters like focal length. R is a 3x3 rotation matrix whose columns are the directions of the world axes in the camera's reference frame. The vector C is the camera center in world coordinates; the vector t = -RC gives the position of the world origin in camera coordinates. We'll study each of these matrices in more detail in later articles; today we'll just discuss how to get them from P.

Recovering the camera center, C, is straightforward. Note that the last column of P is -MC, so just left-multiply it by $-M^{-1}$.
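
Here is a minimal sketch of that step in plain Python. Instead of forming $M^{-1}$ explicitly, it solves MC = -p4 with Cramer's rule, which is fine for a 3x3 system; the function names and the example matrix are made up for illustration.

```python
def det3(A):
    """Determinant of a 3x3 matrix."""
    return (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
            - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
            + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))

def camera_center(P):
    """Solve M C = -p4 by Cramer's rule, where P = [M | p4]."""
    M = [row[:3] for row in P]
    b = [-row[3] for row in P]
    d = det3(M)
    C = []
    for k in range(3):
        Mk = [row[:] for row in M]
        for i in range(3):
            Mk[i][k] = b[i]      # replace column k with the right-hand side
        C.append(det3(Mk) / d)
    return C

# Example camera matrix (arbitrary), built so that its center is (1, 2, -3).
P = [[800.0, 0.0, 320.0, 160.0],
     [0.0, 800.0, 240.0, -880.0],
     [0.0, 0.0, 1.0, 3.0]]

C = camera_center(P)
```

A useful check is that P applied to the homogeneous point (C, 1) gives the zero vector, since the camera center is the one point the camera cannot project.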

Before You RQ-ze Me...

To recover R and K, we note that R is orthogonal by virtue of being a rotation matrix, and K is upper-triangular. Any full-rank matrix can be decomposed into the product of an upper-triangular matrix and an orthogonal matrix by using RQ-decomposition. Unfortunately RQ-decomposition isn't available in many libraries including Matlab, but luckily, its friend QR-decomposition usually is. Solem's vision blog has a nice article implementing the missing function using a few matrix flips; here's a Matlab version (thanks to Solem for letting me repost this!):

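function [R, Q] = rq(M)
    % RQ-decomposition via QR: M = R*Q, R upper-triangular, Q orthogonal.
    [Q, R] = qr(flipud(M)');   % QR of the row-reversed, transposed input
    R = flipud(R');            % undo the transpose and the row reversal
    R = fliplr(R);             % reverse columns so R is upper-triangular

    Q = Q';
    Q = flipud(Q);             % reverse rows to match the flipped R
end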

Easy!
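
If you don't have a QR routine handy, a 3x3 RQ-decomposition can also be sketched directly. The plain-Python version below is not the flip trick above: it swaps in Gram-Schmidt orthogonalization of the rows of M, working from the bottom row up. It assumes M is full rank, and `rq3` is a hypothetical name.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rq3(M):
    """Decompose 3x3 M into K (upper-triangular) times R (orthonormal rows)."""
    K = [[0.0] * 3 for _ in range(3)]
    R = [None] * 3
    for i in (2, 1, 0):                 # bottom row first
        v = list(M[i])
        for j in range(i + 1, 3):       # remove components along rows already found
            K[i][j] = dot(M[i], R[j])
            v = [a - K[i][j] * b for a, b in zip(v, R[j])]
        K[i][i] = math.sqrt(dot(v, v))  # positive diagonal by construction
        R[i] = [a / K[i][i] for a in v]
    return K, R

# Build a test matrix from a known K0 and a rotation about z (arbitrary values).
c, s = math.cos(math.pi / 6), math.sin(math.pi / 6)
K0 = [[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]]
R0 = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
M = matmul(K0, R0)

K, R = rq3(M)
```

Note that this construction always returns a positive diagonal in K, and the resulting R is orthogonal but not guaranteed to have determinant +1.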

I'm seeing double... FOUR decompositions!

There's only one problem: the result of RQ-decomposition isn't unique. To see this, try negating any column of K and the corresponding row of R: the resulting camera matrix is unchanged. Most people simply force the diagonal elements of K to be positive, which is the correct approach if two conditions are true:

your image's X/Y axes point in the same direction as your camera's X/Y axes.
your camera looks in the positive-z direction.

Solem's blog elegantly gives us positive diagonal entries in three lines of code:

% make diagonal of K positive
T = diag(sign(diag(K)));

K = K * T;
R = T * R; % (T is its own inverse)
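
For readers following along outside Matlab, the same sign fix can be sketched in plain Python. The function name and example matrices are hypothetical, and the sketch assumes no diagonal entry of K is exactly zero.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def make_diagonal_positive(K, R):
    """Return (K*T, T*R) where T = diag(sign(diag(K)))."""
    signs = [math.copysign(1.0, K[i][i]) for i in range(3)]
    K_fixed = [[K[i][j] * signs[j] for j in range(3)] for i in range(3)]  # scales columns of K
    R_fixed = [[signs[i] * R[i][j] for j in range(3)] for i in range(3)]  # scales rows of R
    return K_fixed, R_fixed

# Example (arbitrary): a decomposition whose first diagonal entry is negative.
K = [[-800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]]
R = [[-1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

K_fixed, R_fixed = make_diagonal_positive(K, R)
```

Because T is its own inverse, the product of the fixed matrices is the same camera matrix as before.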

In practice, the camera and image axes often won't agree, and the diagonal elements of K shouldn't all be positive. Forcing them to be positive can result in nasty side-effects, including:

The objects appear on the wrong side of the camera.
The rotation matrix has a determinant of -1 instead of 1.
Incorrect specular lighting.
Visible geometry won't render due to having a negative w coordinate.

Hartley and Zisserman's coordinate conventions. Note that camera and image x-axes point left when viewed from the camera's POV.

From "Multiple View Geometry in Computer Vision"

In this case, you've got some fixing to do. Start by making sure that your camera and world coordinates both have the same handedness. Then take note of the axis conventions you used when you calibrated your camera. What direction did the image y-axis point, up or down? The x-axis? Now consider your camera's coordinate axes. Does your camera look down the negative-z axis (OpenGL-style)? Positive-z (like Hartley and Zisserman)? Does the x-axis point left or right? The y-axis? Okay, okay, you get the idea.

Starting from an all-positive diagonal, follow these four steps:

If the image x-axis and camera x-axis point in opposite directions, negate the first column of K and the first row of R.
If the image y-axis and camera y-axis point in opposite directions, negate the second column of K and the second row of R.
If the camera looks down the negative-z axis, negate the third column of K and the third row of R.
If the determinant of R is -1, negate it.

Note that each of these steps leaves the combined camera matrix unchanged. The last step is equivalent to multiplying the entire camera matrix, P, by -1. Since P operates on homogeneous coordinates, multiplying it by any constant has no effect.
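
The four steps can be sketched in plain Python as below. The function and flag names are hypothetical. One assumption to flag loudly: step 3 is implemented by negating the third column of K together with the third row of R, since that is the pairing that leaves the product KR unchanged.

```python
import math

def det3(A):
    return (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
            - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
            + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply_axis_conventions(K, R, flip_x=False, flip_y=False, negative_z=False):
    """Apply steps 1-4 to a decomposition that starts with a positive diagonal."""
    K = [row[:] for row in K]
    R = [row[:] for row in R]
    for axis, flip in enumerate((flip_x, flip_y, negative_z)):
        if flip:
            for row in K:
                row[axis] = -row[axis]        # negate a column of K
            R[axis] = [-r for r in R[axis]]   # negate the matching row of R
    if det3(R) < 0:                           # step 4: same as scaling P by -1
        R = [[-r for r in row] for row in R]
    return K, R

# Example (arbitrary): image y points down, camera y points up, camera looks down -z.
c, s = math.cos(0.3), math.sin(0.3)
K = [[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]]
R = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

K_new, R_new = apply_axis_conventions(K, R, flip_y=True, negative_z=True)
```

In this example the determinant of R stays +1, so step 4 does nothing and the product of the new K and R equals the original exactly.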

Regarding step 3, Hartley and Zisserman's camera looks down the positive-z direction, but in some real-world systems, (e.g. OpenGL) the camera looks down the negative-z axis. This allows the x and y axes to point right and up, resulting in a coordinate system that feels natural while still being right-handed. Step 3 above corrects for this, by causing w to be positive when z is negative. You may balk at the fact that $K_{3,3}$ is negative, but OpenGL requires this for proper clipping. We'll discuss OpenGL more in a future article.

You can double-check the result by inspecting the vector $\mathbf{t} = -RC$, which is the location of the world origin in camera coordinates. If everything is correct, the signs of $t_x, t_y, t_z$ should reflect where the world origin appears in the camera (left/right of center, above/below center, in front/behind camera, respectively).

Who Flipped my Axes?

Until now, our discussion of 2D coordinate conventions has referred to the coordinates used during calibration. If your application uses a different 2D coordinate convention, you'll need to transform K using 2D translation and reflection.

For example, consider a camera matrix that was calibrated with the origin in the top-left and the y-axis pointing downward, but you prefer a bottom-left origin with the y-axis pointing upward. To convert, you'll first negate the image y-coordinate and then translate upward by the image height, h. The resulting intrinsic matrix K' is given by:

\[ K' = \begin{bmatrix}1 & 0 & 0 \\ 0 & 1 & h \\ 0 & 0 & 1 \end{bmatrix} \times \begin{bmatrix}1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \; K \]
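
As a small worked example of this conversion, here is a plain-Python sketch. The intrinsic values, the image height and the helper names are all made up for illustration.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def flip_image_y(K, h):
    """K' = T(h) * F * K: negate image y, then translate up by the image height h."""
    translate = [[1.0, 0.0, 0.0], [0.0, 1.0, h], [0.0, 0.0, 1.0]]
    reflect = [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, 1.0]]
    return matmul(translate, matmul(reflect, K))

def image_y(K, X):
    """Image y-coordinate of camera-frame point X under intrinsic matrix K."""
    row_y = sum(K[1][j] * X[j] for j in range(3))
    row_w = sum(K[2][j] * X[j] for j in range(3))
    return row_y / row_w

K = [[800.0, 0.0, 320.0], [0.0, 800.0, 200.0], [0.0, 0.0, 1.0]]  # hypothetical intrinsics
h = 480.0                                                        # hypothetical image height
K_prime = flip_image_y(K, h)

X = (0.0, -0.5, 2.0)          # a camera-frame point (arbitrary)
y_old = image_y(K, X)         # measured from the top edge, y down
y_new = image_y(K_prime, X)   # measured from the bottom edge, y up
```

A point that lands at y in the top-left convention should land at h - y in the bottom-left convention, which makes for an easy spot check.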

Summary

The procedure above should give you a correct camera decomposition regardless of the coordinate conventions you use. I've tested it in a handful of scenarios in my own research, and it has worked so far. Of course, if you have any problems with this approach, I'm eager to hear about them; just leave a message in the comments, or email me.

In the next article, we'll investigate the extrinsic matrix in more detail, with interactive demos.

The Perspective Camera - An Interactive Tour

2012-08-13T00:00:00-07:00

The "1st and Ten" system, one of the first successful applications of augmented reality in sports.

howstuffworks.com

On September 27, 1998 a yellow line appeared across the gridiron during an otherwise ordinary football game between the Cincinnati Bengals and the Baltimore Ravens. It had been added by a computer that analyzed the camera's position and the shape of the ground in real-time in order to overlay a thin yellow strip onto the field. The line marked the position of the next first-down, but it also marked the beginning of a new era of computer vision in live sports, from computerized pitch analysis in baseball to automatic line-refs in tennis.

In 2006, researchers from Microsoft and the University of Washington automatically constructed a 3D tour of the Trevi Fountain in Rome using only images obtained by searching Flickr for "trevi AND rome."

In 2007, Carnegie Mellon PhD student Johnny Lee hacked a $40 Nintendo Wii-mote into an impressive head-tracking virtual reality interface.

In 2010, Microsoft released the Kinect, a consumer stereo camera that rivaled the functionality of competitors sold for ten times its price, which continues to disrupt the worlds of both gaming and computer vision.

What do all of these technologies have in common? They all require a precise understanding of how the pixels in a 2D image relate to the 3D world they represent. In other words, they all hinge on a strong camera model. This is the first in a series of articles that explores one of the most important camera models in computer vision: the pinhole perspective camera. We'll start by deconstructing the perspective camera to show how each of its parts affects the rendering of a 3D scene. Next, we'll describe how to import your calibrated camera into OpenGL to render virtual objects into a real image. Finally, we'll show how to use your perspective camera to implement rendering in a virtual-reality system, complete with stereo rendering and head-tracking.

These articles won't cover everything. This book does.

This series of articles is intended as a supplement to a more rigorous treatment available in several excellent textbooks. I will focus on providing what textbooks generally don't provide: interactive demos, runnable code, and practical advice on implementation. I will assume the reader has a basic understanding of 3D graphics and OpenGL, as well as some background in computer vision. In other words, if you've never heard of homogeneous coordinates or a camera matrix, you might want to start with an introductory book on computer vision. I highly recommend Multiple View Geometry in Computer Vision by Hartley and Zisserman, from which I borrow mathematical notation and conventions (e.g. column vectors, right-handed coordinates, etc.)

Technical Requirements

Equations in these articles are typeset using MathJax, which won't display if you've disabled JavaScript or are using a browser that is woefully out of date (sorry IE 5 users). If everything is working, you should see a matrix below:

\[ \left ( \begin{array}{c c c} a^2 & b^2 & c^2 \\ d^2 & e^2 & f^2 \\ g^2 & h^2 & i^2 \end{array} \right ) \]

3D interactive demos are provided by three.js, which also needs JavaScript and prefers a browser that supports WebGL (Google Chrome works great, as does the latest version of Firefox). Older browsers will render using canvas, which will run slowly, look ugly, and hurl vicious insults at you. But it should work. If you see two spheres below, you're in business.

3D demo. Drag to move camera.

Below is a list of all the articles in this series. New articles will be added to this list as I post them, so you can always return to this page for an up-to-date listing.

Dissecting the Camera Matrix, Part 1: Intrinsic/Extrinsic Decomposition
Dissecting the Camera Matrix, Part 2: The Extrinsic Matrix
Simulating your Calibrated Camera in OpenGL - part 1, part 2
Dissecting the Camera Matrix, Part 3: The Intrinsic Matrix
Stereo Rendering using a Calibrated Camera
Head-tracked Display using a Calibrated Camera

Happy reading!
