Implementation Notes: Runtime Environment Map Filtering for Image Based Lighting

In this post I will cover implementing and optimizing runtime environment map filtering for image based lighting (IBL). Before we get started, this approach is covered in depth in Real Shading in Unreal Engine 4 and Moving Frostbite to Physically Based Rendering, you should really check them out if you want more background and information than I provide here. Doing runtime filtering of environment maps has some benefits over offline filtered maps (using something like cubemapgen). One is that you can have more dynamic elements in your cubemaps like dynamic weather and time of day. At GDC this year Steve McAuley mentioned in Rendering the World of Far Cry 4 that they perform environment map re-lighting and composite a dynamic sky box prior to filtering. Additionally by filtering at runtime you might be able to use smaller textures (r10g11b10 vs r16b16g16a16) since you can use your current camera exposure information to pre-expose the contents into a more suitable range.

Overview

Image Based Lighting involves using the contents of an image (in this case an enviornment map) as a source of light. In the brute force implementation every pixel is considered a light source and is evaluated using a bidirectional reflectance distribution function (BRDF). The full integral that represents this is:

Numerical integration of environment lighting

The variant of filtering that we will be implementing is the Split-sum approach taken by Karis. This approach approximates the integral above by integrating the lighting contribution and the rest of the BRDF separately. By doing this the integral can be approximated using a cubemap and a tex2D lookup.

Split-sum approximation

Initial implementation

We will start with the implementation provided in Real Shading in Unreal 4 since it is well documented and complete, I will be referring to this as the reference implementation. To assist in reducing the number of samples the authors of the mentioned papers utilized importance sampling to redistribute the samples into the important parts of the BRDF. Importance sampling is a very interesting topic so if you want to learn more about the derivation check out Notes on importance sampling. For the BRDF integration we can stop with this implementation since this won’t be integrated every frame. Also, I’m choosing to ignore the extensions that the frostbite guys did to add the Disney Diffuse BRDF term into this texture but this can apply to that too. The environment map filtering is where we want to focus our optimization efforts.

The first step is to render out a new cubemap to filter. For comparison purposes this article uses the cubemap from the unreal presentation, found here. However; this is the point where you could do some cool stuff like re-lighting the environment map like the Far Cry guys do. As a note, some people use a geometry shader here to render to all faces of a cube in a single pass, for this example I just set up the camera for the appropriate face and rendered them out one at a time.

Foreach Face in Cube Faces
Setup camera
Render envmap

Next we implement the split-sum reference implementation provided by Karis. Initially the sample count is kept as provided in the course notes but that is going to be an area we focus on when we optimize.

float fTotalWeight = 0.0f;
const uint iNumSamples = 1024;
for (uint i = 0; i < iNumSamples; i++)
{
float2 vXi = Hammersley(i, iNumSamples);

float3 vHalf = ImportanceSampleGGX(vXi, fRoughness, vNormal);
float3 vLight = 2.0f * dot(vView, vHalf) * vHalf - vView;

float fNdotL = saturate(dot(vNormal, vLight));
if (fNdotL > 0.0f)
{
vPrefilteredColor += EnvironmentCubemapTexture.SampleLevel(LinearSamplerWrap, vLight, 0.0f).rgb * fNdotL;
fTotalWeight += fNdotL;
}
}
return vPrefilteredColor / fTotalWeight;

The code side looks something like this.

Foreach Face in Cube Faces
Setup camera
Bind Envmap
For i = 0; i < MipLevels; ++i
Execute filtering

To verify that I didn’t screw anything up I like to reproduce and compare at least one of the diagrams.

Comparison with course notes from Real Shading in Unreal Engine 4.

There is some slight darkening along the rim of the middle gloss spheres due to my angle of view being slightly off, a difference in exposure, and a different tonemapping operator is used but otherwise we are at a good starting point.

Areas for optimization

I’m not as crazy as some people when it comes to optimization but here are some questions I ask myself when I’m looking at a bit of code like this.

• What can be done to reduce the amount of work done globally?
• What can be done to reduce the number of samples?
• What can be done to reduce the amount of work per sample?
• How can use any assumptions to our advantage?
• How can we make sure that we are getting the most out of every sample?
• What parts of this can scale?

Luckily the Frostbite guys have already looked into a few of the points above so we can pull from the implementation they provide.

Reduce the amount of work done globally

This one is easy and basically just a footnote in the Moving Frostbite to pbr. Since the first mip is essentially our mirror reflection we can just skip processing that mip level all together and just copy the generated environment map into it. Note that we still keep the generated texture around and copy into our final texture, this becomes more important in the next step. The code now looks like:

Foreach Face in Cube Faces
Setup camera
Bind Envmap
Copy face into mip 0
For i = 1; i < MipLevels; ++i
Execute filtering

Reduce the number of samples

So 1024 samples is certainly more than we want to be doing per pixel in our shader. I would say that 16-32 is a more reasonable number of samples to take. Let’s see what happens if we just drop the number of samples in our shader:

32 Samples

It’s obvious as the roughness increases that 32 samples isn’t going to be enough with out some help. This is where filtered importance sampling comes into play. I’m not an expert on Importance Sampling in general but it was pretty easy to follow along with FIS implementations in the Frostbite course notes. The basic idea is that, because of importance sampling, less samples are taken in certain directions due to the distribution of the brdf. To combat the undersampling the mip chain of the source environment map is sampled. The mip is determined by the ratio between the solid angle that the sample represents (OmegaS) and the solid angle that a single texel is at our highest resolution (OmegaP) covers. Implementing this in the example means that the environment map generation gets the second step of generating mips, additionally for each sample we have to compute OmegaS and OmegaP then finally the required mip.

Foreach Face in Cube Faces
Setup camera
Render envmap

GenerateMips envmap
float fTotalWeight = 0.0f;
const uint iNumSamples = 32;
for (uint i = 0; i < iNumSamples; i++)
{
float2 vXi = Hammersley(i, iNumSamples);

float3 vHalf = ImportanceSampleGGX(vXi, fRoughness, vNormal);
float3 vLight = 2.0f * dot(vView, vHalf) * vHalf - vView;

float fNdotL = saturate(dot(vNormal, vLight));
if (fNdotL > 0.0f)
{
// Vectors to evaluate pdf
float fNdotH = saturate(dot(vNormal, vHalf));
float fVdotH = saturate(dot(vView, vHalf));

// Probability Distribution Function
float fPdf = D_GGX(fNdotH) * fNdotH / (4.0f * fVdotH);

// Solid angle represented by this sample
float fOmegaS = 1.0 / (iNumSamples * fPdf);

// Solid angle covered by 1 pixel with 6 faces that are EnvMapSize X EnvMapSize
float fOmegaP = 4.0 * fPI / (6.0 * EnvMapSize * EnvMapSize);
// Original paper suggest biasing the mip to improve the results
float fMipBias = 1.0f;
float fMipLevel = max(0.5 * log2(fOmegaS / fOmegaP) + fMipBias, 0.0f);
vPrefilteredColor += EnvironmentCubemapTexture.SampleLevel(LinearSamplerWrap, vLight, fMipLevel).rgb * fNdotL;
fTotalWeight += fNdotL;
}
}
return vPrefilteredColor / fTotalWeight;

This gives us a much better result for our high roughness spheres, although it is still a little chunky.

32 Samples With Filtered Importance Sampling

Precompute as much as possible

The previous change had the unfortunate side effect of adding lots of math to the inner loop. So lets focus the attention there and see if anything can be pre-computed. First, it should be easy to see that we don’t need to be computing the Hammersley sequence for every single texel in the output, they are the same for every pixel. With this knowledge let’s see how the sequence random value is used:

float3 vHalf = ImportanceSampleGGX(vXi, fRoughness, vNormal);

...

float3 ImportanceSampleGGX(float2 vXi, float fRoughness, float3 vNoral)
{
// Compute the local half vector
float fA = fRoughness * fRoughness;
float fPhi = 2.0f * fPI * vXi.x;
float fCosTheta = sqrt((1.0f - vXi.y) / (1.0f + (fA*fA - 1.0f) * vXi.y));
float fSinTheta = sqrt(1.0f - fCosTheta * fCosTheta);
float3 vHalf;
vHalf.x = fSinTheta * cos(fPhi);
vHalf.y = fSinTheta * sin(fPhi);
vHalf.z = fCosTheta;

// Compute a tangent frame and rotate the half vector to world space
float3 vUp = abs(vNormal.z) < 0.999f ? float3(0.0f, 0.0f, 1.0f) : float3(1.0f, 0.0f, 0.0f);
float3 vTangentX = normalize(cross(vUp, vNormal));
float3 vTangentY = cross(vNormal, vTangentX);
// Tangent to world space
return vTangentX * vHalf.x + vTangentY * vHalf.y + vNormal * vHalf.z;
}

This function can be broken up into two distinct parts shown colored. The first part computes a local space half vector based off the sampling function and the roughness. The second part generates a tangent frame for the normal and rotates the half vector into it that frame. Since the first part is a function of the random value (from the hammersley sequence) and the roughness the local space sample directions can be precomputed. It should also be clear now that we are wasting lots of cycles recomputing the tangent frame for every sample. So we can float that outside of the update loop and just to be complete we can put it into a matrix and rotate the incoming local space sample directions into world space (in reality the shader compiler also noticed this and optimized the math out of the loop).

// Compute a matrix to rotate the samples
float3 vTangentY = abs(vNormal.z) < 0.999f ? float3(0.0f, 0.0f, 1.0f) : float3(1.0f, 0.0f, 0.0f);
float3 vTangentX = normalize(cross(vTangentY, vNormal));
vTangentY = cross(vNormal, vTangentX);

float3x3 mTangentToWorld = float3x3(
vTangentX,
vTangentY,
vNormal );

for (int i = 0; i < NumSamples; i++)
{
float3 vHalf = mul(vSampleDirections[i], mTangentToWorld);
float3 vLight = 2.0f * dot(vView, vHalf) * vHalf - vView;
....

The next bit of code in the shader reflects the view vector around the half vector to determine the representative light direction for the sample. Following the isotropic assumption made in the split-sum technique we can state that view == normal, and further that in local space normal == 0,0,1. So what we can do here is reflect the vector in local space and instead of rotating the half vector to world space, we can rotate the light vector into world space. There is a problem though, later in the code it assumes that we have the dot product of the half vector and the normal vector to compute the mip-level for the sample. We can once again take advantage of the isotropic assumption and realize that the dot product between two vectors in local space is the same as the dot product of those two vectors in some other space. So ndoth = vdoth = dot((0,0,1),half) or ndoth = vdoth = half.z, therefore this can be precompute per sample now too. Inspecting the code again we see that the mip selection math is based entirely on global variables and those two per sample values, so we can pre-compute that too (see a trend?). Moving all of that computation to the cpu and simply uploading it to the shader leaves us with this:

float fTotalWeight = 0.0f;
const uint iNumSamples = 32;
for (uint i = 0; i < iNumSamples; i++)
{
float3 vLight = mul(vSampleDirections[i], mTangentToWorld);

float fNdotL = saturate(dot(vNormal, vLight));
if (fNdotL > 0.0f)
{
vPrefilteredColor += EnvironmentCubemapTexture.SampleLevel(LinearSamplerWrap, vLight, fSampleMipLevels[i]).rgb * fNdotL;
fTotalWeight += fNdotL;
}
}

We are back to having less math per sample than before we added FIS and we can still do better. There is still the dot product with the light vector sitting there and the conditional. Once again leverage that we have pre-computed the light vector in local space and say that ndotl = dot((0,0,1), light) = light.z. Additionally, we can check if the weight of a sample is going to be 0 before we even upload it, this lets us just skip samples entirely that will fail the check. At this point the inner loop just rotates the the local light vector into world space and does a madd to accumulate the sample.

for (uint i = 0; i < iNumSamples; i++)
{
float3 vLight = mul(vSampleDirections[i], mTangentToWorld);
vPrefilteredColor += EnvironmentCubemapTexture.SampleLevel(LinearSamplerWrap, vLight, fSampleMipLevels[i]).rgb * fSampleWeights[i];
}

return vPrefilteredColor * fInvTotalWeight;

Comparison between original and the final optimized version.

Here are some timings for the various approaches and stages of optimizing presented, take them with a grain of salt since getting numbers on PC, especially laptops, is not super accurate.

 Device Time(ms) Size Approach Iris 6100 54.0 256×256 1024 Samples Reference (UE4) Iris 6100 2.6 256×256 32 Samples FIS Original (Frostbite) Iris 6100 1.5 256×256 32 Samples FIS Optimized Iris 6100 1.2 256×256 Varying Sample Count Iris 6100 0.5 128×128 Varying Sample Count GTX 980 9.4 256×256 1024 Samples Reference (UE4) GTX 980 0.48 256×256 32 Samples FIS Original (Frostbite) GTX 980 0.36 256×256 32 Samples FIS Optimized GTX 980 0.36 256×256 Varying Sample Count GTX 980 0.16 128×128 Varying Sample Count

Improvements

Ensure certain number of samples have weight > 0

At high roughness values nearly half of the samples produced by the Hammersley sequence result in a light vector that fails the N dot L check. I noticed a nice improvement by just doing a simple linear search to find how many sequence numbers are needed to generate the right number of valid samples.

Comparison of 32 samples where some have weight 0 (top) and 32 samples where all of them have a weight > 0 (bottom)

Varying the sample count per mip level

This is more or less an extension of the Frostbite statement that the first mip should represent a mirror reflection and therefore require only one sample. If we extend that to the entire mip chain we can simultaneously reduce the number of samples at low roughness values (when we are processing larger cube faces) and increase the number of samples at high roughness values (when the faces are smaller). In theory this will reduce the number of samples needed and result in an overall noise reduction.

Varying sample count on a 256×256 prefiltered cubemap. From mip 0->5: 1, 8, 32, 64, 128, 128

This did result in a nice quality improvement, and dropped the total number of samples taken from the environment map from ~400k to ~200k but I didn’t notice a large performance increase, in fact in some cases I saw the opposite. This is likely due to the small number of pixels requiring lots of work as you get down the mip chain and that increasing the sample count reduces the effectiveness of filtered importance sampling. This approach is still worth investigating though and may be bound by the fact that I render each face individually.

Primary direction lookup

This is straight from the frostbite presentation (See the course notes for details). Basically you modify the reflection vector based on the roughness when retrieving the values from both the environment map and the brdf texture. This makes it look quite different from the initial implementation but if we look back at the Unreal course notes we can see that it looks more like the full reference implementation:

Comparison between the Initial Reference for this article (top), the Reflection Adjusment proposed by the Frostbite team (middle), and the Original Reference in the Real Shading course notes (bottom).

float3 vReflection = 2.0f * vNormal * dot(vViewDirection, vNormal) - vViewDirection;
float fA = fRoughness * fRoughness;
vReflection = lerp(vNormal, vReflection, (1.0f - fA) * (sqrt(1.0f - fA) + fA));

Ideas

These are things that I haven’t really fleshed out yet but could potentially be a thing.

Different distribution for filtering

Up to this point we have been using the GGX distribution for the importance sampling but the we are weighting the samples by their respective N dot L terms. This leads to the workarounds in the improvement section to ignore samples that were going to fail the second tier of weighting. It seems that a new probablity distribution could be created that factors in the N dot L weighting so we don’t have to modify and renormalize the distribution later. I briefly fooled around with finding the analytical version of that distribution but I’m not a mathematica guy and ran out of time. Only after did I realize that you could do this with out having an analytical probability distribution function, and instead you can just numerically integration for a fixed roughness value and a set of psuedo-random variables. I plan on investigating this soon.

Different quasi-random sampling set

After doing some research it became apparent that the Hammersley set doesn’t do the best job when you have low sample counts. An alternative sequence may yield lower noise for the same number of samples, here is a potential one that Pixar recently published: Correlated Multi-Jittered Sampling. I tried this out and the results didn’t look exceptionally different, the one thing that did stand out is that you can pass a seed value in to generate any number of 32 sample sequences which might be good if you want to do any sort of progressive filtering of the environment map.

Progressive filtering

Since you know the total weights in the frame it would be easy to pass that data forward to the next frame and combine the results. To do this with the Hammersley sequence you would want to add a rotation into the matrix that rotates the light from local to world space, then you would just filter as usual and add the total weight and result times the total weight from the previous frame. When it comes time to invalidate the environment map you can just start clear the total weight. This may help wrangle the last bit of performance needed for previous gen platforms. Additionally you can track when you get a certain total weight and stop processing all together until you know that the contents have changed, for instance the time of day progresses past a certain threshold.

Conclusion

Hopefully this article was useful to someone out there. It would be great to hear from people if they do other things to optimize this process or have tried any of the things that I proposed.

Resources

Correlated Multi-Jittered Sampling: http://graphics.pixar.com/library/MultiJitteredSampling/paper.pdf

Hammersley Points on the Hemisphere: http://holger.dammertz.org/stuff/notes_HammersleyOnHemisphere.html

Moving Frostbite To PBR: http://www.frostbite.com/2014/11/moving-frostbite-to-pbr/

Notes on Important Sampling: http://blog.tobias-franke.eu/2014/03/30/notes_on_importance_sampling.html

GPU Based Importance Sampling: http://http.developer.nvidia.com/GPUGems3/gpugems3_ch20.html

Rendering the world of Far Cry 4: http://www.gdcvault.com/play/1022235/Rendering-the-World-of-Far

Physically Based Shading at Disney: https://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf

Implementation Notes: Physically Based Lens Flares

The Implementation Notes series of blog posts is an attempt to outline the process of implementing a research paper or technique from a presentation. I find that more often than not when trying to recreate others results that not all the dots are connected. It takes some time and digging to get the whole picture, this is my attempt at sharing that background work.

In this edition of Implementation Notes (actually the first edition) we will be looking at two papers relating to rendering lens flares.
Physically-Based Real-Time Lens Flare Rendering1
Practical Real-Time Lens-Flare Rendering2
Some of the foundation work from the first paper will be presented and then an implementation based on the second paper will be outlined.

Background

Cameras are comprised of many complex systems, each of which can introduce visual artifacts. If these visual artifacts are not modeled to a certain extent in our rendered images then they will produce suspiciously clean results and look artificial. However, if we introduce some of these artifacts then the images can appear more natural to the viewer. One such artifact is internal reflections, sometimes referred to as lens flares. These two papers explore modeling lens flares as the result of light reflecting off the surfaces of the optics inside of the lens, hence the “Physically-Based”.

Lens flares from an old Vivitar zoom lens

Lens flares from an old Vivitar zoom lens

The lens configuration inside most lenses is surprisingly complex and can consist of dozens of individual lens elements. As the light from the scene passes through each optical interface a portion is reflected and the rest is refracted. The reflected light then moves backwards through the optical system striking other optical interfaces along the way, the process then repeats. For every even number of bounces that occurs the reflected light is then directed back towards the image sensor; however, the light rays are no longer focused. This light represents a single lens flare, or ghost as referred to by the papers.

Optical Systems

This technique is heavily based off on the representation of the lens elements in an Optical System.  This referrers to the combination of lenses and empty spaces designed in such a way that light is focused on to an image plane. We begin with the assumption that the image plane I represents standard film (24x36mm) with an image circle diagonal of 43.3mm.

Example Optical System (USPN5835272)

The optical axis of the system is the axis that runs through the center of the lens elements and intersects the center of the image plane. Lens elements are arranged along the optical axis either directly next to each other or separated by a small amount of air. The standard description of an optical system is a table of data that defines the interfaces between air and lens elements. The data looks something like this:

 # r d n 1 30.810 7.700 1.652 2 -89.350 1.850 1.603 … … … …

Here r is the radius of curvature for the interface, d is the distance to the next interface, and n is the refractive index of the lens element. The radius can be positive, indicating a convex surface, negative, indicating a concave surface, or 0, indicating a flat surface. Using this definition of an optical system allows us to use existing optical system data. The best place to find optical system descriptions is the patent database, be warned though I don’t know the legality of using the optical design data to reproduce the lens digitally.

When reading a lens patent there are a couple things to note. The first is that there is a lot of information in the patent! Most of it you don’t need and shouldn’t care about. The main thing you are looking for is the tabular data describing the lens system. In the table there is usually a header that describes the focal length of the lens and the f-number. This represents the field of view of the lens and essentially how much light the lens can let in respectfully. If you are a real ambitious go-getter you can use this to drive the settings on your camera model.

It took me a while to find a zoom lenses that had a focal length that represented a normal game field of view (~70 degrees) and had a low f-number, meaning that the lens allows in a lot of light. So I can save you the pain patent searching and point you at a couple options:

I have found that there is some information missing when reading the patent, namely the heights of the various elements. Luckily the height that matters the most is the aperture stop and this can be calculated using the effective aperture diameter ($D$)3. The effective aperture represents what the physical aperture looks like if you were looking through the front of the lens. Since the elements in front of the physical aperture often have a magnification property it will differ from the physical aperture stop. The formula for this is based only on the focal length, $f$ and the f-number $N$:

$N=\frac{f}{D}$

So for the Nikon zoom lens we can determine that at a focal length of 28mm and an f-number of roughly 2.8 that the effective aperture diameter should be 10mm. Once we are able to cast rays through the optical system this will be used to determine the size of the physical aperture.

Quick note: On more modern lenses (1980s and onward) there are often aspherical lens elements in the optical system. Since the intent here is not to rely on the precise qualities of the optical system I chose to ignore these elements and instead treat them as spherical.

Ray tracing an Optical System

The first step towards modeling lens flares is to establish the optical system we will be evaluating. This can be validated by simply drawing (in 2D) the profile of the optical system. To perform the drawing you simply walk over the elements of the optical system, drawing each and then increment the draw cursor down the optical axis by the d value for that element. Each interface is drawn as an angular patch using the radius of curvature (r) and angle $\pm a$ given by $a = asin(\frac{h}{r})$. For the first pass I just took a guess at the various element heights making sure they didn’t intersect with the next elements. As mentioned before, we only care about the height of the aperture which will be computed. Here is what it the result should look like:

In Engine Drawing of the optical system. Red – Optical Axis, Green – Aperture Stop, Blue – Optical Interfaces defined by the tabular patent data.

Overlay of in engine representation and the patent diagram.

Next we will trace a single ray through the optical system considering only refraction, the pseudo code for this is:

OpticalInterface i = first;
while (i)
{
col = intersect(ray, i);
if (!col.valid)
break;

ray.pos += ray.dir * col.dist;
ray.dir = refract(ray.dir, col.norm, i.n1/i.n2);
i = i.next;
}

The result should look something like this:

Single ray making it’s way through the optical system

Be mindful that there can be elements with radius 0, in these cases you should just do a ray plane intersection. Once we have a single ray going through we can cast a bunch of rays to see if they all converge on the focal plane. It is a good time to mention that the patents are specified assuming that you are focusing on an object at infinity. This means that all rays entering the lens are travelling in parallel.

Rays focusing at the image plane

As you can see, the rays that make it through the optical system are focusing to a single point. The reason many rays are terminated prior to reaching the surface is that the effective aperture is much smaller than the first element. Recalling that we can compute the effective aperture by using the formula $N=\frac{f}{D}$, and that for the Nikon lens we found the $D \approx 10mm$. Now if we cast rays 5mm above and 5mm below the optical axis we can find determine the height of the aperture (green line) as the height of the ray when it passes through the aperture stop, in this case ~22mm diameter.

Rays cast at the edge of the effective aperture to compute the physical aperture size.

To further validate the optical system, we can sweep rays through the full FOV range specified by the patent4. Remember it isn’t going to be perfect since we ignore aspherical elements but it should be relatively close. On a side note, this an interesting experiment because you can see that fewer rays make it through at steeper angles causing Vignetting5.

Rays from a direction focusing on the focal plane (from the canon zoom lens)

Using this as the final test we can consider the optical system validated. If you have something horribly wrong with it you would notice it at this point. From here we can start to trace the actual flares.

Flares/Ghosts

The first thing we need to do related to rendering flares is to enumerate the different paths that can end up back at the image plane. Since we lose a significant amount of intensity each time a ray reflects at an optical surface we will follow the papers and only consider rays which have bounced exactly twice. An additional consideration we can make is that a significant amount of light is lost when the light passes through the aperture stop. To factor this into the enumeration process we can say that we only care about light that passes the aperture stop exactly one time and bounces exactly two times. Another way of thinking about this is that we are only considering paths of light that bounce both times before the aperture or after the aperture. The pseudo code for enumerating the flares/ghosts is below:

a = getApertureIndex()
n = getNumElements()
for (i = 0; i < a; ++i)
for (j = i + 1; j < a; ++j)
OpticalGhost ghost;
ghost._reflections[0] = j;
ghost._reflections[1] = i;
ghosts.push_back(ghost);

for (i = a + 1; i < n; ++i)
for (int j = i + 1; j < n; ++j)
OpticalGhost ghost;
ghost._reflections[0] = j;
ghost._reflections[1] = i;
ghosts.push_back(ghost);

Now that we have a list of all possible ghosts in the system we need to modify the ray tracing to account for reflecting at a certain interface. I’ll leave that as an exercise for the reader since it is fairly straight forward. The result of tracing a single ghost should look like this:

This result suggests that we will have a flare at the center of the screen and it will be clipped by the aperture plane (thus aperture shaped). This will be the end of the ray tracing approach presented by the first paper but we will use it to confirm an alternative approach, Matrix Method.

Matrix Method

The matrix method is a general optics technique that approximates the effects of an optical system as a linear system. Since each optical interface is represented as a linear system the results can be concatenated into a single matrix describing the entire system. At the core approach of this is the paraxial approximation6. This approximation states that for small angles (< 10 degrees) of $\theta$:

$\sin \theta \approx \theta$
$\tan \theta \approx \theta$
$\cos \theta \approx 1$

Using this approximation we can represent the transforms of our optical system, namely translation(T), reflection(L), and refraction(R), as 2×2 matrices functioning on the height and angle of a ray: $\left[ \begin{array}{cc} H_{0}\\\theta_{0}\end{array} \right]$. For full derivations of these matrices see Matrix Methods in Paraxial Optics.

$T = \left[ \begin{array}{cc} 1&d \\ 0&1\end{array} \right]$

$R_{curved} = \left[ \begin{array}{cc} 1&0 \\ \frac{n_1 - n_2}{Rn_2}&\frac{n_1}{n_2}\end{array} \right]$

$R_{flat} = \left[ \begin{array}{cc} 1&0 \\ 0&\frac{n_1}{n_2}\end{array} \right]$

$L_{curved} = \left[ \begin{array}{cc} 1&0 \\ \frac{2}{R}&1\end{array} \right]$

$L_{flat} = \left[ \begin{array}{cc} 1&0 \\ 0&1\end{array} \right]$

So now instead of tracing N rays for each ghost we will simply concatenate the matrices that represent each optical interface and project a flare quad. Generally there will be pairs of transforms that translate to then refract by the interface, since we are talking about matrices the order is reverse $R_n T_n$. When we hit the interface that we should reflect at we multiply by $L_{n}$ and then as we progress backwards through the system we apply the inverse of the above pairs, $T_n R_{n}^{-1}$ (Translation is unaffected by the change in direction), until we hit the second reflection $L_{n}^{-1}$, at which point you resume concatenating the remaining pairs using the original $R_n T_n$. A simple example would be: $M = R_2 T_2 R_1 T_1 L_{0}^{-1} T_1 L_1 T_1 R_0 T_0$ representing a ray entering a 3 interface system translating to and refracting through the first element ($R_0 T_0$), translating and reflecting off the second ($L_1 T_1$), traveling back through the second ($T_1$), reflecting off the first and then propagating out of the system ($R_2 T_2 R_1 T_1 L_{0}^{-1}$).

Important! The inverse in this case doesn’t refer to the matrix inverse but rather the operation in the reverse direction i.e. $R(n_1, n_2, r)^{-1} = R(n_2, n_1, -r)$, $L(r)^{-1} = L(-r)$, and $T(d)^{-1} = T(d)$. This isn’t mentioned in the original paper and if you use the standard matrix inverse the results will be quite a bit off.

While the matrix method (yellow) isn’t as accurate as ray tracing (white) it is pretty close in most situations we care about.

Placing the ghosts

Using the matrices that describe the system, we generate a flare quad for the final flare rendering. To do this we actually need 2 matrices, the first is the matrix that describes the optical system up to the aperture stop ($M_a$), the second describes the optical system from the aperture stop to the sensor plane ($M_s$). $M_a$ will be used to find the entrance pupil of the ghost and $M_s$ will be used to find the projected aperture. If you recall from earlier, the entrance pupil is the aperture as seen from the front of the lens following the path of light that we care about. Essentially we want solve for the height of the entrance pupil $H_{e_1}$ and $H_{e_2}$ using the direction of the light $\theta_e$, the height of the aperture $h_a$, and the optical system matrix of the lens prior to the aperture $M_a$.

$\left[ \begin{array}{cc} H_{a_1} \\ \theta_{a_1}\end{array} \right] = \left[ \begin{array}{cc} H_{e_1} \\ \theta_{e}\end{array} \right] \times M_{a}$

Disregarding $\theta_{a_1}$ since we only care about the height of the entrance pupil we can expand the matrix multiply:

$H_{a_1} = H_{e_1} M_{a00} + \theta_{e} M_{a10}$ (column major)

solving for $H_{e_1}$:

$H_{e_1} = \frac{H_{a_1} - \theta_{e} M_{a10}}{M_{a00}}$

Using this equation and plugging in $\pm H_{a}$ we can find tightly fitting bounds for the aperture. These bounds are then projected through the entire optical system ($M_s \times M_a$) to find the final flare quad position on the sensor. As a final step the positions are projected on the sensor’s image circle, which for 35mm film is $\approx 43.3mm$ diameter, to get the final normalized coordinates and size.

These values represent offets along the vector from the center of the screen to the light position (in my case the sun). I found the screen space position of the sun by projecting a point a far distance from the camera in the direction of the sun although I’m sure there are more accurate ways. The starting angle $\theta_e$ is simply $\theta_e = \cos^{-1} (E \cdot L)$ where $E$ is the forward vector of the camera in world space and $L$ is the direction of the sun. The resulting pseudo code looks like this:

Theta_e = acosf(dot(E, L));
Axis = Project(CameraPos + L * 10000.0).Normalize();
H_a = GetApertureHeight();
foreach ghost
// Aperture projected onto the entrance
H_e1 = (H_a - ghost.M_a[1][0] * Theta_e) / ghost.M_a[0][0];
H_e2 = (-H_a - ghost.M_a[1][0] * Theta_e) / ghost.M_a[0][0];

// Flare quad bounds projected through the system
H_p1 = ((ghost.M_s * ghost.M_a) * Vector2(H_e1, Theta_e)).x;
H_p2 = ((ghost.M_s * ghost.M_a) * Vector2(H_e2, Theta_e)).x;

// Project on to image circle
H_p1 /= 21.65;
H_p2 /= 21.65;

Center = (H_p1 + H_p2) / 2;
Radius = abs(H_p1 - H_p2) / 2;

Transform *= CreateTranslation(Axis * Center);

RenderFlare(Transform);

Intensity and Color

Once we have the placement of the ghost we need to determine what the intensity and color should be. I chose to deviate from the papers slightly for both of these values. For the intensity I chose to make the intensity relative to that of the optical system’s effective aperture. I calculate it by determining $M_a$ for the optical system with out introducing any reflections. Then I calculate the projected size of the aperture with $\theta_e = 0$, since it is symmetrical you only need to do this once:

$e = \pm \frac{H_{a}}{M_{a00}}$

Then intensity, $I$, is initialized to the ratio between surface area of the ghost’s effective pupil vs the surface area of the system’s effective pupil. The intensity is then scaled by the final projected area of the flare quad. The final intensity represents the relative intensity compared to an in-focus pixel so you would multiply the light color by it to get your final pre-exposed color.

entrancePupil = H_a / system.M_a[0][0];
Intensity = Sqr(H_e1 - H_e2) / Sqr(2 * entrancePupil);
Intensity /= Sqr(2 * Radius);

Comparison of flare intensity at f/8.0 and f/16.0. The perceived brightness of the flares increase at the higher f-number at the same exposure even though the aperture is smaller. This is represented by the ratio of flare/entrance pupil.

With the intensity calculated the color must be determined. Once again I deviated slightly from the papers to do this. The main contributor to color for a flare is the anti-reflective coating on the lens. These coatings are designed to create an interference pattern and cancel out light that is reflected at each interface similar to noise cancelling headphones. The shift in color occurs because the coatings are designed to minimize reflections at a specific wavelength, generally a mid length wavelength like 500nm, this results in non-uniform reflection across the various wavelengths. I found the best explanation and reference here, this is the same as is presented in the papers. I deviate from the authors because rather than trace a single ray to determine the color, I chose to evaluate the reflectance at random angle per ghost (once for red, green, and blue). This produces a stable color per flare and makes it easier to ensure that the angle doesn’t exceed the critical angle of the system, resulting in invalid reflectance values.

// http://www.gen-opt.com/lamda_admin/Lamda_Edit/UploadFile/2011222112937198.pdf
// lambda	- Wavelength of light being tested for reflectance
// d		- Thickness of the anti-reflectance coating (lambda0 / 4.0f / n1) where
//				lambda0 is the wavelenght of light the coating was designed for, typically
//				a midrange wavelenght like green (550 nm)
// theta1	- Angle of incidence at the edge of the coating
// n1		- The Index of Refraction for the incoming material
// n2		- The Index of Refraction for the coating, max(sqrt(n1 * n2), 1.38) where 1.38
//				is the index of refraction for a common magnesium floride coating.
// n3		- The Index of Refraction for the outgoing material
static float Reflectance(float lambda, float d, float theta1, float n1, float n2, float n3)
{
// Apply Snell's law to get the other angles
float theta2 = asinf(n1 * sinf(theta1) / n2);
float theta3 = asinf(n1 * sinf(theta1) / n3);

float cos1 = cosf(theta1);
float cos2 = cosf(theta2);
float cos3 = cosf(theta3);

float beta = (2.0f * MATH_PI) / lambda * n2 * d * cos2;

// Compute the fresnel terms for the first and second interfaces for both s and p polarized
// light
float r12p = (n2 * cos1 - n1 * cos2) / (n2 * cos1 + n1 * cos2);
float r12p2 = r12p * r12p;

float r23p = (n3 * cos2 - n2 * cos3) / (n3 * cos2 + n2 * cos3);
float r23p2 = r23p * r23p;

float rp = (r12p2 + r23p2 + 2.0f * r12p * r23p * cosf(2.0f * beta)) /
(1.0f + r12p2 * r23p2 + 2.0f * r12p * r23p * cosf(2.0f * beta));

float r12s = (n1 * cos1 - n2 * cos2) / (n1 * cos1 + n2 * cos2);
float r12s2 = r12s * r12s;

float r23s = (n2 * cos2 - n3 * cos3) / (n2 * cos2 + n3 * cos3);
float r23s2 = r23s * r23s;

float rs = (r12s2 + r23s2 + 2.0f * r12s * r23s * cosf(2.0f * beta)) /
(1.0f + r12s2 * r23s2 + 2.0f * r12s * r23s * cosf(2.0f * beta));

return (rs + rp) * 0.5f;
}


Anti-reflection coatings is one of the areas where lens making has made significant strides in the past 20 years. Below is a comparison between a lens from the 80s and a lens from today.

Top: 1980’s Vivitar Zoom
Boom: 2014 Sony FE 28-70
One of the major differences between legacy and modern lenses is the quality and characteristics of anti-reflection coatings. Both images were taken with similar settings.

Flare Texture

Now that we have a color, intensity, and placement for the flare we need to generate a texture. This would be an ideal place to hand it over to an artist to whip something up but I don’t have that luxury at the moment so we will just have to make due. The flare texture should be a projection of the aperture’s shape. On most lenses this would be an n-gon, where n is the number of aperture blades. On higher end lenses a lot of effort goes in to maintaining a circular aperture shape at all f-numbers. The simplest way to do either of these is to generate a small distance field texture, 64×64 in my implementation. If you simply want a circular aperture then the distance is just:

distance = length(In.Texcoord * 2.0f - 1.0f);

If you want to have an n-gon then you can calculate the distance as the max distance of the coordinate dotted with each axis, where the axis is a vector that is perpendicular to the aperture edge:

distance = 0
coord = In.Texcoord * 2.0f - 1.0f;
foreach axis
distance = max(distance, dot(axis, coord));

This is the most flexible representation but you could also exploit symmetry to execute less dot products. For example, an octagon can be represented with 1 dot and 2 max operators instead of 8 dot products and 8 max operators.

 coord = abs(Texcoord * 2.0f - 1.0f);
distance = max(coord.x, coord.y);
distance = max(distance, dot(coord, float2(0.70710678118f, 0.70710678118f)));

Smoothing the result

Application of smoothing removes the peaks at the corners in the faded version of the aperture. k = 0.1 for the right hand side.

Directly using this shape will give acceptable results but when applying a falloff based on the distance field some peaks at aperture edges can be seen. To eliminate these I simply applied the smooth max operator, adapted from Inigo Quilez’s Smooth Min article and the identity $max(a,b) = -min(-a, -b)$.

// polynomial smooth min
// from: http://www.iquilezles.org/www/articles/smin/smin.htm
float smin(float a, float b, float k)
{
float diff = b - a;
float h = saturate(0.5 + 0.5 * diff / k);
return b - h * (diff + k * (1.0f - h));
}

float smax(float a, float b, float k)
{
float diff = a - b;
float h = saturate(0.5 + 0.5 * diff / k);
return b + h * (diff + k * (1.0f - h));
}


Using this technique you can get lots of different source aperture shapes.

Various aperture shapes obtained using the smoothed distance field approach

Entrance Clipping

To get the final shape of the flare we need to account for the case when our effective pupil lies outside the physical entrance heights. In this case we should be clipping the aperture shape with the entrance shape. Incorporating this effect is relatively simple and can break up the otherwise repeating effect. Pseudo code follows:

...
H_e = GetEntranceHeight();
foreach ghost
...
// Entrance projected on to the aperture
E_a1 = (ghost->_ma * Vector2(H_e, thetaE)).x;
E_a2 = (ghost->_ma * Vector2(-H_e, thetaE)).x;

// Move the projection into aperture space as a position and radius
float C_e = ((e_a1 + e_a2) * 0.5) / H_a - Center;
float R_e = (abs(E_a1 - E_a2) * 0.5) / H_a;

These two parameters are uploaded along with the screen space light direction and are combined with the distance field using the same smooth max approach as before:

 // Sample the distance field
float fDistance = FlareTexture.SampleLevel(LinearSampler, Texcoord, 0.0f).x;

// Clip against the entrance in aperture space
float2 coord = Texcoord * 2.0f - 1.0f;

float fApertureMask = smoothstep(0.95, 0.80, fDistance);


Effects of entrance clipping on a single flare.

Final Results

This brings us to the end of what I decided to implement from the papers. Here are some final results using the info presented in the post.

This slideshow requires JavaScript.

Optimizations

The authors present some recommendations for performance such as culling flares whose intensity falls below a certain value or only rendering the top 40% flares. These ideas could be extended with rendering out the flares at multiple resolutions to avoid the fill rate cost of the larger flares that cover a majority of the screen.On the CPU side the system matrices only need to be computed once so there is no need to precompute their values.

Conclusion

The papers present an interesting method for representing a “Physically Based” lens flare model. If I were to implement it in a production engine I would leave the flare shape and colors as artist configurable and simply implement the flare placement and entrance clipping. The method outlined in this post has it’s limitations since it’s still representing the entire optical system with a linear approximation. Perhaps an interesting approach would be to apply this method to a majority of the flares in a system and then use the more expensive ray-tracing technique proposed by the first paper to represent complex flares. Well, I hope you enjoyed the first installment of Implementation Notes. If you liked it let me know and if you do this type of thing in your own time you should write about it!

Implementing a Physically Based Camera: Automatic Exposure

Implementing a Physically Based Camera:

1. Understanding Exposure
2. Manual Exposure
3. Automatic Exposure

Automatic exposure is the process of adjusting camera settings in response to a given scene to achieve a well exposed image. This is sometimes referred to as eye-adaptation in games.  However, the goal here is to obtain settings that will be used by other rendering effects such as Depth of Field and Motion Blur. As you will see, instead of directly adjusting the exposure, we will move through the camera settings in a couple different ways to reach our desired exposure. Assuming that we already have a linear HDR render target containing the scene, the automatic exposure process starts with metering.

Metering

Metering is the process of the camera measuring the current average scene luminance. In the most basic implementation it would simply calculate the log average of the scene; however, there are a few other methods of metering a scene. We will look at Spot, Center Weighted, and Multizone metering modes in addition to Average. Each of these methods weight the luminance contribution of the scene using a weighting function $w(x, y)$ resulting in a weighted average luminance calculation in the form:

$w_{total} = \sum_{x=0}^{Width} \sum_{y=0}^{Height} \mathrm{w}(x, y)$

$\log(L_{avg}) = \frac{\sum_{x=0}^{Width} \sum_{y=0}^{Height} \log(L_{x,y}) \times \mathrm{w}(x, y)}{w_{total}}$

Average metering

Average metering is the most simple, but also a very effective method, of metering. To perform average metering we simply assign every pixel in the scene the same weight, making our weighting function:

$\mathrm{w}(x,y) = {1}$

Spot metering

Spot metering involves measuring only the scene luminance that overlaps with a chosen region of the scene. Typically this is done in a small circle in the center of the image (1-5% of the image) but it can be performed by using any window of data. This type of metering is useful in photography when you wish to make a targeted object or region well exposed without worrying about the exposure of the rest of the scene.

$\mathrm{w}(x,y) = \begin{cases} 1 & \|P_{x,y} - S_{x,y}\| \leq S_{r}\\ 0 & \|P_{x,y} - S_{x,y}\| > S_r\end{cases}$

This slideshow requires JavaScript.

Center weighted metering

Center weighted metering gives more influence to scene luminance values located in the center of the screen. This type of metering was popular in the film days and has been carried over into digital.

$\mathrm{w}(x,y) = \mathrm{smooth}(\frac{\|P_{x,y} - C\|}{\frac{Width}{2}})$

where smooth(x) is a smoothing function such as smoothstep.

This slideshow requires JavaScript.

Matrix/Multizone metering

The final type of metering is called matrix or multizone metering depending on the manufacturer of the camera. Each manufacturer has a different proprietary formula for this type of metering, but the intention is more or less the same: prioritize exposure for the most “important” parts of the scene. Another way of thinking of this type of metering is informed. First, picture the scene broken up into a grid. Then processing is performed on each cell giving it a classification; i.e. by a function of its focus and min/max luminance. The results of all of the cells are read back and used to index into a database of exposure patterns to obtain a final exposure target. For the sake of this article, we will define our informed weighting function for multizone metering as a simple function of the distance from the focal plane and bypass the grid based approach:

$\mathrm{w}(x, y) = \frac{1}{0.1 + |z(x,y) - z_{f}|}$

where z(x,y) is the viewspace z at pixel <x, y> and $z_{f}$ is the viewspace z of the focal plane.

This slideshow requires JavaScript.

Comparison

The described metering modes result in a very different exposure from the same simple scene. There is no correct answer for what type of metering you should use but Average seems to be the most used currently. The Multi (informed) metering may be a good fit for certain types of games because the focal plane of the camera is often changed to emphasize objectives, or simply follows where the player is aiming. Here is a final comparison of the different modes.

Results using the mentioned metering modes in a simple rendering engine.

Comparing these results to that of a camera we can see that they behave similarly (with the exception of Multi due to proprietary weighting functions).

Real world results from various metering modes for a back lit scene.

Smoothing the results of metering

To create a smooth transition and avoid large changes in exposure in a small period of time, it is a good idea to smooth the metering results. This can be done by either recording N frames of average scene luminance and then taking average of averages. Alternatively you can use a simple exponential feedback loop to smooth the results of scene metering. I first saw this used in the Average luminance calculation using a compute shader1 post by MJP, which was originally from a SIGGRAPH paper about visual adaption models Time-Dependent Visual Adaptation For Fast Realistic Image Display2. It is really easy to implement and doesn’t involve storing a histogram or large buffer of previous values:

$L_{avg} = L_{avg} + (L_{new} - L_{avg}) * (1 - e^{-\Delta t \tau}))$

where $\Delta t$ is the delta time from the previous frame and $\tau$ is a constant that controls how fast the model adapts to a new target luminance (higher is faster). For my implementation I use $\tau=1$.

Applying the results from metering

As originally stated we want to use our metered scene value to change the camera settings in our physical camera model. We will do this through the use of the $EV_{100}$, or exposure value at ISO 100. To calculate the $EV_{100}$ from the metered scene luminance we will borrow some calculations from light meters.3

$EV_{100} = \log_{2}(\frac{L \times {100}}{K})$

where K is the reflected-light meter calibration constant, assumed to be 12.5.3 $EV_{100}$ relates to camera settings using the following formula:

$EV_{100}= \log_{2}(\frac{N^2 \times 100}{t \times S})$

We can use these functions to calculate our desired camera settings by fixing some of the variables to a user defined input or supplying an initial guess. Depending on the settings that become fixed, we will be putting the camera into one of three modes: Aperture Priority, Shutter Speed Priority, or Program Auto. For the following calculations to be valid, we first state that for each camera setting there exists a minimum and maximum value (either physical in the case of aperture or subjective in the case of ISO) that the camera model is bound to.

Exposure Compensation

Almost all cameras have an additional feature in them called exposure compensation($EV_{comp}$). This is a simple bias to correct the automatic exposure system. This compensation, specified in EV, adjusts the the $EV_{100}$ prior to executing the settings determination stage. To the end user, this allows them to inform the metering system that there is a discrepancy between what they are seeing and what the camera is capturing. Obviously the camera doesn’t know the intended subject, so it might be fooled into exposing for a neon sign rather than a person standing in front of it. By dialing in the exposure compensation, the user can get the image as they see it. To factor in exposure compensation we just have to modify the original $EV_{100}$ to reach the final target exposure value, $EV_{t}$:

$EV_{t}=EV_{100}-EV_{comp}$

It makes sense to subtract the exposure compensation because for a positive adjustment we are telling the exposure adjustment system that average scene is darker than it actually is, resulting in a higher overall exposure. This target EV is then fed in to one of the adjustment modes. One interesting thing to note in the series of images below, is that we are applying a modification of 1 EV which represents a change in the amount of light by a factor of 2. This results in the ISO setting changing between each shot, in the -1 EV we are at ISO 100, 0 – ISO 200, +1 ISO 400. This confirms that we are capturing twice as much light between each image.

Comparison of the same scene shot with different exposure compensation values.

Aperture Priority

Aperture priority mode allows the user to select an aperture for aesthetic reasons and the camera will adjust the remaining variables to obtain a well exposed image. Since we only have one equation and two remaining unknowns we will have to make an initial guess. There is a general rule that it is preferable to have a shutter speed that is faster than $\frac{1}{f}$, where f is the focal length of the lens in mm. The intended effect of this rule is to negate the camera shake that is introduced by the hands of an average photographer. If you look back at the exposure compensation images (shot at 70mm) the shutter speed chosen by the camera was $\frac{1}{80}$ which is the closest shutter speed, price is right rules, that satisfies $\frac{1}{f}$. This rule is the basis for the initial shutter speed, $t_{i}$:

$t_{i} = \frac{1}{f}$

Given the initial shutter speed and user defined final aperture, $N_{f}$, we can determine the desired ISO:

$S_{d} = \frac{N_{f}^2 \times 100}{t_{i} \times 2^{EV_{t}}}$

The ISO is then clamped to the range provided by the camera to find the final ISO value.

$S_{f} = \mathrm{clamp}(S_{d}, S_{min}, S_{max})$

With the final aperture and the final ISO known we can adjust our initial shutter speed guess to arrive at the final camera settings. To do this we compute the exposure value obtained with the current settings:

$EV_{c} = \log_{2}(\frac{N_{f}^2 \times S_{f}}{t_{i} \times 100})$

We use this to figure out the difference from the original target exposure value:

$EV_{diff} = EV_{t} - EV_{c}$

Recalling that leaving the shutter open for twice as long will allow twice as much light into the sensor (i.e. represent a 1 EV shift), we can calculate the desired shutter speed.

$t_{d} = t_{i} \times 2^{-EV_{diff}}$

Once again we have to clamp the desired value to the range that the camera is capable of:

$t_{f} = \mathrm{clamp}(t_{d}, t_{min}, t_{max})$

Shutter Speed Priority

Similarly to the aperture priority mode, the user provides the final value for the shutter speed ($t_f$). This time we need to make an initial guess for the aperture. Through inspecting the behavior of my personal camera f / 4.0 is a good place to start for the initial aperture ($N_i$). Following the aperture priority example, we first find the final ISO, and then the difference in exposure value from our target. Finally, we adjust our initial aperture guess to arrive at our final settings. Once again, recalling that increasing the surface area of the aperture opening by a factor of 2 will double the amount of light hitting the sensor (1 EV shift), and assuming that the aperture is round, it can be figured out that the amount the radius of the aperture by a factor of $\sqrt{2}$. Using this relationship, we apply the adjustment and clamp the aperture in similar fashion:

$N_{d} = N_{i} \times {\sqrt{2}}^{EV_{diff}}$

$N_{f} = \mathrm{clamp}(N_{d}, N_{min}, N_{max})$

Note: The sign changes on the adjustment factor because the f-number (N) is actually the ratio of the focal length to the aperture diameter $N = \frac{f}{D}$.

Program Auto

Program auto mode is a hybrid of the first two approaches. It aims to keep the shutter speed at a sufficient speed and set the aperture for a reasonable depth of field. We will follow the same approach as the previous two examples except that we will guess both the aperture, and the shutter speed, using the same best guesses as before. This time, when we go to apply $EV_{diff}$ we will divide the adjustment in half and apply it to the aperture. We then recompute the EV_{diff} with the adjusted aperture and apply the remaining adjustment to the shutter speed. This can be seen in the code example below.

Implementing automatic exposure in code

As explained in the first post, the aperture, shutter speed, and ISO are properties of the lens, camera body, and sensor respectively. The following values will be used:

$t_{min} = \frac{1}{4000}s$
$t_{max} = \frac{1}{30}s$
$N_{min} = 1.8$
$N_{max} = 22.0$
$S_{min} = 100$
$S_{max} = 6400$

// References:
// http://en.wikipedia.org/wiki/Film_speed
// http://en.wikipedia.org/wiki/Exposure_value
// http://en.wikipedia.org/wiki/Light_meter

// Notes:
// EV below refers to EV at ISO 100

// Given an aperture, shutter speed, and exposure value compute the required ISO value
float ComputeISO(float aperture, float shutterSpeed, float ev)
{
return (Sqr(aperture) * 100.0f) / (shutterSpeed * powf(2.0f, ev));
}

// Given the camera settings compute the current exposure value
float ComputeEV(float aperture, float shutterSpeed, float iso)
{
return Log2((Sqr(aperture) * 100.0f) / (shutterSpeed * iso));
}

// Using the light metering equation compute the target exposure value
float ComputeTargetEV(float averageLuminance)
{
// K is a light meter calibration constant
static const float K = 12.5f;
return Log2(averageLuminance * 100.0f / K);
}

void ApplyAperturePriority(float focalLength,
float targetEV,
float& aperture,
float& shutterSpeed,
float& iso)
{
// Start with the assumption that we want a shutter speed of 1/f
shutterSpeed = 1.0f / (focalLength * 1000.0f);

// Compute the resulting ISO if we left the shutter speed here
iso = Clamp(ComputeISO(aperture, shutterSpeed, targetEV), MIN_ISO, MAX_ISO);

// Figure out how far we were from the target exposure value
float evDiff = targetEV - ComputeEV(aperture, shutterSpeed, iso);

// Compute the final shutter speed
shutterSpeed = Clamp(shutterSpeed * powf(2.0f, -evDiff), MIN_SHUTTER, MAX_SHUTTER);
}

void ApplyShutterPriority(float focalLength,
float targetEV,
float& aperture,
float& shutterSpeed,
float& iso)
{
// Start with the assumption that we want an aperture of 4.0
aperture = 4.0f;

// Compute the resulting ISO if we left the aperture here
iso = Clamp(ComputeISO(aperture, shutterSpeed, targetEV), MIN_ISO, MAX_ISO);

// Figure out how far we were from the target exposure value
float evDiff = targetEV - ComputeEV(aperture, shutterSpeed, iso);

// Compute the final aperture
aperture = Clamp(aperture * powf(Sqrt(2.0f), evDiff), MIN_APERTURE, MIN_APERTURE);
}

void ApplyProgramAuto(float focalLength,
float targetEV,
float& aperture,
float& shutterSpeed,
float& iso)
{
// Start with the assumption that we want an aperture of 4.0
aperture = 4.0f;

// Start with the assumption that we want a shutter speed of 1/f
shutterSpeed = 1.0f / (focalLength * 1000.0f);

// Compute the resulting ISO if we left both shutter and aperture here
iso = Clamp(ComputeISO(aperture, shutterSpeed, targetEV), MIN_ISO, MAX_ISO);

// Apply half the difference in EV to the aperture
float evDiff = targetEV - ComputeEV(aperture, shutterSpeed, iso);
aperture = Clamp(aperture * powf(Sqrt(2.0f), evDiff * 0.5f), MIN_APERTURE, MIN_APERTURE);

// Apply the remaining difference to the shutter speed
evDiff = targetEV - ComputeEV(aperture, shutterSpeed, iso);
shutterSpeed = Clamp(shutterSpeed * powf(2.0f, -evDiff), MIN_SHUTTER, MAX_SHUTTER);
}


Other Considerations for Exposure

Filters

If a photographer is presented with a tricky exposure, they always have the option of using filters on their lens. Filters serve many different purposes: from decreasing the incoming light to reducing reflections through polarization. A popular type of filter used in sunny situations is the Natural Density, or ND filter. This type of filter blocks a percentage of light as it enters the camera, usually specified in stops; so a 3 stop ND filter would reduce the scene brightness by a factor of $2^3$. These are useful when the light is so bright that it becomes impossible to open the aperture past a certain point. To factor these into the camera model, we would simply multiply the rendered scene before it goes into metering or the rest of the post processing pipeline. Another type of filter that is used in landscape photography is a graduated ND filter. Think of this filter as a gradient, from bottom to top, where the effect of the ND filter is weaker at the bottom and stronger at the top. These are used to reduce the brightness of the sky so the values do not clip to white when exposing for foreground objects.

Looking at a light through an ND filter brings it back into a range that the camera is able to capture

Valid settings

We make the assumption above that the camera settings can be anything between the min and max settings. On a still camera this is often not the case. Most of the time the shutter speed will have a number of preset speeds that it can operate at, moving in a fixed EV increment like +-0.3. Similarly, aperture will have discrete settings following 0.3 or 0.5 EV. One exception to this is on lenses designed for cinema. These have a “De-Clicked” aperture ring, which means, the aperture can be any value between min and max, allowing the operator to adjust the exposure smoothly with-in a single take. This doesn’t need to be explicitly considered but it may suggest that placing the camera in shutter priority may make the most sense.

Conclusion

Automatic exposure in games is likely here to stay. It would be extremely time consuming, and in some cases, completely impossible to provide camera settings and exposure values for every situation in a game. Hopefully this article provided some insight into how a camera deals with auto-exposure and how it relates back to the camera model settings. In the very least, it may give you some additional ideas for constructing a better interface to control the look and feel of the scene through the use of exposure compensation, setting priority modes, and scene metering modes. Now that we have a well exposed scene that can adjust to various lighting conditions we will start to use the camera settings in some post effects. Stay tuned for the next post!

Implementing a Physically Based Camera: Manual Exposure

Implementing a Physically Based Camera

1. Understanding Exposure
2. Manual Exposure

The first post in the series described the three key properties of exposure: Aperture (N), Shutter Speed (t), and ISO (S). In this post we will implement manual exposure, meaning that all 3 values are provided by the application or user. Once again this series of posts is heavily inspired by “Moving Frostbite to Physically Based Rendering” 1 where they cover almost everything presented here in the course notes.

From camera settings to pixel value

To convert from the provided camera settings to a final pixel value we need to know two things: The photometric exposure (H) (amount of light reaching the sensor) and the ISO rating that results in a well exposed image. The photometric exposure is a function of the incoming scene luminance (L), the optics in the lens, the aperture of the lens, and the time that the sensor is exposed to to the light.

$H = \frac{q \times L \times t}{N^2}$

q represents the properties of lens optics and is defined as follows:

$q = \frac{\pi}{4} \times T \times v(\theta) \times cos^4(\theta)$

Where T is the transmission of the lens (the percent of light that is not absorbed by the optics in the lens), v is the vignetting factor (the characteristic of light falloff from the center of the image to the edges), and $\theta$ is the angle relative to the axis of the lens. To simplify the following equations we will assume the standard q value of 0.65 ($T = 0.9, v = 0.98, \theta = 10^{\circ}$).

Once the photometric exposure is known the sensors ISO rating can be measured. The ISO standards define five different ways of measuring the ISO rating of a digital sensor. This article will use two of these methods: Saturation-based Speed($S_{sat}$) and Standard Output Sensitivity($S_{sos}$). These methods will be used to solve backwards from a known ISO value (S) to the representative scene luminance and then into an exposure scalar that we can render with.

Saturation-based speed

Saturation-based speed rates the ISO using the maximum photometric exposure ($H_{sat}$) that doesn’t lead to pixel values greater than 255/255 (i.e. saturated).

$S_{sat} = \frac{78}{H_{sat}}$

The value 78 was chosen represent an average pixel value of $\frac{18\%}{\sqrt{2}}$ where the $\sqrt{2}$ allows for headroom for highlights2. As stated, the saturation-based speed method is solving for the maximum luminance, making the substitution with the photometric exposure we have:

$H_{sat} = \frac{0.65 \times L_{max} \times t}{N^2}$

$S_{sat} = \frac{7800}{65} \times \frac{N^2}{L_{max} \times t}$

Solving for $L_{max}$:

$L_{max} = \frac{7800}{65} \times \frac{N^2}{S_{sat} \times t}$

We would solve for an exposure (E) by scaling the luminance such that $L_{max} \times E \rightarrow 1.0$

$E = \frac{1.0}{L_{max}}$

Standard output sensitivity

The standard output sensitivity method is similar the the saturation based method but solves for the exposure that results in pixel values of 118/255 (0.18 encoded with sRGB into an 8-bit pixel).3

$S_{sos} = \frac{10}{H_{sos}}$

Similar to the previous method we will substitute in the photometric exposure but we will solve for $L_{avg}$ instead.

$H_{sos} = \frac{0.65 \times L_{avg} \times t}{N^2}$

$S_{sos} = \frac{1000}{65} \times \frac{N^2}{L_{avg} \times t}$

$L_{avg} = \frac{1000}{65} \times \frac{N^2}{S \times t}$

This time when solving for the exposure we determine what exposure would yield 18% ($L_{avg} \times E \rightarrow 0.18$).

$E = \frac{0.18}{L_{avg}}$

As you can see the two equations are nearly identical in how they measure the ISO speed rating. In fact, if we were to state that $\frac{18\%}{\sqrt{2}}$ should represent the average luminance instead of $18\%$ and we assumed that we were simply applying a gamma curve to a linear input then they are equivalent!

$L_{avg} = 0.127 * L_{max}$

$0.127 * L_{max} = \frac{1000}{65} \times \frac{N^2}{S \times t}$

$L_{max} \approx \frac{7800}{65} \times \frac{N^2}{S \times t}$

Tonemapping Considerations

The intention of these exposures is to end up with a well exposed image (i.e. an average pixel value of 118).18% is chosen because it is assumed that a standard gamma curve is being applied: $\frac{118}{255}^{2.2} = 0.18$. However; this is not true in most games. Instead we map the scene luminance values using a tonemapping operator (I suggest checking out John Hable’s post if you need some background information). This means that we should modify our exposure calculations to compensate for that. This can be done to both SBS and SOS measurements but it is easier to see how it would integrate with the SOS method.

As the last step of converting the luminance value into an exposure we use 18% as the middle grey value. To adjust for your tonemapping operator we just need to solve for the input value that results in a pixel value of 118/255.

$T(x) = \frac{118}{255}$

We do this by either applying the inverse tonemapping operator (if your tonemapper can be inverted), or use some graphing software to solve for the intersection of $y=\frac{118}{255}$.

Using the optimized tonemapper by Jim Hejl and Richard Burgess-Dawson 4 $x \approx 0.148$. Substituting this value in to find the exposure we end up with:

$E = \frac{0.148}{L_{avg}}$

These equations are extremely straightforward to implement in code:

/*
* Get an exposure using the Saturation-based Speed method.
*/
float getSaturationBasedExposure(float aperture,
float shutterSpeed,
float iso)
{
float l_max = (7800.0f / 65.0f) * Sqr(aperture) / (iso * shutterSpeed);
return 1.0f / l_max;
}

/*
* Get an exposure using the Standard Output Sensitivity method.
* Accepts an additional parameter of the target middle grey.
*/
float getStandardOutputBasedExposure(float aperture,
float shutterSpeed,
float iso,
float middleGrey = 0.18f)
{
float l_avg = (1000.0f / 65.0f) * Sqr(aperture) / (iso * shutterSpeed);
return middleGrey / l_avg;
}


Validating the results

We haven’t talked about converting lighting units to photometric units yet (but we will) so to validate that we have decent exposure settings we will turn to the Sunny 165 rule. This rule states that when the aperture is set to f/16.0 and the shutter speed equals the inverse of the ISO we should have a good exposure for a typical sunny day. From tabular data we find that the expected EV for a sunny day respecting this rule would be around 15. Again from tabular data a standard lightmeter reading of 15 EVs would represent an average luminance of $\approx 4096$.6 Substituting into:

$L_{avg} = \frac{1000}{65} \times \frac{N^2}{S \times t}$

$L_{avg} = \frac{1000}{65} \times \frac{16^2}{100 \times \frac{1}{100}}$

$L_{avg} \approx {3938}$

This value is very close to what we would expect to expose for so we are in the ball park. And here is a test scene exposed using the Sunny 16 rule. The average scene luminance is $\approx 4000$ and we end up with a well exposed image.

In the next post in the series we will cover the various methods for implementing automatic exposure.

Series Introduction

This series of posts are inspired by the excellent “Moving Frostbite to Physically Based Rendering” 1 presentation from SIGGRAPH 2014. In the presentation, they describe the decision to move to a physically based camera to support and validate the use of real-world lighting values.

Implementing a physically based camera provides familiarity to the player and content creators. Cameras function as a system of interdependent variables acting in unison to create the resulting image. If your camera system and post effects do not act coherently, it will strike the player as abnormal in a way that will just seem “off.”

This decision would also allow for working with photometric values and leveraging the years of photographic knowledge and experience widely available.

Exposure

To implement a camera in code it is essential that you have a good understanding of the settings of a camera that affect the exposure of the final image. The exposure of an image describes the amount of light that is captured. There are three things that can influence the exposure of a image: Shutter Speed, Aperture, and ISO. Each of these can be adjusted to allow more or less light in the final image. The combination of these settings is often referred to as the Exposure Value, or EV.2 EV has a $log_{2}$ relationship with the amount of light in the exposed image so an increase of 1 in the EV would represent a doubling of the amount of light being captured ( 1 EV is commonly referred to as 1 stop of light).  All of the camera settings that will be covered in this post can be adjusted up or down by a number of stops of light. If the photographer wishes to allow the same amount of light in but change one of the settings they must compensate by the inverse on another setting to make sure the final EV matches.

Shutter Speed

The shutter speed indicates how long the sensor or film of a camera is actively collecting light. Typically when pressing the shutter button on a film camera three things will happen:

1. The mirror that redirects the image to the view finder will flip up and out of the way.
2. The shutter will open exposing the film.
3. The shutter will close completing the capture.

Here it is in action on an old Konica film camera.

Things change slightly when talking about digital cameras. For instance, the first only exists for standard dslr cameras and not mirrorless cameras. The second step may also  be represented by an electronic shutter instead of a physical one.  This is basically the camera performing a clear on the sensor. The last step for most cameras is the same: the amount of time it takes to read the results back from the sensor requires that no additional light is being collected during that time. Shutter speed,$S$, is typically expressed in fractions of a second (1/60th, 1/200th) but can some times be referred to by just the denominator 60 or 200.

The amount of light in the resulting image is linearly proportional to the time the shutter is open, i.e. if you double the shutter speed from 1/30th to 1/15th of a second you will collect double the amount of light. The effect of increasing the shutter time will also influence how much motion is visible in an image. In games we use motion blur to represent this sub frame motion. If you don’t have motion blur, then you are essentially capturing the scene with an infinitely fast shutter speed. This type of look might not appear too unrealistic in a bright outdoor scene but may appear abnormal in lower light shots where it is hard to have the shutter speed that high.

Aperture

The aperture, commonly referred to as f-number($N$), represents the ratio of the focal length($f$) of the lens to the diameter of the lens opening($D$), $N=\frac{f}{D}$. The larger the opening, the more light enters the camera and is captured by the film or sensor. Since the aperture is assumed round, the amount of light doubles when the diameter of the opening increases by a factor of $\sqrt{2}$.3

The choice of aperture gives the photographer the ability to vary the depth of field of the image. Just to clarify, the depth of field is referring to the portion of the scene that is in focus, while the shapes and blur of the area out of focus is called bokeh.4 Also in focus is completely subjective and is based on viewing a photograph from a normal distance.5 To control the amount of light and depth of field, lenses have manual or electronic diaphragms whose size can be adjusted via an aperture ring. On older and manual lenses, the f-number is labeled on the aperture ring and common values are f/1.4, f/2.0, f/2.8, f/4.0, f/5.6, f/8, f/11, and f/16 (these numbers represent increases of one stop of light).

In games, the aperture has a direct impact on the settings supplied to the depth of field post effect (which we will get into in a later entry). The presence of an aperture diaphragm also influences the aesthetic look of the post process (although many companies are now designing apertures that remain circular at every f-number so one could argue that a circular bokeh effect would actually simulate a much higher quality lens).

This slideshow requires JavaScript.

ISO

The final piece of the image exposure is the ISO or sensitivity of the sensor to incoming light. For digital cameras ISO is actually referring to the standard ISO 12232:2006 published by the *International Organization for Standardization”.6 This standard outlines different techniques for classifying the sensitivity of the sensor at various ratings. Once again the ISO value has a linear relationship with the amount of light in the final image. Typically ISO numbers will begin at 100 and will increase from there by a factor of 2 (100, 200, 400, 800, and so on). On a digital camera you can think of increasing the ISO as increasing the gain of the digital circuit that captures the light. And as with other analog to digital converters, this increase will introduce more noise into the image. In games we are lucky enough that we don’t have this problem and can have an arbitrarily high ISO (although once again you have to be careful not to produce something that the viewer is not accustomed to).

In the next post we will take a look at implementing manual exposure control.