Implementation Notes: Runtime Environment Map Filtering for Image Based Lighting

In this post I will cover implementing and optimizing runtime environment map filtering for image based lighting (IBL). Before we get started: this approach is covered in depth in Real Shading in Unreal Engine 4 and Moving Frostbite to Physically Based Rendering, and you should really check them out if you want more background and information than I provide here. Filtering environment maps at runtime has some benefits over offline filtered maps (using something like cubemapgen). One is that you can have more dynamic elements in your cubemaps, like dynamic weather and time of day. At GDC this year Steve McAuley mentioned in Rendering the World of Far Cry 4 that they perform environment map re-lighting and composite a dynamic sky box prior to filtering. Additionally, by filtering at runtime you might be able to use smaller textures (r10g11b10 vs r16g16b16a16) since you can use your current camera exposure information to pre-expose the contents into a more suitable range.

Overview


Image Based Lighting involves using the contents of an image (in this case an environment map) as a source of light. In the brute force implementation every pixel is considered a light source and is evaluated using a bidirectional reflectance distribution function (BRDF). The full integral that represents this is:

\int_{\Omega} L_i(l)\, f(l,v)\, \cos\theta_l \, dl \approx \frac{1}{N}\sum_{k=1}^{N}\frac{L_i(l_k)\, f(l_k,v)\, \cos\theta_{l_k}}{p(l_k,v)}

Numerical integration of environment lighting

The variant of filtering that we will be implementing is the split-sum approach taken by Karis. This approach approximates the integral above by integrating the lighting contribution and the rest of the BRDF separately. By doing this, the integral can be approximated using a cubemap lookup and a 2D texture lookup.

\frac{1}{N}\sum_{k=1}^{N}\frac{L_i(l_k)\, f(l_k,v)\, \cos\theta_{l_k}}{p(l_k,v)} \approx \left(\frac{1}{N}\sum_{k=1}^{N}L_i(l_k)\right)\left(\frac{1}{N}\sum_{k=1}^{N}\frac{f(l_k,v)\, \cos\theta_{l_k}}{p(l_k,v)}\right)

Split-sum approximation

Initial implementation


We will start with the implementation provided in Real Shading in Unreal Engine 4 since it is well documented and complete; I will be referring to this as the reference implementation. To reduce the number of samples needed, the authors of the mentioned papers use importance sampling to redistribute the samples into the important parts of the BRDF. Importance sampling is a very interesting topic, so if you want to learn more about the derivation check out Notes on Importance Sampling. For the BRDF integration we can stop with the reference implementation since it isn't re-integrated every frame. I'm also choosing to ignore the extension the Frostbite team made to bake the Disney Diffuse BRDF term into this texture, but everything here applies to that as well. The environment map filtering is where we want to focus our optimization efforts.

The first step is to render out a new cubemap to filter. For comparison purposes this article uses the cubemap from the Unreal presentation, found here. However, this is the point where you could do some cool stuff like re-lighting the environment map the way the Far Cry team does. As a note, some people use a geometry shader here to render to all faces of the cube in a single pass; for this example I just set up the camera for the appropriate face and rendered them out one at a time, as sketched below.

Foreach Face in Cube Faces
   Setup camera
   Render envmap
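
For reference, a minimal sketch of that per-face camera setup. The helper functions, types, and constants (CreateLookAt, CreatePerspectiveFov, SetCameraForFace, RenderEnvironmentMapFace, float3, Matrix, vProbePosition, fNearZ, fFarZ) are my own placeholders; the up vectors follow the usual D3D cubemap convention and may need flipping depending on your math library.

// Standard cubemap face order: +X, -X, +Y, -Y, +Z, -Z
static const float3 kFaceForward[6] = {
    { 1, 0, 0 }, { -1, 0, 0 }, { 0, 1, 0 }, { 0, -1, 0 }, { 0, 0, 1 }, { 0, 0, -1 } };
static const float3 kFaceUp[6] = {
    { 0, 1, 0 }, {  0, 1, 0 }, { 0, 0, -1 }, { 0, 0, 1 }, { 0, 1, 0 }, { 0, 1, 0 } };

for (uint32_t uFace = 0; uFace < 6; ++uFace)
{
    // A 90 degree FOV with a square aspect ratio makes the frustum cover exactly one face
    Matrix mView = CreateLookAt(vProbePosition, vProbePosition + kFaceForward[uFace], kFaceUp[uFace]);
    Matrix mProj = CreatePerspectiveFov(0.5f * MATH_PI, 1.0f, fNearZ, fFarZ);
    SetCameraForFace(uFace, mView, mProj);
    RenderEnvironmentMapFace(uFace);
}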

Next we implement the split-sum reference implementation provided by Karis. Initially the sample count is kept as provided in the course notes but that is going to be an area we focus on when we optimize.

float3 vPrefilteredColor = 0.0f;
float fTotalWeight = 0.0f;
const uint iNumSamples = 1024;
for (uint i = 0; i < iNumSamples; i++)
{
    float2 vXi = Hammersley(i, iNumSamples);

    float3 vHalf = ImportanceSampleGGX(vXi, fRoughness, vNormal);
    float3 vLight = 2.0f * dot(vView, vHalf) * vHalf - vView;

    float fNdotL = saturate(dot(vNormal, vLight));
    if (fNdotL > 0.0f)
    {
        vPrefilteredColor += EnvironmentCubemapTexture.SampleLevel(LinearSamplerWrap, vLight, 0.0f).rgb * fNdotL;
        fTotalWeight += fNdotL;
    }
}
return vPrefilteredColor / fTotalWeight;
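
The Hammersley and ImportanceSampleGGX helpers come from the course notes and the Hammersley Points on the Hemisphere article linked in the resources. For completeness, here is a sketch of the Hammersley sequence in HLSL using the standard radical inverse via bit reversal (function names are mine):

float RadicalInverse_VdC(uint uBits)
{
    // Reverse the bits of a 32-bit integer and map the result to [0, 1)
    uBits = (uBits << 16u) | (uBits >> 16u);
    uBits = ((uBits & 0x55555555u) << 1u) | ((uBits & 0xAAAAAAAAu) >> 1u);
    uBits = ((uBits & 0x33333333u) << 2u) | ((uBits & 0xCCCCCCCCu) >> 2u);
    uBits = ((uBits & 0x0F0F0F0Fu) << 4u) | ((uBits & 0xF0F0F0F0u) >> 4u);
    uBits = ((uBits & 0x00FF00FFu) << 8u) | ((uBits & 0xFF00FF00u) >> 8u);
    return float(uBits) * 2.3283064365386963e-10f; // 1 / 2^32
}

float2 Hammersley(uint i, uint iNumSamples)
{
    return float2(float(i) / float(iNumSamples), RadicalInverse_VdC(i));
}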

The CPU side looks something like this.

Foreach Face in Cube Faces
    Setup camera
    Bind Envmap
    For i = 0; i < MipLevels; ++i
        Set roughness for shader
        Execute filtering

To verify that I didn’t screw anything up I like to reproduce and compare at least one of the diagrams.

Comparison with course notes from Real Shading in Unreal Engine 4.

There is some slight darkening along the rim of the middle gloss spheres due to my viewing angle being slightly off, a difference in exposure, and a different tonemapping operator, but otherwise we are at a good starting point.

Areas for optimization


I’m not as crazy as some people when it comes to optimization but here are some questions I ask myself when I’m looking at a bit of code like this.

  • What can be done to reduce the amount of work done globally?
  • What can be done to reduce the number of samples?
  • What can be done to reduce the amount of work per sample?
  • How can we use any assumptions to our advantage?
  • How can we make sure that we are getting the most out of every sample?
  • What parts of this can scale?

Luckily the Frostbite guys have already looked into a few of the points above so we can pull from the implementation they provide.

Reduce the amount of work done globally

This one is easy and is basically just a footnote in Moving Frostbite to PBR. Since the first mip is essentially our mirror reflection, we can skip processing that mip level altogether and just copy the generated environment map into it. Note that we still keep the generated texture around and copy it into our final texture; this becomes more important in the next step. The code now looks like:

Foreach Face in Cube Faces
    Setup camera
    Bind Envmap
    Copy face into mip 0
    For i = 1; i < MipLevels; ++i
        Set roughness for shader
        Execute filtering

Reduce the number of samples

So 1024 samples is certainly more than we want to be doing per pixel in our shader. I would say that 16-32 is a more reasonable number of samples to take. Let’s see what happens if we just drop the number of samples in our shader:

32 Samples

It’s obvious as the roughness increases that 32 samples isn’t going to be enough without some help. This is where filtered importance sampling comes into play. I’m not an expert on importance sampling in general, but it was pretty easy to follow along with the FIS implementation in the Frostbite course notes. The basic idea is that, because of importance sampling, fewer samples are taken in certain directions due to the distribution of the BRDF. To combat the undersampling, the mip chain of the source environment map is sampled. The mip is determined by the ratio between the solid angle that the sample represents (OmegaS) and the solid angle that a single texel at our highest resolution covers (OmegaP). Implementing this in the example means that the environment map generation gets a second step of generating mips; additionally, for each sample we have to compute OmegaS and OmegaP and then finally the required mip.

Foreach Face in Cube Faces
   Setup camera
   Render envmap

GenerateMips envmap

float3 vPrefilteredColor = 0.0f;
float fTotalWeight = 0.0f;
const uint iNumSamples = 32;
for (uint i = 0; i < iNumSamples; i++)
{
    float2 vXi = Hammersley(i, iNumSamples);

    float3 vHalf = ImportanceSampleGGX(vXi, fRoughness, vNormal);
    float3 vLight = 2.0f * dot(vView, vHalf) * vHalf - vView;

    float fNdotL = saturate(dot(vNormal, vLight));
    if (fNdotL > 0.0f)
    {
        // Vectors to evaluate pdf
        float fNdotH = saturate(dot(vNormal, vHalf));
        float fVdotH = saturate(dot(vView, vHalf));

        // Probability Distribution Function
        float fPdf = D_GGX(fNdotH) * fNdotH / (4.0f * fVdotH);

        // Solid angle represented by this sample
        float fOmegaS = 1.0 / (iNumSamples * fPdf);

        // Solid angle covered by 1 pixel with 6 faces that are EnvMapSize X EnvMapSize
        float fOmegaP = 4.0 * fPI / (6.0 * EnvMapSize * EnvMapSize);
        // The original paper suggests biasing the mip to improve the results
        float fMipBias = 1.0f;
        float fMipLevel = max(0.5 * log2(fOmegaS / fOmegaP) + fMipBias, 0.0f);
        vPrefilteredColor += EnvironmentCubemapTexture.SampleLevel(LinearSamplerWrap, vLight, fMipLevel).rgb * fNdotL;
        fTotalWeight += fNdotL;
    }
}
return vPrefilteredColor / fTotalWeight;

This gives us a much better result for our high roughness spheres, although it is still a little chunky.

32 Samples With Filtered Importance Sampling

Precompute as much as possible

The previous change had the unfortunate side effect of adding lots of math to the inner loop. So let’s focus our attention there and see if anything can be pre-computed. First, it should be easy to see that we don’t need to be computing the Hammersley sequence for every single texel in the output; the values are the same for every pixel. With that in mind, let’s see how the sequence value is used:

float3 vHalf = ImportanceSampleGGX(vXi, fRoughness, vNormal);

...

float3 ImportanceSampleGGX(float2 vXi, float fRoughness, float3 vNormal)
{
    // Compute the local half vector
    float fA = fRoughness * fRoughness;
    float fPhi = 2.0f * fPI * vXi.x;
    float fCosTheta = sqrt((1.0f - vXi.y) / (1.0f + (fA*fA - 1.0f) * vXi.y));
    float fSinTheta = sqrt(1.0f - fCosTheta * fCosTheta);
    float3 vHalf;
    vHalf.x = fSinTheta * cos(fPhi);
    vHalf.y = fSinTheta * sin(fPhi);
    vHalf.z = fCosTheta;

    // Compute a tangent frame and rotate the half vector to world space
    float3 vUp = abs(vNormal.z) < 0.999f ? float3(0.0f, 0.0f, 1.0f) : float3(1.0f, 0.0f, 0.0f);
    float3 vTangentX = normalize(cross(vUp, vNormal));
    float3 vTangentY = cross(vNormal, vTangentX);
    // Tangent to world space
    return vTangentX * vHalf.x + vTangentY * vHalf.y + vNormal * vHalf.z;
}

This function can be broken up into two distinct parts. The first part computes a local space half vector based off the sampling function and the roughness. The second part generates a tangent frame for the normal and rotates the half vector into that frame. Since the first part is a function of only the random value (from the Hammersley sequence) and the roughness, the local space sample directions can be precomputed. It should also be clear now that we are wasting lots of cycles recomputing the tangent frame for every sample. So we can hoist that outside of the loop, and just to be complete we can put it into a matrix and rotate the incoming local space sample directions into world space (in reality the shader compiler also noticed this and optimized the math out of the loop).

// Compute a matrix to rotate the samples
float3 vTangentY = abs(vNormal.z) < 0.999f ? float3(0.0f, 0.0f, 1.0f) : float3(1.0f, 0.0f, 0.0f);
float3 vTangentX = normalize(cross(vTangentY, vNormal));
vTangentY = cross(vNormal, vTangentX);

float3x3 mTangentToWorld = float3x3(
    vTangentX,
    vTangentY,
    vNormal );

for (int i = 0; i < NumSamples; i++)
{
    float3 vHalf = mul(vSampleDirections[i], mTangentToWorld);
    float3 vLight = 2.0f * dot(vView, vHalf) * vHalf - vView;
    ....

The next bit of code in the shader reflects the view vector around the half vector to determine the representative light direction for the sample. Following the isotropic assumption made in the split-sum technique we can state that view == normal, and further that in local space normal == (0,0,1). So what we can do here is reflect the vector in local space and, instead of rotating the half vector to world space, rotate the light vector into world space. There is a problem though: later the code assumes that we have the dot product of the half vector and the normal vector to compute the mip level for the sample. We can once again take advantage of the isotropic assumption and note that the dot product between two vectors in local space is the same as the dot product of those two vectors in any other space. So ndoth = vdoth = dot((0,0,1), half), or ndoth = vdoth = half.z, and therefore this can be precomputed per sample too. Inspecting the code again we see that the mip selection math is based entirely on global variables and those two per-sample values, so we can pre-compute that too (see a trend?). Moving all of that computation to the CPU and simply uploading it to the shader leaves us with this:

float3 vPrefilteredColor = 0.0f;
float fTotalWeight = 0.0f;
const uint iNumSamples = 32;
for (uint i = 0; i < iNumSamples; i++)
{
    float3 vLight = mul(vSampleDirections[i], mTangentToWorld);

    float fNdotL = saturate(dot(vNormal, vLight));
    if (fNdotL > 0.0f)
    {
        vPrefilteredColor += EnvironmentCubemapTexture.SampleLevel(LinearSamplerWrap, vLight, fSampleMipLevels[i]).rgb * fNdotL;
        fTotalWeight += fNdotL;
    }
}

We are back to having less math per sample than before we added FIS, and we can still do better. There is still the dot product with the light vector sitting there, along with the conditional. Once again we can leverage the fact that we have pre-computed the light vector in local space and say that ndotl = dot((0,0,1), light) = light.z. Additionally, we can check if the weight of a sample is going to be 0 before we even upload it, which lets us skip entirely any samples that would fail the check. At this point the inner loop just rotates the local light vector into world space and does a madd to accumulate the sample.

float3 vPrefilteredColor = 0.0f;
for (uint i = 0; i < iNumSamples; i++)
{
    float3 vLight = mul(vSampleDirections[i], mTangentToWorld);
    vPrefilteredColor += EnvironmentCubemapTexture.SampleLevel(LinearSamplerWrap, vLight, fSampleMipLevels[i]).rgb * fSampleWeights[i];
}

return vPrefilteredColor * fInvTotalWeight;

Comparison between original and the final optimized version.
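
To make the CPU side of this concrete, here is a minimal sketch of the per-roughness precomputation described above. All names are my own, the float2/Hammersley helpers and MATH_PI are assumed to exist on the CPU, and D_GGX is assumed to match the shader-side version; the resulting arrays (directions, mip levels, weights) plus 1/totalWeight are what get uploaded to the shader.

#include <cmath>
#include <cstdint>
#include <vector>

struct PrefilterSample
{
    float fDirection[3]; // local space light direction (tangent space, normal = +Z)
    float fMipLevel;     // filtered importance sampling mip level
    float fWeight;       // N.L weight
};

// GGX normal distribution function, assumed to match the shader-side D_GGX
static float D_GGX(float fNdotH, float fRoughness)
{
    float fA  = fRoughness * fRoughness;
    float fA2 = fA * fA;
    float fD  = fNdotH * fNdotH * (fA2 - 1.0f) + 1.0f;
    return fA2 / (MATH_PI * fD * fD);
}

static std::vector<PrefilterSample> PrecomputeSamples(
    uint32_t iNumSamples, float fRoughness, float fEnvMapSize, float& fInvTotalWeight)
{
    std::vector<PrefilterSample> samples;
    float fTotalWeight = 0.0f;

    // Solid angle covered by one texel of the top mip (6 faces of EnvMapSize x EnvMapSize)
    const float fOmegaP = 4.0f * MATH_PI / (6.0f * fEnvMapSize * fEnvMapSize);

    for (uint32_t i = 0; i < iNumSamples; ++i)
    {
        float2 vXi = Hammersley(i, iNumSamples);

        // Local space GGX half vector (this is the first half of ImportanceSampleGGX)
        float fA        = fRoughness * fRoughness;
        float fPhi      = 2.0f * MATH_PI * vXi.x;
        float fCosTheta = sqrtf((1.0f - vXi.y) / (1.0f + (fA * fA - 1.0f) * vXi.y));
        float fSinTheta = sqrtf(1.0f - fCosTheta * fCosTheta);
        float vHalf[3]  = { fSinTheta * cosf(fPhi), fSinTheta * sinf(fPhi), fCosTheta };

        // With V = N = (0,0,1): L = 2(V.H)H - V, so NdotL = L.z and NdotH = VdotH = H.z
        float fVdotH    = vHalf[2];
        float vLight[3] = { 2.0f * fVdotH * vHalf[0],
                            2.0f * fVdotH * vHalf[1],
                            2.0f * fVdotH * vHalf[2] - 1.0f };
        float fNdotL    = vLight[2];
        if (fNdotL <= 0.0f)
            continue; // don't upload samples that would fail the N.L check in the shader

        // Filtered importance sampling mip selection, matching the shader version above
        float fPdf      = D_GGX(vHalf[2], fRoughness) * vHalf[2] / (4.0f * fVdotH);
        float fOmegaS   = 1.0f / (iNumSamples * fPdf);
        float fMipBias  = 1.0f;
        float fMipLevel = fmaxf(0.5f * log2f(fOmegaS / fOmegaP) + fMipBias, 0.0f);

        samples.push_back({ { vLight[0], vLight[1], vLight[2] }, fMipLevel, fNdotL });
        fTotalWeight += fNdotL;
    }

    fInvTotalWeight = 1.0f / fTotalWeight;
    return samples;
}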

Here are some timings for the various approaches and stages of optimization presented; take them with a grain of salt since getting accurate numbers on PC, especially on laptops, is difficult.

Device     Time (ms)  Size     Approach
Iris 6100  54.0       256×256  1024 Samples Reference (UE4)
Iris 6100  2.6        256×256  32 Samples FIS Original (Frostbite)
Iris 6100  1.5        256×256  32 Samples FIS Optimized
Iris 6100  1.2        256×256  Varying Sample Count
Iris 6100  0.5        128×128  Varying Sample Count
GTX 980    9.4        256×256  1024 Samples Reference (UE4)
GTX 980    0.48       256×256  32 Samples FIS Original (Frostbite)
GTX 980    0.36       256×256  32 Samples FIS Optimized
GTX 980    0.36       256×256  Varying Sample Count
GTX 980    0.16       128×128  Varying Sample Count

Improvements


Ensure a certain number of samples have weight > 0

At high roughness values nearly half of the samples produced by the Hammersley sequence result in a light vector that fails the N dot L check. I noticed a nice improvement by just doing a simple linear search to find how many sequence values are needed to generate the right number of valid samples (a small sketch follows the comparison below).

Comparison of 32 samples where some have weight 0 (top) and 32 samples where all of them have a weight > 0 (bottom)
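
A rough sketch of that search (my own illustration): since in local space N·L depends only on the sequence's second component, each index can be tested in isolation, so we simply grow the sequence length until enough indices pass. RadicalInverse here stands for the second Hammersley component; fRoughness and iNumDesiredSamples are assumed inputs.

// Find how many Hammersley indices are needed so that iNumDesiredSamples of them
// produce a local space light vector with N.L > 0 for this roughness value.
uint32_t iSequenceLength = 0;
uint32_t iNumValid = 0;
float fA = fRoughness * fRoughness;
while (iNumValid < iNumDesiredSamples)
{
    float fXi = RadicalInverse(iSequenceLength);                  // second Hammersley component
    float fCosTheta = sqrtf((1.0f - fXi) / (1.0f + (fA * fA - 1.0f) * fXi));
    float fNdotL = 2.0f * fCosTheta * fCosTheta - 1.0f;           // L.z when V = N = (0,0,1)
    if (fNdotL > 0.0f)
        ++iNumValid;
    ++iSequenceLength;
}
// Generate the actual sample set with iSequenceLength entries and drop the invalid ones.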

Varying the sample count per mip level

This is more or less an extension of the Frostbite statement that the first mip should represent a mirror reflection and therefore require only one sample. If we extend that to the entire mip chain we can simultaneously reduce the number of samples at low roughness values (when we are processing larger cube faces) and increase the number of samples at high roughness values (when the faces are smaller). In theory this will reduce the number of samples needed and result in an overall noise reduction.

Varying sample count on a 256×256 prefiltered cubemap. From mip 0->5: 1, 8, 32, 64, 128, 128

This did result in a nice quality improvement and dropped the total number of samples taken from the environment map from ~400k to ~200k, but I didn’t notice a large performance increase; in fact, in some cases I saw the opposite. This is likely due to the small number of pixels requiring lots of work further down the mip chain, and to the fact that increasing the sample count reduces the effectiveness of filtered importance sampling. This approach is still worth investigating though, and may be limited by the fact that I render each face individually.

Primary direction lookup

This is straight from the Frostbite presentation (see the course notes for details). Basically you modify the reflection vector based on the roughness when retrieving the values from both the environment map and the BRDF texture. This makes the result look quite different from the initial implementation, but if we look back at the Unreal course notes we can see that it looks more like the full reference implementation:

Comparison between the Initial Reference for this article (top), the Reflection Adjustment proposed by the Frostbite team (middle), and the Original Reference in the Real Shading course notes (bottom).

float3 vReflection = 2.0f * vNormal * dot(vViewDirection, vNormal) - vViewDirection;
float fA = fRoughness * fRoughness;
vReflection = lerp(vNormal, vReflection, (1.0f - fA) * (sqrt(1.0f - fA) + fA));

Ideas


These are things that I haven’t really fleshed out yet but could potentially be a thing.

Different distribution for filtering

Up to this point we have been using the GGX distribution for the importance sampling, but then we are weighting the samples by their respective N dot L terms. This leads to the workaround in the improvements section to ignore samples that were going to fail that second tier of weighting. It seems that a new probability distribution could be created that factors in the N dot L weighting so we don’t have to modify and renormalize the distribution later. I briefly fooled around with finding the analytical version of that distribution, but I’m not a Mathematica guy and ran out of time. Only afterwards did I realize that you could do this without having an analytical probability distribution function; instead you can just integrate numerically for a fixed roughness value and a set of pseudo-random variables. I plan on investigating this soon.

Different quasi-random sampling set

After doing some research it became apparent that the Hammersley set doesn’t do the best job when you have low sample counts. An alternative sequence may yield lower noise for the same number of samples; here is a potential one that Pixar recently published: Correlated Multi-Jittered Sampling. I tried this out and the results didn’t look exceptionally different. The one thing that did stand out is that you can pass in a seed value to generate any number of 32-sample sequences, which might be good if you want to do any sort of progressive filtering of the environment map.

Progressive filtering

Since you know the total weights in the frame it would be easy to pass that data forward to the next frame and combine the results. To do this with the Hammersley sequence you would want to add a rotation into the matrix that rotates the light from local to world space; then you would just filter as usual and accumulate both the total weight and the result scaled by the total weight from the previous frame. When it comes time to invalidate the environment map you can just clear the total weight. This may help wrangle the last bit of performance needed for previous-gen platforms. Additionally, you can track when you reach a certain total weight and stop processing altogether until you know that the contents have changed, for instance when the time of day progresses past a certain threshold.

Conclusion

Hopefully this article was useful to someone out there. It would be great to hear from people if they do other things to optimize this process or have tried any of the things that I proposed.

Resources


Light probe used in this article from http://gl.ict.usc.edu/Data/HighResProbes/

Correlated Multi-Jittered Sampling: http://graphics.pixar.com/library/MultiJitteredSampling/paper.pdf

Hammersley Points on the Hemisphere: http://holger.dammertz.org/stuff/notes_HammersleyOnHemisphere.html

Real Shading in Unreal Engine 4: http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf

Moving Frostbite To PBR: http://www.frostbite.com/2014/11/moving-frostbite-to-pbr/

Notes on Importance Sampling: http://blog.tobias-franke.eu/2014/03/30/notes_on_importance_sampling.html

GPU Based Importance Sampling: http://http.developer.nvidia.com/GPUGems3/gpugems3_ch20.html

Rendering the world of Far Cry 4: http://www.gdcvault.com/play/1022235/Rendering-the-World-of-Far

Physically Based Shading at Disney: https://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf

Implementation Notes: Physically Based Lens Flares

The Implementation Notes series of blog posts is an attempt to outline the process of implementing a research paper or technique from a presentation. I find that, more often than not, when trying to recreate others’ results, not all the dots are connected. It takes some time and digging to get the whole picture; this is my attempt at sharing that background work.

In this edition of Implementation Notes (actually the first edition) we will be looking at two papers relating to rendering lens flares:

  • Physically-Based Real-Time Lens Flare Rendering
  • Practical Real-Time Lens-Flare Rendering

Some of the foundation work from the first paper will be presented and then an implementation based on the second paper will be outlined.

Background

Cameras are composed of many complex systems, each of which can introduce visual artifacts. If these visual artifacts are not modeled to a certain extent in our rendered images then they will produce suspiciously clean results and look artificial. However, if we introduce some of these artifacts then the images can appear more natural to the viewer. One such artifact is internal reflections, sometimes referred to as lens flares. These two papers explore modeling lens flares as the result of light reflecting off the surfaces of the optics inside of the lens, hence the “Physically-Based”.

Lens flares from an old Vivitar zoom lens

The configuration inside most lenses is surprisingly complex and can consist of dozens of individual lens elements. As the light from the scene passes through each optical interface a portion is reflected and the rest is refracted. The reflected light then moves backwards through the optical system, striking other optical interfaces along the way, and the process repeats. For every even number of bounces that occurs, the reflected light is directed back towards the image sensor; however, the light rays are no longer focused. This light represents a single lens flare, or “ghost” as the papers refer to it.

Optical Systems

This technique is heavily based on the representation of the lens elements in an Optical System. This refers to the combination of lenses and empty spaces designed in such a way that light is focused onto an image plane. We begin with the assumption that the image plane I represents standard film (24×36mm) with an image circle diagonal of 43.3mm.

Example Optical System (USPN5835272)

The optical axis of the system is the axis that runs through the center of the lens elements and intersects the center of the image plane. Lens elements are arranged along the optical axis either directly next to each other or separated by a small amount of air. The standard description of an optical system is a table of data that defines the interfaces between air and lens elements. The data looks something like this:

#         r       d      n
1    30.810   7.700  1.652
2   -89.350   1.850  1.603

Here r is the radius of curvature for the interface, d is the distance to the next interface, and n is the refractive index of the lens element. The radius can be positive, indicating a convex surface; negative, indicating a concave surface; or 0, indicating a flat surface. Using this definition of an optical system allows us to use existing optical system data. The best place to find optical system descriptions is the patent database, though be warned that I don’t know the legality of using the optical design data to reproduce a lens digitally.
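
As a minimal sketch (the struct and field names are my own), each row of the table maps directly to a small record:

// One interface from a patent's prescription table
struct OpticalInterface
{
    float fRadius;     // r: signed radius of curvature in mm (positive = convex, 0 = flat)
    float fThickness;  // d: distance along the optical axis to the next interface, in mm
    float fIor;        // n: refractive index of the element behind this interface (1.0 for air)
    float fHeight;     // element height in mm; estimated, except for the aperture stop (computed later)
    bool  bIsAperture; // true for the aperture stop entry
};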

When reading a lens patent there are a couple of things to note. The first is that there is a lot of information in the patent! Most of it you don’t need and shouldn’t care about. The main thing you are looking for is the tabular data describing the lens system. In the table there is usually a header that describes the focal length of the lens and the f-number; these represent the field of view of the lens and essentially how much light the lens can let in, respectively. If you are a real ambitious go-getter you can use this to drive the settings on your camera model.

It took me a while to find a zoom lens that had a focal length representing a normal game field of view (~70 degrees) and a low f-number, meaning that the lens allows in a lot of light. So I can save you the pain of patent searching and point you at a couple of options:

Nikon 28-75mm f/2.8 (US5835272)
Canon 28-80 f/2.8 (US5576890)
Canon 36-135mm f/3.5-f/4.6 (USPN4629294)

I have found that there is some information missing when reading the patent, namely the heights of the various elements. Luckily the height that matters the most is the aperture stop, and this can be calculated using the effective aperture diameter (D). The effective aperture represents what the physical aperture looks like if you were looking through the front of the lens. Since the elements in front of the physical aperture often have a magnification property, it will differ from the physical aperture stop. The formula for this is based only on the focal length f and the f-number N:

N=\frac{f}{D}

So for the Nikon zoom lens we can determine that, at a focal length of 28mm and an f-number of roughly 2.8, the effective aperture diameter should be 10mm. Once we are able to cast rays through the optical system this will be used to determine the size of the physical aperture.

Quick note: On more modern lenses (1980s and onward) there are often aspherical lens elements in the optical system. Since the intent here is not to rely on the precise qualities of the optical system I chose to ignore these elements and instead treat them as spherical.

Ray tracing an Optical System

The first step towards modeling lens flares is to establish the optical system we will be evaluating. This can be validated by simply drawing (in 2D) the profile of the optical system. To perform the drawing you simply walk over the elements of the optical system, drawing each and then incrementing the draw cursor down the optical axis by the d value for that element. Each interface is drawn as an angular patch using the radius of curvature (r) and the angle \pm a given by a = asin(\frac{h}{r}). For the first pass I just took a guess at the various element heights, making sure they didn’t intersect with the next elements. As mentioned before, we only care about the height of the aperture, which will be computed. Here is what the result should look like:

In Engine Drawing of the optical system. Red – Optical Axis, Green – Aperture Stop, Blue – Optical Interfaces defined by the tabular patent data.

Overlay of in engine representation and the patent diagram.

Next we will trace a single ray through the optical system considering only refraction; the pseudo code for this is:

OpticalInterface i = first;
while (i)
{
   col = intersect(ray, i);
   if (!col.valid)
      break;
   
   ray.pos += ray.dir * col.dist;
   ray.dir = refract(ray.dir, col.norm, i.n1/i.n2);
   i = i.next;
}

The result should look something like this:

Single ray making its way through the optical system

Be mindful that there can be elements with radius 0; in these cases you should just do a ray-plane intersection. Once we have a single ray going through, we can cast a bunch of rays to see if they all converge on the focal plane. This is a good time to mention that the patents are specified assuming that you are focused on an object at infinity. This means that all rays entering the lens are travelling in parallel.

Rays focusing at the image plane

As you can see, the rays that make it through the optical system focus to a single point. The reason many rays are terminated prior to reaching the surface is that the effective aperture is much smaller than the first element. Recall that we can compute the effective aperture using the formula N=\frac{f}{D}, and that for the Nikon lens we found D \approx 10mm. Now if we cast rays 5mm above and 5mm below the optical axis we can determine the height of the aperture (green line) as the height of the ray when it passes through the aperture stop, in this case a diameter of ~22mm.

Rays cast at the edge of the effective aperture to compute the physical aperture size.
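
A minimal sketch of that measurement (the Ray struct, TraceToInterface helper, and the aperture index are hypothetical names of mine; the rays enter parallel to the optical axis at the edges of the effective aperture):

// D = f / N, e.g. 28mm / 2.8 = 10mm for the Nikon zoom at the wide end
float fEffectiveAperture = fFocalLength / fFNumber;

// Two rays entering parallel to the optical axis at +-D/2
Ray rayTop = { /*pos*/ { 0.0f,  0.5f * fEffectiveAperture, fStartZ }, /*dir*/ { 0.0f, 0.0f, 1.0f } };
Ray rayBot = { /*pos*/ { 0.0f, -0.5f * fEffectiveAperture, fStartZ }, /*dir*/ { 0.0f, 0.0f, 1.0f } };

// Refract both rays up to the aperture stop and take their separation there as the
// physical aperture diameter (~22mm for the Nikon example above)
float fTopHeight = TraceToInterface(rayTop, iApertureStopIndex).pos.y;
float fBotHeight = TraceToInterface(rayBot, iApertureStopIndex).pos.y;
float fApertureDiameter = fabsf(fTopHeight - fBotHeight);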

To further validate the optical system, we can sweep rays through the full FOV range specified by the patent. Remember it isn’t going to be perfect since we ignore aspherical elements, but it should be relatively close. On a side note, this is an interesting experiment because you can see that fewer rays make it through at steeper angles, causing vignetting.

Rays from a direction focusing on the focal plane (from the canon zoom lens)

Using this as the final test we can consider the optical system validated. If something were horribly wrong with it, you would have noticed by this point. From here we can start to trace the actual flares.

Flares/Ghosts

The first thing we need to do for rendering flares is to enumerate the different paths that can end up back at the image plane. Since we lose a significant amount of intensity each time a ray reflects at an optical surface, we will follow the papers and only consider rays which have bounced exactly twice. An additional consideration we can make is that a significant amount of light is lost when the light passes through the aperture stop. To factor this into the enumeration process we can say that we only care about light that passes the aperture stop exactly one time and bounces exactly two times. Another way of thinking about this is that we only consider paths of light where both bounces happen before the aperture or both happen after it. The pseudo code for enumerating the flares/ghosts is below:

a = getApertureIndex()
n = getNumElements()
for (i = 0; i < a; ++i)
   for (j = i + 1; j < a; ++j)
       OpticalGhost ghost;
       ghost._reflections[0] = j;
       ghost._reflections[1] = i;
       ghosts.push_back(ghost);

for (i = a + 1; i < n; ++i)
    for (int j = i + 1; j < n; ++j)
       OpticalGhost ghost;
       ghost._reflections[0] = j;
       ghost._reflections[1] = i;
       ghosts.push_back(ghost);

Now that we have a list of all possible ghosts in the system we need to modify the ray tracing to account for reflecting at a certain interface. I’ll leave that as an exercise for the reader since it is fairly straightforward. The result of tracing a single ghost should look like this:

This result suggests that we will have a flare at the center of the screen and that it will be clipped by the aperture plane (and thus aperture shaped). This is the end of the ray tracing approach presented by the first paper, but we will use it to confirm an alternative approach, the matrix method.

Matrix Method

The matrix method is a general optics technique that approximates the effects of an optical system as a linear system. Since each optical interface is represented as a linear system, the results can be concatenated into a single matrix describing the entire system. At the core of this approach is the paraxial approximation. This approximation states that for small angles (< 10 degrees) of \theta:

\sin \theta \approx \theta
\tan \theta \approx \theta
\cos \theta \approx 1

Using this approximation we can represent the transforms of our optical system, namely translation (T), reflection (L), and refraction (R), as 2×2 matrices operating on the height and angle of a ray: \left[ \begin{array}{c} H_{0}\\\theta_{0}\end{array} \right]. For full derivations of these matrices see Matrix Methods in Paraxial Optics.

T = \left[ \begin{array}{cc} 1&d \\ 0&1\end{array} \right]

R_{curved} = \left[ \begin{array}{cc} 1&0 \\ \frac{n_1 - n_2}{Rn_2}&\frac{n_1}{n_2}\end{array} \right]

R_{flat} = \left[ \begin{array}{cc} 1&0 \\ 0&\frac{n_1}{n_2}\end{array} \right]

L_{curved} = \left[ \begin{array}{cc} 1&0 \\ \frac{2}{R}&1\end{array} \right]

L_{flat} = \left[ \begin{array}{cc} 1&0 \\ 0&1\end{array} \right]

So now, instead of tracing N rays for each ghost, we will simply concatenate the matrices that represent each optical interface and project a flare quad. Generally there will be pairs of transforms that translate to and then refract through an interface; since we are talking about matrices, the order is reversed: R_n T_n. When we hit the interface that we should reflect at we multiply by L_{n}, and then as we progress backwards through the system we apply the inverse of the above pairs, T_n R_{n}^{-1} (translation is unaffected by the change in direction), until we hit the second reflection L_{n}^{-1}, at which point we resume concatenating the remaining pairs using the original R_n T_n. A simple example would be: M = R_2 T_2 R_1 T_1 L_{0}^{-1} T_1 L_1 T_1 R_0 T_0, representing a ray entering a 3-interface system translating to and refracting through the first element (R_0 T_0), translating to and reflecting off the second (L_1 T_1), traveling back to the first (T_1), reflecting off it and then propagating out of the system (R_2 T_2 R_1 T_1 L_{0}^{-1}).

Important! The inverse in this case doesn’t refer to the matrix inverse but rather the operation in the reverse direction i.e. R(n_1, n_2, r)^{-1} = R(n_2, n_1, -r), L(r)^{-1} = L(-r), and T(d)^{-1} = T(d). This isn’t mentioned in the original paper and if you use the standard matrix inverse the results will be quite a bit off.
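
To make the bookkeeping concrete, here is a minimal sketch (my own naming) of these transfer matrices together with the "inverse" convention from the note above, using the mathematical (row, column) layout of the formulas:

// 2x2 ray transfer matrix acting on a (height, angle) column vector
struct Matrix2 { float m00, m01, m10, m11; };

static Matrix2 T(float d)                     { return { 1.0f, d, 0.0f, 1.0f }; }
static Matrix2 R(float n1, float n2, float r)
{
    // r == 0 denotes a flat interface
    return r != 0.0f ? Matrix2{ 1.0f, 0.0f, (n1 - n2) / (r * n2), n1 / n2 }
                     : Matrix2{ 1.0f, 0.0f, 0.0f,                 n1 / n2 };
}
static Matrix2 L(float r)
{
    return r != 0.0f ? Matrix2{ 1.0f, 0.0f, 2.0f / r, 1.0f }
                     : Matrix2{ 1.0f, 0.0f, 0.0f,     1.0f };
}

// "Inverses" in the sense above: the same operation traversed in the opposite direction
static Matrix2 InvT(float d)                     { return T(d); }
static Matrix2 InvR(float n1, float n2, float r) { return R(n2, n1, -r); }
static Matrix2 InvL(float r)                     { return L(-r); }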

While the matrix method (yellow) isn’t as accurate as ray tracing (white) it is pretty close in most situations we care about.

Placing the ghosts

Using the matrices that describe the system, we generate a flare quad for the final flare rendering. To do this we actually need two matrices: the first describes the optical system up to the aperture stop (M_a), and the second describes the optical system from the aperture stop to the sensor plane (M_s). M_a will be used to find the entrance pupil of the ghost and M_s will be used to find the projected aperture. If you recall from earlier, the entrance pupil is the aperture as seen from the front of the lens following the path of light that we care about. Essentially we want to solve for the heights of the entrance pupil H_{e_1} and H_{e_2} using the direction of the light \theta_e, the height of the aperture H_a, and the optical system matrix of the lens prior to the aperture M_a.

\left[ \begin{array}{cc} H_{a_1} \\ \theta_{a_1}\end{array} \right] = \left[ \begin{array}{cc} H_{e_1} \\ \theta_{e}\end{array} \right] \times M_{a}

Disregarding \theta_{a_1} since we only care about the height of the entrance pupil we can expand the matrix multiply:

H_{a_1} = H_{e_1} M_{a00} + \theta_{e} M_{a10} (column major)

solving for H_{e_1}:

H_{e_1} = \frac{H_{a_1} - \theta_{e} M_{a10}}{M_{a00}}

Using this equation and plugging in \pm H_{a} we can find tightly fitting bounds for the aperture. These bounds are then projected through the entire optical system (M_s \times M_a) to find the final flare quad position on the sensor. As a final step the positions are projected on the sensor’s image circle, which for 35mm film is \approx 43.3mm diameter, to get the final normalized coordinates and size.

These values represent offsets along the vector from the center of the screen to the light position (in my case the sun). I found the screen space position of the sun by projecting a point a far distance from the camera in the direction of the sun, although I’m sure there are more accurate ways. The starting angle \theta_e is simply \theta_e = \cos^{-1} (E \cdot L) where E is the forward vector of the camera in world space and L is the direction of the sun. The resulting pseudo code looks like this:

Theta_e = acosf(dot(E, L));
Axis = Project(CameraPos + L * 10000.0).Normalize();
H_a = GetApertureHeight();
foreach ghost
    // Aperture projected onto the entrance
    H_e1 = (H_a - ghost.M_a[1][0] * Theta_e) / ghost.M_a[0][0];
    H_e2 = (-H_a - ghost.M_a[1][0] * Theta_e) / ghost.M_a[0][0];

    // Flare quad bounds projected through the system
    H_p1 = ((ghost.M_s * ghost.M_a) * Vector2(H_e1, Theta_e)).x;
    H_p2 = ((ghost.M_s * ghost.M_a) * Vector2(H_e2, Theta_e)).x;

    // Project on to image circle
    H_p1 /= 21.65;
    H_p2 /= 21.65;

    Center = (H_p1 + H_p2) / 2;
    Radius = abs(H_p1 - H_p2) / 2;

    Transform = CreateScale(Radius, Radius * AspectRatio());
    Transform *= CreateTranslation(Axis * Center);

    RenderFlare(Transform);

Intensity and Color

Once we have the placement of the ghost we need to determine what its intensity and color should be. I chose to deviate from the papers slightly for both of these values. For the intensity, I make it relative to that of the optical system’s effective aperture. I calculate this by determining M_a for the optical system without introducing any reflections, then calculating the projected size of the aperture with \theta_e = 0; since it is symmetrical you only need to do this once:

e = \pm \frac{H_{a}}{M_{a00}}

The intensity, I, is initialized to the ratio between the surface area of the ghost’s effective pupil and the surface area of the system’s effective pupil. The intensity is then scaled by the final projected area of the flare quad. The final intensity represents the relative intensity compared to an in-focus pixel, so you would multiply the light color by it to get your final pre-exposed color.

entrancePupil = H_a / system.M_a[0][0];
Intensity = Sqr(H_e1 - H_e2) / Sqr(2 * entrancePupil);
Intensity /= Sqr(2 * Radius);

Comparison of flare intensity at f/8.0 and f/16.0. The perceived brightness of the flares increase at the higher f-number at the same exposure even though the aperture is smaller. This is represented by the ratio of flare/entrance pupil.

With the intensity calculated, the color must be determined. Once again I deviated slightly from the papers here. The main contributor to the color of a flare is the anti-reflection coating on the lens. These coatings are designed to create an interference pattern and cancel out light that is reflected at each interface, similar to noise-cancelling headphones. The shift in color occurs because the coatings are designed to minimize reflections at a specific wavelength, generally a mid-length wavelength like 500nm; this results in non-uniform reflection across the various wavelengths. I found the best explanation and reference here, and it is the same as is presented in the papers. I deviate from the authors in that, rather than tracing a single ray to determine the color, I chose to evaluate the reflectance at a random angle per ghost (once each for red, green, and blue). This produces a stable color per flare and makes it easier to ensure that the angle doesn’t exceed the critical angle of the system, which would result in invalid reflectance values.

// http://www.gen-opt.com/lamda_admin/Lamda_Edit/UploadFile/2011222112937198.pdf
// lambda	- Wavelength of light being tested for reflectance
// d		- Thickness of the anti-reflection coating (lambda0 / 4.0f / n1) where
//				lambda0 is the wavelength of light the coating was designed for, typically
//				a midrange wavelength like green (550 nm)
// theta1	- Angle of incidence at the edge of the coating
// n1		- The Index of Refraction for the incoming material
// n2		- The Index of Refraction for the coating, max(sqrt(n1 * n3), 1.38) where 1.38
//				is the index of refraction for a common magnesium fluoride coating.
// n3		- The Index of Refraction for the outgoing material
static float Reflectance(float lambda, float d, float theta1, float n1, float n2, float n3)
{
	// Apply Snell's law to get the other angles
	float theta2 = asinf(n1 * sinf(theta1) / n2);
	float theta3 = asinf(n1 * sinf(theta1) / n3);

	float cos1 = cosf(theta1);
	float cos2 = cosf(theta2);
	float cos3 = cosf(theta3);

	float beta = (2.0f * MATH_PI) / lambda * n2 * d * cos2;

	// Compute the fresnel terms for the first and second interfaces for both s and p polarized
	// light
	float r12p = (n2 * cos1 - n1 * cos2) / (n2 * cos1 + n1 * cos2);
	float r12p2 = r12p * r12p;

	float r23p = (n3 * cos2 - n2 * cos3) / (n3 * cos2 + n2 * cos3);
	float r23p2 = r23p * r23p;

	float rp = (r12p2 + r23p2 + 2.0f * r12p * r23p * cosf(2.0f * beta)) /
		(1.0f + r12p2 * r23p2 + 2.0f * r12p * r23p * cosf(2.0f * beta));

	float r12s = (n1 * cos1 - n2 * cos2) / (n1 * cos1 + n2 * cos2);
	float r12s2 = r12s * r12s;

	float r23s = (n2 * cos2 - n3 * cos3) / (n2 * cos2 + n3 * cos3);
	float r23s2 = r23s * r23s;

	float rs = (r12s2 + r23s2 + 2.0f * r12s * r23s * cosf(2.0f * beta)) /
		(1.0f + r12s2 * r23s2 + 2.0f * r12s * r23s * cosf(2.0f * beta));

	return (rs + rp) * 0.5f;
}
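
As a usage sketch (my own illustration, not from the papers): pick one incidence angle per ghost, kept below the critical angle, and evaluate the reflectance once per channel at rough RGB wavelengths. The quarter-wave thickness convention, the wavelength choices, and the float3 type are assumptions.

// Hypothetical per-ghost tint. fTheta1 is a random incidence angle chosen once per ghost and
// clamped below the critical angle; n1/n3 are the indices on either side of the coated
// interface the ghost reflects off (n1 = 1.0 when coming from air).
float fLambda0 = 550.0f;                         // design wavelength of the coating, in nm
float n2 = fmaxf(sqrtf(n1 * n3), 1.38f);         // coating index (1.38 = magnesium fluoride)
float d  = fLambda0 / 4.0f / n2;                 // assumed quarter-wave coating thickness

float3 vGhostColor;
vGhostColor.x = Reflectance(650.0f, d, fTheta1, n1, n2, n3); // red
vGhostColor.y = Reflectance(532.0f, d, fTheta1, n1, n2, n3); // green
vGhostColor.z = Reflectance(473.0f, d, fTheta1, n1, n2, n3); // blue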

Anti-reflection coatings are one of the areas where lens making has made significant strides in the past 20 years. Below is a comparison between a lens from the 80s and a lens from today.

Top: 1980’s Vivitar Zoom
Bottom: 2014 Sony FE 28-70
One of the major differences between legacy and modern lenses is the quality and characteristics of anti-reflection coatings. Both images were taken with similar settings.

Flare Texture

Now that we have a color, intensity, and placement for the flare we need to generate a texture. This would be an ideal place to hand things over to an artist to whip something up, but I don’t have that luxury at the moment so we will just have to make do. The flare texture should be a projection of the aperture’s shape. On most lenses this would be an n-gon, where n is the number of aperture blades. On higher-end lenses a lot of effort goes into maintaining a circular aperture shape at all f-numbers. The simplest way to do either of these is to generate a small distance field texture, 64×64 in my implementation. If you simply want a circular aperture then the distance is just:

distance = length(In.Texcoord * 2.0f - 1.0f);

If you want an n-gon then you can calculate the distance as the max of the coordinate dotted with each axis, where each axis is a vector perpendicular to an aperture edge (a concrete sketch follows the pseudo code below):

distance = 0
coord = In.Texcoord * 2.0f - 1.0f;
foreach axis
    distance = max(distance, dot(axis, coord));
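
For an arbitrary blade count, the axes can be generated on the fly; a quick HLSL sketch of the same idea (iNumBlades is assumed to be provided, and a phase offset can be added to orient the polygon):

float2 coord = In.Texcoord * 2.0f - 1.0f;
float fDistance = 0.0f;
for (uint k = 0; k < iNumBlades; ++k)
{
    // Outward normal of the k-th aperture blade edge
    float fAngle = (2.0f * fPI * k) / iNumBlades;
    fDistance = max(fDistance, dot(coord, float2(cos(fAngle), sin(fAngle))));
}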

This is the most flexible representation, but you could also exploit symmetry to execute fewer dot products. For example, an octagon can be represented with 1 dot product and 2 max operators instead of 8 dot products and 8 max operators.

 coord = abs(Texcoord * 2.0f - 1.0f);
 distance = max(coord.x, coord.y);
 distance = max(distance, dot(coord, float2(0.70710678118f, 0.70710678118f)));

Smoothing the result

Application of smoothing removes the peaks at the corners in the faded version of the aperture. k = 0.1 for the right hand side.

Directly using this shape will give acceptable results but when applying a falloff based on the distance field some peaks at aperture edges can be seen. To eliminate these I simply applied the smooth max operator, adapted from Inigo Quilez’s Smooth Min article and the identity max(a,b) = -min(-a, -b).

// polynomial smooth min
// from: http://www.iquilezles.org/www/articles/smin/smin.htm
float smin(float a, float b, float k)
{
	float diff = b - a;
	float h = saturate(0.5 + 0.5 * diff / k);
	return b - h * (diff + k * (1.0f - h));
}

float smax(float a, float b, float k)
{
	float diff = a - b;
	float h = saturate(0.5 + 0.5 * diff / k);
	return b + h * (diff + k * (1.0f - h));
}

Using this technique you can get lots of different source aperture shapes.

Various aperture shapes obtained using the smoothed distance field approach

Entrance Clipping

To get the final shape of the flare we need to account for the case when our effective pupil lies outside the physical entrance heights. In this case we should be clipping the aperture shape with the entrance shape. Incorporating this effect is relatively simple and can break up the otherwise repeating effect. Pseudo code follows:

...
H_e = GetEntranceHeight();
foreach ghost
   ...
   // Entrance projected onto the aperture
   E_a1 = (ghost.M_a * Vector2(H_e, Theta_e)).x;
   E_a2 = (ghost.M_a * Vector2(-H_e, Theta_e)).x;

   // Move the projection into aperture space as a position and radius
   float C_e = ((E_a1 + E_a2) * 0.5) / H_a - Center;
   float R_e = (abs(E_a1 - E_a2) * 0.5) / H_a;

These two parameters are uploaded along with the screen space light direction and are combined with the distance field using the same smooth max approach as before:

 // Sample the distance field
 float fDistance = FlareTexture.SampleLevel(LinearSampler, Texcoord, 0.0f).x;

 // Clip against the entrance in aperture space
 float2 coord = Texcoord * 2.0f - 1.0f;
 fDistance = smax(fDistance, length(coord - vEntranceCenterRadius.x * vViewspaceDirection) / vEntranceCenterRadius.y);

 // Finally fade the edge to get the final aperture mask
 float fApertureMask = smoothstep(0.95, 0.80, fDistance);

Effects of entrance clipping on a single flare.

Final Results

This brings us to the end of what I decided to implement from the papers. Here are some final results using the info presented in the post.

Optimizations

The authors present some recommendations for performance, such as culling flares whose intensity falls below a certain value or only rendering the top 40% of flares. These ideas could be extended by rendering the flares at multiple resolutions to avoid the fill rate cost of the larger flares that cover a majority of the screen. On the CPU side, the system matrices only need to be computed once, so there is no need to recompute their values.

Conclusion

The papers present an interesting method for a “Physically Based” lens flare model. If I were to implement it in a production engine I would leave the flare shape and colors artist-configurable and simply implement the flare placement and entrance clipping. The method outlined in this post has its limitations since it still represents the entire optical system with a linear approximation. Perhaps an interesting approach would be to apply this method to the majority of the flares in a system and then use the more expensive ray tracing technique proposed by the first paper for the complex flares. Well, I hope you enjoyed the first installment of Implementation Notes. If you liked it let me know, and if you do this type of thing in your own time you should write about it!