# Image Based Lighting

Image Based Lighting is used to implement ambient lighting for dynamic objects, coming from the static objects in a game level. In most cases it is used for specular lighting or reflections. In this process, the lighting information at potential points of interest is stored in a special type of cube map called a light probe. Light probes are created from environment maps captured at the same locations, which are then blurred in a way that depends on the BRDF [1] that will consume them at runtime. Each mip level of a light probe contains a version of the environment map blurred by a different amount, depending on the roughness that level represents. For example, if we use 11 mip levels (mip-0 to mip-10) to represent roughness from 0 to 1, then mip-0 is blurred by a value representing roughness 0, mip-1 represents 0.1, and so on. The last mip level, mip-10, is blurred by a value representing a roughness of 1.0. This process is also called Cubemap Convolution [4][5]. All of this is done as a pre-process, and the resulting light probes are fetched at runtime to enable reflections or Image Based Lighting.
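
The mip-to-roughness mapping in the example above can be sketched in a few lines. This is only an illustration in Python (not engine code), and it assumes the linear mapping used in the example; the function names are made up for this sketch.

```python
def roughness_for_mip(mip: int, mip_count: int) -> float:
    """Roughness represented by a mip level, assuming a linear mapping."""
    if mip_count < 2:
        return 0.0
    return mip / (mip_count - 1)


def mip_for_roughness(roughness: float, mip_count: int) -> float:
    """Fractional mip level to sample at runtime (inverse of the mapping above)."""
    clamped = min(max(roughness, 0.0), 1.0)
    return clamped * (mip_count - 1)


# 11 levels: mip-0 .. mip-10 cover roughness 0.0 .. 1.0 in steps of 0.1
levels = [roughness_for_mip(m, 11) for m in range(11)]
```

A fractional result from `mip_for_roughness` relies on trilinear filtering between neighbouring mip levels when sampling the probe.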

There are two types of light probes –

- Global light probes – These represent the lighting coming from objects at infinite distance. They are created from global environment maps or skyboxes. Usually there is one of these for any given game level/scene. It can be computed as a pre-process since skyboxes are mostly fixed.
- Local light probes – These are captured at different points of interest in a scene. They capture the lighting from nearby objects, and each has a location and a size or area of influence. There will be many of these in any game level/scene.

So the whole process can be summed up like this –

- Capture the environment at some location in the scene (or use the skybox in the case of the global light probe).
- Create a light probe by blurring the captured environment map (Cubemap Convolution).
- Fetch the reflection values from the light probes based on roughness and the reflection vector.
- In the case of multiple local light probes, find the light probes affecting the particular shading point.
- Then either blend between those light probes or select one of them to shade the pixel.
- Add the calculated value as part of the ambient specular term (and the diffuse term in the case of skybox/global light probes).

## Cubemap Convolution

The goal of rendering or shading any particular pixel (or point in 3D space) is to compute all the light from the environment that is received at that point and reflected towards the camera. In other words, we solve the reflectance integral over the hemisphere around the surface normal.
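
In standard notation (the symbols are an assumed labelling, not taken from the engine code), this integral can be written as:

$$
L_o(\mathbf{v}) = \int_{\Omega} f(\mathbf{l}, \mathbf{v})\, L_i(\mathbf{l})\, (\mathbf{n} \cdot \mathbf{l})\, d\mathbf{l}
$$

Here $L_o$ is the light leaving the point towards the view direction $\mathbf{v}$, $L_i$ is the light arriving from direction $\mathbf{l}$, $f$ is the BRDF, $\mathbf{n}$ is the surface normal and $\Omega$ is the hemisphere around $\mathbf{n}$.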

In image based lighting, the incoming light is represented by an environment map in which each texel represents an incoming light direction. Cubemap convolution is the process of solving this integral, as a pre-process, for all the directions represented by texels in the output light probe cube map. Each texel in the output light probe represents a viewing direction, and for it we accumulate the light arriving from all possible directions in the input environment map. At runtime we then use the reflection vector (or the normal vector in the case of diffuse) to index into the generated light probe cube map. We could also do this in real time for the actual viewing direction, but that is generally avoided: it has to be repeated for every light probe, and typical game scenes contain many of them, which makes it infeasible in actual projects.

Even for a single pixel or output direction, solving this integral means integrating over the whole hemisphere, i.e. evaluating the BRDF against thousands of texels fetched from the input environment map, which makes the process very slow even for pre-processing. To solve this problem we use a technique called Importance Sampling [], which gives good results even with few samples. In importance sampling, we generate a fixed number of random samples biased towards the directions that have the most influence on the current shading point and view direction. Those interested in the details of the whole process, including the mathematics involved, can check [4] [5].
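
The shader code later in this post draws its sample points from a `hammersley_seq` helper that is not listed. A minimal CPU sketch of a Hammersley point set is given here in Python for illustration; it uses the common base-2 radical inverse (van der Corput) construction, which is an assumption about how the helper works rather than a copy of it.

```python
def radical_inverse_vdc(i: int) -> float:
    """Mirror the 32 bits of i around the binary point (van der Corput, base 2)."""
    bits = i & 0xFFFFFFFF
    bits = ((bits << 16) | (bits >> 16)) & 0xFFFFFFFF
    bits = ((bits & 0x55555555) << 1) | ((bits & 0xAAAAAAAA) >> 1)
    bits = ((bits & 0x33333333) << 2) | ((bits & 0xCCCCCCCC) >> 2)
    bits = ((bits & 0x0F0F0F0F) << 4) | ((bits & 0xF0F0F0F0) >> 4)
    bits = ((bits & 0x00FF00FF) << 8) | ((bits & 0xFF00FF00) >> 8)
    return bits / 4294967296.0  # divide by 2^32 to land in [0, 1)


def hammersley(i: int, n: int) -> tuple:
    """i-th point of an n-point Hammersley set in the unit square."""
    return (i / n, radical_inverse_vdc(i))


points = [hammersley(i, 4) for i in range(4)]
```

Because the points are deterministic, every texel sees the same pattern, which is what leads to the structured artifacts discussed in the aliasing section.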

This process depends on the actual BRDF [1] [3] being used in the engine, which in my engine is a Smith-based GGX specular BRDF [2]. The lighting/shading equation therefore generally has the microfacet specular form D*G*F / (4*NoL*NoV), weighted by the incoming light and NoL.
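
For reference, a common way to write a microfacet specular BRDF of this kind is shown below. The GGX form of $D$ with $\alpha = \text{roughness}^2$ is an assumption inferred from the `roughness * roughness * roughness * roughness` term in the sampling code; the engine's exact `DFactor` and `V_SmithJoint` functions are not listed in this post.

$$
f(\mathbf{l}, \mathbf{v}) = \frac{D(\mathbf{h})\, F(\mathbf{v}, \mathbf{h})\, G(\mathbf{l}, \mathbf{v}, \mathbf{h})}{4\, (\mathbf{n} \cdot \mathbf{l})\, (\mathbf{n} \cdot \mathbf{v})},
\qquad
D(\mathbf{h}) = \frac{\alpha^2}{\pi \left( (\mathbf{n} \cdot \mathbf{h})^2 (\alpha^2 - 1) + 1 \right)^2}
$$

Here $\mathbf{h}$ is the half vector between $\mathbf{l}$ and $\mathbf{v}$, $F$ is the Fresnel term and $G$ is the Smith shadowing-masking term.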

For this BRDF we divide the equation into two parts that are pre-computed separately; this is called the split-sum approximation. Check out Epic’s notes [10] or the Call of Duty presentation [9] for more details on the process and on why the BRDF is split into two parts.
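
As a reminder of what is being approximated, the split-sum form from Epic's notes [10] can be written as follows, with $\mathbf{l}_k$ the $N$ importance-sampled light directions and $p$ their probability density (symbol names are an assumed labelling):

$$
\frac{1}{N} \sum_{k=1}^{N} \frac{L_i(\mathbf{l}_k)\, f(\mathbf{l}_k, \mathbf{v})\, (\mathbf{n} \cdot \mathbf{l}_k)}{p(\mathbf{l}_k, \mathbf{v})}
\;\approx\;
\left( \frac{1}{N} \sum_{k=1}^{N} L_i(\mathbf{l}_k) \right)
\left( \frac{1}{N} \sum_{k=1}^{N} \frac{f(\mathbf{l}_k, \mathbf{v})\, (\mathbf{n} \cdot \mathbf{l}_k)}{p(\mathbf{l}_k, \mathbf{v})} \right)
$$

The first factor depends only on the environment and roughness, and the second only on the BRDF, which is what allows the two to be pre-computed independently.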

The first part is computed for different roughness values and stored in the mip levels of the light probe. This is the cube map convolution part of the whole process. Since we are using a microfacet-based BRDF, the distribution of specular highlights depends on the viewing angle, but for this approximation we assume the viewing angle is zero (N=V=R) and run the following code for every direction/texel of the output cube map. I am using a compute shader, and the whole cube map, including all mip levels, is processed in a single compute shader pass.

Here’s the code that I am using –

    float3 ImportanceSampleGGX(float2 xi, float roughness, float3 N)
    {
        // alpha = roughness^2, so alpha^2 = roughness^4
        float alpha2 = roughness * roughness * roughness * roughness;

        float phi = 2.0f * CH_PI * xi.x;
        float cosTheta = sqrt( (1.0f - xi.y) / (1.0f + (alpha2 - 1.0f) * xi.y) );
        float sinTheta = sqrt( 1.0f - cosTheta * cosTheta );

        // Half vector in tangent space
        float3 h;
        h.x = sinTheta * cos( phi );
        h.y = sinTheta * sin( phi );
        h.z = cosTheta;

        // Tangent frame around N, used to bring h into world space
        float3 up = abs(N.z) < 0.999 ? float3(0,0,1) : float3(1,0,0);
        float3 tangentX = normalize( cross( up, N ) );
        float3 tangentY = cross( N, tangentX );

        return (tangentX * h.x + tangentY * h.y + N * h.z);
    }

    //This is called for each output direction / texel in output cubemap
    float3 PreFilterEnvMap(TextureCube envMap, sampler samEnv, float roughness, float3 R)
    {
        float3 res = (float3)0.0f;
        float totalWeight = 0.0f;

        // N = V = R assumption
        float3 normal = normalize(R);
        float3 toEye = normal;

        //roughness = max(0.02f,roughness);

        static const uint NUM_SAMPLES = 512;
        for(uint i = 0; i < NUM_SAMPLES; ++i)
        {
            float2 xi = hammersley_seq(i, NUM_SAMPLES);
            float3 halfway = ImportanceSampleGGX(xi, roughness, normal);
            float3 lightVec = 2.0f * dot( toEye, halfway ) * halfway - toEye;

            float NdotL = saturate( dot( normal, lightVec ) );
            //float NdotV = saturate( dot( normal, toEye ) );
            float NdotH = saturate( dot( normal, halfway ) );
            float HdotV = saturate( dot( halfway, toEye ) );

            if( NdotL > 0 )
            {
                // PDF based mip selection (see Aliasing Artifacts)
                float D = DFactor(roughness, NdotH);
                float pdf = (D * NdotH / (4 * HdotV)) + 0.0001f;

                float saTexel = 4.0f * CH_PI / (6.0f * CONV_SPEC_TEX_WIDTH * CONV_SPEC_TEX_WIDTH);
                float saSample = 1.0f / (NUM_SAMPLES * pdf + 0.00001f);
                float mipLevel = roughness == 0.0f ? 0.0f : 0.5f * log2( saSample / saTexel );

                res += envMap.SampleLevel( samEnv, lightVec, mipLevel ).rgb * NdotL;
                totalWeight += NdotL;
            }
        }

        return res / max(totalWeight, 0.001f);
    }
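
To get a feel for how roughness shapes the sample distribution, it can help to run the sampling routine on the CPU. The following is a direct Python port of `ImportanceSampleGGX` for experimentation only; the small vector helpers are written out by hand for this sketch and the sample points are arbitrary.

```python
import math


def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def importance_sample_ggx(xi, roughness, n):
    """Half vector around normal n for a 2D sample xi in [0, 1)^2."""
    alpha2 = roughness ** 4  # alpha = roughness^2
    phi = 2.0 * math.pi * xi[0]
    cos_theta = math.sqrt((1.0 - xi[1]) / (1.0 + (alpha2 - 1.0) * xi[1]))
    sin_theta = math.sqrt(max(0.0, 1.0 - cos_theta * cos_theta))
    h = (sin_theta * math.cos(phi), sin_theta * math.sin(phi), cos_theta)

    # Same tangent frame construction as the shader version
    up = (0.0, 0.0, 1.0) if abs(n[2]) < 0.999 else (1.0, 0.0, 0.0)
    tangent_x = normalize(cross(up, n))
    tangent_y = cross(n, tangent_x)
    return tuple(tangent_x[k] * h[0] + tangent_y[k] * h[1] + n[k] * h[2]
                 for k in range(3))


normal = (0.0, 1.0, 0.0)
mirror = importance_sample_ggx((0.3, 0.7), 0.0, normal)  # collapses onto normal
rough = importance_sample_ggx((0.3, 0.7), 0.5, normal)   # tilted away from it
```

At roughness 0 every sample collapses onto the normal, so the pre-filtered result is just the unblurred environment map; larger roughness values spread the half vectors over a wider cone.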

The second part contains the rest of the equation and can be thought of as integrating the specular BRDF against a white environment. There are two ways of doing this: either calculate it with an analytical approximation [9] or create a lookup texture as part of pre-processing. I have used the second method in my engine, which basically means solving the BRDF integral for all values of roughness and cosθv. All input and output values lie in the range [0,1]. For more details check [10].
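
Assuming Schlick's Fresnel approximation with specular colour $F_0$ (which matches the `pow(1.0f - HdotV, 5.0f)` term in the code that follows), the integral splits into a scale and a bias applied to $F_0$, as described in [10]:

$$
\int_{\Omega} f(\mathbf{l}, \mathbf{v})\, (\mathbf{n} \cdot \mathbf{l})\, d\mathbf{l}
= F_0 \int_{\Omega} \frac{f(\mathbf{l}, \mathbf{v})}{F(\mathbf{v}, \mathbf{h})} \left( 1 - (1 - \mathbf{v} \cdot \mathbf{h})^5 \right) (\mathbf{n} \cdot \mathbf{l})\, d\mathbf{l}
+ \int_{\Omega} \frac{f(\mathbf{l}, \mathbf{v})}{F(\mathbf{v}, \mathbf{h})} (1 - \mathbf{v} \cdot \mathbf{h})^5\, (\mathbf{n} \cdot \mathbf{l})\, d\mathbf{l}
$$

The two integrals on the right are the two channels written to the lookup texture (`res.x` and `res.y` in the code).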

Code I am using –

    float2 IntegrateEnvBRDF(float roughness, float NdotV)
    {
        float2 res = (float2)0.0f;

        //roughness = max(0.02f,roughness);

        // View vector in tangent space, normal along +Z
        float3 toEye = float3( sqrt(1.0f - NdotV * NdotV), 0.0f, NdotV );
        float3 normal = float3(0.0f, 0.0f, 1.0f);

        static const uint NUM_SAMPLES = 1024;
        for(uint i = 0; i < NUM_SAMPLES; ++i)
        {
            float2 xi = hammersley_seq(i, NUM_SAMPLES);
            float3 halfway = ImportanceSampleGGX(xi, roughness, normal);
            float3 lightVec = 2.0f * dot( toEye, halfway ) * halfway - toEye;

            float NdotL = saturate( lightVec.z );
            float NdotH = saturate( halfway.z );
            float HdotV = saturate( dot( halfway, toEye ) );
            //NdotV = saturate( dot( normal,toEye ) );

            if( NdotL > 0 )
            {
                float D = DFactor(roughness, NdotH);
                float pdf = (D * NdotH / (4 * HdotV)) + 0.0001f;

                float V = V_SmithJoint(roughness, NdotV, NdotL);
                float Vis = V * NdotL * 4.0f * HdotV / NdotH;
                float fc = pow(1.0f - HdotV, 5.0f);

                res.x += (1.0f - fc) * Vis;   // scale applied to specular color
                res.y += fc * Vis;            // bias
            }
        }

        return res / (float)NUM_SAMPLES;
    }

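At runtime, we fetch the values from the textures above to calculate the ambient specular light for a given point and view direction. The function below is the real-time importance-sampled version, which evaluates the same integral directly from the environment map and is the one used for the real-time comparison further down.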

    float3 SpecularIBLRealtime(TextureCube envMap, sampler samEnv, float3 normal, float3 toEye, float roughness, float3 specColor)
    {
        float3 res = (float3)0.0f;
        normal = normalize(normal);

        static const uint NUM_SAMPLES = 256;
        for(uint i = 0; i < NUM_SAMPLES; ++i)
        {
            float2 xi = hammersley_seq(i, NUM_SAMPLES);
            float3 halfway = ImportanceSampleGGX(xi, roughness, normal);
            float3 lightVec = 2.0f * dot( toEye, halfway ) * halfway - toEye;

            float NdotL = saturate( dot( normal, lightVec ) );
            float NdotV = saturate( dot( normal, toEye ) );
            float NdotH = saturate( dot( normal, halfway ) );
            float HdotV = saturate( dot( halfway, toEye ) );

            if( NdotL > 0 )
            {
                float V = V_SmithJoint(roughness, NdotV, NdotL);
                float fc = pow(1.0f - HdotV, 5.0f);
                float3 F = (1.0f - fc) * specColor + fc;

                // Incident light = SampleColor * NoL
                // Microfacet specular = D*G*F / (4*NoL*NoV)
                // pdf = D * NoH / (4 * VoH)
                float D = DFactor(roughness, NdotH);
                float pdf = (D * NdotH / (4 * HdotV)) + 0.0001f;

                float saTexel = 4.0f * CH_PI / (6.0f * CONV_SPEC_TEX_WIDTH * CONV_SPEC_TEX_WIDTH);
                float saSample = 1.0f / (NUM_SAMPLES * pdf);
                float mipLevel = roughness == 0.0f ? 0.0f : 0.5f * log2( saSample / saTexel );

                float3 col = envMap.SampleLevel( samEnv, lightVec, mipLevel ).rgb;
                res += col * F * V * NdotL * HdotV * 4.0f / ( NdotH );
            }
        }

        return res / NUM_SAMPLES;
    }
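
For the pre-computed path, the runtime work reduces to two texture fetches and a multiply-add. The sketch below, in Python for illustration, shows only that final combination as described in Epic's notes [10]. The inputs stand in for texture fetches: `prefiltered_rgb` is assumed to be the light probe sample at the reflection vector and roughness mip, and `env_brdf` the (scale, bias) pair read from the lookup texture at (roughness, NdotV). All names are hypothetical.

```python
def approx_specular_ibl(prefiltered_rgb, env_brdf, spec_color):
    """Combine the two pre-computed split-sum terms into ambient specular light."""
    scale, bias = env_brdf
    return tuple(light * (spec * scale + bias)
                 for light, spec in zip(prefiltered_rgb, spec_color))


# Dielectric-like specular color under a white probe sample
ambient_specular = approx_specular_ibl(
    prefiltered_rgb=(1.0, 1.0, 1.0),
    env_brdf=(0.5, 0.1),
    spec_color=(0.04, 0.04, 0.04),
)
```

The `scale` and `bias` values correspond to `res.x` and `res.y` of `IntegrateEnvBRDF`.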

For this convolution process, we have made two assumptions –

- The BRDF is isotropic, so this process can’t be used for anisotropic materials.
- The viewing angle is zero (N=V=R). In actual runtime rendering the viewing angle can differ, which introduces some errors/artifacts (Epic’s notes [10] suggest weighting the samples by cosθ to reduce the error), and we can’t get the long streaks that we should otherwise get from a microfacet-based BRDF.

One solution is to skip the pre-processing and do the importance sampling at runtime, but that may not be feasible for performance reasons. We need multiple samples (16-32) from a single cube map, and with local light probes several probes can affect a single shading point, which makes this impractical for real-time use in real projects.

For comparison, here’s a screenshot of a sphere rendered with the pre-processed method (below) and the real-time method (above).

## Aliasing Artifacts

We are using the same set of random numbers for every pixel (more accurately, quasirandom numbers [4]), which produces some aliasing artifacts, as seen in image (a). There are two ways to fix this –

- We can introduce jitter, i.e. further randomness, into the numbers being generated (instead of using values 1-128 etc.). This replaces the artifacts with noise, which is visually more acceptable, as can be seen in image (b).
- The second method is cube map filtering. Here we calculate the PDF of each sample direction. The smaller the PDF of a sample direction, the more texels of the input environment map should be averaged for that sample, which roughly translates to sampling a lower-resolution mip level. The mip level is derived from the ratio of Ωs, the solid angle associated with the sample, to Ωp, the solid angle subtended by a texel at the zeroth mip level. Image (c) used this method. For more details check out [4].
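
Written out to match the `saSample`, `saTexel` and `mipLevel` lines in the shader code above, the mip selection is:

$$
\text{mip} = \frac{1}{2} \log_2 \frac{\Omega_s}{\Omega_p},
\qquad
\Omega_s = \frac{1}{N\, p(\mathbf{l}_k)},
\qquad
\Omega_p = \frac{4\pi}{6\, w^2}
$$

where $N$ is the number of samples, $p(\mathbf{l}_k)$ is the PDF of the sample direction and $w$ is the cube map face width in texels (`CONV_SPEC_TEX_WIDTH` in the code). $\Omega_p$ treats all texels as covering an equal share of the sphere, which is an approximation; negative results are clamped to mip 0 by the sampler.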

I am using the filtering method with 256 samples (I am thinking of also using jittering in the future to reduce the number of samples required). Here’s a comparison image with and without PDF-based filtering –

## Light Probes Placement and Interpolation

For local light probes, we have to tackle the problem of where to place the probes and how to decide which pixel uses which probes. One way is to place the light probes automatically, for example on a grid, and then at runtime grab the closest probe or blend between several. Another is to place the light probes manually in the game level wherever they are needed. This lets artists tweak things themselves, place more probes where necessary, etc. There are a few different approaches for determining at runtime which light probes affect the current pixel –

- Grab the closest light probe. Something similar was used in the Source 1 engine for ambient specular lighting.
- K nearest – Grab the K nearest light probes and interpolate between them by blending the results. [11] [12]
- Tetrahedral-based method – Check [12] for more details.
- Influence volume – We define influence volumes for light probes and blend between all the light probes affecting a pixel. It’s a priority blend, and probes with the smallest influence volumes are given the highest priority. Unreal and CryEngine use the same technique. [11]

In my implementation, I am using the influence-volume method. For each probe, we define a spherical or box-shaped influence volume which decides which pixels are affected by that light probe. Where light probes overlap, the one with the smaller volume is given higher priority. We can also define a blend region for each volume, which lets us blend smoothly from one light probe to another when two probes are close together or when transitioning out of a smaller inner probe into the influence of a bigger outer one. It looks something like this –
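
One way such a priority blend could work is sketched below in Python. This is an assumption about the blending rule, not the engine's shader: probes are visited from smallest to largest, each takes its share of whatever weight is still unassigned, and any remainder goes to a fallback such as the sky probe. Only spherical volumes are handled, and `color` stands in for the value fetched from the probe's cube map.

```python
import math
from dataclasses import dataclass


@dataclass
class Probe:
    center: tuple
    radius: float       # outer radius of the spherical influence volume
    blend_width: float  # width of the fade-out band inside the outer radius
    color: tuple        # stand-in for the value fetched from the cube map


def influence_weight(probe, point):
    """1 in the core of the volume, fading to 0 at the outer radius."""
    dist = math.dist(probe.center, point)
    if probe.blend_width <= 0.0:
        return 1.0 if dist <= probe.radius else 0.0
    return min(max((probe.radius - dist) / probe.blend_width, 0.0), 1.0)


def blend_probes(probes, point, fallback_color):
    result = [0.0, 0.0, 0.0]
    remaining = 1.0  # weight not yet claimed by a higher-priority probe
    for probe in sorted(probes, key=lambda p: p.radius):  # smallest first
        weight = influence_weight(probe, point) * remaining
        for k in range(3):
            result[k] += probe.color[k] * weight
        remaining -= weight
        if remaining <= 0.0:
            break
    for k in range(3):
        result[k] += fallback_color[k] * remaining
    return tuple(result)


small = Probe((0.0, 0.0, 0.0), 1.0, 0.5, (1.0, 0.0, 0.0))
big = Probe((0.0, 0.0, 0.0), 4.0, 1.0, (0.0, 1.0, 0.0))
sky = (0.0, 0.0, 1.0)
halfway = blend_probes([big, small], (0.75, 0.0, 0.0), sky)
```

At the sample point the small probe is halfway through its blend band, so it contributes half the result and the bigger probe fills in the rest.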

And here’s how it looks in-game –

## Rendering Environment Maps for Light Probes

To generate local light probes, we need to render the scene into an environment map as viewed from the position of the light probe. When rendering into the environment map, we have two options for choosing the camera far distance –

The first is to use a more or less infinite far distance (that of the main camera, or some predefined value) and render everything into the environment map, including the sky. In most cases this works, but in some cases it can cause bugs in reflections, as can be seen below. The problem comes from rendering objects outside the influence volume into the environment map, which then end up in the light probe too.

If we are using multiple reflection techniques (screen space reflections, etc.) together with probes, we might be able to hide it. Where we can’t, we can use a second method in which the AABB extents (or the sphere radius) are used as the camera far distance while rendering into the environment map. We don’t render the sky in this case. We also use the alpha channel of the render target to mark the pixels we have rendered to. While shading, these alpha values feed into the blend weights, so the unmarked regions are ignored and their reflection values are taken from other light probes. These distances can be specified per light probe, so we can use method 1 on some probes and method 2 where needed. Even with the first method we might need the alpha to leave out the sky, because the sky can be dynamic and so can’t be baked into the light probe *(there might be a better way to handle the sky, but this is the only way I know for now)*.

Here’s a screenshot of the same scene with the second method. Here the reflection from the smaller probe is ignored since its alpha is 0, and the reflection values are taken from the bigger overlapping light probe covering the whole region *(the scene contains blending between 3 light probes: one on the left, one on the right and one in the middle)*.

## Storing Light Probes

Right now I am storing the light probes directly in the A16R16G16B16 texture format *(which takes a lot of memory, I know)*. I will be shifting to some compressed format in the near future. If we are using the first method and don’t need alpha, we can use some 2 channel formats; even for probes that require alpha, we can use a custom mapping scheme to fit them into a 32-bit texture format, since we only need one bit for the alpha mark. Those interested in optimizing the memory footprint of light probes can check the CryEngine presentation [13] or Sébastien Lagarde’s blog [12].
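
To put a number on the memory cost, here is a small back-of-the-envelope calculation in Python. The 128-texel face size is an assumption chosen for illustration (the post does not state the probe resolution); A16R16G16B16 takes 8 bytes per texel.

```python
def cubemap_bytes(face_size: int, bytes_per_texel: int, with_mips: bool = True) -> int:
    """Memory for one cube map: 6 faces, optionally with a full mip chain."""
    total_texels = 0
    size = face_size
    while size >= 1:
        total_texels += 6 * size * size
        if not with_mips:
            break
        size //= 2
    return total_texels * bytes_per_texel


rgba16_probe = cubemap_bytes(128, 8)  # 4 channels x 16 bits
packed_probe = cubemap_bytes(128, 4)  # hypothetical 32-bit packed format
```

Under these assumptions a single probe with a full mip chain comes to roughly 1 MiB, and a 32-bit format halves that.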

At runtime, I am using a TextureCubeArray to store all local light probes and separate TextureCubes for the sky diffuse and specular probes. All light probe properties are stored in a structured buffer, and another typed buffer stores the indices of the light probes affecting a particular tile in the ForwardPlus rendering mode.

## Tiled Probe Culling

For the light probe interpolation and blending system to work, we need the probes sorted in increasing order of influence area, and we need to find the light probes affecting a particular pixel. For sorting, I am using an insertion sort on the CPU while creating the array of light probe data, which orders the probes by increasing size of influence volume.

Finding which light probes affect a particular pixel is similar to finding which lights affect which pixel. So I used a similar Tiled Culling algorithm for light probes as well, which gives us a list of light probes affecting each tile. For the culling tests I am using the same sphere/mini-frustum test as I did for point lights.
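
A typical sphere-versus-frustum test of the kind used in tiled culling can be sketched as follows. This Python version only illustrates the idea and makes its own assumptions: each plane is given as an inward-facing unit normal plus an offset, and a probe's influence volume is approximated by a bounding sphere. The engine's actual tile frustum construction is not shown in this post.

```python
def sphere_intersects_frustum(center, radius, planes):
    """Conservative test: reject only if the sphere is fully behind some plane.

    Each plane is (normal, offset) with the normal pointing into the frustum,
    so a point p is inside the half-space when dot(normal, p) + offset >= 0.
    """
    for normal, offset in planes:
        signed_distance = sum(n * c for n, c in zip(normal, center)) + offset
        if signed_distance < -radius:
            return False
    return True


# A unit box stands in for a tile's mini frustum in this example
box_planes = [
    ((1.0, 0.0, 0.0), 1.0), ((-1.0, 0.0, 0.0), 1.0),
    ((0.0, 1.0, 0.0), 1.0), ((0.0, -1.0, 0.0), 1.0),
    ((0.0, 0.0, 1.0), 1.0), ((0.0, 0.0, -1.0), 1.0),
]
touching = sphere_intersects_frustum((1.5, 0.0, 0.0), 1.0, box_planes)
far_away = sphere_intersects_frustum((3.0, 0.0, 0.0), 1.0, box_planes)
```

The test can report false positives near frustum corners, which only costs some extra shading work, never a missing probe.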

One drawback of this algorithm is that we have to use InterlockedAdd to append indices to the array of light probe indices in global shared memory. As with any multi-threaded operation, we can’t predict the order in which these indices will be added, so we have to sort the indices again. (Note that we only sort the indices; they index into the structured buffer containing the CPU-sorted light probe data.)

For GPU sorting, I tested the following 3 algorithms. I have tested them with up to 32 light probes, and all of them take almost the same time, around 280 microseconds as measured in Nvidia Nsight –

The first one is somewhat similar to bubble sort running on a single thread. To make it faster we can distribute the comparisons over n/2 threads, where n is the number of light probes *(I think after doing that it might be the fastest of the 3, but I have to test to be sure. I will post results in a later blog post after implementing that version and testing with more light probes in a real test scene)* –

    void Sort(uint threadIndex)
    {
        if( threadIndex == 0 )
        {
            uint countLoop = gsProbeCount > 0 ? ((int)((gsProbeCount + 1) * 0.5f - 1)) : 0;
            uint index1, index2;

            for(uint j = 0; j < countLoop; ++j)
            {
                // Forward pass over adjacent pairs
                for(uint i = 1; i < gsProbeCount; i += 2)
                {
                    index1 = i - 1;
                    index2 = i;
                    if( gArrProbeIndices[index2] < gArrProbeIndices[index1] )
                    {
                        uint temp = gArrProbeIndices[index1];
                        gArrProbeIndices[index1] = gArrProbeIndices[index2];
                        gArrProbeIndices[index2] = temp;
                    }
                }

                // Backward pass starting from the end of the array
                for(int i = ((int)gsProbeCount - 1); i > 0; i -= 2)
                {
                    index1 = i - 1;
                    index2 = i;
                    if( gArrProbeIndices[index2] < gArrProbeIndices[index1] )
                    {
                        uint temp = gArrProbeIndices[index1];
                        gArrProbeIndices[index1] = gArrProbeIndices[index2];
                        gArrProbeIndices[index2] = temp;
                    }
                }
            }

            // Final forward pass
            for(uint i = 1; i < gsProbeCount; i += 2)
            {
                index1 = i - 1;
                index2 = i;
                if( gArrProbeIndices[index2] < gArrProbeIndices[index1] )
                {
                    uint temp = gArrProbeIndices[index1];
                    gArrProbeIndices[index1] = gArrProbeIndices[index2];
                    gArrProbeIndices[index2] = temp;
                }
            }

            // Final backward pass, only when needed
            if( (countLoop*2) <= gsProbeCount )
            {
                for(int i = ((int)gsProbeCount - 1); i > 0; i -= 2)
                {
                    index1 = i - 1;
                    index2 = i;
                    if( gArrProbeIndices[index2] < gArrProbeIndices[index1] )
                    {
                        uint temp = gArrProbeIndices[index1];
                        gArrProbeIndices[index1] = gArrProbeIndices[index2];
                        gArrProbeIndices[index2] = temp;
                    }
                }
            }
        }
    }
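
The n/2-thread variant mentioned above is not implemented in this post. One textbook way to structure it is an odd-even transposition sort, sketched here in Python as a CPU simulation. This is a swapped-in standard algorithm, not the untested version the post refers to: within a phase every compared pair is disjoint, so each pair could be handled by its own thread, with a group barrier between phases.

```python
def odd_even_transposition_sort(values):
    """Sort by alternating even and odd phases of independent pair swaps."""
    data = list(values)
    n = len(data)
    for phase in range(n):  # n phases are enough to sort n elements
        start = phase % 2   # even phases: pairs (0,1),(2,3)..; odd: (1,2),(3,4)..
        # Pairs in this loop never overlap, so on a GPU each iteration
        # could run on its own thread.
        for i in range(start, n - 1, 2):
            if data[i + 1] < data[i]:
                data[i], data[i + 1] = data[i + 1], data[i]
        # A group barrier would be needed here before the next phase.
    return data


sorted_indices = odd_even_transposition_sort([5, 3, 9, 1, 7, 2])
```

With at most n/2 comparisons per phase and n phases, the parallel depth is linear in the number of probes per tile.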

The second method is similar to my insertion sort logic on the CPU side. We find the destination index for each value by looping through the array and counting the number of smaller values. This can be done in parallel on n threads, where n is the number of light probe indices. Then, after a GroupSync, we write each value to its destination. One drawback is that we can’t sort the light probes if there are more of them than threads in a group, which is 256 in my case. I don’t think I will ever reach the point where 256 light probes affect one tile, so this solution works for me.

    void Sort(uint threadIndex)
    {
        /////sort probes
        //assuming max probe count per tile <= NumThreadInTile == 256
        uint temp = gArrProbeIndices[threadIndex];

        // Destination index = number of smaller values in the array
        uint index = 0;
        for(uint i = 0; i < gsProbeCount; ++i)
        {
            if( temp > gArrProbeIndices[i] )
                ++index;
        }

        GroupMemoryBarrierWithGroupSync();
        gArrProbeIndices[index] = temp;
        GroupMemoryBarrierWithGroupSync();
    }

As can be seen above, we have two GroupSyncs, one before the write and one after it. We can remove the first GroupSync by using a second array to hold the sorted values (a similar approach is used by Unreal) –

    void Sort(uint threadIndex)
    {
        /////sort probes
        //assuming max probe count per tile <= NumThreadInTile == 256
        uint temp = gArrProbeIndices[threadIndex];

        // Destination index = number of smaller values in the array
        uint index = 0;
        for(uint i = 0; i < gsProbeCount; ++i)
        {
            if( temp > gArrProbeIndices[i] )
                ++index;
        }

        // Writing to a separate array removes the need for the first sync
        gArrProbeIndicesSorted[index] = temp;
        GroupMemoryBarrierWithGroupSync();
    }
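
The counting idea behind the last two versions is easy to check on the CPU. The Python sketch below simulates one "thread" per element and writes into a separate output list, mirroring the two-array variant. It is an illustration only and also shows the implicit requirement that all values are unique.

```python
def rank_sort(indices):
    """Place each value at the position given by the count of smaller values."""
    count = len(indices)
    sorted_out = [None] * count
    for thread_index in range(count):  # one GPU thread per element
        value = indices[thread_index]
        rank = sum(1 for other in indices if value > other)
        sorted_out[rank] = value
    return sorted_out


ordered = rank_sort([7, 2, 5, 0])
collided = rank_sort([3, 3])  # duplicates compute the same rank
```

Duplicate values would collide on the same slot, but since every light probe is added to a tile's list at most once, the indices being sorted are unique.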

There was a minor difference of roughly 5 microseconds between the three algorithms, with the 3rd approach being the fastest and the 2nd the slowest. *(But I still have to test in a proper scene with a large number of cubes)*

## Parallax Correction for Local Cubemaps

We are using cube maps for local light probes, which introduces rendering errors into reflections due to parallax: objects in the reflection appear to be the wrong size and at the wrong position. In my implementation, I have solved this problem using sphere and box proxy volumes. We find the intersection between the reflection vector and the geometry proxy and use it to correct the reflection vector. Since this is all done in the shader, per pixel and per probe, the choice of simple proxy geometry is very important for performance. For an AABB, it looks something like this –
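
A minimal sketch of the box case, written in Python rather than shader code, follows the commonly used box projection approach described in [12] and [14]. It assumes the shaded position lies inside the box and that the reflection vector is non-zero; the function and parameter names are illustrative, not the engine's.

```python
def parallax_correct_aabb(position, reflection, box_min, box_max, probe_position):
    """Return the direction to use for the cube map lookup.

    Intersects the reflection ray with the proxy box, then re-expresses the
    hit point relative to the position the probe was captured from.
    """
    distances = []
    for k in range(3):
        if reflection[k] > 0.0:
            distances.append((box_max[k] - position[k]) / reflection[k])
        elif reflection[k] < 0.0:
            distances.append((box_min[k] - position[k]) / reflection[k])
    distance = min(distances)  # first box face hit along the ray
    hit = tuple(position[k] + reflection[k] * distance for k in range(3))
    return tuple(hit[k] - probe_position[k] for k in range(3))


corrected = parallax_correct_aabb(
    position=(0.5, 0.0, 0.0),
    reflection=(0.0, 1.0, 0.0),
    box_min=(-1.0, -1.0, -1.0),
    box_max=(1.0, 1.0, 1.0),
    probe_position=(0.0, 0.0, 0.0),
)
```

In this example the uncorrected lookup direction would be (0, 1, 0); the corrected one leans towards +X because the shaded point sits to the right of the probe's capture position.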

For more detailed info on parallax correction & code check out this forum post [14] and this blog post [12].

## IBL in JAGE

So to sum it all up, here’s the overall process that makes IBL work in my engine –

- Place light probes in the game level with their influence volumes – Done manually in my engine. I am using sphere & box volumes.
- Capture environment maps at those positions – Either using an infinite far distance or one based on the influence volumes.
- Create light probes out of those environment maps using a Cubemap Convolution method based on the BRDF being used for shading – I am using an importance sampling based approach for the GGX based BRDF in my engine. I use a roughness factor of 1 for generating the diffuse probe. At roughness 1, the lobe is wide enough to average texels over the whole hemisphere, which is roughly equivalent to the Lambert diffuse I am using.
- Sort the light probes on the CPU in increasing order of area of influence – I am using insertion sort for this.
- Store these light probes along with the required properties in GPU buffers – I am using a TextureCubeArray for the light probes and a structured buffer for the properties. I use another typed buffer for storing indices for the ForwardPlus rendering path.
- Find out which light probes affect which pixels – I am using the compute shader based Tiled Culling method that I used for point lights.
- Sort the indices again on the GPU – I have implemented 3 algorithms for this.
- Fix the reflection vector for local light probes to remove parallax – I am using box & sphere volumes for this.
- At shading time, fetch the values from all the light probes affecting the pixel and blend between them – In my case it’s a priority based additive blend with smaller volumes given higher priority.
- Fetch the reflection values from the sky probe, both diffuse and specular.
- Fetch the value from the BRDF LUT texture and evaluate the BRDF.
- Add it as ambient light into the whole shading equation.

## IBL Tools

Here’s a short list of IBL tools available for free on the internet that can be used for pre-processing the cube maps –

- AMD’s CubemapGen – https://code.google.com/p/cubemapgen/
- Modified CubemapGen (or upgraded) – https://seblagarde.wordpress.com/2012/06/10/amd-cubemapgen-for-physically-based-rendering/
- IBL Baker – http://www.derkreature.com/iblbaker/
- Cmft – https://github.com/dariomanesku/cmft

#### References –

For readers looking for more in-depth information: [4] [5] are must-reads for the cube map convolution process, and [12] contains a lot of details & references on how to handle local light probes.

1. BRDF – https://en.wikipedia.org/wiki/Bidirectional_reflectance_distribution_function
2. Specular BRDF Reference – http://graphicrants.blogspot.in/2013/08/specular-brdf-reference.html
3. Background: Physics and Math of Shading – (pdf)
4. Importance Sampling – http://http.developer.nvidia.com/GPUGems3/gpugems3_ch20.html
5. Irradiance Environment Maps – http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter10.html
6. Image-Based Lighting – http://http.developer.nvidia.com/GPUGems/gpugems_ch19.html
7. Plausible Environment Lighting in Two Lines of Code – http://casual-effects.blogspot.in/2011/08/plausible-environment-lighting-in-two.html
8. Cubemap Texel Solid Angle – http://www.rorydriscoll.com/2012/01/15/cubemap-texel-solid-angle/
9. Physically Based Lighting in Call of Duty: Black Ops – http://blog.selfshadow.com/publications/s2013-shading-course/lazarov/s2013_pbs_black_ops_2_slides_v2.pptx
10. Real Shading in Unreal Engine 4 – http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_slides.pptx
11. Light Probes – http://blogs.unity3d.com/2011/03/09/light-probes/
12. Image-based Lighting approaches and parallax-corrected cube map – https://seblagarde.wordpress.com/2012/09/29/image-based-lighting-approaches-and-parallax-corrected-cubemap/
13. Secrets of CryENGINE 3 Graphics Technology – (ppt)
14. Box Projected Cubemap Environment Mapping – http://www.gamedev.net/topic/568829-box-projected-cubemap-environment-mapping/

Posted on August 26, 2015, in GI, Graphics, JustAnotherGameEngine, Tutorials and tagged IBL, Image Based Lighting, JAGE, Parallax Corrected Cubemap, Real Time Reflection.