BlackMesa XenEngine: Part3 – CSM 2.0

CSM stands for Cascaded Shadow Maps, one of the most commonly used techniques for rendering sun shadows, or shadows from any large directional light covering a big area of a virtual scene (it can also be used for other light types). For more details, check out my blog posts on Shadows and CSM.

We shipped the first version of CSM in content update 4 back in 2016, and you can check all the details in this blog post – CSM in source engine. This blog post will only discuss improvements and changes made since then.

We had to tackle two critical tasks to lay the foundation for the XenEngine upgrade. The first was upgrading the engine from Shader Model 2.0 to Shader Model 3.0. The second was setting up a reliable G-buffer with precise depth values and accurate world position reconstruction. This step was crucial not only for enhancing CSM but also for other parts of the XenEngine.
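As a rough illustration of what world position reconstruction from a G-buffer depth value involves, here is a minimal sketch. It assumes the stored depth is linear view-space depth measured along the camera's forward axis, which the post does not confirm; the Camera struct and ReconstructWorldPos are hypothetical names, not XenEngine code.

```cpp
struct Vec3 {
    float x, y, z;
};

inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 operator*(Vec3 v, float s) { return {v.x * s, v.y * s, v.z * s}; }

// Hypothetical camera description; the basis vectors are assumed to be unit length.
struct Camera {
    Vec3 position;
    Vec3 forward;
    Vec3 right;
    Vec3 up;
    float tanHalfFovX;  // tan(horizontal FOV / 2)
    float tanHalfFovY;  // tan(vertical FOV / 2)
};

// ndcX and ndcY are in [-1, 1]; linearDepth is the distance along cam.forward.
inline Vec3 ReconstructWorldPos(const Camera& cam, float ndcX, float ndcY, float linearDepth)
{
    // Ray through the pixel, scaled so that its component along forward is 1.
    Vec3 ray = cam.forward
             + cam.right * (ndcX * cam.tanHalfFovX)
             + cam.up * (ndcY * cam.tanHalfFovY);
    return cam.position + ray * linearDepth;
}
```

Because the ray has a forward component of exactly 1, multiplying it by the linear depth lands on the surface point without needing a full inverse view-projection matrix.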

CSM VRAD Update

VRAD is the light mapper, or light baking tool, in the Source engine. We also integrated Valve’s CSGO CSM update into BlackMesa to get all of its improvements to baked shadows. More details about this update can be found in CSGO Lighting and Shader Improvements.

Even after that update, there was still a weird bug in the light mapper (VRAD) that broke the baking process if there were light-style light entities in the map. Instead of lighting info, VRAD would bake luxel patterns. Lightstyle is a feature used to implement flickering lights with the old Source light entities.
I spent a lot of time trying to fix or work around it, but only managed to make it write black texels into the light maps instead of luxel patterns. Our in-house light mapper expert Alex went on a deep dive into VRAD and fixed it. It turned out to be related to pointer math on the alpha channel of light maps. It was not an easy one to fix. AFAIK this bug is still present in almost all the other versions of the Source engine with CSM out there (including CSGO).
At the time, deferred lighting was not in the picture, so light styles were the only way to render flickering lights, and it was very important for artists to be able to use these lights in Xen. Later on (months, even years later), we replaced all of these old-school light-style entities with new deferred lights.

Shadow Map Improvements

Here is a quick recap of the old patch. We used 4 shadow cascades on a single shadow map whose resolution ranged from 1k to 4k depending on settings. 3 cascades were used for 3 levels of shadow maps in world space, and the 4th was used as a view-space shadow map only for view models.
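To picture 4 cascades sharing one shadow map, here is a small sketch that assumes they are packed as a 2x2 grid of equal tiles. The post does not state the actual layout, so the grid arrangement, the Viewport struct, and CascadeViewport are all assumptions made for illustration.

```cpp
struct Viewport {
    int x, y, width, height;
};

// Assumed layout: 4 cascades in a 2x2 grid inside a square atlas.
// cascadeIndex 0..2 would be the world-space cascades, 3 the view model cascade.
inline Viewport CascadeViewport(int cascadeIndex, int atlasSize)
{
    const int tile = atlasSize / 2;  // each cascade gets a quarter of the atlas
    return {(cascadeIndex % 2) * tile, (cascadeIndex / 2) * tile, tile, tile};
}
```

Under this assumption, a 4k atlas gives each cascade a 2k x 2k region, and a 1k atlas leaves only 512 x 512 per cascade.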

Static Cascades

For XenEngine, we at least doubled all the limits (some by even more), whether entities, displacements, etc. We also wanted CSM to be more affordable in terms of perf budget on low-end machines. So we implemented something called a Static Cascade.
A static cascade is a single shadow map on a separate big texture (2k to 4k, depending on settings). The idea was to render all the static geometry, including brushes, displacements, entities, and even static NPCs, into this static shadow map once after level load and then use it for sun shadows even on Potato settings. We (sorta) unified shadow map generation and calculations for view models with world space so we could use the same shadow map for both world geometry and view models. This was a huge win in terms of both quality and perf. In Potato mode, we never use any dynamic cascade; the static cascade is used for both view models and world geometry.

We named the lowest option for all graphics settings Potato.

Hybrid Approach and Quality Levels

We use static cascades even on higher settings, not just Potato. We use a static cascade on the lowest possible configuration and only 1 dynamic cascade for rendering view models and world geometry.

As we increase settings, we add more cascades, up to a maximum of 3, and start rendering more and more objects into those dynamic cascades. There are 5 configuration levels in total. At the highest one we don’t use the static cascade: we use all 3 dynamic cascades + 1 for the view model, and we render everything into all the cascades. This saves a bit of VRAM since we don’t allocate a texture for the static cascade. View model self-shadowing is supported whenever the view model cascade is enabled, which is the last 2 levels (out of 5). The video setting that controls this is called SunShadowQuality (it goes Potato/Low/Medium/High/Ultra).
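The quality levels can be thought of as a small lookup table. The sketch below is illustrative only: the Potato and Ultra rows and the view model flag on the last 2 levels follow the description above, while the dynamic cascade counts for Low, Medium, and High are assumptions. CascadeConfig and GetCascadeConfig are hypothetical names.

```cpp
enum class SunShadowQuality { Potato, Low, Medium, High, Ultra };

struct CascadeConfig {
    bool useStaticCascade;  // allocate and sample the static cascade
    int  dynamicCascades;   // number of world-space dynamic cascades (max 3)
    bool viewModelCascade;  // extra cascade that enables view model self-shadowing
};

inline CascadeConfig GetCascadeConfig(SunShadowQuality quality)
{
    switch (quality) {
    case SunShadowQuality::Potato: return {true,  0, false};  // static cascade only
    case SunShadowQuality::Low:    return {true,  1, false};  // count is an assumption
    case SunShadowQuality::Medium: return {true,  2, false};  // count is an assumption
    case SunShadowQuality::High:   return {true,  3, true};   // count is an assumption
    case SunShadowQuality::Ultra:  return {false, 3, true};   // no static cascade at all
    }
    return {true, 0, false};  // unreachable for valid enum values
}
```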

At the Medium quality level, which is recommended for anything below a GTX 1080 (especially for Xen), we get 2-5x perf improvements (depending on maps/locations) compared to the old system, without any major image quality difference unless you are looking for it.

VRAM Usage

We have a separate setting to control VRAM usage, i.e. the size of the shadow map textures for static/dynamic cascades. PC platforms can be wild in terms of hardware: GPUs with high perf can have less VRAM and vice versa, so it was essential to split VRAM usage out into its own setting. It’s controlled via “Sun Shadow Memory” in video settings.
The static cascade shadow map size ranges from 2k to 4k, and the dynamic cascade shadow map goes from 1k to 4k. Both are D16 or 16-bit float textures (at least on Nvidia). In DX9, there was no support for 16-bit float textures or depth-only draw calls, so we had to use driver-specific special formats to get 16-bit float textures, or special null textures to enable depth-only draw calls. They were hit-or-miss back then but SHOULD be supported on almost every card nowadays, whether Nvidia/Intel/AMD.

I also experimented with 24-bit and 32-bit float textures along with 8K shadow maps, but the gains were insignificant compared to the perf hit, so these are disabled in the code. Also, we do not allocate the static cascade at all if it’s not needed.
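For a sense of scale, the raw size of a square shadow map is size x size x bytes per texel. The sketch below assumes that 1k means 1024 texels per side and ignores any driver padding or alignment; ShadowMapBytes is a made-up helper for illustration.

```cpp
#include <cstdint>

// Raw memory footprint of a square shadow map, ignoring padding and alignment.
constexpr std::uint64_t ShadowMapBytes(std::uint64_t size, std::uint64_t bitsPerTexel)
{
    return size * size * bitsPerTexel / 8;
}
```

Under those assumptions, a 16-bit map costs 2 MiB at 1k, 8 MiB at 2k, and 32 MiB at 4k, while an 8K map at 32 bits per texel would cost 256 MiB.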

CSM Volumes

Initially, there were 2 different modes for rendering shadow maps for static cascade –

Method 1 – Render the whole map into the static cascade once after level load. In terms of perf, it’s great, but in terms of quality, it sucks because of the low effective shadow map resolution, since we are rendering the whole map into a 2k or 4k texture. With Xen, it would have been an even bigger problem since the maps were considerably bigger.

Method 2 – Render a limited area around the player into the static cascade and control its update rate at a fixed number of times per second (we used around 10-15 times per sec). It’s marginally better than method 1, but we are still rendering a big chunk of the level into the static cascade multiple times per second. Also, it’s hard to find a good value for the size variable that fits all maps/locations, so we often ended up choosing a big size that was not ideal. And updating a static shadow map more than once doesn’t feel right; it defeats the purpose of having a static shadow map or shadow cascade.
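The fixed update rate in Method 2 amounts to a simple time-based throttle. Here is a minimal sketch of that idea; ThrottledUpdater is a hypothetical class, and how the engine actually scheduled these updates is not described in the post.

```cpp
// Allows an update at most `updatesPerSecond` times per second.
class ThrottledUpdater {
public:
    explicit ThrottledUpdater(double updatesPerSecond)
        : m_interval(1.0 / updatesPerSecond), m_nextUpdateTime(0.0) {}

    // `now` is the current time in seconds. Returns true when the static
    // cascade should be re-rendered around the player on this frame.
    bool ShouldUpdate(double now)
    {
        if (now < m_nextUpdateTime)
            return false;
        m_nextUpdateTime = now + m_interval;
        return true;
    }

private:
    double m_interval;        // seconds between updates
    double m_nextUpdateTime;  // earliest time the next update may run
};
```

Even at 10 updates per second the static cascade is re-rendered hundreds of times a minute, which is the cost the approach described next avoids.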

Solution

We implemented something called CSM volumes. The idea was to place CSM volume entities (AABBs) spanning the whole level in Hammer (the Source map editor) and render only the area covered by the current CSM volume into the static cascade. This increased the effective shadow map resolution of the static cascade by a large amount, since we are rendering a smaller area into the same 2k/4k shadow maps. We also update this static cascade on demand: once after level load, and then only when a CSM volume change is triggered. It was a huge win in terms of both quality and performance.
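The resolution gain is easy to quantify as world units covered per shadow map texel. The numbers in the sketch below are hypothetical and only meant to show the ratio; WorldUnitsPerTexel is an illustrative helper.

```cpp
// How many world units a single shadow map texel has to cover along one axis.
constexpr float WorldUnitsPerTexel(float coveredExtent, int shadowMapSize)
{
    return coveredExtent / static_cast<float>(shadowMapSize);
}
```

For example, squeezing a hypothetical 32768-unit-wide map into a 4k texture gives 8 units per texel, while a 4096-unit-wide CSM volume in the same texture gives 1 unit per texel, which is 8 times finer along each axis.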

Here’s an example from a test map. The map contains mainly two rooms/sections connected via a tunnel. Each room has a CSM volume which is triggered as soon as the player enters that room. The big orange cubes shown in the map editor screenshot below are CSM volume entities. The size of a CSM volume entity determines how much of the level is rendered into the CSM shadow maps.
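The on-demand update for a setup like this two-room test map can be sketched as follows. This is a simplified illustration, not the shipped code: it assumes volumes are triggered by a point-in-AABB test on the player position, and CsmVolume and StaticCascade are hypothetical types. The render itself is replaced by a counter.

```cpp
#include <utility>
#include <vector>

struct Vec3 {
    float x, y, z;
};

// Axis-aligned box placed by the level designer.
struct CsmVolume {
    Vec3 mins, maxs;

    bool Contains(const Vec3& p) const
    {
        return p.x >= mins.x && p.x <= maxs.x &&
               p.y >= mins.y && p.y <= maxs.y &&
               p.z >= mins.z && p.z <= maxs.z;
    }
};

class StaticCascade {
public:
    explicit StaticCascade(std::vector<CsmVolume> volumes) : m_volumes(std::move(volumes)) {}

    // Call once per frame. Returns true only when the static cascade was re-rendered.
    bool Update(const Vec3& playerPos)
    {
        // Still inside the active volume: nothing to do.
        if (m_activeVolume >= 0 && m_volumes[m_activeVolume].Contains(playerPos))
            return false;

        for (int i = 0; i < static_cast<int>(m_volumes.size()); ++i) {
            if (m_volumes[i].Contains(playerPos)) {
                m_activeVolume = i;
                ++m_renderCount;  // stand-in for rendering static geometry inside volume i
                return true;
            }
        }
        return false;  // outside every volume (e.g. the tunnel): keep the last shadow map
    }

    int ActiveVolume() const { return m_activeVolume; }
    int RenderCount() const { return m_renderCount; }

private:
    std::vector<CsmVolume> m_volumes;
    int m_activeVolume = -1;  // -1 means nothing rendered yet
    int m_renderCount = 0;
};
```

Walking from the first room through the tunnel into the second room triggers exactly two renders in this sketch, one per volume change, no matter how many frames pass in between.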

CSM Volumes not only improved the quality of shadows at lower settings while delivering even higher frame rates, they also enabled almost all users to turn on CSM, as opposed to the previous version, which incurred a heavy perf hit even on lower settings.

CSM Volumes are forced into manual mode in Xen, and Level Designers placed all the CSM volumes by hand. Many users wouldn’t be able to enable shadows in Xen without CSM volumes. They also helped stabilize FPS in some of the heavy sections, even on fairly powerful PC configs at the time.

That’s all for this post, and I will be breaking down deferred rendering in my next post.

About chetanjags

Game Developer

Posted on July 17, 2023, in BlackMesa / Source Engine, Graphics.
