Reducing frame time: rendering

This post is a quick look at how to reduce the CPU and GPU cost of rendering many objects. I will briefly cover some of the most common and basic techniques, in no particular order. The advice applies to any engine, whether that is Unity, Unreal, or a custom one. I also plan to write up more detailed explanations for each of these, but for now I’ll just quickly explain the concepts here.

LOD / Level Of Detail

Using LOD means choosing how much detail to render based on how far away, or how small, an object is from the camera’s perspective. If it covers only a small number of pixels on the screen, there is no need to render all of its details. If you have a very complex mesh, you should also store a simplified version of it in memory. You could even have multiple levels of detail, which means storing multiple meshes, each with less detail than the previous level.

The lowest level of detail could be a single triangle or quad textured with a pre-rendered image of the mesh, which is called billboarding. If your mesh uses normal maps or other more advanced and GPU-intensive effects like parallax mapping, you could also disable those at lower levels of detail to reduce the rendering cost even further. It makes no sense to render such fine details if the end result is going to be very small on the screen. The end user most likely will not notice the difference.
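To make the selection step more concrete, here is a minimal sketch that picks a level of detail from the approximate on-screen height of an object's bounding sphere. The pixel thresholds, the LOD names, and the function names are all made up for illustration; a real engine will have its own metric and tuning.

```python
import math

def projected_height_px(radius, distance, fov_y_deg, screen_height_px):
    """Approximate on-screen height in pixels of a bounding sphere."""
    if distance <= radius:
        # Camera is inside or touching the sphere: treat it as filling the screen.
        return float(screen_height_px)
    # Height of the view frustum at this distance, in world units.
    view_height = 2.0 * distance * math.tan(math.radians(fov_y_deg) / 2.0)
    return min(float(screen_height_px),
               (2.0 * radius / view_height) * screen_height_px)

# Hypothetical thresholds: minimum pixel height for each LOD, most detailed first.
LOD_MIN_PIXELS = [(200.0, "lod0_full"), (60.0, "lod1_reduced"), (15.0, "lod2_low")]
BILLBOARD = "lod3_billboard"

def select_lod(radius, distance, fov_y_deg=60.0, screen_height_px=1080):
    """Return the name of the LOD to draw for an object of the given size."""
    size = projected_height_px(radius, distance, fov_y_deg, screen_height_px)
    for min_pixels, name in LOD_MIN_PIXELS:
        if size >= min_pixels:
            return name
    return BILLBOARD  # too small for any mesh, fall back to the billboard
```

With these example thresholds, an object with a radius of 1 unit gets the full mesh at a distance of 5 units and drops to the billboard at 500 units.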

Having multiple levels of detail, reducing the vertex count, and making the shading more lightweight will help if the GPU has too much work to do. It will not help with CPU bottlenecks, since the number of draw commands stays the same.

Batch drawing

Issuing a draw command to the GPU is expensive for the CPU. You can easily become CPU-bound if you draw thousands of objects each frame. This CPU cost can be reduced by drawing objects in batches. Different engines support different batching options, and you should dig into the documentation of your engine for the specifics. However, I’ll list the most common ones here and explain how they work.

Basic idea of batching

The idea of batching is to gather similar objects together so they can be drawn with as few GPU commands as possible. The fewer commands needed to draw the objects, the better. State changes such as switching buffers, textures, or uniforms (material properties etc.) each require the CPU to communicate with the GPU, which quickly gets time-consuming. So it is really important to order the drawing in a way that makes state changes as infrequent as possible.
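As a rough illustration of why ordering matters, the sketch below counts how many shader and texture switches a draw list would cause before and after sorting it by state. The `DrawItem` type and the names in the scene are hypothetical, and a real renderer tracks many more kinds of state than these two.

```python
from collections import namedtuple

# Hypothetical draw item: the state it needs plus the mesh to draw.
DrawItem = namedtuple("DrawItem", ["shader", "texture", "mesh"])

def count_state_changes(items):
    """Count shader and texture switches needed to draw items in order."""
    changes = 0
    current_shader = None
    current_texture = None
    for item in items:
        if item.shader != current_shader:
            changes += 1
            current_shader = item.shader
        if item.texture != current_texture:
            changes += 1
            current_texture = item.texture
    return changes

def sort_by_state(items):
    """Group items so that those sharing shader and texture are adjacent."""
    return sorted(items, key=lambda item: (item.shader, item.texture))

scene = [
    DrawItem("lit", "bark", "tree"),
    DrawItem("unlit", "sky", "dome"),
    DrawItem("lit", "rock", "boulder"),
    DrawItem("lit", "bark", "tree"),
    DrawItem("lit", "rock", "boulder"),
    DrawItem("lit", "bark", "tree"),
]

unsorted_changes = count_state_changes(scene)                # 9
sorted_changes = count_state_changes(sort_by_state(scene))   # 5
```

Sorting alone does not reduce the number of draw commands, but it puts identical state next to each other, which is what makes the batching techniques below possible.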

Static batching

Static objects that do not move in the game world can be combined into a single mesh. For example, if you have a thousand trees that all use the same textures and shading, you can merge the individual trees into one mesh, set the required states only once, and issue just a single draw command. This gives you a lot more throughput when drawing many objects in your game world. The drawback is increased GPU memory usage, because every individual object’s vertices are stored separately in memory.
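A minimal sketch of the merge step could look like this. To keep it short, each copy is only translated (no rotation or scale), the "tree" is a single quad, and the function names are my own, not any engine's API.

```python
def translate(vertices, offset):
    """Return the vertices moved by the given (x, y, z) offset."""
    ox, oy, oz = offset
    return [(x + ox, y + oy, z + oz) for (x, y, z) in vertices]

def build_static_batch(mesh_vertices, mesh_indices, positions):
    """Merge one copy of the mesh per position into one vertex/index buffer."""
    batch_vertices = []
    batch_indices = []
    for position in positions:
        base = len(batch_vertices)  # index offset for this copy
        batch_vertices.extend(translate(mesh_vertices, position))
        batch_indices.extend(base + i for i in mesh_indices)
    return batch_vertices, batch_indices

# Stand-in mesh: a quad made of two triangles.
QUAD_VERTICES = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
QUAD_INDICES = [0, 1, 2, 0, 2, 3]

tree_positions = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 0.0, 10.0)]
batch_vertices, batch_indices = build_static_batch(
    QUAD_VERTICES, QUAD_INDICES, tree_positions
)
```

The merged buffers hold 12 vertices and 18 indices for the three quads, which shows where the extra memory goes: the vertex data grows with the number of copies.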

Instance batching

Also known as instanced drawing. This allows a single mesh to be drawn many times with one draw command. It usually requires small changes to the rendering pipeline, so it is a little extra work to implement, and the shaders need special treatment too. On the plus side, it does not take up extra memory like static batching does, apart from a small amount of per-instance data. The limitation is that each instanced draw can only draw copies of a single mesh.
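The following sketch emulates on the CPU what an instanced draw does conceptually: the mesh is stored once, and each instance only contributes a small piece of per-instance data (here just an offset). The function names are hypothetical, and in a real renderer the combining step runs in the vertex shader, not in Python.

```python
def instanced_vertex(mesh_vertices, instance_offsets, vertex_id, instance_id):
    """Emulate a vertex shader combining shared mesh data with instance data."""
    x, y, z = mesh_vertices[vertex_id]
    ox, oy, oz = instance_offsets[instance_id]
    return (x + ox, y + oy, z + oz)

def draw_instanced(mesh_vertices, instance_offsets):
    """One emulated draw command that expands to every vertex of every instance."""
    return [
        instanced_vertex(mesh_vertices, instance_offsets, v, i)
        for i in range(len(instance_offsets))
        for v in range(len(mesh_vertices))
    ]

def stored_positions(vertex_count, instance_count, instanced):
    """Number of 3D positions kept in memory for each approach."""
    if instanced:
        return vertex_count + instance_count  # one mesh plus one offset per instance
    return vertex_count * instance_count      # every copy baked into a static batch

TRIANGLE = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
offsets = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
drawn = draw_instanced(TRIANGLE, offsets)
```

For a 500-vertex tree drawn 1000 times, `stored_positions` gives 1500 positions with instancing against 500000 for a static batch, under the simplifying assumption that an instance needs only one offset.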

Manual batching

This is done ahead of time, while building the scene, and it helps if the CPU is the bottleneck when going through your scene to figure out what to draw or what to batch. Look at your game scene and see if there are objects you can easily group together. For example, if you have a forest with a lot of trees, you could combine a group of them into one object instead of keeping them as individual objects. Then duplicate and transform that group to build the forest, just as you would with individual trees. This reduces the time required to go through all the objects in the scene, and by spending a little time on creating the group object you can make the repeating patterns less apparent.

Early Z-pass

If you have a lot of expensive pixels (i.e. complicated shading) being drawn, you can reduce the cost by rejecting pixels as early as possible in the pipeline so they never reach the shading stage at all. This is usually done by sorting opaque objects from front to back, so the GPU can determine that later-drawn pixels are behind already-drawn ones and reject them. Rejecting pixels in this manner before they get to the shading stage reduces the GPU load. Note that it is very easy to accidentally disable the early Z test. On most GPUs this happens, for example, when a shader discards pixels (alpha testing) or writes to the Z value of the pixel.
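The effect of draw order can be shown with a tiny simulated depth buffer. This is only a model of the idea: each "object" is a flat layer at a constant depth covering some pixels, and the function counts how many pixels survive the depth test and would therefore be shaded.

```python
def shade_count(draws, pixel_count):
    """Count pixels that pass the depth test and therefore get shaded.

    Each draw is (depth, pixels): a constant depth and the pixel indices covered.
    """
    depth_buffer = [float("inf")] * pixel_count
    shaded = 0
    for depth, pixels in draws:
        for p in pixels:
            if depth < depth_buffer[p]:  # depth test happens before shading
                depth_buffer[p] = depth
                shaded += 1              # only surviving pixels pay for shading
    return shaded

# Three layers covering the same 8 pixels, listed from farthest to nearest.
draws = [(30.0, range(0, 8)), (20.0, range(0, 8)), (10.0, range(0, 8))]

back_to_front = shade_count(draws, 8)                              # 24
front_to_back = shade_count(sorted(draws, key=lambda d: d[0]), 8)  # 8
```

Drawn back to front, every layer gets shaded and then overwritten. Drawn front to back, only the nearest layer is shaded.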

Culling

One way to reduce the time spent on rendering is to reduce the number of rendered objects. Culling does this by removing objects that are not visible to the camera. Culling itself takes up some CPU time, but the gain is a lower draw call count, which in turn reduces both CPU and GPU cost.

Frustum culling

Frustum culling checks whether an object is inside the camera’s view frustum. If the object is not inside the frustum, it is not sent through the drawing pipeline, which makes drawing faster. The frustum is defined by six planes derived from the camera. Each object, usually represented by a simple bounding volume, is tested against these planes: if it is completely behind any one of them, it is outside the frustum. Otherwise it is added to the list of objects to be drawn.
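Here is a minimal sketch of the plane test using bounding spheres. To keep the example short, the six planes form a simple box in front of a camera looking down the negative z axis (like an orthographic view volume). A perspective frustum has slanted side planes, but the test itself is the same. The plane values and function names are illustrative assumptions.

```python
def signed_distance(plane, point):
    """Distance from point to plane; positive on the inside of the frustum."""
    (nx, ny, nz), d = plane
    x, y, z = point
    return nx * x + ny * y + nz * z + d

def sphere_in_frustum(planes, center, radius):
    """A sphere is culled only if it lies completely behind some plane."""
    return all(signed_distance(p, center) >= -radius for p in planes)

def cull(planes, spheres):
    """Keep only the (center, radius) spheres that touch the view volume."""
    return [s for s in spheres if sphere_in_frustum(planes, s[0], s[1])]

# Planes as (inward normal, d). Volume: x and y in [-10, 10], z in [-100, -1].
BOX_PLANES = [
    ((1, 0, 0), 10),    # left:   x >= -10
    ((-1, 0, 0), 10),   # right:  x <= 10
    ((0, 1, 0), 10),    # bottom: y >= -10
    ((0, -1, 0), 10),   # top:    y <= 10
    ((0, 0, -1), -1),   # near:   z <= -1
    ((0, 0, 1), 100),   # far:    z >= -100
]

objects = [
    ((0.0, 0.0, -50.0), 1.0),   # inside
    ((20.0, 0.0, -50.0), 1.0),  # far to the right, culled
    ((10.5, 0.0, -50.0), 1.0),  # straddles the right plane, kept
    ((0.0, 0.0, 5.0), 1.0),     # behind the camera, culled
]
visible = cull(BOX_PLANES, objects)
```

Note that an object straddling a plane is kept, because part of it may still be visible. This sphere test is conservative: it can keep a few objects near the corners that are actually outside, but it never removes a visible one.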

Occlusion culling

Occlusion culling checks whether other objects cover the current object so completely that it cannot be seen at all. If an object is fully occluded by other objects, it does not contribute to the final frame image and thus should not be added to the drawing list.

Lightmapping

One of the most expensive shading operations is calculating lighting. This can be optimized if there are static light sources and static objects in the scene. For example, in an FPS map there could be a spotlight shining on an area the player cannot reach. Lighting like this can be precalculated into textures called lightmaps, which are then applied to the objects in the scene at reduced cost.
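As a very small sketch of the idea, the code below bakes simple diffuse lighting from one static point light into a grid of texels for a flat floor, and then looks the value up at runtime instead of recomputing it. It assumes the floor lies at y = 0 with its normal pointing up and the light is above the floor. Real lightmap bakers also handle shadows, bounced light, and UV unwrapping, none of which is shown here, and the function names are hypothetical.

```python
import math

def bake_floor_lightmap(light_pos, intensity, size, texel_world_size):
    """Precompute diffuse lighting for a flat floor at y = 0, normal (0, 1, 0).

    Assumes the light is above the floor (light_pos[1] > 0).
    """
    lx, ly, lz = light_pos
    lightmap = []
    for row in range(size):
        texels = []
        for col in range(size):
            # World-space centre of this texel on the floor.
            px = (col + 0.5) * texel_world_size
            pz = (row + 0.5) * texel_world_size
            dx, dy, dz = lx - px, ly, lz - pz
            dist = math.sqrt(dx * dx + dy * dy + dz * dz)
            n_dot_l = max(0.0, dy / dist)  # cosine between normal and light direction
            texels.append(intensity * n_dot_l / (dist * dist))
        lightmap.append(texels)
    return lightmap

def sample_lightmap(lightmap, texel_world_size, x, z):
    """Runtime lookup: nearest texel for a floor position, clamped to the map."""
    col = min(len(lightmap[0]) - 1, max(0, int(x / texel_world_size)))
    row = min(len(lightmap) - 1, max(0, int(z / texel_world_size)))
    return lightmap[row][col]

# Bake once, offline or at load time.
floor_map = bake_floor_lightmap((2.0, 2.0, 2.0), 4.0, size=4, texel_world_size=1.0)

# At runtime the shading cost is a single lookup.
under_light = sample_lightmap(floor_map, 1.0, 1.5, 1.5)
in_corner = sample_lightmap(floor_map, 1.0, 0.5, 0.5)
```

The expensive part (the loop in `bake_floor_lightmap`) runs once, while each frame only pays for the lookup. The trade-off is that the result is only valid as long as neither the light nor the floor moves.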