As the SRP Batcher has matured, Unity’s automatic GPU instancing is often left unused, which degrades performance when rendering a large number of objects.
Manual instancing (such as Graphics.DrawMeshInstanced) performs better, but it is typically limited to rendering large numbers of a single object type and does not support full-scene optimization.
Since Graphics.DrawMeshInstanced requires a matrix array to be passed in up front, a typical approach is to divide the scene into a grid, with each cell owning one matrix array. Culling is then performed against the bounds of each cell, or with a quadtree or octree. However, this method does not cull individual instances, so it renders more objects than SRP batching would, which can lead to poor performance.
There is no per-instance culling.
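To make the cost of cell-level culling concrete, here is a minimal sketch. It is written in Python rather than Unity C# so that it runs standalone, and it makes several simplifying assumptions: a 2D top-down scene, an axis-aligned view rectangle standing in for the camera frustum, and hypothetical helper names (`build_grid`, `cull_per_cell`, `cull_per_instance`). It is not InstanceCollector's code.

```python
from collections import defaultdict


def build_grid(positions, cell_size):
    """Group instance positions by the grid cell that contains them."""
    cells = defaultdict(list)
    for x, z in positions:
        cells[(int(x // cell_size), int(z // cell_size))].append((x, z))
    return cells


def cell_overlaps_view(cell, cell_size, view):
    """AABB overlap test between one grid cell and the view rectangle."""
    cx, cz = cell
    min_x, min_z = cx * cell_size, cz * cell_size
    max_x, max_z = min_x + cell_size, min_z + cell_size
    vx0, vz0, vx1, vz1 = view
    return min_x <= vx1 and max_x >= vx0 and min_z <= vz1 and max_z >= vz0


def cull_per_cell(cells, cell_size, view):
    """Cell-level culling: every instance of a touched cell is submitted."""
    draws = [insts for cell, insts in cells.items()
             if cell_overlaps_view(cell, cell_size, view)]
    return len(draws), sum(len(d) for d in draws)


def cull_per_instance(positions, view):
    """Per-instance culling: only instances inside the view are kept."""
    vx0, vz0, vx1, vz1 = view
    return [(x, z) for x, z in positions
            if vx0 <= x <= vx1 and vz0 <= z <= vz1]


# 400 instances on a 20 x 20 lattice, split into four 10 x 10 cells.
positions = [(x + 0.5, z + 0.5) for x in range(20) for z in range(20)]
cells = build_grid(positions, cell_size=10.0)

# A small view that happens to sit on the corner shared by all four cells.
view = (8.0, 8.0, 12.0, 12.0)

draw_calls, submitted = cull_per_cell(cells, 10.0, view)
visible = cull_per_instance(positions, view)
print(draw_calls, submitted, len(visible))  # 4 400 16
```

In this worst case the grid submits all 400 instances in four draw calls, while only 16 of them are actually inside the view.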
In addition, the grid is partitioned in advance, so when the camera zooms in and out the cell size can become unsuitable, and zooming out produces an excessive number of draw calls. The cells cannot easily be merged into batches either, because the number of arrays would then no longer be known in advance, and rebuilding them at runtime can severely impact performance.
There are no batch cell draw calls.
The unique advantage of InstanceCollector is its ability to optimize GPU instancing across the entire scene. It collects instance objects efficiently and uses Unity’s Job System for precise per-instance culling. Accuracy matters here: if culling is too coarse, performance can end up worse than with the SRP Batcher, which is why precise culling is the core strength of InstanceCollector.
Per-instance culling allows for one draw call in the batch.
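The idea of culling each instance and packing the survivors into one array can be sketched as follows. This is a standalone Python illustration, not the asset's Job System implementation. It assumes each instance has a bounding sphere, that the frustum is given as six inward-facing planes `(nx, ny, nz, d)` (in Unity these could come from `GeometryUtility.CalculateFrustumPlanes`), and that a per-type cap on visible instances applies. The names `sphere_in_frustum` and `collect_visible` are hypothetical.

```python
def sphere_in_frustum(center, radius, planes):
    """True if the sphere is inside or touching every inward-facing plane."""
    cx, cy, cz = center
    return all(nx * cx + ny * cy + nz * cz + d >= -radius
               for nx, ny, nz, d in planes)


def collect_visible(instances, planes, max_visible):
    """Pack the ids of visible instances into one array, up to max_visible."""
    visible = []
    for instance_id, center, radius in instances:
        if len(visible) == max_visible:
            break  # the preallocated array is full
        if sphere_in_frustum(center, radius, planes):
            visible.append(instance_id)
    return visible


# Assumption: a box-shaped volume (|x| <= 10, |y| <= 10, 0 <= z <= 100)
# stands in for a real perspective frustum.
planes = [
    (1, 0, 0, 10), (-1, 0, 0, 10),   # left, right
    (0, 1, 0, 10), (0, -1, 0, 10),   # bottom, top
    (0, 0, 1, 0), (0, 0, -1, 100),   # near, far
]

instances = [
    ("a", (0.0, 0.0, 50.0), 1.0),    # fully inside
    ("b", (11.0, 0.0, 50.0), 0.5),   # fully outside on the right
    ("c", (10.4, 0.0, 50.0), 0.5),   # straddles the right plane
    ("d", (0.0, 0.0, -2.0), 1.0),    # behind the near plane
]

print(collect_visible(instances, planes, max_visible=1023))  # ['a', 'c']
print(collect_visible(instances, planes, max_visible=1))     # ['a']
```

Because the result is a single compact array per object type, it can be handed to one instanced draw call instead of one call per grid cell.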
Generally speaking, DrawMeshInstancedIndirect performs better than DrawMeshInstanced because the matrix data can be gathered on the GPU in a compute shader. Its biggest problem appears when the visible object count is zero: the CPU does not know the count, so it still issues the rendering call, and additional data still appears to be sent to the GPU. With many different object types, this overhead may cause a performance drop. Therefore, to handle instance collections for multiple object types, we chose DrawMeshInstanced combined with culling in the Job System.
Example of Using DrawMeshInstancedIndirect for a Single Type of Object.
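The benefit of knowing the visible count on the CPU can be sketched like this. It is a Python illustration with a hypothetical `plan_draw_calls` helper, not InstanceCollector's code. It assumes culling has already produced one list of visible matrices per object type, and it relies on the documented limit of 1023 instances per Graphics.DrawMeshInstanced call.

```python
MAX_INSTANCES_PER_CALL = 1023  # documented limit of Graphics.DrawMeshInstanced


def plan_draw_calls(visible_by_type):
    """Return (type, batch) pairs; types with nothing visible issue no call."""
    calls = []
    for mesh_type, matrices in visible_by_type.items():
        # range() is empty for an empty list, so zero-count types are skipped.
        for start in range(0, len(matrices), MAX_INSTANCES_PER_CALL):
            batch = matrices[start:start + MAX_INSTANCES_PER_CALL]
            calls.append((mesh_type, batch))
    return calls


# Integers stand in for 4x4 matrices to keep the example small.
visible_by_type = {
    "tree": list(range(2500)),
    "rock": [],
    "grass": list(range(10)),
}

calls = plan_draw_calls(visible_by_type)
print([(t, len(b)) for t, b in calls])
# [('tree', 1023), ('tree', 1023), ('tree', 454), ('grass', 10)]
```

The "rock" type produces no call at all, which is exactly the case an indirect draw cannot skip from the CPU side.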
InstanceCollector uses the Job System to collect instances dynamically, off the main thread, and supports multiple object types. Its features are listed below.
- Precise per-instance culling.
- Dynamic instance collection using the Job System, without occupying the main thread’s resources.
- Supports multiple types of instance objects, and even full-scene GPU instancing.
- Provides a matrix array output tool to speed up scene loading and reduce memory usage.
- Does not rely on SSBOs, which gives better cross-platform compatibility.
- For hardware that doesn’t support GPU instancing, the rendering process will fall back to using.
- Simple instance object setup process (just add the Instancer script and enable “Enable GPU Instancing” in the material).
- Compatible with SRP Batcher.
- Customizable maximum visible object count for each type, adjustable based on project memory and performance requirements.
- Supports URP and the Built-in Render Pipeline.
You can find it here:
InstanceCollector