  SmallBurger

Monday, September 1, 2025

Unity Brush Technology: Breakthroughs and Practical Insights — From SetPixels to Compute Shader

While developing a brush system in Unity, I found that in-depth discussions of how brushes actually work are rare online. Many developers use the engine’s built-in tools directly and seldom look at the underlying mechanisms. Yet only by understanding how brushes work can we get past the limits of existing tools and build fully customized brushes tailored to our needs. That is my motivation for this article: to share my technical explorations and practical insights, and to help more people understand and build their own brush systems.

SetPixels32 Brush: Moving Data from CPU to GPU

Let’s start with the simplest approach: SetPixels32. Anyone who has built a fog-of-war system is probably familiar with it. Some might expect Graphics.CopyTexture to be more efficient, but because a brush usually has to support arbitrary sizes, that kind of approach is not really suitable.

The following code is a typical SetPixels32 implementation: it iterates over every pixel in the brush’s coverage area, computes the color from distance and weight, and finally uploads the entire texture to the GPU in one go.

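// Scratch value reused for every pixel.
var destColor = new Color32(0, 0, 0, 0);
for (int rowIndex = startRowIndex; rowIndex <= endRowIndex; ++rowIndex)
{
    for (int columnIndex = startColumnIndex; columnIndex <= endColumnIndex; ++columnIndex)
    {
        // World-space center of the current grid cell (one cell per pixel).
        Vector2 gridCenter = new Vector2(
            worldMin.x + columnIndex * gridSize.x + halfGridSize.x,
            worldMin.y + rowIndex * gridSize.y + halfGridSize.y);

        int index = columnIndex + rowIndex * mapWidth;
        float sqrDistance = (brushCenter - gridCenter).sqrMagnitude;

        // Falloff on squared distance: 1 inside the inner radius, 0 at the outer radius.
        // Assumes sqrhalfBrushSize > sqrhalfBrushInnerSize; otherwise this divides by zero.
        sqrDistance = Mathf.Clamp(sqrDistance, sqrhalfBrushInnerSize, sqrhalfBrushSize);
        float weightRatio = 1.0f - (sqrDistance - sqrhalfBrushInnerSize) / (sqrhalfBrushSize - sqrhalfBrushInnerSize);

        destColor = editColorBuffer[index];

        // 0.033f is about 1/30 s, presumably a fixed per-frame time step.
        ProcessPixelColor(weightRatio, speed * 0.033f, isAdd, sqrDistance, sqrhalfBrushSize,
            in mouseDelta, ref destColor);

        editColorBuffer[index] = destColor;
    }
}
// SetPixels32 copies the whole buffer into the texture;
// Apply() then uploads the whole texture to the GPU.
editTexture.SetPixels32(editColorBuffer);
editTexture.Apply();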

The principle is simple: the CPU processes each pixel in the brush area, then uploads the entire texture to the GPU at once, even when only a small region changed. This causes a large CPU → GPU transfer. The larger the texture, or the more textures processed at once, the more pronounced the bottleneck becomes. For brush operations that need real-time feedback and fine control, this approach is often not fast enough.
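The loop above relies on start and end indices and on a falloff weight whose setup the snippet does not show. As a minimal sketch, assuming a uniform grid and the same squared-distance falloff, they could be computed roughly as follows. BrushMath, CellRange, and Weight are hypothetical names rather than the author's helpers, and the code is plain C# so it can be tried outside Unity.

```csharp
using System;

// Hypothetical helper; names and signatures are illustrative only.
public static class BrushMath
{
    // Inclusive range of cell indices touched by the brush along one axis,
    // clamped to [0, cellCount - 1]. If start > end, the brush misses the grid.
    public static void CellRange(float brushCenter, float halfBrushSize,
        float worldMin, float cellSize, int cellCount,
        out int start, out int end)
    {
        int rawStart = (int)Math.Floor((brushCenter - halfBrushSize - worldMin) / cellSize);
        int rawEnd = (int)Math.Floor((brushCenter + halfBrushSize - worldMin) / cellSize);
        start = Math.Max(rawStart, 0);
        end = Math.Min(rawEnd, cellCount - 1);
    }

    // Weight is 1 inside the inner radius and falls to 0 at the outer radius,
    // interpolated on squared distance as in the loop above.
    public static float Weight(float sqrDistance, float sqrInner, float sqrOuter)
    {
        // Hard-edged brush: avoid dividing by zero when both radii coincide.
        if (sqrOuter <= sqrInner)
            return sqrDistance <= sqrInner ? 1f : 0f;

        float clamped = Math.Min(Math.Max(sqrDistance, sqrInner), sqrOuter);
        return 1f - (clamped - sqrInner) / (sqrOuter - sqrInner);
    }
}
```

For a brush centered at 5 with a half-size of 1 on a unit grid starting at 0, CellRange gives the inclusive range 4 to 6. The range is a bounding box, so corner cells outside the radius are still visited; they simply receive a weight of 0.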

Fragment Shader Brush: Shifting Computation to the GPU

Once the bottleneck of SetPixels32 is clear, the next optimization is to move the per-pixel work from the CPU to the GPU, using a Fragment Shader to produce the brush effect. This greatly reduces CPU-to-GPU data transfer and lets the GPU compute every pixel in parallel.

The following shader code demonstrates how the calculation of pixel weight and color within the brush area is entirely handled by the Fragment Shader:

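bool GetBltTextureColorAndBrushWeight(in VertexOutput input,
    out half brushWeight, out half4 bltTextureColor)
{
    // Initialize the out parameter so it is defined on the early-return path.
    brushWeight = 0;

    // UV of this pixel in the destination texture (float2 keeps full precision).
    float2 destUV = input.screenPosition.xy / input.screenPosition.w;
    bltTextureColor = SAMPLE_TEXTURE2D(_MainTex, sampler_MainTex, destUV);

    // Array slices are addressed as tiles in a grid two columns wide.
    uint rowIndex = _TextureArrayIndex / 2;
    uint columnIndex = _TextureArrayIndex % 2;

    // _BrushInfo.zw scales tile coordinates to world units.
    float2 worldPosition2D = float2((columnIndex + destUV.x) * _BrushInfo.z,
        (rowIndex + destUV.y) * _BrushInfo.w);

    // _BrushTransform: xy = brush center, z = inner radius, w = outer radius.
    float distance = length(worldPosition2D - _BrushTransform.xy);
    if (distance > _BrushTransform.w)
        return false;

    // Full weight inside the inner radius, falling off toward the outer radius,
    // scaled by the smoothed frame time (unity_DeltaTime.z) and _BrushInfo.x.
    distance = clamp(distance, _BrushTransform.z, _BrushTransform.w);
    brushWeight = (1.0 - (distance - _BrushTransform.z) / _BrushInfo.y) *
        unity_DeltaTime.z * _BrushInfo.x * sc_weightApplyRatio;
    return true;
}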

This code mainly does two things:

  • Calculates the current pixel’s world position based on screen coordinates, then determines whether it falls within the brush radius.
  • If it is within range, it calculates the weight (brushWeight), which determines the brush’s influence on that pixel.

Advantages

  • Fully handled on the GPU: No need to send pixel data from CPU back to GPU, resulting in a significant performance boost, especially suitable for high-resolution or multi-brush operations.
  • Real-time feedback: Provides a more immediate and nuanced brush feel.

Disadvantages

  • Must traverse the entire RenderTexture: Even if the brush only covers a small area, the Fragment Shader still processes the entire texture, causing unnecessary computation.
  • Input/Output RenderTarget limitation: During processing, the input and output RenderTargets cannot be the same texture. Each draw requires copying (blitting) the original content to another RenderTexture before processing, which adds an extra GPU pass.
  • More complex for multiple targets: Managing and optimizing brush operations on TextureArrays becomes even more challenging.
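To make the input/output limitation concrete, here is a minimal host-side sketch of one brush step, assuming the brush shader reads the canvas through _MainTex as in the snippet above. FragmentBrushPass and Paint are hypothetical names, not the plugin's code, and the sketch handles a single RenderTexture rather than a TextureArray. It needs to run inside Unity.

```csharp
using UnityEngine;

// Hypothetical helper; illustrates the extra copy, not the author's implementation.
public static class FragmentBrushPass
{
    // canvas: the RenderTexture being painted.
    // brushMaterial: a material whose shader samples _MainTex and applies the brush.
    public static void Paint(RenderTexture canvas, Material brushMaterial)
    {
        // The canvas cannot be sampled while it is also the render target,
        // so its current content is first copied to a scratch texture.
        RenderTexture scratch = RenderTexture.GetTemporary(
            canvas.width, canvas.height, 0, canvas.format);
        Graphics.Blit(canvas, scratch);                 // extra pass: plain copy

        // Graphics.Blit binds the source as _MainTex; the brush shader reads
        // the copy and writes the painted result back into the canvas.
        Graphics.Blit(scratch, canvas, brushMaterial);  // brush pass

        RenderTexture.ReleaseTemporary(scratch);
    }
}
```

Both blits cover the whole texture, which is where the two costs listed above come from: one extra copy per stroke step, and full-texture shading even for a small brush.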

P.S.: Unity’s official Terrain brush also uses this approach. For reference, see this link:
PaintTexture.shader

Compute Shader Brush: One-Pass, High-Efficiency Processing with TextureArray Support

After understanding the limitations of Fragment Shaders regarding Input/Output RenderTargets, we can further leverage the features of Compute Shaders to effectively solve these bottlenecks:

  • Input can be equal to Output
    You can read and write directly on the same RenderTexture, without the need for extra Blit copies, significantly reducing resource consumption and latency.
  • Only process the pixels within the brush area
    No need to traverse the entire RenderTexture; instead, calculations are performed precisely on the area affected by the brush, greatly improving performance.
  • Full support for TextureArray
    You can directly read and write to TextureArrays, easily meeting advanced multi-layer texture requirements, offering greater flexibility and scalability.

These advantages put Compute Shader brushes well ahead of traditional Fragment Shader brushes in both performance and functionality, and make true one-pass painting possible.

The following Compute Shader code demonstrates these features:

  • Input is Output: Directly read and write on the target texture.
  • Only process a local area: Only computes the Rect area covered by the brush.

[numthreads(8, 8, 1)]
void CSMain (uint3 id : SV_DispatchThreadID)
{   
    if ((id.x >= _BrushPixelsColumnCount) ||
        (id.y >= _BrushPixelsRowCount))
        return;
    
    TextureIndex textureIndex;
    half brushWeight;    
    if (!GetDestPixelWeight(id.xy, textureIndex, brushWeight))
        return;

    SetEditTextureValue(textureIndex, half4(_TintColor,
        saturate(GetEditTextureValue(textureIndex).a + brushWeight)));
}

As shown above, the Compute Shader processes only the area affected by the brush (bounded by _BrushPixelsColumnCount and _BrushPixelsRowCount) and reads and writes the target texture directly (which may be a TextureArray), so the whole stroke step completes in one pass. Here, TextureIndex refers to the index into the TextureArray.
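On the host side, the dispatch has to launch enough 8×8 thread groups to cover the brush rectangle. The following is a minimal sketch of that arithmetic, assuming the group size matches [numthreads(8, 8, 1)] in the kernel above. BrushDispatch, GroupCount, and ThreadsLaunched are hypothetical names, and the code is plain C# so it can be tried outside Unity.

```csharp
using System;

// Hypothetical helper; names are illustrative, not taken from the plugin.
public static class BrushDispatch
{
    // Must match [numthreads(8, 8, 1)] in the kernel.
    public const int ThreadsPerAxis = 8;

    // Ceiling division: the smallest number of groups whose threads
    // cover pixelCount pixels along one axis.
    public static int GroupCount(int pixelCount)
    {
        if (pixelCount <= 0)
            return 0;
        return (pixelCount + ThreadsPerAxis - 1) / ThreadsPerAxis;
    }

    // Total threads launched for a region of columns x rows pixels.
    public static long ThreadsLaunched(int columns, int rows)
    {
        return (long)GroupCount(columns) * ThreadsPerAxis
             * GroupCount(rows) * ThreadsPerAxis;
    }
}
```

For a 100×100 pixel brush this gives 13 groups per axis, or 10,816 threads, of which 10,000 pass the bounds check at the top of CSMain; that check exists precisely because the group size rarely divides the brush size evenly. Covering a full 2048×2048 texture would launch 4,194,304 threads instead. In Unity the two group counts would typically be passed to ComputeShader.Dispatch(kernelIndex, groupsX, groupsY, 1), with the dispatch skipped when either count is zero, and the target RenderTexture created with enableRandomWrite set so the kernel can write to it.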

Comparison of the Three Approaches

  • SetPixels32 Brush
    Advantages: Simple to implement, easy to understand.
    Disadvantages: Large CPU→GPU data transfer, obvious performance bottleneck.
  • Fragment Shader Brush
    Advantages: All computation is done on the GPU, improved performance, great real-time feedback.
    Disadvantages: Must traverse the entire texture, computational waste; Input/Output RenderTarget limitations.
  • Compute Shader Brush
    Advantages: Only processes the affected area, Input is Output, supports TextureArray, best performance and flexibility.
    Disadvantages: More complex to implement, requires understanding of GPU parallel computation.

Among these three solutions, the Compute Shader brush stands out not just for its performance advantage, but as a cleaner, more direct, and elegant solution.

It turns painting from a cycle of shuffling and copying data into the GPU drawing directly on the canvas in real time.

This shift brings not just “faster” performance, but an entirely new experience:

  • Artists can feel real-time feedback as if using a real paintbrush.
  • Engineers can implement complex brush requirements with cleaner logic.
  • Project development gains a highly efficient solution for large scenes and multi-layer textures.

From now on, SmallBurger’s Brush tool plugins will gradually transition from Fragment Shader to Compute Shader.

SmallBurgerAssets
