这里截取了2023那篇分享中的图来展示这个方案的思路,其核心思路就是复用内存数组中的空间,避免内存空间的切换。通过维护一个类型数组(Typed array of objects)和一个可重用池(Freelist for slot reuse)来共同实现这样一个数据池,这样传入GPU时始终都是同类型的连续内存,便于后续的并行计算。
In the earlier discussion, you might have wondered why GPUs are restricted to 2x2 quads, and the reason is derivatives. Any time that you perform a texture sample, the GPU is approximating the partial derivative of the UV by using the 4 samples. That’s why they are called “helper” lanes…they help the active lanes compute partial derivatives.
But if you are using material graphs, be prepared for a bit of an adventure, as you’ll have to change your code generator to maintain derivatives and perform a bunch of conversion between derivative and non-derivative nodes.
And the great thing about booleans is that we can pack them into uints. Since our search radius is only 4 pixels, we can search for 24 pixels at a time with 4 bits of pad on each side.
If you want accurate ordering of the layers, the typical method would be to render the mesh while storing all the samples in a per-pixel linked list, which seems to have replaced depth peeling. Note that I’m focused on accurate representations, so moment-based OIT isn’t an option here.
Here is the actual shader code. You can look at it later. Every mesh has a 16 bit id. For every sample we add the value of 1 plus the hash shifted 8 bits to the left. The bottom 8 bits store the count, the middle 16 bits will store a unique hash, and the top 8 bits are to prevent overflow.
The actual passes for OIT are very similar to the passed for opaque geometry. We sort the samples by material, and perform an execute indirect for each material which stores the GBuffer data. Then we have a lighting pass, just like the opaque pass. For edges detection, we can use the hash instead of the depth, normal, and instance id. And the reconstruction chooses the a neighbor from each of the 4 regions.
If we can solve these problems like refraction and depth of field in a generic way it should be possible to create a reasonable approximation that “just works”. And my hope is that sparse OIT can be a step in that direction.
评论区
共 2 条评论热门最新