Global Illumination is fundamentally incoherent, and it's a challenge to solve that light transfer efficiently when GPUs are designed for memory and execution coherency.
Hardware ray tracing is powerful and is the way forward, but we also need a relatively lightweight fallback. There are still plenty of video cards on the PC market that don't support hardware ray tracing, and the hardware ray tracing that consoles provide isn't fast enough.
We also want to handle scenes with heavily overlapping meshes, which is slow with hardware ray tracing's two-level acceleration structure.
So we needed to develop a software ray tracing solution to get around these limitations.
When we started working on Software Ray Tracing, one of the first things we tried was to capture the scene using a bunch of orthographic cameras, giving what we call cards. We then ray trace through the card heightfields, and sample the card lighting when the ray hits.
So instead we picked Mesh Signed Distance Fields as the geometric representation for our software ray tracing. They give reliable occlusion, every area is covered, and we can still trace rays quickly with sphere tracing, skipping empty space (a property guaranteed by the distance field).
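A minimal sphere-tracing sketch of the idea, assuming a hypothetical `sampleSDF` callback that returns the signed distance at a point; this is not Lumen's actual kernel, just the textbook loop it builds on:

```cpp
#include <functional>

struct Vec3 { float x, y, z; };

static Vec3 madd(const Vec3& o, const Vec3& d, float t) {
    return { o.x + d.x * t, o.y + d.y * t, o.z + d.z * t };
}

// Marches a ray through a signed distance field. Each step advances by the
// sampled distance, so empty space is skipped in large, provably safe steps.
bool SphereTrace(const std::function<float(const Vec3&)>& sampleSDF,
                 Vec3 origin, Vec3 dir, float tMax, float& tHit)
{
    float t = 0.0f;
    const float surfaceEpsilon = 0.001f;  // illustrative hit threshold
    for (int step = 0; step < 128 && t < tMax; ++step) {
        float d = sampleSDF(madd(origin, dir, t));
        if (d < surfaceEpsilon) { tHit = t; return true; }  // close enough: hit
        t += d;  // no surface can be closer than d, so this step is safe
    }
    return false;  // missed, left the range, or ran out of steps
}
```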
An intersection with a distance field only gives us a position and a normal, so we can't look up material parameters or lighting.
We interpolate the lighting where the ray trace hit from the cards, which we call the Surface Cache. Areas that are missing coverage only result in lost energy, instead of leaking. Ray tracing the card heightfields didn’t work, but using them for lighting does.
The easiest way to do a compaction is to use local atomics to allocate the compacted index. That has the effect of scrambling the rays, which you can see on the left. The red lines are all the different rays within a single wave, and they’re now starting from different positions within the scene.
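A toy sketch of that compaction pattern, with `std::atomic` standing in for GPU local atomics and an illustrative `Ray` struct. On a GPU, many lanes race on the counter, so the compacted slot each surviving ray receives depends on atomic ordering rather than on its original index, which is exactly what scrambles the rays:

```cpp
#include <atomic>
#include <vector>

struct Ray { int id = 0; bool alive = false; };

void CompactRays(const std::vector<Ray>& rays, std::vector<Ray>& compacted)
{
    std::atomic<int> allocator{0};
    compacted.resize(rays.size());
    // Imagine each iteration running on a different GPU lane: surviving rays
    // grab compacted slots in whatever order they reach the atomic.
    for (const Ray& ray : rays) {
        if (ray.alive) {
            int slot = allocator.fetch_add(1);
            compacted[slot] = ray;
        }
    }
    compacted.resize(allocator.load());  // keep only the surviving rays
}
```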
Those primitives are stored in a two level structure, where on the bottom level we have our primitives and top level is a flat instance descriptor array. This approach allows us to leverage instances for storage and decreases memory usage, which is important for any kind of a volumetric mesh representation.
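A sketch of that two-level layout; the field names are illustrative, not Lumen's actual structs. The point of the split is that many placed instances can reference one shared bottom-level asset, so the volumetric data is stored once:

```cpp
#include <cstdint>

struct MeshDistanceFieldAsset {   // bottom level: one per unique mesh
    uint32_t firstBrickIndex;     // where this asset's bricks live in the pool
    uint32_t numBricks;
    float    volumeToLocal[12];   // 3x4 transform from volume UVs to mesh space
};

struct InstanceDescriptor {       // top level: flat array, one per instance
    uint32_t assetIndex;          // index of the shared bottom-level asset
    float    worldToLocal[12];    // only the per-instance transform is unique
};
```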
For generation we use an Embree point query to efficiently find the distance to the nearest triangle. We also cast 64 rays from every voxel and count backface hits to decide whether we are inside or outside the geometry, which determines the distance field sign.
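A hedged sketch of the sign decision; `castRay` is a stub standing in for the Embree-backed tracer used during generation, and the majority threshold is an assumption:

```cpp
struct Vec3 { float x, y, z; };
struct Hit  { bool valid; bool backface; };

// Stub: the real version traces against the mesh triangles with Embree.
Hit castRay(const Vec3& /*origin*/, const Vec3& /*dir*/) { return {false, false}; }

float DistanceFieldSign(const Vec3& voxelCenter, const Vec3 sampleDirs[64])
{
    int backfaceHits = 0, totalHits = 0;
    for (int i = 0; i < 64; ++i) {   // 64 rays per voxel, as described above
        Hit hit = castRay(voxelCenter, sampleDirs[i]);
        if (hit.valid) { ++totalHits; if (hit.backface) ++backfaceHits; }
    }
    // Mostly backface hits means the voxel sits inside the mesh: negative sign.
    return (totalHits > 0 && 2 * backfaceHits > totalHits) ? -1.0f : 1.0f;
}
```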
Volumetric structures don’t scale well with resolution and are quite memory intensive, so we store only a narrow band distance field inside a mip mapped virtual volume texture.
Distance field bricks are stored inside a fixed size pool, which is managed by a simple linear allocator. It's a convenient setup, as we don't need to deal with variable-sized 3D allocations or the resulting fragmentation.
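A minimal sketch of such a pool, assuming a bump cursor plus a free list (names are illustrative). Because every brick is the same size, a freed slot can be handed to any later request, so fragmentation never arises:

```cpp
#include <cstdint>
#include <vector>

class BrickPool {
public:
    explicit BrickPool(uint32_t capacity) : capacity_(capacity) {}

    // Returns an index into the brick pool texture, or UINT32_MAX when full.
    uint32_t Allocate() {
        if (!freeList_.empty()) {           // reuse a recycled slot first
            uint32_t brick = freeList_.back();
            freeList_.pop_back();
            return brick;
        }
        return next_ < capacity_ ? next_++ : UINT32_MAX;
    }

    void Free(uint32_t brick) { freeList_.push_back(brick); }

private:
    uint32_t capacity_;
    uint32_t next_ = 0;                     // linear/bump allocation cursor
    std::vector<uint32_t> freeList_;        // recycled fixed-size slots
};
```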
The bottom level is different: per instance, instead of a 3D distance field, we raymarch a 2D heightfield and try to find a zero crossing. After finding two samples, one above and one below the heightfield, we linearly interpolate between them to approximate the final hit point.
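A sketch of that zero-crossing search, with a hypothetical `sampleHeight` callback and a fixed step size for simplicity. Once one sample is above the heightfield and the next is below, the hit parameter is approximated by linear interpolation between them:

```cpp
#include <functional>

bool RaymarchHeightfield(const std::function<float(float, float)>& sampleHeight,
                         float ox, float oy, float oz,   // ray origin
                         float dx, float dy, float dz,   // ray direction
                         float tMax, float stepSize, float& tHit)
{
    float prevT = 0.0f;
    float prevDelta = oz - sampleHeight(ox, oy);  // signed height above terrain
    for (float t = stepSize; t <= tMax; t += stepSize) {
        float delta = (oz + dz * t) - sampleHeight(ox + dx * t, oy + dy * t);
        if (prevDelta >= 0.0f && delta < 0.0f) {
            // Zero crossing between prevT and t: interpolate to approximate it.
            float alpha = prevDelta / (prevDelta - delta);
            tHit = prevT + alpha * (t - prevT);
            return true;
        }
        prevT = t;
        prevDelta = delta;
    }
    return false;
}
```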
At this point we have all the data in the memory and we know how to trace an individual instance. Now we need to figure out how to trace the entire scene as we cannot just loop over all instances in the scene and raymarch each one of them.
We tried BVHs and grids. Those are really nice acceleration structures, as you can build them once per frame and then reuse them in multiple passes. Unfortunately, the performance of long incoherent rays wasn't good enough. Software BVH traversal has a quite complex kernel, and grids have complex handling of objects spanning multiple cells. On top of that, scenes with overlapping instances require raymarching each overlapping instance in order to find the closest hit.
This was an important realization: we need a precise scene representation only for the first segment of a ray, and after that we can switch to a coarse one. This also gave us an opportunity to solve the object overlap issue, as we can now merge the entire scene into a single simplified global representation.
For cached updates we track all scene modifications and build a list of modified bricks on the GPU. Next we cull all the objects in the scene to the current clipmap, and then cull the resulting list to the modified bricks. During the last culling step we sample mesh distance fields for more accurate culling than checking analytical object bounds.
We tried finding the nearest point through an analytical gradient and then recomputing distance from it, but it didn't work well in practice due to the limited distance field resolution. In the end what worked for us is simply to bound the distance field using the distance to the analytical object bounds. Most of the non-uniformly scaled objects are also simple shapes like walls, so it works really well in practice.
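One plausible way to read that bounding, sketched below under stated assumptions: scaling the sampled local-space distance by the minimum scale component is a common conservative bound under non-uniform scale, and the exact distance to the instance's bounding box is also a safe lower bound whenever the ray is outside the box (the surface lies inside it), so taking the larger of the two gives the best safe step. The combination via `max` is my assumption, not confirmed by the talk:

```cpp
#include <algorithm>
#include <cmath>

// Exact signed distance to an axis-aligned box centered at the origin,
// with half-extents (ex, ey, ez).
float DistanceToBox(float px, float py, float pz, float ex, float ey, float ez)
{
    float qx = std::fabs(px) - ex, qy = std::fabs(py) - ey, qz = std::fabs(pz) - ez;
    float ox = std::max(qx, 0.0f), oy = std::max(qy, 0.0f), oz = std::max(qz, 0.0f);
    float outside = std::sqrt(ox * ox + oy * oy + oz * oz);
    float inside  = std::min(std::max(qx, std::max(qy, qz)), 0.0f);
    return outside + inside;
}

// Conservative world-space step for a non-uniformly scaled instance: both
// terms are safe lower bounds of the true distance, so take the larger one.
float ConservativeDistance(float sampledLocalSDF, float minScale,
                           float px, float py, float pz,       // local position
                           float ex, float ey, float ez)       // bounds extents
{
    float scaled   = sampledLocalSDF * minScale;               // safe, often small
    float toBounds = DistanceToBox(px, py, pz, ex, ey, ez);    // safe when outside
    return std::max(scaled, toBounds);
}
```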
Finally we update the coarse mip, a quarter-resolution non-sparse distance field volume used to speed up empty space skipping. When raymarching we use the coarse mip instead of the clipmap levels, because our clipmaps have different LOD levels and some objects may be missing from the largest one.
With the assumption of tracing only short rays we don’t need BVH or world space grids anymore. Instead we can cull objects to an influence froxel grid, where every cell contains a list of all objects which need to be intersected if a ray starts from that cell.
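A sketch of the lookup side of such a grid (the layout is illustrative): each froxel cell stores an offset and count into a flat, concatenated list of object indices, and a ray starting inside a cell only needs to test the objects listed there:

```cpp
#include <cstdint>
#include <utility>
#include <vector>

struct FroxelGrid {
    int sizeX, sizeY, sizeZ;
    std::vector<uint32_t> cellOffsets;    // sizeX*sizeY*sizeZ + 1 prefix sums
    std::vector<uint32_t> objectIndices;  // concatenated per-cell object lists

    // Objects that a ray starting inside cell (x, y, z) must intersect.
    std::pair<const uint32_t*, uint32_t> ObjectsInCell(int x, int y, int z) const {
        uint32_t cell  = uint32_t((z * sizeY + y) * sizeX + x);
        uint32_t begin = cellOffsets[cell];
        uint32_t end   = cellOffsets[cell + 1];
        return { objectIndices.data() + begin, end - begin };
    }
};
```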
Next, to fill this grid, we scatter objects by rasterizing their object-oriented bounds. Inside the pixel shader we do extra fine culling by sampling the mesh distance field.
The first issue is that many meshes aren't closed. This often happens with scanned meshes, or simply meshes which aren't supposed to be seen from the opposite side. It's not an issue for the rasterizer, but in the case of a distance field it produces a negative region which sticks out from the geometry and breaks tracing.
To solve this problem, during distance field generation we insert a virtual surface after 4 voxels. In other words, we wrap the negative distance after 4 voxels.
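One plausible implementation of that wrap, sketched as a post-process on the generated values; the exact formula Lumen uses isn't spelled out here, so the reflection below is an assumption that merely satisfies the stated behavior (a virtual surface 4 voxels deep):

```cpp
// Wraps negative distances back toward positive past 4 voxels of depth, so an
// open mesh cannot grow an unbounded negative region that sticks out of it.
float WrapNegativeDistance(float signedDistance, float voxelSize)
{
    const float wrapDepth = 4.0f * voxelSize;   // virtual surface depth
    if (signedDistance < -wrapDepth) {
        // Reflect around the virtual surface: values deeper than 4 voxels
        // climb back up and eventually read as outside again.
        return -2.0f * wrapDepth - signedDistance;
    }
    return signedDistance;
}
```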
In this diagram you can see an example of a thin wall placed between the sampling points. Evaluating such a distance field will never result in a zero or negative distance, so the ray marcher will never register a hit. Gradient computation will also be incorrect, as the gradient around this wall will be zero.
This expand fixes leaking, and now we can reliably hit any thin surface. The gradient will also be fixed, as we compute it further away from the surface, where we have reliable distance field values.
The downside of this expand is over-occlusion, and we also need a larger surface bias to escape the surface, which breaks contact shadows. (*Translator's note: the bias has to account for the extra expanded band, so shadows end up with a larger offset.)
Let's look at how we improved the surface bias.
We preserve the original distance field data and expand surfaces at runtime, which allows us to start at the surface and then linearly increase expand as we move further away from it. This way we can trace that initial ray segment instead of just skipping it and losing all contact shadows.
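A sketch of that runtime expand as a ramp (the slope and cap are illustrative): the expand is zero at the ray origin and grows linearly with distance traveled, so hits near the start of the ray land on the true surface and contact shadows survive:

```cpp
#include <algorithm>

// Reduces the sampled distance by an expand amount that ramps up with the
// distance traveled along the ray, so the surface is only "thickened" far
// from the ray origin.
float ExpandedDistance(float sampledDistance, float rayT,
                       float expandSlope, float maxExpand)
{
    float expand = std::min(rayT * expandSlope, maxExpand);
    return sampledDistance - expand;   // expanded surface registers hits earlier
}
```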
We mark distance field instances based on the two-sided material and then resample this data into a separate global distance field channel. Coverage allows us to distinguish solid thin surfaces which should block all the light, from surfaces with partial transparency, which should let some light pass through.
At every ray marching step we sample the coverage, and based on it we increase the raymarching step size and decrease the expand. Additionally, we use coverage for stochastic transparency: on every hit we decide whether to accept it or continue tracing.
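A hedged sketch of both coverage uses; the scaling functions are illustrative assumptions, only the direction of the adjustments comes from the text above:

```cpp
#include <random>

struct StepDecision { float stepScale; float expandScale; };

// Thin solid surfaces (coverage near 1) keep small steps and full expand;
// sparse, partially transparent surfaces step faster with less expand.
StepDecision ApplyCoverage(float coverage)   // coverage in [0, 1]
{
    return { 1.0f + (1.0f - coverage), coverage };
}

// Stochastic transparency: accept the hit with probability equal to coverage,
// otherwise let the ray continue through the surface.
bool AcceptHitStochastically(float coverage, std::mt19937& rng)
{
    std::uniform_real_distribution<float> uniform(0.0f, 1.0f);
    return uniform(rng) < coverage;
}
```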
Distance fields don’t have any vertex attributes and we can’t run material shaders on them. We have access only to position, normal and mesh instance data. This means that we need some kind of a UV-less surface representation to be able to shade those hits.
Next, for every surfel we trace 64 rays and count the number of triangle backface hits. If most of those hits are backfaces, then the surfel is inside geometry and we can discard it. We also compute the surfel's occlusion based on the average distance to the hits. Occlusion will be used to determine how important it is to cluster a given surfel.
The final step is a global optimization, where we re-grow all clusters in parallel from their current centroids. Again we do a few iterations of parallel growing, until we hit an iteration limit or the clusters stop changing.
We weight each texel by the delta between the depth stored in the surface cache and the ray hit depth, in order to discard occluded samples. We also weight texels by the card projection normal to prevent projection stretching. Finally, we discard texels marked as invalid.
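A sketch of that per-texel weight; the depth threshold is an illustrative assumption, and the projection term is taken as the dot product between the surface normal and the card axis:

```cpp
#include <algorithm>
#include <cmath>

// Combines the three rejection criteria above into one multiplicative weight:
// occluded texels, oblique projections, and invalid texels all go to zero.
float TexelWeight(float cachedDepth, float rayHitDepth,
                  float projectionNormalDot,   // dot(surface normal, card axis)
                  bool texelValid)
{
    if (!texelValid) return 0.0f;
    float depthDelta   = std::fabs(cachedDepth - rayHitDepth);
    float depthWeight  = std::max(1.0f - depthDelta / 0.01f, 0.0f); // occluded -> 0
    float normalWeight = std::max(projectionNormalDot, 0.0f);       // stretched -> 0
    return depthWeight * normalWeight;
}
```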
In order to support multiple bounces, for every indirect ray hit we sample the current frame's direct lighting and the last frame's indirect lighting. So every frame we compute the first two bounces, and the following bounces are feedback-based.
Finally, the interpolated results are temporally blended into the indirect lighting atlas. Alongside this atlas we keep the current number of accumulated frames. The indirect lighting update rate is quite low, so we need to limit the total number of accumulated frames to 4 in order to minimize ghosting.
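A sketch of that capped accumulation (names are illustrative): clamping the frame count to 4 keeps the blend factor at 1/4 or above, so stale lighting decays quickly and ghosting stays bounded:

```cpp
#include <algorithm>

struct AtlasTexel { float indirect[3]; int numAccumulatedFrames; };

// Exponential-style temporal blend with the accumulation count clamped to 4,
// i.e. alpha never drops below 0.25.
void AccumulateIndirect(AtlasTexel& texel, const float newLighting[3])
{
    texel.numAccumulatedFrames = std::min(texel.numAccumulatedFrames + 1, 4);
    float alpha = 1.0f / float(texel.numAccumulatedFrames);
    for (int i = 0; i < 3; ++i)
        texel.indirect[i] += (newLighting[i] - texel.indirect[i]) * alpha;
}
```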
We weight every sample by the weight stored in the alpha channel. This weight allows us to account for missing cards and for the card re-projection onto fixed world space axes.