For those out of the loop: Unreal Engine 5 is in early access, and two of its most exciting new features are Nanite and Lumen.
Nanite is a system for efficiently rendering masses of geometry (far more than is practical with traditional methods).
Lumen is a technique that enables global illumination (rendering with multiple light bounces; traditional engines do one bounce plus trickery) by first converting geometry to signed distance fields. Signed distance field techniques are very interesting: by swapping triangles for analytic surface equations, they can accelerate complex physics and lighting calculations. They've only recently started making their way into big commercial engines, starting with Media Molecule's Dreams and now UE5.
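To make the "analytic surface equations" bit concrete, here's a minimal Python sketch (the scene and names are made up for illustration): a sphere's SDF is one line of math, and combining shapes is just a min over distances, which is part of why SDFs compose so cheaply.

```python
import math

def sd_sphere(p, center, radius):
    """Signed distance from point p to a sphere: negative inside,
    zero on the surface, positive outside."""
    return math.dist(p, center) - radius

def sd_union(d1, d2):
    """The union of two SDF shapes is just the min of their distances."""
    return min(d1, d2)

# Distance from the origin to a unit sphere centered at (0, 0, 3) is 2.
d = sd_sphere((0.0, 0.0, 0.0), (0.0, 0.0, 3.0), 1.0)
```

The same distance value doubles as a conservative "how far can I safely step" bound for ray marching, which is what makes these fields useful for lighting queries.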
The PS4 game "Dreams" (released in 2020) is pretty much entirely signed distance fields and software-based rendering. It's a great example of what you can do with them.
Here is a link to an admittedly very long and very large PDF detailing the methods they used.
What do you mean by "software-based rendering"? I was under the impression that all of the rendering is done on the GPU: first the point clouds are calculated in a compute shader, and after that the points are converted to 'splats' and rendered as quads.
It is strange that Dreams didn't make a bigger splash. It's the first engine I've seen focused on making low-polygon assets look as aesthetically pleasing as possible. The ability to turn up the rendering roughness in early sketches, and then tune it down once more detail is added, is incredible. I expected a small revolution, with many competing engines reusing the tech, but nothing has happened so far?
Software-based rendering doesn't use the fixed-function hardware stages of the GPU. It won't touch the ROPs or other fixed-function rasterisation hardware; instead it does everything in software using GPGPU techniques (compute shaders).
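To illustrate the distinction, here's a toy Python stand-in for what a compute-shader-based rasterizer does: the depth test and pixel writes happen in plain code rather than in fixed-function hardware. Everything here (the `splat` helper, the sample points) is invented for illustration.

```python
# Tiny "software rasterizer" sketch: a framebuffer plus a manual depth test,
# standing in for what a compute shader would do instead of the ROPs.
W, H = 4, 4
color = [[0] * W for _ in range(H)]
depth = [[float("inf")] * W for _ in range(H)]

def splat(x, y, z, c):
    """Write one pixel with an explicit depth test --
    the blend/depth work a ROP would normally handle."""
    if 0 <= x < W and 0 <= y < H and z < depth[y][x]:
        depth[y][x] = z
        color[y][x] = c

# Two points land on the same pixel; the nearer one (z=2.0) should win.
points = [(1, 1, 5.0, 7), (1, 1, 2.0, 9)]
for x, y, z, c in points:
    splat(x, y, z, c)
```

On a real GPU this loop would be a compute dispatch with atomic min/packed writes, but the logic is the same: nothing in the fixed-function pipeline ever sees these pixels.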
I watched the talk you linked to, but I couldn't understand most of it. I'm currently reading through the Real-Time Rendering book to make sense of it.
I thought once they create the splats as quads, the rest of the pipeline is standard? Each quad should be two triangles that get clip-space coordinates, then the pixel shader paints the brush texture, and then the ROPs blend them into the framebuffer. Could you point out where I'm wrong? And in what way can a pixel end up on screen without going through the ROPs?
I'm not actually sure anymore after looking at more content. Based on the video here https://www.youtube.com/watch?v=1Gce4l5orts&t=1225s , it appears that they aren't even using the splat engine anymore (which I thought they were), and are instead using the older 'brick'-based engine.
Correct. An SDF is basically an acceleration structure for an "omnidirectional" raycast. A "sphere-cast", if you will. If you are extra careful, you can also approximate a cone trace. Unreal Engine 4 has long used SDFs as a way of doing shadows for objects that don't animate [0].
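For the curious, the "sphere-cast" works roughly like this (a hedged Python sketch with a made-up toy scene): at each step, the SDF value is the radius of a sphere guaranteed to contain no geometry, so you can safely advance the ray by that much.

```python
import math

def sphere_trace(origin, direction, sdf, max_steps=64, eps=1e-4, max_dist=100.0):
    """March along the ray; the SDF value at each point is a safe step size."""
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        d = sdf(p)
        if d < eps:
            return t  # hit: we're within eps of a surface
        t += d        # safe to advance by the distance to the nearest surface
        if t > max_dist:
            break
    return None       # miss

# Toy scene: a unit sphere at (0, 0, 3); a ray straight down +z hits at t = 2.
unit_sphere = lambda p: math.dist(p, (0.0, 0.0, 3.0)) - 1.0
t_hit = sphere_trace((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), unit_sphere)
```

The cone-trace approximation mentioned above comes from comparing the SDF value against a cone radius that grows with t, instead of against a fixed eps.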
But it's now a pretty standard staple of VFX and other industries as a way to do a cheap approximate raycast on the GPU.
Lumen simply uses the same data structure to accelerate bounce lighting cone tracing.
There's a lot more practical information which hints at how it works in the Nanite documentation.[0]
The biggest repeated claim there is that Nanite makes rendering cost scale primarily with screen resolution, with little impact from scene complexity.
I haven't had the opportunity to test it myself, but I've spoken to colleagues who got the demo running, ran some tests, and analysed it in RenderDoc. As incredible as it sounds, at a glance it seems Nanite largely delivers on what it promises. I've been very skeptical, but this is really exciting.
Super cool stuff. There are few things in computer science as mentally stimulating and satisfying as computer graphics. And this new wave of UE5 stuff just goes to show it.
My gamedev path runs (stubbornly?) low-level, through C and Haskell. So I doubt I'll ever use UEx. But I can't wait to know enough to learn from this and build libraries for myself.
I'm by far more proficient in Haskell than any other language (half a decade professionally at this point), and in general I find that it really helps me manage complexity, which games bring in spades. Abstraction feels like the key, and so far Haskell has really allowed for nice abstractions.
I've mostly done 2D stuff, at most using some fixed-function-pipeline sort of things (blending). I'm still early in my computer graphics journey, but the Haskell Vulkan bindings are good (there's even a Quake 3 engine written with Haskell and Vulkan), so I figure learning Vulkan will be a good starting point. I'm a CompE by education, so I'd really prefer to start low-level, I think.
Really impressive. I sometimes wonder whether, 30 years from now, it will still be possible to write a 3D engine from scratch and reach the graphical level of that time.
It's possible. General-purpose game engines can still be many times slower than a custom one. The main reason general-purpose engines are used is that they allow faster development and have a standardized workflow, which is very important for game studios with large numbers of people working together.
Impressive as it is, and not to take away from your main point, I thought I'd mention that that particular demo uses a screen-space method, which doesn't have to deal with the main challenge of full global illumination.
For a single person, it's unclear if that's really even possible now. The only way I see that changing is if hardware advances to the point where the trivial rendering approach can actually run in real time. But even then, so much of the impressive visual output of 3D engines is actually the assets / procedural effects you feed into them, so large teams might have the upper hand indefinitely.
I don't see why not; people were thinking like that 20 years ago. It isn't as if your 3D engine will have to implement every technique from the last 50-60 years (no need for Wolf3D-style raycasting, for example, and a lot of the shading and lighting code will become simpler once everything is capable of raytracing). Also, AFAIK UE5 contains something like five render paths to cover all the different hardware and game types it needs to cater to, but a specialized engine won't need to be that generic.
I don't think it's been possible for years. A person can write a cutting edge tech demo but it's not really usable as a general 3D engine. A small team might be able to write a limited 3D engine that would be state of the art in some circumstances.
It probably will be, just extremely computationally expensive. But 30 years is also further out than when most experts predict the singularity will arrive; after which all bets are off.
There is a lot of complexity here not covered by the overview: the way the clustering works (adaptive splitting like that means we have not just a tree but a full DAG traversed by the GPU), the way the software rasterizer works (lots of careful balancing to use the GPU well), the way materials are calculated on this geometry (GPU derivatives aren't accurate enough, so analytic derivatives are used instead), and the interaction with the rest of the frame (decals use stencil, so a lot of juggling is done to rearrange things). Heck, even the two rasterization passes to handle occlusion are pretty unobvious.
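As a rough illustration of just one of those details, the LOD side boils down to walking from coarse to fine and stopping at the first cluster level whose simplification error projects to under a pixel on screen. This Python sketch uses a toy error model and made-up numbers; it is not Nanite's actual math.

```python
# Hypothetical sketch of a screen-space-error LOD test. All constants are toy.
def projected_error_px(world_error, distance, screen_height_px, fov_tan):
    """Approximate on-screen size (in pixels) of a world-space error."""
    return world_error * screen_height_px / (2.0 * distance * fov_tan)

def pick_lod(lods, distance, screen_height_px=1080, fov_tan=0.5):
    """lods: list of (world_error, triangle_count), coarsest (largest error) first.
    Return the triangle budget of the first level that's sub-pixel accurate."""
    for world_error, tris in lods:
        if projected_error_px(world_error, distance, screen_height_px, fov_tan) < 1.0:
            return tris
    return lods[-1][1]  # nothing is sub-pixel: fall back to the finest level

lods = [(0.10, 100), (0.01, 1_000), (0.001, 10_000)]
near = pick_lod(lods, distance=2.0)    # close up: a finer level is needed
far = pick_lod(lods, distance=500.0)   # far away: the coarsest level suffices
```

The hard part Nanite solves is doing this per cluster across a DAG, on the GPU, without cracks between clusters at different levels; the sketch only shows the selection criterion.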
The 10,000 ft overview is perhaps easy to understand, but the details are anything but basic.
Marketing speak makes a lot of tech sound more complex, advanced, or magical than it usually is. This sadly seems to be becoming more common lately in the 3D graphics industry.
(In a similar marketing trend, Nvidia's "Deep Learning Super Sampling" is more like 2x2 grid deinterlacing, not magical AI upscaling.)
DLSS is TAAU: it's still temporal, based on a 2x2 grid; 2.0 is simply an ML implementation. It doesn't upscale single frames, and it doesn't add additional detail to textures. It's "temporal upscaling", which is pretty much just a fancy way of saying deinterlacing.
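The temporal part can be illustrated with a tiny Python sketch (the signal, noise level, and blend factor are all arbitrary): per-frame samples of the same signal are blended into a history buffer, which converges on detail no single noisy frame contains. Real TAAU/DLSS adds sub-pixel jitter, motion-vector reprojection, and much smarter blending (a neural network, in DLSS 2's case).

```python
import random

def accumulate(history, sample, alpha=0.1):
    """Exponential blend of the current frame's sample into the history buffer."""
    return [(1 - alpha) * h + alpha * s for h, s in zip(history, sample)]

random.seed(0)
truth = [0.0, 1.0, 0.0, 1.0]   # the signal we'd like to reconstruct
history = [0.5] * 4            # history buffer starts uninformative

# Each frame contributes a noisy sample; the running blend converges on truth.
for _ in range(500):
    noisy = [t + random.uniform(-0.2, 0.2) for t in truth]
    history = accumulate(history, noisy)
```

This is also why the technique struggles with disocclusion and fast motion: when the history can't be reprojected, the accumulated detail is thrown away and you're back to a single noisy frame.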
If you discount the entirety of the implementation responsible for the quality improvement as "simply ML", then it is indeed just fancy deinterlacing. Just like every ML breakthrough is just fancy matrix multiplication.
But the fact remains that DLSS 2 is friggin' amazing: it achieves a combination of visual quality and performance that was not possible before it, and in some cases it looks even better than rendering at the native resolution. It's solving a hard problem, and it does so very, very well.
My point is rather that most of the marketing speak makes it sound like an _impossible_ upscaling tech, rather than an impressive temporal super sampling tech.
I do wonder about the "better than rendering at native" claim, though. In true TV-set marketing fashion, DLSS 2.0 comes topped up with a free sharpening filter, on by default, to make it look slightly sharper than native renders.
Alternatively, for each cluster: render the cluster's edge triangles at the neighbouring cluster's resolution with Z-write disabled, then render the cluster (overdrawing the real edge triangles), then render the Z-write of the edge. That'll fill the cracks reasonably well with little overhead.
This is one of those things where it's surprising it wasn't available before.
As far as I can tell this architecture was possible at least starting from the introduction of DX11 hardware more than 10 years ago, and it seems to be the most straightforward way of rendering a realistic 3D scene.
A lot of what makes Nanite possible wasn't available 10 years ago, namely the way compute is integrated into the 3D pipeline. There were tricks back then, but now it's supported by design. As you can see in the breakdown, a whole lot of the pipeline is done in compute.
Parts of this pipeline have been used since 2015; you can see most of the techniques explained here: https://www.advances.realtimerendering.com/s2015/aaltonenhaa... What Nanite does is continue that line of development by combining it with visibility buffers (which decouple rasterization from materials) and adding an incredibly fancy LOD system and a rasterizer for small triangles.
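For anyone unfamiliar with visibility buffers, here's a toy Python sketch of the decoupling (the mesh names, material table, and precomputed coverage are all invented): the raster pass stores only IDs per pixel, and a separate material pass shades from those IDs.

```python
# Visibility-buffer sketch: rasterization writes IDs, shading happens later.
W, H = 2, 2
vis = [[None] * W for _ in range(H)]  # per-pixel (instance, triangle) IDs only

# "Raster" pass: record which triangle covers each pixel.
# (Coverage is precomputed here; a real rasterizer would derive it.)
coverage = {(0, 0): ("mesh_a", 0), (1, 0): ("mesh_a", 1), (0, 1): ("mesh_b", 7)}
for (x, y), tri_id in coverage.items():
    vis[y][x] = tri_id

# Material pass: shade purely from the stored IDs, fully decoupled from raster.
materials = {"mesh_a": "stone", "mesh_b": "wood"}
def shade(tri_id):
    return materials[tri_id[0]] if tri_id else "sky"

frame = [[shade(vis[y][x]) for x in range(W)] for y in range(H)]
```

The payoff is that the expensive rasterization never touches material shaders, so tiny triangles and heavy materials stop multiplying each other's cost.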
The architecture may have been possible in the sense that, feature-wise, you had the pieces (or most of them). But resources like memory, bandwidth, and GPU speed would have fallen way below the thresholds and tradeoffs needed to make this useful rather than just "possible".
Lumen deep dive: https://docs.unrealengine.com/5.0/en-US/RenderingFeatures/Lu...
Signed Distance Functions, Inigo Quilez: https://iquilezles.org/www/articles/distfunctions/distfuncti...