- Added a whole new book covering the pipeline in detail
- Added 2 new videos and 32 new links to great articles, whitepapers, …
- Extended the section on copying data from the HDD to the graphics card.
- Updated some terms in the pipeline animation and made clear that this is only the logical pipeline.
- Updated some smaller text passages in the book about the problems.
- Added some new problems.
- Added new solutions and some general remarks.
See Render Hell 1.1 Change Log
A lack of knowledge can sometimes be a strength, because you naively say to yourself “Pfff..how complicated can it be?” and just dive in. I started this article by thinking “Hm…what exactly is a draw call?”. During my 5-minute research I didn’t find a satisfying explanation. I checked the clock, and since I still had 30 minutes before bedtime I said …
… and just started. That was two months ago, and since then I have been continuously reading, writing and asking a lot of questions.
It was the hardest and most low-level research I have ever done, and for me as a non-programmer it was a nightmare of “yes, but in this special case…” and “depends on the API…”. It was my personal render hell, but I went through it and brought something back with me: five books, each representing an attempt to explain one part of rendering from an artist’s perspective. I hope you’ll like them.
Open this Book
Thanks go out to all readers, but especially to the people listed below. This article wouldn’t exist without you guys! Thank you for answering all my questions, reading over all my text iterations and supporting me.
Merlijn Van Holder
Links & Resources
[v01] CPU vs GPU Demonstration with Paint-Gun-Robots
[v02] Multiple Materials in one Draw Call
[p01] Overview about Rendering, APIs and all that stuff
[b01] Real-Time Rendering: Page 711
[a01] MSDN: Accurately Profiling Direct3D API Calls (Direct3D 9)
[a02] GPU Programming Guide GeForce 8 and 9 Series
[a03] MSDN: States (Direct3D 9)
[a04] MSDN: Efficiently Drawing Multiple Instances of Geometry (Direct3D 9)
[a05] Understanding Modern GPUs
[a06] A trip through the Graphics Pipeline
[a07] Understanding GPUs from the ground up
[a08] Flushing the pipeline
[a09] Slides: Avoiding Catastrophic Performance Loss
[a10] Radeon R5xx Acceleration
[a11] SIGGRAPH 2006: GPU Shading and Rendering
[a12] Tool: GPUView for performance measurement
[a13] Real-Time Graphics Architecture
[a14] How GPUs Work
[a15] NVidia GPU Gems Book
[a16] Wikipedia: Shader
[a17] Wikipedia: Graphics Pipeline
[a18] ExtremeTech 3D Pipeline Tutorial
[a19] Linux Programmer’s Reference Manuals
[a20] More AMD References (like [a10])
[a21] OpenGL Lecture
[a22] Learning Modern 3D Graphics Programming
[a23] OpenGL Programming Guide
[a24] Draw Call Batching
[a25] OpenGL Step by Step
[a26] OpenGL 3 & DirectX 11: The War Is Over
[a27] Rendering Pipeline Overview
[a28] GPU Parallelizable Methods
[a29] Parallelism in NVIDIA GPUs
[a30] Many SIMDs Make One Compute Unit
[a31] PowerPoint: Modern GPU Architecture
[a32] Unreal: Layered Material
[a33] Unity: One draw call for each shader
[a34] Reducing GPU Offload Latency via Fine-Grained CPU-GPU Synchronization
[a35] Accurately Profiling Direct3D API Calls
[a36] Technical Breakdown – Assassins Creed II
[a37] NVidia GPU Gems 2
[a44] The minimum number of triangles per draw call
[a45] How GPU Shader Cores Work
[a46] Interpolant Shader Processes
[a47] Latency numbers every programmer should know
[a48] Structure of the GTX680 GPU
[a49] Structure of the Tegra K1
[a50] Comparison: Structure of Kepler vs Maxwell GPUs
[a51] Structure of the GTX680 Kepler GPU
[a51] NVidia GF100 Whitepaper
[a53] Wikipedia: Processor Registers
[a54] Life of a triangle – NVIDIA’s logical pipeline
[a55] Fast Tessellated Rendering on Fermi GF100
[a56] TechRadar: Nvidia’s Fermi graphics architecture explained
[a57] GLSL Core Tutorials – Primitive Assembly
[a58] Image Processing and Computer Graphics – Rendering Pipeline
[a59] Geometry Shader Programming in OpenGL
[a61] OpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
[a62] A SIMD-efficient 14 Instruction Shader Program for High-Throughput Microtriangle Rasterization
[a63] Article based on the GF100 Whitepaper
[a64] GLSL Core Tutorial – Rasterization and Interpolation
[a65] Wikipedia: Kepler Architecture
[a66] Guard Band Clipping by NVidia
[a67] CryEngine Documentation about Overdraw
[a68] NVIDIA OpenGL extension showcasing perf benefits of new concepts in APIs
[a69] OpenGL From Zero To Hero
[a70] Nvidia Guard Band Clipping Power Point Presentation
[a71] Cuda Core Programming Guide: Compute Capabilities
[a72] Interactive Indirect Illumination Using Voxel Cone Tracing
[a73] Triangle Tessellation
[a74] Quad Tessellation
[a78] Humus: Triangulation
[a78] Humus: Particle Trimming Tool
[a79] Fermi GF100 Graphics Processing Unit (GPU)
[f01] 2 Materials on one mesh
[f02] Which is faster
[f03] Multiple Materials with one glDrawElements()
[f04] What Is A Draw Call? How Does It Effect My Product?
[f05] Why are draw calls expensive
[f06] A great reddit discussion about the content of this article
[f07] Another great reddit discussion about the content of this article
This is pretty cool! Thanks for the guide, I’d been wondering about all of this for a while now. Good stuff :)
As for current-gen GPU core amounts, I usually look at AnandTech’s GPU comparisons, e.g.:
http://www.anandtech.com/show/8069/nvidia-releases-geforce-gtx-titan-z (scroll down a bit for a table.)
Nvidia (didn’t check ATI) have the number of cores listed in the tech specs for their GPUs, and in the feature lists on their website.
I vaguely remember reading that there are separate, specialized cores for handling textures, but I don’t know anything certain about that.
Oh really nice! Thanks for the link! That’s the first time I’ve seen some numbers. It might not explain anything, but now I have a vague idea of what counts we’re talking about :)
For GPU core counts, looking up any GPU’s specification page would do. Take the best GPU so far, GeForce GTX 780 Ti for example: http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-780-ti/specifications
The webpage reads 2880 CUDA cores.
That’s a great work you’ve done there. I enjoyed reading it all along. Keep it up:)
Uhw, nice! Thanks for the link. I’ll add it to the link list later. Awesome :,) Oh and thank you for the big compliment :) Glad you enjoy reading it!
Fastest NVIDIA card: GTX TITAN Z (5760 cores)
Fastest AMD card: Radeon R9 295X2 (5632 cores)
Both cards have two GPUs, so it’s 2880 (NVIDIA) or 2816 (AMD) cores per GPU.
Wow, cool! Thank you for those numbers! :)
Cool article, very good explanation. Maybe in the next chapter you could go more in depth regarding vertex & index buffers. Keep up the good work, I’m sure your site will get lots of traffic.
Glad you like the article! Hm, I’m not sure if I should do another article at that technical level :D It almost crushed me, and I’m never sure how trustworthy I am as an artist when I try to explain programmer stuff. But thanks for the compliment. Regarding the buffers: as far as I know those buffers are just lists…and the index buffer refers to entries of the vertex list. Are there specific questions which are bothering you?
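To make the "buffers are just lists" idea from the reply above concrete, here is a minimal sketch in plain Python. It is not tied to any graphics API; the quad data and the helper name `expand_triangles` are made up for illustration. It only shows how an index buffer lets two triangles reuse the same four vertices instead of storing six.

```python
# Hypothetical example data: a unit quad built from two triangles.
# The vertex buffer stores each corner position exactly once.
vertices = [
    (0.0, 0.0, 0.0),  # 0: bottom left
    (1.0, 0.0, 0.0),  # 1: bottom right
    (1.0, 1.0, 0.0),  # 2: top right
    (0.0, 1.0, 0.0),  # 3: top left
]

# The index buffer is a flat list of positions in the vertex list.
# Every three entries describe one triangle; vertices 0 and 2 are shared.
indices = [0, 1, 2, 0, 2, 3]


def expand_triangles(vertices, indices):
    """Resolve every group of three indices into one triangle of positions."""
    return [
        tuple(vertices[i] for i in indices[start:start + 3])
        for start in range(0, len(indices), 3)
    ]


triangles = expand_triangles(vertices, indices)
for triangle in triangles:
    print(triangle)
```

Without the index list, the same quad would need six vertex entries, two of them duplicated.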
I have a suggestion for another tip right after “ask the coder”:
Embrace the coder
First do this literally, he or she won’t bite. Done? Good. Now think about your relationship: Normally the artist starts with an asset solely based on artistic premises. In the next step the coder(s) will try to optimize the things you want to display as much as possible, hopefully with your help.
Why not reverse this process and start with an interesting technique? After all this is how many great games were made, a good example being Minecraft. Yes, it does not have fancy high-end graphics, but that’s because the foremost goal was to create a world that is completely editable.
Yes of course, good communication is key, and I think all the different artisans have to work together from the start instead of sitting in a separate room, thinking about something for 2 years and then confronting the team with their special idea which just isn’t possible to execute. But Minecraft has its issues too: all those cubes need to be handled, and I recently saw an article where they solved a sorting issue because they weren’t able to cull the dungeons below the surface (which weren’t visible BUT which were in your view cone). So even this simple style has its problems. :D
Hopefully this article gives you, and other artists, a better appreciation of the programmers :) It seems like we are becoming less and less relevant as the game engines slowly try to replace us with artist friendly tools.
Hehe, sometimes I think it’s the other way around. While art is often outsource-able, you always need a lot of coders in the core team. Sure, you can “easily” create a standard shooter with an engine which gives you the tools to do that, but mostly you need a unique selling point (e.g. portals), and if the engine doesn’t support such a game mechanic, you always need programmers. But when I see what this Limit Theory guy does (all procedurally generated graphics) and how stunning it looks, I fear for my future :D
But in general I think every department deserves appreciation. I really don’t like these “fights” about designers vs. coders etc. I really like to work together with programmers, designers, testers, … :)
very nice writeup, complete with very fun and nice animations…and you used html5/webm! thanks for that. not doing that would’ve brought any high end system to its knees in any browser (e.g. using gif, flash, etc). now if only more people could follow your example :)
Thanks for the compliment :) I can understand that people use GIF because it’s just simple. I had to do several tests, and only because of very nice Twitter followers was I able to get those videos running on every browser and operating system. But of course, it saves a lot of space! On the other hand, I received a message that those videos use 100% of a core in Firefox….so maybe GIF is less CPU-dependent? Anyway, I’ll use webm/mp4 in the future and I’m really happy with it. And I’m glad that you like it too :)
Very nice article, I would have loved to find such a good explanation three months ago :P Our lead programmer and I have been going through exactly the same research this summer for our upcoming game. I can’t stress enough how important it is to have good communication between artist and programmer.
Keep up the good work, I’ve been loving your game art tricks series :D
sorry, failed with the tags, how do i edit?
Thanks man. Sorry for coming too late :D I wish I had finished this beast faster, but it took me two months :D Your game looks nice! Just faved it :)
Hi. I just want to say a BIG THANK YOU for this article. I remember the days long ago when I started to learn 3D graphics and I really missed articles like this: basic things from the very beginning. Though I’m a programmer and absolutely not an artist, I’m reading your blog with great pleasure, and I hope that you won’t stop and will continue to share your knowledge. Simon, if you have any questions about programming/graphics you can freely contact me, maybe I can be useful (or you’ll teach me something new, hehe :) ).
Hi Nikita! Thanks a lot for this offer! But beware, I can have a lot of questions. In fact, most of my articles only exist because I had the luck to have people to ask. I annoyed Timon (first place in the thanks list) almost every day and wrote long mails to other programmers and stole their time :D
Oh and…pssst…this isn’t my knowledge. To be honest, most of the stuff was surprising to me and I had to do research to find out how that stuff works :) So it’s actually my not-knowledge which makes me have questions and write the answers into articles :D
Great article! I really liked the animations, it strengthened your explanation. Keep up the good work :) For GPU cores you can check these links :
Wow cool, thanks! This will all go into 1.1 of the article :) I need a bit of time for the preparation, but then it will be included. Thanks a lot!
WebMs don’t work!! I wish I could see these awesome animations that reddit is talking about :(
Should be fixed now. The server was overloaded, so I moved all videos to Vimeo and embedded them. I hope there are no problems anymore?
Hello, I can host you something if you need, I have a server in France. Contact me if you need!
Thanks man! As far as I can see, it wasn’t the traffic but the processor power needed to decompress the mp4/webm videos. I moved them to Vimeo, now it should work :)
Been enjoying going through this writeup over the past few days :) Wish I’d found a nice overview like this when I was learning these things originally.
As for video hosting, maybe try services like Gfycat, or maybe even Coub or Vine? What did you make them in—is there any chance it can export to SVG Animation or an HTML5 script? (I’m honestly not sure if that would perform better or worse than GIF, but it would sure be smaller at least!)
Thanks for the compliment :) Regarding the HTML5: I have no idea :D But I moved that stuff to Vimeo and it should work. I already used standard HTML5 video tags, but it seems that server CPUs don’t like that :,( OR the server CPU was jealous because it wasn’t in the article :D
Hi Simon! Maybe you can host your WebM files here:
I’ve used it before, but it seems to work…
And thank you for the awesome write-up. This is super helpful!
Thanks for the suggestion :) I use Vimeo for now….does it work for you? Oh man, I’m so glad that some people find this helpful. I was often very close to giving up because I thought “nobody needs that stuff” :D
Vimeo is great. The article is great. You are great.
Stop making me blush :D Thanks! But actually YOU are great, you take the time and read my stuff and even give me feedback. That’s so cool :)
Great article! This is the most understandable sum-up of CPU-GPU interplay I’ve seen, with hilarious animations to boot.
One relevant technique that bears mentioning, though outside of the scope of the article, is billboards. It’s one of the oldest tricks in the book, long predating programmable shaders. It’s how Creative Assembly rendered thousands of Japanese fighting men back in Shogun Total War, and, more subtly, the same way they showed even greater numbers in 2003 with the “fully 3D” Rome. Believe it or not, even GTA 4 renders crowd members as anonymous animated billboards when things get really heavy.
Why are billboards so effective? You can show anything in a quad, and a quad can be cheap no matter what. In the case of batching, it’s not so painful to upload four verts per object per frame to the GPU. This scaled to the thousands even a decade and a half ago. In the case of instancing, you’re elegantly liberated from the restriction that all objects drawn by one command must share the same defining geometry, simply because a quad is generic geometry defined by its texture content.
And yes, billboards are cool. Especially when they get rendered dynamically by the engine (imposters). Do you know if the textures for the billboards in GTA are pre-calculated?
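As a rough illustration of the "four verts per object" point above, here is a minimal sketch of how the corners of a camera-facing quad could be computed on the CPU. The function name `billboard_corners` and its parameters are hypothetical, and the camera's right and up vectors are assumed to be normalized; real engines often do this step in a vertex shader instead.

```python
def billboard_corners(center, cam_right, cam_up, width, height):
    """Return the four corners of a quad that faces the camera.

    center, cam_right and cam_up are (x, y, z) tuples; cam_right and
    cam_up are assumed to be unit-length camera axes in world space.
    """
    half_w = width / 2.0
    half_h = height / 2.0

    def corner(sign_right, sign_up):
        # Offset the center along the camera's right and up axes.
        return tuple(
            c + sign_right * half_w * r + sign_up * half_h * u
            for c, r, u in zip(center, cam_right, cam_up)
        )

    # Counter-clockwise order: bottom left, bottom right, top right, top left.
    return [corner(-1, -1), corner(1, -1), corner(1, 1), corner(-1, 1)]


# Hypothetical camera looking down the z axis, billboard at the origin.
quad = billboard_corners((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), 2.0, 2.0)
print(quad)
```

Because every object ends up as the same kind of quad, only the center, size and texture differ per object, which is what makes the technique friendly to both batching and instancing.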
They definitely appear pre-calculated in GTA 4, as in the Total War games. It looks goofy as hell when you focus in on the people, yet I only noticed the other day, having clocked hundreds of hours in the game before. Rockstar got away with murder!
Have you ever seen dynamic imposters used for distant chunks of environments, apart from small objects like trees?
Here, I’ve collected some examples of what I call “shadow people” in GTA 4, including side-by-sides and single shots. Zoom into the image. Note that they respond to the lighting environment, but have no color information- they’re just gray blobs! Presumably this is to make them reusable across more pedestrian types. It’s fascinating how Rockstar pulled them off with just a human silhouette and grounding in the environment through lighting.
They animate more smoothly than you’d expect for imposters, but what makes them clearly pre-calculated is the limited directions they’re visible in, which itself is only apparent when they’re running away from the player’s carnage.
Thanks for the picture! This is really interesting. I would also think that they are pre-calculated, but I must say that imposters in the distance are only updated if the viewing angle changes drastically. A *plop* between direction changes would be expected, I think, so even real-time generated imposters could look like they were pre-calculated … I would think (but I don’t know).
No, that’s why I wrote to the guy from Limit Theory, because he said in his last dev diary that he uses imposters for asteroids. I would love to see this in action (if they are generated in real time) :D
Nicely explained article. Really liked the way you have explained some of the complex stuff neatly. Please keep up the great work.
Thanks for the kind words :) With all that great feedback I can’t not continue :,)
Great post Simon :)
If you’re looking for numbers on how many triangles you can draw for “free” on different GPUs you can find them . This is for OpenGL, but I suspect they will be the same for DirectX, as it is decided by the hardware more than the API.
Also the same guy did a similar article on the optimal number of triangles.
Oops, messed up the tags… The last line in the end was about the optimal number of TRIANGLES.
Wow, thanks! That looks exactly like what I was searching for. Cool! I corrected your comment :D
Very nice, did not read everything but what got me is this: “A draw call is a command to render one mesh.”
I don’t know if you clarify later. But it is more than “one mesh”, it’s a set of buffers. With modern techniques you can actually draw the whole scene with one multi-draw-indirect call (you must use some uber-shader). See the following for details:
And here discussions how to implement it for Ogre3D, some complex sh#t, the user “gsellers” here is the author of the previous links content:
I mentioned that modern systems fill a command buffer and send it as a whole to the GPU and/or are able to fill several buffers at the same time. But your links look very good; I have to read them later and will add them to version 1.3 of the article. Great, thanks for your time, comment and the links :)
That was an excellent well researched and presented article. Thanks loads for taking the effort to piece this together!
Thank you very much! I’m working on version 1.2 and hope you’ll like it too as soon as it’s released :)
As an artist, I find asset optimization intimidating but you really broke it down. I was having trouble trying to decide on modular characters or not (swappable hair/armor/weapons) including lots of small textures; now I see the overhead this creates. It’s good to know since it is unnecessary for my game and was more of an uninformed design choice. I need to optimize since I am limited to the restrictions of a console but I wasn’t sure how. This is exactly what I was looking for. Thank you so much! :)
Glad to hear that I could help :) But you should also check out other sources and maybe post the question in some forums. It’s really hard to define overall rules, there are so many dependencies. It’s all so complicated :(
Is there already something to show about your project? :)
I don’t have anything online yet, but I hope to release it late this year. It’s a fantasy turn-based tactical RPG for the Wii U. My programmer partner is optimizing his side, I’m glad now that I can also optimize mine. The game is not complete so we cannot tell if more optimization is necessary yet, but everything is running very well so far. :)
Sounds great! For Wii U? This is kind of special, right? I’m looking forward to seeing some screenshots :) Not many people are developing for the Wii U, I’ve heard. Is it hard to get a DevKit or does Nintendo support “external” developers as much as e.g. Sony?
It’s as easy to become a developer for Nintendo as it is for Sony. There are a bunch of indie developers getting projects ready for release on Wii U but not a whole lot of them, especially compared to some other systems.
If you’d like I could send you a link for screenshots on Reddit once we go public with our game.
Didn’t know that. Thanks for updating me :) I wish you the best for your game and would love to see what you’re working on!
“… and brought something with me: Four books,” – it’s actually five already :)
Oops :) I’ll change it. Thanks for the hint!
Oh, how did I end up here?
THIS IS A TREASURE CHAMBER OMG XD
Thanks for making this, I sure learned a lot from it.
Thanks for the tut.
I was asking myself a question, but maybe you can answer it for me.
I need to write a shader that uses a gradient to create an effect.
I have two choices:
– Create a new map with the gradient
– Create a UV2 set where I use the Y coordinate to drive the gradient
Which option do you think is more optimized?
Having one more map or having one more UV set?
I think the question is about having several different gradients in one texture, right? Or is there another reason why you want to sample the gradient texture by the Y of a second UV set (for example to morph the gradient colors over time)?
It’s more about performance. Extra texture vs extra uv set.
Also, computing the gradient from the UVs will be more precise if I’m not mistaken, with no chance of color banding.
I would suggest making the video links open in a new tab.
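The precision argument above can be sketched with a small CPU-side experiment. This is only an illustration under loud assumptions: the texture is taken to be 256 texels wide with 8 bits per channel and nearest-neighbor lookup, and all function names are made up. Real GPUs usually filter bilinearly and the final framebuffer precision also matters, so it says nothing definitive about which option is faster or looks better in a given engine.

```python
def gradient_from_uv(v):
    """Option B: derive the gradient value directly from the V coordinate."""
    return v  # a plain linear ramp from 0.0 to 1.0


def make_gradient_texture(texel_count=256):
    """Option A: bake the ramp into a texture with 8-bit values (assumption)."""
    return [round(i / (texel_count - 1) * 255) for i in range(texel_count)]


def sample_nearest(texture, v):
    """Nearest-neighbor lookup, returning the stored value as 0.0..1.0."""
    index = min(int(v * len(texture)), len(texture) - 1)
    return texture[index] / 255.0


# Evaluate both options at 4096 evenly spaced V coordinates.
samples = [i / 4095 for i in range(4096)]
texture = make_gradient_texture()

uv_levels = len({gradient_from_uv(v) for v in samples})
texture_levels = len({sample_nearest(texture, v) for v in samples})

print("distinct values from UV:", uv_levels)
print("distinct values from texture:", texture_levels)
```

Under these assumptions the UV-driven ramp keeps a distinct value per sample, while the baked texture collapses them into at most as many steps as it has texels, which is where visible banding can come from.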
I kind of closed the tab a lot of times after watching the videos.
Thanks for the suggestion! I wonder: Are you watching the content on mobile? Because on PC the videos should just start right there so I wonder how it happened that you accidentally closed a tab?
The Render Hell articles are really great! They helped me a lot to understand GPU hardware.
I come from China. Can I translate these articles into Chinese on my blog? I won’t change the author attribution.
Waiting for your reply. Thanks very much!
Yes of course. That’s super cool! If you give me the Link to your blog afterwards I’ll link to it :) Looking forward to your translation!
Thanks for your approval! I have translated Books I to V into Chinese at the following links:
Or you can find them here:
Because the CSDN blog cannot upload videos directly, I had to convert all the animations into GIFs before uploading. That means the animations are not fully true to the original videos. Anyway, I think Chinese programmers will still be interested in these articles and your videos.
Thanks again for your effort!
Wow, super cool! Thank you so much for your work! I’ll add a link to the articles. Do you have a Twitter account?
I have no Twitter account because of the China network limitation. In China, we use WeChat instead of Twitter.
But you can still send email to me!
My e-mail: firstname.lastname@example.org
Oh ok :) I used your blog-account to link your name. I hope your audience likes the articles! Thanks again for the translation!
Wow! That’s really cool!
Thank you all the same!