Hey, I'm Simon and I'm making a game!! :)

I didn’t embed the video directly, to avoid any tracking by Google and complications with the GDPR (German: DSGVO).

Render Hell 1.1 Change Log


  • Added a new possible source of problems with batching here.
  • Added a note about the pipeline simplification here. Madsy9 pointed it out here.
  • This article was honored with two Reddit threads: here and here.
  • Added a better FIFO explanation here. Thanks to sccrstud92, tmachineorg and koyima.
  • Cort mentioned these slides about “How Shader Cores Work”.
  • NVIDIA specifications with actual core counts. Thanks to Jan, Volkan and Hakan Candemir.
  • Mr.Yeah also listed actual core counts in his comment.
  • Corrected some misspellings. Thanks to OddEyesCG for pointing them out!
  • Videos now only auto-play when you scroll to them (to prevent another server overload). Thanks to Flannon and RXMESH for the code, and to garblefart for all the feedback.

A lack of knowledge can sometimes be a strength, because you naively say to yourself “Pfff… how complicated can it be?” and just dive in. I started this article by thinking “Hm… what exactly is a draw call?”. During my five minutes of research I didn’t find a satisfying explanation. I checked the clock and, since I still had 30 minutes before bedtime, I said …

“Pfff, how complicated can it be to write it on my own?”

… and just started. That was two months ago, and since then I have been continuously reading, writing and asking a lot of questions.

It was the hardest and most low-level research I have ever done, and for me as a non-programmer it was a nightmare of “yes, but in this special case…” and “depends on the API…”. It was my personal render hell, but I went through it and brought something back with me: five books, each an attempt to explain one part of rendering from an artist’s perspective. I hope you’ll like them.


Thank you!

Thanks go out to all readers, but especially to the people listed below. This article wouldn’t exist without you guys! Thank you for answering all my questions, reading through all my text iterations and supporting me.

Links & Resources

Videos
[v01] CPU vs GPU Demonstration with Paint-Gun-Robots
[v02] Multiple Materials in one Draw Call

Podcast
[p01] Overview about Rendering, APIs and all that stuff

Book
[b01] Real-Time Rendering: Page 711

Articles
[a01] MSDN: Accurately Profiling Direct3D API Calls (Direct3D 9)
[a02] GPU Programming Guide GeForce 8 and 9 Series
[a03] MSDN: States (Direct3D 9)
[a04] MSDN: Efficiently Drawing Multiple Instances of Geometry (Direct3D 9)
[a05] Understanding Modern GPUs
[a06] A trip through the Graphics Pipeline
[a07] Understanding GPUs from the ground up
[a08] Flushing the pipeline
[a09] Slides: Avoiding Catastrophic Performance Loss
[a10] Radeon R5xx Acceleration
[a11] SIGGRAPH 2006: GPU Shading and Rendering
[a12] Tool: GPUView for performance measurement
[a13] Real-Time Graphics Architecture
[a14] How GPUs Work
[a15] NVidia GPU Gems Book
[a16] Wikipedia: Shader
[a17] Wikipedia: Graphics Pipeline
[a18] ExtremeTech 3D Pipeline Tutorial
[a19] Linux Programmer’s Reference Manuals
[a20] More AMD References (like [a10])
[a21] OpenGL Lecture
[a22] Learning Modern 3D Graphics Programming
[a23] OpenGL Programming Guide
[a24] Draw Call Batching
[a25] OpenGL Step by Step
[a26] OpenGL 3 & DirectX 11: The War Is Over
[a27] Rendering Pipeline Overview
[a28] GPU Parallelizable Methods
[a29] Parallelism in NVIDIA GPUs
[a30] Many SIMDs Make One Compute Unit
[a31] PowerPoint: Modern GPU Architecture
[a32] Unreal: Layered Material
[a33] Unity: One draw call for each shader
[a34] Reducing GPU Offload Latency via Fine-Grained CPU-GPU Synchronization
[a35] Accurately Profiling Direct3D API Calls
[a36] Technical Breakdown – Assassin’s Creed II
[a37] NVidia GPU Gems 2

Forum Discussions
[f01] 2 Materials on one mesh
[f02] Which is faster
[f03] Multiple Materials with one glDrawElements()
[f04] What Is A Draw Call? How Does It Effect My Product?
[f05] Why are draw calls expensive
[f06] A great reddit discussion about the content of this article
[f07] Another great reddit discussion about the content of this article

Update 1
I found some sketches I made while writing this article. :)

69 thoughts on “Render Hell 2.0”

  1. Jan

    This is pretty cool! Thanks for the guide, I’d been wondering about all of this for a while now. Good stuff :)

    As for current-gen GPU core counts, I usually look at AnandTech’s GPU comparisons, e.g.:
    http://www.anandtech.com/show/8069/nvidia-releases-geforce-gtx-titan-z (scroll down a bit for a table.)
    NVIDIA (I didn’t check ATI) lists the number of cores in the tech specs for its GPUs, and in the feature lists on its website.
    I vaguely remember reading that there are separate, specialized cores for handling textures, but I don’t know anything certain about that.

    1. Simon Post author

      Oh, really nice! Thanks for the link! That’s the first time I’ve seen actual numbers. It might not explain anything, but now I have a vague idea of what core counts we’re talking about :)

        1. Simon Post author

          Uhw, nice! Thanks for the link. I’ll add it to the link list later. Awesome :,) Oh and thank you for the big compliment :) Glad you enjoy reading it!

    2. Mr.Yeah

      Fastest NVIDIA card: GTX TITAN Z (5760 cores)
      Fastest AMD card: Radeon R9 295X2 (5632 cores)

      Both cards have two GPUs, so it’s 2880 (NVIDIA) or 2816 (AMD) cores per GPU.

  2. Lee Day

    Cool article, very good explanation. Maybe in the next chapter you could go more in depth regarding vertex & index buffers. Keep up the good work, I’m sure your site will get lots of traffic.

    1. Simon Post author

      Glad you like the article! Hm, I’m not sure if I’ll do another article on that technical level :D It almost crushed me, and I’m never sure how trustworthy I am as an artist when I try to explain programmer stuff. But thanks for the compliment. Regarding the buffers: as far as I know, those buffers are just lists… the vertex buffer holds the vertices, and each entry in the index buffer refers to one vertex in that list. Are there particular questions that are bothering you?
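
To make the buffer explanation above concrete, here is a minimal sketch in Python (not real GPU code) of a quad stored as a vertex buffer plus an index buffer. The names `vertex_buffer`, `index_buffer` and `assemble_triangles` are hypothetical; real APIs keep these as typed arrays in GPU memory. For an indexed triangle list, the indices are read three at a time, and each one points at a single vertex.

```python
# Hypothetical example: one quad made of two triangles.
# Vertex buffer: each unique corner is stored once as an (x, y, z) position.
vertex_buffer = [
    (0.0, 0.0, 0.0),  # 0: bottom-left
    (1.0, 0.0, 0.0),  # 1: bottom-right
    (1.0, 1.0, 0.0),  # 2: top-right
    (0.0, 1.0, 0.0),  # 3: top-left
]

# Index buffer: every three indices form one triangle. Each index points
# at an entry in the vertex buffer, so shared corners (0 and 2) are reused.
index_buffer = [0, 1, 2, 0, 2, 3]


def assemble_triangles(vertices, indices):
    """Resolve the indices into triangles, three vertices at a time."""
    return [
        tuple(vertices[i] for i in indices[t:t + 3])
        for t in range(0, len(indices), 3)
    ]


triangles = assemble_triangles(vertex_buffer, index_buffer)
print(len(triangles))  # 2
```

The two triangles need six corner references, but only four vertices are actually stored; that saving is the reason index buffers exist.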

  3. Lars Kokemohr

    I have a suggestion for another tip right after “ask the coder”:
    Embrace the coder
    First do this literally, he or she won’t bite. Done? Good. Now think about your relationship: Normally the artist starts with an asset solely based on artistic premises. In the next step the coder(s) will try to optimize the things you want to display as much as possible, hopefully with your help.
    Why not reverse this process and start with an interesting technique? After all this is how many great games were made, a good example being Minecraft. Yes, it does not have fancy high-end graphics, but that’s because the foremost goal was to create a world that is completely editable.

    1. Simon Post author

      Yes, of course, good communication is key, and I think all the different artisans have to work together from the start instead of sitting in separate rooms, thinking about something for 2 years and then confronting the team with their special idea, which just isn’t possible to execute. But Minecraft has its issues too: all those cubes need to be handled, and I recently saw an article about how they solved a sorting issue because they weren’t able to cull the dungeons below the surface (which weren’t visible BUT were inside your view cone). So even this simple style has its problems. :D

  4. John

    Hopefully this article gives you, and other artists, a better appreciation of the programmers :) It seems like we are becoming less and less relevant as the game engines slowly try to replace us with artist friendly tools.

    1. Simon Post author

      Hehe, sometimes I think it’s the other way around. While art is often outsourceable, you always need a lot of coders in the core team. Sure, you can “easily” create a standard shooter with an engine that gives you the tools to do that, but mostly you need a unique selling point (e.g. portals), and if the engine doesn’t support such a game mechanic, you always need programmers. But when I see what this Limit Theory guy does (all procedurally generated graphics) and how stunning it looks, I fear for my future :D

      But in general I think every department deserves appreciation. I really don’t like these “fights” about designers vs. coders etc. I really like working together with programmers, designers, testers, … :)

  5. anon

    Very nice writeup, complete with very fun and nice animations… and you used HTML5/WebM! Thanks for that. Not doing that would’ve brought any high-end system to its knees in any browser (e.g. using GIF, Flash, etc.). Now if only more people would follow your example :)

    1. Simon Post author

      Thanks for the compliment :) I can understand that people use GIF because it’s just simple. I had to do several tests, and only thanks to some very nice Twitter followers was I able to make those videos run on every browser and operating system. But of course, it saves a lot of space! On the other hand, I received a message that those videos use a core at 100% in Firefox… so maybe GIF is less CPU-dependent? Anyway, I’ll use WebM/MP4 in the future and I’m really happy with it. And I’m glad that you like it too :)

    1. Simon Post author

      Thanks, man. Sorry for coming so late :D I wish I’d finished this beast faster, but it took me two months :D Your game looks nice! Just faved it :)

  6. nikitablack

    Hi. I just want to say a BIG THANK YOU for this article. I remember the days long ago when I started to learn 3D graphics and really missed articles like this: basic things from the very beginning. Though I’m a programmer and absolutely not an artist, I’m reading your blog with great pleasure, and I hope that you won’t stop and will continue to share your knowledge. Simon, if you have any questions about programming/graphics, feel free to contact me; maybe I can be useful (or you’ll teach me something new, hehe :) ).

    1. Simon Post author

      Hi Nikita! Thanks a lot for this offer! But beware, I can have a lot of questions. In fact, most of my articles only exist because I was lucky enough to have people to ask. I annoyed Timon (first place in the thanks list) almost every day and wrote long mails to other programmers and stole their time :D

      Oh, and… pssst… this isn’t my knowledge. To be honest, most of the stuff was surprising to me and I had to do research to find out how it works :) So it’s actually my not-knowledge that makes me have questions and write the answers into articles :D

    1. Simon Post author

      Wow, cool, thanks! This will all go into version 1.1 of the article :) I need a bit of time for the preparation, but then it will be included. Thanks a lot!

    1. Simon Post author

      Should be fixed now. The server was overloaded, so I moved all videos to Vimeo and embedded them. I hope there are no more problems?

    1. Simon Post author

      Thanks, man! As far as I can see, it wasn’t the traffic but the processor power needed to decompress the MP4/WebM videos. I moved them to Vimeo, now it should work :)

  7. RavenWorks

    Been enjoying going through this writeup over the past few days :) Wish I’d found a nice overview like this when I was learning these things originally.

    As for video hosting, maybe try services like Gfycat, or maybe even Coub or Vine? What did you make them in—is there any chance it can export to SVG Animation or an HTML5 script? (I’m honestly not sure if that would perform better or worse than GIF, but it would sure be smaller at least!)

    1. Simon Post author

      Thanks for the compliment :) Regarding the HTML5: I have no idea :D but I moved that stuff to Vimeo and it should work. I already used standard HTML5 video tags, but it seems that server CPUs don’t like that :,( OR the server CPU was jealous because it wasn’t in the article :D

    1. Simon Post author

      Thanks for the suggestion :) I use Vimeo for now… does it work for you? Oh man, I’m so glad that some people find this helpful. I was often close to giving up because I thought “nobody needs that stuff” :D

        1. Simon Post author

          Stop making me blush :D Thanks! But actually YOU are great, you take the time and read my stuff and even give me feedback. That’s so cool :)

  8. JimmyThickNThin

    Great article! This is the most understandable sum-up of CPU-GPU interplay I’ve seen, with hilarious animations to boot.

    One relevant technique that bears mentioning, though outside of the scope of the article, is billboards. It’s one of the oldest tricks in the book, long predating programmable shaders. It’s how Creative Assembly rendered thousands of Japanese fighting men back in Shogun Total War, and, more subtly, the same way they showed even greater numbers in 2003 with the “fully 3D” Rome. Believe it or not, even GTA 4 renders crowd members as anonymous animated billboards when things get really heavy.

    Why are billboards so effective? You can show anything in a quad, and a quad is cheap no matter what. In the case of batching, it’s not so painful to upload four verts per object per frame to the GPU; this scaled to the thousands even a decade and a half ago. In the case of instancing, you’re elegantly freed from the restriction that all objects in one instanced draw command must share the same geometry, because a quad is generic geometry whose look is defined by its texture content.
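
A minimal sketch of why a billboard stays cheap, assuming the camera’s right and up vectors are known (all names here are hypothetical): each object is reduced to four corners built around its center, so the per-object cost is the same no matter what the texture shows. Engines often do this expansion in a vertex shader instead; the CPU version below is only an illustration.

```python
def billboard_corners(center, cam_right, cam_up, width, height):
    """Return the 4 corners of a camera-facing quad (a billboard).

    center, cam_right and cam_up are (x, y, z) tuples; cam_right and
    cam_up are assumed to be unit vectors taken from the camera.
    """
    hw, hh = width / 2.0, height / 2.0

    def offset(sx, sy):
        # Move from the center along the camera's right and up axes.
        return tuple(c + sx * hw * r + sy * hh * u
                     for c, r, u in zip(center, cam_right, cam_up))

    # Order: bottom-left, bottom-right, top-right, top-left
    return [offset(-1, -1), offset(1, -1), offset(1, 1), offset(-1, 1)]


# Hypothetical crowd: each person is just one quad (4 vertices) per frame.
crowd = [(float(x), 0.0, 10.0) for x in range(1000)]
right, up = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)
vertices = [v for p in crowd for v in billboard_corners(p, right, up, 1.0, 2.0)]
print(len(vertices))  # 4000
```

A thousand people cost four thousand vertices, whatever detail the texture on each quad contains.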

    1. Simon Post author

      Thank you!

      And yes, billboards are cool, especially when they get rendered dynamically by the engine (imposters). Do you know if the textures for the billboards in GTA are pre-calculated?

      1. JimmyThickNThin

        They definitely appear pre-calculated in GTA 4, as in the Total War games. It looks goofy as hell when you focus in on the people, yet I only noticed the other day, having clocked hundreds of hours in the game before. Rockstar got away with murder!

        Have you ever seen dynamic imposters used for distant chunks of environments, apart from small objects like trees?

        1. JimmyThickNThin

          http://i.imgur.com/Alu67Yf.jpg

          Here, I’ve collected some examples of what I call “shadow people” in GTA 4, including side-by-sides and single shots. Zoom into the image. Note that they respond to the lighting environment but have no color information; they’re just gray blobs! Presumably this is to make them reusable across more pedestrian types. It’s fascinating how Rockstar pulled them off with just a human silhouette and grounding in the environment through lighting.

          They animate more smoothly than you’d expect for imposters, but what makes them clearly pre-calculated is the limited set of directions they’re visible from, which itself is only apparent when they’re running away from the player’s carnage.

          1. Simon Post author

            Thanks for the picture! This is really interesting. I would also think that they are pre-calculated, but I must say that imposters in the distance are only updated when the viewing angle changes drastically. A *plop* between direction changes would be expected, I think, so even real-time generated imposters could look like they were pre-calculated… I would think (but I don’t know it).
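
The reasoning in this reply can be sketched as follows, assuming an imposter remembers the view direction it was last rendered from and is only re-rendered once the camera has rotated past a threshold. The class and function names and the 15-degree threshold are hypothetical, and this makes no claim about how GTA 4 actually works.

```python
import math


def angle_between(a, b):
    """Angle in degrees between two 3D direction vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    len_a = math.sqrt(sum(x * x for x in a))
    len_b = math.sqrt(sum(x * x for x in b))
    cos_t = max(-1.0, min(1.0, dot / (len_a * len_b)))  # guard rounding
    return math.degrees(math.acos(cos_t))


class Imposter:
    """Hypothetical imposter that caches the view direction it was rendered from."""

    def __init__(self, view_dir, threshold_deg=15.0):
        self.cached_dir = view_dir
        self.threshold_deg = threshold_deg  # assumed value, tuned per game
        self.rerenders = 0

    def update(self, view_dir):
        # Re-render only when the view has rotated past the threshold;
        # otherwise keep showing the old image. The sudden switch when
        # it finally updates is the visible *plop*.
        if angle_between(self.cached_dir, view_dir) > self.threshold_deg:
            self.cached_dir = view_dir
            self.rerenders += 1
            return True
        return False


imp = Imposter((0.0, 0.0, 1.0))
print(imp.update((0.1, 0.0, 1.0)))  # False: only about 5.7 degrees
print(imp.update((1.0, 0.0, 1.0)))  # True: 45 degrees, so re-render
```

Between updates the imposter keeps showing a stale view, so a real-time imposter can snap between a few directions much like a pre-calculated one.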

        2. Simon Post author

          No, that’s why I wrote to the guy from Limit Theory, because he said in his last dev diary that he uses imposters for asteroids. I would love to see this in action (if they are generated in real time) :D

  9. Rupesh

    Wow!

    Nicely explained article. Really liked the way you have explained some of the complex stuff neatly. Please keep up the great work.

    Cheers,
    Rupesh.

  10. mike

    Very nice, did not read everything but what got me is this: “A draw call is a command to render one mesh.”

    I don’t know if you clarify later. But it is more than “one mesh”, it’s a set of buffers. With modern techniques you can actually draw the whole scene with one multi-draw-indirect call (you must use some uber-shader). See the following for details:
    http://www.openglsuperbible.com/2013/10/16/the-road-to-one-million-draws/
    And here is a discussion of how to implement it in Ogre3D (some complex sh#t); the user “gsellers” there is the author of the previous link’s content:
    http://www.ogre3d.org/forums/viewtopic.php?f=25&t=81060
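
To illustrate the multi-draw-indirect idea from this comment: the CPU writes one small record per draw into a buffer, and a single `glMultiDrawElementsIndirect` call (OpenGL 4.3 and later) makes the GPU execute all of them. The sketch below only builds that buffer, using Python’s `ctypes` to mirror the five 32-bit fields of the `DrawElementsIndirectCommand` layout from the OpenGL specification. The mesh sizes are made up, and no GL context is created.

```python
import ctypes


class DrawElementsIndirectCommand(ctypes.Structure):
    """Mirror of the per-draw record glMultiDrawElementsIndirect reads."""
    _fields_ = [
        ("count", ctypes.c_uint32),          # number of indices for this draw
        ("instanceCount", ctypes.c_uint32),  # number of instances
        ("firstIndex", ctypes.c_uint32),     # offset into the shared index buffer
        ("baseVertex", ctypes.c_int32),      # added to every index of this draw
        ("baseInstance", ctypes.c_uint32),   # first instance ID (e.g. per-object data)
    ]


# Hypothetical scene: three meshes packed into one shared vertex/index buffer,
# given as (index_count, vertex_count) per mesh.
meshes = [(36, 24), (6, 4), (1200, 500)]

commands = (DrawElementsIndirectCommand * len(meshes))()
first_index = base_vertex = 0
for i, (index_count, vertex_count) in enumerate(meshes):
    commands[i] = DrawElementsIndirectCommand(
        index_count, 1, first_index, base_vertex, i)
    first_index += index_count
    base_vertex += vertex_count

# These bytes would be uploaded to the indirect buffer; one call such as
# glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT, 0, 3, 0)
# would then issue all three draws.
raw = bytes(commands)
print(ctypes.sizeof(DrawElementsIndirectCommand), len(raw))  # 20 60
```

The meshes still have their own offsets, but the CPU-side cost shrinks to filling an array and making one API call.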

    1. Simon Post author

      I mentioned that: modern systems fill a command buffer and send it as a whole to the GPU and/or are able to fill several buffers at the same time. But your links look very good; I have to read them later and will add them to version 1.3 of the article. Great, thanks for your time, comment and the links :)

  11. David

    That was an excellent well researched and presented article. Thanks loads for taking the effort to piece this together!

    1. Simon Post author

      Thank you very much! I’m working on version 1.2 and hope you’ll like it too as soon as it’s released :)

  12. Stacie

    As an artist, I find asset optimization intimidating but you really broke it down. I was having trouble trying to decide on modular characters or not (swappable hair/armor/weapons) including lots of small textures; now I see the overhead this creates. It’s good to know since it is unnecessary for my game and was more of an uninformed design choice. I need to optimize since I am limited to the restrictions of a console but I wasn’t sure how. This is exactly what I was looking for. Thank you so much! :)

    1. Simon Post author

      Glad to hear that I could help :) But you should also check out other sources and maybe post the question in some forums. It’s really hard to define overall rules; there are so many dependencies. It’s all so complicated :(

      Is there already something to show about your project? :)

      1. Stacie

        I don’t have anything online yet, but I hope to release it late this year. It’s a fantasy turn-based tactical RPG for the Wii U. My programmer partner is optimizing his side, I’m glad now that I can also optimize mine. The game is not complete so we cannot tell if more optimization is necessary yet, but everything is running very well so far. :)

        1. Simon Post author

          Sounds great! For Wii U? That’s kind of special, right? I’m looking forward to seeing some screenshots :) I’ve heard that not many people are developing for the Wii U. Is it hard to get a dev kit, or does Nintendo support “external” developers as much as, e.g., Sony?

          1. Stacie

            It’s as easy to become a developer for Nintendo as it is for Sony. There are a bunch of indie developers getting projects ready for release on Wii U but not a whole lot of them, especially compared to some other systems.

            If you’d like I could send you a link for screenshots on Reddit once we go public with our game.

          2. Simon Post author

            Didn’t know that. Thanks for updating me :) I wish you the best for your game and would love to see what you’re working on!

  13. Pramana

    Oh, how did I end up here?

    THIS IS A TREASURE CHAMBER OMG XD
    Thanks for making this, I’m sure I learned a lot from it.

  14. Ziboo

    Hello,

    Thanks for the tut.

    I’ve been asking myself a question; maybe you can answer it.

    I need to make a shader that uses a gradient to create an effect.
    I have two choices:
    – Create a new map with the gradient
    – Create a uv2 where I will use the Y coordinate to drive the gradient

    Which case do you think is more optimized ?
    Having one more map or having one more uv set ?

    Thanks

    1. Simon Post author

      I think the question goes about having several different gradients in one texture, right? Or is there another reason why you want to sample the gradient-texture by Y of a second UV-Set (for example to morph the gradient-colors over time)?

  15. Ziboo

    It’s more about performance. Extra texture vs extra uv set.
    Also, sampling a gradient with the UVs will be more precise, if I’m not mistaken, with no chance of color banding.
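
Which option is faster depends on the GPU and the rest of the shader, so the sketch below doesn’t settle the performance question; it only shows what each option computes, assuming a simple two-color gradient. It is plain Python standing in for shader code, and the function names are hypothetical. `gradient_from_uv` is option 2 (interpolate by the second UV set’s V coordinate), and `gradient_from_texture` is option 1 with a nearest-neighbor lookup into a tiny 1D texture, a simplification that makes the stepping behind banding easy to see.

```python
def lerp(a, b, t):
    """Linear interpolation, like lerp() in HLSL or mix() in GLSL."""
    return a + (b - a) * t


def gradient_from_uv(color_a, color_b, v):
    """Option 2: compute the gradient from the UV2 V coordinate.
    No texture fetch; the color is interpolated in float precision."""
    t = max(0.0, min(1.0, v))
    return tuple(lerp(a, b, t) for a, b in zip(color_a, color_b))


def gradient_from_texture(texels, v):
    """Option 1: look the color up in a 1D gradient texture
    (nearest-neighbor sampling, for simplicity)."""
    t = max(0.0, min(1.0, v))
    index = min(int(t * len(texels)), len(texels) - 1)
    return texels[index]


black, white = (0.0, 0.0, 0.0), (1.0, 1.0, 1.0)
texels = [(i / 4,) * 3 for i in range(5)]  # a 5-texel gray ramp
print(gradient_from_uv(black, white, 0.3))  # (0.3, 0.3, 0.3)
print(gradient_from_texture(texels, 0.3))   # (0.25, 0.25, 0.25)
```

The computed gradient follows V smoothly, while the texture lookup snaps to one of the stored steps; real hardware usually filters between texels, so banding there mostly comes from limited texture resolution and bit depth.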

  16. Tab Closer

    Hello!
    I would suggest making the video links open in a new tab.

    I ended up closing the tab quite a few times after watching the videos.

    1. Simon Post author

      Thanks for the suggestion! I wonder: Are you watching the content on mobile? Because on PC the videos should just start right there so I wonder how it happened that you accidentally closed a tab?

  17. Leon He

    Hi Simon!

    The Render Hell articles are really great! They helped me a lot to understand GPU hardware.
    I’m from China. May I translate these articles into Chinese on my blog? I won’t change the authorship.
    Waiting for your reply. Thanks very much!

    1. Simon Post author

      Yes of course. That’s super cool! If you give me the Link to your blog afterwards I’ll link to it :) Looking forward to your translation!

      1. Leon He

        Hi Simon:

        Thanks for your approval! I have translated Books I to V into Chinese; here are the links:

        Book I:
        https://blog.csdn.net/hexiaolong2009/article/details/104084445
        Book II:
        https://blog.csdn.net/hexiaolong2009/article/details/104088308
        Book III:
        https://blog.csdn.net/hexiaolong2009/article/details/104089572
        Book IV:
        https://blog.csdn.net/hexiaolong2009/article/details/104108749
        Book V:
        https://blog.csdn.net/hexiaolong2009/article/details/104108917

        Or you can find them here:
        https://blog.csdn.net/hexiaolong2009/category_9705063.html

        Because the CSDN blog cannot host videos directly, I had to convert all the animations into GIFs before uploading them. That makes the animations less faithful to the original videos. Anyway, I think Chinese programmers will still be interested in these articles and your videos.

        Thanks again for your effort!

        1. Simon Post author

          Wow, super cool! Thank you so much for your work! I’ll add a link to the articles. Do you have a Twitter account?

          1. Simon Post author

            Oh ok :) I used your blog-account to link your name. I hope your audience likes the articles! Thanks again for the translation!

  18. Albert F

    Very nice article(s). I just started playing around with shaders in Unity (Shader Graph, the easy start), and finding this website was an amazing experience. Everything I read is very interesting and super helpful.

    Many thanks for your work!

