A new article version? Below you’ll find a list of what’s new!


What's new in 1.1

  • Added a new possible problem source about batching here.
  • Added a note about the pipeline simplification here. Madsy9 pointed it out here.
  • This article was honored with two reddit threads. Here and here.
  • Added a better FIFO explanation here. Thx sccrstud92, tmachineorg and koyima
  • Cort mentioned these slides about “How Shader Cores Work”.
  • NVidia specifications with actual core counts. Thx Jan, Volkan and Hakan Candemir
  • Mr.Yeah also wrote actual core counts in his comment.
  • Corrected some misspellings. Thx OddEyesCG for mentioning!
  • Videos are now only auto-played when you scroll to them (to prevent another server overload). Thx Flannon & RXMESH for the code! And garblefart for all the feedback.



A lack of knowledge can sometimes be a strength, because you naively say to yourself “Pfff… how complicated can it be?” and just dive in. I started this article by thinking “Hm… what exactly is a draw call?”. During my 5-minute research I didn’t find a satisfying explanation. I checked the clock, and since I still had 30 minutes before bedtime I said …

“Pfff, how complicated can it be to write it on my own?”

… and just started. That was two months ago, and since then I have been continuously reading, writing and asking a lot of questions.

It was the hardest and most low-level research I ever did, and for me as a non-programmer it was a nightmare of “yes, but in this special case…” and “depends on the API…”. It was my personal render hell – but I went through it and brought something back with me: four books, each representing an attempt to explain one part of rendering from an artist’s perspective. I hope you’ll like it.


Open this Book


Artists must be strong now: from a computer’s perspective, your assets are just lists of vertex and texture data. Converting this raw data into a next-gen image is mainly done by your system processor (CPU) and your graphics processor (GPU).

1. Copy the data into system memory for fast access

At first, all necessary data is loaded from your hard drive (HDD) into the system memory (RAM) for faster access. Then the necessary meshes and textures are loaded into the memory of the graphics card (VRAM). This is because the graphics card can access the VRAM a lot faster and mostly doesn’t have direct access to the RAM.

If a texture isn’t needed anymore (after loading it into the VRAM), it can be thrown out of the RAM (but you should be sure that you won’t need it again soon, because reloading it from the HDD costs a lot of time). The meshes should stay in the RAM because it’s likely that the CPU wants to have access to them, e.g. for collision detection.

Before the render party can start, the CPU sets some global values which describe how the meshes shall be rendered. This collection of values is called the Render State.

2. Set the Render State
A render state is kind of a global definition of how meshes are rendered. It contains information like:

“vertex and pixel shader, texture, material, lighting, transparency, etc. […]” [b01 page 711]

Important: every mesh that the CPU commands the GPU to draw will be rendered under these conditions! You can render a stone, a chair or a sword – they all get the same render values assigned (e.g. the same material) if you don’t change the render state before rendering the next mesh.

After the preparation is done, the CPU can finally call the GPU and tell it what to draw. This command is known as a Draw Call.

3. Draw Call
A draw call is a command to render one mesh. It is given by the CPU and received by the GPU. The command only points to the mesh which shall be rendered and doesn’t contain any material information, since that is already defined via the render state. At this point, the mesh resides in the memory of your graphics card (VRAM).

After the command is given, the GPU takes the render state values (material, textures, shader, …) and all the vertex data and converts this information via some code magic into (hopefully) beautiful pixels on your screen. This conversion process is also known as the Pipeline.
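To make this concrete, here is a tiny Python sketch of the idea (a toy model, not any real graphics API – all names are made up): the render state is a bundle of global settings, and a draw call just points at a mesh that already lives in VRAM. Whatever the current render state holds gets applied to that mesh.

```python
# Toy model: render state + draw calls. Nothing here talks to real hardware.

render_state = {"shader": "toon", "texture": "wood", "transparency": False}

# pretend these meshes were already uploaded to the graphics card's memory
vram = {"chair": ["v0", "v1", "v2"], "sword": ["v0", "v1", "v2", "v3"]}

frames = []  # what "the GPU" ends up rendering

def draw_call(mesh_name):
    """Render one mesh under the *current* render state."""
    mesh = vram[mesh_name]                          # the call only points at the mesh
    frames.append((mesh_name, dict(render_state)))  # snapshot of the state it gets

draw_call("chair")
render_state["texture"] = "metal"  # change the state *before* the next draw call
draw_call("sword")
```

Note how the chair keeps the “wood” texture: the state was snapshotted at draw time, so changing it afterwards only affects later draw calls.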

4. Pipeline
As I said at the beginning, an asset is more or less just a list of vertex and texture data. To convert those into a mind-blowing image, the graphics card has to create triangles out of the vertices, calculate how they are lit, paint texture pixels on them and a lot more. These actions are called stages. Pipeline stages.
Depending on where you read, you’ll find that most of this stuff is done by the GPU. But some sources say that, for example, the triangle creation & fragment creation is done by other parts of the graphics card.

This pipeline example is extremely simplified and shall just be seen as a rough overview. I tried to visualize it as well as I could, but as a non-programmer I find it hard to judge when it gets too simplified and might lead you in a wrong direction. So please don’t take it too seriously and consider all the other beautiful sources I linked at the bottom of this article. Or feel free to mail, tweet or facebook me so I can improve the animation/explanation. :)
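In the same spirit of over-simplification, here is a minimal Python sketch of the stage idea: each stage takes the previous stage’s output, just like the GPU stages described above. (A real pipeline has many more stages, and the rasterizer actually computes covered pixels instead of pretending.)

```python
# A heavily simplified, purely illustrative software "pipeline".

def vertex_stage(vertices, offset):
    # move every vertex into its place in the world
    return [(x + offset[0], y + offset[1]) for (x, y) in vertices]

def assembly_stage(vertices):
    # build one triangle out of every 3 consecutive vertices
    return [tuple(vertices[i:i + 3]) for i in range(0, len(vertices) - 2, 3)]

def raster_stage(triangles):
    # stand-in for rasterization: pretend each triangle becomes pixels
    return [f"pixels of {tri}" for tri in triangles]

verts = [(0, 0), (1, 0), (0, 1), (2, 2), (3, 2), (2, 3)]  # 6 verts = 2 triangles
pixels = raster_stage(assembly_stage(vertex_stage(verts, (10, 10))))
```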



Here is an example with only one GPU core:

Rendering basically means doing an immense number of small tasks, such as calculating something for thousands of vertices or painting millions of pixels on the screen – ideally at (hopefully) 30 fps.

It’s necessary to be able to compute a lot of that stuff at the same time, not every vertex/pixel one after another. In the good old days, processors had only one core and no graphics acceleration – they could only do one thing at a time. The games looked … retro. Modern CPUs have 6–8 cores while GPUs have several thousand (they aren’t as complex as CPU cores, but perfect for pushing through a lot of vertex and pixel data).

Exact GPU core numbers can be found in [a38], [a39], [a40], [a41], [a42] or in Mr.Yeah’s comment.

When data (e.g. a heap of vertices) is put into a pipeline stage, the work of transforming the points/pixels is divided across several cores, so that a lot of those small elements are formed in parallel into a big picture:
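A rough sketch of that splitting, in Python: the same small task (here, transforming one vertex) is applied to thousands of elements, and a worker pool stands in for the GPU’s many cores. (This is only an analogy – real GPU cores work very differently from CPU threads.)

```python
from concurrent.futures import ThreadPoolExecutor

def transform(vertex):
    # a trivial per-vertex task, done independently for each element
    x, y, z = vertex
    return (x * 2, y * 2, z * 2)

vertices = [(i, i, i) for i in range(10_000)]

# 8 workers stand in for "many cores": each grabs vertices and transforms them
with ThreadPoolExecutor(max_workers=8) as pool:
    transformed = list(pool.map(transform, vertices))
```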

Now we know that the GPU can work on stuff in parallel. But what about the communication between CPU and GPU? Does the CPU have to wait until the GPU has finished the job before it can send new commands?

NO!

Thankfully not! The reason is that such communication would create bottlenecks (e.g. when the CPU can’t deliver commands fast enough) and would make parallel working impossible. The solution is a list where commands can be added by the CPU and read by the GPU – independently from each other! This list is called the Command Buffer.

5. Command Buffer
The command buffer makes it possible for CPU and GPU to work independently from each other. When the CPU wants something to be rendered, it can push that command into the queue, and when the GPU has free resources, it can take the command out of the list and execute it (but the list works as a FIFO – first in, first out – so the GPU can only take the oldest item in the list, the one that was added before all others, and work on that).

By the way: different commands are possible. One example is a draw call; another would be to change the render state.
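The FIFO behavior can be sketched with a plain queue (again a toy model, not a real driver): the CPU appends at one end, the GPU pops from the other, and neither has to wait for the other in lockstep.

```python
from collections import deque

# The command buffer as a FIFO queue.
command_buffer = deque()

# CPU side: push commands (draw calls, state changes, ...)
command_buffer.append(("set_state", {"texture": "stone"}))
command_buffer.append(("draw", "rock_mesh"))
command_buffer.append(("draw", "chair_mesh"))

# GPU side: always takes the *oldest* command first
executed = []
while command_buffer:
    executed.append(command_buffer.popleft())
```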

That’s it for the first book. Now you should have an overview of asset data during rendering, draw calls, render states and the communication between CPU and GPU.

The End


Open this Book

Welcome to the second book! Here we’ll check out some problems which can occur during the rendering process. But first, some practice:

To know about a problem is useful. To actually feel the problem is even better for understanding. So let’s try to feel like a CPU/GPU.

Experiment
Please create 10,000 small files (e.g. 1 KB each) and copy them from one hard drive to another. It will take a long time even if the data amount is just 9.7 MB in total.

Now create a single file with a size of 9.7 MB and copy it the same way. It will go a lot faster!

Why? It’s the same amount of data!

That’s right, but for every copy action there’s some stuff to do, for example: prepare the file transfer, allocate memory, move the read/write heads back and forth in the HDD, … which is overhead for every write action. As you painfully feel, this overhead is immense if you copy a lot of small files. Rendering many meshes (which means executing many commands) is a lot more complex, but it feels similar.
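A toy back-of-the-envelope model makes the effect visible: every copy pays a fixed per-operation overhead on top of the pure transfer time. The overhead and bandwidth numbers below are completely made up for illustration.

```python
# Toy model: why 10,000 small copies lose against one big copy.

overhead_per_op = 0.005   # seconds of bookkeeping per copy (assumed)
bandwidth = 100 * 1024    # KB per second (assumed)

def copy_time(num_files, kb_per_file):
    transfer = (num_files * kb_per_file) / bandwidth  # pure data transfer
    return num_files * overhead_per_op + transfer     # plus per-file overhead

many_small = copy_time(10_000, 1)   # 10,000 files x 1 KB  -> overhead dominates
one_big = copy_time(1, 10_000)      # 1 file x ~10 MB      -> overhead negligible
```

With these (assumed) numbers, the small-file variant takes roughly 50 seconds while the big file takes a fraction of a second – the data amount is identical, only the overhead count differs.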

Let’s now have a look at the worst case you can get during the rendering process.

Worst Case
Having many small meshes is bad. If they use different material parameters, it gets even worse. But why?

1. Many Meshes

The GPU can render faster than the CPU can send commands.

“The main reason to make fewer draw calls is that graphics hardware can transform and render triangles much faster than you can submit them. If you submit few triangles with each call, you will be completely bound by the CPU and the GPU will be mostly idle. The CPU won’t be able to feed the GPU fast enough.” [f05]

In addition, every draw call produces some kind of overhead (as mentioned above):

“There is driver overhead whenever you make an API call, and the best way to amortize this overhead is to call the API as little as possible.” [a02]

2. Many Draw Calls
One example of such overhead is the command buffer (explained above). Do you remember that the CPU fills the command buffer and the GPU reads from it? Well, they have to communicate about the changes, and this creates overhead too (they do this by updating read/write pointers – read more about it here)!
Therefore it might be better to not hand over one command after another, but first fill up the buffer and then hand over a complete chunk of commands to the GPU. This increases the risk that the GPU has to wait until the CPU is done building the chunk, but it reduces the communication overhead.

The GPU would (hopefully) have stuff to do (e.g. working on the last chunk of commands) while the CPU builds up the new command buffer. Modern CPUs can also fill several command buffers in parallel and hand them over to the GPU one after another later.
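The chunking idea can be sketched like this (a toy model; the chunk size and command shapes are invented for illustration): the CPU collects commands locally and the hand-over overhead is paid once per chunk instead of once per command.

```python
# Toy model: submitting commands to "the GPU" in chunks instead of one by one.

CHUNK_SIZE = 4
pending = []      # commands the CPU is still collecting
submissions = []  # each entry = one hand-over to the GPU

def flush():
    if pending:
        submissions.append(list(pending))  # one hand-over, many commands
        pending.clear()

def record(command):
    pending.append(command)
    if len(pending) == CHUNK_SIZE:
        flush()

for i in range(10):
    record(("draw", f"mesh_{i}"))
flush()  # submit whatever is left over
```

Ten commands cost only three hand-overs here, instead of ten.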

This was only one example. In the real world not only the CPU, GPU and command buffer are talking to each other. The API (DirectX, OpenGL), the driver and whatnot are also involved in the process, which doesn’t make it easier.

We only spoke about many meshes with the same material parameters (render state). But what happens when you want to render meshes with different materials?

3. Many Meshes and Materials

Flush the pipeline.

“When changing the state, there is sometimes a need to wholly or partially flush the pipeline. For this reason, changing shader programs or material parameters can be very expensive […]” [b01 page 711/712]

You thought it couldn’t get worse? Well … if you have different materials on different meshes, you can’t send their render commands as chunks. You set a render state for the first mesh, command to render it, set a new render state, command the next mesh rendering and so on.

I colored the “change state” commands red because a) they are expensive and b) it gives a better overview.

Setting the render state sometimes (not always – it depends on which parameters you want to change) results in a “flush” of the whole pipeline. This means: every mesh which is currently being processed (with the current render state) has to be finished before new meshes can be rendered (with the new render state). It would look like the image above. Instead of pushing through a huge number of vertices (e.g. when you combine several meshes with the same render state – an optimization I’ll explain later), you would render a small amount before changing the render state, which – this should be clear by now – is a bad thing.

By the way: since the CPU needs a minimum amount of time to set up a draw call (independent of the given mesh size), you can assume that there’s no difference between rendering 2 or 200 triangles. The GPU is crazy fast, and before the CPU has prepared a new draw call, the triangles are already freshly baked pixels on screen.
This “rule” changes of course when we talk about combining several small meshes into one big mesh (we’ll look at this in a second).

I wasn’t able to get up-to-date values for how many polygons you can render “for free” on current graphics cards. If you know something about that or did some benchmarks recently, please tell me!

4. Meshes and Multi-Materials
What if not only one material is assigned to a mesh but two or more? Basically, your mesh is ripped into pieces and then fed piece by piece into the command buffer.

This of course creates one draw call per mesh piece.

I hope I could give you a small insight into what is bad about a lot of meshes and materials. Let’s now look at some solutions, because even if all of this sounds really, really bad: there are beautiful games out there, which means the mentioned problems have been solved somehow.

The End


Open this Book


Now it gets interesting! Here I will present some solutions I found during my research. This hopefully gives you an idea of how an asset should be optimized so that it renders well.

1. Sorting
First, you can sort all your commands (e.g. by render state) before you fill the command buffer. This reduces the necessary state changes to a minimum, since you go through all meshes of the same kind before changing the state.

But you would still create a lot of overhead by rendering every mesh one after another. To reduce this overhead, a technique called Batching comes in handy.
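The benefit of sorting is easy to show with a sketch (toy data, invented material names): counting the state changes before and after sorting.

```python
# Toy model: sort draw commands by render state before filling the buffer.

draws = [("stone_mat", "rock1"), ("wood_mat", "chair"),
         ("stone_mat", "rock2"), ("wood_mat", "table"),
         ("stone_mat", "rock3")]

def count_state_changes(commands):
    changes, current = 0, None
    for state, _mesh in commands:
        if state != current:   # a new state means an expensive "change state"
            changes += 1
            current = state
    return changes

unsorted_changes = count_state_changes(draws)        # state ping-pongs: 5 changes
sorted_changes = count_state_changes(sorted(draws))  # grouped by state: 2 changes
```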

2. Batching
When sorting your meshes, you kind of pile them together into heaps of the same kind. The next step would be to tell the GPU to render such a heap at once. This is what batching is about:

“‘Batching’ means to group some meshes together before calling the API to draw them. This is why it takes less time to render a big mesh than multiple small meshes.” [a36]

So, instead of using one draw call per mesh (where all meshes share the same render state)…

…you would combine the meshes (with the same render state) and render them with one draw call. This is a really interesting topic because you can render different meshes (a stone, a chair or a sword) at once as long as they use the same render state (which basically means that they use the same material setup).

It’s important to mention that you combine the meshes in the system memory (RAM) and then send the newly created big mesh to the graphics card’s memory (VRAM). This takes time! Therefore batching is good for static objects (stones, houses, …) which you combine once and let stay in memory for a long time.
You can also batch dynamic objects, for example laser bullets in a space game. But since they’re moving, you would have to create this bullet-cloud mesh every frame and send it to the GPU memory!

Another reason why you have to be careful (thx koyima for mentioning): if an object isn’t in the camera frustum, you can just cull it (ignore it for the rendering). But if you batch several objects together, you have to consider the whole new big mesh while rendering (even if only a small part of it is actually visible). This might decrease performance in some cases.
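The merge step itself can be sketched like this (a toy model with 2D vertices; real batching also has to merge UVs, normals and so on): the meshes’ vertex lists are appended, and the indices are offset so they still point at the right vertices inside the big combined list.

```python
# Toy model of batching: merge meshes that share a render state into one mesh.

def batch(meshes):
    vertices, indices = [], []
    for mesh in meshes:
        offset = len(vertices)             # indices must point past the
        vertices.extend(mesh["vertices"])  # vertices already in the batch
        indices.extend(i + offset for i in mesh["indices"])
    return {"vertices": vertices, "indices": indices}

stone = {"vertices": [(0, 0), (1, 0), (0, 1)], "indices": [0, 1, 2]}
chair = {"vertices": [(5, 5), (6, 5), (5, 6)], "indices": [0, 1, 2]}

combined = batch([stone, chair])  # one big mesh -> one draw call
```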

A better solution for handling dynamic objects is Instancing.

3. Instancing
Instancing means that you send only one mesh (e.g. a laser bullet) instead of many and let the GPU duplicate it several times. Having the same object at exactly the same position with the same rotation or animation would be a bit boring. Therefore you can provide a stream of extra data, like a transformation matrix per instance, to render the duplicates at different positions (and in different poses).

“Typical attributes per instance are the model-to-world transformation matrix, the instance color, and an animation player providing the bones used to skin the geometry packet.” [a37]

Don’t nail me down on this, but as far as I know, this data stream is just a list in the RAM that the GPU has access to.

This results in only one draw call per mesh type! The difference compared to batching is that all instances look the same (because they’re copies of the same mesh), while a batched mesh can consist of several different meshes as long as they use the same render state parameters.
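In sketch form (a toy model: the per-instance stream is just a position offset here instead of a full transformation matrix): one mesh goes in, and one copy per entry in the instance stream comes out.

```python
# Toy model of instancing: one mesh + a per-instance data stream.

bullet_mesh = [(0, 0), (1, 0), (0, 1)]            # sent to "the GPU" only once

instance_positions = [(10, 0), (20, 5), (30, -3)]  # per-instance stream

def draw_instanced(mesh, instances):
    copies = []
    for (ox, oy) in instances:  # one duplicate per stream entry
        copies.append([(x + ox, y + oy) for (x, y) in mesh])
    return copies

rendered = draw_instanced(bullet_mesh, instance_positions)  # 3 bullets, 1 "call"
```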

Now it gets a bit more creative. I think the following tricks are very cool, even if they are only suitable for special cases:

4. Multi-Material-Shader
A shader can access several textures, and therefore it’s possible to have not only one diffuse/normal/specular/… map but e.g. two of them – which basically means that you have two materials combined in one shader. The materials are blended into each other, controlled by a blend texture. Of course, this costs GPU power because the blending is expensive, but it reduces the draw call count because a mesh with two or more materials is not ripped into pieces anymore (explained under “4. Meshes and Multi-Materials”).

Read more about this here.

The documentation says that a higher draw call count is still better than this expensive technique. Anyway, I found it very interesting, and if you need some good numbers for statistics, you can argue that layered materials reduce the draw call count (even if this says nothing about the performance … but psssssssst!).
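The blending itself is just a per-pixel linear mix. A minimal sketch (materials reduced to plain RGB colors; a real shader would sample full texture sets):

```python
# Toy model of a multi-material shader: blend two "materials" per pixel,
# controlled by a blend-texture value between 0 and 1.

def blend(material_a, material_b, blend_value):
    return tuple(a * (1 - blend_value) + b * blend_value
                 for a, b in zip(material_a, material_b))

rock = (120, 120, 120)  # gray
moss = (40, 160, 40)    # green

pixel = blend(rock, moss, 0.5)  # halfway between both materials
```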

5. Skinned Meshes
Do you remember the laser-bullet mesh I talked about? I said that this mesh would have to be updated every frame since the bullets constantly move. Batching them together and sending the resulting mesh every frame would be expensive.
An interesting approach to this problem is to automatically add a bone to every bullet and give it skinning information. With that, you would have one big mesh which could stay in memory, and you would only update the bone data every frame. Of course, if a new bullet is shot or an old one gets destroyed, you have to create a new mesh. But it sounds like a really interesting idea to me.

Read more about this here.
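The core of the idea can be sketched like this (a toy model with 2D positions; real skinning uses bone matrices and per-vertex weights): the big combined mesh stays put, and per frame only the small list of bone positions is updated, with each bullet following its bone.

```python
# Toy model of the skinned-bullet trick: static mesh, moving bones.

bullet_local = [(0, 0), (1, 0)]     # one tiny bullet shape, reused per bone

bones = [(0, 0), (10, 0), (20, 0)]  # one bone per bullet

def skinned_positions(bones):
    # each bullet's vertices simply follow its bone
    return [[(x + bx, y + by) for (x, y) in bullet_local]
            for (bx, by) in bones]

# next frame: move only the bones, the big mesh itself is untouched
bones = [(bx + 1, by) for (bx, by) in bones]
frame = skinned_positions(bones)
```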

Feel free to send me more links about creative solutions to reduce draw calls!

Almost done! You should now have a vague understanding of what can be done to render assets a bit faster. Don’t worry, the next book will be short.

The End


Open this Book


Here I’ll briefly sum up what we learned so far:

Avoid small meshes
Check if small meshes are necessary or if you could combine several small ones into one big mesh. If you do have small meshes, talk to a graphics programmer to get info about the polycount “sweet spot” (meshes below that triangle count aren’t rendered any faster). So maybe you want to add some tris to make things rounder if you have to keep a small mesh anyway. You should also be careful with multi-materials: if you have one big mesh but with 5 sub-materials assigned, the big mesh is ripped apart during rendering, and this means you now have 5 small meshes again. Maybe an atlas texture could help?

Avoid too many materials
Speaking of materials: think about material management. Sharing materials between assets might be possible if you plan ahead before asset creation. Bigger atlas textures can help.

Debug Tools
Talk to your programmers about whether you can get in-game statistics, so that you can estimate how problematic your asset might be. Sometimes it’s hard to keep an overview of complex assets. But if a tool can warn you that an asset could be a potential performance problem, you might solve the problem before the asset is committed as final.

Ask the coder
As you can see, this topic is highly technical and very context-dependent (hardware, engine, driver, game perspective…). So it might be a good idea to ask your programmers how assets should be set up. Or just wait, because if performance drops because of your assets, the programmers will find your office and poke you until you’ve optimized your stuff. :)

You have more tips I should add here? Let me know!

Wow, you read all the way to here? You’re crazy! Thanks a lot! Let me know what you think. I hope you learned something. :)

The End

Thank you!

Thanks go out to all readers, but especially to the people listed below. This article wouldn’t exist without you guys! Thank you for answering all my questions, reading over all my text iterations and supporting me.

Links & Resources

Videos
[v01] CPU vs GPU Demonstration with Paint-Gun-Robots
[v02] Multiple Materials in one Draw Call

Podcast
[p01] Overview about Rendering, APIs and all that stuff

Book
[b01] Real-Time Rendering: Page 711

Articles
[a01] MSDN: Accurately Profiling Direct3D API Calls (Direct3D 9)
[a02] GPU Programming Guide GeForce 8 and 9 Series
[a03] MSDN: States (Direct3D 9)
[a04] MSDN: Efficiently Drawing Multiple Instances of Geometry (Direct3D 9)
[a05] Understanding Modern GPUs
[a06] A trip through the Graphics Pipeline
[a07] Understanding GPUs from the ground up
[a08] Flushing the pipeline
[a09] Slides: Avoiding Catastrophic Performance Loss
[a10] Radeon R5xx Acceleration
[a11] SIGGRAPH 2006: GPU Shading and Rendering
[a12] Tool: GPUView for performance measurement
[a13] Real-Time Graphics Architecture
[a14] How GPUs Work
[a15] NVidia GPU Gems Book
[a16] Wikipedia: Shader
[a17] Wikipedia: Graphics Pipeline
[a18] ExtremeTech 3D Pipeline Tutorial
[a19] Linux Programmer’s Reference Manuals
[a20] More AMD References (like [a10])
[a21] OpenGL Lecture
[a22] Learning Modern 3D Graphics Programming
[a23] OpenGL Programming Guide
[a24] Draw Call Batching
[a25] OpenGL Step by Step
[a26] OpenGL 3 & DirectX 11: The War Is Over
[a27] Rendering Pipeline Overview
[a28] GPU Parallelizable Methods
[a29] Parallelism in NVIDIA GPUs
[a30] Many SIMDs Make One Compute Unit
[a31] PowerPoint: Modern GPU Architecture
[a32] Unreal: Layered Material
[a33] Unity: One draw call for each shader
[a34] Reducing GPU Offload Latency via Fine-Grained CPU-GPU Synchronization
[a35] Accurately Profiling Direct3D API Calls
[a36] Technical Breakdown – Assassins Creed II
[a37] NVidia GPU Gems 2
[a38] NVidia Titan Z
[a39] NVidia Geforce GTX 780 TI Specifications
[a40] List of Nvidia graphics processing units
[a41] List of AMD graphics processing units
[a42] List of Intel graphics processing units
[a43] From Shader Code to a Teraflop: How Shader Cores Work

Forum Discussions
[f01] 2 Materials on one mesh
[f02] Which is faster
[f03] Multiple Materials with one glDrawElements()
[f04] What Is A Draw Call? How Does It Effect My Product?
[f05] Why are draw calls expensive
[f06] A great reddit discussion about the content of this article
[f07] Another great reddit discussion about the content of this article

44 thoughts on “Render Hell 1.1”

  1. Jan

    This is pretty cool! Thanks for the guide, I’d been wondering about all of this for a while now. Good stuff :)

    As for current-gen GPU core amounts, I usually look at AnandTech’s GPU comparisons, e.g.:
    http://www.anandtech.com/show/8069/nvidia-releases-geforce-gtx-titan-z (scroll down a bit for a table.)
    Nvidia (didn’t check ATI) have the number of cores listed in the tech specs for their GPUs, and in the feature lists on their website.
    I vaguely remember reading that there are separate, specialized cores for handling textures, but I don’t know anything certain about that.
    1. Simon Post author

      Oh really nice! Thanks for the link! That’s the first time I’ve seen some actual numbers. It might not explain anything, but now I have a vague idea of what counts we’re talking about :)
        1. Simon Post author

          Uhw, nice! Thanks for the link. I’ll add it to the link list later. Awesome :,) Oh and thank you for the big compliment :) Glad you enjoy reading it!
    2. Mr.Yeah

      Fastest NVIDIA card: GTX TITAN Z (5760 cores)
      Fastest AMD card: Radeon R9 295X2 (5632 cores)

      Both cards have two GPUs, so it’s 2880 (NVIDIA) or 2816 (AMD) cores per GPU.

  2. Lee Day

    Cool article, very good explanation. Maybe in the next chapter you could go more in depth regarding vertex & index buffers. Keep up the good work, I’m sure your site will get lots of traffic.
    1. Simon Post author

      Glad you like the article! Hm, I’m not sure if I shall do another article at that technical level :D It almost crushed me, and I’m never sure how trustworthy I am as an artist when I try to explain programmer stuff. But thanks for the compliment. Regarding the buffers: as far as I know, those buffers are just lists… and the index buffer refers to a part of the vertex list. Are there special questions which are bothering you?
  3. Lars Kokemohr

    I have a suggestion for another tip right after “ask the coder”:
    Embrace the coder
    First do this literally, he or she won’t bite. Done? Good. Now think about your relationship: Normally the artist starts with an asset solely based on artistic premises. In the next step the coder(s) will try to optimize the things you want to display as much as possible, hopefully with your help.
    Why not reverse this process and start with an interesting technique? After all this is how many great games were made, a good example being Minecraft. Yes, it does not have fancy high-end graphics, but that’s because the foremost goal was to create a world that is completely editable.

    1. Simon Post author

      Yes of course, good communication is key, and I think from the start all the different artisans have to work together and not sit in separate rooms, thinking about something for 2 years and then confronting the team with their special idea which just isn’t possible to execute. But Minecraft has its issues too – all those cubes need to be handled, and I just saw an article recently where they solved a sorting issue because they weren’t able to cull the dungeons below the surface (which weren’t visible BUT which were in your view cone). So even this simple style has its problems. :D
  4. John

    Hopefully this article gives you, and other artists, a better appreciation of the programmers :) It seems like we are becoming less and less relevant as the game engines slowly try to replace us with artist friendly tools.

    1. Simon Post author

      Hehe, sometimes I think the other way around. While art is often outsource-able, you always need a lot of coders in the core team. Sure, you can “easily” create a standard shooter with an engine which gives you the tools to do that, but mostly you need a unique selling point (e.g. portals), and if the engine doesn’t support such a game mechanic, you always need programmers. But when I see what this Limit Theory guy does (all procedurally generated graphics) and how stunning it looks, I fear for my future :D

      But in general I think every department deserves appreciation. I really don’t like these “fights” about designers vs coders etc. – I really like working together with programmers, designers, testers, … :)
  5. anon

    Very nice writeup, complete with very fun and nice animations… and you used html5/webm! Thanks for that. Not doing that would’ve brought any high-end system to its knees in any browser (e.g. using gif, flash, etc). Now if only more people could follow your example :)
    1. Simon Post author

      Thanks for the compliment :) I can understand that people use GIF because it’s just simple. I had to do several tests, and only thanks to very nice twitter followers was I able to make those videos run on every browser and operating system. But of course, it saves a lot of space! On the other side: I received a message that those videos max out a core at 100% in Firefox… so maybe GIF is less CPU-dependent? Anyway, I’ll use webm/mp4 in the future and I’m really happy with it. And I’m glad that you like it too :)
    1. Simon Post author

      Thanks man. Sorry for coming too late :D I’d wished to finish this beast faster but it took me two months :D Your game looks nice! Just faved it :)
  6. nikitablack

    Hi. I just want to say BIG THANK YOU for this article. I remember the days long ago when I started to learn 3d graphics, and I really missed articles like this – basic things from the very beginning. Though I’m a programmer and absolutely not an artist, I’m reading your blog with great pleasure, and I hope that you won’t stop and will continue to share your knowledge. Simon, if you have any questions about programming/graphics you can freely contact me, maybe I can be useful (or you’ll teach me something new, hehe :) ).
    1. Simon Post author

      Hi Nikita! Thanks a lot for this offer! But beware, I can have a lot of questions. In fact, most of my articles only exist because I had the luck of having people to ask. I annoyed Timon (first place in the thanks-list) almost every day and wrote long mails to other programmers and stole their time :D

      Oh and… pssst… this isn’t my knowledge. To be honest, most of the stuff was surprising to me and I had to do research to find out how it works :) So it’s actually my not-knowledge which makes me ask questions and write the answers into articles :D
    1. Simon Post author

      Wow cool, thanks! This will all go into 1.1 of the article :) I need a bit of time for the preparation but then it will be included. Thanks a lot!
    1. Simon Post author

      Should be fixed now. The server was overloaded, so I moved all videos to vimeo and embedded them. I hope there are no problems anymore?
    1. Simon Post author

      Thanks man! As far as I can see, it wasn’t the traffic, but the processor power needed to decompress the mp4/webm videos. I moved them to vimeo, now it should work :)
  7. RavenWorks

    Been enjoying going through this writeup over the past few days :) Wish I’d found a nice overview like this when I was learning these things originally.

    As for video hosting, maybe try services like Gfycat, or maybe even Coub or Vine? What did you make them in—is there any chance it can export to SVG Animation or an HTML5 script? (I’m honestly not sure if that would perform better or worse than GIF, but it would sure be smaller at least!)

    1. Simon Post author

      Thanks for the compliment :) Regarding the html5: I have no idea :D but I moved that stuff to vimeo and it should work. I already used standard html5 video tags but it seems that server CPUs don’t like that :,( OR the server CPU was jealous because it wasn’t in the article :D
    1. Simon Post author

      Thanks for the suggestion :) I use vimeo for now… does it work for you? Oh man, I’m so glad that some people find this stuff helpful. I was often very near to giving up because I thought “nobody needs that stuff” :D
        1. Simon Post author

          Stop making me blush :D Thanks! But actually YOU are great, you take the time to read my stuff and even give me feedback. That’s so cool :)
  8. JimmyThickNThin

    Great article! This is the most understandable sum-up of CPU-GPU interplay I’ve seen, with hilarious animations to boot.

    One relevant technique that bears mentioning, though outside of the scope of the article, is billboards. It’s one of the oldest tricks in the book, long predating programmable shaders. It’s how Creative Assembly rendered thousands of Japanese fighting men back in Shogun Total War, and, more subtly, the same way they showed even greater numbers in 2003 with the “fully 3D” Rome. Believe it or not, even GTA 4 renders crowd members as anonymous animated billboards when things get really heavy.

    Why are billboards so effective? You can show anything in a quad, and a quad can be cheap no matter what. In the case of batching, it’s not so painful to upload four verts per object per frame to the GPU. This scaled to the thousands even a decade and a half ago. In the case of instancing, you’re elegantly liberated from the restriction that all objects each command share unique defining geometry, simply because a quad is generic geometry defined by its texture content.

    1. Simon Post author

      Thank you!

      And yes, billboards are cool. Especially when they get rendered dynamically by the engine (imposters). Do you know if the textures for the billboards in GTA are pre-calculated?
      1. JimmyThickNThin

        They definitely appear pre-calculated in GTA 4, as in the Total War games. It looks goofy as hell when you focus in on the people, yet I only noticed the other day, having clocked hundreds of hours in the game before. Rockstar got away with murder!

        Have you ever seen dynamic imposters used for distant chunks of environments, apart from small objects like trees?

        1. JimmyThickNThin

          http://i.imgur.com/Alu67Yf.jpg

          Here, I’ve collected some examples of what I call “shadow people” in GTA 4, including side-by-sides and single shots. Zoom into the image. Note that they respond to the lighting environment, but have no color information – they’re just gray blobs! Presumably this is to make them reusable across more pedestrian types. It’s fascinating how Rockstar pulled them off with just a human silhouette and grounding in the environment through lighting.

          They animate more smoothly than you’d expect for imposters, but what makes them clearly pre-calculated is the limited directions they’re visible in, which itself is only apparent when they’re running away from the player’s carnage.
          1. Simon Post author

            Thanks for the picture! This is really interesting. I would also think that they are pre-calculated, but I must say that imposters in the distance are only updated if the viewing angle changes drastically. A *plop* between direction changes would be expected, I think – so even real-time generated imposters could look like they were pre-calculated … I would think (but I don’t know it).

        2. Simon Post author

          No, that’s why I wrote to the guy from Limit Theory, because he said in his last dev diary that he uses imposters for asteroids. I would love to see this in action (if they are real-time generated) :D
  9. Rupesh

    Wow!

    Nicely explained article. Really liked the way you have explained some of the complex stuff neatly. Please keep up the great work.

    Cheers,
    Rupesh.

  10. mike

    Very nice, did not read everything but what got me is this: “A draw call is a command to render one mesh.”

    I don’t know if you clarify later. But it is more than “one mesh”, it’s a set of buffers. With modern techniques you can actually draw the whole scene with one multi-draw-indirect call (you must use some uber-shader). See the following for details:
    http://www.openglsuperbible.com/2013/10/16/the-road-to-one-million-draws/
    And here discussions how to implement it for Ogre3D, some complex sh#t, the user “gsellers” here is the author of the previous links content:
    http://www.ogre3d.org/forums/viewtopic.php?f=25&t=81060

    1. Simon Post author

      I mentioned that modern systems fill a command buffer and send it as a whole to the GPU and/or are able to fill several buffers at the same time. But your links look very good, I’ll have to read them later and will add them to version 1.3 of the article. Great, thanks for your time, the comment and the links :)
