TSE optimizations
by Brian Ramage · 06/06/2006 (5:43 pm) · 41 comments
Hey everyone, just wanted to give a little bit of an update on the optimizations I've been working on for TSE. The batching system is working out very well, as you can see from the fps in the screenshots I've posted. That's 10 space orcs in one shot running at around 140 fps on a 3.4 GHz P4 with an X800.
The batching system essentially groups draw calls into one manager, placing them into bins such as mesh, interior, translucent and glow. Each bin is then sorted by material, and at draw time the manager loops through each material and sends the appropriate draw and shader data to the card. This greatly reduces the number of render calls to the GPU and speeds things up quite a bit.
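The bin-and-sort scheme above can be sketched in a few lines. This is a minimal Python illustration, not TSE's C++ code: `RenderInst`, `build_bins` and `draw_bin` are hypothetical names, the material names are made up, and instead of issuing real Direct3D calls the sketch only counts how often material state would have to be bound.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RenderInst:
    """One queued draw: the bin it belongs to, its material and its mesh."""
    bin_name: str
    material: str
    mesh: str


def build_bins(instances):
    """Group queued draws by bin, then sort each bin by material name."""
    bins = defaultdict(list)
    for inst in instances:
        bins[inst.bin_name].append(inst)
    for insts in bins.values():
        insts.sort(key=lambda inst: inst.material)
    return bins


def draw_bin(insts):
    """Walk one bin, binding material state only when the material changes.

    Returns (material_binds, draw_calls) instead of touching a real GPU.
    """
    material_binds = 0
    current = None
    for inst in insts:
        if inst.material != current:
            current = inst.material  # stand-in for setting shaders/textures
            material_binds += 1
        # stand-in for submitting the draw call for inst.mesh
    return material_binds, len(insts)


if __name__ == "__main__":
    # Ten orcs, each drawn with a body material and a gun material.
    queued = [
        RenderInst("mesh", material, f"orc{i}")
        for i in range(10)
        for material in ("orc_body_mat", "orc_gun_mat")
    ]
    print(draw_bin(queued))                      # submission order: (20, 20)
    print(draw_bin(build_bins(queued)["mesh"]))  # material-sorted: (2, 20)
```

With ten orcs using two materials each, drawing in submission order rebinds material state 20 times, while the material-sorted bin does it twice for the same 20 draws. How much that saves in practice depends on the hardware and driver.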
Since you can control the order in which the bins are drawn, and you can add your own fairly easily, a lot of the sorting issues in TSE should be eliminated, and the draw order can be customized to the needs of your game.
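To illustrate why a controllable bin order helps with sorting, here is a hypothetical `RenderBinManager` (not TSE's actual class) in which a game inserts its own bin at a chosen position. The default bin order and the "decal" bin are assumptions made up for the example.

```python
class RenderBinManager:
    """Hypothetical manager that draws its bins in a configurable order."""

    def __init__(self, order):
        self.order = list(order)  # bin names, in draw order
        self.bins = {name: [] for name in self.order}

    def add_bin(self, name, before=None):
        """Add a custom bin at the end, or just ahead of an existing bin."""
        if name in self.bins:
            raise ValueError(f"bin {name!r} already exists")
        index = len(self.order) if before is None else self.order.index(before)
        self.order.insert(index, name)
        self.bins[name] = []

    def submit(self, bin_name, material, mesh):
        """Queue one draw in the named bin."""
        self.bins[bin_name].append((material, mesh))

    def render(self):
        """Yield (bin, material, mesh) bin by bin, sorted by material within a bin."""
        for name in self.order:
            for material, mesh in sorted(self.bins[name]):
                yield name, material, mesh


if __name__ == "__main__":
    manager = RenderBinManager(["interior", "mesh", "translucent", "glow"])
    # A game-specific bin for decals, drawn before anything translucent.
    manager.add_bin("decal", before="translucent")
    manager.submit("translucent", "glass_mat", "window")
    manager.submit("decal", "blood_mat", "splat")
    manager.submit("mesh", "orc_body_mat", "orc")
    for draw in manager.render():
        print(draw)
```

Because translucent draws sit in their own bin that is processed after the opaque bins, their position relative to opaque geometry no longer depends on the order in which objects happened to be submitted, and moving a bin is a one-line change to the order list.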
While working on the new batching system, I also found and resolved more issues related to the skinning system in TSE. Dynamic vertex buffers have now been fully implemented (as opposed to the existing 'volatile' vertex buffers), so transferring vertex data up to the card has been further optimized.
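The difference between the two buffer types can be modelled roughly as follows. This is a hypothetical Python sketch, not TSE's GFX code: a "volatile" buffer is assumed to be carved out of one shared, fixed-size pool that is recycled every frame (so enough skinned meshes can overflow it), while a "dynamic" buffer is assumed to be owned by a single object and rewritten in place. The 4,000-vertex mesh size and 30,000-vertex pool are made-up numbers.

```python
class VolatilePool:
    """Shared fixed-size pool that every caller appends into, recycled each frame."""

    def __init__(self, capacity):
        self.capacity = capacity  # in vertices
        self.used = 0

    def begin_frame(self):
        self.used = 0  # everything written last frame is discarded

    def allocate(self, count):
        """Reserve `count` vertices and return their start offset in the pool."""
        if self.used + count > self.capacity:
            raise OverflowError(
                f"volatile pool overflow: need {self.used + count}, have {self.capacity}"
            )
        start = self.used
        self.used += count
        return start


class DynamicBuffer:
    """Buffer owned by one object, sized once and rewritten when its vertices change."""

    def __init__(self, vertex_count):
        self.vertices = [(0.0, 0.0, 0.0)] * vertex_count
        self.uploads = 0

    def update(self, vertices):
        """Overwrite the contents in place; the size is fixed at creation."""
        if len(vertices) != len(self.vertices):
            raise ValueError("vertex count changed; recreate the buffer instead")
        self.vertices[:] = vertices
        self.uploads += 1


if __name__ == "__main__":
    pool = VolatilePool(capacity=30_000)
    pool.begin_frame()
    skinned = [DynamicBuffer(4_000) for _ in range(10)]
    for i, buffer in enumerate(skinned):
        buffer.update([(float(i), 0.0, 0.0)] * 4_000)  # never touches the pool
        try:
            pool.allocate(4_000)
        except OverflowError as err:
            print(f"mesh {i}: {err}")
```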

In this lower shot, you can see the new "Warning" material I've created to let the user know that there is an unmapped material in the scene. This should make unmapped materials easier to spot than the previous "drop to fixed function with no lighting" system. It works for most of the major geometry: interiors and meshes.
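The fallback behaviour described above amounts to a lookup with a conspicuous default. In this hypothetical sketch, `resolve_material`, `WARNING_MATERIAL` and the example names are illustrative only; the point is that an unmapped name gets a visible stand-in material (and is recorded for reporting) instead of silently dropping to unlit fixed-function rendering.

```python
WARNING_MATERIAL = "WarningMaterial"  # hypothetical name of the fallback material


def resolve_material(name, material_map, unmapped):
    """Return the material mapped to `name`, or the warning material if none is.

    Unmapped names are collected in `unmapped` so they can be reported once.
    """
    material = material_map.get(name)
    if material is None:
        unmapped.add(name)
        return WARNING_MATERIAL
    return material


if __name__ == "__main__":
    material_map = {"orc_skin": "OrcSkinMaterial"}  # example mapping
    unmapped = set()
    for name in ("orc_skin", "crate_wood", "crate_wood"):
        print(name, "->", resolve_material(name, material_map, unmapped))
    print("unmapped:", sorted(unmapped))
```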

This update should go live pretty soon. I'm just cleaning up some minor remaining issues and waiting for some Atlas changes to be completed before it's released. I'm also hoping to kill some of the bugs the community has brought to my attention over the next few days.
Almost forgot to mention: the batching has been tested on GeForce cards and seems to have substantially increased performance. This scene runs at about 120 fps on Matt Fairfax's laptop with a GeForce 6800 in it. I put a 6600 in my machine and it runs at about 40 fps; not spectacular, but I'm not sure even one orc ran at that speed previously.
I also want to mention the great work Joe Maruschak did optimizing the space orc and his gun to use just two materials. This reduced the number of separate meshes in the orc from 27 to 16 and, combined with the batching, almost doubled the performance. Thanks Joe!
#22
06/07/2006 (8:07 am)
Excellent news, I look forward to it! Looking forward to the Atlas changes too; keeping my fingers crossed they're the ones you've teased about! ;)
#23
06/07/2006 (8:49 am)
This is fantastic, can't wait for the release
#24
06/07/2006 (11:39 am)
@Phil - Sort of. It will definitely render a bunch of instances of an object in a much more optimized way. I guess you could combine some of the batches to take advantage of hardware instancing; it would be interesting to see what kind of gain that would get. Yes, your translucency problems should go away now. Worst case, you'll have to rearrange some of the render bins, but that's only 5 minutes to an hour's work.
@James - yes, it is similar to what Quake3 does with its material batching.
Very soon means likely within the month.
I'll let Ben talk about the latest Atlas update.
#25
06/07/2006 (11:51 am)
dead img links :(
#26
06/07/2006 (4:25 pm)
Image links are working just fine here.
I only get 14 FPS with the TSE demo and the Starter.FPS mod.
I wonder how much better it'd be with these optimizations :)
I've got a GeForce FX 5600 too... can you believe it?
#27
06/07/2006 (5:00 pm)
Quote: "Yes, your translucency problems should go away now"
Awesome news; even if it takes some minor tweaking, I'm ready for this.
#28
06/07/2006 (11:35 pm)
Wonderful news! Looks pretty as usual ;)
The more optimizations, the more polys you can have in objects. The more polys per object, the less my artist (and I) whine. The less whining the better ;)
Now, back to work :P
Heh :)
[Ishbuu]
#29
06/08/2006 (8:23 am)
@Brian Ramage: Good work and a good start, man! :) I'm glad TSE's bin is getting filled with more features!
I have exactly the same dev computer and video card as you, so just for reference: I'm working with another AAA engine that also uses shaders. That engine gives about 100 fps with 100 objects similar to the space orc on my dev system, so keep up the excellent work ;) I hope TSE will be there soon as well ;) Or should I say "I'm confident TSE will be there soon"? ;P
Edit - typos.
#30
06/08/2006 (12:32 pm)
@Alexander - I don't know what dark and mysterious technology (mainly because you didn't say) you've been working with that can draw 450,000 polys' worth of skinned characters at 100 fps, but it sure sounds impressive ;)
All I know is that when I see 10 characters with even lower poly counts than this in Oblivion, my fps is about 10, so I think we're doing alright.
#31
06/08/2006 (1:29 pm)
Oops, not THAT many polygons :) I think an object is about 1000 polys, but with about the same texture setup, using shaders and a normal map. I'm not naming that technology for a simple reason: I don't want to spark another Engine vs Engine flame war ;) I didn't notice that the orc is about 4500 polys.
#32
06/08/2006 (2:15 pm)
Yes, keep in mind that the space orc isn't a truly optimized model either; it's in fact high poly (it could probably get away with a much lower poly count with some work on normal mapping).
#33
06/08/2006 (10:16 pm)
Yes, it could be further optimized. I put the screenshot up to compare old TSE performance to new TSE performance, not to benchmark TSE. I got a little prickly at Alexander's comment because it's comparing apples to oranges in terms of the data set, especially since this orc doesn't have LOD data on it. Plus, it's a little uncool to say, "Hey, there's this tech I'm using that's 10x better than yours... Oh, but I can't show you or talk about it."
Bah, I'm just being oversensitive, I know you didn't mean it badly, Alexander ;)
#34
06/09/2006 (7:51 am)
*rowwwwr*!!
TSE just gets sexier every time you guys show off the latest tweaks! Problem is, it makes me want to play with it and not focus on my TGB project! Oh, what a horrible life for me, so many nice engines to play with... ;-D
- Don
#35
06/09/2006 (9:43 am)
@Brian: No, I didn't, man! ;) It was just a reference point, not a comparison. Take it easy, plz :P
#36
06/10/2006 (12:03 am)
Can we get lighting now?
#37
06/10/2006 (8:56 am)
When you say dynamic vertex buffers are now fully implemented, does that include dynamic vertex buffers for interiors? I keep overflowing the vertex buffers, and the old "cut apart your interior" hack just doesn't seem right.
"I'm just cleaning up some minor remaining issues and waiting for some Atlas changes to be complete before it's released"
Are those changes the release of Atlas 2?
Nice plan, BTW; it should prove very useful.
#38
06/11/2006 (4:57 am)
How much do these changes affect the prepRenderImage() and renderObject() dance? We've been working on a custom class for large, walkable structures (we use gile[s] files, which are mesh-based, not CSG), and I'd like to know how much I'll need to change to integrate it into the new system (we've got the loading and rendering parts done; all that's left is culling, collision and transparency sorting).
#39
06/12/2006 (1:06 pm)
@Jesse - are you overflowing the vertex buffer or the number of available indexed vertices? We're currently using 16-bit index buffers, which I believe were put in place to support GeForce 3 cards, but I'll take another look. I don't believe there is a limit to the size of the static (or dynamic) vertex buffers other than what D3D will allow. Volatile buffers you can overflow, but you can change the size of the pool if you need it to be larger.
@Manoel - it is a big change to prepRenderImage and renderObject, at least in the areas of the engine that take advantage of batching. If you've got your own object class and are happy with the way it performs, you can still pretty much render it the old way. The Sky and misc fxRender objects are handled like that.
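On the 16-bit index question: a 16-bit index can only address 65,536 distinct vertices (0 to 65535), so a single draw that uses a 16-bit index buffer cannot reference more than that. The "cut apart your interior" workaround can be automated by splitting the triangle list into chunks and remapping indices per chunk. This is a hypothetical Python sketch, not TSE code; `split_for_16bit` and the tiny `max_vertices=4` demo are for illustration. Vertices shared across a chunk boundary get duplicated, which is the usual cost of this kind of split.

```python
MAX_16BIT_VERTICES = 1 << 16  # a 16-bit index can address vertices 0..65535


def split_for_16bit(triangles, max_vertices=MAX_16BIT_VERTICES):
    """Split one big triangle list into chunks that 16-bit indices can address.

    `triangles` holds (a, b, c) indices into a single large vertex array.
    Each returned chunk is (vertex_ids, local_triangles): `vertex_ids` lists the
    original vertices the chunk uses, and `local_triangles` indexes into that
    list, so every local index stays below `max_vertices`.
    """
    if max_vertices < 3:
        raise ValueError("a chunk must be able to hold at least one triangle")
    chunks = []
    remap, vertex_ids, local = {}, [], []
    for tri in triangles:
        new = [v for v in dict.fromkeys(tri) if v not in remap]
        if local and len(vertex_ids) + len(new) > max_vertices:
            # This triangle would overflow the current chunk: close it.
            chunks.append((vertex_ids, local))
            remap, vertex_ids, local = {}, [], []
            new = list(dict.fromkeys(tri))
        for v in new:
            remap[v] = len(vertex_ids)
            vertex_ids.append(v)
        local.append(tuple(remap[v] for v in tri))
    if local:
        chunks.append((vertex_ids, local))
    return chunks


if __name__ == "__main__":
    tris = [(0, 1, 2), (2, 1, 3), (3, 4, 5)]
    for chunk in split_for_16bit(tris, max_vertices=4):
        print(chunk)
    # prints ([0, 1, 2, 3], [(0, 1, 2), (2, 1, 3)]) then ([3, 4, 5], [(0, 1, 2)])
```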
#40
06/13/2006 (1:58 pm)
@Brian - awesome! So DTS objects will magically run faster, and our own objects won't get broken. Sounds very good! Our custom objects already use basic per-material batching for their sub-materials, and I was impressed by the performance, even without culling.
Associate James Urquhart
That batching system seems similar to what Quake 3 does, where each "surface" in the scene is sorted by type of shader (decal, weapon, level, brush, etc.). It also has some provisions for merging shader passes, e.g. when rendering blood particles.
That system in itself reminds me of TGE's scenegraph, where each SceneRenderImage has a sort key, though sadly I don't think it was exploited enough to be useful for batch rendering :(
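The sort-key idea can be made concrete: if the bin, material and depth are packed into one integer with the bin in the highest bits, a single ordinary sort groups draws by bin, then by material, then front to back. The bit widths, field order and `make_sort_key` name below are assumptions for illustration, not TGE's SceneRenderImage layout or Quake 3's.

```python
# Hypothetical layout, most significant first: 8-bit bin | 16-bit material | 24-bit depth.
BIN_BITS, MATERIAL_BITS, DEPTH_BITS = 8, 16, 24


def make_sort_key(bin_index, material_id, depth):
    """Pack bin, material and quantized depth into one integer sort key."""
    for value, bits, field in (
        (bin_index, BIN_BITS, "bin_index"),
        (material_id, MATERIAL_BITS, "material_id"),
        (depth, DEPTH_BITS, "depth"),
    ):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{field}={value} does not fit in {bits} bits")
    return (
        (bin_index << (MATERIAL_BITS + DEPTH_BITS))
        | (material_id << DEPTH_BITS)
        | depth
    )


if __name__ == "__main__":
    draws = [
        ("orc gun", make_sort_key(1, 7, 10)),
        ("terrain", make_sort_key(0, 9, 5)),
        ("orc body far", make_sort_key(1, 3, 99)),
        ("orc body near", make_sort_key(1, 3, 2)),
    ]
    for name, _ in sorted(draws, key=lambda draw: draw[1]):
        print(name)  # bin first, then material, then depth
```

A translucent bin would usually want back-to-front order instead, for example by storing an inverted depth and placing it above the material field.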