
TSE optimizations

Name:Brian Ramage
Date Posted:Jun 07, 2006

Hey everyone, just wanted to give a little bit of an update on the optimizations I've been working on for TSE. The batching system is working out very well, as you can see from the fps in the screenshots I've posted. That's 10 space orcs in one shot running at around 140 fps on a 3.4 GHz P4 with an X800.

The batching essentially groups draw calls into one manager, stuffing them into bins such as mesh, interior, translucent, and glow. Each bin is then sorted by Material, and at draw time the manager loops through each Material and sends the appropriate draw and shader data to the card. This greatly reduces the number of render calls to the GPU and speeds things up quite a bit.
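
To make the idea concrete, here is a minimal C++ sketch of bin-and-sort batching. It is not TSE's actual code: the names RenderItem, Bin, and BatchManager are hypothetical, materials are reduced to non-negative integer ids, and "drawing" is just recording a mesh id. The point it shows is that after sorting a bin by material, the material state only has to be set once per run of items that share it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical types for illustration; these are not TSE's actual classes.
struct RenderItem {
    int materialId;  // which Material the geometry uses (assumed >= 0)
    int meshId;      // which piece of geometry to draw
};

enum class Bin { Mesh, Interior, Translucent, Glow };

struct BatchManager {
    // std::map iterates bins in enum declaration order.
    std::map<Bin, std::vector<RenderItem>> bins;

    // Objects submit items instead of drawing immediately.
    void submit(Bin bin, RenderItem item) {
        assert(item.materialId >= 0);
        bins[bin].push_back(item);
    }

    // Sort each bin by material, then walk it, "binding" a material only
    // when it changes. Appends mesh ids to drawOrder in the order they
    // would be drawn and returns the number of material binds performed.
    std::size_t flush(std::vector<int>& drawOrder) {
        std::size_t binds = 0;
        for (auto& entry : bins) {
            std::vector<RenderItem>& items = entry.second;
            std::stable_sort(items.begin(), items.end(),
                [](const RenderItem& a, const RenderItem& b) {
                    return a.materialId < b.materialId;
                });
            int current = -1;  // no material bound yet for this bin
            for (const RenderItem& item : items) {
                if (item.materialId != current) {
                    current = item.materialId;  // set shader/material state once
                    ++binds;
                }
                drawOrder.push_back(item.meshId);  // stand-in for the draw call
            }
            items.clear();
        }
        return binds;
    }
};
```

In this sketch, four items that alternate between two materials need only two material binds after sorting, instead of four if they were drawn in submission order.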

Since you can control the order in which the bins are drawn, and can add your own fairly easily, a lot of the sorting issues in TSE should be eliminated, and the draw order can be customized to the needs of your game.
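
As a rough illustration of what controllable bin order could look like, the C++ sketch below keeps a list of named bins with an integer order value. Everything here is an assumption made for the example (the BinRegistry class, the order numbers, and the custom "water" bin used in the note below); it is not the TSE interface.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical bin registry; the names and ordering scheme are assumptions.
struct RenderBin {
    std::string name;
    int order;  // lower values draw first
};

class BinRegistry {
public:
    // Register a new bin at the given position in the draw order.
    void addBin(const std::string& name, int order) {
        assert(!name.empty());
        bins_.push_back({name, order});
        sortBins();
    }

    // Move an existing bin to a new position; returns false if not found.
    bool setOrder(const std::string& name, int order) {
        for (RenderBin& bin : bins_) {
            if (bin.name == name) {
                bin.order = order;
                sortBins();
                return true;
            }
        }
        return false;
    }

    // Bin names in the order they would be drawn.
    std::vector<std::string> drawOrder() const {
        std::vector<std::string> names;
        for (const RenderBin& bin : bins_) names.push_back(bin.name);
        return names;
    }

private:
    void sortBins() {
        std::stable_sort(bins_.begin(), bins_.end(),
            [](const RenderBin& a, const RenderBin& b) {
                return a.order < b.order;
            });
    }

    std::vector<RenderBin> bins_;
};
```

With a registry like this, a game could add a "water" bin between the mesh and translucent bins, or move the glow bin ahead of translucent geometry, by changing a single number rather than touching each object's render code.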

While working on the new batching system I also found and resolved more issues related to the skinning system in TSE. Dynamic vertex buffers have now been fully implemented (as opposed to the existing 'volatile' vertex buffers), so transferring vertex data up has been further optimized.

In this lower shot, you can see the new "Warning" material that I've created to let the user know that there is an unmapped material in the scene. This should make it easier to detect than the previous "drop to fixed function with no lighting" system. It works for most of the major geometry - interiors and meshes.
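
The fallback idea can be sketched in a few lines of C++. This is only an illustration under assumed names (MaterialMap, resolve, and the "WarningMaterial" string are hypothetical), not the engine's actual material lookup: any texture name without a mapping resolves to one highly visible warning material instead of silently dropping to fixed function.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical material table; not TSE's actual Material API.
struct Material {
    std::string name;
    bool isWarning;
};

class MaterialMap {
public:
    MaterialMap() : warning_{"WarningMaterial", true} {}

    // Associate a texture name with a named material.
    void map(const std::string& textureName, const std::string& materialName) {
        assert(!textureName.empty());
        table_[textureName] = Material{materialName, false};
    }

    // Return the mapped material, or the warning material when none exists.
    const Material& resolve(const std::string& textureName) const {
        auto it = table_.find(textureName);
        return it != table_.end() ? it->second : warning_;
    }

private:
    std::unordered_map<std::string, Material> table_;
    Material warning_;
};
```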

This update should go live pretty soon; I'm just cleaning up some minor remaining issues and waiting for some Atlas changes to be complete before it's released. I'm also hoping to kill some bugs over the next few days that the community has brought to my attention.

Almost forgot to mention - the batching has been tested on GeForce cards and seems to have substantially increased performance. This scene runs at about 120 fps on Matt Fairfax's laptop with a GeForce 6800 in it. I put a 6600 in my machine and it runs at about 40 fps - not spectacular, but I'm not sure even one Orc ran at that speed previously.

I also want to mention the great work Joe Maruschak did on optimizing the space orc and his gun to use just two materials. This reduced the number of separate meshes in the orc from 27 down to 16 and almost doubled the performance when combined with the batching. Thanks Joe!


Brian Ramage   (Jun 07, 2006 at 00:59 GMT)   Resource Rating: 5
PS - those orcs are about 4500 verts and have no LOD on them.

Konstantin Teterin   (Jun 07, 2006 at 01:30 GMT)
That is awesome news! I can't wait until I'm able to "touch" the updated TSE :)

Mike Kuklinski   (Jun 07, 2006 at 01:44 GMT)
Awesome Blossom! This will go excellently in Kuiper, which uses a lot of objects that would benefit from batching!

Aaron Ellis   (Jun 07, 2006 at 01:54 GMT)
Very nice. I'm looking forward to seeing this in action. :)

Ray Noolness Gebhardt   (Jun 07, 2006 at 02:07 GMT)
That is awesome stuff Brian, I can't wait to get my hands on the code! :)

Raxx   (Jun 07, 2006 at 02:13 GMT)
Nice, I've noticed that a lot of game engines seemed much more optimized than both TGE and TSE, and was wondering why. Glad to know something's being done about it.

Matt Laurenson   (Jun 07, 2006 at 02:49 GMT)
nice work :]

- Raxx, "much more optimised?" - probably a little exaggerated. Of course I don't know what engines you're comparing Torque to - TSE is incomplete, TGE is geared towards older hardware... if it's TGE you're comparing against other engines, you need to ask yourself - "do those engines run better than TGE on older hardware?" By that I mean hardware without shader support and vertex buffer objects.
heh.. anyway, I'm in no position to argue the point to any sensible conclusion, I'm not an engineer..

er..yeh, anyway....sounds great + will there be any new demos?

Anton Bursch   (Jun 07, 2006 at 03:19 GMT)
NICE, happy to see this implemented. :)

Jesse (Midhir) Liles   (Jun 07, 2006 at 03:31 GMT)
Good news indeed.

Martin Schultz   (Jun 07, 2006 at 03:59 GMT)
Huh, cool stuff. Sounds good. :-)

Ben Garney   (Jun 07, 2006 at 04:01 GMT)
This is the kind of rendering optimization you find in the latest incarnations of the Quake engine, or in things like Unreal - definitely a big step up from our previous rendering architecture. :)

It's also way cooler and cleaner to program against, and makes the engine generally much more of a pleasure to work with. I'm very happy with how it's going, and it's only going to get more refined and performant from here...

Vashner   (Jun 07, 2006 at 04:44 GMT)   Resource Rating: 5
The future is so bright.. I got to wear shades.

:)

James Laker (BurNinG)   (Jun 07, 2006 at 05:57 GMT)
Randy: hehe

Stefan Lundmark   (Jun 07, 2006 at 07:04 GMT)   Resource Rating: 5
This is cool stuff!

Brian, did you notice the thread in the TSE Bugs Forum about Materials not getting sent over the network properly? And specular not showing up on clients but on the server? If you haven't nailed these yet, I got some more info which might be helpful.

Let me know.

Mike Kuklinski   (Jun 07, 2006 at 07:12 GMT)
EDIT---Didn't actually read his bug, now I did. Cheers :)
Edited on Jun 07, 2006 07:42 GMT

John Kanalakis   (Jun 07, 2006 at 07:45 GMT)
Grouping draw calls by material definitions... very clever! We're very excited about the new performance gains. Thanks team!!!

Tom Spilman   (Jun 07, 2006 at 08:28 GMT)
44 fps with one Orc would be an improvement from what i get with the 6600... 10 is all gravy. =)

Can't wait to merge this bad boy.

Nigel Hungerford-Symes   (Jun 07, 2006 at 08:40 GMT)
Sounds like a nice big step forward, congrats!

Vincent BILLET   (Jun 07, 2006 at 10:15 GMT)
Waooohh... Sounds very interesting. I think these changes are going to increase FPS drastically. When you say it'll be ready pretty soon... Do you mean by the end of the month? Which Atlas changes are you waiting for?

Phil Carlisle   (Jun 07, 2006 at 10:29 GMT)
Brian,

Would this system be a good starting point to support instancing? I've got a hankering to play around with instancing, although when I spoke to Richard Huddy he didn't sound absolutely convinced about the extra framerate I might get from it.

I'm glad to see the batching coming together though, if it can help me fix the alpha render order stuff that'll be great (and getting new atlas stuff in there together will be supah!).

Just want to say I appreciate the effort that goes into this stuff, youre a star!

As Ben said, its only up from here.

So is now the time to ask, whats next? :)

James Urquhart   (Jun 07, 2006 at 10:40 GMT)   Resource Rating: 3
Nice work, Brian!

That batching system seems similar to what quake 3 does, where each "surface" in the scene is sorted by type of shader (decal, weapon, level, brush, etc). It also has some provisions for merging shader passes, e.g. when rendering blood particles.
That system in itself reminds me of TGE's scenegraph, where each SceneRenderImage has a sort key - though sadly, i don't think it was exploited enough to be useful for batch rendering :(

Ian "Xest" Winter   (Jun 07, 2006 at 15:07 GMT)
Excellent news, look forward to it, looking forward to the Atlas changes too, keeping my fingers crossed they're the ones you've teased about! ;)
Edited on Jun 07, 2006 15:08 GMT

James Thompson   (Jun 07, 2006 at 15:49 GMT)
This is fantastic, cant wait for the release

Brian Ramage   (Jun 07, 2006 at 18:39 GMT)   Resource Rating: 5
@Phil - Sort of. It will definitely render a bunch of instances for an object in a much more optimized way. I guess you could combine some of the batches to take advantage of hardware instancing. It would be interesting to see what kind of gain that would get. Yes, your translucency problems should go away now - worst case you'll have to rearrange some of the render bins, but that's only 5 minutes to an hour's work.

@James - yes, it is similar to what Quake3 does with their material batching.

Very soon means likely within the month.

I'll let Ben talk about the latest Atlas update.

X-Tatic   (Jun 07, 2006 at 18:51 GMT)
dead img links :(

Ramen-sama   (Jun 07, 2006 at 23:25 GMT)
image links working just fine here.

I only get 14 FPS with TSE demo and the Starter.FPS mod.
I wonder how much better it'd be with these optimizations :)
I got a Geforce FX 5600 too.... can you believe it?

Chris "C2" Byars   (Jun 08, 2006 at 00:00 GMT)
Quote:

Yes, your translucency problems should go away now

Awesome news; even if it takes some minor tweaking, I'm ready for this.

Ishbuu   (Jun 08, 2006 at 06:35 GMT)
Wonderful news! Looks pretty as usual ;)

The more optimizations, the more polies you can have in objects. The more polies per object, the less my artist (and me) whine. The less whining the better ;)

Now, back to work :P

Heh :)
[Ishbuu]

Alexander "taualex" Gaevoy   (Jun 08, 2006 at 15:23 GMT)
@Brian Ramage:

Good work and good start, man! :) I'm glad TSE's bin is getting filled with more features!
I have exactly the same dev computer and video card as you, and I would like to give you a reference point: I'm working with another AAA engine that uses shaders as well. That engine gives about 100 fps on 100 objects similar to the SpaceOrc on my dev system, so keep up the excellent job ;) I hope TSE will be there soon as well ;) Or should I say "I'm confident TSE will be there soon"? ;P

Edit - typos.
Edited on Jun 08, 2006 15:24 GMT

Brian Ramage   (Jun 08, 2006 at 19:32 GMT)   Resource Rating: 5
@Alexander - I don't know what dark and mysterious technology (mainly because you didn't say) you've been working with that can draw 450,000 polys worth of skinned characters at 100fps, but it sure sounds impressive ;)

All I know is when I see 10 characters even lower poly than this in Oblivion, my fps is about 10, so I think we're doing alright.

Alexander "taualex" Gaevoy   (Jun 08, 2006 at 20:29 GMT)
oops, not THAT many polygons :) I think an object is about 1000 polys, but with about the same texture, shaders and a normal map. I'm not naming that technology for a simple reason - I don't want to spark another Engine vs Engine flame war ;) I didn't notice that the orc is about 4500 polys.

Stephen Zepp   (Jun 08, 2006 at 21:15 GMT)
Yes, keep in mind that the space orc isn't a truly optimized model either--it's in fact high poly (it could probably get away with a much lower poly quota with some work on normal mapping).

Brian Ramage   (Jun 09, 2006 at 05:16 GMT)   Resource Rating: 5
Yes, it could be further optimized - I put the screen up to compare old TSE performance to new TSE performance, not benchmark TSE. I got a little prickly at Alexander's comment because it's comparing apples to oranges in terms of the data set, especially since this orc doesn't have LOD data on it. Plus, it's a little uncool to say, "Hey there's this tech I'm using that's 10x better than yours.... Oh but I can't show you or talk about it."

Bah, I'm just being oversensitive, I know you didn't mean it badly Alexander ;)

Don Hogan   (Jun 09, 2006 at 14:51 GMT)
*rowwwwr*!!

TSE just gets sexier every time you guys show off the latest tweaks! Problem is, it makes me want to play with it and not focus on my TGB project! Oh, what a horrible life for me, so many nice engines to play with... ;-D

- Don

Alexander "taualex" Gaevoy   (Jun 09, 2006 at 16:43 GMT)
@Brian: No, I didnt, mAn! ;) It was just a reference point, not comparison. Take it easy, plz :P

Jeromy Stroh   (Jun 10, 2006 at 07:03 GMT)
Can we get lighting now?

Jesse McKinney   (Jun 10, 2006 at 15:56 GMT)
When you say Dynamic Vertex Buffers are now fully implemented, does that include dynamic vertex buffers for Interiors? I keep overflowing the vertex buffers, and the old "cut apart your interior" hack just doesn't seem right.

"I'm just cleaning up some minor remaining issues and waiting for some Atlas changes to be complete before it's released"
Are those changes the release of Atlas 2?

Nice plan BTW should prove very useful.

Manoel Neto   (Jun 11, 2006 at 11:57 GMT)
How much do these changes affect the prepRenderImage() and renderObject() dance? We've been working on a custom class for large, walkable structures (we use gile[s] files, which are mesh-based, not CSG), and I'd like to know how much I'll need to change to integrate it on the new system (we've got the loading and rendering parts done, all that is left is culling, collision and transparency sorting).

Brian Ramage   (Jun 12, 2006 at 20:06 GMT)   Resource Rating: 5
@Jesse - are you overflowing the vertex buffer or the number of available indexed vertices? We're currently using 16 bit index buffers which I believe was in place to support Geforce 3 cards, but I'll take another look. I don't believe there is a limit to the size of the static (or dynamic) vertex buffers other than what D3D will allow. Volatile buffers you can overflow, but you can change the size of the pool if you need it to be larger.

@Manoel - it is a big change to prepRenderImage and renderObject - at least in the areas of the engine that take advantage of batching. If you've got your own object class and are happy with the way it performs, you can still pretty much render it the old way. The Sky and misc fxRender objects are handled like that.
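
On the 16-bit index buffer point above: a 16-bit index can only address vertices 0 through 65535, which is the kind of limit a large interior can run into regardless of vertex buffer size. The C++ sketch below works through that arithmetic. The helper chunksNeeded is hypothetical, it assumes the naive case where chunks share no vertices, and the usable limit on a given card may be lower than the theoretical maximum.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// A 16-bit index can address vertices 0..65535, so a single indexed draw
// can reference at most 65536 distinct vertices.
constexpr std::size_t kMaxIndexedVertices =
    static_cast<std::size_t>(std::numeric_limits<std::uint16_t>::max()) + 1;

// Hypothetical helper: how many separately indexed chunks a mesh with
// vertexCount vertices needs, assuming no vertices are shared between chunks.
std::size_t chunksNeeded(std::size_t vertexCount) {
    assert(kMaxIndexedVertices > 0);
    // Round up: a partially filled chunk still needs its own draw.
    return (vertexCount + kMaxIndexedVertices - 1) / kMaxIndexedVertices;
}
```

By this count, a roughly 4500-vertex orc fits comfortably in one 16-bit indexed draw, while an interior with more than 65536 vertices would have to be split or moved to 32-bit indices.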

Manoel Neto   (Jun 13, 2006 at 20:58 GMT)
@Brian - awesome! So DTS objects will magically run faster, and our own objects won't get broken. Sounds very good! Our custom objects already use basic per-material batching for their sub-materials, and I was impressed by the performance, even without culling.

Alexander "taualex" Gaevoy   (Jun 19, 2006 at 20:56 GMT)
Fixed function question for TSE
