Plan for Clark Fagot
by Clark Fagot · 01/06/2003 (12:16 pm) · 28 comments

First some numbers
Frame-rates
CPU | Video card | Resolution | Before | After | After+ | After++ |
Athlon 900 | Geforce 4 | 640 x 480 | 34 | 49 | 56 | 62 |
Athlon 900 | Geforce 4 | 800 x 600 | 31 | 42 | 44 | 47 |
P1000 | Geforce 4 | 800 x 600 | 33 | 47 | 50 | 57 |
P1000 | Geforce2MX | 640 x 480 | 35 | 36 | 37 | 40 |
P1000 | Geforce2MX | 800 x 600 | 23 | 24 | 24 | 26 |
P300 | Geforce2MX | 640 x 480 | 12 | 22 | 24 | 26 |
P400 | TNT | 800 x 600 | 8 | 12 | 12 | 14 |
P400 | TNT | 640 x 480 | 9 | 13 | 14 | 20 |
Poly counts for 640 x 480
Condition | Terrain | Three-space |
Before/After | 6622 | 8250 |
After+ | 5520 | 8250 |
After++ | 2580 | 8250 |
Poly counts for 800 x 600
Condition | Terrain | Three-space |
Before/After | 7602 | 10774 |
After+ | 6605 | 10774 |
After++ | 3027 | 10774 |
These tables compare frame rates (and list poly counts) for various machines before and after optimizations. The "After" condition includes several optimizations I'll mention below. "After+" adds a terrain optimization that reduces the size of the terrain (block shift), but not so much that it affects the visual quality of our game. "After++" reduces the terrain one more time, leaving it the precise size of our playfield. This provides a nice detail setting for low-end machines (and high-end machines that want the best framerate without sacrificing any important visual details). ThinkTanks will use the After+ and After++ terrains only (the standard terrain is simply too large for our game).
Notice that the P400 with a TNT video card (an old card) is now in the playable range (the above numbers are worst case, standing at one end of the playfield and viewing everything -- framerate jumps considerably when in the middle of the playfield or facing away from it). This also doesn't include any detail preferences ($TS::detailAdjust, $Terrain::ScreenError and everything else is on full).
Below is a list of some of the optimizations I performed. Before you ask, yes we plan to share. Over the next couple weeks I'll post resources for some of the optimizations. Some of them will be difficult to extend to the community due to the extensiveness of the changes, but we will do our best.
1. glFinish. Go into guiCanvas.cc and get rid of the call to glFinish. It simply isn't needed. I've never been able to find any resource on the net or any book suggesting it should be used for real-time work. If you can find such a thing, let me know and I'll eat my hat. Removing this will improve framerates as much as 30% on low end machines. Note: even the "Before" framerates above had this optimization, so in a sense those numbers are understated.
2. terrCheck. SceneTraversal performs an occlusion check against the terrain on all potentially rendered objects. I found that this was taking up about 5% of the cycles when everything was visible (we have upwards of 100 objects out there). I optimized it to use the results from the previous frame to decide whether to perform the check on an object that frame or wait for a few frames (16) before rechecking. Since objects tend to remain occluded or not occluded from one frame to the next, this is very effective. This saved about 5% in the worst case (when nothing was really occluded) and about 4% in the case where many things were! I will post a resource on this optimization soon. Note: the terrCheck function itself is a bit odd and could probably use a rework. This has been submitted as a resource -- check it out.
3. cone check. The scene graph compares the bounding box of the potentially rendered object to the view frustum after already doing a bit of work (e.g., checking to see if object is fully fogged). I added a simple cone check in order to quickly accept or reject the object as being inside the view frustum. This saves the fog check and the more complicated frustum check in 90% of the cases. Savings were less than 1% on my machine, but it was clearly a positive move.
4. inlines. There are several functions that should be inlines. If you are doing a lot of ts animation you will see a benefit of inlining many of the functions in tsthread. More importantly, inline all of the FrameAllocator functions. This will get you about 2% (mostly because of the terrain, which uses the frame allocator extensively).
5. dglSetCanonicalState. Get rid of all calls to dglSetCanonicalState inside rendered objects. This will cause asserts in some cases where some object didn't reset the state properly. Just fix that problem and don't pay the price of resetting the gl state every object.
6. renderImage new. Overrode the new operator on render images. Now allocate them from a pool (be careful because interior SubObjectRenderImage has a different size than all others). This fix will give you a modest speedup, but will fight memory fragmentation which is important.
7. sleep. If you are making a single player game, consider removing the call to sleep in winWindow. The purpose of this call is to allow dedicated servers in the background to get some cycles. If you are making a single player game, that shouldn't matter (Windows is a pre-emptive multi-tasking OS, so it can handle apps that don't give their cycles away to charity). This can get you a few percent (maybe more on slow machines).
8. GameTSCtrl::onRender. Remove the call to renderChildControls. The parent class already does this. Currently, all gui ctrls are being drawn twice. If any of your controls are time consuming you will get hit twice.
10. Reduce blockShift. Ok, this was a biggie for us. I added the ability to change the terrain block shift and whether or not the terrain tiles. So, by changing block shift to 7 instead of 8, we reduce the terrain block from 2048 x 2048 to 1024 x 1024. The result is fewer terrain polys and fewer textures to be blended. If you have a game like ours which is best played on a reduced playfield, this can provide a nice speedup without sacrificing any visual quality. Reducing the terrain size one more notch brings the terrain down to 512 x 512. This is our full playfield. The low end machines sacrifice nothing important for drastically improved performance. The framerates at the top of my .plan show the drastic improvements this provides on some machines.
11. projection/viewport state. If you look at the rendering code for all the objects in the Torque you will notice all these calls that set up the projection and save off the viewport (and later restore it). Those are really a waste of time for most objects in the game (anything on the terrain rather than inside an interior). I added some state management code so that it doesn't end up doing all this work just to set up the viewport and projection matrix to be exactly like it already was. This optimization saves about 5%.
12. state switches. One of the places where the Torque does not do a great job is the rendering pipeline. There is no management of the graphics states, so if you ever look through a gl log of a Torque rendered scene, you start seeing things like glEnable(GL_BLEND), glDisable(GL_BLEND), glEnable(GL_BLEND). You get the idea. Since most of our shapes are 3space shapes, I added a mode to the scene graph whereby it tracks whether it is in "3space mode" rather than "canonical mode". This allows us to render all the rocks without switching state, all the trees without switching state, etc. This optimization got rid of about 8% of cpu cycles that were being spent in the state switching code, but seemed to also create a further boost due to the remaining rendering calls taking less time. Getting this optimization to the community will be difficult. If someone wants to step forward to help out please let me know.
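For item 2 (terrCheck), the recheck-after-16-frames idea might look roughly like the sketch below. This is not the Torque code: OcclusionCache, isOccluded, and kRecheckInterval are hypothetical names, the expensive terrain test is passed in as a callback, and the policy shown (always wait 16 frames between real checks) is only one reading of the description in that item.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-object cache; not a Torque type.
struct OcclusionCache {
    bool          occluded    = false; // result of the last real check
    std::uint32_t nextCheckAt = 0;     // frame number at which to check again
};

constexpr std::uint32_t kRecheckInterval = 16; // frames between real checks

// Returns whether the object should be treated as occluded this frame.
// `expensiveCheck` stands in for the real terrain occlusion test and is
// only invoked when the cached answer is older than kRecheckInterval.
template <typename CheckFn>
bool isOccluded(OcclusionCache& cache, std::uint32_t frame, CheckFn expensiveCheck)
{
    if (frame >= cache.nextCheckAt) {
        cache.occluded    = expensiveCheck();
        cache.nextCheckAt = frame + kRecheckInterval;
    }
    return cache.occluded; // reuse the cached answer in between
}
```

The trade-off is that an object can stay hidden (or drawn) for up to 16 frames after its real occlusion status changes.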
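For item 3 (cone check), here is a minimal sketch of a quick reject against a view cone. It makes several assumptions that are not in the text: it tests a bounding sphere rather than the bounding box, the cone is taken to enclose the whole frustum, and it only covers the reject side. Vec3, ViewCone, and sphereOutsideCone are illustrative names, not Torque's.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

inline Vec3  sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Hypothetical view cone: apex at the eye, unit-length axis along the view
// direction, half-angle assumed wide enough to enclose the frustum.
struct ViewCone {
    Vec3  apex;
    Vec3  axis;    // must be normalized
    float sinHalf; // sine of the half-angle
    float cosHalf; // cosine of the half-angle
};

// True when the bounding sphere lies entirely outside the cone, so the
// object can be skipped before any further work. A false result only
// means "run the full test".
inline bool sphereOutsideCone(const ViewCone& cone, Vec3 center, float radius)
{
    const Vec3  v     = sub(center, cone.apex);
    const float along = dot(v, cone.axis);          // distance along the axis
    const float perp2 = dot(v, v) - along * along;  // squared distance from the axis
    const float perp  = std::sqrt(perp2 > 0.0f ? perp2 : 0.0f);
    // Distance from the center to the cone surface, positive outside.
    // Behind the apex this underestimates the true distance, which keeps
    // the test conservative (it never rejects something visible).
    const float dist = perp * cone.cosHalf - along * cone.sinHalf;
    return dist > radius;
}
```

A handful of multiplies and one square root per object is cheap next to a box-versus-frustum test, which fits the small but positive saving reported in that item.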
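For item 6 (renderImage new), a pooled operator new could be sketched as follows. The class layout, the slot size of 64 bytes, and the pool size of 256 are all assumptions made for illustration; FixedPool and freeSlots are hypothetical names. The assert is there because of the size caveat in that item: every slot must be big enough for the largest derived image.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Fixed-size free-list pool (illustrative, not Torque's allocator).
template <std::size_t SlotSize, std::size_t SlotCount>
class FixedPool {
    union Slot {
        Slot* next; // link used while the slot is free
        alignas(std::max_align_t) unsigned char bytes[SlotSize];
    };
    Slot        slots_[SlotCount];
    Slot*       free_      = nullptr;
    std::size_t freeCount_ = 0;
public:
    FixedPool() {
        for (std::size_t i = 0; i < SlotCount; ++i) release(&slots_[i]);
    }
    void* allocate(std::size_t size) {
        assert(size <= SlotSize && "object too big for a pool slot");
        if (free_ == nullptr) throw std::bad_alloc();
        Slot* s = free_;
        free_ = s->next;
        --freeCount_;
        return s;
    }
    void release(void* p) {
        if (p == nullptr) return;
        Slot* s = static_cast<Slot*>(p);
        s->next = free_; // push the slot back on the free list
        free_ = s;
        ++freeCount_;
    }
    std::size_t freeCount() const { return freeCount_; }
};

// Stand-in for the engine's render image base class.
class RenderImage {
public:
    virtual ~RenderImage() = default;
    static void* operator new(std::size_t size) { return pool().allocate(size); }
    static void  operator delete(void* p)       { pool().release(p); }
    static std::size_t freeSlots()              { return pool().freeCount(); }
    int sortKey = 0;
private:
    using Pool = FixedPool<64, 256>; // sizes are assumptions
    static Pool& pool() { static Pool p; return p; }
};

// A larger derived image still has to fit in one slot.
struct SubObjectRenderImage : RenderImage { int extra[4] = {}; };
```

Because every allocation comes from the same block of equal-sized slots, creating and destroying render images each frame cannot fragment the general heap.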
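The sizes quoted in item 10 (blockShift) follow from simple shift arithmetic, sketched below. One assumption is needed to reproduce them: each terrain square is taken to be 8 world units wide, a value that is not stated in the post. The function names are hypothetical.

```cpp
#include <cassert>

// Assumption: each terrain square is 8 world units on a side. This is not
// stated in the post; it is the value that makes a block shift of 8 come
// out to 2048.
constexpr unsigned kSquareSize = 8;

// Number of squares along one edge of the terrain block.
constexpr unsigned blockSquares(unsigned blockShift) { return 1u << blockShift; }

// Edge length of the terrain block in world units.
constexpr unsigned blockWorldSize(unsigned blockShift)
{
    return blockSquares(blockShift) * kSquareSize;
}

static_assert(blockWorldSize(8) == 2048, "standard terrain");
static_assert(blockWorldSize(7) == 1024, "one notch smaller");
static_assert(blockWorldSize(6) == 512,  "two notches smaller");
```

Each step down in block shift halves the edge length, so the block covers a quarter of the area it did before.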
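The redundant enable/disable pattern in item 12 can be illustrated with a small state cache. Note that this is a different and simpler technique than the "3space mode" tracking described there: it just drops calls that would not change anything. StateCache is a hypothetical name, and the `apply` callback stands in for the real glEnable/glDisable calls so the sketch runs without a GL context.

```cpp
#include <cassert>
#include <unordered_map>

// Remembers the last value set for each capability and forwards only
// real changes. Hypothetical wrapper, not part of Torque or OpenGL.
class StateCache {
    std::unordered_map<unsigned, bool> current_;
public:
    // `apply(capability, enabled)` is where real code would call
    // glEnable or glDisable.
    template <typename Apply>
    void set(unsigned capability, bool enabled, Apply apply)
    {
        auto it = current_.find(capability);
        if (it != current_.end() && it->second == enabled) {
            return; // already in the requested state, skip the driver call
        }
        current_[capability] = enabled;
        apply(capability, enabled);
    }
};
```

The first request for any capability is always forwarded, since the cache cannot know what state the driver started in. Any code that changes GL state behind the cache's back will make it go stale, which is the kind of hard-to-find rendering bug a scheme like this can introduce.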
Ok, that's all for now. Look for some of these in resources in the coming weeks.
clark
Technical Director
BraveTree Productions
#2
01/06/2003 (12:24 pm)
Right on, Clark, very cool to hear this is in the pipeline... Thanks for all your hard work
#3
01/06/2003 (12:40 pm)
I was just thinking that for #10, you know, Orbz is played on a pretty reduced playing field. Maybe they could implement this? I have a GeForce 4 Ti4200, and I get slowdowns (probably from the large amounts of particle effects, but I don't know).
#4
01/06/2003 (2:07 pm)
Nice work Clark. I expect this plan update will become a key resource on the subject of performance.
How exactly did you measure the performance...did you use the engine's profiler? Also did you use prerecorded demos to get a consistent set of tests? Just curious...really don't know the best methodology for instrumenting this beast.
#5
01/06/2003 (2:59 pm)
I look forward to seeing these things come out as resources - I already added a few of the basic ones to my own copy of Torque. Good stuff!
#6
01/06/2003 (3:02 pm)
Wow. That was superb.
#7
01/06/2003 (4:41 pm)
At the moment I am still quite ignorant about Torque's innards, so I am tremendously grateful for this.
Keep up the great work, man!
#8
01/06/2003 (4:57 pm)
Thanks for the comments.
@Luc - 21-6 will be in town next weekend so we'll be talking to them about getting some of this into Orbz.
@John - I simply used the built-in Torque profiler. It's quite excellent. I have it hooked up to a key so I simply hold it down and it profiles till I release the key. Perhaps I can put that up as a resource too. I have used demo recordings in the past. That's a good route. For this, though, I simply laid down a marker and set the tank transform to that marker before recording fps and doing a profile. This worked well because I was mostly concerned about the rendering pipeline (collisions, physics, and everything else is pretty low cpu hit in our game).
#9
01/06/2003 (9:13 pm)
Excellent work, Clark! Glad you pointed all this out. Truthfully, I couldn't figure out how to quickly implement a couple o' items there (no big surprise to me - there's parts I still haven't bothered to get familiar with), and still got a pretty nice boost rather quickly. I never looked through stuff that closely, and didn't realize the UI items were being drawn twice - that made a huge difference when stuff like my radar controls are onscreen! Very cool :-)
#10
01/06/2003 (9:36 pm)
@Davis - We're using the radar resource that was posted by Matt Webster. That thing is nice because it is general, but we found that iterating through the entire client group was a little time consuming (we have over 100 static shapes, so it was taking up about 1%). I changed it so that objects that wanted to be on the radar had to add themselves to a list. A little less general (objects need to know that they want to be added to the radar) but quicker. I like the idea of a general resource like that which can be tailored for speed for a particular game during optimization stages. I don't know if you're using the same resource, but that might present another optimization opportunity for you.
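The opt-in list described in this comment might be sketched like this. RadarList and GameObject are hypothetical stand-ins, not the types used by the radar resource; the point is only that the radar walks a short list of registered objects instead of every object in the client group.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Stand-in for a game object with a 2D position.
struct GameObject { float x, y; };

// Objects that want to appear on the radar register themselves here.
class RadarList {
    std::vector<const GameObject*> tracked_;
public:
    void add(const GameObject* obj) { tracked_.push_back(obj); }

    // Must be called before a registered object is destroyed.
    void remove(const GameObject* obj)
    {
        tracked_.erase(std::remove(tracked_.begin(), tracked_.end(), obj),
                       tracked_.end());
    }

    // The radar control iterates only over this list when drawing.
    const std::vector<const GameObject*>& tracked() const { return tracked_; }
};
```

The cost is the loss of generality mentioned above: each object type has to know that it belongs on the radar and must unregister itself when it goes away.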
#11
01/06/2003 (10:12 pm)
At the moment - yep, I'm using a modified version of Matt's radar control. However, I think it's about to go out the window anyway (or it may stay, but its usefulness is severely downgraded). With TZ, one of the problems I've been having is that since it's all static bases (think Scorched Earth), finding your opponent in a 3D landscape that is deformable can be no small problem ;-) I'm hackin' out a different way now - I've modified the ShapeBaseName control to act as 'x-ray vision'. When an opponent is behind a mountain (or anything else) you still see their name, and an indication of how far away they are (an imprecise reading).
So I'll take the optimization a step further, and kill off the GuiRadarControl I think. ;-)
#12
01/07/2003 (12:33 am)
Very good!
Just one thing though: Be careful with the GL states. Either you go all the way through adding a state system that optimizes state changes, or you leave it as it is. I personally don't think that getting rid of a few "glEnable" and "glDisable"-calls is worth the whole effort. Yes, state changes do cost CPU time. But it really isn't that much. If you mess up the states however, you'll get all kinds of fancy effects and you'll have a hard time to find out what's causing it ;)
#13
01/07/2003 (4:28 pm)
@Josef - I hear where you are coming from in terms of causing rendering bugs, but I think you underestimate the impact of excessive state switching. It not only takes up CPU time, but slows down the video card too. You'll notice in the screen shot at the top of this page that the same shape is repeated quite a bit. It's a huge win to be able to draw all of the shapes of the same type without resetting any states (except for lighting and fogging).
#14
01/08/2003 (12:19 pm)
Clark, very nice stuff here.
Can't wait to see how this stuff works in Orbz.
#15
01/09/2003 (9:08 pm)
5. dglSetCannonical. Get rid of all calls to dglSetCannonical inside rendered objects. This will cause asserts in some cases where some object didn't reset the state properly. Just fix that problem and don't pay the price of resetting the gl state every object.
Could someone elaborate on the "just fix that problem" portion of this optimization???
#16
01/09/2003 (11:05 pm)
Jarrod: All renderable objects should restore themselves to canonical state on their own, based on the changes they made to diverge from the canonical state. This is quicker than calling the dgl function, which sets all state variables to make sure they're proper.
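One way to picture "restore only what you changed" is a scope guard that undoes a single state change when rendering finishes. This is a sketch under assumptions: FakeGl is a stand-in for the GL state machine so the example runs without a context, and it assumes that blending is disabled in the canonical state. ScopedBlend and renderTranslucentShape are hypothetical names.

```cpp
#include <cassert>

// Stand-in for the GL state machine; counts how many state calls were made.
struct FakeGl {
    bool blend      = false; // assumed canonical value: blending off
    int  stateCalls = 0;
    void setBlend(bool on) { blend = on; ++stateCalls; }
};

// Enables blending for the lifetime of the guard, then restores the
// canonical value. Only the state this object touched is reset.
class ScopedBlend {
    FakeGl& gl_;
public:
    explicit ScopedBlend(FakeGl& gl) : gl_(gl) { gl_.setBlend(true); }
    ~ScopedBlend() { gl_.setBlend(false); }
    ScopedBlend(const ScopedBlend&) = delete;
    ScopedBlend& operator=(const ScopedBlend&) = delete;
};

// A render function that diverges from canonical state in exactly one way.
void renderTranslucentShape(FakeGl& gl)
{
    ScopedBlend blend(gl);
    // ... draw calls would go here ...
}   // blending is switched back off here, even on an early return
```

The object pays for two state calls instead of a full reset of every state variable, and a check that the state is canonical after the object renders would still pass.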
#17
01/10/2003 (6:42 am)
hmmm, I grepped a December HEAD for dglSetCannonical and found absolutely *no* instances of it....
#19
01/10/2003 (7:09 am)
@Ken - I mis-spelled "canonical". Try searching for "dglSetCanonical" (actually, "dglSetCanonicalState"). I'll fix it above.
#20
01/10/2003 (6:16 pm)
Thanks Clark - of course I just copy&pasted from here into the find box ... :-)
Found 16 instances this time !

Torque Owner Prairie Games
Prairie Games, Inc.
-J