
Plan for Clark Fagot

Name: Clark Fagot
Date Posted: Jan 06, 2003
Rating: 4.9 out of 5

Improved frame rates for ThinkTanks by 50-100% in December. The optimizations were mostly rendering speed-ups, and low-end machines saw the largest impact. These are worst-case improvements (highest poly count scenes) -- the average case seems to have improved even more.


First some numbers


Frame-rates (fps)
CPU | Video card | Resolution | Before | After | After+ | After++ |
Athlon 900 | Geforce 4 | 640 x 480 | 34 | 49 | 56 | 62 |
Athlon 900 | Geforce 4 | 800 x 600 | 31 | 42 | 44 | 47 |
P1000 | Geforce 4 | 800 x 600 | 33 | 47 | 50 | 57 |
P1000 | Geforce2MX | 640 x 480 | 35 | 36 | 37 | 40 |
P1000 | Geforce2MX | 800 x 600 | 23 | 24 | 24 | 26 |
P300 | Geforce2MX | 640 x 480 | 12 | 22 | 24 | 26 |
P400 | TNT | 800 x 600 | 8 | 12 | 12 | 14 |
P400 | TNT | 640 x 480 | 9 | 13 | 14 | 20 |


Poly counts for 640 x 480

Condition | Terrain | Three-space |
Before/After | 6622 | 8250 |
After+ | 5520 | 8250 |
After++ | 2580 | 8250 |

Poly counts for 800 x 600

Condition | Terrain | Three-space |
Before/After | 7602 | 10774 |
After+ | 6605 | 10774 |
After++ | 3027 | 10774 |


These tables compare frame-rates (and list poly counts) for various machines before and after the optimizations. The "After" condition includes several optimizations I'll mention below. The "After+" condition adds a terrain optimization which reduces the size of the terrain (block shift), but not so much that it affects the visual quality of our game. The "After++" condition reduces the terrain one more time, leaving it the precise size of our play field. This provides a nice detail setting for low-end machines (and for high-end machines that want the best framerate without sacrificing any important visual details). ThinkTanks will use the After+ and After++ terrains only (the standard terrain is simply too large for our game).

Notice that the P400 with a TNT video card (an old card) is now in the playable range. The numbers above are worst case, standing at one end of the playfield and viewing everything -- framerate jumps considerably when in the middle of the playfield or facing away from it. They also don't rely on any reduced detail preferences: $TS::detailAdjust, $Terrain::ScreenError, and everything else are on full.

Below is a list of some of the optimizations I performed. Before you ask, yes we plan to share. Over the next couple weeks I'll post resources for some of the optimizations. Some of them will be difficult to extend to the community due to the extensiveness of the changes, but we will do our best.

1. glFinish. Go into guiCanvas.cc and get rid of the call to glFinish. It simply isn't needed. I've never been able to find any resource on the net or any book suggesting it should be used for real-time work. If you can find such a thing, let me know and I'll eat my hat. Removing this will improve framerates as much as 30% on low end machines. Note: even the "Before" framerates above had this optimization, so in a sense those numbers are understated.

2. terrCheck. SceneTraversal performs an occlusion check against the terrain on all potentially rendered objects. I found that this was taking up about 5% of the cycles when everything was visible (we have upwards of 100 objects out there). I optimized it to use the results from the previous frame in order to decide whether to perform the check on an object that frame or wait for a few frames (16) before rechecking. Since objects tend to remain occluded or not occluded from one frame to the next, this is very effective. This saved about 5% in the worst case (when nothing was really occluded) and about 4% in the case where many things were! I will post a resource on this optimization soon. Note: the terrCheck function itself is a bit odd, and could probably use a rework. This has been submitted as a resource -- check it out.
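
The frame-skipping idea can be sketched roughly as follows. This is not the actual Torque code: OcclusionCache, isOccluded, and the callback are hypothetical names, and only the 16-frame recheck interval comes from the description above.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical per-object cache; the names are illustrative, not Torque's.
struct OcclusionCache {
    bool          occluded      = false;  // result of the last real test
    std::uint32_t lastTestFrame = 0;      // frame on which that test ran
    bool          everTested    = false;  // false until the first real test
};

// Frames to wait before repeating the real test (the value quoted in the post).
constexpr std::uint32_t kRecheckInterval = 16;

// Returns true if the object should be treated as occluded on `frame`.
// `expensiveTest` stands in for the terrain occlusion check; it is only
// invoked when the cached answer is at least kRecheckInterval frames old.
inline bool isOccluded(OcclusionCache& cache, std::uint32_t frame,
                       const std::function<bool()>& expensiveTest) {
    assert(!cache.everTested || frame >= cache.lastTestFrame);
    if (!cache.everTested || frame - cache.lastTestFrame >= kRecheckInterval) {
        cache.occluded      = expensiveTest();
        cache.lastTestFrame = frame;
        cache.everTested    = true;
    }
    return cache.occluded;
}
```

With one cache per object, a scene of 100 objects would run the expensive test on each object once every 16 frames instead of every frame, at the cost of a result that can be up to 15 frames stale.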

3. cone check. The scene graph compares the bounding box of the potentially rendered object to the view frustum after already doing a bit of work (e.g., checking to see if object is fully fogged). I added a simple cone check in order to quickly accept or reject the object as being inside the view frustum. This saves the fog check and the more complicated frustum check in 90% of the cases. Savings were less than 1% on my machine, but it was clearly a positive move.
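
One way such a cone test could look, assuming each object has a bounding sphere and the view cone is described by an apex, a unit axis, and a half-angle. All names here are hypothetical, and the real change may well work on boxes or use a different formulation.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

inline Vec3  sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Hypothetical view cone: apex at the eye, unit-length axis, half-angle in radians.
struct Cone { Vec3 apex; Vec3 axis; float halfAngle; };

enum class ConeResult { Inside, Outside, Straddles };

// Classifies a bounding sphere against an infinite cone.
inline ConeResult classifySphere(const Cone& cone, Vec3 center, float radius) {
    assert(std::fabs(dot(cone.axis, cone.axis) - 1.0f) < 1e-3f);  // axis must be unit length
    const Vec3  v     = sub(center, cone.apex);
    const float along = dot(v, cone.axis);              // distance along the axis
    const float perp2 = dot(v, v) - along * along;      // squared distance from the axis
    const float perp  = std::sqrt(perp2 > 0.0f ? perp2 : 0.0f);
    // Signed distance from the sphere center to the cone's boundary line
    // (positive means the center lies outside the cone).
    const float d = perp * std::cos(cone.halfAngle) - along * std::sin(cone.halfAngle);
    if (d >  radius) return ConeResult::Outside;
    if (d < -radius) return ConeResult::Inside;
    return ConeResult::Straddles;
}
```

A cone that encloses the frustum gives a safe quick reject on Outside, and a cone that fits inside the frustum gives a quick accept on Inside (near and far planes aside). Only the Straddles case would need to fall through to the full frustum test.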

4. inlines. There are several functions that should be inlines. If you are doing a lot of ts animation you will see a benefit of inlining many of the functions in tsthread. More importantly, inline all of the FrameAllocator functions. This will get you about 2% (mostly because of the terrain, which uses the frame allocator extensively).
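
To illustrate why a frame allocator is a good inlining candidate, here is a simplified stand-in (MiniFrameAllocator is a made-up class, not Torque's FrameAllocator). Each call is only a few instructions, so call overhead can be a noticeable share of the cost when it is invoked very often.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simplified stand-in for a per-frame bump allocator (not Torque's class).
class MiniFrameAllocator {
public:
    MiniFrameAllocator(std::uint8_t* buffer, std::size_t size)
        : mBuffer(buffer), mSize(size), mWaterMark(0) {}

    // Defined in the class body, so implicitly inline: the compiler is free
    // to expand the pointer bump at the call site instead of emitting a call.
    void* alloc(std::size_t bytes) {
        const std::size_t aligned = (bytes + 3) & ~std::size_t(3);  // round up to 4 bytes
        assert(mWaterMark + aligned <= mSize);
        void* p = mBuffer + mWaterMark;
        mWaterMark += aligned;
        return p;
    }

    std::size_t getWaterMark() const { return mWaterMark; }

    // Releases everything allocated since `mark` was read.
    void setWaterMark(std::size_t mark) {
        assert(mark <= mWaterMark);
        mWaterMark = mark;
    }

private:
    std::uint8_t* mBuffer;
    std::size_t   mSize;
    std::size_t   mWaterMark;
};
```

Typical use is to read the water mark, allocate scratch memory for the current render call, and restore the mark when done.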

5. dglSetCanonicalState. Get rid of all calls to dglSetCanonicalState inside rendered objects. This will cause asserts in some cases where some object didn't reset the state properly. Just fix that problem and don't pay the price of resetting the gl state for every object.

6. renderImage new. Overrode the new operator on render images. Now allocate them from a pool (be careful because interior SubObjectRenderImage has a different size than all others). This fix will give you a modest speedup, but will fight memory fragmentation which is important.
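
A rough sketch of the pooled operator new, under the assumption of a simple fixed-size free list. FixedPool and this RenderImage are stand-ins written for illustration; the real render image classes have different members, and the slot count of 256 is arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical fixed-size pool; one pool serves one slot size only.
class FixedPool {
public:
    FixedPool(std::size_t slotSize, std::size_t slotCount) {
        const std::size_t unit = sizeof(std::max_align_t);
        mUnitsPerSlot = (slotSize + unit - 1) / unit;   // round slot up to aligned units
        if (mUnitsPerSlot == 0) mUnitsPerSlot = 1;
        mStorage.resize(mUnitsPerSlot * slotCount);
        mFree.reserve(slotCount);
        for (std::size_t i = 0; i < slotCount; ++i)
            mFree.push_back(&mStorage[i * mUnitsPerSlot]);
    }

    void* allocate(std::size_t bytes) {
        // Catches a subclass that is larger than the slot size the pool was built for.
        assert(bytes <= mUnitsPerSlot * sizeof(std::max_align_t));
        if (mFree.empty()) throw std::bad_alloc();
        void* p = mFree.back();
        mFree.pop_back();
        return p;
    }

    void release(void* p) { if (p) mFree.push_back(p); }
    std::size_t freeSlots() const { return mFree.size(); }

private:
    std::size_t                   mUnitsPerSlot = 0;
    std::vector<std::max_align_t> mStorage;  // one contiguous, suitably aligned block
    std::vector<void*>            mFree;     // slots currently available
};

// Stand-in for a render image; the real class has different members.
struct RenderImage {
    int   sortKey  = 0;
    float distance = 0.0f;

    static FixedPool& pool() {
        static FixedPool p(sizeof(RenderImage), 256);
        return p;
    }
    static void* operator new(std::size_t bytes) { return pool().allocate(bytes); }
    static void  operator delete(void* p) { pool().release(p); }
};
```

Because every allocation comes out of one preallocated block, per-frame creation and deletion of render images no longer touches the general heap. The assert in allocate is where a differently sized subclass (the SubObjectRenderImage caveat above) would show up in a debug build.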

7. sleep. If you are making a single player game, consider removing the call to sleep in winWindow. The purpose of this call is to allow dedicated servers in the background to get some cycles. If you are making a single player game, that shouldn't matter (Windows is a pre-emptive multi-tasking OS, so it can handle apps that don't give their cycles away to charity). This can get you a few percent (maybe more on slow machines).

8. GameTSCtrl::onRender. Remove the call to renderChildControls. The parent class already does this. Currently, all gui ctrls are being drawn twice. If any of your controls are time consuming you will get hit twice.

10. Reduce blockShift. Ok, this was a biggie for us. Added the ability to change the terrain block shift and whether or not the terrain tiles. So, by changing block shift to 7 instead of 8, we reduce the terrain block from 2048 x 2048 to 1024 x 1024. The result is fewer terrain polys and fewer textures to be blended. If you have a game like ours which is best played on a reduced playfield, this can provide a nice speedup without sacrificing any visual quality. Reducing the terrain size one more notch brings the terrain down to 512 x 512. This is our full playfield. The low-end machines sacrifice nothing important for drastically improved performance. The framerates at the top of my .plan show the drastic improvements this provides on some machines.
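
The sizes quoted above follow from simple arithmetic, sketched here. The square size of 8 world units is an assumption inferred from the numbers in this item (256 squares x 8 = 2048), not something taken from the engine source, and the function names are made up.

```cpp
#include <cassert>
#include <cstdint>

// Assumed width of one terrain square in world units (inferred from the
// 2048 / 1024 / 512 figures in the post, not read from engine code).
constexpr std::uint32_t kSquareSize = 8;

// Number of terrain squares along one side of the block.
inline std::uint32_t squaresPerSide(std::uint32_t blockShift) {
    assert(blockShift < 16);  // keeps the squared count within 32 bits
    return 1u << blockShift;
}

// Side length of the terrain block in world units.
inline std::uint32_t worldSize(std::uint32_t blockShift) {
    return squaresPerSide(blockShift) * kSquareSize;
}

// Total number of squares in the block.
inline std::uint32_t totalSquares(std::uint32_t blockShift) {
    return squaresPerSide(blockShift) * squaresPerSide(blockShift);
}
```

Under that assumption, worldSize(8), worldSize(7), and worldSize(6) give 2048, 1024, and 512, and each step down in block shift quarters the number of squares in the block.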

11. projection/viewport state. If you look at the rendering code for all the objects in the Torque you will notice all these calls that set up the projection and save off the viewport (and later restore it). Those are really a waste of time for most objects in the game (anything on the terrain rather than inside an interior). I added some state management code so that it doesn't end up doing all this work just to set up the viewport and projection matrix to be exactly like it already was. This optimization saves about 5%.
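
A minimal sketch of that kind of state management for the viewport alone. ViewportCache is a hypothetical class, and a counter stands in for the driver call so the example runs without a GL context; the same pattern would apply to the projection matrix.

```cpp
#include <cassert>

struct Viewport {
    int x, y, width, height;
    bool operator==(const Viewport& o) const {
        return x == o.x && y == o.y && width == o.width && height == o.height;
    }
};

// Hypothetical cache in front of the real viewport call.
class ViewportCache {
public:
    void set(const Viewport& vp) {
        assert(vp.width >= 0 && vp.height >= 0);
        if (mValid && vp == mCurrent) return;  // already in effect: skip the driver call
        mCurrent = vp;
        mValid   = true;
        ++applyCount;  // real code would call glViewport(vp.x, vp.y, vp.width, vp.height) here
    }

    // Call when something outside the cache may have changed the viewport.
    void invalidate() { mValid = false; }

    int applyCount = 0;  // number of times the "driver" was actually touched

private:
    Viewport mCurrent{0, 0, 0, 0};
    bool     mValid = false;
};
```

Objects keep calling set() as before, but only the first call after a real change reaches the driver.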

12. state switches. One of the places where the Torque does not do a great job is the rendering pipeline. There is no management of the graphics states, so if you ever look through a gl log of a Torque rendered scene, you start seeing things like glEnable(GL_BLEND), glDisable(GL_BLEND), glEnable(GL_BLEND). You get the idea. Since most of our shapes are 3space shapes, I added a mode to the scene graph whereby it tracks whether it is in "3space mode" rather than "canonical mode". This allows us to render all the rocks without switching state, all the trees without switching state, etc. This optimization got rid of about 8% of cpu cycles that were being spent in the state switching code, but seemed to also create a further boost due to the remaining rendering calls taking less time. Getting this optimization to the community will be difficult. If someone wants to step forward to help out please let me know.
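
The mode-tracking idea might look something like this in miniature. The types and the sort are assumptions made for illustration; the actual scene graph change is considerably more involved.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

enum class RenderMode { Canonical, ThreeSpace };

// Hypothetical render list entry: which mode it needs and which shape it is.
struct Drawable { RenderMode mode; int shapeType; };

class ModeTracker {
public:
    void require(RenderMode mode) {
        if (mode == mCurrent) return;  // already in this mode: nothing to set up
        mCurrent = mode;
        ++switches;                    // real code would change GL state here
    }
    RenderMode current() const { return mCurrent; }
    int switches = 0;

private:
    RenderMode mCurrent = RenderMode::Canonical;
};

// Walks the list and returns how many mode switches were needed.
inline int countModeSwitches(std::vector<Drawable> items, bool sortFirst) {
    if (sortFirst) {
        // Group by mode, then by shape, so identical shapes are drawn back to back.
        std::stable_sort(items.begin(), items.end(),
                         [](const Drawable& a, const Drawable& b) {
                             if (a.mode != b.mode) return a.mode < b.mode;
                             return a.shapeType < b.shapeType;
                         });
    }
    ModeTracker tracker;
    for (const Drawable& d : items) tracker.require(d.mode);
    tracker.require(RenderMode::Canonical);  // leave the pipeline as we found it
    assert(tracker.current() == RenderMode::Canonical);
    return tracker.switches;
}
```

For a list that alternates between the two modes, grouping first collapses the switches to one transition into 3space mode and one back out.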

Ok, that's all for now. Look for some of these in resources in the coming weeks.

clark
Technical Director
BraveTree Productions


Prairie Games   (Jan 06, 2003 at 20:21 GMT)
Extremely good work! In fact, GREAT work!

-J

Nicolas Quijano   (Jan 06, 2003 at 20:24 GMT)   Resource Rating: 5
Right on, Clark, very cool to hear this is in the pipeline...
Thanks for all your hard work

Luc Jordan   (Jan 06, 2003 at 20:40 GMT)
I was just thinking that for #10, you know, Orbz is played on a pretty reduced playing field. Maybe they could implement this? I have a GeForce 4 Ti4200, and I get slowdowns (probably from the large amounts of particle effects, but I don't know).

John Quigley   (Jan 06, 2003 at 22:07 GMT)
Nice work Clark. I expect this plan update will become a key resource on the subject of performance.

How exactly did you measure the performance...did you use the engine's profiler? Also did you use prerecorded demos to get a consistent set of tests? Just curious...really don't know the best methodology for instrumenting this beast.

Ben Garney   (Jan 06, 2003 at 22:59 GMT)
I look forward to seeing these things come out as resources - I already added a few of the basic ones to my own copy of Torque. Good stuff!
Edited on Jan 06, 2003 23:02 GMT

Pat Wilson   (Jan 06, 2003 at 23:02 GMT)
Wow. That was superb.

Ernest   (Jan 07, 2003 at 00:41 GMT)
At the moment I am still quite ignorant about Torque's innards, so I am tremendously grateful for this.

Keep up the great work, man!

Clark Fagot   (Jan 07, 2003 at 00:57 GMT)
Thanks for the comments.

@Luc - 21-6 will be in town next weekend so we'll be talking to them about getting some of this into Orbz.

@John - I simply used the built-in Torque profiler. It's quite excellent. I have it hooked up to a key so I simply hold it down and it profiles till I release the key. Perhaps I can put that up as a resource too. I have used demo recordings in the past. That's a good route. For this, though, I simply laid down a marker and set the tank transform to that marker before recording fps and doing a profile. This worked well because I was mostly concerned about the rendering pipeline (collisions, physics, and everything else is pretty low cpu hit in our game).

Davis Ray Sickmon, Jr   (Jan 07, 2003 at 05:13 GMT)
Excellent work, Clark! Glad you pointed all this out. Truthfully, I couldn't figure out how to quickly implement a couple o' items there (no big surprise to me - there's parts I still haven't bothered to get familiar with), and still got a pretty nice boost rather quickly. I never looked through stuff that closely, and didn't realize the UI items were being drawn twice - that made a huge difference when stuff like my radar controls are onscreen! Very cool :-)

Clark Fagot   (Jan 07, 2003 at 05:36 GMT)
@Davis - We're using the radar resource that was posted by Matt Webster. That thing is nice because it is general, but we found that iterating through the entire client group was a little time consuming (we have over 100 static shapes, so it was taking up about 1%). I changed it so that objects that wanted to be on the radar had to add themselves to a list. A little less general (objects need to know that they want to be added to the radar) but quicker. I like the idea of a general resource like that which can be tailored for speed for a particular game during optimization stages. I don't know if you're using the same resource, but that might present another optimization opportunity for you.
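
The opt-in list could be as simple as the following sketch. RadarRegistry and GameObject are invented names, and this is not the code from the radar resource.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct GameObject { int id; };

// Hypothetical registry: objects opt in, so the radar never scans the whole client group.
class RadarRegistry {
public:
    void add(GameObject* obj) {
        assert(obj != nullptr);
        // Ignore duplicates so an object is drawn on the radar only once.
        if (std::find(mObjects.begin(), mObjects.end(), obj) == mObjects.end())
            mObjects.push_back(obj);
    }

    // Must be called before the object is destroyed.
    void remove(GameObject* obj) {
        mObjects.erase(std::remove(mObjects.begin(), mObjects.end(), obj),
                       mObjects.end());
    }

    const std::vector<GameObject*>& objects() const { return mObjects; }

private:
    std::vector<GameObject*> mObjects;
};
```

The radar then iterates objects() each frame, which costs time proportional to the number of blips rather than the number of objects in the scene.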

Davis Ray Sickmon, Jr   (Jan 07, 2003 at 06:12 GMT)
At the moment - yep, I'm using a modified version of Matt's radar control. However, I think it's about to go out the window anyway (or it may stay, but its usefulness is severely downgraded.) With TZ, one of the problems I've been having was that since it's all static bases (think Scorched Earth) finding your opponent in a 3D landscape that is deformable can be no small problem ;-) I'm hackin' out a different way now - I've modified the ShapeBaseName control to act as 'x-ray vision'. When an opponent is behind a mountain (or anything else) you still see their name, and an indication of how far away they are (an imprecise reading.)

So I'll take the optimization a step further, and kill off the GuiRadarControl I think. ;-)

Josef Jahn   (Jan 07, 2003 at 08:33 GMT)   Resource Rating: 5
Very good!

Just one thing though: Be careful with the GL states. Either you go all the way through adding a state system that optimizes state changes, or you leave it as it is. I personally don't think that getting rid of a few "glEnable" and "glDisable"-calls is worth the whole effort. Yes, state changes do cost CPU time. But it really isn't that much. If you mess up the states however, you'll get all kinds of fancy effects and you'll have a hard time to find out what's causing it ;)

Clark Fagot   (Jan 08, 2003 at 00:28 GMT)
@Josef - I hear where you are coming from in terms of causing rendering bugs, but I think you underestimate the impact of excessive state switching. It not only takes up CPU time, but slows down the video card too. You'll notice in the screen shot at the top of this page that the same shape is repeated quite a bit. It's a huge win to be able to draw all of the shapes of the same type without resetting any states (except for lighting and fogging).

Mike Nelson   (Jan 08, 2003 at 20:19 GMT)
Clark, very nice stuff here.

Can't wait to see how this stuff works in Orbz.

Jarrod Roberson   (Jan 10, 2003 at 05:08 GMT)   Resource Rating: 5
5. dglSetCannonical. Get rid of all calls to dglSetCannonical inside rendered objects. This will cause asserts in some cases where some object didn't reset the state properly. Just fix that problem and don't pay the price of resetting the gl state every object.

Could someone elaborate on the "just fix that problem" portion of this optimization?

James Lupiani   (Jan 10, 2003 at 07:05 GMT)
Jarrod: All renderable objects should restore themselves to canonical state on their own, based on the changes they made to diverge from the canonical state. This is quicker than calling the dgl function, which sets all state variables to make sure they're proper.

Ken Finney   (Jan 10, 2003 at 14:42 GMT)
hmmm, I grepped a December HEAD for dglSetCannonical and found absolutely *no* instances of it....

Ken Finney   (Jan 10, 2003 at 14:43 GMT)
<oops, double-post. when I posted the above message, I got an "Access forbidden" message and I thought the post didn't happen>
Edited on Jan 10, 2003 14:44 GMT

Clark Fagot   (Jan 10, 2003 at 15:09 GMT)
@Ken - I mis-spelled "canonical". Try searching for "dglSetCanonical" (actually, "dglSetCanonicalState"). I'll fix it above.

Ken Finney   (Jan 11, 2003 at 02:16 GMT)
Thanks Clark - of course I just copy&pasted from here into the find box ... :-)
Found 16 instances this time !

Jarrod Roberson   (Jan 13, 2003 at 00:18 GMT)   Resource Rating: 5
So if any of the states that are listed in dglSetCanonicalState are changed they should be reset to the values listed in dglSetCanonicalState locally also? Is it as simple as that? I am just diving into the rendering code, thanks for the help.

Clark Fagot   (Jan 13, 2003 at 01:17 GMT)
@Jarrod - yes, that's it.

Clark Fagot   (Jan 17, 2003 at 15:41 GMT)
Wanted to mention that I wrote up a resource for optimization #2 and it is now approved -- check it out

Jeffrey P. Bakke   (Feb 24, 2003 at 21:24 GMT)
Item #12.
What if you were to write a 'wrapper' function around glEnable/glDisable? Call it oglEnable, oglDisable and have it internally maintain the current OpenGL state. Inline the functions so that basically they check to see if the state request REALLY changes anything; otherwise, if it's the same state, return immediately. This won't remove the problem of changing states back and forth but it would remove multiple GL_BLEND (etc) calls in a row.
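
A sketch of what such a wrapper could look like. StateCache, StateId, and kBlend are placeholders so that it compiles without OpenGL headers; in a real build the counter would be replaced by the actual glEnable/glDisable calls.

```cpp
#include <cassert>
#include <unordered_map>

// Placeholders for the GL capability type and one capability value.
using StateId = unsigned int;
constexpr StateId kBlend = 1;  // arbitrary id standing in for GL_BLEND

class StateCache {
public:
    void enable(StateId cap)  { set(cap, true); }
    void disable(StateId cap) { set(cap, false); }

    int driverCalls = 0;  // how many requests actually reached the "driver"

private:
    void set(StateId cap, bool on) {
        auto it = mKnown.find(cap);
        if (it != mKnown.end() && it->second == on) return;  // no real change: return early
        mKnown[cap] = on;
        ++driverCalls;  // real code: on ? glEnable(cap) : glDisable(cap)
    }

    std::unordered_map<StateId, bool> mKnown;  // last value requested per capability
};

// Example: three enables in a row reach the driver once.
inline int redundantEnableDemo() {
    StateCache cache;
    cache.enable(kBlend);
    cache.enable(kBlend);
    cache.enable(kBlend);
    assert(cache.driverCalls == 1);
    return cache.driverCalls;
}
```

As the comment says, this filters repeated identical requests only. An enable, disable, enable sequence still produces three driver calls, which is the case the mode tracking in item 12 is aimed at.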

Bill Koons   (Feb 24, 2005 at 22:05 GMT)
Clark,

I'm interested in reducing the terrain size. I assume that there is more involved than simply setting the BlockShift variable to 7 or 6. Can you give me some insight on what all is involved here? A 512 x 512 map is just what I'm looking for!

Thanks!

Stefan Lundmark   (Mar 01, 2005 at 13:32 GMT)
Bill, this plan is ooooold. You might wanna approach him directly.

Markus Nuebel   (Jan 29, 2006 at 18:03 GMT)
Although this resource is somewhat old, it might still be useful to people.

For anyone using this resource, I would like to leave a comment about a problem I recently encountered with tip 1:
Quote:


1. glFinish. Go into guiCanvas.cc and get rid of the call to glFinish. It simply isn't needed.



After getting rid of glFinish, Direct3D rendering using the opengl2d3d dll no longer worked in cases where you have multiple guis stacked behind each other on your canvas.
Without the glFinish() call, the Direct3D translation is no longer able to produce a correct z-order of your guis.

My stacked guis were displayed in a kind of interleaved way, so that covered guis could be seen through the guis in front.

Bringing back glFinish() solved the problem.

-- Markus

Stephen Lujan   (Apr 05, 2007 at 09:08 GMT)
8. GameTSCtrl::onRender. Remove the call to renderChildControls. The parent class already does this. Currently, all gui ctrls are being drawn twice. If any of your controls are time consuming you will get hit twice.

this no longer applies apparently...
you can see here

   if(!skipRender && !mApplyFilterToChildren)
      Parent::renderChildControls(offset, updateRect);

and here
   if(mApplyFilterToChildren)
      renderChildControls(offset, updateRect);

mApplyFilterToChildren assures it only happens once
