Plan for Clark Fagot
by Clark Fagot · 01/06/2003 (12:16 pm) · 28 comments

First some numbers
Frame-rates (frames per second)
CPU | Video card | Resolution | Before | After | After+ | After++ |
Athlon 900 | Geforce 4 | 640 x 480 | 34 | 49 | 56 | 62 |
Athlon 900 | Geforce 4 | 800 x 600 | 31 | 42 | 44 | 47 |
P1000 | Geforce 4 | 800 x 600 | 33 | 47 | 50 | 57 |
P1000 | Geforce2MX | 640 x 480 | 35 | 36 | 37 | 40 |
P1000 | Geforce2MX | 800 x 600 | 23 | 24 | 24 | 26 |
P300 | Geforce2MX | 640 x 480 | 12 | 22 | 24 | 26 |
P400 | TNT | 800 x 600 | 8 | 12 | 12 | 14 |
P400 | TNT | 640 x 480 | 9 | 13 | 14 | 20 |
Poly counts for 640 x 480
Condition | Terrain | Three-space |
Before/After | 6622 | 8250 |
After+ | 5520 | 8250 |
After++ | 2580 | 8250 |
Poly counts for 800 x 600
Condition | Terrain | Three-space |
Before/After | 7602 | 10774 |
After+ | 6605 | 10774 |
After++ | 3027 | 10774 |
These tables compare frame-rates (and list poly counts) for various machines before and after optimizations. The "After" condition includes several optimizations I'll mention below. The "After+" includes a terrain optimization which reduces the size of the terrain (block shift) but not so much that it affects the visual quality of our game. The "After++" reduces the terrain one more time, leaving it the precise size of our play field. This provides a nice detail setting for low end machines (and high end machines that want the best framerate without sacrificing any important visual details). ThinkTanks will use the After+ and After++ terrains only (the standard terrain is simply too large for our game).
Notice that the P400 with a TNT video card (an old card) is now in the playable range (the above numbers are worst case, standing at one end of the playfield and viewing everything -- framerate jumps considerably when in the middle of the playfield or facing away from it). These numbers also don't rely on any reduced detail preferences ($TS::detailAdjust, $Terrain::ScreenError and everything else are on full).
Below is a list of some of the optimizations I performed. Before you ask, yes we plan to share. Over the next couple weeks I'll post resources for some of the optimizations. Some of them will be difficult to extend to the community due to the extensiveness of the changes, but we will do our best.
1. glFinish. Go into guiCanvas.cc and get rid of the call to glFinish. It simply isn't needed. I've never been able to find any resource on the net or any book suggesting it should be used for real-time work. If you can find such a thing, let me know and I'll eat my hat. Removing this will improve framerates as much as 30% on low end machines. Note: even the "Before" framerates above had this optimization, so in a sense those numbers are understated.
2. terrCheck. SceneTraversal performs an occlusion check against the terrain on all potentially rendered objects. I found that this was taking up about 5% of the cycles when everything was visible (we have upwards of 100 objects out there). I optimized it to use the results from the previous frame in order to decide whether to perform the check on an object that frame or wait for a few frames (16) before rechecking. Since objects tend to remain occluded or not occluded from one frame to the next this is very effective. This saved about 5% in the worst case (when nothing was really occluded) and about 4% in the case where many things were! I will post a resource on this optimization soon. Note: the terrCheck function itself is a bit odd, and probably can use a re-work. This has been submitted as a resource -- check it out.
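The caching idea in item 2 can be sketched roughly as follows. This is a minimal illustration and not the submitted resource: `OcclusionCache`, `isOccluded` and `expensiveCheck` are hypothetical names, and the policy shown (re-run the real check every 16 frames, reuse the stored answer in between) is only one reading of the description above.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical per-object cache; not a Torque type.
struct OcclusionCache {
    bool          occluded = false;   // result of the last real check
    std::uint32_t nextCheckFrame = 0; // first frame at which the check is re-run
};

// Recheck interval in frames, taken from the figure quoted in the text.
constexpr std::uint32_t kRecheckInterval = 16;

// Returns the occlusion state to use for this frame. `expensiveCheck` stands
// in for the real terrain occlusion test and is only called when the cached
// result has expired.
inline bool isOccluded(OcclusionCache& cache, std::uint32_t frame,
                       const std::function<bool()>& expensiveCheck)
{
    if (frame >= cache.nextCheckFrame) {
        cache.occluded = expensiveCheck();
        cache.nextCheckFrame = frame + kRecheckInterval;
    }
    return cache.occluded;
}
```

The trade-off in this sketch is that a stale answer can persist for up to 15 frames, which is only acceptable because occlusion rarely changes from one frame to the next.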
3. cone check. The scene graph compares the bounding box of the potentially rendered object to the view frustum after already doing a bit of work (e.g., checking to see if object is fully fogged). I added a simple cone check in order to quickly accept or reject the object as being inside the view frustum. This saves the fog check and the more complicated frustum check in 90% of the cases. Savings were less than 1% on my machine, but it was clearly a positive move.
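One way to picture the cone check in item 3 is a bounding-sphere test against a cone whose apex sits at the camera. The sketch below is an assumption about how such a test could look, not the code that went into the engine; `Vec3`, `ConeResult` and `classifySphere` are made-up names. A cone that fits inside the frustum sides can give a quick accept, a cone that encloses the frustum can give a quick reject, and anything that straddles the surface falls through to the full frustum test. Note that a cone says nothing about the near and far planes.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

inline float dot(const Vec3& a, const Vec3& b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

enum class ConeResult { Inside, Outside, Intersecting };

// Classify a bounding sphere against an infinite cone with its apex at the
// origin (the camera position), a unit-length axis, and a half-angle between
// 0 and 90 degrees given by its sine and cosine.
inline ConeResult classifySphere(const Vec3& center, float radius,
                                 const Vec3& axis, float sinHalf, float cosHalf)
{
    const float along = dot(center, axis);   // distance along the axis
    const float len2  = dot(center, center); // squared distance from the apex
    const float perp  = std::sqrt(std::fmax(0.0f, len2 - along * along));

    // Signed distance from the sphere center to the cone surface
    // (positive means the center is outside the cone).
    const float dist = perp * cosHalf - along * sinHalf;
    if (dist <= -radius)
        return ConeResult::Inside;

    // Behind the apex the closest point of the cone is the apex itself.
    const bool  apexIsNearest = (along * cosHalf + perp * sinHalf) < 0.0f;
    const float outsideDist   = apexIsNearest ? std::sqrt(len2) : dist;
    return outsideDist > radius ? ConeResult::Outside : ConeResult::Intersecting;
}
```

The whole test is one square root (two behind the camera) and a few multiplies, which is why it can pay for itself even when it only short-circuits the later checks.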
4. inlines. There are several functions that should be inlines. If you are doing a lot of ts animation you will see a benefit of inlining many of the functions in tsthread. More importantly, inline all of the FrameAllocator functions. This will get you about 2% (mostly because of the terrain, which uses the frame allocator extensively).
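To show why inlining matters for item 4, here is a rough sketch of a per-frame bump allocator whose functions are small enough to be expanded at the call site. `SketchFrameAllocator` is a hypothetical stand-in written for this illustration; it is not Torque's FrameAllocator and its interface is an assumption.

```cpp
#include <cstddef>

// Hypothetical stand-in for a per-frame bump allocator.
// Assumes `buffer` is at least 8-byte aligned.
class SketchFrameAllocator {
public:
    SketchFrameAllocator(unsigned char* buffer, std::size_t size)
        : mBase(buffer), mSize(size), mWaterMark(0) {}

    // Defined in the class body, so implicitly inline: the compiler is free
    // to expand this at each call site instead of emitting a function call.
    void* alloc(std::size_t bytes)
    {
        const std::size_t aligned = (bytes + kAlign - 1) & ~(kAlign - 1);
        if (aligned > mSize - mWaterMark)
            return nullptr; // out of frame memory
        void* p = mBase + mWaterMark;
        mWaterMark += aligned;
        return p;
    }

    std::size_t getWaterMark() const { return mWaterMark; }

    // Releases everything allocated above `mark` in one step.
    void setWaterMark(std::size_t mark) { mWaterMark = mark; }

private:
    static constexpr std::size_t kAlign = 8;
    unsigned char* mBase;
    std::size_t    mSize;
    std::size_t    mWaterMark;
};
```

Each function body is only a handful of instructions, so when it is called many times per frame the call overhead can be a noticeable share of the cost.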
5. dglSetCanonicalState. Get rid of all calls to dglSetCanonicalState inside rendered objects. This will cause asserts in some cases where some object didn't reset the state properly. Just fix that problem and don't pay the price of resetting the gl state every object.
6. renderImage new. Overrode the new operator on render images. Now allocate them from a pool (be careful because interior SubObjectRenderImage has a different size than all others). This fix will give you a modest speedup, but will fight memory fragmentation which is important.
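The pooled operator new from item 6 might look something like the sketch below. Every name here (`SlotPool`, `SketchRenderImage`, `SketchSubObjectRenderImage`, the 256-slot capacity) is an assumption for illustration. The point it shows is the caveat from the text: slots are sized for the largest derived type so that every render image fits.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Fixed-size slot pool with a free list threaded through the unused slots.
class SlotPool {
public:
    SlotPool(std::size_t slotSize, std::size_t slotCount)
        : mSlotSize(roundUp(slotSize)), mStorage(mSlotSize * slotCount)
    {
        for (std::size_t i = slotCount; i-- > 0;)
            release(mStorage.data() + i * mSlotSize);
    }

    void* acquire(std::size_t size)
    {
        assert(size <= mSlotSize && "slot must fit the largest render image");
        if (!mFree)
            throw std::bad_alloc(); // pool exhausted
        Node* n = mFree;
        mFree = n->next;
        return n;
    }

    // Pushes the slot back on the free list (last released, first reused).
    void release(void* p) { mFree = new (p) Node{mFree}; }

private:
    struct Node { Node* next; };

    static std::size_t roundUp(std::size_t n)
    {
        const std::size_t a = alignof(std::max_align_t);
        if (n < sizeof(Node)) n = sizeof(Node);
        return (n + a - 1) / a * a;
    }

    std::size_t                mSlotSize;
    std::vector<unsigned char> mStorage;
    Node*                      mFree = nullptr;
};

// Hypothetical base class; every derived image is allocated from one pool.
struct SketchRenderImage {
    virtual ~SketchRenderImage() = default;
    int sortKey = 0;

    static void* operator new(std::size_t size) { return pool().acquire(size); }
    static void  operator delete(void* p) { if (p) pool().release(p); }
    static SlotPool& pool();
};

// Larger than the base, like the interior sub-object image mentioned above.
struct SketchSubObjectRenderImage : SketchRenderImage {
    float extra[16] = {};
};

inline SlotPool& SketchRenderImage::pool()
{
    // Slot size comes from the largest type so any image fits in any slot.
    static SlotPool instance(sizeof(SketchSubObjectRenderImage), 256);
    return instance;
}
```

Because every slot has the same size and lives in one block, allocating and freeing images each frame never touches the general heap, which is where the fragmentation benefit comes from.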
7. sleep. If you are making a single player game, consider removing the call to sleep in winWindow. The purpose of this call is to allow dedicated servers in the background to get some cycles. If you are making a single player game, that shouldn't matter (Windows is a pre-emptive multi-tasking OS, so it can handle apps that don't give their cycles away to charity). This can get you a few percent (maybe more on slow machines).
8. GameTSCtrl::onRender. Remove the call to renderChildControls. The parent class already does this. Currently, all gui ctrls are being drawn twice. If any of your controls are time consuming you will get hit twice.
10. Reduce blockShift. Ok, this was a biggie for us. Added the ability to change the terrain block shift and whether or not the terrain tiles. So, by changing block shift to 7 instead of 8, we reduce the terrain block from 2048 x 2048 to 1024 x 1024. The result is fewer terrain polys and fewer textures to be blended. If you have a game like ours which is best played on a reduced playfield, this can provide a nice speedup without sacrificing any visual quality. Reducing the terrain size one more notch brings the terrain down to 512 x 512. This is our full playfield. The low end machines sacrifice nothing important for drastically improved performance. The framerates at the top of my .plan show the drastic improvements this provides on some machines.
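The arithmetic behind the block shift numbers can be written out as a small sketch. The square size of 8 world units is an assumption inferred from the figures in the text (a shift of 8 giving 2048), and `terrainExtent` is a made-up helper name.

```cpp
#include <cstdint>

// Assumed world units per terrain square, inferred from 2048 / 2^8.
constexpr std::uint32_t kSquareSize = 8;

// Squares per side is 2^blockShift; the world-space extent is that count
// multiplied by the size of one square.
constexpr std::uint32_t terrainExtent(std::uint32_t blockShift)
{
    return (1u << blockShift) * kSquareSize;
}

static_assert(terrainExtent(8) == 2048, "standard terrain");
static_assert(terrainExtent(7) == 1024, "one notch down");
static_assert(terrainExtent(6) == 512,  "two notches down");
```

Each step down halves the extent per side, so the terrain area (and the number of squares that can contribute polys and texture blends) drops to a quarter.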
11. projection/viewport state. If you look at the rendering code for all the objects in the Torque you will notice all these calls that setup the projection and save off the viewport (and later restore it). Those are really a waste of time for most objects in the game (anything on the terrain rather than inside an interior). I added some state management code so that it doesn't end up doing all this work just to set up the viewport and projection matrix to be exactly like it already was. This optimization saves about 5%.
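The state management in item 11 can be approximated with a small cache that remembers the last viewport it applied. This is a sketch under assumptions: `Viewport` and `ViewportCache` are hypothetical, and the `apply` callback stands in for whatever actually talks to the driver (in an OpenGL renderer that would be a call such as glViewport).

```cpp
#include <functional>
#include <utility>

struct Viewport { int x, y, width, height; };

inline bool operator==(const Viewport& a, const Viewport& b)
{
    return a.x == b.x && a.y == b.y && a.width == b.width && a.height == b.height;
}

// Remembers the last viewport that was applied and skips redundant requests.
class ViewportCache {
public:
    explicit ViewportCache(std::function<void(const Viewport&)> apply)
        : mApply(std::move(apply)) {}

    // Forwards to the driver only when the request differs from the last one.
    void set(const Viewport& vp)
    {
        if (mValid && vp == mCurrent)
            return;
        mApply(vp);
        mCurrent = vp;
        mValid = true;
    }

    // Call this if code outside the cache changes the viewport directly.
    void invalidate() { mValid = false; }

private:
    std::function<void(const Viewport&)> mApply;
    Viewport mCurrent{};
    bool     mValid = false;
};
```

The same pattern would extend to the projection matrix. The weak spot is any code path that bypasses the cache, which is why the sketch includes `invalidate`.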
12. state switches. One of the places where the Torque does not do a great job is the rendering pipeline. There is no management of the graphics states, so if you ever look through a gl log of a Torque rendered scene, you start seeing things like glEnable(GL_BLEND), glDisable(GL_BLEND), glEnable(GL_BLEND). You get the idea. Since most of our shapes are 3space shapes, I added a mode to the scene graph whereby it tracks whether it is in "3space mode" rather than "canonical mode". This allows us to render all the rocks without switching state, all the trees without switching state, etc. This optimization got rid of about 8% of cpu cycles that were being spent in the state switching code, but seemed to also create a further boost due to the remaining rendering calls taking less time. Getting this optimization to the community will be difficult. If someone wants to step forward to help out please let me know.
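The mode tracking in item 12 could be sketched as below. This is not the scene graph change itself; `RenderMode`, `RenderModeTracker` and the two callbacks are hypothetical, standing in for the code that sets up and tears down the graphics state shared by all 3space shapes.

```cpp
#include <functional>
#include <utility>

enum class RenderMode { Canonical, ThreeSpace };

// Runs the state setup or teardown only when the required mode changes,
// so a run of consecutive 3space shapes pays for the switch once.
class RenderModeTracker {
public:
    RenderModeTracker(std::function<void()> enterThreeSpace,
                      std::function<void()> leaveThreeSpace)
        : mEnter(std::move(enterThreeSpace)), mLeave(std::move(leaveThreeSpace)) {}

    // Each object calls this with the mode it needs before it renders.
    void require(RenderMode mode)
    {
        if (mode == mMode)
            return; // already in the right mode, no state calls issued
        if (mode == RenderMode::ThreeSpace)
            mEnter();
        else
            mLeave();
        mMode = mode;
    }

private:
    std::function<void()> mEnter;
    std::function<void()> mLeave;
    RenderMode            mMode = RenderMode::Canonical;
};
```

How much this saves depends on render order: the fewer times 3space shapes are interleaved with other objects, the fewer transitions are paid for.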
Ok, that's all for now. Look for some of these in resources in the coming weeks.
clark
Technical Director
BraveTree Productions
#22
01/12/2003 (5:17 pm)
@Jarrod - yes, that's it.
#23
01/17/2003 (7:41 am)
Wanted to mention that I wrote up a resource for optimization #2 and it is now approved -- check it out
#24
02/24/2003 (1:24 pm)
Item #12. What if you were to write a 'wrapper' function around glEnable/glDisable? Call it oglEnable, oglDisable and it internally maintains the current OpenGL state. Inline the functions so that basically they check to see if the state request REALLY changes anything; otherwise, if it's the same state, return immediately. This won't remove the problem of changing states back and forth but it would remove multiple GL_BLEND (etc) calls in a row.
#25
02/24/2005 (2:05 pm)
Clark, I'm interested in reducing the terrain size. I assume that there is more involved than simply setting the BlockShift variable to 7 or 6. Can you give me some insight on what all is involved here? A 512 x 512 map is just what I'm looking for!
Thanks!
#26
03/01/2005 (5:32 am)
Bill, this plan is ooooold. You might wanna approach him directly.
#27
01/29/2006 (10:03 am)
Although this resource is somewhat old, it might still be useful to people.
For anyone using this resource I would like to leave a comment about a problem I recently encountered with tip 1:
Quote:
1. glFinish. Go into guiCanvas.cc and get rid of the call to glFinish. It simply isn't needed.
After getting rid of glFinish, Direct3D rendering using the opengl2d3d dll no longer worked in cases where you have multiple guis stacked behind each other on your canvas.
Without the glFinish() call, the Direct3D translation is no longer able to produce a correct z-order of your guis.
My stacked guis were displayed in a kind of interleaved way, so that covered guis could be seen through the guis in front.
Bringing back glFinish() solved the problem.
-- Markus
#28
04/05/2007 (2:08 am)
Quote:
8. GameTSCtrl::onRender. Remove the call to renderChildControls. The parent class already does this. Currently, all gui ctrls are being drawn twice. If any of your controls are time consuming you will get hit twice.
this no longer applies apparently...
you can see here
if(!skipRender && !mApplyFilterToChildren)
   Parent::renderChildControls(offset, updateRect);
and here
if(mApplyFilterToChildren)
   renderChildControls(offset, updateRect);
mApplyFilterToChildren assures it only happens once
Torque Owner Jarrod Roberson