Plan for Clark Fagot
| Name: | Clark Fagot | |
|---|---|---|
| Date Posted: | Jan 06, 2003 | |
| Rating: | 4.9 out of 5 | |
| Public: | YES | |
| Comments: | YES | |
Blog post
Improved frame-rate for ThinkTanks 50-100% in December. Optimizations were mostly rendering speed-ups. Low end machines saw largest impact. These are worst case improvements (highest poly count scenes) -- average case seems to have improved even more.

First some numbers
Frame-rates (fps)
| CPU | Video card | Resolution | Before | After | After+ | After++ |
|---|---|---|---|---|---|---|
| Athlon 900 | Geforce 4 | 640 x 480 | 34 | 49 | 56 | 62 |
| Athlon 900 | Geforce 4 | 800 x 600 | 31 | 42 | 44 | 47 |
| P1000 | Geforce 4 | 800 x 600 | 33 | 47 | 50 | 57 |
| P1000 | Geforce2MX | 640 x 480 | 35 | 36 | 37 | 40 |
| P1000 | Geforce2MX | 800 x 600 | 23 | 24 | 24 | 26 |
| P300 | Geforce2MX | 640 x 480 | 12 | 22 | 24 | 26 |
| P400 | TNT | 800 x 600 | 8 | 12 | 12 | 14 |
| P400 | TNT | 640 x 480 | 9 | 13 | 14 | 20 |
Poly counts for 640 x 480
| Condition | Terrain | Three-space |
|---|---|---|
| Before/After | 6622 | 8250 |
| After+ | 5520 | 8250 |
| After++ | 2580 | 8250 |
Poly counts for 800 x 600
| Condition | Terrain | Three-space |
|---|---|---|
| Before/After | 7602 | 10774 |
| After+ | 6605 | 10774 |
| After++ | 3027 | 10774 |
These tables compare frame-rates (and list poly counts) for various machines before and after optimizations. The "After" condition includes several optimizations I'll mention below. The "After+" condition adds a terrain optimization which reduces the size of the terrain (block shift), but not so much that it affects the visual quality of our game. The "After++" condition reduces the terrain one more time, leaving it the precise size of our playfield. This provides a nice detail setting for low end machines (and high end machines that want the best framerate without sacrificing any important visual details). ThinkTanks will use the After+ and After++ terrains only (the standard terrain is simply too large for our game).
Notice that the P400 with a TNT video card (an old card) is now in the playable range (the above numbers are worst case, standing at one end of the playfield and viewing everything -- framerate jumps considerably when in the middle of the playfield or facing away from it). This also doesn't include any detail preferences ($TS::detailAdjust, $Terrain::ScreenError and everything else is on full).
Below is a list of some of the optimizations I performed. Before you ask, yes we plan to share. Over the next couple weeks I'll post resources for some of the optimizations. Some of them will be difficult to extend to the community due to the extensiveness of the changes, but we will do our best.
1. glFinish. Go into guiCanvas.cc and get rid of the call to glFinish. It simply isn't needed. I've never been able to find any resource on the net or any book suggesting it should be used for real-time work. If you can find such a thing, let me know and I'll eat my hat. Removing this will improve framerates as much as 30% on low end machines. Note: even the "Before" framerates above had this optimization, so in a sense those numbers are understated.
2. terrCheck. SceneTraversal performs an occlusion check against the terrain on all potentially rendered objects. I found that this was taking up about 5% of the cycles when everything was visible (we have upwards of 100 objects out there). I optimized it to use the results from the previous frame in order to decide whether to perform the check on an object that frame or wait for a few frames (16) before rechecking. Since objects tend to remain occluded or not occluded from one frame to the next, this is very effective. This saved about 5% in the worst case (when nothing was really occluded) and about 4% in the case where many things were! I will post a resource on this optimization soon. Note: the terrCheck function itself is a bit odd, and probably can use a re-work. This has been submitted as a resource -- check it out.
3. cone check. The scene graph compares the bounding box of the potentially rendered object to the view frustum after already doing a bit of work (e.g., checking to see if object is fully fogged). I added a simple cone check in order to quickly accept or reject the object as being inside the view frustum. This saves the fog check and the more complicated frustum check in 90% of the cases. Savings were less than 1% on my machine, but it was clearly a positive move.
4. inlines. There are several functions that should be inlines. If you are doing a lot of ts animation you will see a benefit of inlining many of the functions in tsthread. More importantly, inline all of the FrameAllocator functions. This will get you about 2% (mostly because of the terrain, which uses the frame allocator extensively).
5. dglSetCanonicalState. Get rid of all calls to dglSetCanonicalState inside rendered objects. This will cause asserts in some cases where some object didn't reset the state properly. Just fix that problem and don't pay the price of resetting the gl state every object.
6. renderImage new. Overrode the new operator on render images. Now allocate them from a pool (be careful because interior SubObjectRenderImage has a different size than all others). This fix will give you a modest speedup, but will fight memory fragmentation which is important.
7. sleep. If you are making a single player game, consider removing the call to sleep in winWindow. The purpose of this call is to allow dedicated servers in the background to get some cycles. If you are making a single player game, that shouldn't matter (Windows is a pre-emptive multi-tasking OS, so it can handle apps that don't give their cycles away to charity). This can get you a few percent (maybe more on slow machines).
8. GameTSCtrl::onRender. Remove the call to renderChildControls. The parent class already does this. Currently, all gui ctrls are being drawn twice. If any of your controls are time consuming you will get hit twice.
10. Reduce blockShift. Ok, this was a biggie for us. Added the ability to change the terrain block shift and whether or not the terrain tiles. So, by changing block shift to 7 instead of 8, we reduce the terrain block from 2048 x 2048 to 1024 x 1024. The result is fewer terrain polys and fewer textures to be blended. If you have a game like ours which is best played on a reduced playfield, this can provide a nice speedup without sacrificing any visual quality. Reducing the terrain size one more notch brings the terrain down to 512 x 512. This is our full playfield. The low end machines sacrifice nothing important for drastically improved performance. The framerates at the top of my .plan show the drastic improvements this provides on some machines.
11. projection/viewport state. If you look at the rendering code for all the objects in the Torque you will notice all these calls that set up the projection and save off the viewport (and later restore it). Those are really a waste of time for most objects in the game (anything on the terrain rather than inside an interior). I added some state management code so that it doesn't end up doing all this work just to set up the viewport and projection matrix to be exactly like it already was. This optimization saves about 5%.
12. state switches. One of the places where the Torque does not do a great job is the rendering pipeline. There is no management of the graphics states, so if you ever look through a gl log of a Torque rendered scene, you start seeing things like glEnable(GL_BLEND), glDisable(GL_BLEND), glEnable(GL_BLEND). You get the idea. Since most of our shapes are 3space shapes, I added a mode to the scene graph whereby it tracks whether it is in "3space mode" rather than "canonical mode". This allows us to render all the rocks without switching state, all the trees without switching state, etc. This optimization got rid of about 8% of cpu cycles that were being spent in the state switching code, but seemed to also create a further boost due to the remaining rendering calls taking less time. Getting this optimization to the community will be difficult. If someone wants to step forward to help out please let me know.
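As an illustration of item 2 (terrCheck), here is a minimal C++ sketch of reusing an occlusion result across frames. It is one plausible reading of the description, not the actual ThinkTanks or Torque code: `OcclusionCache`, `isOccluded`, `kRecheckInterval` and the `expensiveCheck` callback are all hypothetical names, and the real change may treat occluded and visible objects differently.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical per-object cache; none of these names come from Torque.
struct OcclusionCache {
    bool          occluded       = false; // result of the last real check
    bool          hasResult      = false; // false until the first real check runs
    std::uint32_t lastCheckFrame = 0;     // frame number of the last real check
};

// Frames to wait before rechecking (the text mentions 16).
constexpr std::uint32_t kRecheckInterval = 16;

// Returns true if the object should be treated as occluded this frame.
// expensiveCheck stands in for the real terrain occlusion test.
inline bool isOccluded(OcclusionCache& cache, std::uint32_t frame,
                       const std::function<bool()>& expensiveCheck)
{
    // Reuse the cached answer while it is younger than the recheck interval.
    if (cache.hasResult && frame - cache.lastCheckFrame < kRecheckInterval)
        return cache.occluded;

    // Otherwise pay for the real test and remember when it ran.
    cache.occluded       = expensiveCheck();
    cache.hasResult      = true;
    cache.lastCheckFrame = frame;
    return cache.occluded;
}
```

In this sketch the scene traversal would keep one `OcclusionCache` per object and pass the current frame counter on every call. The trade-off is that a cached answer can be stale for up to 16 frames, so an object that has just come into view may appear late.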
Ok, that's all for now. Look for some of these in resources in the coming weeks.
clark
Technical Director
BraveTree Productions
| Prairie Games (Jan 06, 2003 at 20:21 GMT) |
-J
| Nicolas Quijano (Jan 06, 2003 at 20:24 GMT) Resource Rating: 5 |
Thanks for all your hard work
| Luc Jordan (Jan 06, 2003 at 20:40 GMT) |
| John Quigley (Jan 06, 2003 at 22:07 GMT) |
How exactly did you measure the performance...did you use the engine's profiler? Also did you use prerecorded demos to get a consistent set of tests? Just curious...really don't know the best methodology for instrumenting this beast.
| Ben Garney (Jan 06, 2003 at 22:59 GMT) |
Edited on Jan 06, 2003 23:02 GMT
| Pat Wilson (Jan 06, 2003 at 23:02 GMT) |
| Ernest (Jan 07, 2003 at 00:41 GMT) |
Keep up the great work, man!
| Clark Fagot (Jan 07, 2003 at 00:57 GMT) |
@Luc - 21-6 will be in town next weekend so we'll be talking to them about getting some of this into Orbz.
@John - I simply used the built-in Torque profiler. It's quite excellent. I have it hooked up to a key so I simply hold it down and it profiles till I release the key. Perhaps I can put that up as a resource too. I have used demo recordings in the past. That's a good route. For this, though, I simply laid down a marker and set the tank transform to that marker before recording fps and doing a profile. This worked well because I was mostly concerned about the rendering pipeline (collisions, physics, and everything else is pretty low cpu hit in our game).
| Davis Ray Sickmon, Jr (Jan 07, 2003 at 05:13 GMT) |
| Clark Fagot (Jan 07, 2003 at 05:36 GMT) |
| Davis Ray Sickmon, Jr (Jan 07, 2003 at 06:12 GMT) |
So I'll take the optimization a step further, and kill off the GuiRadarControl I think. ;-)
| Josef Jahn (Jan 07, 2003 at 08:33 GMT) Resource Rating: 5 |
Just one thing though: Be careful with the GL states. Either you go all the way through adding a state system that optimizes state changes, or you leave it as it is. I personally don't think that getting rid of a few "glEnable" and "glDisable" calls is worth the whole effort. Yes, state changes do cost CPU time. But it really isn't that much. If you mess up the states however, you'll get all kinds of fancy effects and you'll have a hard time finding out what's causing it ;)
| Clark Fagot (Jan 08, 2003 at 00:28 GMT) |
| Mike Nelson (Jan 08, 2003 at 20:19 GMT) |
Can't wait to see how this stuff works in Orbz.
| Jarrod Roberson (Jan 10, 2003 at 05:08 GMT) Resource Rating: 5 |
Could someone elaborate on the "just fix that problem" portion of this optimization???
| James Lupiani (Jan 10, 2003 at 07:05 GMT) |
| Ken Finney (Jan 10, 2003 at 14:42 GMT) |
| Ken Finney (Jan 10, 2003 at 14:43 GMT) |
Edited on Jan 10, 2003 14:44 GMT
| Clark Fagot (Jan 10, 2003 at 15:09 GMT) |
| Ken Finney (Jan 11, 2003 at 02:16 GMT) |
Found 16 instances this time !
| Jarrod Roberson (Jan 13, 2003 at 00:18 GMT) Resource Rating: 5 |
| Clark Fagot (Jan 13, 2003 at 01:17 GMT) |
| Clark Fagot (Jan 17, 2003 at 15:41 GMT) |
| Jeffrey P. Bakke (Feb 24, 2003 at 21:24 GMT) |
What if you were to write a 'wrapper' function around glEnable/glDisable? Call it oglEnable, oglDisable, and have it internally maintain the current OpenGL state. Inline the functions so that basically they check to see if the state request REALLY changes anything; otherwise, if it's the same state, return immediately. This won't remove the problem of changing states back and forth but it would remove multiple GL_BLEND (etc) calls in a row.
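A rough C++ sketch of the wrapper idea above, for illustration only. The `oglEnable`/`oglDisable` names come from the comment; `GlStateCache`, `Capability` and the raw-call function pointers are assumptions made so the sketch compiles without OpenGL headers. In real code the raw calls would be `glEnable` and `glDisable`, and the capability ids would be `GLenum` values.

```cpp
#include <unordered_map>

// Plain unsigned ints stand in for GLenum so no OpenGL headers are needed.
using Capability = unsigned int;

// Hypothetical cache that remembers the last requested state per capability.
class GlStateCache {
public:
    using RawCall = void (*)(Capability);

    GlStateCache(RawCall rawEnable, RawCall rawDisable)
        : mRawEnable(rawEnable), mRawDisable(rawDisable) {}

    void oglEnable(Capability cap)  { set(cap, true); }
    void oglDisable(Capability cap) { set(cap, false); }

private:
    void set(Capability cap, bool on)
    {
        auto it = mKnown.find(cap);
        if (it != mKnown.end() && it->second == on)
            return; // redundant request: skip the driver call

        mKnown[cap] = on;
        (on ? mRawEnable : mRawDisable)(cap);
    }

    RawCall mRawEnable;
    RawCall mRawDisable;
    // Capabilities not seen yet are unknown, so the first request always goes through.
    std::unordered_map<Capability, bool> mKnown;
};
```

The cache is only correct if every state change goes through it. Any code that still calls the raw functions directly will leave it out of date, which is the kind of hard-to-trace rendering bug warned about earlier in this thread.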
| Bill Koons (Feb 24, 2005 at 22:05 GMT) |
I'm interested in reducing the terrain size. I assume that there is more involved than simply setting the BlockShift variable to 7 or 6. Can you give me some insight on what all is involved here? A 512 x 512 map is just what I'm looking for!
Thanks!
| Stefan Lundmark (Mar 01, 2005 at 13:32 GMT) |
| Markus Nuebel (Jan 29, 2006 at 18:03 GMT) |
For anyone using this resource, I would like to leave a comment about a problem I recently encountered with tip 1:
Quote:
1. glFinish. Go into guiCanvas.cc and get rid of the call to glFinish. It simply isn't needed.
After getting rid of glFinish, Direct3D rendering using the opengl2d3d dll no longer worked in cases where you have multiple guis stacked behind each other on your canvas.
Without the glFinish() call, the direct3d translation is no longer able to produce a correct z-order of your guis.
My stacked guis were displayed in a kind of interleaved way, so that covered guis could be seen through the guis in front.
Bringing back glFinish() solved the problem.
-- Markus
| Stephen Lujan (Apr 05, 2007 at 09:08 GMT) |
this no longer applies apparently...
you can see here
if(!skipRender && !mApplyFilterToChildren)
Parent::renderChildControls(offset, updateRect);
and here
if(mApplyFilterToChildren)
renderChildControls(offset, updateRect);
mApplyFilterToChildren assures it only happens once