Game Development Community

dev|Pro Game Development Curriculum

Plan for Phil Carlisle

by Phil Carlisle · 07/20/2004 (10:44 am) · 8 comments

Lets look at a profile I just grabbed. This is from a debug build of my game with 20 AI players (using the blue box guy from Joe M) all in the visible scene.

Note: When profiling, its important to remember that you should ideally have more than a single snapshot of your game (which this is), ideally something which allows you to profile all aspects of the game etc. One idea for this is to record a scene and play that back while capturing profile snapshots over time. Another idea might be to have a scripted sequence (like a benchmark mode), which allows you to dump the information over time but which also allows you to script typical scenes).

So given the profile (which Ive simplified for clarity).

Ordered by stack trace total time -
% Time  % NSTime  Invoke #  Name
100.000   0.005      153   MainLoop 
 87.093   0.018      153     TimeManagerProcess
 87.075   0.017      153       ProcessTimeEvent
 36.443   0.001      153         ServerProcess
 36.442   0.049      153           AdvanceServerTime
 36.393   1.166     2079             AdvanceObjects
 35.227   4.432    41580               Player_ProcessTick
 30.071  18.605    41580                 Player_PhysicsSection
 11.466  11.466    41580                   Player_UpdatePos
 30.665   0.005      153         RenderFrame
 30.579   0.387      153           CanvasRenderControls
 30.192   0.066      153             SceneGraphRender
 21.319   0.104      153               TraverseScene
 21.209   0.611     7956                 SceneStateRenderImage
 16.101   1.813     7038                   ShapeBaseRenderObject
  7.739   0.505     6120                     PlayerRenderPrimary
  5.627   5.627     6120                     PlayerRenderShadow
  4.447   0.003      153                   TerrainRender
  8.163   7.603      153               BuildSceneTree
 18.406   0.075      153         ClientProcess
 18.327   0.743      153           AdvanceClientTime
 17.400   1.054     2079             AdvanceObjects
 16.346   1.253    41580               Player_ProcessTick
 15.093   5.978    35611                 Player_PhysicsSection
  9.115   9.115    35611                   Player_UpdatePos
 12.887  12.882      153     PlatformProcess

Note: In torque, you can dump a profile to the console, which is in two parts. The first is the profile at the current time, order by time in a given routine. The second (and probably more important part for optimisation) is a list of the given profile macro enabled area's including thier own time as a percentage AND thier time which is NOT part of a child.

You will see here above that the profiles are arranged in a hierarchy. Basically what this means is that the uppermost profiles are a summation of thier child profiles PLUS any remainder. (remainder being the second figure %NSTTime).

So lets dive into the profile to see what we can see.

The first thing that grabs me when I look at it, is that the servers player_processTick profile seems extraordinarily high. Even for the 20 players youve got here, 35% of the mainloop is a lot for what is effectively "move here, stop if you hit anything". Thats something definitely worth investigating. If you take a look in there, a lot of this could be tied up with creating the collision objects and processing collisions. So with a simpler game (like mine), it might be a big win to reduce that collision hit by simplifying things down.

Next up is shadow rendering. Ok, 5% of mainloop isnt all that bad. But its a graphical nicety that is completely non-game-required. So there is another possibility, but I'd look at that AFTER looking at the player physics.

Next is BuildSceneTree code. 8% of mainloop seems overly large here. My test scene is pretty damn simple anyway and all objects are within the view frustum, so at a guess I'd say that there is some fairly complex culling code in there, which is fine when you have a complex scene, but isnt that useful when your game is pretty simple visually. So mark that up as "interesting for later".

Another one thats interesting for later is that with my fixed camera view (semi isometric) I can live with a much more simplified map. So doing what Clark mentioned in his optimisation and simplifying the terrain rendering code is another potential saving.

One final thing to note, is that I'm running this as a single player game, so one thing that you should see in there, is that there are TWO lots of player_processTick!, the second one is taking some 16% of a mainloop.

So the conclusion here, is that obviously, there is one area that can CLEARLY give me a big speed boost if I look at a method of reimplementing it. If you sum the two player_processTick profiles (which you will get on a single player game), you'll get some 50% of the mainloop is going into that single profile. Thats a SIGNIFICANT amount, so it seems clear that even minor improvements here will give significant savings in CPU time.

So obviously, my next port of call is to look at the players profile and actually single step the whole code, do some sub-profiling of the whole thing so I can tell EXACTLY what is happening, maybe add more players and see where the sub-profile changes most.

Think of profiling as you would bugfixing (because thats what it is, its fixing a "bug" that is basically your game runs too slowly). How do you fix bugs? you isolate the test cases that reproduce the bug, then you isolate the absolute code that reproduces the bug, then you drill down and find the actual code that causes the bug. So I'm in "isolate the exact code" mode for this player process update.

Next time, I'll let you know how the profiling process helped me drill down and isolate the problem area, how I overcame it and maybe post a more complete "tutorial" on the whole profiling system, along with some updates to make profiling easier.

Until next time.

#1
07/20/2004 (11:03 am)
Hey Phil,

Have you thought about doing a Torque profiling session at this year's IGC (assuming you're going)? Once you've gone through this project I'd think you could talk about: setting it up, what it all means, key Torque specific areas to watch, how it could be improved with game specific examples, etc. With your recent teaching background I think it would be a natural fit.

It certainly would be something I'd be interested in if I make out to the IGC this year.

- LightWave Dave
#2
07/20/2004 (11:55 am)
Yeah, I'll be going. I'd be interesting in taking part in a profiling/optimisation thread too. I'd love to see clark do that one, cus he is the optimisation master :) I'm just a hopeful novice :))

Hope to see you there.

I'll maybe write up a quick sort of tutorial document about my experiments and release some of my profiler tools once theyre more up to date.
#3
07/20/2004 (12:24 pm)
Nice writeup, Phil!
#4
07/20/2004 (12:35 pm)
Nice stuff, would be cool if you write more of these, Phil. Might make a good addition to the official documentation somewhere.

David, not a bad idea for IGC. We'll have to see how everything comes together.
#5
07/20/2004 (1:05 pm)
Excellent explanation and description Phil. Always wanted to know more about this. I agree with Josh. It would be great to have this as part of a HOWTO in the official docs.
#6
07/20/2004 (1:51 pm)
Ok, yeah, I'll have a go at doing something a bit more comprehensive that explains how to do it (as well as how to read the output).
#7
07/20/2004 (5:22 pm)
ok that's well and fine. But what about *me*. What about *my* filthy needs eh?
#8
07/26/2004 (4:45 am)
I haven't dug into it much but from HEAD player.cc@2429 we've got a couple bits of code that are clearly repeats.... why cant' the early out list populate those other ones and make the decision there instead of walking the list 2x? Or am I just missing something here.... it'd seem to me to be faster to remove all that, and do a check on those lists (isEmpty) after walking the list only once? right? i don't know i haven't tried it nor looked at it much.
// Build list from convex states here...
         CollisionWorkingList& rList = mConvex.getWorkingList();
         CollisionWorkingList* pList = rList.wLink.mNext;
         while (pList != &rList) {
            Convex* pConvex = pList->mConvex;
            if (pConvex->getObject()->getTypeMask() & sCollisionMoveMask) {
               Box3F convexBox = pConvex->getBoundingBox();
               if (wBox.isOverlapped(convexBox))
               {
                  // No need to seperate out the physical zones here, we want those
                  //  to cause a fallthrough as well...
                  pConvex->getPolyList(&eaPolyList);
               }
            }
            pList = pList->wLink.mNext;
         }

         if (eaPolyList.isEmpty())
         {
            totalMotion += (end - start).len();
            start = end;
            break;
         }
      }

      collisionMatrix.setColumn(3, start);
      sBoxPolyhedron.buildBox(collisionMatrix, mScaledBox);

      // Setup the bounding box for the extrudedPolyList
      Box3F plistBox = mScaledBox;
      collisionMatrix.mul(plistBox);
      Point3F oldMin = plistBox.min;
      Point3F oldMax = plistBox.max;
      plistBox.min.setMin(oldMin + (mVelocity * time) - Point3F(0.1, 0.1, 0.1));
      plistBox.max.setMax(oldMax + (mVelocity * time) + Point3F(0.1, 0.1, 0.1));

      // Build extruded polyList...
      VectorF vector = end - start;
      sExtrudedPolyList.extrude(sBoxPolyhedron,vector);
      sExtrudedPolyList.setVelocity(mVelocity);
      sExtrudedPolyList.setCollisionList(&collisionList);

      sPhysZonePolyList.extrude(sBoxPolyhedron,vector);
      sPhysZonePolyList.setVelocity(mVelocity);
      sPhysZonePolyList.setCollisionList(&physZoneCollisionList);

      // Build list from convex states here...
      CollisionWorkingList& rList = mConvex.getWorkingList();
      CollisionWorkingList* pList = rList.wLink.mNext;
      while (pList != &rList) {
         Convex* pConvex = pList->mConvex;
         if (pConvex->getObject()->getTypeMask() & sCollisionMoveMask) {
            Box3F convexBox = pConvex->getBoundingBox();
            if (plistBox.isOverlapped(convexBox))
            {
               if (pConvex->getObject()->getTypeMask() & PhysicalZoneObjectType)
                  pConvex->getPolyList(&sPhysZonePolyList);
               else
                  pConvex->getPolyList(&sExtrudedPolyList);
            }
         }
         pList = pList->wLink.mNext;
      }