Game Development Community

dev|Pro Game Development Curriculum

Fixing the Zombie Shuffle

by Dusty Monk · 11/15/2007 (12:29 pm) · 12 comments

So I haven't gotten any real work done in the past few weeks (stupid Hellgate!), and I'm headed out of town for the holidays for another week or so, so I thought I might assuage my guilt by posting up a bug fix I discovered a few months ago. This one cost me a weekend. I looked around the torque forums and resources, but didn't find anyone else that had posted a fix for this, so I thought I'd share. Hope it's useful!

Let's do it all official like.

Bug
At slow speeds, the move animations break for AI characters. That is, below a certain threshold, they no longer play their animations smoothly, and instead start to stutter and slide.

Repro
Starting with a stock TGE 1.5.2 install, load up the starter.fps module. Load Stronghold, and run over to where Kork is racing around his little circuit. Watch his animation. Notice it's playing quite nicely.

Load up example\starter.fps\server\scripts\player.cs, scroll down to the player datablock, and reduce the player's default run speed to 1.24 (approximate walking speed):

runForce = 48 * 90;
   runEnergyDrain = 0;
   minRunEnergy = 0;
   //maxForwardSpeed = 14;
   //maxBackwardSpeed = 13
   //maxSideSpeed = 13;

   maxForwardSpeed = 1.24;
   maxBackwardSpeed = 1.24;
   maxSideSpeed = 1.24;

Load starter.fps, and "run" over to watch Kork. Now, you'll notice that you are also moving at an ungodly slow pace. But just be patient, move on over to where you can see Kork, and watch him. Notice how his animations are stopping and stuttering, and sliding. This is even more noticeable on a "real" client, but should be plenty visible on the standalone version too. Notice however, that the animations for your player still seem to be fine. They are scaled down to slow (as they should be), but play smoothly throughout their entire cycle. If you add another player to the mix, and observe him, you'll see that his animations play fine too. It's only animations for AI players that are broken.

The Fix
Edit - So thanks to some diligence on behalf of some of the other community members, we got down to what we think is the real culprit here. While my original fix does in fact fix the problem, and doesn't (to my knowledge) break anything, it's not really fixing the bug.

Here's the right fix. Bring up aiPlayer.cc, and find the getAIMove function. At the very bottom, before you return, add the following statement:
.
   .
   // Add the following statement
   // Clamp the move for network traffic..
   movePtr->clamp();

   return true;
}

The original "fix" I leave here strictly for posterity's sake, but it is not necessary, and in fact if you've made this change I'd recommend rolling it back. I try to muck with the engine no more than absolutely necessary.


No you don't have to wade through my blog to get to the good stuff. Here's what I did to fix it. Load up player.cc, and move down to the processTick function. Find the part where they are about to perform the physics updates. Change the conditional as follows:
PROFILE_START(Player_PhysicsSection);
      //if(isServerObject() || (didRenderLastRender() || getControllingClient()))
      if(isServerObject() /*|| (didRenderLastRender() */ || getControllingClient())
      {
         updateWorkingCollisionSet();

         updateState();
         updateMove(move);
         updateLookAnimation();
         updateDeathOffsets();
         updatePos();
      }
      PROFILE_END();

Basically, remove "didRenderLastRender" form the conditional. Compile this, run (in slow motion) back over to Kork, and watch him. Hey!! Whatdya know! If everything went well, he should be playing his entire animation sequence, if ever so slow. Enjoy!


What's Going On Here??
Okay so this is where it gets good, imho. Here I get into the down and dirty of what I figured out over almost an entire weekend of debugging. From this point forward, we're getting down into the trenches.

So I stumbled across this bug while implementing walking for my mutants on patrol. I had a walk animation, set up a "mWalking" state on aiPlayer, and get all that working correctly. But I noticed that while walking, the mutants would start stuttering and sliding. Spent way to long just verifying that I hadn't screwed things up in implementing my walking stuff before it occured to me to check for this bug in stock 1.5.2. Live and learn!

Started with watching the animation. Closely. It appeared the sequence was getting reset prematurely, or gettin stuck on a certain frame. Debugging animation problems can be tricky, cause you can't *just* use breakpoints. The bugs occur over time, so you have to use lots of logging, and observe values over time.

player::updateAnimation seemed liked a good place to start. If you put a breakpoint here, you quickly figure out there are four players being updated -- the player on the server, the player on the client, the aiPlayer on the server, and the aiPlayer on the client. Here are their ID's. All things being equal, they are probably the same ID's for you.
1681 - aiPlayer on Server
1811 - Player on server
1812 - Player on client
1816 - aiPlayer on client
I had to do some tom jiggery with making some members of tsThread public so I could access them while logging, but hey it was all in the name of science. I added a log statement in updateAnimation to see how the aiPlayer on the client's animation was progressing. Here's what I saw..
For Unit ID: 1816.. animPos == 0.289882
For Unit ID: 1816.. animPos == 0.398241
For Unit ID: 1816.. animPos == 0.432000
For Unit ID: 1816.. animPos == 0.864000
For Unit ID: 1816.. animPos == 0.239268
For Unit ID: 1816.. animPos == 0.316829
For Unit ID: 1816.. animPos == 0.098171		BOOM! Reset!
For Unit ID: 1816.. animPos == 0.098171
For Unit ID: 1816.. animPos == 0.198194
For Unit ID: 1816.. animPos == 0.296234
For Unit ID: 1816.. animPos == 0.424000
Hot damn! Sure enough, it would progress through the cycle, then periodically just reset back to zero. WTF! So.. more code digging.. more log statements, and I figured out that animations will get reset if you change the current action animation. So I popped up to player::updateActionThread, which is responsible for determining which action animation to use, at any given time. Another log statement where we call pickActionAnimation revealed the following:
for unit 1816, current action animation is: 1
for unit 1816, current action animation is: 1
for unit 1816, current action animation is: 0
for unit 1816, current action animation is: 1
for unit 1816, current action animation is: 0
for unit 1816, current action animation is: 1
keyboard0 input device unacquired.
for unit 1816, current action animation is: 0
for unit 1816, current action animation is: 0
for unit 1816, current action animation is: 1
for unit 1816, current action animation is: 0
for unit 1816, current action animation is: 0
for unit 1816, current action animation is: 1
for unit 1816, current action animation is: 1
for unit 1816, current action animation is: 0
for unit 1816, current action animation is: 1
for unit 1816, current action animation is: 1
for unit 1816, current action animation is: 1
for unit 1816, current action animation is: 1
for unit 1816, current action animation is: 1
Holy crap! My animation's bouncing between root and run all over the place! Was it getting fucked up on the server? I changed the log statement and watched..
for unit 1681, current action animation is: 1
for unit 1681, current action animation is: 1
for unit 1681, current action animation is: 1
for unit 1681, current action animation is: 1
for unit 1681, current action animation is: 1
for unit 1681, current action animation is: 1
Nope! The server was keeping the animation exactly what it was supposed to be, but for some reason it was getting fubared on the client.

So. Who changes the animation? Well, player::pickActionAnimation does, and it does so based on your current world velocity. Now.. our velocity is set in aiPlayer, and is a constant.. so it shouldn't be fluctuating at all in magnitude -- only in direction, and only then when we turn. More logging down in the heart of pickActionAnimation..
for player 1816, translated vel is: (0.054765, 1.212516, -0.110260)
for player 1816, translated vel is: (0.054765, 1.212516, -0.110260)
for player 1816, translated vel is: (0.054765, 1.212516, -0.110260)
for player 1816, translated vel is: (0.000000, 0.000000, 0.000000)
for player 1816, translated vel is: (0.000000, 0.000000, 0.000000)
for player 1816, translated vel is: (0.054765, 1.212516, -0.110260)
for player 1816, translated vel is: (0.054765, 1.212516, -0.110260)
Boggle! See that? Right smack in the middle velocity's gettin zeroed out! So.. moving up the chain now.. we're getting close I can smell it. Turns out, velocity is calculated in player updateMove. updateMove is called in player::processTick. Now.. processTick is called both on the server and the client, and in both cases, for an aiPlayer object, it is called by the following piece of code at the bottom of gameProccess::advanceObjects..
if (obj->mProcessTick)
         obj->processTick(0);
This means, in all cases, on server and client, aiPlayer's processTick function is called with a null move object. On the server though, this is okay. This is exactly what we want, in fact, as its a marker to tell the aiPlayer to get his move!
Move aiMove;
   if (!move && isServerObject() && getAIMove(&aiMove))
      move = &aiMove;
So every update the aiPlayer calculates a valid move structure, and that gets used down at the bottom of processTick when we updateMove. But on the client.. ho ho! The client.. our move structure is never updated. Soo.. at the bottom of the processTick, when we updateMove, we do so with an invalid move structure every time! So updateMove with a move of ZERO causes us to degrade our velocity. Mind you.. we dont just stop.. cause our new velocity is zero, but this is applied against our current velocity using acceleration, so we just slow down slightly. Guess what! We slow down enough to cause the pickActionAnimtion function to switch from running anim to root!

But.. every time we get a new update from the server, which is quite often, the server replicates the RIGHT velocity down to us for the aiPlayer, so it gets set back up. So.. the server is blasting us the aiPlayer's velocity.. but in between updates, we're slowing down because we're recalculating our move using an invalid move structure. I know right??

So.. down below, in processTick.. the only reason that the aiPlayer is calling updateMove with an invalid move structure is that we have this check for "didRenderLastRender" in there, which allows all the functionality to be called for aiPlayers, which are really not real clients at all. So I removed that check, and now aiPlayers clientside get their movement only from replication and everything seems to be much happier.

Why doesn't this happen for other player's I see?
Because players on the client side actually have valid move structures calculated for them. So you're not constantly using an invalid structure, which degrades the velocity.

Why doesn't this happen all the time?
Because at higher speeds, the velocity degradation between movement updates isn't enough to cause the animation to switch. The degradation is still going on, but you never reset the animation, and your velocity gets reset every time the server sends an update.

Um.. don't you think that 'didRenderLastRender' check was there for a reason?
Yes. I do. Absolutely. Honestly though, I can't figure out why. This is the only place in the engine this function call is used, and if you look at it, it basically is checking to see if this update is an update we rendered in -- cause obviously render frequency and update frequency are not linked. So.. I'm guessing it was an attempt to make sure we only updated physics on the client immediately after we performed a render, to get tighter coupling between the sim and the display, but honestly that is only speculation. Near as I can tell, there's *no* reason to call those update functions if you're not either the server or the controlling client for this player. I can say I've had this fix in for about 3 months now, running regularely, and I haven't seen anything broken because of it yet. Perhaps a GG programmer might shed some light. But as always with any bug fix -- Caveat Emptor.

Okay.. that's all I got. Hope this helps, and have a safe holiday. See you later after I come down from the triptophan high!

Dev out!

About the author

Dusty Monk is founder and president of Windstorm Studios, an independant game studio. Formerly a sr. programmer at Ensemble Studios, Dusty has worked on AAA titles such as Age of Empires II & III, and Halo Wars.


#1
11/15/2007 (1:09 pm)
Nice research, and thanks for taking the time to write it all up!
#2
11/15/2007 (3:09 pm)
Wow! What a fix!

I almost got to debug this one myself a couple of months back. I'm glad evolving project needs pulled me away from the AI before I got a chance to dig into it. This will save me at least an entire weekend! Thank you!
#3
11/15/2007 (3:29 pm)
great post.
#4
11/15/2007 (4:01 pm)
Looking a little deeper into this I think there is more to it.

Doesn't the block directly above the indicated fix prevent the scenario you describe?
// If there is no move, the player is either an
      // unattached player on the server, or a player's
      // client ghost.
      if (!move) {
         if (isGhost()) {
            // If we haven't run out of prediction time,
            // predict using the last known move.
            if (mPredictionCount-- <= 0)
            {
               PROFILE_END();
               return;
            }
            move = &delta.move;
         }
         else
            move = &NullMove;

If there is no move and we are a ghost then use the last move sent as the next move. This is prediction if there is no update.

Also, the mLastRenderFrame variable is being set in the unpackUpdate method (the same place that the delta.move is updated to that of the server). This should be causing the update to happen when new information arrive. I still don't quite see how the velocity is getting reset but as a (so far) completely usable fix this is a wonderful post.
#5
11/15/2007 (4:56 pm)
@Brian - Yap.. you're right. That is, the invalid move still occurs, but it doesn't occur exactly in the way I describe because of that code. I had forgotten when I was reviewing how I had made this bug fix to dig into that little part of the code path.

So what happens, is occasionaly, we do run out of prediction count, and we early out there. But in all other cases, move is set to delta.move.

Now, for the aiPlayer on the client, we unpack delta.move in unpackUpdate. BUT, curiously, if you look in gameConnectionMoves.cc, you'll see that we never pack/unpack the .x, .y, and .z components of a move structure. Presumably, becuase they're always supposed to be calculated.

So the .x, .y, and .z components for delta.move are always zero.

So we fall down into updateMove in processTick, with a move component that has zero for .x, .y, and .z. And the first thing that updateMove does is re-assign delta.move to the move that was passed in. If you go further down into updateMove, you can see where we set moveSpeed based on those components..
if (move->y > 0.0f)
      {
         if( mWaterCoverage >= 0.9f )
            moveSpeed = getMax(mDataBlock->maxUnderwaterForwardSpeed * move->y,
                               mDataBlock->maxUnderwaterSideSpeed * mFabs(move->x));
         else
            moveSpeed = getMax(mDataBlock->maxForwardSpeed * move->y,
                               mDataBlock->maxSideSpeed * mFabs(move->x));
      }
      else
      {
         if( mWaterCoverage >= 0.9f )
            moveSpeed = getMax(mDataBlock->maxUnderwaterBackwardSpeed * mFabs(move->y),
                               mDataBlock->maxUnderwaterSideSpeed * mFabs(move->x));
         else
            moveSpeed = getMax(mDataBlock->maxBackwardSpeed * mFabs(move->y),
                               mDataBlock->maxSideSpeed * mFabs(move->x));
      }
Later on.. moveSpeed is moved into acceleration, which is in turn used to adjust mVelocity, which.. in turn.. is used to adjust our animation.

Least.. it seems that way. :)
#6
11/16/2007 (4:55 am)
This isn't quite true either.

It is true that the x, y and z values never get packed/unpacked. But maybe you are overlooking the px, py and pz members which are the transmitted, packed versions of the x, y and z components.

As soon as px, py and pz are transmitted they are unclamped into the x, y and z values.

Now, it may be that rounding errors in the clamp/unclamp process are causing the troubles.... I don't know.
#7
11/16/2007 (6:57 am)
@Brian - Good catch. I actually had missed the clamp/unclamp routines. I wondered why those values were not transmitted -- I just assumed they were supposed to be calculated somewhere clientside after coming in.

SO.. yeah, its not errors in rounding at all. It's just that the only place that clamps the move value, serverside, is in gameConnection::getNextMove -- which we don't do for aiplayers. DOH! So I restored the conditional at the bottom of processTick to its original state, and over in aiPlayer.cc, I added the clamp to the bottom of the getAIMove function:
.
   .
   // Clamp the move for network traffic..
   movePtr->clamp();

   return true;
}

Turned it on, and yap.. this fixes the problem. I'm going to update my blog to make this change instead. My fix addressed the symptoms, but isnt' the right fix. Clearly the thing that was broken here was that the move values were never getting clamped before coming across.

I think the *safest* thing to do would be probably to put the clamp call right at the top of the pack call, so you were ensured of never sending across a value that hadn't been packed. I may end up doing that anyway. But this more closely mirrored what was already being done for connection players.

And thanks for taking the time to dig through this stuff. This is why putting this stuff out to the community is strictly good, (tm), imho.
#8
11/16/2007 (8:43 am)
I don't have much to add here (haven't looked at the animation system much yet), but I wanted to say this is a great post. I think people sharing their approaches to debugging and problem solving is a great thing!
#9
11/16/2007 (12:32 pm)
Great post. Thank for sharing this.
#10
11/16/2007 (4:33 pm)
@Devon - Excellent :) I am sooo happy we (mostly you!) figured this out. Thank you again!

Edit: I just had a thought. Might it be better to instead call clamp() from Player::processTick() instead of within each derivative?

So I implemented this fix thusly (in Player::processTick()):
if (!move && isServerObject() && getAIMove(&aiMove)) {
      aiMove->clamp();
      move = &aiMove;
   }
#11
11/17/2007 (6:42 am)
I'm not a programmer, but that was a great read and explanation of what was going on. Thanks!
#12
04/23/2009 (5:51 am)
Thanks for the post, I wish I had read it before I found out the same problem and solution ;) After hard time debugging the animation with velocity, I got the idea to disable didRenderLastRender and volia it works like it should. Because I wasn't sure it breaks something, I found your post and can sleep well without didRenderLastRender :=)