Game Development Community

Differences between debug and release builds?

by Ben Galanti · in Torque Game Engine · 12/08/2006 (5:31 am) · 9 replies

I've been trying to compile a game for the Mac that was developed on a PC. It currently uses Torque 1.4. There have been no intentional platform-specific changes, so it hasn't been too bad to do. However, I'm having a lot of trouble with release builds: parts that run fine in a debug build crash hard in a release build.

This is part of the new code, not stock Torque. The game randomly generates the map before each mission. The PC author did most of the legwork of pulling functions out of the Terraformer and using them to create terrain, and, as I said, it works fine in debug builds.

I've ironed out a lot of it, but this last one is giving me fits:

if (land == 5) {
    U32 size = (U32)TerrainBlock::BlockSize;
    U32 interval1 = 32;
    F32 scale10 = 1.0f / (F32)size * interval1;
    F32 scale11 = (2 * (F32)size) / interval1;

    for (S32 y = 0; y < size; y++) {
        for (S32 x = 0; x < size; x++) {
            // Con::printf("");

            F32 vnoise1 = noise.getValue((F32)x * scale10, (F32)y * scale10, (S32)interval1);
            F32 total = vnoise1 * 1 + 20;
            U32 offset = x + (y << TerrainBlock::BlockShift);
            workingMap[offset] = total;
        }
    }
    ....

This code works fine in a debug build but crashes in a release build. If I uncomment that Con::printf statement, it works fine in a release build too. However, I just can't justify 65,000+ blank lines dumped to the log file... Is this some sort of compiler optimization issue? It all seems like pretty benign code. I'm having trouble debugging without the debugger, especially when the good ole' debug printfs change the generated code. Any ideas? Thanks in advance.

#1
12/08/2006 (5:46 am)
Your problem is probably caused by an array overflow. Adding the console output line changes the generated code, so when the array is overrun it (apparently) overwrites something relatively benign, like a local variable. Without the console output it overwrites something more vital, most likely the stack return address.

Slap some checks around any array access you do in the nearby code (that method, methods calling it and methods it calls are usually enough) and see what comes up.
#2
12/08/2006 (5:50 am)
What is workingMap defined as, and is it big enough to hold any offset? You could be trampling memory there. Check that offset is in range.
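A minimal sketch of the kind of range check suggested in the last two replies, assuming a 256x256 map (a BlockShift of 8 and a BlockSize of 256; these are stand-ins, not values read from the engine headers). It uses a plain `if` plus `abort()` rather than `assert`, because release builds usually define NDEBUG and compile asserts out, and the release build is exactly where this crash shows up. `checkedOffset` is a hypothetical helper, not a Torque function.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-ins for TerrainBlock::BlockShift / BlockSize,
// assuming a 256x256 heightfield.
constexpr std::uint32_t kBlockShift = 8;
constexpr std::uint32_t kBlockSize  = 1u << kBlockShift;   // 256
constexpr std::uint32_t kMapCells   = kBlockSize * kBlockSize;

// Hypothetical helper: returns the flat index for (x, y) and aborts with a
// message if it would fall outside the map. Unlike assert(), this check
// stays active in release builds (where NDEBUG is usually defined).
inline std::uint32_t checkedOffset(std::uint32_t x, std::uint32_t y)
{
    if (x >= kBlockSize || y >= kBlockSize) {
        std::fprintf(stderr, "terrain index out of range: x=%" PRIu32 " y=%" PRIu32 "\n", x, y);
        std::abort();
    }
    // With both coordinates in range, offset is always < kMapCells.
    const std::uint32_t offset = x + (y << kBlockShift);
    return offset;
}
```

In the loop from the first post, `workingMap[offset] = total;` would become `workingMap[checkedOffset(x, y)] = total;`.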
#3
12/08/2006 (7:06 am)
Dang, I meant to add the working map definition. I'm not at home, but I'm almost certain it's:

F32 workingMap[TerrainBlock::BlockSize * TerrainBlock::BlockSize];
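It's worth doing the arithmetic on that definition. Assuming F32 is a 4-byte float and TerrainBlock::BlockSize is 256 (both assumptions here), one such array is 256 KB. Declared as a function local, it lives on the stack, and the final follow-up in this thread notes that OS X gives a secondary pthread 512 KB of stack by default, so a single array takes half of it. The constants below are hypothetical stand-ins that only check the numbers at compile time.

```cpp
#include <cstddef>

// Assumptions (not read from the engine headers): F32 is a 4-byte float
// and TerrainBlock::BlockSize is 256.
using F32 = float;
constexpr std::size_t kBlockSize = 256;

// Bytes taken by: F32 workingMap[BlockSize * BlockSize];
constexpr std::size_t kMapBytes = kBlockSize * kBlockSize * sizeof(F32);

// Default secondary-thread stack on OS X, as reported later in the thread.
constexpr std::size_t kThreadStackBytes = 512 * 1024;

static_assert(sizeof(F32) == 4, "sketch assumes 4-byte floats");
static_assert(kMapBytes == 256 * 1024, "one map is 256 KB");
static_assert(2 * kMapBytes == kThreadStackBytes,
              "two such locals would consume the entire 512 KB stack");
```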

Array overflow was my first guess, so I added some bounds checking on 'offset', but it didn't make a difference. I dropped some printfs into the beginning of Noise2D::getValue ('noise' is just a Noise2D instance), and execution didn't even reach them, which I thought was odd. As I type that, though, Paul's post has me thinking I might be chasing a red herring. The stack may be getting stomped on further up in the code without showing up until the getValue call... I'll have to look into that.

I really appreciate the quick replies!
#4
12/08/2006 (10:48 am)
One thing I forgot to add earlier: if you get really frustrated with it, you can get Compuware's BoundsChecker suite on a 14-day free trial. It might help you find the problem faster, but of course once the 14 days are up it's gone :)

Good luck!

http://www.compuware.com/products/devpartner/visualc.htm
#5
12/08/2006 (11:04 am)
If the problem manifests on Linux (or even if it doesn't), you could try building on Linux, hitting it with Valgrind, and seeing whether Valgrind has anything to say about the problem.

Gary (-;
#6
12/08/2006 (11:54 am)
Great suggestion, Gary; Valgrind is a fantastic tool! I wish I had it for other platforms...
#7
12/12/2006 (9:04 pm)
Just a bit of follow-up. First, thanks to everyone who has posted so far. Unfortunately, I'm still struggling with this one. It's certainly some kind of memory smasher, which, as Paul surmised, is relatively benign in a debug build but not in a release build.

I've narrowed it down to this line:
F32 vnoise1 = noise.getValue((F32)x * scale10, (F32)y * scale10, (S32)interval1);

Just before the initial chunk of code I posted runs, I create 'noise' and set its seed with:

Noise2D noise;
noise.setSeed(randomSeed);

where randomSeed is a parameter passed into this method. I'm guessing something is getting stomped somewhere in the setSeed method. Time to start digging...
#8
01/03/2007 (6:06 pm)
Well, I thought I had tracked this down, but when I upgraded to 1.4.2, it reared its ugly head again.

At least this time, it crashes with a debug compile, so I have a bit more insight. That being said, I'm still pulling my hair out on this.

Anyway, it's crashing in exactly the same consoleFunction as before. However, now it crashes as it enters the function: the debugger stops at the function's opening brace, just 20 bytes in. Looking at the assembly, two registers are added to form an address, and the contents of one of the original registers are stored there (a 'stwux' instruction). Trying to store to this address (always the same one) causes OS X to kill the program.

Here's the consoleFunction definition (I expanded the ConsoleFunction macro by hand to better see what was going on):
//ConsoleFunction(makeNewTerrain, void, 10, 10, "(string fileName, ...) - makes a new terrain file.")
static void cmakeNewTerrain(SimObject *, S32, const char **);                   
static ConsoleConstructor gmakeNewTerrainobj(NULL,"makeNewTerrain",cmakeNewTerrain,"",10,10);
static void cmakeNewTerrain(SimObject *, S32 argc, const char **argv)
{

In the debugger, argc and argv haven't even received their proper values before the bad memory access. Going back one call in the stack, though, shows that the proper argc and argv are being passed to this function. Here is the call:
nsEntry->cb.mVoidCallbackFunc(gEvalState.thisObject, callArgc, callArgv);

callArgc has the right number of arguments, and digging through the callArgv arrays shows the proper strings. It's going to the correct consoleFunction, and the simObject is just ignored anyway, so that all looks right.

The PC coder is also getting crashes in this consoleFunction intermittently, but not every time as on the Mac side, though that may just be a difference in how functions are set up on each platform. Any thoughts on what would cause a bad memory access at the opening brace of a function?

Thanks for any help on this; I'm at my wit's end.

Ben
#9
01/09/2007 (7:38 am)
A final follow up.

After some more digging, I finally figured out the issue: it was a stack overflow. The workingMap array mentioned at the top is 256k, and there was another noiseMap array of the same size. Allocating those arrays with 'new', so they live on the heap instead of the stack, cleared up the crashes. When OS X creates a pthread, it gives that thread a 512k stack by default. So having 512k worth of local arrays plus other local variables, not to mention the other 19 frames below this one, was a bit of a problem.
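A minimal sketch of that fix, assuming F32 is a 4-byte float and BlockSize is 256. The post used `new`; this sketch swaps in `std::vector`, which also puts the buffers on the heap but frees them automatically on every return path, so there is no matching `delete[]` to forget. `buildFlatMap` is a hypothetical function and its loop body is a placeholder, not the game's terrain code.

```cpp
#include <cstddef>
#include <vector>

using F32 = float;                       // assumption: Torque's F32
constexpr std::size_t kBlockSize = 256;  // assumption: TerrainBlock::BlockSize
constexpr std::size_t kCells = kBlockSize * kBlockSize;

// Hypothetical function. The crashing version declared, as locals:
//   F32 workingMap[kCells];   // 256 KB on the stack
//   F32 noiseMap[kCells];     // another 256 KB on the stack
// Here each std::vector keeps only a small header on the stack; its
// 256 KB buffer lives on the heap and is released when the vector
// goes out of scope.
std::vector<F32> buildFlatMap(F32 baseHeight)
{
    std::vector<F32> workingMap(kCells, baseHeight);
    std::vector<F32> noiseMap(kCells, 0.0f);

    // Placeholder for the real terrain math: combine noise into the map.
    for (std::size_t i = 0; i < kCells; ++i)
        workingMap[i] += noiseMap[i];

    return workingMap;   // moved (or elided), not copied
}
```

The raw-`new` equivalent is `F32* workingMap = new F32[kCells];`, which then needs a `delete[] workingMap;` before every return.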

Does anyone know why most of the Torque functions are spun off from the main thread? It looks like the main thread is pretty much just an event loop. I have to admit I'm still getting up to speed on threading, so there is probably a good reason for it. Maybe the Torque thread should be created with a stack larger than the 512k pthread default? Having 8 megs of stack (well, minus 512k for each of the other threads) for an event loop seems silly when all the real meat is trying to get crammed into 512k.
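On the stack-size question: POSIX lets the creator of a thread request a specific stack size with `pthread_attr_setstacksize` before calling `pthread_create`. The sketch below shows only that mechanism; it is not Torque's thread-creation code, and the 8 MB figure, `workerMain`, and `runWorkerWithLargeStack` are hypothetical. The worker touches a 512 KB local buffer, which would not fit in the 512 KB default secondary-thread stack described above. Build with `-pthread`.

```cpp
#include <pthread.h>
#include <cstddef>

// Illustrative size only; not a Torque setting.
constexpr std::size_t kWorkerStackBytes = 8 * 1024 * 1024;

// Hypothetical worker entry point; the real engine thread body would go here.
static void* workerMain(void* arg)
{
    // A 512 KB local buffer: fine on an 8 MB stack, an overflow on 512 KB.
    volatile unsigned char scratch[512 * 1024];
    scratch[0] = 1;
    scratch[sizeof(scratch) - 1] = 1;
    *static_cast<int*>(arg) = scratch[0] + scratch[sizeof(scratch) - 1];
    return nullptr;
}

// Spawns workerMain on a thread with an explicit stack size and waits for it.
// Returns 0 on success, or the first pthread error code encountered.
int runWorkerWithLargeStack(int* result)
{
    pthread_attr_t attr;
    int err = pthread_attr_init(&attr);
    if (err != 0)
        return err;

    err = pthread_attr_setstacksize(&attr, kWorkerStackBytes);
    if (err == 0) {
        pthread_t thread;
        err = pthread_create(&thread, &attr, workerMain, result);
        if (err == 0)
            err = pthread_join(thread, nullptr);
    }
    pthread_attr_destroy(&attr);
    return err;
}
```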

Thanks again for the help.

Ben