Game Development Community

Shading/Normal Problems Under Linux

by Todd "zaz" Koeckeritz · in Torque Game Engine · 08/02/2006 (5:54 pm) · 11 replies

I was going to post a bug report for this, but before I do, I'm going to see if someone else has any bright ideas to help me track this problem down.

I was trying to verify correct normal exportation from blender when I first noticed the problem and wasted a lot of time before I finally realized I was dealing with a linux TGE problem. Below you'll find some screenshots of the problem. For testing, I moved the player.png file out of the data/shapes/player directory so it wouldn't be found. I did this to emphasize the shading issue. I'm running my linux and windoze binaries from the same directories (shared via samba).

My linux binaries are:
1.3) The binary as installed by TorqueGameEngineSDK-1.3.bin (downloaded from GG)
1.4) My build with GCC4 under SuSE 10.0
1.4.2) The binary as installed by TorqueGameEngineSDK-1.4.2-RC1.bin (downloaded from GG)

Most linux testing was done under SuSE 10.0 with an NVIDIA 5900 Ultra FX. I've also tested on my notebook running ubuntu 5.10 and, more recently, 6.06(?) with an NVIDIA Go 6600 chip. Drivers under linux I've tested with include the last 3 major releases, at least, from NVIDIA.

My windoze binary:
1.4) Built via msys/mingw by me

Here's a picture of the orc in the "show" mod:
www.visi.com/~zaz/screenshot_010-00001.png
Here's the exact same picture of the orc under windoze:
www.visi.com/~zaz/screenshot_011-00001.png
Notice that there is no shading under linux for the orc. If I use the "Lighting" panel to force the lighting direction under windoze, I get:
www.visi.com/~zaz/screenshot_011-00002.png
Under linux, there is no change when using the lighting panel, i.e. the model remains unshaded all the time.

Using the linux TGE 1.3 binaries, I get the same results as I do under windoze, i.e. I get good results. I'm unable to build 1.3.1 atm easily, so I haven't tested with that. However, diffs between the platformX86UNIX directories under 1.3 and 1.4 don't show any obvious changes that would be causing something like this.

I've also tested this problem with Bravetree's Jill model. The strange thing here is that the hair shades properly under all conditions I've tested. The remainder of the model gives similar results as the orc and my other blender exported models.

Possibly related, but likely a separate issue, when I test the models under TLK 1.4 under linux, I get shading, however the light parts of the models are on the opposite side from what they should be.

I've been poking and prodding around platformX86UNIX, comparing it to platformWin32, and other than noticing that there's a lot of similar code that should maybe be refactored so it's exactly the same, I don't see anything that I can pin this type of behavior on.

Any suggestions ?

#1
08/03/2006 (1:20 am)
Todd,

I tested it on linux ubuntu dapper with TGE 1.4.2

I got the following result: eviwo.free.fr/pictures/linux-blender.jpg
#2
08/03/2006 (1:26 am)
Todd,

My request is not related to your issue. I read in your post that you are testing exports from blender: did you test with the "orc"?

If you have the "orc" in blender format, I would be very interested in getting a copy for my own use.
#3
08/03/2006 (1:30 pm)
Thanks for the data point Philippe, you are seeing what I'd like to see.

I went back to my notebook to validate my results and I'm still not getting shading under ubuntu dapper there. I tested both my 1.4 build and the "official" 1.4.2 build. Man, if it was just my SuSE box that had the problem, I'd write it off to bad karma or something, but seeing it under two different distros with different nvidia chipsets and across 3 or more driver versions, I'd be very surprised if there isn't something that needs a little fixing up here.

What video chip and drivers are you using ?

Sorry, no blender orc here. I gave up on getting a non-max format of the orc long ago. That's one reason I bought Bravetree's Jill and Car content packs. Jill comes with a .3ds format, which can be loaded by blender, although I haven't tried yet. The car content pack actually comes with .blend files which I've looked at, but haven't tried to re-export yet (I don't think it's set up for that, but at least it's easy to look at the modeling).
#4
08/04/2006 (12:44 am)
Todd,

video chip: GeForce4 MX 420

drivers :
nvidia-glx, NVIDIA binary XFree86 4.x/X.Org driver, version : 1.0.8762+2.6.15.11-3
nvidia-kernel-common, NVIDIA binary kernel module common files, version : 20051028+1

For your information, the 3DS export to blender does not work with blender 2.41. I tested it with 2.42a and it does not seem to be fixed yet. I opened a bug for this issue a few months ago: launchpad.net/distros/ubuntu/+source/blender/+bug/39199
#5
08/11/2006 (6:47 pm)
Sorry, I've been out of town helping my dad transition to his apartment from a hospital stay.

Very strange that your configuration works and both of mine don't. I think I have an old GeForce2 or GeForce4 laying around I could try to verify here, but I'd like to get to the bottom of this. Hmm, maybe find some room on my windoze only box for a linux install or maybe do a live eval on it.

Just for kicks, I tried importing the .3ds model bravetree sent with the Jill content pack. It imported fine into blender for me. This worked with a stock 2.42a and scripts from a month ago or so copied from a CVS pull into my ~/.blender/scripts directory.
#6
10/06/2006 (11:41 am)
Ok, I've finally had time to track down the source of this problem. It has to do with a change to the renormalization of normals that happened between 1.3 and 1.4.

The problem lies in engine/ts/tsMesh.cc in method TSSkinMesh::updateSkin(). Towards the bottom of that method, the normals for the mesh are renormalized in this loop:
// normalize normals...
   for (i=0; i<norms.size(); i++)
   {
      // gotta do a check now since shared verts between meshes
      // may result in an unused vert in the list...
      Point3F & n = norms[i];
      F32 len2 = mDot(n,n);
      if (len2>0.01f)
         norms[i] *= InvSqrt_Lomont(len2); // 1.0f/mSqrt(len2);
   }

The old 1.3 code for this loop was:
// normalize normals...
   for (i=0; i<norms.size(); i++)
   {
      // gotta do a check now since shared verts between meshes
      // may result in an unused vert in the list...
      Point3F & n = norms[i];
      F32 len2 = mDot(n,n);
      if (len2>0.01f)
         norms[i] *= 1.0f/mSqrt(len2);
   }

Restoring norms[i] *= 1.0f/mSqrt(len2); inside the loop eliminates the normal problems for me and I get the expected rendering result.

I've verified the code for the InvSqrt_Lomont() function from several places and it looks good (very interesting gem of code too). For reference here is the function:
F32 InvSqrt_Lomont(F32 x)
{
   F32 xhalf = 0.5f*x;
   S32 i = *(S32*)&x;
   i = 0x5f375a84 - (i>>1); // hidden initial guess, fast
   x = *(F32*)&i;
   x = x*(1.5f-xhalf*x*x);
   return x;
}

With all that bit shifting and other bit magic going on in there, I'm wondering if we're running into a potential bug or compatibility problem with my version of gcc (4.0.2) or something. I think my compiler on my laptop is more current (ubuntu) than my desktop (SuSE 10.0), so I'll try testing a fresh build there instead of just using the desktop built binaries on it as I normally do.
#7
10/06/2006 (1:43 pm)
Ok, we're dealing with an aliasing problem here. I made a test program (x.c):
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

typedef float           F32;
typedef signed int      S32;

F32 InvSqrt_Lomont(F32 x)
{
   F32 xhalf = 0.5f*x;
   S32 i = *(S32*)&x;
   i = 0x5f375a84 - (i>>1); // hidden initial guess, fast
   x = *(F32*)&i;
   x = x*(1.5f-xhalf*x*x);
   return x;
}

float vals [] = { 1.0, 2.0, 9.0, 100.0,  37.25 };

int
main (int argc, char *argv [])
{
   int          i;

   for (i = 0; i < sizeof (vals) / sizeof (vals [0]); ++i)
   {
      printf ("%8.03f:  sqrt: %8.04f  lomont: %8.04f\n",
              vals [i], 1.0 / sqrt (vals [i]), InvSqrt_Lomont (vals [i]));
   }

   return (0);
}

Compiling and running it with gcc -O2 -Wstrict-aliasing=2 -o x x.c -lm with both SuSE 10.0's gcc4.0.2 and ubuntu's gcc4.0.3 gives this result:
torque-HEAD:4132% gcc -O2 -Wstrict-aliasing=2  -o x x.c -lm
x.c: In function 'InvSqrt_Lomont':
x.c:20: warning: dereferencing type-punned pointer will break strict-aliasing rules
x.c:22: warning: dereferencing type-punned pointer will break strict-aliasing rules
torque-HEAD:4133% ./x
   1.000:  sqrt:   1.0000  lomont:   0.0000
   2.000:  sqrt:   0.7071  lomont:   0.0000
   9.000:  sqrt:   0.3333  lomont: -123118990917576963365467045202415434833339752151121397586526158372995072.0000
 100.000:  sqrt:   0.1000  lomont: -157538121565762736865110309344186948648960.0000
  37.250:  sqrt:   0.1638  lomont:  -0.0000

Turning off strict aliasing and running it gives good results:
torque-HEAD:4134% gcc -O2 -Wstrict-aliasing=2 -fno-strict-aliasing  -o x x.c -lm
torque-HEAD:4135% ./x
   1.000:  sqrt:   1.0000  lomont:   0.9983
   2.000:  sqrt:   0.7071  lomont:   0.7069
   9.000:  sqrt:   0.3333  lomont:   0.3330
 100.000:  sqrt:   0.1000  lomont:   0.0998
  37.250:  sqrt:   0.1638  lomont:   0.1636


I made a few changes to the test program to try to eliminate the problem by allowing aliasing on the variables in InvSqrt_Lomont(). It compiles and runs fine in my test program:
typedef float           F32;
typedef signed int      S32;

// added: typedefs that are allowed to alias other types under GCC 4
#define TORQUE_COMPILER_GCC     1
#if defined(TORQUE_COMPILER_GCC) && (__GNUC__ == 4)
typedef F32 __attribute__((__may_alias__)) F32_a;
typedef S32 __attribute__((__may_alias__)) S32_a;
#else
typedef F32     F32_a;
typedef S32     S32_a;
#endif

// changed: the parameter, locals, and casts now use F32_a/S32_a
F32 InvSqrt_Lomont(F32_a x)
{
   F32_a xhalf = 0.5f*x;
   S32_a i = *(S32_a*)&x;
   i = 0x5f375a84 - (i>>1); // hidden initial guess, fast
   x = *(F32_a*)&i;
   x = x*(1.5f-xhalf*x*x);
   return x;
}

The above changes give the same results as compiling with -fno-strict-aliasing in my test program. However, both gcc 4.0.2 and 4.0.3 abort compilation when I apply the above changes to engine/ts/tsMesh.cc. Alternatively, strict aliasing can be turned off completely for the entire build by adding -fno-strict-aliasing to CFLAGS.RELEASE in mk/conf.UNIX.mk and doing a make clean; make.

I'm not sure what the correct direction to go with this is at the moment. There definitely is an aliasing problem in InvSqrt_Lomont() and I'm not sure why it isn't affecting more optimizing compilers. Also, I'm not sure how much of a performance impact there is from turning off strict aliasing over the entire build. It'd be nice if there was a way to just turn strict aliasing off for the engine/ts/tsMesh.cc file or even just the InvSqrt_Lomont() function, but I haven't found a generic gcc way of doing that yet.
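
One way to keep the function while sidestepping the aliasing rules altogether is to copy the bits with memcpy() instead of dereferencing a cast pointer. The sketch below is only an illustration and has not been tested against the Torque build: the name InvSqrt_Lomont_memcpy is hypothetical, and it assumes F32 and S32 are both 4 bytes wide. Optimizing compilers generally turn these fixed-size memcpy() calls into plain register moves.

```c
#include <string.h>

typedef float      F32;
typedef signed int S32;

/* Same arithmetic as InvSqrt_Lomont(), but the float/int reinterpretation
   goes through memcpy() rather than a pointer cast, so no object is read
   through a pointer of an incompatible type.
   Assumption: sizeof(F32) == sizeof(S32) == 4. */
F32 InvSqrt_Lomont_memcpy(F32 x)
{
   F32 xhalf = 0.5f * x;
   S32 i;

   memcpy(&i, &x, sizeof i);         /* read the bits of x as an integer */
   i = 0x5f375a84 - (i >> 1);        /* same initial guess as the original */
   memcpy(&x, &i, sizeof x);         /* reinterpret the bits as a float */
   x = x * (1.5f - xhalf * x * x);   /* one Newton-Raphson step */
   return x;
}
```

Since the constant and the Newton step are unchanged, InvSqrt_Lomont_memcpy(1.0f) should land close to the 0.9983 seen in the -fno-strict-aliasing run above, whether or not strict aliasing is enabled.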
#8
10/06/2006 (3:12 pm)
The sad thing is that, on modern CPUs, the "savings" from that function just aren't that big -- in fact, it may save nothing at all. Just use a call to sqrtf() and be happy. Especially when allowing inline intrinsics, it may be as fast or faster than all that type casting and storing through memory.
#9
10/06/2006 (8:58 pm)
Yeah, hard to say. When I was investigating the validity/correctness of InvSqrt_Lomont(), a few google searches found this paper by Chris Lomont written in Feb 2003. In the paper, it's claimed that using InvSqrt_Lomont() is 4 times faster than 1.0/sqrt(x). I didn't do any performance analysis at all myself though.

Since sqrt()'s and divides are pretty expensive operations, I tend to believe that there is a savings in using InvSqrt_Lomont() over 1.0/sqrt(x). However, the savings may or may not be noticeable at all given all the other work being done to render a frame.
#10
10/07/2006 (11:24 am)
Quote: Since sqrt()'s and divides are pretty expensive operations

Writing to memory is even more expensive! The thing that saves this function is the fact that the stack is usually in cache already.

If you have SSE, using the inverse-square-root function (all in one function :-) is a lot faster. You could implement it with MOVSS to load a single float, run the inverse square root, and then move the result back, and it would still be faster -- no actual SIMD required.
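
A minimal sketch of that idea, using compiler intrinsics rather than hand-written assembly. The function name InvSqrt_SSE is hypothetical, and the __SSE__ check assumes a GCC-style compiler; when SSE is not enabled it falls back to 1.0f/sqrtf(x). RSQRTSS is only an approximation (Intel documents a maximum relative error of 1.5 * 2^-12), so a Newton-Raphson step may still be wanted where more precision matters.

```c
#include <math.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

typedef float F32;

/* Approximate 1/sqrt(x) with the scalar SSE instruction RSQRTSS when the
   compiler targets SSE; otherwise fall back to 1.0f/sqrtf(x). */
F32 InvSqrt_SSE(F32 x)
{
#if defined(__SSE__)
   __m128 v = _mm_set_ss(x);     /* load x into the low lane (MOVSS) */
   v = _mm_rsqrt_ss(v);          /* approximate reciprocal square root */
   return _mm_cvtss_f32(v);      /* move the low lane back out as a float */
#else
   return 1.0f / sqrtf(x);
#endif
}
```

No pointer casts are involved, so this version is not affected by the strict-aliasing setting.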
#11
07/05/2008 (11:25 am)
This still fixes stuff

I used the -fno-strict-aliasing arg and it fixed my issue.

I'm not sure why; this whole thread is a bit over my head and confusing,

but the -fno-strict-aliasing part alone totally fixed my lighting issues on 4.2.3