
G-Buffer Normals and Trig Lookup Textures

Name: Pat Wilson
Date Posted: Aug 28, 2008

(Cross posted from personal blog. Hopefully someone has a rockin' idea about this.)

I've switched to using spherical coordinates to encode world-space G-buffer normals. This has several advantages, especially for 8:8:8:8 G-buffers: the normal needs only two channels, so you can store [Theta, Phi, DepthHi, DepthLo] in the 8:8:8:8 target. Bumping the format to 16:16:16:16 not only gives greater precision but, depending on the depth resolution you need, can also free up an extra channel for information storage. (Mmmm, virtual texture bitfield?)
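As a rough CPU-side illustration of the [Theta, Phi, DepthHi, DepthLo] layout (not the shader code used here), depth can be split across two 8-bit channels as 16-bit fixed point. The names `GBufferTexel`, `packDepth16`, and `unpackDepth16` are made up for this sketch, and it assumes depth has already been normalized to [0, 1].

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical CPU-side view of one 8:8:8:8 G-buffer texel.
struct GBufferTexel
{
    uint8_t theta;   // encoded azimuth
    uint8_t phi;     // encoded normal.z
    uint8_t depthHi; // upper 8 bits of 16-bit depth
    uint8_t depthLo; // lower 8 bits of 16-bit depth
};

// Quantize depth in [0, 1] to 16 bits and split it into two bytes.
inline void packDepth16(float depth, uint8_t &hi, uint8_t &lo)
{
    assert(depth >= 0.0f && depth <= 1.0f);
    const uint32_t q = static_cast<uint32_t>(std::lround(depth * 65535.0f));
    hi = static_cast<uint8_t>(q >> 8);
    lo = static_cast<uint8_t>(q & 0xFFu);
}

// Recombine the two bytes into a depth value in [0, 1].
inline float unpackDepth16(uint8_t hi, uint8_t lo)
{
    const uint32_t q = (static_cast<uint32_t>(hi) << 8) | lo;
    return static_cast<float>(q) / 65535.0f;
}
```

The round trip loses at most one 16-bit step, which is where the "depending on the depth resolution you need" caveat comes in.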

There are a few problems with this, though. The first is the atan2 function. On good hardware, this will not be an issue. My GeForce 8800 chews through any shader I throw at it. My Radeon X1300... not so much. Of course, it's easy to make things run fast on good hardware. The challenge is making it run decently on lower-end hardware. This is how I am encoding/decoding G-buffer normals:

inline float2 cartesianToSpGPU( in float3 normalizedVec )
{
   // atan2 returns [-PI, PI]; dividing by PI brings theta into [-1, 1]
   float atanYX = atan2( normalizedVec.y, normalizedVec.x );
   // normal.z is cos(phi), already in [-1, 1], so it is stored directly
   float2 ret = float2( atanYX / PI, normalizedVec.z );

   return POS_NEG_ENCODE( ret );
}

inline float3 spGPUToCartesian( in float2 spGPUAngles )
{
   float2 expSpGPUAngles = POS_NEG_DECODE( spGPUAngles );
   float2 scTheta;

   // scTheta.x = sin(theta), scTheta.y = cos(theta)
   sincos( expSpGPUAngles.x * PI, scTheta.x, scTheta.y );
   // scPhi.x = sin(phi) recovered from the stored cos(phi), scPhi.y = cos(phi)
   float2 scPhi = float2( sqrt( 1.0 - expSpGPUAngles.y * expSpGPUAngles.y ), expSpGPUAngles.y );

   // Renormalization not needed
   return float3( scTheta.y * scPhi.x, scTheta.x * scPhi.x, scPhi.y );
}

Storing normal.z instead of acos( normal.z ) saves a decent chunk of encode/decode.
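For checking the math outside of a shader, here is a minimal C++ sketch of the same round trip. It assumes POS_NEG_ENCODE maps [-1, 1] to [0, 1] as `v * 0.5 + 0.5` and POS_NEG_DECODE is its inverse; the post does not show those macros, so treat that as a guess. `Vec2`, `Vec3`, `posNegEncode`, `posNegDecode`, `cartesianToSp`, and `spToCartesian` are names invented for the sketch.

```cpp
#include <cmath>

struct Vec2 { float x, y; };
struct Vec3 { float x, y, z; };

const float kPi = 3.14159265358979f;

// Assumed definitions of the POS_NEG_ENCODE / POS_NEG_DECODE macros.
inline float posNegEncode(float v) { return v * 0.5f + 0.5f; }
inline float posNegDecode(float v) { return v * 2.0f - 1.0f; }

// Encode a unit normal as (theta / PI, z), both remapped to [0, 1].
inline Vec2 cartesianToSp(const Vec3 &n)
{
    return { posNegEncode(std::atan2(n.y, n.x) / kPi), posNegEncode(n.z) };
}

// Decode back to a unit vector; no acos is needed because z was stored directly.
inline Vec3 spToCartesian(const Vec2 &enc)
{
    const float theta = posNegDecode(enc.x) * kPi;
    const float z = posNegDecode(enc.y);
    // sin(phi) from cos(phi) = z; clamp guards against tiny negative values.
    const float r = std::sqrt(std::fmax(0.0f, 1.0f - z * z));
    return { std::cos(theta) * r, std::sin(theta) * r, z };
}
```

Storing z directly means the decode side only pays for one sqrt and one sincos, which is the saving described above.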

So I decided to try a lookup texture instead of calling atan2 to encode the normals. I made a 256x256 A8 texture and filled it with atan2 values. The texture can be seen to the right. This is the code for generating the texture:

GFXTexHandle *RenderPrePassMgr::getAtan2Texture()
{
   if( mAtan2Handle.isNull() )
   {
      // Create a lookup texture to output a normalized atan2 result
      const U32 cLookupTexSz = 256;

      mAtan2Handle.set( cLookupTexSz, cLookupTexSz, GFXFormatA8, &GFXLookupTextureProfile, 1 );
      GFXLockedRect *atan2Mem = mAtan2Handle.lock();

      for( U32 y = 0; y < cLookupTexSz; y++ )
      {
         for( U32 x = 0; x < cLookupTexSz; x++ )
         {
            // Map the texel index to [-1, 1)
            F32 xval = ( ( x / F32(cLookupTexSz) ) * 2.0f - 1.0f );
            F32 yval = ( ( y / F32(cLookupTexSz) ) * 2.0f - 1.0f );

            U8 &outU8 = atan2Mem->bits[y * cLookupTexSz + x];

            // Remap atan2 from [-PI, PI] to [0, 1], then quantize to 8 bits
            F32 atanRes = ( atan2( yval, xval ) + M_PI_F ) / M_2PI_F;
            U8 u8Res = mFloor( atanRes * 255.0f );
            outU8 = u8Res;
         }
      }

      mAtan2Handle.unlock();
   }

   return &mAtan2Handle;
}

There is a possible discontinuity when V is near 0.5 and U < 0.5. So if normal.y is near 0.0 and normal.x < 0.0, then you get some artifacts that won't occur if you actually call atan2. The A8 format isn't the issue (I don't think), because even if you use the actual function, you are still encoding the result to an 8-bit value. The resolution of the texture could be an issue, but doubling the resolution did not affect the error rate in my tests.
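One plausible way to picture this artifact, assuming the sampler filters bilinearly (the post does not say which filter mode was used): atan2 jumps from -PI to +PI across the negative X axis, so adjacent texel rows on either side of V = 0.5 hold values near 0 and 255, and any blend between them decodes to an angle pointing the wrong way. The C++ sketch below rebuilds the same table on the CPU and blends two such rows. `buildAtan2Table`, `blendRows`, and `decodeAngle` are illustrative names, not engine functions.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

const int kSize = 256;
const double kPiD = 3.14159265358979323846;

// Same fill rule as the texture generation code, done on the CPU.
std::vector<uint8_t> buildAtan2Table()
{
    std::vector<uint8_t> table(kSize * kSize);
    for (int y = 0; y < kSize; ++y)
    {
        for (int x = 0; x < kSize; ++x)
        {
            const double xval = (x / double(kSize)) * 2.0 - 1.0;
            const double yval = (y / double(kSize)) * 2.0 - 1.0;
            const double res = (std::atan2(yval, xval) + kPiD) / (2.0 * kPiD);
            table[y * kSize + x] = static_cast<uint8_t>(std::floor(res * 255.0));
        }
    }
    return table;
}

// Linear blend between texel (x, y0) and (x, y0 + 1), which is what a
// bilinear filter does when the sample point falls between two rows.
double blendRows(const std::vector<uint8_t> &table, int x, int y0, double frac)
{
    const double a = table[y0 * kSize + x];
    const double b = table[(y0 + 1) * kSize + x];
    return a + (b - a) * frac;
}

// Convert a stored value in [0, 255] back to an angle in [-PI, PI].
double decodeAngle(double stored)
{
    return stored / 255.0 * 2.0 * kPiD - kPiD;
}
```

With this table, rows 127 and 128 at U < 0.5 hold 0 and 255, both of which decode to a direction along -X. A halfway blend gives 127.5, which decodes to an angle of 0, that is, along +X.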

I haven't quite figured out what to do with this yet. A G-buffer shader is going to be heavy on math and light on texture operations (well, this depends on how you are doing deferred shading; see the upcoming ShaderX7!), so doing the atan2 as a texture sample can save a lot of instructions and cycles, depending on the hardware.


Novack   (Aug 28, 2008 at 19:23 GMT)
OT: What's your personal blog address?

Ross Pawley   (Aug 28, 2008 at 21:02 GMT)
@Pat, really interesting idea there. No idea why you're getting those anomalies though.

Pat Wilson   (Aug 28, 2008 at 21:42 GMT)
I found the answer to this after a bunch of mucking around.

inline float2 cartesianToSpGPU( in float3 normalizedVec, in sampler2D atan2Sampler )
{
   float atanYXOut = tex2D( atan2Sampler, floor( POS_NEG_ENCODE( normalizedVec.xy ) * 255.0 ) / 255.0 ).a;

   float2 ret = float2( atanYXOut, POS_NEG_ENCODE( normalizedVec.z ) );
   return ret;
}

The critical bit is:

floor( value * 255.0 ) / 255.0
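To see what that expression does to a lookup coordinate, here is a small C++ sketch. `snapTo8Bit` and `toTexelSpace` are names invented for illustration, and the texel-center reading rests on the usual convention that texel i of an N-wide texture is centered at (i + 0.5) / N.

```cpp
#include <cmath>

// CPU equivalent of floor( value * 255.0 ) / 255.0: collapses a coordinate
// in [0, 1] onto one of 256 discrete levels.
inline float snapTo8Bit(float v)
{
    return std::floor(v * 255.0f) / 255.0f;
}

// Position of a normalized coordinate measured in texels.
inline float toTexelSpace(float coord, int size)
{
    return coord * static_cast<float>(size);
}
```

For a 256-wide texture, coordinates just below and just above 0.5 snap to 127/255 and 128/255, which land at roughly 127.5 and 128.5 texels. Those are close to the centers of texels 127 and 128, so a filtered fetch there would pick up very little of the neighboring row. This may be why the snap removes the artifact near V = 0.5, though that is an interpretation rather than something the post states.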


Pat Wilson   (Aug 28, 2008 at 21:56 GMT)
Also, I am using a sincos() lookup texture to read back the normal values. This doesn't suffer from the same issues, because the input value is always in [0, 255] (8:8:8:8 G-buffer). The sincos HLSL function is much cheaper than atan2, but this is (again) one of those good GPU vs. poor GPU issues. Mostly, though, it's for the damn spot light shader. I had to use a sampler to get it down from 66 instructions to 63 instructions, targeting PS2.0.
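A sincos lookup of this kind might be built along the following lines. This is only a sketch: the actual texture layout and format are not given, so the 256-entry table of (sin, cos) pairs, the `SinCos` struct, and `buildSinCosTable` are all assumptions. The index-to-angle mapping follows the [0, 1] to [-PI, PI] encoding used for theta above.

```cpp
#include <array>
#include <cmath>

struct SinCos { float s, c; };

const float kPiF = 3.14159265358979f;

// One entry per possible 8-bit theta value, so every G-buffer input
// maps to exactly one table entry and no interpolation is involved.
std::array<SinCos, 256> buildSinCosTable()
{
    std::array<SinCos, 256> table{};
    for (int i = 0; i < 256; ++i)
    {
        // Decode the 8-bit value to [-1, 1], then scale to [-PI, PI].
        const float theta = (i / 255.0f * 2.0f - 1.0f) * kPiF;
        table[i] = { std::sin(theta), std::cos(theta) };
    }
    return table;
}
```

Because the index is already an exact 8-bit value, the table has a dedicated entry for each input, which matches the observation that the read-back path does not show the atan2 artifacts.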

I'll take a proper GPU any day of the week over a lookup, but this is a pretty slick hack. This also lets you fold range-reduction logic into the results. I don't think I can do any kind of great range reduction on the result of atan2() because I am taking both the sine and cosine of the result.

Orion Elenzil   (Aug 29, 2008 at 01:24 GMT)
Quote: "I made a 256x256 A8 texture and filled it with atan2 values."

Awesome.

Pat Wilson   (Aug 29, 2008 at 20:45 GMT)
My personal blog is: http://angrydev.com

It's not updated regularly, and it doesn't have any kind of consistent content.

Novack   (Aug 30, 2008 at 04:49 GMT)
Thanks.
I thought you were angry with me, but was sure you just needed some time to rethink things.
Nice choice of name, btw.

Hadoken   (Sep 03, 2008 at 13:29 GMT)
Pat, if your atan2Sampler has got POINT filtering, you shouldn't need to do the floor thing, the hardware should do it for you automatically, for free.

I'm interested in knowing more about this technique you're using, in particular what's the advantage of using spherical coords?
I mean, if you stored screen-space normal and did screen-space lighting instead of world-space, two cartesian coordinates would be enough to store a normal (assuming you're culling back-facing triangles), and that would save you all the encoding/decoding.

Apparently these days world-space deferred lighting is somewhat popular though, and I'd like to know why :)
