A better Interior::setupActivePolyList()
by asmaloney (Andy) · in Torque Game Engine · 01/14/2007 (3:56 pm) · 41 replies
In my continuing quest for a Stronghold that runs at a reasonable speed on my PowerBook G4, I've improved Interior::setupActivePolyList(), which was near the top of my profile. The changes should be useful for everyone, so I'm posting them here instead of in the Mac forum.
Interior::setupActivePolyList() is a fairly long function. To simplify profiling and to give the compiler a better chance to optimize at the function level, I added a new function [doFogActive()] that separates out the cases where we have fog to deal with.
The main change, though, is the way activePoints is handled. It marks points we've already visited so we don't have to redo calculations on them. In the original version, it was an array of U8s allocated as part of the frame using FrameAllocator::alloc(). There are two problems with this: (1) activePoints is only used in this function, so there's no need to put it in frame memory, and (2) indexing into an array of U8s just for a 0-or-1 comparison is quite expensive because of the alignment of the data. So this version replaces activePoints with a bit vector and checks bits instead.
The original version accounted for 8% of the run time in my profiling journal; these changes reduced it to 6%.
If you try it, please let me know what kind of results you get [I'm interested in how it affects the Windows build too]. If you have suggestions for further improvement (I'm sure there are other things to do here), please post!
Aside: One of the things I see in profiling is that some classes/structs are not aligned. I suspect this causes a lot of inefficient loads and stores, but some classes/structs seem to be sensitive to their data layout, so I'm going to have to look at this more carefully.
-----
In interior/interior.h around line 708 add the function header for doFogActive():
void traverseZone(const RectD* inRects, const U32 numInputRects, U32 currZone, Vector<U32>& zoneStack);
[b]// New: the fog-active cases of setupActivePolyList(), split out into their own function
void doFogActive( bool environmentActive,
                  SceneState* state,
                  U32 outputCount, U16* output,
                  U16* planeSides,
                  const PlaneF &distPlane,
                  const F32 distOffset,
                  const Point3F &worldP,
                  const Point3F& osZVec,
                  const F32 worldZ );
[/b]
void setupActivePolyList(ZoneVisDeterminer&, SceneState*,
                         const Point3F&, const Point3F& rViewVector,
                         const Point3F&,
                         const F32 worldz, const Point3F& scale);
[Edit: include no longer needed]
Torque Owner asmaloney (Andy)
In the FogCalc constructor:
And in FogCalc::CalcTextured():
return( Point2F( [b](tex_visibleDistanceMod - (point.x * distPlane.x + point.y * distPlane.y + point.z * distPlane.z)) * tex_invVisibleDistance,[/b]
                 (tex_newWorldZ + point.x * osZVec.x + point.y * osZVec.y + point.z * osZVec.z) * tex_invHeightRange ) );

I will fix it above as well.