Need help with performance
by Hewster · in Torque Game Engine Advanced · 09/16/2006 (8:39 am) · 34 replies
Hi,
Have just finished porting a map from TLK 1.4 to the current TSE MS4.
In TLK I get around 104 fps; in TSE I'm seeing just 0.5 to 0.8 fps!
Note this is done with unmodified engine code too.
Geometry is identical, textures are identical, terrain is identical... WTF is going on :(
Material-wise, in TSE I've simply created basic materials with no shader passes, like so:
new Material(building_a)
{
   baseTex[0] = "building_a";
};
TGE:

TSE:

Anyone got any suggestions as to what is going on ?
Thanks in advance
Hewster
#22
09/17/2006 (6:01 pm)
There have been problems w/ 128mb ram video cards.
#23
09/17/2006 (9:33 pm)
And this is at the < 1 fps rate?
#24
09/18/2006 (12:36 am)
I should have been a little more informative about my tests.
The charge figures are from my test map with 24 models, which runs at 2 - 2.5 fps.
Charge figures from my original map (as shown in the first post) are:
chargeCount 180, chargePB & chargeVB 19
Strangely enough, this original map is now running at 1.8 - 2.1 fps
(this is after a shutdown for some needed sleep!)
OK, I can trace the problem as far as:
GFXD3DDevice::drawIndexedPrimitive
Most probably it's the D3DAssert call, as there is no time taken in the
mStateDirty check to updateStates().
I'm not a good enough C++ programmer to work out where to profile
further downstream.
Have uploaded a test game with 4 different missions; each mission
contains varying numbers of models with unique 512 x 512 textures
and bumpmaps applied.
My results below are taken like so:
1) load up mission, don't move, allow fps to settle, take reading
2) spin through 360, allow fps to settle, take reading
It's interesting to note that on my machine, when I do the 10 model test,
after the fps have dropped, if I go to or from windowed mode the fps will
go back to the original value (until I spin 360 again).
Is this because some cache is being flushed?
Also, maybe related: when running through the 10 model map to the other side
(i.e. no models in view), the fps will stay low (~20 fps) and the only way to get them back
up again is to switch to or from windowed mode. Normally when running out of view of
the models, I will see the fps jump back to max (~120 fps).
My results on my machine:
With 10 models I get ~110 fps, after spin ~18 fps
With 23 models I get ~3.7 fps, after spin ~4 fps
With 30 models I get ~3.5 fps, after spin ~3.4 fps
With 39 models I get ~2.5 fps, after spin ~2.2 fps
You can download my standalone (doesn't need to be in your TSE folder) test app here:
www.the-wildwest.co.uk/forum_stuff/torque/West%20Test.zip
Password is:
torqueischeap
Don't expect anything spectacular, this is bog standard TSE code, and
the models are very roughly textured, and bump mapping / spec is not artistic lol
Would be really interesting to see what results others get.
Hewster
#25
09/18/2006 (1:08 am)
I'm able to run the 39 model demo at a solid 30-40 fps if I set them all back to the same .dif. Having them all as unique difs completely eliminates any possibility for batching or efficient rendering.
It looks like the scene is probably GPU limited based on the info you're reporting - even though TSE is submitting the scene almost optimally, the card just can't crunch through it.
You might see a speedup from using an Atlas terrain on this scene, though probably not a big one, if any.
Turning down the settings on the lighting kit ought to make a big difference - disabling DRL or multiple shadowing, for instance.
#26
09/18/2006 (1:50 am)
Ben, all due respect, but I think you have missed my point here.
My original post shows identical maps in TLK1.4 and TSEms4, same models,
same textures (no bump or other shader passes in the TSE version)
I created the bank tests to simply find out how many different textures
TSE can render before barfing on my machine (I don't think my machine is slow)
If I swap out the .mis file models to only reference one, i.e. 1 dif and 1 unique texture,
I see ~120 fps.
I actually see a slight increase in fps when I turn off bloom & DRL!
Also, to prove that the bump mapping & specular passes are making very little
difference to the numbers, here is an alternate materials.cs with just a texture pass:
www.the-wildwest.co.uk/forum_stuff/torque/materials.cs
This replaces the one in WildWest\data\interiors\ww.town\
Here are my results when using this materials.cs:
With 10 models I get ~125 fps, after spin ~125 fps
With 23 models I get ~9 fps, after spin ~10 fps
With 30 models I get ~3.5 fps, after spin ~3.3 fps
With 39 models I get ~2.6 fps, after spin ~2.9 fps
Are you absolutely positive this kind of performance is to be expected?
I just can't get my head around the fact that identical maps in TGE & TSE
have such a huge performance gap :(
Hewster
#27
09/18/2006 (2:45 am)
I get around 8 fps in the 39 models demo, looking down from the cube.
If I run in I get a steady 106 fps. I really think you should rethink your way of building your levels here. Are these interiors even portalled?
Anyway. Remember, a lot of overhead was added in the last update - lighting and shadows. This will make a difference, though I do agree it looks a little aggressive with your demo.
#28
09/18/2006 (8:09 am)
The sudden and extreme drop in performance is consistent with hitting the card's video memory limit. I'm running an ATI 9600 XT 128MB here in my workstation and if I crank up Fluffy's shadows in the TSE demo mission the performance goes from 100fps to <1fps. The video memory limit is like a brick wall.
Run Pix and see what your original super slow mission is doing memory-wise.
Also you're not really doing an apples-to-apples comparison here - your test mission is running DRL and Bloom, which aren't available in TLK for TGE, and both come at a heavy rendering cost (multiple down-sampling, complex shaders, large memory footprint, ...). Disabling those, disabling self-shadowing of Fluffy, and using only diffuse textures should give you similar results as with TLK for TGE.
#29
09/18/2006 (8:31 am)
Hi John,
OK, am going to do that (although in my previous post I said turning off bloom & DRL showed
a slight increase in fps, I meant to say turning them on showed a slight increase!)
I will make a TLK test with multiple difs, and a TSE test with multiple difs, with absolutely no
extras (i.e. bloom etc.), then post my results and downloadable game files for others to test,
along with the original .map files before compilation, just in case it's to do with that.
PLEASE someone show me my results are because of something I'm doing wrong,
and not that TSE is a poor relation (tex size rendering) compared to TGE / TLK :/
BTW I really do appreciate the help and responses so far, and applaud everyone's
efforts in the creation of TSE :)
Hewster
BTW BTW, what is "Pix", how do I "run" that ? is it a metrics command?
#30
09/18/2006 (4:39 pm)
PIX is the Direct3D performance inspection tool. It might be more detail than you really want, but looking for it on MSDN should get you docs on it. It was originally an XBox tool that they brought over to the PC.
John is right - disabling or scaling down the very heavyweight DRL, bloom, and shadowing effects should help you tremendously. Those things can use a lot of video memory - which could push you over your limit regardless of how efficiently you built your models.
#31
09/19/2006 (7:59 am)
Quote: The sudden and extreme drop in performance is consistent with hitting the card's video memory limit. I'm running an ATI 9600 XT 128MB here in my workstation and if I crank up Fluffy's shadows in the TSE demo mission the performance goes from 100fps to <1fps. The video memory limit is like a brick wall.
is probably what I should have said. A project that I am on has had extensive problems with this; however, as of 3.5 the nice guys @ GG hooked back up some of the following settings:
$pref::TextureManager::qualityMode = "0";
$pref::TextureManager::reductionLevel = "1";
// and set this to > 128 MB of video RAM so it forces lower quality if you don't have enough VRAM for it
$pref::TextureManager::scaleThreshold = "134217729";
I hope this was helpful.
P.S. I hope this was 100% accurate, as it's been a long week already and I'm a little out of it, but I didn't see anyone here suggest this and I wanted to get it into the list of things that may help.
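For anyone wondering where the scaleThreshold number comes from: it is 128 MiB in bytes plus one. A quick check of the arithmetic, written in Python purely for illustration (the variable names are made up here, and how the engine actually interprets this pref is not shown in the thread):

```python
# 128 MiB expressed in bytes; the pref value quoted above is one byte larger.
MIB = 1024 * 1024
vram_128mb = 128 * MIB

# Value copied from the $pref::TextureManager::scaleThreshold line in the post.
scale_threshold = 134217729

print(vram_128mb)                    # 134217728
print(scale_threshold - vram_128mb)  # 1
```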
#32
09/19/2006 (1:24 pm)
This definitely sounds like it's a texture memory problem. There are a lot of things in TSE that eat up video memory including the glow buffer, dynamic shadows, and DRL. Each of these can be turned off in prefs, so try that. In addition, Pauliver's information is correct, and that should significantly decrease texture memory usage.
The reason you are getting a boost when in fullscreen is that it frees up some more video memory, as the card no longer needs to display the desktop.
Using the standard Torque profiler in a GPU-limited situation is useless; you need a tool that actually looks at the GPU like PIX or nvPerfHUD to get accurate information about timing, memory usage, number of draw calls, etc.
I'll download and take a look.
#33
09/19/2006 (2:38 pm)
We had problems hitting VRAM limits in TSE fairly quickly on 128MB cards. We got around that by moving all of our textures to compressed DDS files (a mix of DXT1 and DXT5, depending on the texture).
I didn't know the $pref::TextureManager::reductionLevel pref worked. Last time I tried it, it did nothing at all.
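To put rough numbers on the DXT suggestion, here is a small Python sketch comparing the base-level size of one 512 x 512 texture in three formats. It assumes the uncompressed texture is held as 32-bit RGBA (the thread doesn't say what format TSE actually uploads) and ignores mipmaps; the function names are illustrative only. DXT1 stores each 4x4 pixel block in 8 bytes and DXT5 in 16 bytes.

```python
def uncompressed_bytes(width, height, bytes_per_pixel=4):
    """Size of one uncompressed texture level (assumes 32-bit RGBA by default)."""
    return width * height * bytes_per_pixel


def dxt_bytes(width, height, block_bytes):
    """Size of one DXT-compressed level: the image is stored as 4x4 pixel blocks."""
    blocks_x = (width + 3) // 4
    blocks_y = (height + 3) // 4
    return blocks_x * blocks_y * block_bytes


rgba = uncompressed_bytes(512, 512)
dxt1 = dxt_bytes(512, 512, 8)    # DXT1: 8 bytes per block
dxt5 = dxt_bytes(512, 512, 16)   # DXT5: 16 bytes per block

# Sizes in KiB: 1024, 128, 256
print(rgba // 1024, dxt1 // 1024, dxt5 // 1024)
```

Under those assumptions DXT1 is an 8:1 saving and DXT5 a 4:1 saving per texture, which would go a long way on a 128MB card.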
#34
09/19/2006 (3:37 pm)
After taking a look at it, yeah it's texture memory that's killing the performance. It's most likely the normalmaps that are doing it; remember that they double your texture memory if you have one for every diffuse texture, so it's not very surprising that it's blowing through that when TGE did not.
Also keep in mind that every shader feature you tack on, whether it's normalmaps or specular, or DRL, is going to impact performance. Most pro game developers don't just blanket turn on every possible shader feature on all materials in their game, not even on the XBox 360. You must figure out a balance and put the fancy shadered materials where they will count most.
I was able to get it up to 100fps on the 39 bank map by turning off shadows, bumpmaps on all materials, specular, and turning on texture reduction. I don't know why it still runs faster in TGE though, or why I still need to reduce the diffuse texture sizes. Could be D3D, could be something else in TSE that's taking up some video memory, etc.
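A back-of-the-envelope version of the "normalmaps double your texture memory" point, as a Python sketch. The assumptions are loud ones: one unique 512 x 512 diffuse texture per model (the thread only says the textures are unique, not how many each model uses), 32-bit uncompressed texels, and a full mip chain. Render targets for shadows, glow and DRL are not counted at all.

```python
MIB = 1024 * 1024


def mip_chain_bytes(size, bytes_per_pixel=4):
    """Sum every mip level of a square texture, from size x size down to 1x1."""
    total = 0
    while size >= 1:
        total += size * size * bytes_per_pixel
        size //= 2
    return total


def scene_texture_mib(unique_textures, size=512, with_normal_maps=False):
    """Estimated texture memory in MiB; a normal map per diffuse doubles it."""
    maps_per_material = 2 if with_normal_maps else 1
    return unique_textures * maps_per_material * mip_chain_bytes(size) / MIB


# Hypothetical: 39 models with one unique diffuse texture each.
diffuse_only = scene_texture_mib(39)
with_normals = scene_texture_mib(39, with_normal_maps=True)

print(f"{diffuse_only:.1f} MiB diffuse only")      # 52.0 MiB diffuse only
print(f"{with_normals:.1f} MiB with normal maps")  # 104.0 MiB with normal maps
```

If those guesses are anywhere near right, the 39 model map with bumpmaps is already close to a 128MB card's limit before any of the lighting buffers are allocated.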
Torque Owner Stefan Lundmark