DDS versus PNG performance
by Jeremiah Fulbright · in Torque Game Engine Advanced · 08/11/2006 (6:58 pm) · 17 replies
I know there is currently nothing more than a "hack" in place for loading of DDS files, although that is pending an update I suppose... I have considered working in DevIL, but am wanting to wait to see what GG has planned/implemented.
I am curious if there is a performance gain currently to using dds textures versus any other format (namely png), or if there would be a hit because of how they are handled currently...
#2
08/12/2006 (5:38 pm)
Awesome news. I had noticed that Atlas was using some of the DXT abilities in certain areas, which is why I was curious about performance.
The only texture concern was for larger textures (2048x2048+), as I knew anything smaller will be fast in most cases. We're not overly concerned with image quality, since DXT compression isn't always a major factor in it, unless it is for a normal map, which we would still do in PNG or TGA.
I will try to get some test scenarios in place, do some fps benchmarking, and look at numbers from NVPerfHud, since it seems to be fairly solid with numbers, and possibly PIX as well, depending on whether I can set it all up.
Thanks!
#3
08/16/2006 (2:09 pm)
Okay, so I did some testing today after converting textures over to DDS (by hand, to make sure I got them to DXT1 or DXT5)...
It seems the performance boost that should be there isn't... In fact, there seems to be a performance degradation in the end.
I did some varying tests with NVPerfHud, which, although it isn't meant for benchmarking, gives enough information to know that there are performance issues within the entire pipeline.
PNG:
img414.imageshack.us/img414/3692/frostpngri4.jpg
DDS:
img414.imageshack.us/img414/7387/frostddswl0.jpg
As you can see, the screenshots are from the same scene minus a tiny difference in camera angle, and the number of tris drawn is the same, as are the DrawPrimitive call numbers. The Vid Memory is 1 more for DDS, which is no major ordeal, but the AGP memory value is 8 more for DDS (I assume AGP means system memory). This is a bit odd, since I don't know what is being put into system memory, and it happened only with DDS textures.
The biggest concern here is the difference in FPS. A difference of 1 or 2 wouldn't be a major concern, but a difference of almost exactly 30 fps in the exact same rendering conditions is. The best way to avoid major issues is to cap/limit FPS so that there isn't as much spiking in values when you move around or look in certain directions, but that still doesn't explain the obvious performance hit.
I guess maybe it's due to how textures are handled by TSE or how things are stored and processed, but it still seems a bit surprising.
I just realized that I was using TGA for normal maps and thought maybe there was an issue with that, so I removed those and went back to the PNGs we had. I gained 0.5 FPS, which still puts me close to a 30 fps difference from PNG to DDS.
#4
08/16/2006 (2:38 pm)
Basically what that says to me is that your scene isn't fill rate bound. You can test by running at different resolutions (or NVPerfHud has an option to set a viewport of 1x1) and seeing if the FPS varies.
It could be that the API is decompressing the textures (which is a little odd). Have you tried setting breakpoints to make sure that the DDS files are the ones actually being loaded & used?
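A rough way to put a number on the resolution test, as a sketch only: compare how much the frame time grows against how much the pixel count grows between two runs. The function name `fillSensitivity` and the way its result is read are assumptions made for illustration; they are not something NVPerfHud or TSE provides.

```cpp
#include <cassert>

// Rough heuristic (an assumption, not an engine or NVPerfHud feature):
// the fraction of the pixel-count increase that shows up as a
// frame-time increase between two runs of the same scene.
//   near 0 -> frame time ignores resolution (probably not fill rate bound)
//   near 1 -> frame time grows in step with pixel count (likely fill rate bound)
double fillSensitivity(int w0, int h0, double fps0,
                       int w1, int h1, double fps1) {
    assert(w0 > 0 && h0 > 0 && w1 > 0 && h1 > 0);
    assert(fps0 > 0.0 && fps1 > 0.0);

    const double pixelRatio = (double(w1) * h1) / (double(w0) * h0);
    assert(pixelRatio != 1.0);  // the two runs must use different resolutions

    // Frame time is 1/fps, so the frame-time ratio is fps0 / fps1.
    const double timeRatio = fps0 / fps1;
    return (timeRatio - 1.0) / (pixelRatio - 1.0);
}
```

With hypothetical readings of 200 fps at 1024x768 and 120 fps at 1280x1024 the result is about 1, which would point at fill rate; if the FPS stayed at 200 in both runs the result would be 0.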
#5
08/16/2006 (4:31 pm)
I only have DDS textures to be used currently. I have a couple of folders I just swap out between PNG and DDS.
I will do the 1x1 option, as I was also looking at the options and noticed it, but hadn't checked it out. I did do wireframe just to verify what was being rendered, etc.
I do doubt I am fill rate bound, but that's dependent on the video card too, so while I may not be, somebody with a lesser card might be... although that scene is pretty rough because of all the glowmaps and such.
Edit: hmm, it was a 2x2 texture option, not a viewport option, that I saw.
#6
08/16/2006 (4:56 pm)
I did the 2x2 texture option, and the performance was worse lol... DDS averaged 230 fps, while PNG averaged 275 fps.
I was running at 1024x768 with the above results... I shifted to 1280x1024 (windowed) and the difference in fps isn't nearly as drastic: roughly a 10 fps average difference, with PNG still winning. DDS is faster in the initial texture upload stage, since the textures can be pushed up faster and handled by the GPU, but after that there seems to be almost no benefit.
There seem to be fewer hiccups using PNG than DDS too, but it might've just been the areas I was looking at.
I have changed rendering conditions a bit and probably should reset them, although when I was testing earlier, after the changes, the results were the same.
Where y'all have SwapEffect set to Flip, I have it set to Discard (which is more proper), and I set PresentationInterval to 1, versus None, which has to do with vsync.
I will reset it all just to verify everything.
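Since these averages are in the hundreds of frames per second, it may help to look at them as frame times rather than FPS. A minimal sketch using the 230 fps and 275 fps averages quoted in this post; the helper names are made up for illustration.

```cpp
#include <cassert>

// Convert a frames-per-second reading into milliseconds per frame.
double frameTimeMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Extra milliseconds spent per frame when running at slowerFps
// instead of fasterFps.
double frameTimeDeltaMs(double slowerFps, double fasterFps) {
    return frameTimeMs(slowerFps) - frameTimeMs(fasterFps);
}
```

`frameTimeDeltaMs(230.0, 275.0)` works out to roughly 0.7 ms per frame, so a 45 fps gap at these rates is a fairly small slice of actual frame time.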
#7
08/16/2006 (5:11 pm)
I put everything back the way it was before and the performance of DDS was finally better than PNG, but only by an average of 2-3 fps, at 1280x1024 (windowed).
I did notice more hiccups with DDS than with PNG though. I thought I did before, but wasn't sure. This time I tested, and going from 23,000 tris to 200,000 tris didn't even stutter using PNG textures, but with DDS there was stuttering and lots of fluctuation in Vid Memory.
So, I guess that means we'll be using PNG for now, since there doesn't seem to be any gain at all, and at lower resolutions there is a solid hit on performance. The lack of hiccups is sort of interesting, because at lower resolutions this Interior/map I am looking at numbers with is a hiccuping beast.
#8
08/16/2006 (10:36 pm)
I have been doing some more experimenting with different options to see what gives the best results, and it seems that while DDS does in fact save memory and probably helps in the long run, Interior lightmaps, which are all 256x256, are more responsible for the high memory usage, unfortunately.
I watched the CreateTexture calls, and the Interior LMManager created a ton of 256x256 textures, which increased texture memory usage fairly quickly. Atlas played a small role in the texture usage; I removed Atlas from the level and it's a bit more playable, but still has high usage.
I haven't tested without an Atlas terrain just yet, but if I forced textures into memory in either DXT1 or DXT5, the usage was roughly 70 MB (down from 246 MB)... but obviously stuff looked a bit odd in places.
#9
08/17/2006 (11:34 am)
Yup - and this is why I've rewritten the Atlas stuff to fit into a fixed memory target. It's configurable but for right now a blended terrain will live in a fixed, sub-16mb target, and for a unique TOC it'll live in 9.1 mb.
It never allocates, it never frees, it just sits there and renders. Much friendlier.
This is for 32,768 pixel textures, of course.
Hmm... interesting the LMManager is what's eating loads of memory. But if you have a complex interior scene that's to be expected. You may see better results w/ the Constructor toolchain (but no promises :).
#10
08/17/2006 (5:51 pm)
Not freeing or allocating/reallocating is great, as that will kill a lot of the thrashing that can currently happen at certain points, with certain configurations.
I'm not surprised that LMManager is eating most of the memory, because the Interior I'm testing on is... huge. As in, bigger-than-the-vertex-buffer huge, although we cleanly handle it (by just not allowing more vertices to be processed, and instead spamming the console to fix it lol).
If I forced everything to go to DXT1 in memory, then I was able to get it to 70 MB, which was sexy as hell, but unfortunately didn't look too great... DXT5 looked decent, but still had issues on all fronts.
I can get the same kind of memory usage on a map without an Interior though and a smaller Terrain, but I haven't done a ton of testing on it to compare any differences.
For the most part, we're going to stay with DDS, because despite it being "hacked in", it is straightforward and lets DX handle it, which is what it should do....
#11
08/17/2006 (7:39 pm)
We couldn't argue with the 4:1 compression ratio (8:1 for opaque DXT1 stuff) and went all DDS. We went from ~150 MB VRAM usage down to ~30 MB. Diffuse maps don't look that different, but normal maps can get nasty. Since we're using all-custom shaders, we adapted them to use DXT5 normal map compression in our prototype. Those look almost as good as the uncompressed ones.
DDS is also a *must* for cubemaps. Those suckers can drain VRAM and eat fill rate like hogs. We even added support for loading cubemaps straight from packed DDSes.
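For reference, the 4:1 and 8:1 ratios fall out of the block sizes: DXT1 stores each 4x4 texel block in 8 bytes and DXT5 in 16 bytes, against 64 bytes for the same block in 32-bit RGBA. A small sketch of the arithmetic, assuming 32-bit uncompressed textures and a full mip chain down to 1x1; the enum and function names are hypothetical and not part of TSE or D3D.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical format tag for this sketch only.
enum class TexFormat { RGBA8, DXT1, DXT5 };

// Bytes needed for a single mip level of the given size.
std::uint64_t levelBytes(TexFormat f, std::uint32_t w, std::uint32_t h) {
    assert(w > 0 && h > 0);
    if (f == TexFormat::RGBA8) {
        return std::uint64_t(w) * h * 4;  // 4 bytes per texel
    }
    // DXT formats store 4x4 texel blocks; partial blocks round up.
    const std::uint64_t blocks =
        std::uint64_t((w + 3) / 4) * std::uint64_t((h + 3) / 4);
    return blocks * (f == TexFormat::DXT1 ? 8 : 16);
}

// Bytes needed for the whole mip chain, halving down to 1x1.
std::uint64_t chainBytes(TexFormat f, std::uint32_t w, std::uint32_t h) {
    std::uint64_t total = 0;
    for (;;) {
        total += levelBytes(f, w, h);
        if (w == 1 && h == 1) break;
        w = std::max<std::uint32_t>(w / 2, 1);
        h = std::max<std::uint32_t>(h / 2, 1);
    }
    return total;
}
```

A single 2048x2048 level comes out at 16 MB for RGBA8, 4 MB for DXT5 and 2 MB for DXT1.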
#12
08/17/2006 (7:52 pm)
Oh, I would be interested in the cubemap changes, if those are available or possibly available :)
We're pretty much moving to DDS for everything, except it seems GUI doesn't jibe well with DDS, but I haven't taken a lot of time to investigate the issues, which are more than likely just non-power-of-2 texture problems.
For normal maps, we looked at doing TGA and then I thought about a DXT5 normal map setup, but then realized that we might as well use CxV8U8, which will handle pretty solidly in hardware. If we want to, we can do some shader changes to make the normal maps look better, but we probably won't, as most hardware can calculate the values faster natively versus running it through a shader for that format...
I haven't done much benching to see how much difference it makes, but for most things it has helped, other than lightmaps and Atlas...
For that much of a change in VRAM usage, are you also doing DXT compression on lightmaps for Interiors (DIFs), or how are you handling that?
#13
08/18/2006 (11:54 am)
In our current prototype we're not using interiors nor terrains. It's a custom lightmapped class, so it's easier to store the lightmaps as DDS.
For interiors you'd need some engineering. First you'd need to get the scene lighting code to save and load each lightmap from individual files. You can save them directly to DDS using D3D functions, but it's uncompressed. To compress, you'll need to either run a batch converter on the saved lightmaps, or try to integrate a library which can save compressed DDS files right into TSE.
NVidia has a library for compressing DDS files, but I haven't tried to use it straight from TSE yet. It's probably not a big deal to use, since ATI's cubeMapGen lib works fine.
#14
08/18/2006 (12:18 pm)
Actually, you can save to DXTn just fine using D3DX - just have to pass the right parameters for the format. :)
#15
08/18/2006 (4:14 pm)
Really? I'm using D3DXSaveTextureToFile and I don't see any extra options. Maybe it'll work if the surfaces are already in DXTn format, but they are render targets (I'm saving some dynamically-generated cubemaps to disk). Does D3D feature a way to convert plain RGBA surfaces to DXTn?
#16
08/18/2006 (5:28 pm)
Hmmm... Whaddya know. That's strange, I could swear I had a discussion with an MS guy about a function that exclusively just did compression. :P
D3DXLoadSurfaceFromSurface is probably what you want. From the remarks: "This function handles conversion to and from compressed texture formats."
So allocate a system memory DXT surface and copy to that, then save it out.
#17
08/19/2006 (8:59 am)
Interesting, gotta give that one a try when I get some extra time.
Associate Kyle Carter
In theory it gives you a 4x speedup because it's decompressed in hardware at full texture read speed, and requires 25% as much bandwidth. In practice... you'll have to do some tests to see what the real win is. It'll vary from card to card, maybe from driver version to driver version, and is likely to be easily overshadowed by other issues like how much overdraw your scenes have.
If you do do some testing I'd love to see the numbers. We've not burned a lot of cycles internally on it as we've not been texture read bandwidth bound in any of our performance tuning scenarios. If you can stand the image quality hit, it's not a bad idea to use.
Of course, if you just want the mip loading and don't want to use DXT... AFAIK that works fine and you can go to town.