TGEA future
by Frank Geppert · in Torque Game Engine Advanced · 05/22/2008 (1:34 am) · 43 replies
Dear friends,
I create models and textures for several engine platforms, but I really love the TGE / TGEA technology. Sometimes the workflow is a bit complicated, but the rendering is fast, robust and reliable. Scene management is first class, and so is multiplayer.
I am in contact with some Torque users (e.g. Thomas Bang, Kiyaku) and got some feedback that helped me to write the following lines. They have similar and even better suggestions than I have.
I will try to mention some of them as ideas for future TGEA versions. Maybe I write something stupid or something that can be done with workarounds. So please don't be too harsh with me! ;)
Changing Assets on the Fly
Real-time replacement of models or DIF structures after changing them sounds good to me. I don't want to restart the engine just to reload a mesh.
Shadows / Unified Lighting
Casting and receiving dynamic and static shadows for every mesh would be a fantastic feature. There are so many articles about shadow mapping available. Some of them use depth maps and a post-processing shader, so they don't care about the number of polygons; only the shadow texture size matters.
ShowTool Pro
Shader and shadow preview is a really important feature. I would love to see a little material editor there to tweak my material until it looks good. It is really no option to start the engine again and again just to see how the material looks after changing a value.
Besides that, ShowTool should be able to read Collada/FBX and convert to DTS. All the options of a Max, Blender or Lightwave plugin could be integrated in ShowTool. So GG would need only one single supported exporter (ShowTool) where we can set up LOD and collision, create nodes for mountings, and similar.
Constructor
I want to see and edit materials for my structures there. So it would be great to have a TGEA mode where we can see shaders and tweak their materials.
Scripting
What do you think about a visual data-block designer? ;)
I would love it.
I hope this can be a base for a constructive discussion.
Regards,
Frank
#22
06/03/2008 (1:00 pm)
Matt, I've actually been following deferred shading very closely. Until recently I have been unimpressed by the requirements needed to use it.
Take a look at this:
http://diaryofagraphicsprogrammer.blogspot.com/2008/03/light-pre-pass-renderer.html
Engle added some slides at the bottom of this post that give a much more in-depth look at this concept. The method, and hopefully my contributions, should be in Shader X7.
Because I love slinging screenshots around:
This is RB running the deferred renderer, and it's displaying the light-term accumulation buffer. It looks like a mess of colors, but what you are seeing is actually pretty cool. Affecting the player are 5 lighting values: an ambient color, a sunlight vector, and 3 point lights (red, green, blue). The luminance is clamped at 1, but you can see that the colors are not at all saturated or distorted by the values that are getting blended in. This is using a light-accumulation blend that I researched and fumbled through.
Instead of accumulating light color using RGB, I am using the CIE L'u'v' colorspace. The reason I chose CIELUV was because the color values are stored independently of lightness. At L = 0, the color is black; at L = 1, the color is the "full bright" color of its u'v'. Why is this useful? Well, if you read the renderer proposal, you should notice that what we really want is separable lighting terms available in the forward render. By multiplying and summing RGB * N.L * Attenuation, you are duplicating information which would be very useful elsewhere: the summation of N.L * Attenuation. You've also got the specular term to consider, because you're writing out to a render target, meaning you have 4 channels of information. You can do bit-packing, but it's more shader overhead. This is where LUV really comes into play. The L component of LUV can be thrown away entirely, and a scaled summation of N.L * Attenuation can be used instead. This means that blended color information can be stored in 2 channels instead of 3, and that information is not duplicated. Furthermore, the u'v' blends behave far, far better than RGB blends, because blending any combination of (Ua', Va') and (Ub', Vb') will result in a (u', v') which is always a valid color, and always in the range [0..0.6], provided that (Ua', Va') and (Ub', Vb') are valid colors.
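To make the u'v' argument concrete, here is a minimal Python sketch of the chromaticity conversion and a two-color blend. It assumes linear RGB with sRGB primaries and a D65 white point, which the post does not specify; the names rgb_to_uv and lerp_uv are made up for illustration and are not part of Torque.

```python
# Linear RGB (sRGB primaries, D65) -> CIE XYZ. This matrix is an assumption;
# the post does not say which RGB space the renderer works in.
RGB_TO_XYZ = (
    (0.4124, 0.3576, 0.1805),
    (0.2126, 0.7152, 0.0722),
    (0.0193, 0.1192, 0.9505),
)

def rgb_to_uv(rgb):
    """Return the CIE 1976 (u', v') chromaticity of a linear RGB color."""
    x, y, z = (sum(m * c for m, c in zip(row, rgb)) for row in RGB_TO_XYZ)
    denom = x + 15.0 * y + 3.0 * z
    if denom == 0.0:
        return (0.0, 0.0)  # black has no chromaticity; placeholder value
    return (4.0 * x / denom, 9.0 * y / denom)

def lerp_uv(a, b, t):
    """Linear blend of two chromaticities; the result lies on the segment a..b."""
    return (a[0] + (b[0] - a[0]) * t, a[1] + (b[1] - a[1]) * t)

red = rgb_to_uv((1.0, 0.0, 0.0))
blue = rgb_to_uv((0.0, 0.0, 1.0))
mixed = lerp_uv(red, blue, 0.5)  # halfway between the two chromaticities
```

Because the blend is a convex combination, the result cannot leave the region spanned by its inputs, which is the property the post relies on when it says the blended (u', v') is always a valid color.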
So, long story short: It's being worked on.
Edit: Oh, also, this implementation works on an ATI x300 video card. It don't run fast...but it runs. The limiting factor for this implementation is memory bandwidth.
#23
06/03/2008 (2:13 pm)
Pat, this looks amazing. I'll admit, my experiences with deferred rendering have been quite limited outside of RenderMonkey, and I do not quite understand the need for a negative Z value, however this research looks fascinating. From the screenshots you posted in Flickr, it appears as if your world-space normals are intertwined with the material system, multiplying in the material normal maps. Could you elaborate on how you went about implementing the two? I'd be really interested in exchanging ideas with you. My email is eternaldark112@gmail.com.
#24
06/03/2008 (3:14 pm)
Sure, basic setup:
Z-fill pass
This render pass is done per-object, which means you have access to material information. At this point the only material information I store out is the transformed normal from the normal map. So it does this:
-Draw each opaque object
--z-write on, z-test on
--Stencil write on, op incr-sat
--Alpha blend off
--Store WorldSpace normal and eye-space depth
---Build world->tangent matrix in vertex shader
----reverse mul() op order and multiply sampled normal by matrix to get from tangent->world space
----store world space normal in R, G, B of a 64-bit render target, or R and G of a 32 bit render target
----store eye-space depth in A of a 64-bit target, or B and A of a 32-bit target
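The "reverse mul() op order" step can be illustrated outside a shader. In HLSL, mul(m, v) treats v as a column vector and mul(v, m) treats it as a row vector, so swapping the operands multiplies by the transpose, and for an orthonormal basis the transpose is the inverse. The Python sketch below mimics that; the basis values are made up for the example.

```python
def mat_vec(m, v):
    """m * v with v as a column vector (like HLSL mul(m, v))."""
    return tuple(sum(m[r][c] * v[c] for c in range(3)) for r in range(3))

def vec_mat(v, m):
    """v * m with v as a row vector (like HLSL mul(v, m)); equals transpose(m) * v."""
    return tuple(sum(v[r] * m[r][c] for r in range(3)) for c in range(3))

# Rows are the tangent, bitangent and surface normal expressed in world space,
# so mat_vec(world_to_tangent, w) takes a world vector into tangent space.
# These particular axes are hypothetical example values.
world_to_tangent = (
    (0.0, 1.0, 0.0),  # tangent
    (0.0, 0.0, 1.0),  # bitangent
    (1.0, 0.0, 0.0),  # surface normal
)

sampled = (0.0, 0.0, 1.0)  # a "flat" normal-map texel in tangent space
world_normal = vec_mat(sampled, world_to_tangent)  # tangent -> world
```

A flat texel comes out as the surface normal itself, which is a quick sanity check that the operand order is the right way round.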
Notes: A normal, in tangent space, always has a +z value (x,y can be +/-, z is always +). This is not true for object or world space normals. You can reconstruct the z component of a normalized vector by solving this equation for z, when len(v) = 1:
len(v) = sqrt( x*x + y*y + z*z )
The problem with the reconstruction is that you end up with a sqrt(), the result of which can be +/-. To solve this, for the 32-bit targets, I am bit-packing depth and z-sign. Z sign is 1 bit, and depth is 15 bits.
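As a rough sketch of the packing described above: 15 bits of quantized depth plus 1 sign bit fit in one 16-bit word, and z is rebuilt from x and y with the stored sign. Putting the sign in the top bit and normalizing depth to [0, 1] are assumptions made for this example; the post does not give the actual bit layout.

```python
import math

DEPTH_BITS = 15
DEPTH_MAX = (1 << DEPTH_BITS) - 1  # 32767, largest 15-bit value

def pack_depth_sign(depth01, z_negative):
    """Pack a [0, 1] depth and the sign of normal.z into a 16-bit integer."""
    depth01 = min(max(depth01, 0.0), 1.0)
    quantized = int(round(depth01 * DEPTH_MAX))
    return (int(bool(z_negative)) << DEPTH_BITS) | quantized

def unpack_depth_sign(word):
    """Inverse of pack_depth_sign: returns (depth01, z_negative)."""
    return (word & DEPTH_MAX) / DEPTH_MAX, bool(word >> DEPTH_BITS)

def reconstruct_z(x, y, z_negative):
    """Solve 1 = sqrt(x*x + y*y + z*z) for z, using the stored sign."""
    z = math.sqrt(max(0.0, 1.0 - x * x - y * y))
    return -z if z_negative else z
```

For example, a world-space normal of (0.6, 0.0, -0.8) stores only x, y and the sign bit; reconstruct_z(0.6, 0.0, True) then recovers roughly -0.8. The cost is depth precision: one bit fewer than a plain 16-bit depth.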
Light Accumulation Pass
This pass renders lights as convex volumes, or full-screen quads (in the case of a vector light). Each of these renders is stenciled against the result of the opaque pass. The reason I put this in is for cases like Marble Blast Ultra, where the majority of the screen is skybox in many situations. (Note this is different than the stencil optimization described in the 2004 GDC presentation on deferred shading) The main issue is that alpha-blending is probably not going to be your friend here, because you are very limited as to what ops you can perform (and all values going to the alpha blend are pre-saturated). So bearing that in mind:
-For each light
--z-write off, z-test = LESS
--cull = CCW/CW, whatever reverse culling is for you, we want to be drawing the "inside" of a sphere, not the "outside"
--Draw convex geometry/fullscreen pass
---Sample from result of previous light pass
---accumulate N.L * attenuation, and (R.V)^n * N.L * Attenuation
---lerp( (Ua', Va'), (Ub', Vb'), N.L * Attenuation )
--StretchRect current target -> sampler input for next light pass
Then the forward pass happens, which is drawing the same render-instances as the z-fill pass. Samples from the light buffer, and poof, it works.
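The per-light loop above can be mimicked on the CPU for a single pixel. This Python sketch covers only the diffuse part (no specular term), starts from an assumed ambient chroma, and clamps the running sum at 1; those choices and the function names are assumptions for illustration, not the actual shader.

```python
def saturate(x):
    """Clamp to [0, 1], like the HLSL intrinsic of the same name."""
    return min(max(x, 0.0), 1.0)

def accumulate_light(prev, light_uv, n_dot_l, attenuation):
    """One light pass for one pixel.

    prev is (u, v, lum): the chroma blended so far and the running sum
    of N.L * attenuation. Returns the updated (u, v, lum).
    """
    u, v, lum = prev
    w = saturate(n_dot_l * attenuation)
    # lerp( (Ua', Va'), (Ub', Vb'), N.L * Attenuation )
    u += (light_uv[0] - u) * w
    v += (light_uv[1] - v) * w
    return (u, v, min(lum + w, 1.0))

# Hypothetical inputs: start chroma, then (light chroma, N.L, attenuation).
pixel = (0.20, 0.47, 0.0)
lights = [((0.45, 0.52), 1.0, 0.5), ((0.18, 0.16), -0.3, 1.0)]
for light_uv, n_dot_l, atten in lights:
    pixel = accumulate_light(pixel, light_uv, n_dot_l, atten)
```

The second light faces away from the surface (negative N.L), so it contributes nothing and the pixel keeps the values left by the first light.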
The Torque implementation bits are mostly related to Render Inst Manager changes, and the addition of new bins for different functionality.
This is a presentation I am working on, internally, to present my research. It's a draft, it's dense, and it doesn't have enough pretty pictures.
There is still a significant amount of research to be done.
Another purpose for the alternate blending was to allow for HDR without distortion of color hue. I'm not doing any HDR right now because I haven't figured out exactly how I want to do things. My current thinking is to combine the L term and the depth in a downsampled texture, since downsampled depth is desirable for SSAO, "soft" particles and other effects. The downsampled luminance is obviously useful for determining average luminance, so the idea is to combine these into an R16G16 buffer, and box filter them down to 1x1. Not really sure how that's all going to work out.
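For reference, box filtering a two-channel buffer down to 1x1 amounts to repeated 2x2 averaging. The Python sketch below assumes a square, power-of-two image of (luminance, depth) pairs; it only shows the reduction itself, not how a GPU implementation would be set up.

```python
def box_downsample(img):
    """Halve a square, power-of-two image of (lum, depth) pairs with a 2x2 box filter."""
    n = len(img) // 2
    out = []
    for y in range(n):
        row = []
        for x in range(n):
            texels = [img[2 * y + dy][2 * x + dx] for dy in (0, 1) for dx in (0, 1)]
            row.append(tuple(sum(t[i] for t in texels) / 4.0 for i in range(2)))
        out.append(row)
    return out

def reduce_to_1x1(img):
    """Apply the box filter until one texel is left; it holds the per-channel mean."""
    while len(img) > 1:
        img = box_downsample(img)
    return img[0][0]
```

The final texel is the plain average of each channel. That is what you want for average luminance; whether an averaged depth is the right thing for the other effects is a separate question.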
The next thing is performance analysis. Performance seems to scale with memory bandwidth only. It does not seem that shader ops are the limiting factor, but blit performance is key to a fast light-accumulation pass. It should be reasonable to come up with a range of video cards that can run the renderer at varying levels of quality. For example, limiting the number of lights limits the number of StretchRect calls (must be used for MSAA support, boo). We can also reduce precision on the g-buffer pass and light accumulation pass to 32-bit targets. This has its issues, but they are not horrible.
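A back-of-the-envelope way to see why light count and target precision matter: if every light triggers one full-target copy, the copy traffic grows linearly with both. The Python sketch below assumes a 1280x720 target, 8 lights, and that each copy reads and writes every pixel once; all of these numbers are illustrative assumptions, not measurements.

```python
def copy_traffic_bytes(width, height, bits_per_pixel, num_lights):
    """Bytes read plus bytes written when each light does one full-target copy."""
    bytes_per_pixel = bits_per_pixel // 8
    return width * height * bytes_per_pixel * 2 * num_lights

full = copy_traffic_bytes(1280, 720, 64, 8)  # 64-bit target
half = copy_traffic_bytes(1280, 720, 32, 8)  # 32-bit target
```

Under these assumptions, dropping from a 64-bit to a 32-bit target halves the bytes moved by the copies, and so does halving the number of lights.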
#25
06/04/2008 (11:25 am)
@Benjamin: Plus, the possible costs of integrating PhysX are horrendous. If the end-user does not own PhysX, the integrated PhysX system is not activated. If you aren't making good use of the PPU, the costs rise to $50,000. See this thread: Code: PhysX in TGE Version 0.3
#26
06/04/2008 (11:51 am)
@Cory - that thread is several years old. PhysX works with and without the card (but of course you can do more if you have the card or, when the drivers are ready, an nVidia GPU). It is now free if you don't need the source code. Straight from nVidia's website:
What Does It Cost?
The NVIDIA PhysX SDK (binary) is 100% free and available right now for download by developers. In addition to the free software offering, NVIDIA offers additional support plans and a paid license program which enables developers to modify the SDK to suit their needs.
The binary version of the PC SDK is offered at no charge as outlined below. Source code (for integration purposes) is available for a fee as provided below:
Free SDK Package:
Most Current PC Binary
Commercial & non-commercial use on PC
Available for Windows & Linux
No PhysX hardware optimization requirement
PS3 platform (through Sony pre-purchase)
#27
06/04/2008 (4:51 pm)
Oh wow... I'm thoroughly embarrassed... :P I should have checked the earlier dates (I looked at the last post date but there are plenty of posts).
#28
06/04/2008 (11:14 pm)
No reason to be embarrassed. Just because the resource was started two years ago doesn't mean it's not still being used, only that nobody had updated the license info in it. Which is odd, because the SDK was made free from the beginning and the hardware restriction was removed by Ageia in '07.
Plus, if you hadn't mentioned it then Jaimi wouldn't have posted the info about Nvidia's version of the license and I wouldn't have known Nvidia had finally gotten off their duffs and posted it. ;)
I've been waiting for that info since they bought Ageia.
#29
06/05/2008 (9:46 pm)
I'm hoping that importing the Collada format directly is right up there near the top of that list. At least some at GG see its value. Its value is immense and the time frame does not seem too much of a stretch. As support for DTS seems to be losing ground, it may very well become a very heavy boat anchor on Torque and tend to drag on it.
#30
06/05/2008 (10:12 pm)
Really, DTS has been very successful. The only issues we've had with DTS are the single UV limit and some ease-of-use issues with the exporters.
Collada import would help, but I expect to see DTS improved as well.
#31
06/06/2008 (1:06 am)
I think an engine should have its own file format because each engine has different internal requirements. In my opinion the best solution would be a modified Show Tool Pro. We could import Collada and/or FBX, make a few changes like mount points, rename animation sequences and so on, and export this model to DTS.
Advantages:
- the engine has to handle only one file format (except Interiors)
- no need to support several export plugins for the most common modelers
- I like the Lightwave DTS exporter, but it would make sense to create a single interface for all Torque users
- each user has the same interface to manage LODs, collision meshes, animation sequences, mount points, materials, billboards, decals, mipmaps and so on
#32
06/06/2008 (1:58 am)
Having support for other formats in the engine seems pointless to me. You have to set up each model specifically for the way Torque does things anyway; it's not like you would be able to use any model and have it work right.
Having a converter like Tom said may be useful for sure, but it cannot compare to the freedom of using a 3D package and setting up for export (as you can still edit the model during and after setup in one place).
I like the DTS format. It gets the job done and could be improved, but it does not need to be replaced by Collada etc.
Just my $0.02
#33
06/06/2008 (7:51 am)
The problem with DTS is not the format, it's the exporters. Since each one is for a different program and (often) written by different people, they all have different features and things they can and can't do. An importer from Collada format to DTS would (I think) solve many of those issues. Every major modeling program can write (natively) to Collada, and that export is written by the actual application developer(s), so it will be a much better supported feature than the DTS exporter.
True, the model will still need to be set up the way Torque expects (LOD groups, eye node, etc.), but every engine has those issues. If the exporter were eliminated from the equation, it would ease a lot of strain on the temperamental Torque art pipeline.
#34
06/06/2008 (7:12 pm)
I agree (though a single importer would be tricky, since Maya handles the naming convention differently).
#35
06/07/2008 (2:11 am)
I absolutely agree with Thomas Bang here. I also love the Lightwave exporter plugin. But there is a serious problem with this and probably with many other exporters: they are powerful if they support a lot of DTS features but they are not very well documented.
If you make ShowTool Pro an importer and exporter, then you need to write only one set of documentation and you need only one support area in the forum for it. If you think that the naming convention in Maya is off, then you could change the file in this future ShowTool. I am thinking of features to rename nodes, change the node tree structure, re-assign LOD or collision, add additional nodes, and load several models and merge them into one (to add LOD or collision). And you could change material parameters in this editor.
Such an editor sounds perfect to me. It would allow creating perfect DTS models, and it would not matter what application you use. So it would work with Silo, Wings3d, Modo and all the other tools the same way.
Many more artists could play around with Torque and more content would make it into the engine.
#36
06/07/2008 (2:35 am)
I can see that an import/export method would be useful to those who use apps with poor or no exporters. If the import/export allowed saving the set-up scene (for future editing) and refreshing the model you are setting up, then it would probably be a good general solution.
#37
06/09/2008 (9:33 am)
Perhaps some of you should look into Shaper? I haven't worked with it myself, but it does advertise the ability to import DTS files.
#38
10/21/2009 (3:01 am)
@Brian - This is a year-old thread... TGEA with Deferred Shading has shipped... it's called Torque3D.
#39
10/21/2009 (5:50 am)
Yes, it is an old thread, but it is interesting to see that most of my points became reality after 1.5 years. Not bad! :)
#40
10/21/2009 (6:00 am)
Brian is a bot that duplicated part of Matt Vitteli's post (the last one on the previous page) in order to post its link about "simulation prêt".
But it seems Frank is a visionary man. :-)
Nicolas Buquet
www.buquet-net.com/cv/
Associate Tom Spilman
Sickhead Games