Game Development Community

Enabling multi-threaded OpenGL?

by Rob Terrell · in Torque Game Engine · 12/10/2006 (8:50 pm) · 11 replies

I know this is from WWDC, but I just stumbled across the web page: http://developer.apple.com/technotes/tn2006/tn2085.html

Basically, it's a one-liner to turn on OpenGL multithreading, with the potential to double the frame rate on dual core Intel Macs. I checked in MacCarbOGLVideo.cc but it's not in there, so this may be one way to help the sluggish Mac frame rate in 1.5.

Has anyone tried this already? I'm on a TGB project right now, but if I get a chance to try it, I'll report the old & new fps.

#1
12/10/2006 (10:23 pm)
Quote:
Note: Enabling multi-threaded OpenGL execution does not make the OpenGL API thread safe. If you are drawing to a single OpenGL context from multiple threads, then your application code will need to manage locks and flushes to ensure correct visual results.

Looks like there are some problems:
1. Runs only on Mac OS X 10.4.7 or later (no Windows / no Linux)
2. OpenGL is not thread safe.
3. A lot of locks and flushes are needed, because there is only one OpenGL context (the output window)
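
The "locks and flushes" point from the quoted note could be sketched roughly like this. It is a minimal illustration, not Torque or Apple code: FakeContext and submitBatch are hypothetical names, and a std::mutex stands in for the per-context lock that CGLLockContext / CGLUnlockContext provide on Mac OS X.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Hypothetical stand-in for the single shared OpenGL context.
struct FakeContext {
    std::mutex lock;            // plays the role of CGLLockContext/CGLUnlockContext
    std::vector<int> commands;  // stand-in for the context's command stream
};

// Submit a batch of commands so that no other thread can interleave its own
// commands with this batch.
void submitBatch(FakeContext& ctx, int first, int count) {
    std::lock_guard<std::mutex> guard(ctx.lock);
    for (int i = 0; i < count; ++i)
        ctx.commands.push_back(first + i);
    // A real renderer would typically call glFlush() here, before the lock
    // is released, so the batch is handed to the GPU as one unit.
}
```

Every thread that draws would have to go through something like submitBatch, which is where the extra locking cost in point 3 comes from.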
#2
12/11/2006 (7:57 am)
I didn't realize that TGE was making OpenGL calls from multiple threads. I guess I wasn't paying much attention back when Ben was talking about how they multi-threaded TGE. I do know there's already a crashing bug related to multiple threads making Carbon GUI calls.

Anyway, I went ahead and put this code in.

Con::printf("Enabling gl multi-threading");
CGLContextObj cctx = CGLGetCurrentContext();
CGLError err = CGLEnable( cctx, 313); // kCGLCEMPEngine
if (err != kCGLNoError )
{
    Con::printf("Error enabling gl multi-threading: %i", err);
}

Before the change, in the Stronghold mission, my best FPS was 118 (just terrain, water block, & snow particles). Running between the huts, with the AI player nearby, I was getting about 36 fps. After the change, I got roughly 4 fps when running between the huts. (MacBook Pro Core 2 Duo, 2.33 GHz, OS X 10.4.8. Renderer: ATI Radeon X1600 OpenGL Engine, Version: 2.0 ATI-1.4.42, 800 x 600 x 32.)

So, I took it out and the FPS went back up.

Sorry, folks, nothing to see. This is a sign. Back to my TGB project!
#3
12/11/2006 (9:07 am)
Sorry if you misunderstood me. (My English is poor; I'm German.)
TGE is normally not multi-threaded as far as I know, but you need multi-threading to take advantage of OpenGL multithreading. -> Something like: one thread for game logic and network, and a second thread for rendering.

Martin
#4
12/11/2006 (9:56 am)
@Martin: It's my misunderstanding, I'm sure. However, Apple's multithreaded OGL isn't related to Torque's threading. The technote I linked to above says, "OpenGL on Intel-based Macintosh systems can take advantage of Mac OS X's multi-threading capabilities to execute parts of the OpenGL framework on a second thread. When used appropriately, this multi-threaded approach can increase the performance of CPU-bound OpenGL applications." Basically, some portions of OpenGL run on the second core, to maximize OpenGL performance.

However, for that to work, it's probably making certain assumptions about how an app uses OpenGL. Maybe TGE isn't using OGL in the way that Apple expected.

That said, TGE does support threads: see tdn.garagegames.com/wiki/Torque/1.4/MultiThread and all of the Mac threading changes in tdn.garagegames.com/wiki/Torque/1.5/WhatsNew. Quite a bit of TGE is off the main thread, but I have no idea how much of the OpenGL code is off the main thread.
#5
12/11/2006 (12:39 pm)
ConsoleFunction(setMultithreadGL, bool, 2, 2, "(bool enable) You may not want this")
{
	bool enable = dAtob(argv[1]);
	Con::printf("%s Multithreaded OpenGL", enable ? "Enabling" : "Disabling");
	CGLError err = kCGLNoError;
	CGLContextObj ctx = CGLGetCurrentContext();
	
	// Toggle the multi-threaded GL engine on the current context
	if(enable) {
		err =  CGLEnable( ctx, kCGLCEMPEngine);
	} else {
		err =  CGLDisable( ctx, kCGLCEMPEngine);
	}
	
	if (err != kCGLNoError)
	{
		Con::warnf("  -- Couldn't %s Multithreaded OpenGL", enable?"enable":"disable");
	} else {
		Con::printf("  -- Successfully %s Multithreaded OpenGL", enable?"enabled":"disabled");
	}

	// The function is declared to return bool, so report success or failure
	return err == kCGLNoError;
}

icculus.org/~chunky/stuff/multithreadgl.mov

Well, that dropped my framerate by a healthy amount. The video is captured at a miserable framerate on purpose; the salient part is the FPS counter in the top left-hand corner.

So then, re-reading the apple technote:
Quote:Enabling multi-threaded execution does not guarantee a performance increase for all applications. Your application most likely will not benefit if it:
- Frequently calls OpenGL functions that require synchronization.

Your application is likely to benefit if it:
- Doesn't require a lot of data synchronization or retrieval from OpenGL (glGet*, glPopAttrib, glFinish, etc).

So then we break open apple's GL profiler [man, Apple's profiling tools...] and we see this:
icculus.org/~chunky/stuff/multithreadgl.png
So, from my limited understanding of OpenGL: at least on this laptop [the expensive 15" Core 2 Duo MacBook, with the bigger video memory thingy], Torque is spending a *lot* of time flushing, which basically generates round trips to the GPU. On the other hand, I really don't know OpenGL too well, so... that may not be the problem at all.

Gary (-;

EDIT! Oh I should add that this profile is with multithreaded opengl turned OFF.
#6
12/11/2006 (12:59 pm)
Is flush being called by the engine?
Or is it an implicit call?

If the engine is calling it, remove it and try again.
#7
12/11/2006 (3:09 pm)
Quote:To obtain the maximum performance with OpenGL multi-threaded execution, follow these guidelines:

Avoid the use of glGet type calls, or any call that will cause a round trip to and from the GPU. (shadow OpenGL state, check errors only on debug builds, and so forth)

Double buffer data to avoid stalls

Use Vertex Buffer Objects (VBO) for vertex data.

Use Pixel Buffer Objects (PBO) for pixel data.

Torque does NONE of these. NONE. And that is why multithreaded OpenGL is not going to help Torque.
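
As an illustration of the "check errors only on debug builds" guideline, here is a minimal sketch. None of these names come from Torque or Apple: fakeGetError stands in for glGetError, fakeDraw for any GL call, and MY_GL_DEBUG is a made-up build flag.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for glGetError(); counts how often it is called so
// the cost of error checking is visible.
int g_errorQueries = 0;
int fakeGetError() { ++g_errorQueries; return 0; }

// Hypothetical stand-in for any GL drawing call.
int g_draws = 0;
void fakeDraw() { ++g_draws; }

#ifdef MY_GL_DEBUG  // hypothetical flag, defined only for debug builds
// Debug builds: run the call, then query the error state (a sync point).
#define CHECK_GL(call)                                            \
    do {                                                          \
        call;                                                     \
        int e = fakeGetError();                                   \
        if (e) std::fprintf(stderr, "GL error %d in %s\n", e, #call); \
    } while (0)
#else
// Release builds: run the call and never query the driver.
#define CHECK_GL(call) do { call; } while (0)
#endif
```

With the flag undefined, CHECK_GL(fakeDraw()) issues the draw and makes no error query at all, so the release build never forces that round trip.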


Also, CGLFlushDrawable is part of aglSwapBuffers (i.e. the call that makes things show up on screen), and that profiler output says Torque is only spending 8.69% of its time in OpenGL, 4.36% of which is spent swapping buffers. OpenGL is clearly not your main bottleneck here, and you should use Shark to see what is.
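
To put a rough number on that, here is a small sketch of the usual Amdahl's-law bound, assuming the profiler's 8.69% is the whole share of frame time spent in OpenGL. The function name amdahlSpeedup is just for illustration.

```cpp
#include <cassert>

// Overall speedup when a `fraction` of the total time is made `factor`
// times faster and the rest is unchanged (Amdahl's law).
double amdahlSpeedup(double fraction, double factor) {
    return 1.0 / ((1.0 - fraction) + fraction / factor);
}
```

Under that assumption, amdahlSpeedup(0.0869, 2.0) comes out at roughly 1.05, and even a very large factor stays below 1.10. So in the best case the multi-threaded engine could buy less than a 10% frame rate gain here.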
#8
12/11/2006 (3:51 pm)
Torque still uses the dglCanononical method, no?

Doesn't it perform some gets to check state?

Agreed, only <9% of app time in GL seems really low.
#9
12/12/2006 (7:15 pm)
Thanks, Alex, I was hoping you'd post here.
#10
01/19/2007 (7:14 pm)
@All: As Alex said, Torque doesn't do the things that one needs to do to take advantage of Multithreaded OpenGL. I'd guess the main problem is the plethora of glGet*() calls that Torque makes, every single frame.

Calls to glGet*() type functions are quite expensive, and should be avoided by all OpenGL programs. They require communication with the graphics driver, which means a sync point between the driver thread and the application. This is true for D3D as well as OpenGL: getting data out of the driver can be expensive, and cause rendering stalls. The proper way to deal with this is to cache the data requested in the dgl*() functions that call glGet*() functions.
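
A minimal sketch of that caching idea, assuming all matrix writes can be routed through one wrapper. This is not DGL or GFX code: FakeDriver and ShadowedModelview are hypothetical names, and FakeDriver only imitates the driver so that the number of queries can be counted.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for the graphics driver.
struct FakeDriver {
    float modelview[16] = {};
    int queries = 0;  // how many times the application read state back

    // Plays the role of glGetFloatv(GL_MODELVIEW_MATRIX, out).
    void getModelview(float* out) {
        ++queries;
        std::memcpy(out, modelview, sizeof(modelview));
    }
    // Plays the role of glLoadMatrixf(m).
    void loadModelview(const float* m) {
        std::memcpy(modelview, m, sizeof(modelview));
    }
};

// Shadow copy of the modelview matrix. Every write goes through load(), so
// reads can be answered from the cache without touching the driver.
class ShadowedModelview {
public:
    explicit ShadowedModelview(FakeDriver& d) : driver(d) {
        driver.getModelview(cached);  // one-time sync at startup
    }
    void load(const float* m) {
        std::memcpy(cached, m, sizeof(cached));
        driver.loadModelview(m);
    }
    const float* get() const { return cached; }

private:
    FakeDriver& driver;
    float cached[16];
};
```

The catch is that the cache is only valid if nothing else changes the matrix behind the wrapper's back, so every code path that writes the state would need to go through it.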

GFX, the rendering interface in TGEA ( TSE ) caches data, so you don't have to hit the driver when you want to know what the current modelview matrix is. An OpenGL implementation of GFX will, of course, also cache this data... And do many other lovely things that increase performance. So, I don't see that there's a huge advantage in spending time retrofitting DGL, when that time could be spent on GFX. Thus... I'm probably not going to patch up DGL and other parts of TGE to play nice with Apple's multithreaded OpenGL. If a patch is submitted that fixes this issue, I'll review it & get a point release out as fast as possible... But there are better performance wins and other cool things I can be chasing for Mac Torque users than retrofitting old tech to work with new tech.

/Paul
#11
01/19/2007 (7:49 pm)
Quote: GFX, the rendering interface in TGEA ( TSE ) caches data, so you don't have to hit the driver when you want to know what the current modelview matrix is. An OpenGL implementation of GFX will, of course, also cache this data... And do many other lovely things that increase performance. So, I don't see that there's a huge advantage in spending time retrofitting DGL, when that time could be spent on GFX.

Of course, I love being incendiary, so I gotta point out that I'll bet I could cache a few variables faster than I could reimplement an entire rendering layer using an API that said rendering layer isn't really designed to use.

Unless, of course, said rendering layer is *already* in the process of being reimplemented... is it?

*tumbleweed*

Of course, in practice, the other problem is that patching to cache glGet calls may not actually fix the problem, which can only be determined with an appropriate profiler.

Even better, to quote AlexS:
Quote:OpenGL is clearly not your main bottleneck here, and you should use Shark to see what is.

At least if you're using a Mac, it's time to share the love in this thread and this one, which are the two real bottlenecks, on my system, according to Shark.

Paul Scott [every time I think of Paul's name, I think of Rocky Horror Picture Show. Sad but true] and asmaloney (Andy) have been in some sort of coding frenzy to improve performance, and are achieving stellar results!

Gary (-;