Torque on Amazon's EC2. Need your help!

by David Wyand · 01/15/2008 (10:26 pm) · 25 comments

DISCLAIMER: This is a personal project undertaken by me, David Wyand. It does not represent work being performed by GarageGames, nor is it an indication of any future product releases.

Introduction
For a while now I've been curious about the viability of Amazon's EC2 service for running a game server, specifically TGE 1.5.2 under a flavour of Linux. In this blog I'll describe how I've done exactly that, and at the end there is a request for your help in testing it out.

What is Amazon's EC2
Amazon's Elastic Compute Cloud (EC2) is a web service that allows you to bring a virtual server (instance) online within minutes. You only pay for the instance-hours and bandwidth that you actually consume. An instance starts up with an Amazon Machine Image (AMI) that contains your Linux OS, applications, libraries, etc. that you've set up and stored in your Amazon S3 account. You may also make use of a number of pre-configured AMIs as a starting point.

This instant-on flexibility has the potential to be a great boon to a game designed to take advantage of it. You could add or remove servers as the player load demands. You could also recover in minutes from a hardware failure by using a custom AMI, rather than the hours or days it could take with a typical dedicated server. Using the service for occasional test servers would also take advantage of this flexibility.

This flexibility does come at a cost, however. If you compare the monthly cost of running a single Amazon EC2 virtual server to that of a typical dedicated game server, you'll find that the EC2 server costs more (based on my research). To get the most for your money, your application will need to take advantage of the service's strengths.

There is also the question of whether or not the Amazon EC2 cloud, typically used for web services, is set up to handle a game server. This is what I'm hoping to find out in this little experiment of mine.

Setting up a Torque Server
As my Linux Fu is definitely at the white belt level, I had been hesitant to set up a dedicated TGE server under Amazon's EC2. However, two articles pushed me over the edge to give it a try:

* Ubuntu 7.10 Gutsy Base Install - Run my preferred Linux variant on EC2
* VNC on Ubuntu Feisty - To break free of the command line.

With those in hand I set up an Amazon EC2 small instance (single, virtual 32-bit CPU) with the appropriate packages to compile a TGE 1.5.2 dedicated server. Everything went smoothly and it wasn't long before I had a dedicated server up and running. Unfortunately I couldn't connect to it from my PC.

At first I thought I hadn't set up the EC2 firewall correctly to allow the appropriate ports through. I wrote a tiny (8 lines I believe) Twisted Python-based UDP server to listen on the port. Sure enough I saw the messages from my TGE client attempting to connect, so the firewall was fine.
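For anyone who wants to run the same firewall check without Twisted, here is a rough equivalent sketched in C with plain POSIX sockets. It is not the Python script I used; the function names open_udp_listener and log_one_datagram are made up for this example, and the port you pass in should be whatever your server listens on (28000 in the usage note is an assumption based on the usual TGE default).

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Open a UDP socket bound to every local interface on the given port.
 * Returns the socket descriptor, or -1 on failure. */
int open_udp_listener(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Block until one datagram arrives, then print who sent it and its size.
 * Returns the datagram size in bytes, or -1 on error. */
ssize_t log_one_datagram(int fd)
{
    char buf[2048];
    struct sockaddr_in from;
    socklen_t from_len = sizeof from;

    ssize_t n = recvfrom(fd, buf, sizeof buf, 0,
                         (struct sockaddr *)&from, &from_len);
    if (n < 0)
        return -1;

    char ip[INET_ADDRSTRLEN] = "?";
    inet_ntop(AF_INET, &from.sin_addr, ip, sizeof ip);
    printf("%zd bytes from %s:%u\n", n, ip, (unsigned)ntohs(from.sin_port));
    return n;
}
```

To use it, stop the game server, call open_udp_listener(28000) (or your own port), and call log_one_datagram() in a loop while a client tries to connect. If lines are printed, the packets are getting through the firewall.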

After some digging I found the problem was a Sleep() call in Platform::process() in x86UNIXWindow.cc. It was causing a 2 second delay in the TGE event loop rather than the few milliseconds it should have taken. Checking the forums, I see this has been an issue in the past, although it looks to have been corrected for most people. Not being a Linux expert, I don't know why I had this issue. Was it Ubuntu? The kernel being used by EC2? The fact I was on a virtual server?

Fortunately the fix was easy: remove the Sleep(). It pegs the server at 100% CPU usage, but that is fine for my little experiment. I'll leave this for a future debug session (or for a Linux expert).

So in summary: It worked! I was running through the demo Stronghold mission giving Kork a hard time.

Give it a Go Yourself
I'm going to leave the server up for a few days so feel free to drop in and try it out. Use the TGE 1.5.2 demo's Starter.fps game and join the Amazon EC2 Test server.

www.gnometech.com/torque/images/ec2/2008-01-15-AmazonEC2TestServer.jpg

If for some reason you cannot find it, you may use my GarageGames' Master Server Monitor to help determine if the issue is at your end. Head to www.gnometech.com and click on the open monitor link on the left. This will open the server monitor window. Click on the FPS Starter Kit blue arrow and you should see the server listed.

www.gnometech.com/torque/images/ec2/2008-01-15-AmazonEC2TestMonitor.jpg

But can it Handle the Load
Since I've had my game server up for the last couple of weeks, I've had up to four people in and shooting at each other. From my perspective the user experience was smooth without any noticeable lag. To help with my testing I wanted to be able to pull some statistics from the server: bandwidth usage, number of players, event loop processing speed, etc.

I decided that the best way of presenting this real-time data was through a web interface. But I didn't want to skew the results by using the game server to present this data -- especially the bandwidth usage. So I set up another Amazon EC2 instance to act as my presenter. Here's how I have it all set up:

www.gnometech.com/torque/images/ec2/2008-01-15-AmazonEC2Diagram.jpg

On the left is the game server. It includes the TGE dedicated server, as well as a Twisted Python process to collect the statistics and send them to the web server. On the right is a typical Apache web server running under Ubuntu, with some custom PHP scripts to make the real-time JpGraph-generated charts work. One item not shown is that I'm using MySQL to store the statistics. The two servers communicate over Amazon's internal network so that bandwidth usage is free. Here's what the real-time graphs look like under IE6 with me connected to the game server:
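To give an idea of the kind of number crunching a collector like this has to do, here is a small sketch in C. It is not my Twisted Python collector: the stats_sample struct, its field names, and the text format are all invented for illustration, and it assumes the server exposes running byte totals that are sampled at a fixed interval.

```c
#include <stdint.h>
#include <stdio.h>

/* One sample of the (hypothetical) statistics a collector might gather. */
struct stats_sample {
    uint64_t bytes_sent; /* running total of bytes sent by the server */
    uint64_t bytes_recv; /* running total of bytes received by the server */
    int      players;    /* number of connected players */
    double   loop_ms;    /* time taken by the last event loop pass */
};

/* Average rate in kilobits per second between two running byte totals.
 * Returns 0 if the interval is not positive or the counter went backwards. */
double rate_kbps(uint64_t prev_bytes, uint64_t cur_bytes, double interval_s)
{
    if (interval_s <= 0.0 || cur_bytes < prev_bytes)
        return 0.0;
    return (double)(cur_bytes - prev_bytes) * 8.0 / 1000.0 / interval_s;
}

/* Format the change between two samples as one line of text that could be
 * sent on to the web server. Returns what snprintf returns. */
int format_stats(char *out, size_t out_len,
                 const struct stats_sample *prev,
                 const struct stats_sample *cur,
                 double interval_s)
{
    return snprintf(out, out_len,
                    "players=%d out_kbps=%.1f in_kbps=%.1f loop_ms=%.2f",
                    cur->players,
                    rate_kbps(prev->bytes_sent, cur->bytes_sent, interval_s),
                    rate_kbps(prev->bytes_recv, cur->bytes_recv, interval_s),
                    cur->loop_ms);
}
```

Working from running totals rather than per-interval counts means a missed sample only lowers the resolution of the graph; it does not lose any traffic from the totals.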

www.gnometech.com/torque/images/ec2/2008-01-15-AmazonEC2Alpha.jpg

You too can take a look at this data while running around my game server. Go to www.torquetest.info which will forward you to the EC2 web server. Set how often you'd like the graphs to redraw at the bottom of the page. Then run around the Stronghold mission and see how the network usage changes as you shoot at poor Kork.

A Call for Help
I'd really like to bang on this Amazon EC2 server to see how it holds up. To do this I'll need a lot of players on at once. This means you! :o) Here are the details:

When: Thursday January 17th at 9:00pm EST (6:00pm PST) for an hour
Where: Amazon EC2 Test server
Using: Standard TGE 1.5.2 Starter.fps Stronghold mission

It would be great to have at least 32 players on at once all shooting at each other. If you could connect during the testing time and even just leave your Orc standing around that would be great. I'll share all of my collected statistics with the community in a follow-up blog.

Thanks for reading!

- LightWave Dave

NOTE: I've written a follow up to my stress test. You may find it here

About the author

A long time Associate of the GarageGames' community and author of the Torque 3D Game Development Cookbook. Buy it today from Packt Publishing!

#1
01/16/2008 (12:45 am)
I'll be there!
#2
01/16/2008 (12:56 am)
Gah! 2am GMT?? Ah well, silly timezones :)

Have fun dave, will be interesting to see how it fares.
#3
01/16/2008 (4:13 am)
I am at GMT-3. Could one of you help me work out my local time for this?
#4
01/16/2008 (6:00 am)
I'll do my best to be there, I had looked at this service before wondering if Torque would work on it or not. Please post your results! :D
#5
01/16/2008 (6:55 am)
Phil:
Sorry about the time. With the day job and putting kids to sleep it is hard to work on my stuff before 9pm. I'd like to do a test that includes Europe at some point in the future, assuming that Thursday's test works out. But you could also connect to the server before bed and leave your Orc standing around. Call him Cannon Fodder. :o)

Novack:
There are a number of sites that help with time zone conversion. I found this one specifically geared towards GMT-3: wwp.gmt-3.org/. I hope that helps.

- LightWave Dave
#6
01/16/2008 (11:35 am)
Very interesting. I'll try to help test.
#7
01/16/2008 (11:42 am)
Dave,

This is a very exciting and useful experiment. Thanks for trying it out. I'll do my best to be there and add to your data pool. Meanwhile, I look forward to a discussion of the results.

Hall Of Worlds - For Gamers
EdM|GPGT
#8
01/16/2008 (12:41 pm)
Quote:Fortunately the fix was easy: remove the Sleep(). It pegs the server at 100% CPU usage, but that is fine for my little experiment. I'll leave this for a future debug session (or for a Linux expert).

Since you're on Linux, you can use usleep(3) to sleep for sub-second time units [the "u" is the closest ASCII character to the Greek micro - the parameter is an integer number of microseconds]:

#include <unistd.h>
// prototype: int usleep(useconds_t usec);
usleep(50000); // 50,000 microseconds = one twentieth of a second

Alternatively, and what the really cool kids do if they want their code to work on all unix platforms dating back to the dark ages, is to use select(2) [or poll(2), according to taste]:

#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

// prototype: int select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);
struct timeval timeout;
timeout.tv_sec = 0;
timeout.tv_usec = 50000; // 50,000 microseconds = one twentieth of a second
select(0, NULL, NULL, NULL, &timeout); // no descriptors to watch, so this just sleeps for the timeout

Then there's the final and rather more meaningful way to solve the problem. I don't have the torque source immediately on hand, but search the code for where it actually poll(2)s a network socket and increase the timeout on it. That way you're essentially blocking a little longer on looking for network traffic. If network traffic arrives, then the code continues immediately. If there's no network traffic, then program execution stalls briefly. It's the ideal way to stop your app burning up the CPU in this case.
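A minimal sketch of that last approach, assuming a plain POSIX socket descriptor. This is not the Torque code (which I haven't looked up), and wait_for_packet is a name made up for the example:

```c
#include <poll.h>

/* Wait until `sock_fd` has data to read or `timeout_ms` milliseconds pass.
 * Returns 1 if the socket is readable, 0 on timeout, -1 on error. */
int wait_for_packet(int sock_fd, int timeout_ms)
{
    struct pollfd pfd;
    pfd.fd = sock_fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    int rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0)
        return -1;          /* poll itself failed (check errno) */
    if (rc == 0)
        return 0;           /* nothing arrived within the timeout */
    return (pfd.revents & POLLIN) ? 1 : -1; /* readable, or an error flag */
}
```

Called once per pass through the event loop with a timeout of a few milliseconds, this returns straight away when a packet is waiting and otherwise hands the CPU back to the OS for the length of the timeout.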

Uh, hope that somewhere in that meaningless drivel you find a helpful answer,
Gary (-;

EDIT: Yay, 1 line of code, 1 bug.
#9
01/16/2008 (2:19 pm)
Gary:
Thanks. That's some useful information.

My questions would then be: Why isn't the Torque Linux platform already making use of those options (considering the community centered development)? Is it just that the Amazon EC2 virtual server appears to be a special case, and the current Sleep() operation has been good enough?

Just curious as my Linux dev time has been quite limited. Thanks!

- LightWave Dave
#10
01/16/2008 (4:32 pm)
Mainly, it's probably just a tpyo in the source [the sleep was meant to be usleep], that hasn't gotten around to being fixed.

It's a little hard to tell exactly what's going on at GG regarding linux and TGE (forum linky), but I believe that the reason no changes get pushed in is that GG simply don't support their products on linux anymore - that the time invested could be better spent elsewhere.

Gary (-;

EDIT Edited to make as nonincendiary as I can :-)
#11
01/16/2008 (5:14 pm)
>Why isn't the Torque Linux platform already making use of those options (considering the community centered development)?

All Torque based Linux development has pretty much been pushed to those companies that need it and can afford it. I don't think there is much of a community effort left.

As for the virtual server, the virtual part of it does weird nasty things to anything that is sensitive to timing issues. For me it was running a twitch action game server and Asterisk servers. It'll do fine on those games that aren't twitch action, but expect unpredictable weirdness otherwise.
#12
01/17/2008 (5:51 pm)
Just a reminder to anyone subscribed to this blog's comments:

The testing will begin in 10 minutes.

I've also updated www.torquetest.info with some more statistics and an auto updating Message of the Minute.

See you there!

- LightWave Dave
#13
01/17/2008 (7:17 pm)
Dave, this is really sweet. Had fun testing!
#14
01/17/2008 (7:18 pm)
Oh, one question. Do you think Vanilla TGEA would run in dedicated mode on this?
#15
01/17/2008 (7:24 pm)
Dave,

Sorry man. I missed it. I was all ready and then I blew it and showed up over an hour late. I blame dinner, but mostly I just forgot.

grumble....

I very much wanted to see how this worked out in person too.

So, I hope a lot of folks showed up and that you got some good data. I also hope you want to do this again, perhaps to gather more data?

Hall Of Worlds - For Gamers
EdM|GPGT
#16
01/17/2008 (7:28 pm)
@Jeremy and Dave, you killed me all through the testing. lol


warme (MEX).
#17
01/17/2008 (7:31 pm)
The testing session is now over

Thanks to everyone that participated! :) I'll write a follow-up blog with the results.

Jeremy:
Thanks for dropping in to the stress test! I understand that TGEA out-of-the-box is not set up for a dedicated Linux server. Thomas Lund is the only one I've heard of who has tackled this. I've not attempted it myself yet -- although I will need to at some point for my game.

Ed:
We missed you dude! While we had fun in Orc town, participation was disappointing. I'd like to reschedule for another stress test and will write it up in another blog.

- LightWave Dave
#18
01/17/2008 (7:33 pm)
Francisco:
Thanks for sticking it out for the whole stress test. Dueling Orc snipers was fun! :o)

- LightWave Dave
#19
01/17/2008 (7:38 pm)
I'm ready for more fun :-), and don't forget to add more ammo to the map in the next test.


warme (MEX).
#20
01/17/2008 (8:12 pm)
I logged in for most of the hour but was away from my computer for about half that time. I rarely play first person shooters so I mostly just ran around looking for ammo and dying a lot. :p

I experienced little or no lag on my end.