Torque on Amazon's EC2. Need your help!
by David Wyand · 01/15/2008 (10:26 pm) · 25 comments
DISCLAIMER: This is a personal project undertaken by me, David Wyand. It does not represent work being performed by GarageGames, nor is it an indication of any future product releases.
Introduction
For a while now I've been curious about the viability of Amazon's EC2 service for running a game server, specifically TGE 1.5.2 under a flavour of Linux. In this blog I'll describe how I've done exactly that, and at the end there is a request for your help in testing it out.
What is Amazon's EC2?
Amazon's Elastic Compute Cloud (EC2) is a web service that allows you to bring a virtual server (an instance) online within minutes. You pay only for the instance-hours and bandwidth that you actually consume. An instance starts up from an Amazon Machine Image (AMI) that contains the Linux OS, applications, libraries, etc. that you've set up and stored in your Amazon S3 account. You may also use one of a number of pre-configured AMIs as a starting point.
This instant-on flexibility has the potential to be a great boon to a game designed to take advantage of it. You could add or remove servers as the player load demands. You could also recover in minutes from a hardware failure by using a custom AMI, rather than the hours or days it could take with a typical dedicated server. Using the service for occasional test servers would also take advantage of this flexibility.
This flexibility does come at a cost, however. If you compare the monthly cost of running a single Amazon EC2 virtual server around the clock to that of a typical dedicated game server, you'll find that the EC2 server costs more (based on my research). To get the most for your money, your application will need to take advantage of the service's strengths.
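The trade-off between paying by the hour and paying a flat monthly fee can be sketched as a break-even calculation. This is only an illustration: the function names are made up for this example, and the rates you pass in are whatever your own research turns up, since no actual Amazon or hosting prices are assumed here.

```python
def break_even_hours(dedicated_monthly_cost, hourly_rate):
    """Hours of use per month at which pay-per-hour matches a flat monthly fee.

    Both arguments are hypothetical inputs supplied by the caller.
    """
    if hourly_rate <= 0:
        raise ValueError("hourly_rate must be positive")
    return dedicated_monthly_cost / hourly_rate


def on_demand_cost(hours, hourly_rate, gb_transferred=0.0, rate_per_gb=0.0):
    """Total pay-as-you-go cost: instance-hours plus bandwidth."""
    return hours * hourly_rate + gb_transferred * rate_per_gb
```

If the server needs to be up for fewer hours per month than the break-even figure (an occasional test server, for example), paying by the hour comes out ahead. A server that runs around the clock will normally sit well above that figure.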
There is also the question of whether or not the Amazon EC2 cloud, typically used for web services, is set up to handle a game server. This is what I'm hoping to find out in this little experiment of mine.
Setting up a Torque Server
As my Linux Fu is definitely at the white belt level, I had been hesitant to set up a dedicated TGE server under Amazon's EC2. However, two articles pushed me over the edge to give it a try:
* Ubuntu 7.10 Gutsy Base Install - Run my preferred Linux variant on EC2
* VNC on Ubuntu Feisty - To break free of the command line.
With those in hand I set up an Amazon EC2 small instance (single, virtual 32-bit CPU) with the appropriate packages to compile a TGE 1.5.2 dedicated server. Everything went smoothly and it wasn't long before I had a dedicated server up and running. Unfortunately I couldn't connect to it from my PC.
At first I thought I hadn't set up the EC2 firewall correctly to allow the appropriate ports through. I wrote a tiny (8 lines I believe) Twisted Python-based UDP server to listen on the port. Sure enough I saw the messages from my TGE client attempting to connect, so the firewall was fine.
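A listener of this kind can be sketched in a few lines. This is a minimal sketch that uses Python's standard socket module instead of Twisted, so it is not the original script. The function names are made up for illustration, and the port in the usage note is an assumption.

```python
import socket


def open_udp_socket(host="0.0.0.0", port=0):
    """Bind a UDP socket; port=0 lets the OS pick any free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def listen_for_datagrams(sock, count=1, bufsize=2048):
    """Receive `count` datagrams on an already-bound UDP socket.

    Prints one line per datagram and returns a list of
    (payload, sender_address) tuples.
    """
    received = []
    for _ in range(count):
        payload, sender = sock.recvfrom(bufsize)
        print(f"{len(payload)} bytes from {sender[0]}:{sender[1]}")
        received.append((payload, sender))
    return received
```

To use it, stop the game server so the port is free, then call `listen_for_datagrams(open_udp_socket(port=28000), count=10)` on the instance and try to connect from the client. The port 28000 is an assumption here, so substitute whatever port your server is configured to use. If lines are printed, the packets are getting through the firewall.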
After some digging I found the problem was a Sleep() call in Platform::process() in x86UNIXWindow.cc. It was causing a 2 second delay in the TGE event loop rather than the few milliseconds it should have taken. Checking the forums, I saw that this has been an issue in the past, although it looked to have been corrected for most people. Not being a Linux expert, I don't know why I had this issue. Was it Ubuntu? The kernel being used by EC2? The fact I was on a virtual server?
Fortunately the fix was easy: remove the Sleep(). It pegs the server at 100% CPU usage, but that is fine for my little experiment. I'll leave this for a future debug session (or for a Linux expert).
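One way to start narrowing this down would be to check whether short sleeps on the instance take roughly as long as requested. The sketch below is in Python rather than in the TGE source, so it only measures how the host handles a short sleep. It does not exercise the Sleep() call in x86UNIXWindow.cc itself, and the function name is made up for this example.

```python
import time


def measure_sleep_overshoot(requested_s=0.001, samples=20):
    """Return the longest observed duration (in seconds) of time.sleep(requested_s)."""
    worst = 0.0
    for _ in range(samples):
        start = time.perf_counter()
        time.sleep(requested_s)
        elapsed = time.perf_counter() - start
        worst = max(worst, elapsed)
    return worst
```

Calling `measure_sleep_overshoot()` on the instance and on a non-virtual Linux box would show whether a 1 ms sleep stretches to anything like the 2 seconds seen in the event loop. If both machines report a few milliseconds at most, the delay is more likely to come from how the TGE code computes its sleep time than from the kernel or the virtual server.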
So in summary: It worked! I was running through the demo Stronghold mission giving Kork a hard time.
Give it a Go Yourself
I'm going to leave the server up for a few days so feel free to drop in and try it out. Use the TGE 1.5.2 demo's Starter.fps game and join the Amazon EC2 Test server.

If for some reason you cannot find it, you may use my GarageGames' Master Server Monitor to help determine if the issue is at your end. Head to www.gnometech.com and click on the open monitor link on the left. This will open the server monitor window. Click on the FPS Starter Kit blue arrow and you should see the server listed.

But can it Handle the Load?
My game server has been up for the last couple of weeks, and in that time I've had up to four people in and shooting at each other. From my perspective the user experience was smooth without any noticeable lag. To help with my testing I wanted to be able to pull some statistics from the server: bandwidth usage, number of players, event loop processing speed, etc.
I decided that the best way of presenting this real-time data was through a web interface. But I didn't want to skew the results by using the game server to present this data -- especially the bandwidth usage. So I set up another Amazon EC2 instance to act as my presenter. Here's how I have it all set up:

On the left is the game server. It includes the TGE dedicated server, as well as a Twisted Python process to collect the statistics and send them to the web server. On the right is a typical Apache web server running under Ubuntu, with some custom PHP scripts to make the real-time JpGraph-generated charts work. One item not shown is that I'm using MySQL to store the statistics. The two servers communicate over Amazon's internal network so that bandwidth usage is free. Here's what the real-time graphs look like under IE6 with me connected to the game server:
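The bandwidth figures behind graphs like these can be derived from two successive samples. The sketch below assumes the collector records cumulative byte counters along with a timestamp. That is an assumption made for illustration, as are the `Sample` fields and the function name; none of them come from the actual collector.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One hypothetical statistics reading taken on the game server."""
    timestamp: float      # seconds since some fixed starting point
    bytes_sent: int       # cumulative bytes sent so far
    bytes_received: int   # cumulative bytes received so far
    players: int          # players connected at sample time


def bandwidth_between(prev, curr):
    """Return (bytes sent per second, bytes received per second) between two samples."""
    elapsed = curr.timestamp - prev.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    sent_rate = (curr.bytes_sent - prev.bytes_sent) / elapsed
    received_rate = (curr.bytes_received - prev.bytes_received) / elapsed
    return sent_rate, received_rate
```

Storing the raw counters and computing the rates only when a chart is drawn keeps the work on the game server small, which fits the goal of not skewing the results.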

You too can take a look at this data while running around my game server. Go to www.torquetest.info which will forward you to the EC2 web server. Set how often you'd like the graphs to redraw at the bottom of the page. Then run around the Stronghold mission and see how the network usage changes as you shoot at poor Kork.
A Call for Help
I'd really like to bang on this Amazon EC2 server to see how it holds up. To do this I'll need a lot of players on at once. This means you! :o) Here are the details:
When: Thursday January 17th at 9:00pm EST (6:00pm PST) for an hour
Where: Amazon EC2 Test server
Using: Standard TGE 1.5.2 Starter.fps Stronghold mission
It would be great to have at least 32 players on at once all shooting at each other. If you could connect during the testing time and even just leave your Orc standing around that would be great. I'll share all of my collected statistics with the community in a follow-up blog.
Thanks for reading!
- LightWave Dave
NOTE: I've written a follow-up to my stress test. You may find it here
About the author
A long-time Associate of the GarageGames community and author of the Torque 3D Game Development Cookbook (Packt Publishing).
#22
01/18/2008 (6:55 am)
Dunsany:
Thanks for helping with the stress test. Your running around was exactly what we needed.
Francisco:
Thanks for the information.
And just to be complete:
I had no noticeable lag during the stress test. I've also recorded all network statistics from my TGE client during the play session, and will include them in my follow up.
- LightWave Dave
#23
01/18/2008 (2:24 pm)
Quote:
Oh, one question. Do you think Vanilla TGEA would run in dedicated mode on this?
Not out of the box. You'll have to get rid of Direct3D and modify the GFX class so that the functions which call it don't break. Ask Thomas or throw me an email and I'll look it up.
#24
01/22/2008 (12:22 pm)
Wow, Dave you are a stupendous badass. Can't wait to read the followup. 
Torque Owner Francisco Canedo
I experienced medium lag
-P4 HT 1GB Geforce3@64M
-connection type: cable 2048Kbps
-location: Mazatlan, Mexico.
-timezone: 7:00pm MST
Note: at this hour I always have lag in any game. A good time for me is after 9:00pm MST. :-)
warme (MEX).