Thursday, March 29, 2012

Basic AI

I finished the basic ai framework for my game! Have a look (better watch in HD, or else you wont see much). There are two teams, red and green (laser shots have team colors).


If you plan to develop AI for your game, I can highly recommend "Artificial Intelligence for Games" from Ian Millington. Its fantastic! Also all the "AI Game Programming Wisdom" books are very good. The core of my AI framework is inspired by an article from AI Game Programming Wisdom 4 (3.5 "Knowledge-Based Behavior System - A Decision Tree/Finite State Machine Hybrid").
I wont tell you details about my AI, because thats like revealing the secret behind a magic trick. It can ruin the whole game experience.
Did you know for example that in Half-Life 1 the marines in a squads were not really communicating with each other? The trick was, that a squad had two attack slots and every time a slot was filled or opened up, the marines yelled stuff like "Cover me". When you play Half-Life you don't want to know such things. By the way, heres a nice article about the AI in Half Life (Open Source rocks :-)

Wednesday, March 21, 2012

Parallel Game Architecture, Part 2

As promised, here is the second part of my game architecture post.
Last week I coded something I wanted for a long time in my game. A scheduling visualizer! With it you get a much better feeling how all the parallel stuff works. In the picture below you can see 20ms of game time. Each row represents one thread and the rightmost frame is the last frame rendered. On the basis of the picture I will try to explain how my architecture works. To see the scheduling visualizer in action, jump to the video at the bottom.

20ms of game time: Each row represents one thread (ThreadId is shown at the beginning of a row).
Firstly, we have four Systems with the following colors in the diagram:
  • Graphics: Green
  • Physics: Blue
  • AI: Orange
  • Input: Turquoise
Each of the systems is implemented as a separate Windows Game Library. To fit into the framework a system has to provide a few objects with specific interfaces. For example the ISystemScene interface is needed for object creation and destruction. The other important interface is ISystemTask. At the beginning of a frame, the framework calls every System's ISystemTask.Update(DeltaTime) method in parallel. In the update method all the work of the system is done. So far we have functional parallelism. Additionally we want data parallelism. To achieve this, every system gets a reference to the frameworks taskmanager (uses the ParallelTasks library) on initialization. The taskmanager can be used by the system to issue work, which should be executed in parallel.

Graphics System 
One example for this can be seen in the picture. The graphics system issues a task, which updates the positions of all particles in the scene. This is the single small green rectangle. The graphics system issues this task at the beginning of the frame rendering. At the end of the frame rendering, the particles are getting drawn. If the task hasn't finished till that point in time the graphics system has to wait. This occurs very seldom, but you never know what the OS scheduler does with your threads.

Physics System
Real data parallel work is done in the physics system. If you look at the picture you can identify four data parallel parts. To be honest, I don't know what exactly is happening there. Jitter, the physics library I use, was already parallelized. And luckily the library is implemented very well and the source code is available too. I only had to replace Jitter's Threadmanager with my own one. My new custom Threadmanager is using a parallel for loop, which the taskmanager provides, to iterate over the work which gets issued by Jitter. The for loop iteration are then getting distributed among the worker threads. Jitter is really nicely designed, only a few lines of code were needed to achieve this.

Change Distribution
Between every frame you can see ca. 0,6 ms wide gaps. This is the time needed to distributed the changes that were made by the systems. For instance, the physics system may have changed the positions of a object. This position is needed by the graphics system to draw it at the according position in the next frame.

To finish, heres a video where you can see the scheduling visualizer in action. The scene contained 2400 asteroids and 60 space ships with a rudimentary AI (only roaming). Without multithreading we've got 40 frames per second, with multithreading 60 frames per second. Without video capturing the speedup is even better (50 single vs. 90 multithreaded). Nevertheless, with only two systems doing hard work (physics and graphics) the four CPU cores can't be utilized fully. This may change when I implement more stuff in the AI system. Getting more data parallelism out of the physics system would also improve the utilization, but I don't think that can be easily achieved and its not worth the hassle.
Used hardware: Core Quad Q9300 2.5 GHz, Radeon HD 6700


XNA multithreading from Jan Tepelmann on Vimeo.

Monday, March 19, 2012

Parallel Game Architecture, Part 1

In this post I will describe the core idea behind the game architecture I chose. If you are more interested in the implementation have a look at the next post.

The Motivation
Last year I came across the Intel Smoke Demo while researching for the Multicore seminar I attended. Gladly I was allowed to pick my own topic, so I chose "Multicore Programming in Computer Games".
After the seminar I was really motivated to use the things I've learned to build my own game. So I used the Intel Smoke Demo as a template for my own game architecture. The architecture in the demo supports n-way threading, which is really a good thing if you consider the graph below.
Source: Steam Hardware Survey
You can see that there is a big diversity in number of CPU cores per gaming PC (Xbox has three cores, PS3 has one main core + 6 SPU cores). In the future there will be even more diversity when six and eight core CPUs are getting more common.
In the early days of multicore game programming many games utilized multicore CPUs by splitting their game loop into a render part and game logic part. Both parts were then executed separately on a own thread. QUAKE 4 for example took this approach (see these slides for more infos). On the Xbox 360, the game loop was often split into three parts to utilize all cores (have a look at these Kameo slides for instance).
This approach is called functional decomposition. Works great, but you are specializing for a fixed number of cores. If you have more cores then subsystems, you are wasting CPU power.
A n-way architecture, like the one used in the Intel Smoke Demo, tries to utilize all cores that are available on the machine. This can be achieved by additionally taking advantage of data parallelism. Simply put, this means calculating for loops in parallel (=distributing for loop iterations among threads). For example, if you have 1000 game object and four threads, you can calculate 250 object updates on each thread, ideally taking only 25% of the time (25% only iff each update takes exactly the same time).

The Idea
The core idea behind the Smoke Demo is best illustrated in two pictures.

1. Execution Phase with four threads

In the picture above you can see all the calculation needed for one frame. Each system (AI, Physics, Graphics, Audio, Input) is getting scheduled to one of the four available worker threads. Once a system is scheduled, it can split its work into subtasks to take advantage of data parallelism (AI and Physics are doing this in the picture).
Every system works on its own copy of game data. This is needed to make them independent from each other, so that they can run in parallel. After each frame, the system's game data copy is getting updated by the Change Manager (see picture below).
To make this possible, every relevant game data change has to be queued up in the Change Manager before (symbolized by the arrows in picture one).

2. Change distribution Phase
To sum this all up, here is a short list of the pros and cons for this architecture:

Pros:
  • Performance: Scales to an arbitrary number of CPU cores.
  • Good maintainability: Strict separation of concerns between the system. You can even replace a system completely without affecting the other systems.
Cons:
  • Higher memory consumption: Every system has to have an own copy of needed game data.
  • Overhead: Caused by change collection and distribution
  • Latencies:  All systems work on old data (from the last frame).
In the next post I will describe my implementation of the Smoke architecture.

Monday, March 12, 2012

Hitting a moving target with an accelerating missile

The last few days I spent quite some time solving this problem. My first idea was, that I could just calculate the time the missile needs to travel to the target at maximum acceleration. With this time I can predict the new position of the target and aim toward that point with the missile. It is clear that this is not very accurate, but I thought that it is accurate enough.  Have a look how this worked:


Hitting a moving target with an accelerating missile on Vimeo.

The stupid missile in the first part of the video uses the approach I described. On the moving target it performs well, if the missile has no start velocity. But if you do add a start velocity, it gets really bad. You can see this better on the none moving target. The missile does move in an orbit around its target, because it will always accelerate towards the target.
So how does the smart missile work? Luckily I have a brother who studies physics and he explained me patiently how I can solve this (if you are not interested in the math you can download the code here and use it as a black box in your game or what ever you are programming).

The Math behind it:
In 2D space the target is at position X at the time t (d = distance to missile, v0 = velocity of the target):

For the missile the equation looks like this (w = start velocity missile, a = acceleration missile):

The missile hits the target if both equations return the same position in space x:

To get a solution that fulfills the equation we have to use the quadratic formula (replaced vectors with the vector components - index i can be set to x or y) :


The missile and target are at the same position if :



(k = length of acceleration vector)
Now we want to know which value of the angle alpha fulfills the equation. Sadly, we cant easily calculate the angle, but have to use an approximation method. I used Newton's method because I knew it already from school. The problem is, that we have four equations that have to be checked (plus-plus, plus-minus, minus-plus, minus-minus). So we have to use newton four times and look which of the four solution is the best. For newtons method you also need to derive the above equation (used wolfram alpha for this). Writing the derivate down in code was a pain. Newtons method on the other side is very easy to implement:

delegate float Function(Vector3 velocityDiff, Vector3 distance, 
                        float acceleration, float phi);

static float SolveWithNewton(Function function, Function derivative, 
                             Vector3 v, Vector3 d, float a, 
                             float startPhi, int numIterations)
{
    float phi = startPhi;
    for (int iteration = 0; iteration < numIterations; iteration++)
    {
        // precalculate values that are needed in the function and the derivative, 
        // so that we calculate them only once
        cosPhi = (float)Math.Cos(phi);
        sinPhi = (float)Math.Sin(phi);
        tanPhi = (float)Math.Tan(phi);
        vxQuad = v.X * v.X;
        vyQuad = v.Z * v.Z;
        insideRootVx = vxQuad + 2 * a * d.X * cosPhi;
        insideRootVy = vyQuad + 2 * a * d.Z * sinPhi;
        if (insideRootVx > 0) rootVx = (float)Math.Sqrt(insideRootVx);
        if (insideRootVy > 0) rootVy = (float)Math.Sqrt(insideRootVy);

        float functionValue = function(v, d, a, phi);
        float gradient = derivative(v, d, a, phi);
        phi = phi - functionValue / gradient;

        if (float.IsNaN(phi) || Math.Abs(functionValue) < Epsilon) break;
    }
    return phi;
}


Performance:
Right now I calculate the angle for the missile every frame and use maximal eight iterations in Newton's method. Worst case this means 24 iterations, because we have to test all four possible sign combinations. In every iteration we have at least one Sin, Cos, Tan, Sqrt and a few multiplication/division calculations. On my machine (Quad Core Q9300) this works fine (tested with circa 40 missiles). But I cant recommend this, especially on the Xbox with XNA where the floating point performance isn't that great (but I haven't tested it). The good news is, that you don't have to calculate the angle every frame. Its enough to calculate it every time the velocity of the target changes.

Tuesday, March 6, 2012

Developing my own awesome game

Programming my own game was long time ago THE reason to learn programming. I started learning programming with Visual Basic 6 and Direct Draw 7. Then a couple of years later I tried Python and gave Panda3D a chance (really liked this one). But I never accomplished something that i could really show to other people.
This will change this time :-) Right now I am programming my own 2D multiplayer space shooter in XNA. 2D means in this case, that all action takes place on a 2D plane, but I use a 3D engine. In this blog I want to show you my progress and share the things I have learned working on my little project.
If you are interested in the current state of my game you can have a look at my pivotal tracker project.
Here are the tools, libraries etc. that I currently use:
  • SunBurn 2.0 - As far as I know the best XNA graphics engine.
  • Mercury - Really awesome particle engine. I can really recommend it. Also looked at DPSD but Mercury has an editor, which is a big plus.
  • ParallelTasks Threading Library - TPL for XNA. The Xbox has three cpu cores. We have to use all of them! With this library this is much easier in comparison to thread programming.
  • Jitter Physics - Nice multi threaded open source physics engine.
  • Indiefreaks Game Profiler - Nice tool to see how much time is spend and how much memory is getting allocated in each function.
Finally a screen shot how the game currently looks like. In motion it looks much better because of my cool parallax scrolling stars background (uses a vertex shader). But more on this in a later post. Next time I will describe the architecture of my game.