## Posts Tagged ‘**artificial intelligence**’

## The Princess and Monster game

The current assignment in my programming class (CIS 121) is to build two AIs for a variation on the classic Pursuit-Evasion game, the Princess and the Monster. I think it’s an extremely interesting problem and will present some thoughts on the game here, and would greatly appreciate any feedback or response to further guide me given how little I know about the subject.

For the uninitiated, the Princess and Monster game goes like this: Take a connected graph of any size. Assign one node to be the location of the Monster and another to be the location of the Princess. Each turn, the Princess and the Monster may each move to any node adjacent to their current location. If the Monster and the Princess end up on the same node, the Monster wins. Otherwise, if the Princess can avoid the Monster for a set number of turns, the Princess wins.

*The game in practice varies a lot. For example, in some variants, the game is played on a continuous and bounded plane rather than a graph. Other interesting variations include revealing the starting location of each player to both players at the start of the game, letting the Monster capture the Princess by simply getting within a certain range, allowing the two players to “hear” each other if they are within a certain range, and changing the permitted speeds of each player.*

In particular, I’ll examine the case where the Monster wins if it moves onto a node adjacent to the Princess’ location, and where both players are notified of each other’s location if they are within three edges of each other. Furthermore, for simplicity, I’ll assume the game is played on a 10×10 grid, rather than a random undirected graph.

So some quick observations.

**The Really Simple Case**

If we strip the game down to its pure form, with no knowledge of starting positions, no ability to hear, and no capture range, we can observe a few things. First off, so long as the monster’s strategy is non-deterministic, there is no way that the princess can guarantee she evades capture. However, should the monster and princess both choose a random-walk strategy, then on a 100-node graph the probability that they are both on any one particular node on a given turn is roughly 1/100 * 1/100 (and so roughly 1/100 that they meet somewhere on that turn). So from now on, let’s just assume the princess adopts such a strategy.

Now we can make a far more interesting observation. With the princess’s strategy in mind, we note that every time we explore a node and move on, the probability that the princess is at the node we just explored is equal to the probability that the princess is at one of the adjacent nodes multiplied by the chance that she would move there. Furthermore, because the princess cannot reach any adjacent nodes from the location that we were just at, she is in fact less likely to be nearby. (Which is kind of weird, but it makes sense if you imagine us putting a dummy princess on each node and, each turn, splitting the dummies into multiple dummies, sending some out and keeping some around, so that, without the monster, each node’s score would remain 1.)

Given all that, the intuitive solution is to continuously move to the most probable area nearby. The problem with this approach is that it very easily leads to deterministic searches that cover only a small part of the graph. (See the video below.)

This is not to say the approach is incorrect. It’s just that we aren’t looking far enough ahead.

The code used in the video above works like this. First we go through every node and give it a score of 1, representing its probability of 1%. Then we place the monster down, remove the 1% from where he’s standing, and distribute it evenly among all of the other nodes. Lastly, we simulate what would happen if the princess moved according to an arbitrary Markov process, which in this case is simply a 20% chance to move to each neighbor. (For those unfamiliar, as I initially was, this preserves the 1% at every node in the absence of the monster: a princess on a corner leaves it only 2*20% of the time, and a princess on either of the two adjacent nodes enters the corner 20% of the time.)

Specifically, this is implemented with two hashmaps. One keeps track of the current probabilities, and the other starts as a copy of the first. Then we go through each node and tell it to distribute 20% of its current probability to each of its neighbors in the copied hashmap (the copy is needed so as not to corrupt the flow). Finally we replace the hashmap we are using with the copy. (Note here that 20% is arbitrary. Ideally, we would use some machine learning techniques to figure out what exact transition matrix the princess is using. Note also that if the chance of the princess moving is 0, then the best strategy becomes to follow a Hamiltonian path, which visits each node exactly once.)
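To make the two-hashmap update concrete, here is a rough Python sketch. It is not the code from the video: the function names, the `(x, y)` node keys, and the 10×10 grid helper are all made up for illustration, and the 20% move chance is the same arbitrary number as above.

```python
def neighbors(node, size=10):
    """Return the orthogonally adjacent nodes of `node` on a size x size grid."""
    x, y = node
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [(i, j) for i, j in candidates if 0 <= i < size and 0 <= j < size]


def initial_belief(size=10):
    """Every node starts with an equal share (1% each on a 10x10 grid)."""
    return {(x, y): 1.0 / (size * size) for x in range(size) for y in range(size)}


def observe_monster(belief, monster):
    """The princess is not where the monster stands: zero that node and
    spread its probability evenly over all the other nodes."""
    updated = dict(belief)
    removed = updated[monster]
    updated[monster] = 0.0
    share = removed / (len(updated) - 1)
    for node in updated:
        if node != monster:
            updated[node] += share
    return updated


def diffuse(belief, move_chance=0.2, size=10):
    """One princess move: each node sends `move_chance` of its probability
    to each neighbor. We read from `belief` and write to a copy so the
    flow is not corrupted halfway through the pass."""
    nxt = dict(belief)
    for node, p in belief.items():
        for nb in neighbors(node, size):
            flow = p * move_chance
            nxt[node] -= flow
            nxt[nb] += flow
    return nxt


# One turn: the monster looks at (0, 0), then the princess (maybe) moves.
belief = diffuse(observe_monster(initial_belief(), (0, 0)))
```

Because every edge carries the same flow in both directions when the scores are equal, `diffuse` leaves a uniform belief untouched, which is the property described in the parenthetical above.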

So basically, as mentioned on the Wikipedia page for Princess and Monster games, it becomes evident that simply standing still most of the time and only occasionally finding a new spot to chill is a pretty viable strategy for the monster. This kind of creates a drain, with the monster being the gutter, turning the search from one that is completely blind into one that is only half blind. Unfortunately, I’m not at all sure of the specifics of how long to wait, but it seems to make some intuitive sense. (Note here that the more the princess moves, the better this strategy of waiting is.)

**Add in Capture Range**

And it’s no different, obviously. The graph basically becomes relatively smaller. There are sometimes a few problems handling corners nicely, but largely it’s inconsequential.

**Add in both players knowing the start location of the other or add in hearing range**

These are both extremely interesting changes and deserve their own posts, so that is what I’ll give them. (But probably not until next Tuesday when we turn in our AIs. Check back then!)

## Dynamic Segmentation

In order to properly parse the input data gathered by a bot in Robocode, it’s insufficient to statically declare a number of bins beforehand and simply insert the data into the structure.

The main problem with this approach is that the data gets subdivided far too many times to be useful. For example, if we wanted to increase accuracy, we would want to eliminate any confounding information, and therefore segment on every possible independent variable and more, the reason being that if we can assume bots are imperfect, then our goal is to actually find flaws. This means we want to segment on a large number of variables, for example distance to target, time since start of battle, bullet power, target velocity, target acceleration, target angular acceleration, target heading, target x-distance from center, target y-distance from center, target hit points, shooter velocity, shooter acceleration, shooter angular acceleration, shooter heading, shooter x-distance from center, shooter y-distance from center, shooter hit points, and possibly more. That’s a lot, and if we segment on all of that, it’s going to take a very long time to acquire meaningful data that we can act on.

Another problem is that when we divide the data, samples that fall near the divisions lose value. This can be remedied to an extent through some blurring across neighboring bins, but generally it’s not a trivial problem to solve. Static divisions are particularly worrisome since there is no good way to know in advance where to divide the data.

Fortunately, there is a solution: dynamic segmentation. The idea is, rather than statically declaring the segmentation divisions, to simply declare along which axes the data can be segmented, and then build a tree structure that splits data when it makes sense to. This is not as simple as it seems, but to illustrate the general idea, suppose our enemy is moving on a clockwise spiral relative to our position, drifting randomly toward or away from us for sizable amounts of time. While our first few shots will all be clumped together in one node, eventually the bot should realize that by splitting the data up into ‘target moving closer’ and ‘target moving away’ its shots will become more accurate. This is very cool, because the bot will generally have some pretty good guesses most of the time, and will only improve with more data. Furthermore, it reduces the need to worry about any kind of covariance, since the bot will automatically detect it, being able to split anywhere. For example, rather than tracking distance to corners, the bot will eventually learn where the corners are (provided the enemy acts unusually in them) by first segmenting on target x-distance from center and then segmenting on target y-distance from center.
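To give the tree idea some shape, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `SegmentNode` name, the thresholds, and especially the split rule, which simply cuts a leaf at the median of whichever axis reduces the variance of the recorded outcomes the most. It is a placeholder for a proper statistical test, not a recommendation.

```python
from statistics import median, pvariance


class SegmentNode:
    """A leaf stores (features, outcome) samples; an internal node routes
    samples to one of two children by comparing one feature to a threshold."""

    def __init__(self, axes, min_samples=20, min_gain=0.25):
        self.axes = axes                # feature names we are allowed to split on
        self.min_samples = min_samples  # hypothetical: never split on less data
        self.min_gain = min_gain        # hypothetical: required fractional variance drop
        self.samples = []
        self.axis = None                # stays None while this node is a leaf
        self.threshold = None
        self.low = self.high = None

    def leaf_for(self, features):
        """Walk down the tree to the leaf that these features belong to."""
        node = self
        while node.axis is not None:
            node = node.low if features[node.axis] <= node.threshold else node.high
        return node

    def add(self, features, outcome):
        """Record one sample, then let the receiving leaf consider splitting."""
        leaf = self.leaf_for(features)
        leaf.samples.append((features, outcome))
        leaf._maybe_split()

    def _maybe_split(self):
        if len(self.samples) < self.min_samples:
            return
        total = pvariance([o for _, o in self.samples])
        if total == 0:
            return  # we are already perfectly consistent here; leave it alone
        best = None
        for axis in self.axes:
            cut = median(f[axis] for f, _ in self.samples)
            lo = [s for s in self.samples if s[0][axis] <= cut]
            hi = [s for s in self.samples if s[0][axis] > cut]
            if len(lo) < 2 or len(hi) < 2:
                continue
            within = (len(lo) * pvariance([o for _, o in lo])
                      + len(hi) * pvariance([o for _, o in hi])) / len(self.samples)
            gain = 1 - within / total
            if gain >= self.min_gain and (best is None or gain > best[0]):
                best = (gain, axis, cut, lo, hi)
        if best is None:
            return
        _, self.axis, self.threshold, lo, hi = best
        self.low = SegmentNode(self.axes, self.min_samples, self.min_gain)
        self.high = SegmentNode(self.axes, self.min_samples, self.min_gain)
        self.low.samples, self.high.samples = lo, hi
        self.samples = []
```

Feeding it samples whose outcome depends only on, say, a hypothetical `approach_speed` feature should make the root split on that axis and ignore the others, which is the ‘target moving closer’ versus ‘target moving away’ behavior described above.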

The problem now is to determine when it is appropriate to split. By analogy with the problem of determining whether or not a die is loaded, we can figure this out. Immediately, it’s apparent that small amounts of data won’t suffice. Additionally, it’s fairly clear that variance has something to do with it, since if we’re consistently hitting our target it would be rather foolish to split the data. The question is how much variance we allow.

To that, unfortunately I’m not exactly sure. I think a chi-squared test is the solution, although from my research it seems it can get pretty involved. (Even determining whether or not a coin is biased is fairly complicated.) For now though, I just want to throw out the idea of utilizing a tree structure.
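For the die version of the question, at least, the basic chi-squared goodness-of-fit test is short enough to sketch in Python. The function names are mine, the critical values are the standard 5% table entries, and whether this carries over cleanly to deciding when a segment should split is exactly the part I have not worked out.

```python
# Chi-squared critical values at the 5% significance level,
# keyed by degrees of freedom (number of categories minus one).
CRITICAL_5_PERCENT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def chi_squared_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def looks_loaded(counts):
    """True if the face counts differ from a fair die by more than chance
    would explain at the 5% level."""
    rolls = sum(counts)
    expected = [rolls / len(counts)] * len(counts)
    statistic = chi_squared_statistic(counts, expected)
    return statistic > CRITICAL_5_PERCENT[len(counts) - 1]


print(looks_loaded([8, 12, 9, 11, 10, 10]))  # mild wobble around 10 per face
print(looks_loaded([5, 5, 5, 5, 5, 35]))     # one face dominates
```

The usual caveat is that the test is only trustworthy once every expected count is reasonably large (five or more is the common rule of thumb), which lines up with the point that small amounts of data won’t suffice.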

## Some Segmentation Ideas

Continuing with my earlier post on Robocode, I’d like to describe some thoughts on segmentation strategies.

So the basic idea is, we want to minimize our bins at all times. In effect, this means calculating the Maximum Escape Angle, calculating the arc that the target occupies on the circle centered on our tank with radius equal to the distance to the target (that is, the orbital path our opponent could take), and then taking the ceiling of one half of the ratio (one half because a single shot hits our opponent just as well if it strikes the leftmost part of our opponent as the rightmost).

Interestingly, we can adjust the Maximum Escape Angle by adjusting the speed of our bullet. Given that, we can actually reduce or expand the number of bins as we desire, a useful ability when trying to maximize probability (since things are discrete, a change from N to N-1 bins can be fairly significant, assuming the probability to hit is a uniform distribution at both ranges).
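Here is a rough Python sketch of that bin count. The constants are the Robocode rules as I understand them (top speed of 8 pixels per turn, bullet speed of 20 minus 3 times the bullet power, 36-pixel-wide bots). The formula itself is only one reading of the ratio described above: it divides the full escape arc, covering both directions, by the full angular width of the target.

```python
import math

MAX_ROBOT_SPEED = 8.0  # pixels per turn
ROBOT_WIDTH = 36.0     # pixels


def bullet_speed(power):
    """Heavier bullets travel slower."""
    return 20.0 - 3.0 * power


def max_escape_angle(power):
    """Largest angle (radians, to one side) the target can reach before
    a bullet of this power arrives."""
    return math.asin(MAX_ROBOT_SPEED / bullet_speed(power))


def bins_needed(distance, power):
    """Number of target-width slices needed to cover the whole escape arc."""
    escape_arc = 2 * max_escape_angle(power)
    target_width = 2 * math.atan((ROBOT_WIDTH / 2) / distance)
    return max(1, math.ceil(escape_arc / target_width))


for power in (0.1, 1.0, 3.0):
    print(power, bins_needed(400.0, power))
```

Lower bullet power means a faster bullet, a smaller escape angle, and therefore fewer bins at the same distance, which is the knob described above.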

Furthermore, we can supplement our segmentation data by adjusting the range at which we fight, spiraling toward and away from our opponent as necessary, in order to keep our targets at optimal distances.

## Wave Surfing Explained

I’ve recently reignited my interest in Robocode but this time armed with everything I’ve learned since September.

For the uninitiated, Robocode is an educational programming game, originally written with Java in mind, where players code up algorithms to guide virtual tanks in destroying each other. It’s a unique challenge, combining fields like programming, statistics, linear algebra, and game theory all into one.

Originally intended for students, the game was quickly picked up by experienced programmers and is now rather competitive. Many players use sophisticated algorithms in order to dodge enemy bullets and make sure their own find their mark.

Anyway, let me continue on to explain wave surfing, but first I need to explain segmentation.

**Segmentation**

Since in Robocode neither bot can see any of the bullets, developers have to find ways to track their enemy through alternative means. Segmentation is a targeting technique that involves finding which firing angles are most likely to hit the target by dividing the continuous range of firing angles into discrete bins, and finding which bins are most popular. To implement segmentation, one needs to track each bullet with “waves”.

A wave is basically analogous to a ripple in a pond after tossing a stone into the water. We track the bullets until the “wave” hits our opponent. At that point, we can determine what the correct firing angle was at the time of firing, and subsequently increment the value in the bin which contains the correct firing angle. We then proceed to fire at the center of the most popular bin whenever we get the chance.
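A stripped-down Python sketch of a wave and its bins might look like the following. The `Wave` class, its fields, and the 31-bin histogram are all hypothetical, and bearings use the convention of 0 pointing north and increasing clockwise. A real bot would run this once per turn for every wave in flight.

```python
import math


def normalize(angle):
    """Wrap an angle in radians into the range [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


class Wave:
    """An expanding circle that stands in for one bullet we fired."""

    def __init__(self, origin, fire_time, bullet_speed, direct_angle, escape_angle):
        self.origin = origin              # (x, y) we fired from
        self.fire_time = fire_time        # turn on which we fired
        self.bullet_speed = bullet_speed  # pixels per turn
        self.direct_angle = direct_angle  # bearing to the target when we fired
        self.escape_angle = escape_angle  # largest offset the target could reach

    def has_reached(self, target, now):
        """True once the ripple has grown out to the target's position."""
        radius = (now - self.fire_time) * self.bullet_speed
        return radius >= math.dist(self.origin, target)

    def bin_index(self, target, n_bins):
        """Bin of the firing angle that would have hit `target`."""
        dx, dy = target[0] - self.origin[0], target[1] - self.origin[1]
        bearing = math.atan2(dx, dy)  # 0 = north, clockwise positive
        offset = normalize(bearing - self.direct_angle)
        factor = max(-1.0, min(1.0, offset / self.escape_angle))
        return int((factor + 1) / 2 * (n_bins - 1) + 0.5)


bins = [0] * 31
wave = Wave(origin=(0.0, 0.0), fire_time=0, bullet_speed=11.0,
            direct_angle=0.0, escape_angle=0.8)
target_now = (0.0, 110.0)  # the target never moved in this toy example
if wave.has_reached(target_now, now=10):
    bins[wave.bin_index(target_now, len(bins))] += 1
best_bin = max(range(len(bins)), key=bins.__getitem__)  # the bin to aim at next
```

Since the toy target stood still, the middle bin (a head-on shot) collects the hit.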

A primitive approach to segmentation might only segment by firing angle. More sophisticated approaches first segment by distance, and then by angle. (And even more sophisticated approaches segment by even more dimensions.) These improvements come at the cost of slowing down the rate of learning, but the accuracy gain is typically well worth the price.

Optimal segmentation (I think anyway) ought to reduce the number of bins down to the minimum needed to hit every possible position of the opponent. For example, an opponent at point blank range would only require one bin, since all firing angles within the escape angle would be guaranteed to hit the target. As the distance increases however, more and more bins become necessary (roughly in proportion to distance, I believe, since the target’s apparent width shrinks as it gets farther away). By reducing the number of bins in this fashion, we increase learning speed and reduce memory costs.

**Wave Surfing**

Wave Surfing is an evasive technique first developed and implemented by ABC in mid-2004. To put it succinctly, wave surfing is anti-segmentation. Every time the enemy fires a shot, we create a wave, see whether or not the bullet hit when the wave reaches our bot, and subsequently increment our own bin. In this way, we know what our opponent believes to be the most popular firing angle, and therefore make sure to avoid it, producing a near uniform distribution at all ranges.

**Why this is optimal**

Refer back to my earlier post on the game theoretical solution to Rock-Paper-Scissors. Basically, to do anything else is to tip our opponent off to a flaw in our strategy. If you know for example that your opponent will play Rock 2/3 of the time and Paper 1/3 of the time, your optimal strategy becomes to keep playing Paper.
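The arithmetic behind that example is quick to check. This small Python sketch (the names are mine) scores a win as +1, a loss as -1, and a tie as 0, then computes the expected payoff of each of our moves against the lopsided opponent:

```python
from fractions import Fraction

BEATS = {"Rock": "Scissors", "Paper": "Rock", "Scissors": "Paper"}


def payoff(mine, theirs):
    """+1 for a win, -1 for a loss, 0 for a tie."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1


def expected_payoffs(opponent_mix):
    """Expected payoff of each of our moves against the opponent's mix."""
    return {
        move: sum(p * payoff(move, theirs) for theirs, p in opponent_mix.items())
        for move in BEATS
    }


mix = {"Rock": Fraction(2, 3), "Paper": Fraction(1, 3), "Scissors": Fraction(0)}
values = expected_payoffs(mix)
best = max(values, key=values.get)
print(values, best)
```

Paper comes out at 2/3 per round while Rock and Scissors each come out at -1/3. Against a uniform 1/3, 1/3, 1/3 mix every move is worth exactly 0, which is why a uniform distribution gives the opponent nothing to exploit.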

**Implementation Issues**

A literal implementation of the above is still susceptible to intelligent algorithms. For example, at the start of a game of Rock-Paper-Scissors, if your opponent first plays Scissors, it would be illogical to assume that, just because your opponent has so far only played Scissors, he will do so again. This illustrates the problem of determining when the opponent is playing sub-optimally with an unequal probability distribution. Thankfully, statistics provides an answer. Using statistical methods to determine statistically significant variance will lead to a solution: typically, with a small amount of data, statisticians say nothing can be concluded, and with more data, more confidence can be placed in inferences. A simple implementation of confidence intervals ought to be sufficient.
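As one possible take on “a simple implementation of confidence intervals”, here is a Python sketch using the Wilson score interval for a proportion. Picking Wilson over other intervals is my assumption (it behaves sensibly with very few samples), as are the function names and the 95% level.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95%)."""
    if trials == 0:
        return 0.0, 1.0
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials ** 2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def plays_too_often(successes, trials, fair_share=1 / 3):
    """True only if even the low end of the interval exceeds the fair share."""
    low, _ = wilson_interval(successes, trials)
    return low > fair_share


print(plays_too_often(1, 1))    # one Scissors in one round
print(plays_too_often(25, 30))  # 25 Scissors in 30 rounds
```

After a single round of Scissors the interval still comfortably contains 1/3, so nothing is concluded. After 25 Scissors in 30 rounds the whole interval sits well above 1/3, and it becomes reasonable to start exploiting the bias.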

## Potential Fields

So while surfing for a solution to my own AI problems, I found an article on potential fields. I had used these before in trying to develop an AI for Naval Commander, and achieved limited success, but I couldn’t figure it out well enough and so I ditched the whole thing.

Anyway, a **potential field** is kind of like a magnetic field. Basically, the AI places charges around the map: positive charges near high value targets, and negative ones around dangerous areas and impassable terrain. Allied units use the field by testing a few points around them, figuring out where the potential is highest, and moving toward that location. By strategically placing the charges, the AI can guide armies in a very dynamic and simple way.

So here our potential field is represented by the lightness of the square, with light squares being more attractive. The rocks, being impassable, and the white enemies, being dangerous, emit negative potential, coloring the nearby squares dark, while the goal areas emit positive potential, lighting the map up. Together, these fields provide a way for the green unit to get to its destination, all without any kind of path finding algorithm.
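A bare-bones version of this on a grid fits in a few lines of Python. This is my own sketch, not code from the article: the `1 / (1 + distance)` falloff, the eight-neighbor test points, and the function names are all assumptions.

```python
import math


def potential(point, charges):
    """Total potential at `point`: each charge contributes its strength
    divided by (1 + distance), so its influence fades with range."""
    return sum(q / (1.0 + math.dist(point, pos)) for pos, q in charges)


def next_step(unit, charges, blocked=frozenset()):
    """Test the unit's own square and its eight neighbors, and move to
    whichever unblocked square has the highest potential."""
    x, y = unit
    options = [(x + dx, y + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
    options = [p for p in options if p not in blocked]
    return max(options, key=lambda p: potential(p, charges))


charges = [((5, 0), 10.0)]  # a single attractive goal
unit = (0, 0)
path = [unit]
for _ in range(10):
    unit = next_step(unit, charges)
    path.append(unit)
print(path)
```

With only an attractor, the unit walks straight to the goal and then stays put. Once repulsive charges are mixed in, a unit can get stuck in a local maximum short of the goal, which may be part of why my earlier attempt with Naval Commander only got so far.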

The article goes on to explain a few useful tricks, such as placing positive charges on top of enemy units to attract allied units to them, and then placing a weaker negative charge on top of them in order to get our units to attack from a certain range. Anyway, I think there is a lot of potential with this idea. I highly recommend you check the article out and you can count on me investigating the idea in the future.

(A thunderstorm won’t let me embed the link. Check it out here for now: http://aigamedev.com/open/tutorials/potential-fields/)

## Command 0.04

This update is very minor, but it’s the start of something bigger, with which I’m having trouble. Basically, I’ve made it so that the AI tries to send the fewest units necessary to capture planets. Unfortunately, it only works with terminal nodes:

The new AI works by allowing each planet to store the number of armies that it needs to conduct the AI’s plan. It sends that value to its parent node, and that goes all the way up, so that all the nodes know how much they need to conduct the AI’s plan.

The image at the right is a little hard to read, but basically, node A tells B that it needs 2 armies, then B tells E that it needs 10: 2 to capture A and 8 to capture itself. The process repeats upward.
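In isolation, the bookkeeping for that example is just a sum over the subtree. Here is a small Python sketch of the idea (not the game’s actual code; the dictionaries and the function name are made up, and E is given a cost of 0 on the assumption that we already own it):

```python
def required_armies(node, children, cost):
    """Armies that must flow into `node` to capture it and everything
    that hangs off it further down the tree."""
    return cost[node] + sum(
        required_armies(child, children, cost) for child in children.get(node, [])
    )


children = {"E": ["B"], "B": ["A"]}  # E feeds B, B feeds A
cost = {"E": 0, "B": 8, "A": 2}      # armies needed to capture each planet

print(required_armies("A", children, cost))  # what A reports to B
print(required_armies("B", children, cost))  # what B reports to E
```

This only computes the numbers that get passed upward. It says nothing about the order in which the armies are actually sent.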

This works for terminal nodes, so why doesn’t it work for B? The problem, I’m almost certain, is that I’m using a depth first search, and A is never being notified that B is receiving enough forces to capture A. Now that would be very easy to fix, except that if I did that, then E would never send to B, since it would consider A and B both captured.

Anyway, I need some help.

## Command 0.03

Building off of 0.02, I made a few changes to the algorithm. In the old algorithm, the search would stop whenever it hit an enemy node. However, that would prevent it from seeing anything beyond the front line, which could potentially be a problem. If, for example, our base connects to a 10+1 node and a 25+1 node, both the same distance away, our base should probably attack the 10+1 node. However, if behind the 25+1 node is a 0+100 node, then we may wish to rethink our initial decision.

Additionally, I rewrote parts of the code here and there to make things a little more organized for myself. With that, I’ve created an AI for Red, to show off in the videos anyhow, but for the player I’ve implemented a new system for attacking. Where previously a player’s click would send all immediately available units to the targeted node, the new system not only does that, but continues to stream units afterward. The stream can be disabled by clicking the streaming node twice.

But anyway, here is a video of the new AI in action: