## Posts Tagged ‘**game theory**’

## Games as Economies

In a previous post, I wrote about how it’s impossible to price an object in a game according to a systematic formula, barring games of limited complexity, and objects that cover the same span (that is, there are just different multiples of the same vector). Instead I claimed at the time it was just arbitrary- that designers could set the prices to whatever they liked and that the game would always be fair so long as each player had equal opportunity. Thus, the designer’s job is really just to play with the prices until it produces the interplay he is looking for.

In another post, I wrote about how research in RTS games, and why spending resources in order to have the option to train new units can payoff. While upgrades obviously boost the strength of an army, research unlocks news units for a price, and that price is only worth paying if one can expect to use the unlocked units in such a way, that the utility of their use exceeds the initial cost of research. Thus, the price of research is also arbitrary.

Having since studied microeconomics, I’d like to revisit these topics.

—

Economists call the phenomenon I just described the marginal rate of substitution or MRS for short. Formally, the MRS is defined as “the rate at which a person will give up good y for good x while remaining indifferent”. In other words, it’s the price at which you are willing to buy something using something else (ie, how much you are willing to shell out for a can of soda). What’s interesting about the MRS is that it changes- the more a person has of good x, the more one is willing to trade of good x for good y. Said another way, because billionaires have so much money, they don’t mind paying $5.00 for a hot dog.

This is a far more intuitive way of looking at things than trying to predict prices from the attributes of the game objects. In short, all one needs to understand now is that players will buy an object when the utility of a object exceeds its cost.

—

While the main point I wanted to convey has been made, I want to just put down some related ideas that don’t exactly deserve a post of their own but that I think are worth sharing.

-If you are familiar with Dominion, you may know MRS as “the Silver test”. (If you are not familiar with Dominion, all you need to know is that players regularly face the choice of buying cards with special effects or treasures, such as a Silver, which increase income.) That is, when making a non-trivial purchasing decision, one always has to consider if the object at hand is in fact better than a Silver. What I find most interesting about the Silver test is how many players completely fail to pick up this rule, instead being regularly mislead by the incorrect assumption that “things that cost more must be better”. Certainly changed a few paradigms of mine after noticing what was going on.

-Knowing when to buy what is the backbone of many games. In Dominion, playing cards is fairly trivial- but the decisions of which cards to buy each turn is often rather complex, and in the majority of cases, determines who wins. Likewise, in StarCraft, micro-ing units is fairly simple- but, again, the decisions of which units to buy and which tech to research is far more complex, and is far more important than any tactical feat. In short, your economics textbook may be more valuable than The Art of War.

## The Princess and Monster game

The current assignment in my programming class (CIS 121) is to build two AIs for a variation on the classic Pursuit-Evasion game, the Princess and the Monster. I think it’s an extremely interesting problem and will present some thoughts on the game here, and would greatly appreciate any feedback or response to further guide me given how little I know about the subject.

For the uninitatiated, the Princess and Monster game goes like this: Take a connected graph of any size. Assign one node to be the location of the Monster and another to be the location of the Princess. Each turn, the Princess and Monster may move to any nodes adjacent to their respective locations. If the Monster and Princess move onto the same square, the Monster wins. Else, if the Princess can avoid the Monster for a certain amount of turns, the Princess wins.

*The game in practice varies a lot. For example, in some variants, the game is played on a continuous and bounded plane rather than a graph. Other interesting variations include revealing the starting location of each player to all of the player at the start of the game, enabling the Monster to capture the Princess if the Monster can simply get within a certain range, allowing the two player to “hear” each other if they are within a certain range, and changing the permitted speeds of each of the players.*

In particular, I’ll examine the case where the Monster wins if it moves onto a node adjacent to the Princess’ location, and where both players are notified of each other’s location if they are within three edges of each other. Furthermore, for simplicity, I’ll assume the game is played on a 10×10 grid, rather than a random undirected graph.

So some quick observations.

**The Really Simple Case**

If we simplify the situation so that the game is the pure game, with not knowledge of starting positions, no ability to hear, and no capture range, we can observe a few things. First off, so long as the monster’s strategy is non-deterministic, there is no way that the princess can guarantee she can evade capture. However, should the monster and princess choose a random-walk strategy, then given a 100 node graph, the probability that they meet is 1/100 * 1/100. So for now on, let’s just assume the princess adopts such a strategy.

Now we can make a far more interesting observation. With the princesses’ strategy in mind, we note that every time we explore a node and move on, the probability that the princess is at the node that we just explored is equal to the probability that the princess is at once of the adjacent nodes multiplied by the chance that the princess would move there. Furthermore, because the princess cannot reach any adjacent nodes from the location that we were just at, she in fact is less likely to be nearby. (Which is kind of weird, but it makes sense if you imagine us putting a dummy princess on each node, and each turn, splitting the dummies into multiple dummies, sending some out and keeping some around, so that, without the monster, each probability would remain 1.)

Given all that, the intuitive solution is to continuously move to the most probable area nearby. The problem with this approach is that it very easily leads to deterministic searches that cover only a small part of the graph. (See the video below.)

This is not say the approach is incorrect. It’s just that we aren’t looking far enough ahead.

The code used in the video above works like this. First we go through every node and score it equal to 1, representing it’s probability of 1%. Then we place the monster down, remove the 1% from where he’s standing, and distribute it to all of the other nodes evenly. Lastly, we simulate what would happen if the princess moved according to an arbtirary Markov process, which in this case is simple a 20% chance to move to any other neighbor. (For those unfamiliar, as I initially was, this preserves the chance that all nodes will equal 1% given the absence of the monster since the princess will leave the corner only 2*20% of the time when she is on a corner, and she will only enter a corner 20% of the time if she is on either of the adjacent nodes.) Specifically, this is implemented with two hashmaps. One keeps track of the current probability, and another is a copy of the first. Then we go through each node and tell it to distribute 20% of it’s current probability to each of neighbors in the copied hashmap (which is needed so as to not corrupt the flow). Finally we replace the hashmap we are using with the copy. (Note here that 20% is arbitrary. Ideally, we would use some machine learning techniques to figure out what exact matrix the princess is using. Note also that if the chance of the princess moving is 0, then the best strategy becomes to create a Eulerian path that explores each node exactly once.)

So basically, as mentioned on the Wikipedia page for Princess and Monster games, it becomes evident that simply standing still most of the time and only occasionally find a new spot to chill is a pretty viable strategy for the monster. This will kind of create a drain, with the monster being the gutter, simplifying the search from one that is completely blind to only that is only half blind. Unfortunately, I’m not at all on the specifics on how long to wait, but it seems to make some intuitive sense. (Note here that the more the princess moves, the better this strategy of waiting is. )

**Add in Capture Range**

And it’s no different obviously. The graph basically becomes relatively smaller. There are sometimes a few problem handling corners nicely, but largely it’s inconsequential.

**Add in both players knowing the start location of the other or add in hearing range**

These are both extremely interesting changes and deserve their own posts, so that I’ll give them. (But probably not until next Tuesday when we turn in our AIs. Check back then!)

## Dynamic Segmentation

In order to properly parse the input data gathered by a bot in Robocode, it’s insufficient to statically declare a number of bins beforehand and simply insert the data into the structure.

The main problem with this approach is that the data is getting subdivided way too many times to be useful. For example, if wanted to increase accuracy, we would desire to eliminate any confounding information, and therefore segment on every possible independent variable and more- the reason being that if we can assume bots are imperfect, than our goal is to find actually find flaws. This means we want to segment on a large number of variables, for example distance to target, time since start of battle, bullet power, target velocity, target acceleration, target angular acceleration, target heading, target x-distance from center, target y-distance from center, target hit points, shoot velocity, shooter acceleration, shooter angular acceleration ,shooter heading, shooter x-distance from center, shooter y-distance from center, shooter hit points, and possibly more. That’s a lot, and if we segment on all of that, it’s going to take a very long time to acquire meaningful data that we can act on.

Another problem is that by dividing information, information loses value near the divisions. This can be remedied to an extent through some information blurring, but generally it’s not a trivial problem to solve. Particularly, static divisions are particularly worrisome since there is no good way to know where to divide the data.

Fortunately, there is a solution- dynamic segmentation. The idea is to, rather than statically declaring the segmentation divisions, simply declare along which axes the data can be segmented, and then build a tree structure, which splits data when it makes sense to. This is not as simple as it seems, but to illustrate the general idea, if our enemy is moving on a clockwise spiral relative to our position moving randomly towards our away for sizable amounts of time, then while our first few shoots will all be clumped together in one node, eventually the bot should realize that by splitting the data up into ‘target moving closer’ and ‘target moving away’ that it’s shots will become more accurate. This is very cool, because the bot will generally have some pretty good guesses most of the time, and only improve with more data. Furthermore, it reduces the need to worry about any kind of covariance, since the bot will automatically detect it, being able to split anywhere- for example, rather than tracking distance to corners, the bot will eventually learn where the corners are (provided the enemy acts unusually in them) by first segmenting on target x-distance from center and then segmenting on target-y distance from center.

The problem now is to determine when is it appropriate to split. By virtue of the problem of determining whether or not a die is load, we can figure this out. Immediately, it’s apparent that small amounts of data won’t suffice. Additionally, it’s fairly clear that variance has something to do with it, since if we’re consistently hitting our target it would be rather foolish to split the data. The question is how much variance do we allow.

To that, unfortunately I’m not exactly sure. I think a chi-squared test is the solution, although from my research it seems it can get pretty involved. (Even determining whether or not a coin is biased is fairly complicated.) For now though, I just want to throw out the idea of utilizing a tree structure.

## Wave Surfing Explained

I’ve recently reignited my interest in Robocode but this time armed with everything I’ve learned since September.

For the uninitiated, Robocode is an educational programming game, originally written with Java in mind, where players code up algorithms to guide virtual tanks in destroying each other. It’s a unique challenge, combining fields like programming, statistics, linear algebra, and game theory all into one.

Originally intended for students, the game was quickly picked up by experienced programmers and is now rather competitive. Many players use sophisticated algorithms in order to dodge enemy bullets and make sure their own find their mark.

Anyway, let me continue to explain wave surfing- but first I need to explain segmentation.

**Segmentation**

Since in Robocode neither bot can see any of the bullets, developers have to find ways to track their enemy through alternative means. Segmentation is a targeting technique that involves finding which firing angles are most likely to hit the target by dividing the continuous range of firing angles into discrete bins, and finding which bins are most popular. To implement segmentation, one needs to track each bullet with “waves”.

A wave is basically analogous to a ripple in a pond after tossing a stone into the water. We track the bullets until the “wave” hits our opponent. At that point, we can determine what the correct firing angle was at the time of firing, and subsequently increment the value in the bin which contains the correct firing angle. We then proceed to fire at the center of the most popular bin whenever we get the chance.

A primitive approach of segmentation might only segment by firing angle. However more sophisticated approaches first segment by distance, and then angle. (And even more sophisticated approaches segment by even more dimensions.) However, these improvements come at a cost of slowing down the rate of learning, but the accuracy gain is typically well worth the price.

Optimal segmentation (I think anyway) ought to reduce the number of bins down to the minimum needed to hit every possible position of the opponent. For example, an opponent at point blank range would only require one bin, since all firing angles within the escape angle would be guaranteed to hit the target. As the distance increases however, more and more bins become necessary (I believe at an exponential rate). By reducing the number of bins in this fashion, we increase learning speed, and reduce memory costs.

**Wave Surfing**

Wave Surfing is an evasive technique first developed and implemented by ABC in mid-2004. To put it succinctly, wave surfing is anti-segmentation. Every time the enemy fires a shot, we create a wave, and see whether or he not the bullet hit when the wave reaches our bot, and subsequently increment our own bin. In this way, we know what our opponent believes to be the most popular firing angle, and therefore make sure to avoid it, producing a near uniform distribution at all ranges.

**Why this is optimal**

Refer back to my earlier post on the game theoretical solution to Rock-Paper-Scissors. Basically, to do anything else is to tip our opponent off to a flaw in our strategy. If you know for example that your opponent will play Rock 2/3 of the time and Paper 1/3 of the time, your optimal strategy becomes to keep playing Paper.

**Implementation Issues**

A literal implementation of the above is still susceptible to intelligent algorithms. For example, at the start of a game or Rock-Paper-Scissors, if your opponent first plays Scissors, it would be illogical to assume that because your opponent has so far only played Scissors, it would be logical to assume he would do so again. This illustrates the problem of determining when the opponent is playing sub-optimally with an unequal probability distribution. Thankfully, statistics provides an answer. Using statistical methods to determine statistically significant variance will lead to a solution- typically, with low amount of data, statisticians say nothing can be concluded, and when more data, more confidence can be placed on inferences. A simple implementation of confidence intervals ought to be sufficient.

## The Nature of Imbalance

Before I can explain the combat effectiveness of a heterogeneous group, I need make sure the reader understands a few things first.

First, recall that in homogeneous groups, every additional unit increases the strength of the group exponentially.

Recall also that a rational player will target enemy units according to their Offense to Defense ratio, the highest of which will be targeted first.

With that in mind, I can explain.

Let’s consider the simple case of a group of two units, one unit with 1 Offense (O) and 1 Defense (D), and the other with sqrt(3) O and sqrt(3) D. Since both units in the group have the same Offense to Defense ratio, the order in which our enemy targets our units is unimportant and therefore we can calculate the strength of our group against the Universal Unit to get a K-value for our group. That said, it comes out to be (1 + sqrt(3))*1+sqrt(3)*sqrt(3)=4+sqrt(3) ~= 5.73K.

Notice though, that the unit with sqrt(3) Offense and sqrt(3) Defense has a K-value of 3, which means it worth exactly twice as much the other unit. That is to say, the value of our group is $1 + $2 = $3.

However, notice that $3 worth of 1O, 1D units produces an effective strength of 1+2+3=6K. This suggests that my previous suggestion of how to price units is not adequate.

Things break down even more when the Offense to Defense ratios are different. If a player rationally targets the unit with the highest Offense to Defense ratio, he will make heterogeneous groups even less effective. Consider for example, another group worth $3, except this time composed of a unit with 1 Offense and 1 Defense and a unit with 1 Offense a 3 Defense. In this case, the 1 Offense, 1 Defense unit is more threatening, and so with that unit targeted first the group’s strength comes out to a measly (1+1)*1+1*3=4K. Should the player have done otherwise, he would improved the group’s strength to an impressive (1+1)*3+1*1=7K.

This has something interesting implications. What we are basically saying here is that the effective strength for a two unit group is given by,

(O1+O2)D1+O2D2 = D1O1 + D1O2 + D2O2 or (O1+O2)D2+O1D1 = D1O1 + D2O1 + D2O2

What is important here in the middle factor, the D1O2. Typically, if Unit 1 and Unit 2 were the same price we would want to D1=O2, since if D1!=O2 then D2!=O1, and since our opponent gets to choose which unit he will target first, he will always pick the unit that makes the middle term smallest. Thus we try to make D1O2=D2O1, or equivalently D1/O1=D2/O2. Therefore, homogeneous groups are typically the most effective.

However, it is sometimes possible to force our enemy to attack our units in the way that *we* want *him* to. In this case, we want very heterogeneous units. In fact, if we could have it, we would have our units as different as possible- give one with infinite Offense and infinitesimal Defense and the other infinite Defense and infinitesimal Offense, and we get a middle term that equals to infinity^2!

As we can see, prices cannot possibly be determined by looking at stats alone. In fact, prices for units are completely arbitrary- fine-tuned to make the game play as the designer intended. Likewise, the strength of units cannot be measured, since units rarely fight alone or in purely homogeneous groups. Only so long as prices keep the decisions interesting then they are completely left to the designer’s discretion.

## Indifference Curves

An **indifference curve** is a line that shows combinations of goods among which a consumer is indifferent. Since consumers always prefer more over less, a curve shifted to the top and right is preferable to a curve shifted left and down, but any point on any particular curve is equally preferable as any other point on that curve. (So, in this picture here, X is just as good as A which is just as good as B.)

The **marginal rate of substitution (MRS)** is the rate at which a person will give up good y (measured on the y axis) for good x (measured on the x axis) while remaining indifferent. The magnitude of the slope of an indifference curve measures the marginal rate of substitution. That is, if the indifference curve is steep, then the marginal rate of substitution is high and a person would be willing to give up a very large amount of y to obtain very little of x. If the curve is flat, the marginal rate of substitution is low. The person is willing to give up very little of y to obtain large quantities of x. Generally, there is a diminishing marginal rate of substitution, which means that people becomes more and more willing to give up large amounts of x for y when they have very little y.

For ordinary goods, we see curves that look like the one above. But sometimes the value of a good is influenced by another. For example, perfect substitutes will produce straight diagonal lines with slope -1: a pen from Walmart is equally preferable to the same pen from anywhere else. On the other hand, perfect complements on the other have L-shaped curves: a left shoe is worth nothing without the right, and two left shoes are worth nothing without two right shoes.

## Predicting Consumer Choice

The best affordable choice is 1)on the budget line and 2) on the highest attainable indifference curve. This point is where marginal rate of substitution equals relative price. Changes in the price or utility of a good or a person’s income change the best affordable point.

## Command 0.04

This update is very minor, but it’s the start of something bigger, of which I’m having trouble with. Basically, I’ve made it so that the AI tries to send the least amount of units to necessary to capture planets. Unfortunately, it only works with terminal nodes:

The new AI works by allowing each planet to store the number of armies that it needs to conduct the AI’s plan. It sends that value to it’s parent node, and that goes all the way up, so that all the nodes know how much they need to conduct the AI’s plan.

The image at the right is a little hard to read, but basically, node A tells B that it needs 2 armies, then B tells E that it needs 10, 2 to capture A and 8 to capture itself. The process, repeats upward.

This works for terminal nodes, so why doesn’t it work for B? The problem, I’m almost certain, is that I’m using a depth first search, and A is never being notified that B is receiving enough forces to capture A. Now that would be very easy to fix, except that if I did that, then E would never send to B, since it would consider A and B both captured.

Anyway, I need some help.