## Decoding State

In a previous article covering the concept of using decision trees for Robocode I posed the question:

Finally, how do we best encode the information regarding the state of the game itself? For example, does it make sense to store the distance between the two tanks even though the same information could be determined via enough splits? (I say yes, if just to speed things up) If so, what other attributes would be relevant?

I’m still not certain of the answer, but I’ve got some additional insight to further clarify the problem.

So we know that given a table of the state of the game encoded in any form (ie: x and y positions of both bots, hit points, and size of the battlefield) we could, given enough data points, build out a tree that very well approximates the correct firing angles for every situation. This make encoding something like “distance to target” technically redundant. However, the distance to the target seems to be a very relevant attribute to capture- something clearly not easily captured via the x and y positions of two bots. I’m not sure how it hit me, but that uncapturable essence of distance is unique strictly because it describes the *relationship* between the two variables.

That said, we now run into a problem- there are an infinite number of relationships between any two variables. Distance, obviously, is the relationship defined by (x -a) ** 2 + (y – b) ** 2 = c. But countless other relationships exist- for example any of the lines defined by any polynomials (ie: a * x ** 2 + b * x + c = y) or another example– any of the lines defined by exponential relations (ie: y = a * n ** x + b). These relationships suddenly become impossible to handle and were the main reason I decided to investigate other options when I realized multiple regressions suffer the same fault. (Remember, that what I’ve just described is the relationship between only two dimensions. With each added dimension, the simple relationships grow at a rate of O(n!).)

Fortunately, it may very well be the case that it’s not TOO important that we capture all of them. Take for example the concept of curve fitting. While for any n points, a polynomial of degree (n – 1) is guaranteed to exist, such polynomials are often so ridiculous in their fit that they are not useful for analyzing data. Thus, many statisticians will deliberately use only simple models to analyze data. It’s more or less the same with us- if we capture only relationships we expect to be useful, we’ll probably be good enough for most cases (especially given the size of the tanks and the relatively small size of the battlefield).

All told, while I’m still interested in figuring out definitively what is the right way to handle decoding the data, I think for now just capturing a few relationships that describe circles, straight lines, and hyperbolas, a very reasonable tank could be constructed.

## Comment