Robocode, Decision Trees (Dynamic Segmentation), and Information Science

Some time ago, I discovered that what the Robocode community calls “Dynamic Segmentation” is known to the machine learning community as a decision tree. For those who are unfamiliar, a decision tree is a tool used to classify information by identifying relevant clusters. That’s still a little vague, so I’ll explain in the context of trying to compress an image.

Let’s imagine that we have an image of black and white pixels that we’re trying to compress. To start, we translate the image into a bit matrix. Now things get interesting. To compress the image optimally, we need to recursively divide it along the x or y axis until each bit sits in a box of identical values. There are several ways to do this, the most obvious being a depth-first search over all possible combinations of splits that satisfy the end condition, keeping the tree with the fewest splits. That’s a terrible solution though, and fortunately information science offers some help.

Rather than doing an exhaustive depth-first search, a better approach is greedy: calculate the entropy of the original image, calculate the reduction in entropy for every possible split, and then select the split that reduces entropy the most. (For those unfamiliar, entropy is more or less the misclassification rate. Not exactly, but they’re closely related.) Doing so repeatedly won’t necessarily give us the best solution, but it gives one that is good enough.
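
Here’s a minimal sketch of that greedy step (my illustration, not canonical Robocode code), assuming the image is a rectangular list of lists of 0/1 bits: try every horizontal and vertical cut, score the two halves by their size-weighted entropy, and keep the cut that lowers it most.

from math import log2

def entropy(bits):
    '''Shannon entropy of a flat sequence of 0/1 bits.'''
    n = len(bits)
    if n == 0:
        return 0.0
    p = sum(bits) / n
    if p in (0.0, 1.0):
        return 0.0
    return -(p * log2(p) + (1 - p) * log2(1 - p))

def flatten(matrix):
    return [bit for row in matrix for bit in row]

def best_split(matrix):
    '''Return (weighted entropy, split) for the single cut that
    reduces entropy the most; split is None if no cut helps.'''
    bits = flatten(matrix)
    n = len(bits)
    best = (entropy(bits), None)
    for i in range(1, len(matrix)):  # horizontal cuts
        top, bottom = flatten(matrix[:i]), flatten(matrix[i:])
        score = (len(top) * entropy(top) + len(bottom) * entropy(bottom)) / n
        if score < best[0]:
            best = (score, ('y', i))
    for j in range(1, len(matrix[0])):  # vertical cuts
        left = flatten([row[:j] for row in matrix])
        right = flatten([row[j:] for row in matrix])
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if score < best[0]:
            best = (score, ('x', j))
    return best

Recursing on each half until every box is pure gives the greedy compression described above.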

The resulting tree structure used to compress the image is called a decision tree, the reason being that we now have a series of rules that can tell us what color a pixel ought to be at any x-y coordinate. Not so different from a min-heap, really. The code for a more general tree is below.

from collections import Counter

class DecisionTree():
    def __init__(self, table, target_column, fitness_func):
        self.table = table
        self.target_column = target_column
        self.fitness_func = fitness_func
        self.tree = None

    def build(self):
        '''Build the decision tree and return it.'''
        if not self.table:
            return None

        target_values = self.table.select(self.target_column)

        if len(set(target_values)) == 1:
            # Pure leaf: every row shares the same target value.
            return target_values[0]
        else:
            splitting_column = self.choose_column()

            if splitting_column is None:
                # No informative split left; fall back to the mode.
                # (Could be average or median for continuous data.)
                return Counter(target_values).most_common(1)[0][0]
            else:
                self.tree = {'splitting_column': splitting_column}
                # Materializing every split could be a problem on big data.
                splits = {}
                for row in self.table.get_rows():
                    splits.setdefault(row[splitting_column], []).append(row)
                for value, rows in splits.items():
                    subtree = DecisionTree(Table(rows), self.target_column,
                                           self.fitness_func)
                    self.tree[value] = subtree.build()
                return self.tree

    def choose_column(self, significance=0.0):
        '''Get the column with the highest information gain.'''
        best_gain = 0.0
        best_column = None

        for column in self.table.columns:
            if column == self.target_column:
                continue
            gain = self.fitness_func(self.table, column, self.target_column)
            if gain > best_gain:
                best_gain = gain
                best_column = column

        if best_gain > significance: # Only split if the gain is significant.
            return best_column
        else:
            return None

tldr: A decision tree algorithm takes a table of data, a target column, and a fitness function (used to figure out how to split the data) and constructs a tree of rules which can be used to explain the data.
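
(The Table class above is assumed rather than shown. A minimal sketch of the interface the tree actually relies on, with rows as plain dicts mapping column names to values, could look like this.)

class Table():
    '''Minimal stand-in for the Table interface the tree uses.'''
    def __init__(self, rows):
        self.rows = rows
        self.columns = list(rows[0].keys()) if rows else []

    def select(self, column):
        '''All values in the given column, in row order.'''
        return [row[column] for row in self.rows]

    def get_rows(self):
        return self.rows

    def __bool__(self):
        return bool(self.rows)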

Now, back to Robocode. The decision tree is an incredibly formidable tool. Not only can it be used to compress and explain data, but it can be and is used for forecasting. Since the algorithm ultimately identifies clusters, we can reasonably expect future data to simply fit into our rules. So load it up with a table of wave data, identify the correct firing angle as the target column, select a reasonable fitness function (these get complex), and let the code figure out the clusters. Then, just before firing, apply the rules you’ve discovered to find the relevant cluster, and fire!
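
To make the pipeline concrete, here’s a sketch with made-up column names (‘distance’, ‘velocity’, ‘angle_bin’ are illustrative, not standard Robocode attributes) and basic information gain standing in for the fitness function:

from collections import Counter
from math import log2

def entropy_of(values):
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())

def information_gain(table, column, target_column):
    '''Entropy of the target minus the size-weighted entropy
    of the target within each value of `column`.'''
    target = table.select(target_column)
    groups = {}
    for row in table.get_rows():
        groups.setdefault(row[column], []).append(row[target_column])
    remainder = sum(len(g) / len(target) * entropy_of(g)
                    for g in groups.values())
    return entropy_of(target) - remainder

# Hypothetical wave data: each row is one observed wave.
waves = Table([
    {'distance': 'near', 'velocity': 'high', 'angle_bin': 3},
    {'distance': 'near', 'velocity': 'low',  'angle_bin': 1},
    {'distance': 'far',  'velocity': 'high', 'angle_bin': 3},
    # ... thousands more rows in a real bot
])

tree = DecisionTree(waves, 'angle_bin', information_gain)
rules = tree.build()
# Before firing, walk `rules` with the current distance/velocity
# to find the cluster's firing-angle bin.

A real bot would discretize the raw wave measurements into bins first, since this tree splits on exact values.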

All said, there are a few interesting technical problems that I want to raise, partly because I don’t yet know the answers, and partly for you, the reader, to consider before getting to coding.

  • First off, what fitness function do we use to score the data at each split? It gets tricky when you start mixing categorical data with numerical data. Even barring that, fitness functions can often be biased.
  • A tree is a static data structure. Therefore, if it can be determined that the tree is not performing well, the entire tree needs to be recalculated (in opposition to what I had believed earlier about simply splitting leaves). While this isn’t much of a problem for small amounts of data, with huge amounts it can be. Are there ways to make the tree more flexible so that it never has to be fully recalculated?
  • Done incorrectly, a tree can overclassify the data, for example, making each row id its own cluster (see the sketch after this list). Even if that can be fixed (and it can), where is the correct spot to draw the line on classifying the data, taking into account that the more splits we make, the less likely it is that we split correctly? (That is to say, at the most extreme, you can be 100% sure you correctly classified the data as ‘data’ if you make no splits.) My intuition says genetic algorithms could figure this out, but it’s hard to say for sure.
  • Finally, how do we best encode the information regarding the state of the game itself? For example, does it make sense to store the distance between the two tanks even though the same information could be derived via enough splits? (I say yes, if just to speed things up.) If so, what other attributes would be relevant?
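
To see the overclassification problem from the third bullet concretely, reusing the Table, DecisionTree, and information_gain sketches above, give the tree a row id column and a naive gain measure will happily memorize the data:

# Hypothetical data: 'id' is unique per row, so splitting on it
# drives entropy to zero while teaching us nothing.
rows = [{'id': i, 'angle_bin': i % 2} for i in range(8)]
overfit = DecisionTree(Table(rows), 'angle_bin', information_gain)
print(overfit.build())
# -> {'splitting_column': 'id', 0: 0, 1: 1, 2: 0, ...} one leaf per row

Note that the significance threshold in choose_column won’t save you here: the id column’s gain is maximal, so it sails past any cutoff. C4.5’s gain ratio, which penalizes attributes with many distinct values, is the standard fix.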

In any case, there is a LOT of research on this topic as I’ve recently discovered. (Claude Shannon is the man!) Let me know of any thoughts / answers and good luck!

Written by Ceasar Bautista

2011/06/27 at 23:49
