Ceasar's Mind

Follow me: @Ceasar_Bautista

Prime Generator

with 2 comments

Inspired by the last post, just thought this was too cool not to share. The function generates primes!

def prime_gen():
    '''Generate primes.'''
    offset = 2
    index = offset
    sieve = range(offset, offset ** 2)
    found = []
    while 1:
        start = index ** 2
        end = (index + 1) ** 2
        sieve.extend(range(start, end))

        num = sieve[index - offset]
        if num:
            found.append(num)
            sieve = sieve[index - offset:]
            offset = num
            yield num

        i = 0
        while i < len(found):
            curr = found[i]
            j = start / curr * curr
            while j < len(sieve) + offset:
                sieve[j - offset] = 0
                j += curr
            i += 1
        index += 1</pre>

Written by Ceasar Bautista

2011/07/10 at 03:57

Posted in Uncategorized

Tagged with , ,

The Quadratic Sieve

leave a comment »

Just got done with a terrible day of magic numbers. To spare everyone the trouble, here’s a fairly clear version of a quadratic sieve for generating prime numbers that ought to be self-explanatory and easily modified– as opposed to many other algorithms out there.

def primes(n):
    '''Get all primes up to n.'''
    n = int(n)
    if n < 2:
        return []
    sieve = range(n)
    sieve[1] = 0
    root = n ** 0.5
    index = 0
    while index <= root:
        if sieve[index]:
            i = index ** 2
            while i &lt; n:
                sieve[i] = 0
                i += index
        index += 1
    return [x for x in sieve if x]

Written by Ceasar Bautista

2011/07/10 at 02:49

Posted in Uncategorized

Tagged with , ,

Decoding State

leave a comment »

In a previous article covering the concept of using decision trees for Robocode I posed the question:

Finally, how do we best encode the information regarding the state of the game itself? For example, does it make sense to store the distance between the two tanks even though the same information could be determined via enough splits? (I say yes, if just to speed things up) If so, what other attributes would be relevant?

I’m still not certain of the answer, but I’ve got some additional insight to further clarify the problem.

So we know that given a table of the state of the game encoded in any form (ie: x and y positions of both bots, hit points, and size of the battlefield) we could, given enough data points, build out a tree that very well approximates the correct firing angles for every situation. This make encoding something like “distance to target” technically redundant. However, the distance to the target seems to be a very relevant attribute to capture- something clearly not easily captured via the x and y positions of two bots. I’m not sure how it hit me, but that uncapturable essence of distance is unique strictly because it describes the relationship between the two variables.

That said, we now run into a problem- there are an infinite number of relationships between any two variables. Distance, obviously, is the relationship defined by (x -a) ** 2 + (y – b) ** 2 = c. But countless other relationships exist- for example any of the lines defined by any polynomials (ie: a * x ** 2 + b * x  + c = y) or another example– any of the lines defined by exponential relations (ie: y = a * n ** x + b). These relationships suddenly become impossible to handle and were the main reason I decided to investigate other options when I realized multiple regressions suffer the same fault. (Remember, that what I’ve just described is the relationship between only two dimensions. With each added dimension, the simple relationships grow at a rate of O(n!).)

Fortunately, it may very well be the case that it’s not TOO important that we capture all of them. Take for example the concept of curve fitting. While for any n points, a polynomial of degree (n – 1) is guaranteed to exist, such polynomials  are often so ridiculous in their fit that they are not useful for analyzing data. Thus, many statisticians will deliberately use only simple models to analyze data. It’s more or less the same with us- if we capture only relationships we expect to be useful, we’ll probably be good enough for most cases (especially given the size of the tanks and the relatively small size of the battlefield).

All told, while I’m still interested in figuring out definitively what is the right way to handle decoding the data, I think for now just capturing a few relationships that describe circles, straight lines, and hyperbolas, a very reasonable tank could be constructed.

Written by Ceasar Bautista

2011/07/08 at 02:28

Relationship between Least Common Multiple and Greatest Common Factor

with 2 comments

Never realized this before. Kind of cool.

def gcd(a, b):
    pfa = prime_factors(a)
    pfb = prime_factors(b)
    common = set(pfa) &amp; set(pfb)
    return reduce(operator.mul, [i ** min(pfa.count(i), pfb.count(i)) for i in common])

def lcm(a, b):
    '''Get the least common multiple of two numbers.'''
    pfa = prime_factors(a)
    pfb = prime_factors(b)
    common = set(pfa) &amp; set(pfb)
    return reduce(operator.mul, [i ** max(pfa.count(i), pfb.count(i)) for i in common])

Written by Ceasar Bautista

2011/07/01 at 17:30

Posted in Uncategorized

Tagged with ,

Robocode, Decision Trees (Dynamic Segmentation), and Information Science

with one comment

So some time ago, I discovered that when I was talking about what is known to the Robocode community as “Dynamic Segmentation“, what I actually was talking about is known to the machine learning community as a decision tree. To those who are unfamiliar, a decision tree is a tool used to classify information by identifying relevant clusters. That’s still a little vague, so I’ll explain in the context of trying to compress an image.

Let’s imagine that we have an image of black and white pixels that we’re trying to compress. To start, we first translate everything into a bit matrix. Now things get interesting. To compress the image optimally, we need to recursively divide the image along the x or y axis until each bit is in a box of identical numbers. There are several ways to do this, the most obvious being to perform a depth first search of all possible combinations of splits that satisfy the end condition until the tree with the least splits is found. That’s a terrible solution though and fortunately information science offers some help.

Rather than doing a recursive depth first search, a better solution is to calculate the entropy of the original image and for all possible splits calculate the reduction in entropy and then selecting the one that reduces entropy the most. (For those unfamiliar, entropy is more or less the misclassification rate– not exactly so, but they’re closely related.) Doing so repeatedly ought to give us not necessarily the best solution, but one that is good enough.

The resulting tree structure used to compress the image is called a decision tree. The reason being that we now have a series of rules that can tell us what color a pixel ought to be at any x-y coordinate, not so different than a min-heap really. The code for a more general tree is below.

class DecisionTree():
    def __init__(self, table, target_column, fitness_func):
        self.table = table
        self.target_component = target_column
        self.fitness_func = fitness_func
        self.tree = None

    def build(self):
        '''Build the decision tree.'''
        target_values = self.table.select(self.target_column)

        if not self.table:
            return None
        elif len(set(target_values)) == 1:
            return target_values[0]
        else:
            splitting_column = self.choose_column()

            if splitting_column is None:
                return sample_analysis.mode(target_values) #Could be average or median for continuous data
            else:
                self.tree = {'splitting_column': splitting_column}
                #Could be a problem on big data
                splits = {}
                for row in self.table.get_rows():
                    try:
                        splits[row[splitting_column]].append(row)
                    except:
                        splits[row[splitting_column]] = [row]
                for split in splits:
                    subtree = DecisionTree(Table(splits[split]), self.target_column, self.fitness_func)
                    self.tree[split] = subtree

    def choose_column(self, significance=0.0):
        '''Get the attribute with the highest information gain.'''
        best_gain = 0.0
        best_column = None

        for column in self.table.columns:
            gain = fitness_func(vectors, attribute, target_attribute)
            if gain > best_gain:
                best_gain = gain
                best_column = column

        if best_gain > significance: #Chosen for significance
            return best_column
        else:
            return None

tldr: A decision tree algorithm takes a table of data, a target column, and a fitness function (used to figure out how to split the data) and constructs an n-dimensional tree which can be used to explain the data.

Now, back to Robocode- The decision tree is an incredibly formidable tool. Not only can it be used to compress and explain data, but it can be  and is used for forecasting. Since the algorithm identifies ultimately identifies clusters, then we can reasonably expect future data to simply fit in to our rules. So simply load it up with a table of wave data, identify the correct firing angle as the column to split on, select a reasonable fitness function (these get complex) and let the code figure out the clusters.  Then just before firing, apply the rules you’ve discovered to find the relevant cluster, and fire!

All said, there are a few interesting technical problems that arise that I want to raise, partly because I don’t yet know the answer, and just for you, the reader, to consider before getting to coding.

  • First off, what fitness functions do we use to categorize the data at each split? It gets tricky when you start mixing in categorical data with numerical data. Even barring that, fitness functions often can be biased.
  • A tree is a static data structure. Therefore, if it can be determined that the tree is not performing well, the entire tree needs to be recalculated (in opposition to what I had believed earlier about simply splitting leaves). While this isn’t so much a problem for small amounts of data, which huge amount it can be a problem. Are there ways to make the tree more flexible so that it never has to fully be recalculated?
  • Done incorrectly, a tree can overclassify the data, for example, making each row id its own cluster. Even if that can be fixed (and it can), where is the correct spot to draw the line on classifying the data, taking into the account that the more splits we make, the less likely it is that we split correctly? (That is to say, at the most extreme, you can be 100% sure you correctly classified the data as ‘data’ if you make no splits.) My intuition says some genetic algorithms can figure this out, but it’s hard to say for sure.
  • Finally, how do we best encode the information regarding the state of the game itself? For example, does it make sense to store the distance between the two tanks even though the same information could be determined via enough splits? (I say yes, if just to speed things up) If so, what other attributes would be relevant?

In any case, there is a LOT of research on this topic as I’ve recently discovered. (Claude Shannon is the man!) Let me know of any thoughts / answers and good luck!

Written by Ceasar Bautista

2011/06/27 at 23:49

Ten hunters. Ten ducks.

with one comment

Here’s an interesting problem that might be useful to know in the future.

Ten hunters are waiting for ducks to fly by. When a flock of ducks flies overhead, the hunters fire at the same time,, but each chooses his target at random, independently of the others. If each hunters independently hits his target with probability p, compute the expected number of ducks that escape unhurt when a flock of size 10 flies overhead.

Solution

Let X_i equal 1 if the i-th duck escapes unhurt, and 0 otherwise, for i = 1, 2, …, 10. The expected number of ducks to escape can be expressed as:

E[X_1 + ... + X_10] = E[X_1] + … + E[X_10]

To compute E[X_i] = P{X_i = 1], we note that each of the hunters will independently, hit the the i-th duck with probability p/10, so: P{X_i=1} = (1 – (p / 10)) ^ 10 (that is, the chance that the duck escapes unharmed is equal to the chance that none of the hunters hit it). Hence, E[X] = 10 * (1 – (p / 10)) ^ 10.

(Just for reference, if p = 1, E[X] = 3.49.)

More generally, E[X] = T * (1 – (p / T)) ^ S, where T represents the number of targets, and S the number of shooters.

This is pretty cool in my opinion, as the problem shows up a lot in various games (although a lot of games use algorithms to determine which unit to fire at rather than chance), Tower Defense games perhaps being the most obvious.

Written by Ceasar Bautista

2011/05/09 at 12:49

Posted in Uncategorized

Tagged with ,

The Principal-Agent Problem (aka Why CEOs get mad bonuses)

leave a comment »

While I don’t really care for reading the news, from random conversations and web articles, it was clear that back during the government bailouts, people were pissed that the CEOs of many of the largest banks were taking home huge bonuses despite having nearly plunged the economy into the ground. While I can’t really comment on that, I can at least explain why CEOs in general why CEOs and business get bonuses while the rest of us get fixed-wages.

Imagine you are starting your own company, let’s say for simplicity a pizza parlor, and you have everything in place and now it’s time to hire the delivery guys. Let’s also say that you live in a town of unpleasant misers who refuse to tip. So now you have a problem. While it would be nice to assume that all of your employees will do their jobs, it’s very likely that a few of your workers will dislike their job and deliberately waste time on the job, either taking the scenics routes, or sitting their cars browsing Reddit. Worse, there is no way to know whether or not your workers are just screwing around or if by chance they actually just got stuck in traffic or had to deal with an unpleasant customer.

As you might expect, fixed wages are not the answer. If a worker gets a simple fixed wage and can choose between slacking off and doing his job, where slacking off is more rewarding, and not only that but in either case he can’t get caught, his decision to slack off is a no-brainer. The solution then is to turn to bonuses and the new problem becomes figuring out what the bonus should be.

Let’s say for example that when hiring, the opportunity cost of taking your offer is $100 for a day’s worth of work. Then we know that we have to pay our new employees at least that much per day. Let’s denote the fixed wage as W where W > 100.

Let’s say also that if your employee decides to slack off, then you still get a benefit of $200 in revenue, so $200 – W  profit. On the other hand, if you motivate your employee to work hard and he doesn’t get stuck in traffic, then you get $600 in revenue, otherwise you still get $200.  However, for your employee to work hard, it’s going to make him less happy, say $100 less so. Since we’re going to be paying our employee some bonus, B, though our profits in the case that our employee does well is $600 – (W + B). For our employee, this mean if he takes the risk working hard, with success rate P of no random problems, he will get a payoff of P * (W + B – 100) + (1 – P)(W – 100) or more simply, P * B + W – 100.

So now we know a few things. First, the payoff for our worker to take a risk working hard must be greater than or equal to slacking off. Second, the payoff to our worker must be greater than or equal to the worker’s opportunity cost. That is:

P * B + W – 100 >= W and P * B + W – 100 >= 100, so basically, P * B >= 100 and P * B >= 200 – W.

Because we’re out to maximize profits though, we just set P * B = 100 = 200 – W, which gives us B = 100 / P and W = 100 as one might expect. Notice this is not only intuitive but optimal. Paying in all bonuses or all wages costs us more.

Risk Aversion

However, this is hardly the case in reality because people are risk averse. That is, as humans, many of us would rather take $100 then take a 50% chance to get $250. So now let’s factor that in. Let’s say our employees are risk averse with aversion factor A. So now all of our employees payoffs are raised to the power A, which is between 0 and 1, with lower numbers meaning more averse. This gives us:

P * (W + B – 100)^A + (1 – P) * (W – 100)^A = W^A

We have to pay more to get people to take the same risk just because they are irrational. Fortunately, we get a surplus so long as 600 – P * B – W > 200 – W which simplifies to 400 > P * B, which still gives us a fair bit of room to work with. At this point, it’s just a matter of figuring out reasonable values for P and A, and then we can start making offers.

In conclusion, in fields where there is money to be made by working hard only to take a  risk (like Wall Street, car dealerships, etc) then bonuses are the most efficient way to get people to work diligently. Since statisticians can determine P, then really all that’s left is for businesses to set A, where they would prefer A to be closer to 1, which basically means that we would expect people in professions that require this kind of behavior to be pretty comfortable with the fact that despite their hard work, things might not roll their way. Anyway, hope that explains in general why bonuses are necessary, even if it was the case that those CEOs actually just stole our money.

Written by Ceasar Bautista

2011/04/29 at 18:31

Follow

Get every new post delivered to your Inbox.

Join 69 other followers