Thursday, August 20, 2015

Odds of 500,000 heads and 500,000 tails

If you flip a perfect coin* 1 million times, what are the odds that you would get exactly 500,000 heads and 500,000 tails?

A) 0%
B) 0.08%
C) 1%
D) 10%
E) 50%
F) 100%

Go ahead and try and explain your answer.

*perfect coin: A coin that has exactly a 50% chance of landing on either heads or tails.

Think of the simplest case. 2 coin flips. There are 4 possible outcomes, both heads, both tails, or 1 of each.

Wait, I just said 4 possible outcomes, then described 3?  Why?

Well, think of it like this:

HH, TT, HT, TH

Getting heads then tails is a different outcome than tails then heads, but both result in 1 of each. So, judging from that, you say, well 2 of 4, that's 50%, I knew the answer was E! Hah, not so fast...

Now, let's look at the next simplest case.  4 coin flips. Here are the possible outcomes:

HHHH, HHHT, HHTH, [HHTT], HTHH, [HTHT], [HTTH], HTTT, THHH, [THHT], [THTH], THTT, [TTHH], TTHT, TTTH, TTTT

So, there's 16 possibilities, and only 6 have equal numbers of heads and tails, I surrounded them in square brackets.  Uh Oh, that's 37.5% and it's probably going to get worse as we increase the flips towards 1 million. So maybe you're leaning towards answer C or D now, right?

From here I could go into the topics of binomial coefficients, pascal's triangle and other worthwhile, interesting topics, but I'll skip the heavy math and just show you dude's triangle.

     1
    1 1
  1 (2) 1   summed = (4)
 1 3   3 1
1 4 (6) 4 1 summed = (16)

Hopefully you recognize Pascal's triangle, but if you don't there's always Wikipedia.

We only care about every other case, as you need an even number of flips to obtain a perfectly even split of heads and tails.  You might recognize the numbers I put in parentheses. Those are the results we saw for 2 and 4 flips. So, now we just have to draw the next 999,996 lines of the triangle.

Don't worry, I did this on a separate sheet of paper, and the result was roughly 0.08%.

If you don't believe me, you can ask WolframAlpha


Thursday, February 19, 2015

nosetests: error: no such option: --with-xunit

Sometimes, nosetest is a liar.

For some reason, we were seeing this error in the Jenkins console output:

    nosetests: error: no such option: --with-xunit

Of course, the first thing to do is to google the problem.  Why doesn't nose like this parameter any more?  Maybe we updated versions, and now it's no longer accepted.

Well, unfortunately, that's not the case.

    nosetests --help

Tells me:

    --with-xunit Enable plugin Xunit: This plugin provides test results 
in the standard XUnit XML format. [NOSE_WITH_XUNIT]

Doh, that's not it.  However, scrolling up a few lines in the jenkins console output, I see that one of our 

    python setup.py develop 

commands was failing miserably, but the error wasn't being caught.  

I saw this in the console output:

    Couldn't find index page for 'our_thing' (maybe misspelled?)
    No local packages or download links found for our-thing
    error: Could not find suitable distribution for Requirement.parse('our-thing')

Fixed that problem, and nosetest were back in business.  Of course, we had lots of broken tests, because the nosetests error wasn't being caught as a failure by Jenkins.  Stupidity on top of idiocy.

<sigh>

Thursday, October 2, 2014

T-Mobile vs AT&T 7 day Test Drive

Originally, this blog wasn't going to have any code in it. I was just going to tell you the result, but what fun would that be. Also, it wouldn't really fit the theme of the blog.

Disclaimer: I've been an AT&T cell subscriber since 1999 when they were still called Cingular Wireless.


That said, you might think I'm biased, right? But which way?

Anyhow, T-Mobile was offering an opportunity to test drive their network. Basically, they send you an iPhone 5S in the mail, and you get to try it out for a week. Luckily, for the sake of science, my personal phone is also an iPhone 5S, but on the AT&T network.

The plan was fairly simple, I'd run Ookla's speed test app at a number of different locations that I commonly travel to. My daily commute takes me from the Far East Bay to downtown Oakland, and I also went to Berkeley for soccer practice. I was expecting to just eyeball it, and if the conclusion was, "Hmm, well, it's pretty close" I'd be tempted to switch from AT&T to T-mobile.

Then I discovered that the speedtest app will let you email a csv file containing all your test results, so of course I had to take advantage of that and put the results under further scrutiny.

So, I wrote up about 60 lines of throw away python code to evaluate the results. The way I graded the results was in three parts:

If one network was faster in both download and upload speeds, we'd call that a win. If one network won upload and the other download we'd call that mixed results.

The second criteria I looked at was simply average upload and download speeds across all the tests.

The third criteria was how many dead-zones with little to no network speed I found. I played around with the thresholds for what I considered to be a "dead-zone".

There are probably a lot more interesting things I could do with the data, and I may yet do so, as I'm only on day 4 of the test.

Here is the throwaway python code I used:
import csv
import os

with open('data' + os.sep + 'att.csv', 'rb') as a:
    a_data = csv.reader(a)
    att_data = []
    for row in a_data:
        att_data.append(row)

with open('data' + os.sep + 'tmobile.csv', 'rb') as t:
    t_data = csv.reader(t)
    tmobile_data = []
    for row in t_data:
        tmobile_data.append(row)

att_wins = 0
tmobile_wins = 0
mixed = 0

att_sum_dl = 0
tmobile_sum_dl = 0
att_sum_ul = 0
tmobile_sum_ul = 0

total_comparisons = 0

att_dead_zones = 0
tmobile_dead_zones = 0
both_dead_zones = 0
dl_dead_threshold = 1000
ul_dead_threshold = 500

for a_row in att_data:
    for t_row in tmobile_data:
        if a_row and t_row:  # Ignore blank lines
            if a_row[0] == t_row[0] and a_row[0] != 'Date':  # if times match exactly
                if int(a_row[4]) > int(t_row[4]) and int(a_row[5]) > int(t_row[5]):
                    att_wins += 1
                elif int(a_row[4]) < int(t_row[4]) and int(a_row[5]) < int(t_row[5]):
                    tmobile_wins += 1
                if int(a_row[4]) > int(t_row[4]) and int(a_row[5]) < int(t_row[5]):
                    mixed += 1
                if int(a_row[4]) < dl_dead_threshold or int(a_row[5]) < ul_dead_threshold:
                    if int(t_row[4]) < dl_dead_threshold or int(t_row[5]) < ul_dead_threshold:
                        both_dead_zones += 1
                    else:
                        att_dead_zones += 1
                if int(t_row[4]) < dl_dead_threshold or int(t_row[5]) < ul_dead_threshold:
                    if int(a_row[4]) < dl_dead_threshold or int(a_row[5]) < ul_dead_threshold:
                        pass
                    else:
                        tmobile_dead_zones += 1

                att_sum_dl += int(a_row[4])
                tmobile_sum_dl += int(t_row[4])
                att_sum_ul += int(a_row[5])
                tmobile_sum_ul += int(t_row[5])
                total_comparisons += 1

print 'AT&T wins:     ' + str(att_wins)
print 'T-mobile wins: ' + str(tmobile_wins)
print 'mixed results: ' + str(mixed)
print 'AT&T average download:     ' + str(att_sum_dl // total_comparisons)
print 'T-mobile average download: ' + str(tmobile_sum_dl // total_comparisons)
print 'AT&T average upload:       ' + str(att_sum_ul // total_comparisons)
print 'T-mobile average upload    ' + str(tmobile_sum_ul // total_comparisons)
print 'AT&T deadzones:     ' + str(att_dead_zones)
print 'T-mobile deadzones: ' + str(tmobile_dead_zones)
print 'both deadzones:     ' + str(both_dead_zones)
And here are the results:
AT&T wins: 13
T-mobile wins: 39
mixed results: 22


AT&T average download: 14926
T-mobile average download: 17747
AT&T average upload: 6495
T-mobile average upload 10466

AT&T deadzones: 7
T-mobile deadzones: 6
both deadzones: 4
As you can see, it was pretty much a slaughter.

One thing to note was that Berkeley seemed to have better AT&T coverage than T-mobile.
Also, the only place along the BART line from Dublin/Pleasanton to 12th Street Oakland that AT&T wins is at the Oakland Coliseum stop.

Thursday, September 25, 2014

XenServer 6.2 and CentOS 6.5 frustration...

So I've just provisioned a brand new XenServer and patched it all the way to the hilt.

Next step of course is to create some sweet new VMs on my little virtualization platform.

I go through the most basic of steps, any 12 year old sysadmin could navigate...

Of course I'll Install CentOS 6.5 from a URL, because who wants to keep an ISO library on hand, am I right?

Well, my mistake is captured in this screen shot, and if you can see it immediately, you should have a bright future as an IT monkey.



How this error manifests itself is equally elusive, and it took me longer to discover the root cause than it took to write this blog. Those type of issues are usually blog worthy.

Anyhow, back on topic. The error presented to me looks like this:

Sep 25, 2014 12:26:34 PM Error: Starting VM 'CentOS 6 (64-bit) (1)' - Internal error: xenopsd internal error: XenguestHelper.Xenctrl_dom_linux_build_failure(2, " panic: xc_dom_core.c:616: xc_dom_find_loader: no loader\\\"")


Well, technically, it shows up in this hard to copy status area of XenCenter.



However, you can find this in the server's log and then copy the text from there. Hah, of course, now I've got an actual error string, and Google, problem solved, right? Hecky nah.

Googling this error will lead you down a nefarious twisted trail of hopelessness, especially considering how simple the cause and effect were in my case.

30 minutes of searching for bootloaders gone missing, corrupt device drivers, corrupt disk drives, and little green men from outer space led me no where good.

Finally, I realized the problem. If you see where the cursor is in that first screen shot, there is simply some white space at the end of the Installation URL. That simple mistake caused this avalanche of bizarre errors. I'm just so glad I found the root cause, because these type of issues are nothing but aggravation.

Friday, August 22, 2014

Installing Python 2.7.8 on CentOS 6.5

CentOS 6.5 ships with Python 2.6.6 and you can't yum update past that. I really need Python 2.7. Since, 2.7.8 is the latest version (excluding Python 3) as of today, I'll go with that. I might be drunk, but I'm not so drunk that I can live without the argparse module, the unittest enhancements, OrderedDict and Counter classes.
I did all these commands from the root account, without that your mileage may vary.

Update CentOS and install the development tools

yum -y update
yum groupinstall -y 'development tools'
I'll want the following packages enabled for Python, and so should you.
yum install -y zlib-devel bzip2-devel openssl-devel xz-libs wget

Install Python 2.7.8 from Source

Download and extract the source:
wget http://www.python.org/ftp/python/2.7.8/Python-2.7.8.tar.xz

xz -d Python-2.7.8.tar.xz

tar -xvf Python-2.7.8.tar

Compile the Source and Install

# Enter the directory:
cd Python-2.7.8

# Run the configure:
./configure --prefix=/usr/local

# compile and install it:
make
make altinstall

# Checking Python version:
python2.7 -V
# Python 2.7.8

Install pip and virtualenv

No Pythonista worth two dead parrots would be caught dead putting together anything of complexity without these tools. So here's how to install them.
# Install pip
curl --silent --show-error --retry 5 https://bootstrap.pypa.io/get-pip.py | python2.7

# Install virtualenv
pip2.7 install virtualenv
And you're done.

Sunday, August 17, 2014

Project Euler Problem 2 and Fibonacci numbers

We can't talk about Fibonaacci numbers in a programming blog without at least touching on the idea of recursion. Of course, it's not a terribly efficient way to calculate Fibonacci numbers, it still warrants attention. That said, what is a Fibonacci number? Well, more accurately, it's the Fibonacci sequence and it starts with [0, 1]. So, we'll notate that by saying F0 = 0 and F1 = 1. Each subsequent number in the series is the sum of the previous two numbers.
F2 = F1 + F0
1 = 0 + 1
And more generally:
Fn = Fn - 1 + Fn - 2
The first 10 numbers in the series looks like this:
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
With that overly simple explanation, on to question #2!
Each new term in the Fibonacci sequence is generated by adding the previous two terms. By starting with 1 and 2, the first 10 terms will be:

1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ...

By considering the terms in the Fibonacci sequence whose values do not exceed four million, find the sum of the even-valued terms.
Notice that the Project Euler question starts with 1, 2. That's fine, it isn't going to affect us here. The first thing we'll want to do is calculate Fibonacci numbers, I'm almost certain of that. Since the sequence is defined as a self referential sum of the previous 2 numbers, we could create a function like this:
def fib(n):
    """Recursive Fibonacci function.  
    Return the nth Fibonacci number.
    """
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return(fib(n - 1) + fib(n - 2))
Now, you can call that function with any number and get the nth Fibonacci number. Sadly, this is terribly slow, and it's not really what we need to solve the problem. I just put it here to show you an example of recursion. If you don't know what recursion is, then you don't know what recursion is. So, hopefully, now you know. Maybe we really want a generator function, that will just continuously yield subsequent fibonacci numbers. Something like this:
def fib_gen():
    """Fibonacci generator function.  
    Just keep yielding Fibonacci numbers.
    """
    a, b = 0, 1
    while 1:
        yield a
        a, b = b, a + b
Or maybe we can just brute force the whole thing? This is how I actually solved it without really thinking too hard at all:
second_previous = 0
previous = 1
fib_num = 0
total = 0

while fib_num < 4000000:
    fib_num = previous + second_previous
    if fib_num < 4000000:
        if fib_num % 2 == 0:
            total = total + fib_num
        second_previous = previous
        previous = fib_num
print(total)
It prints out the total that problem 2 is asking for. That solution runs very fast for me. There is one other interesting property, and that is that every third Fibonacci number is even.

Saturday, August 16, 2014

Project Euler problem 1

Apparently, some real jerks hacked the Project Euler website, so it wasn't operating at full functionality for quite some time.  However, it's back now and you can once again spend some time going through their over 400 mathematical and programming challenges posted there.

As an introduction, I'll start off with problem 1, how the site works and solve the problem for you, because even drunk we can do that.

Now that the sire is back you'll want to create a username and password.  They have removed email addresses from the site, so it's important that you remember the password, as there is no more password recovery option.

Once you've done that, you've got to start solving the problems.  The problems are usually stated very simply, and then they will kick your ass.  The first one isn't much of a challenge, but it'll give you an idea to the approach I use.  However, even the first question offers some intriguing ways to be solved, using math, more so than raw computing power.

So the first problem is titled

Multiples of 3 and 5


And it states:
If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.
Find the sum of all the multiples of 3 or 5 below 1000.
Just using the most obvious algorithm, we can solve the problem like so:

def sum3and5(val):
    """Sum the multiples of 3 or 5 below val."""
    total = 0
    for x in range(val):
        if x % 3 == 0 or x % 5 == 0:
            total += x
    return total

assert(sum3and5(10) == 23)
print(sum3and5(1000))
You can see I'm simply looping from 1 to 999. If the value is a multiple of 3 or 5, I add it to the total. I also test my function is working properly by running it against 10 and verifying that it totals to 23. In order to make the code a little sexier you could write this as a generator function, and sum the output. That means you can reduce the whole function to a single line:
sum(x for x in range(1000) if x % 3 == 0 or x % 5 == 0)
However, that doesn't speed up the execution much. In order to make this really efficient, we need to do some math. The function we really want is the sum of all the numbers divisible by another number in a range. We want to know the sum of all the numbers divisible by 3 up to 1000. That sum looks like this: 3 + 6 + 9 + 12 ... 999. But, we can see that it's the same as 3 * (1 + 2 + 3 + 4 + 5 ... 333). We also know from math that the sum 1 + 2 + 3 + 4 + 5 ... p = p * (p + 1) / 2 Putting those two facts together, I created a new function that looks like this:
def sum_divisible_by(x, y):
    """Sum of all the numbers up to y, divisible by x."""
    z = y // x
    return x * (z * (z + 1)) // 2
But that alone will only tell us the sum for a single divisor. We'll need to do it for 3 and 5. However, we'll be summing some numbers twice. To get rid of those we'll need to run the same function with 15 and subtract that sum. Putting that all together, we end up with this:
print(sum_divisible_by(3, 999)
      + sum_divisible_by(5, 999) 
      - sum_divisible_by(15, 999))
Sure enough, that gives the same answer, and it does it much more efficiently. In order to demonstrate that, I wrote this code, which runs each of the functions 10,000 times and times the duration.
t = time.time()
for x in range(10000):
    sum3and5(1000)
u = time.time()
print(u - t)

t = time.time()
for x in range(10000):
    sum3and5gen(1000)
u = time.time()
print(u - t)

t = time.time()
for x in range(10000):
    sum_divisible_by(3, 999)\
    + sum_divisible_by(5, 999)\
    - sum_divisible_by(15, 999)
u = time.time()
print(u - t)
This was the result:
1.87544608116
1.84872412682
0.0155010223389
Basically, the final function was more than 100 times faster.