Drunk Coding: 2014

Thursday, October 2, 2014

T-Mobile vs AT&T 7 day Test Drive

Originally, this blog wasn't going to have any code in it. I was just going to tell you the result, but what fun would that be. Also, it wouldn't really fit the theme of the blog.

Disclaimer: I've been an AT&T cell subscriber since 1999 when they were still called Cingular Wireless.

That said, you might think I'm biased, right? But which way?

Anyhow, T-Mobile was offering an opportunity to test drive their network. Basically, they send you an iPhone 5S in the mail, and you get to try it out for a week. Luckily, for the sake of science, my personal phone is also an iPhone 5S, but on the AT&T network.

The plan was fairly simple, I'd run Ookla's speed test app at a number of different locations that I commonly travel to. My daily commute takes me from the Far East Bay to downtown Oakland, and I also went to Berkeley for soccer practice. I was expecting to just eyeball it, and if the conclusion was, "Hmm, well, it's pretty close" I'd be tempted to switch from AT&T to T-mobile.

Then I discovered that the speedtest app will let you email a csv file containing all your test results, so of course I had to take advantage of that and put the results under further scrutiny.

So, I wrote up about 60 lines of throw away python code to evaluate the results. The way I graded the results was in three parts:

If one network was faster in both download and upload speeds, we'd call that a win. If one network won upload and the other download we'd call that mixed results.

The second criteria I looked at was simply average upload and download speeds across all the tests.

The third criteria was how many dead-zones with little to no network speed I found. I played around with the thresholds for what I considered to be a "dead-zone".

There are probably a lot more interesting things I could do with the data, and I may yet do so, as I'm only on day 4 of the test.

Here is the throwaway python code I used:

import csv
import os

with open('data' + os.sep + 'att.csv', 'rb') as a:
    a_data = csv.reader(a)
    att_data = []
    for row in a_data:
        att_data.append(row)

with open('data' + os.sep + 'tmobile.csv', 'rb') as t:
    t_data = csv.reader(t)
    tmobile_data = []
    for row in t_data:
        tmobile_data.append(row)

att_wins = 0
tmobile_wins = 0
mixed = 0

att_sum_dl = 0
tmobile_sum_dl = 0
att_sum_ul = 0
tmobile_sum_ul = 0

total_comparisons = 0

att_dead_zones = 0
tmobile_dead_zones = 0
both_dead_zones = 0
dl_dead_threshold = 1000
ul_dead_threshold = 500

for a_row in att_data:
    for t_row in tmobile_data:
        if a_row and t_row:  # Ignore blank lines
            if a_row[0] == t_row[0] and a_row[0] != 'Date':  # if times match exactly
                if int(a_row[4]) > int(t_row[4]) and int(a_row[5]) > int(t_row[5]):
                    att_wins += 1
                elif int(a_row[4]) < int(t_row[4]) and int(a_row[5]) < int(t_row[5]):
                    tmobile_wins += 1
                if int(a_row[4]) > int(t_row[4]) and int(a_row[5]) < int(t_row[5]):
                    mixed += 1
                if int(a_row[4]) < dl_dead_threshold or int(a_row[5]) < ul_dead_threshold:
                    if int(t_row[4]) < dl_dead_threshold or int(t_row[5]) < ul_dead_threshold:
                        both_dead_zones += 1
                    else:
                        att_dead_zones += 1
                if int(t_row[4]) < dl_dead_threshold or int(t_row[5]) < ul_dead_threshold:
                    if int(a_row[4]) < dl_dead_threshold or int(a_row[5]) < ul_dead_threshold:
                        pass
                    else:
                        tmobile_dead_zones += 1

                att_sum_dl += int(a_row[4])
                tmobile_sum_dl += int(t_row[4])
                att_sum_ul += int(a_row[5])
                tmobile_sum_ul += int(t_row[5])
                total_comparisons += 1

print 'AT&T wins:     ' + str(att_wins)
print 'T-mobile wins: ' + str(tmobile_wins)
print 'mixed results: ' + str(mixed)
print 'AT&T average download:     ' + str(att_sum_dl // total_comparisons)
print 'T-mobile average download: ' + str(tmobile_sum_dl // total_comparisons)
print 'AT&T average upload:       ' + str(att_sum_ul // total_comparisons)
print 'T-mobile average upload    ' + str(tmobile_sum_ul // total_comparisons)
print 'AT&T deadzones:     ' + str(att_dead_zones)
print 'T-mobile deadzones: ' + str(tmobile_dead_zones)
print 'both deadzones:     ' + str(both_dead_zones)

And here are the results:

AT&T wins: 13
T-mobile wins: 39
mixed results: 22

AT&T average download: 14926
T-mobile average download: 17747
AT&T average upload: 6495
T-mobile average upload 10466

AT&T deadzones: 7
T-mobile deadzones: 6
both deadzones: 4

As you can see, it was pretty much a slaughter.

One thing to note was that Berkeley seemed to have better AT&T coverage than T-mobile.
Also, the only place along the BART line from Dublin/Pleasanton to 12th Street Oakland that AT&T wins is at the Oakland Coliseum stop.

Thursday, September 25, 2014

XenServer 6.2 and CentOS 6.5 frustration...

So I've just provisioned a brand new XenServer and patched it all the way to the hilt.

Next step of course is to create some sweet new VMs on my little virtualization platform.

I go through the most basic of steps, any 12 year old sysadmin could navigate...

Of course I'll Install CentOS 6.5 from a URL, because who wants to keep an ISO library on hand, am I right?

Well, my mistake is captured in this screen shot, and if you can see it immediately, you should have a bright future as an IT monkey.

How this error manifests itself is equally elusive, and it took me longer to discover the root cause than it took to write this blog. Those type of issues are usually blog worthy.

Anyhow, back on topic. The error presented to me looks like this:

Sep 25, 2014 12:26:34 PM Error: Starting VM 'CentOS 6 (64-bit) (1)' - Internal error: xenopsd internal error: XenguestHelper.Xenctrl_dom_linux_build_failure(2, " panic: xc_dom_core.c:616: xc_dom_find_loader: no loader\\\"")

Well, technically, it shows up in this hard to copy status area of XenCenter.

However, you can find this in the server's log and then copy the text from there. Hah, of course, now I've got an actual error string, and Google, problem solved, right? Hecky nah.

Googling this error will lead you down a nefarious twisted trail of hopelessness, especially considering how simple the cause and effect were in my case.

30 minutes of searching for bootloaders gone missing, corrupt device drivers, corrupt disk drives, and little green men from outer space led me no where good.

Finally, I realized the problem. If you see where the cursor is in that first screen shot, there is simply some white space at the end of the Installation URL. That simple mistake caused this avalanche of bizarre errors. I'm just so glad I found the root cause, because these type of issues are nothing but aggravation.

Friday, August 22, 2014

Installing Python 2.7.8 on CentOS 6.5

CentOS 6.5 ships with Python 2.6.6 and you can't yum update past that. I really need Python 2.7. Since, 2.7.8 is the latest version (excluding Python 3) as of today, I'll go with that. I might be drunk, but I'm not so drunk that I can live without the argparse module, the unittest enhancements, OrderedDict and Counter classes.
I did all these commands from the root account, without that your mileage may vary.

Update CentOS and install the development tools

yum -y update
yum groupinstall -y 'development tools'

I'll want the following packages enabled for Python, and so should you.

yum install -y zlib-devel bzip2-devel openssl-devel xz-libs wget

Install Python 2.7.8 from Source

Download and extract the source:

wget http://www.python.org/ftp/python/2.7.8/Python-2.7.8.tar.xz

xz -d Python-2.7.8.tar.xz

tar -xvf Python-2.7.8.tar

Compile the Source and Install

# Enter the directory:
cd Python-2.7.8

# Run the configure:
./configure --prefix=/usr/local

# compile and install it:
make
make altinstall

# Checking Python version:
python2.7 -V
# Python 2.7.8

Install pip and virtualenv

No Pythonista worth two dead parrots would be caught dead putting together anything of complexity without these tools. So here's how to install them.

# Install pip
curl --silent --show-error --retry 5 https://bootstrap.pypa.io/get-pip.py | python2.7

# Install virtualenv
pip2.7 install virtualenv

And you're done.

Sunday, August 17, 2014

Project Euler Problem 2 and Fibonacci numbers

We can't talk about Fibonaacci numbers in a programming blog without at least touching on the idea of recursion. Of course, it's not a terribly efficient way to calculate Fibonacci numbers, it still warrants attention. That said, what is a Fibonacci number? Well, more accurately, it's the Fibonacci sequence and it starts with [0, 1]. So, we'll notate that by saying F₀ = 0 and F₁ = 1. Each subsequent number in the series is the sum of the previous two numbers.

F₂ = F₁ + F₀

1 = 0 + 1

And more generally:

F_n = F_{n - 1} + F_{n - 2}

The first 10 numbers in the series looks like this:

[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

With that overly simple explanation, on to question #2!

Each new term in the Fibonacci sequence is generated by adding the previous two terms. By starting with 1 and 2, the first 10 terms will be:

1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ...

By considering the terms in the Fibonacci sequence whose values do not exceed four million, find the sum of the even-valued terms.

Notice that the Project Euler question starts with 1, 2. That's fine, it isn't going to affect us here. The first thing we'll want to do is calculate Fibonacci numbers, I'm almost certain of that. Since the sequence is defined as a self referential sum of the previous 2 numbers, we could create a function like this:

def fib(n):
    """Recursive Fibonacci function.  
    Return the nth Fibonacci number.
    """
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return(fib(n - 1) + fib(n - 2))

Now, you can call that function with any number and get the nth Fibonacci number. Sadly, this is terribly slow, and it's not really what we need to solve the problem. I just put it here to show you an example of recursion. If you don't know what recursion is, then you don't know what recursion is. So, hopefully, now you know. Maybe we really want a generator function, that will just continuously yield subsequent fibonacci numbers. Something like this:

def fib_gen():
    """Fibonacci generator function.  
    Just keep yielding Fibonacci numbers.
    """
    a, b = 0, 1
    while 1:
        yield a
        a, b = b, a + b

Or maybe we can just brute force the whole thing? This is how I actually solved it without really thinking too hard at all:

second_previous = 0
previous = 1
fib_num = 0
total = 0

while fib_num < 4000000:
    fib_num = previous + second_previous
    if fib_num < 4000000:
        if fib_num % 2 == 0:
            total = total + fib_num
        second_previous = previous
        previous = fib_num
print(total)

It prints out the total that problem 2 is asking for. That solution runs very fast for me. There is one other interesting property, and that is that every third Fibonacci number is even.

Saturday, August 16, 2014

Project Euler problem 1

Apparently, some real jerks hacked the Project Euler website, so it wasn't operating at full functionality for quite some time. However, it's back now and you can once again spend some time going through their over 400 mathematical and programming challenges posted there.

As an introduction, I'll start off with problem 1, how the site works and solve the problem for you, because even drunk we can do that.

Now that the sire is back you'll want to create a username and password. They have removed email addresses from the site, so it's important that you remember the password, as there is no more password recovery option.

Once you've done that, you've got to start solving the problems. The problems are usually stated very simply, and then they will kick your ass. The first one isn't much of a challenge, but it'll give you an idea to the approach I use. However, even the first question offers some intriguing ways to be solved, using math, more so than raw computing power.

So the first problem is titled

Multiples of 3 and 5

And it states:

If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.
Find the sum of all the multiples of 3 or 5 below 1000.

Just using the most obvious algorithm, we can solve the problem like so:

def sum3and5(val):
    """Sum the multiples of 3 or 5 below val."""
    total = 0
    for x in range(val):
        if x % 3 == 0 or x % 5 == 0:
            total += x
    return total

assert(sum3and5(10) == 23)
print(sum3and5(1000))

You can see I'm simply looping from 1 to 999. If the value is a multiple of 3 or 5, I add it to the total. I also test my function is working properly by running it against 10 and verifying that it totals to 23. In order to make the code a little sexier you could write this as a generator function, and sum the output. That means you can reduce the whole function to a single line:

sum(x for x in range(1000) if x % 3 == 0 or x % 5 == 0)

However, that doesn't speed up the execution much. In order to make this really efficient, we need to do some math. The function we really want is the sum of all the numbers divisible by another number in a range. We want to know the sum of all the numbers divisible by 3 up to 1000. That sum looks like this: 3 + 6 + 9 + 12 ... 999. But, we can see that it's the same as 3 * (1 + 2 + 3 + 4 + 5 ... 333). We also know from math that the sum 1 + 2 + 3 + 4 + 5 ... p = p * (p + 1) / 2 Putting those two facts together, I created a new function that looks like this:

def sum_divisible_by(x, y):
    """Sum of all the numbers up to y, divisible by x."""
    z = y // x
    return x * (z * (z + 1)) // 2

But that alone will only tell us the sum for a single divisor. We'll need to do it for 3 and 5. However, we'll be summing some numbers twice. To get rid of those we'll need to run the same function with 15 and subtract that sum. Putting that all together, we end up with this:

print(sum_divisible_by(3, 999)
      + sum_divisible_by(5, 999) 
      - sum_divisible_by(15, 999))

Sure enough, that gives the same answer, and it does it much more efficiently. In order to demonstrate that, I wrote this code, which runs each of the functions 10,000 times and times the duration.

t = time.time()
for x in range(10000):
    sum3and5(1000)
u = time.time()
print(u - t)

t = time.time()
for x in range(10000):
    sum3and5gen(1000)
u = time.time()
print(u - t)

t = time.time()
for x in range(10000):
    sum_divisible_by(3, 999)\
    + sum_divisible_by(5, 999)\
    - sum_divisible_by(15, 999)
u = time.time()
print(u - t)

This was the result:

1.87544608116
1.84872412682
0.0155010223389

Basically, the final function was more than 100 times faster.

Thursday, March 6, 2014

A maths/math/physics question

So I'm watching season 1 of The Killing on Netflix. The show is great by the way, if you're into suspenseful murder mystery type stuff. Of course they mislead you in 15 different directions only to each time stomp out all your suspicions and provide a near perfect alibi for your most recent suspect. I still haven't finished season 1, but I'm going at my own pace, so NO SPOILERS please. You won't find them in this blog, so I'd appreciate not finding them in the comments...

Anyhow, in Season 1 detective Holder is joking around reading a math question out of a school algebra book. It goes something like this:

Holder: A biker goes blah blah distance at 8 miles per hour and then the same fool he bikes back at 10 miles per hour. So what's the average round trip rate?
Jack: I don't know
Holder: And I don't give a shit.

I'm assuming the majority of you are in the same camp as Jack and/or Holder (don't know, don't care). I however, am not, and I love math, almost as much as I love drinking! OK, I lied, I love math more, but that's kind of scary to some people, so I'll stick with my original stance.

Anyhow, I had this drunken conversation with my brother last night, who I made watch the first 3 episodes, and we agreed, the answer, just like the killer in the show, is not the obvious suspect, which would be 9 miles per hour, or at this point, Jasper. In order to calculate the correct answer, I quickly wrote up a few lines of python to determine the correct answer, while my brother was trying to think of extreme cases, such as what if you change the problem from 8 and 10 to say 1 and 60?

So, here's the sloppy python code I wrote up:

import sys
try:
    speed1 = int(sys.argv[1])
except:
    speed1 = 8.0
try:
    speed2 = int(sys.argv[2])
except:
    speed2 = 10.0
# speed in mph
time1 = 1.0 / speed1
time2 = 1.0 / speed2
avg = 2.0 / (time1 + time2)
print(avg)

Running the script with no parameters tells us that the answer is 8.88889 miles per hour. It's a bit counter-intuitive, until you start to think in terms of extremes. Running the script with 1 and 60 as the speeds, a person might expect an average somewhere near 30, however, that would be pretty far from accurate. In fact, running the script as such:

python avg_speed_same_distance.py 1 60
1.96721311475

It's not even over 2 miles per hour. In fact, if you ride the first leg of the trip at 1 mile per hour, you can NEVER make the round trip over 2 miles per hour. Take the most extreme case you can think of, where you traverse the reverse trip in 0 time, basically you travel at infinity miles per hour, you will still only have averaged 2 miles per hour. For instance, you travel 1 mile at 1 mile per hour, that's exactly 1 hour. Now, turn around, and instantly complete the reverse trip. Now you've traveled 2 miles, but it still took you an hour, so the round trip average is 2 miles per hour.

Just to assure myself that the distance was irrelevant, I changed up the code, so that the distance is just randomly chosen on each run of the script.

import random
import sys
distance = random.randint(1, 1000)
try:
    speed1 = int(sys.argv[1])
except:
    speed1 = 8.0
try:
    speed2 = int(sys.argv[2])
except:
    speed2 = 10.0
# speed in mph
time1 = distance / speed1
time2 = distance / speed2
avg = (distance * 2.0) / (time1 + time2)
print(avg)

As I suspected, it doesn't matter, you get the same results for the same parameters, every time, even with varying distance.

TLDR; The answer is C. The answer is always C.

Drunk Coding