Weekly Python Code Snippets

Proud to Code
Code to be Proud Of

The best resume I know how to give,
quality contributions to a code base over time.

paufler.net@gmail.com

Posted in reverse order, most recent first.

Closure Class

Short but sweet, a class that acts like an 'empty closure' (i.e. a pass through closure that does nothing):


class empty_closure():
    def __init__(self, func):
        self.func = func
    def __call__(self, *args, **kwargs):
        return self.func(*args, **kwargs)

@empty_closure
def foo():
    do_something()

The utility is not so much in what this specific code can do, for it does nothing (i.e. foo does exactly what it did pre-decorator, so blessed obfuscation). But I like how __call__ is clearly separated from __init__, cleanly segregating the use case from creation. Seems conceptually simpler to me.
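As a minimal sketch of where that separation starts to earn its keep, here is a hypothetical variation (counted and double are illustrative names, not part of the snippet above): __init__ runs once at decoration time, while __call__ runs on every call and hands back whatever the wrapped function returns.

```python
class counted(object):
    '''Hypothetical variation on empty_closure that counts calls.'''
    def __init__(self, func):
        # Runs once, when the decorator is applied.
        self.func = func
        self.calls = 0

    def __call__(self, *args, **kwargs):
        # Runs on every call; returning the result keeps the wrapper transparent.
        self.calls += 1
        return self.func(*args, **kwargs)

@counted
def double(x):
    return 2 * x

results = [double(n) for n in (1, 2, 3)]  # [2, 4, 6], and double.calls is now 3
```

The state (calls) lives on the instance, which is the part a plain pass-through function would need a nested scope to hold.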

Or Else

I've been pretty proud of myself for using inline if/else statements for the last few months. Well, here's to making my code even shorter by using or statements during assignment.


a = True if False else False
b = False if True else True
c = True or False
d = False or True

print a, b, c, d
#False False True True

Not every if/else use case can be replaced with the equivalent or, as the or can't assign a False value: any falsy value on the left (None, 0, '', an empty list) falls through to the right-hand side. But my average use case of checking for None collapses quite succinctly:


variable = None
default = 'Something'

val_old = variable if variable else default

val_new = variable or default

print val_old == val_new
#True
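The one catch is worth a short sketch. Assuming two hypothetical helpers (with_or and with_none_check are illustrative names), the or form replaces every falsy value, while an explicit is None test replaces only None:

```python
def with_or(variable, default='Something'):
    # Any falsy value (None, 0, '', []) is swapped for the default.
    return variable or default

def with_none_check(variable, default='Something'):
    # Only None is swapped; 0 and '' are kept as legitimate values.
    return default if variable is None else variable

samples = [None, 0, '', 'text']
or_results = [with_or(s) for s in samples]
none_results = [with_none_check(s) for s in samples]
```

So the shorter spelling is safe whenever 0 and the empty string are not meaningful values for the variable in question.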

Parse Order

The interpreter cares about the order in which commands run, but not the order in which functions are defined (so long as each one exists by the time it is called). I would say that I've never modified functions on the fly. But it's probably more accurate to say that I've never modified a function in an intentionally dynamic way.

The following is pretty straightforward: b could just as easily come before a.


def a(cond):
    print 'In a()'
    if cond:
        return b(not cond)
    else:
        return 'End A'

def b(cond):
    print 'In b()'
    if cond:
        return a(not cond)
    else:
        return 'End B'
    
print a(True)
print
print b(True)

#In a()
#In b()
#End B
#
#In b()
#In a()
#End A

But if the following code appears further down the module, a different outcome results.


def b(cond):
    return 'Enough of this!'

print a(True)
print
print b(True)

#In a()
#Enough of this!
#
#Enough of this!

It's not really odd, until one stops to question how function a knows which version of b to use, as when a is defined, neither version exists.

I'll guess that a holds on to the name 'b' and looks it up in the global dictionary each time it is called, a dictionary which is in turn modified by the second definition of b.

And since the following outputs more or less what I expected, I call my analysis complete.


def b(cond):
    return globals()['a'](False)
print a(True)

#In a()
#In a()
#End A
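The same lookup-at-call-time behaviour can be shown in a smaller sketch (caller and helper are hypothetical names used only for illustration). The function body stores the name, not the function object, so rebinding the name changes what gets called:

```python
def caller():
    # 'helper' is resolved by name each time caller() runs,
    # not when caller() is defined.
    return helper()

def helper():
    return 'first'

before = caller()

def helper():
    return 'second'

after = caller()

# The compiled function records the names it will look up later.
names = caller.__code__.co_names
```

Here before is 'first' and after is 'second', even though caller itself was never touched.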

The Vast Cavern of Wonder

I like the metaphor (or at least, the terminology) of the Rabbit Hole. It being the implicit quest of the Expert Programmer to go as far down the Rabbit Hole as possible. Now, I'm not going to say that I'm some great spelunker. I am a casual explorer at best, perhaps more of a day tripper. But during one of my latest outings, I came across Python's opcodes module, the idea therein and the assembly code to which Python is eventually compiled. And I decided to look at the raw code, which led me to the git repository for Python as a whole. And all I can say is that I looked on with wonder and delight as the enormity of this vast repository opened before my eyes: individual libraries as stalagmites with their form and texture derived by the underlying modules, entries descending for miles, who knows to what depth and detail in the browser screen below.

I suppose this is what keeps me coming back to programming. The well never seems to go dry. And every time I trace down a small tributary, I find myself staring across another ocean, so wide it would take a lifetime to reach the other shore, much less explore completely.

This is perhaps how some folks feel when they stand next to the Grand Canyon. Well, I feel this way often enough while perusing the abstraction of code. And it probably goes a long way towards explaining why I keep on coming back for more.

Strategic Game AI

I am working on some Strategic Game AI, at the moment; though, AI does seem like the wrong word for it. Anyhow, some ideas on that whole Strategic Game AI thing:

If the total number of possible moves for the next turn is small enough {tic-tac-toe (9, max), checkers (under 10, typically), chess (around 100, at the most complicated mid-game positions, less at the beginning and end), so most board games}, I would be inclined to create a list of all the possible moves prior to selecting the best (list moves, rank-score the moves, sort, and select).

Play styles can be implemented by tweaking the rank-scoring algorithm.

Skill levels can then be implemented by slicing off the best moves and selecting at random (10% fuzz equals slicing off the 10% best moves and selecting at random from those, 20% - 20%, and so on).

As one gets further away from the 'next turn' and delves into deeper strategy, network graphs seem intriguing (this move, that move, then the key move; or if that's not clear, move here, build a cannon, then with the increased firepower attack). Graphs would allow the linking of dissimilar moves into a cohesive whole (movement, reinforcing, attack, etc.).
As the set of moves grows large enough, some sort of genetic algorithm seems plausible (to creatively select from the near infinite). But at this stage, even the network graphs idea above is merely an idea.
Finally, having little to do with Strategic Game AI, but being more in tune with the rest of this page, I am inclined to favour classes over named tuples in the future, as I am finding it easier to evolve classes over time (add/remove attributes and expand functionality). So, let's see how long 'No more named tuples for me' lasts.
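The list, rank-score, sort, and select idea from the first few points above can be sketched in a few lines. This is only an illustration under stated assumptions: choose_move, score, and the square numbering are hypothetical, and the scoring is a toy stand-in for a real rank-scoring algorithm.

```python
import random

def choose_move(moves, score, fuzz=0.1, rng=random):
    '''Ranks moves by score (best first), keeps the top `fuzz` fraction
    (always at least one move), and picks at random from that slice.'''
    ranked = sorted(moves, key=score, reverse=True)
    keep = max(1, int(len(ranked) * fuzz))
    return rng.choice(ranked[:keep])

# Toy scoring for a 3x3 board numbered 0-8: centre, then corners, then edges.
def score(square):
    if square == 4:
        return 3
    return 2 if square in (0, 2, 6, 8) else 1

best = choose_move(range(9), score, fuzz=0.1)    # only the centre survives the slice
casual = choose_move(range(9), score, fuzz=0.6)  # any of the five best squares
```

Swapping in a different score function changes the play style; raising fuzz lowers the skill level.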

It's a String

Shows how often I use binary numbers. In Python, bin() hands them back as strings (as do oct() and hex()).


print type(int(1))
print type(bin(1))
print type(oct(1))
print type(hex(1))

#<type 'int'>
#<type 'str'>
#<type 'str'>
#<type 'str'>

Which means, rather than doing math, when added together, binary strings are concatenated.
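A short sketch of the difference, using arbitrary example values: adding the strings glues them together, while converting back with int(text, 2) does the arithmetic.

```python
a, b = bin(5), bin(3)            # '0b101' and '0b11'

joined = a + b                   # string concatenation: '0b1010b11'
total = int(a, 2) + int(b, 2)    # integer addition: 8
as_binary = bin(total)           # back to a string: '0b1000'
```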

Tic Tac Toe

I made a Tic Tac Toe game last week, so that I'd have a project in which to learn (or start to learn) the Tk framework. I'm rather proud of the code below. Rather than a long if/elif/else block, the results of all the Maybes are gathered into a list, from which the first non-None value is popped and returned.

All the self.play_ methods are Maybe's in that they return either None or an int with the int being that play's recommendation for the next move in the game.


def play_turn_hard(self):
    '''Not worried about the extra loops,
    if/then guards makes the logic needlessly complicated.
    
    All play_'s yield None or a game_square number,
        so maybe's essentially.'''
    
    #in order of preference
    play_strategy = [
        self.play_first_turn(),   
        self.play_winning_move(),
        self.play_block_winning_move(),
        self.play_fork(),
        self.play_block_fork(),
        self.play_center(),
        self.play_opposite_corner(),
        self.play_any_corner(),
        self.play_turn_random()  
        ]
    
    plays = [play for play in play_strategy
        if play is not None]
    play = plays.pop(0)
    #print play, play_strategy #yields a nice debugging string
    return play

I mean, it's still sort of messy. And I just pulled the code straight from the project without prettying it up any. But I think it's worlds better than a long if statement.
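For comparison, a lazy variation on the same idea (a different technique from the list above, sketched with hypothetical stand-in functions): pass the strategies themselves rather than their results, and stop at the first one that recommends a square.

```python
def first_play(strategies):
    '''Calls strategies in order and returns the first non-None result;
    strategies after the first hit are never called.'''
    results = (strategy() for strategy in strategies)
    return next((play for play in results if play is not None), None)

# Hypothetical stand-ins for the self.play_ methods.
called = []

def play_winning_move():
    called.append('winning')
    return None

def play_center():
    called.append('center')
    return 4

def play_turn_random():
    called.append('random')
    return 7

choice = first_play([play_winning_move, play_center, play_turn_random])
```

The trade-off is the one the docstring above already accepts: the eager list runs every strategy, which costs a few extra loops but makes for a handy debugging printout.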

FASWC

Just a short little exercise to try and get a grip on some of the more common formatting options. For me, the mnemonic is FASWC... not that I'll remember that in an hour.


from itertools import product

fill_values = [' ', '_']
align_values = ['<', '>', '^']
sign_values = ['+', '-']
width = ['10']
comma = [',', '']
precision = '.digits of precision for floats'

values = product(fill_values,
                 align_values,
                 sign_values,
                 width,
                 comma)

for faswc in values:
    print ('{:%s%s%s%s%s}' % faswc).format(1000),
    text = '{:%s%s%s%s%s}' % faswc
    print text.format(1000), '\t', faswc, '\t', text

As they say, I will leave the output of the above as an exercise for the reader.

The only other interesting format option (for me, now, keeping in mind that I'm sure I'll want more, once again, in that very same hour) is:


print '{:.4}%'.format(1.0/3.0*100)
print '{:.2%}'.format(1.0/3.0)
#33.33%
#33.33%

Notice that the output is the same, but in one the '%' is inside the brackets and in the other it is outside. So, four significant digits, or a percentage to two decimal places.
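The two only agree because of the value chosen. A small sketch with a different (arbitrary) input shows them parting ways: the first counts significant digits and drops trailing zeros, the second always shows two decimal places.

```python
value = 0.5

sig_digits = '{:.4}%'.format(value * 100)    # '50.0%'
percent = '{:.2%}'.format(value)             # '50.00%'

third_a = '{:.4}%'.format(1.0 / 3.0 * 100)   # '33.33%'
third_b = '{:.2%}'.format(1.0 / 3.0)         # '33.33%'
```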

And yeah, I guess I'm still on this project. Truth is, the only reason I took time to practice this bit was so I'd have something to post this week. That in itself makes the page worthwhile to me. Every little bit counts.

Extreme Programming

Having listened to a few videos featuring Linus Torvalds as of late, I am keen on his heuristic that nesting code more than three layers deep (indenting, in Python) is a bad 'code smell'. The easy fix, of course, is to break those nested layers out into separate functions, loops, and/or control structures.

Also, as part of the same video spree, I came across the idea of Extreme Programming, which likely means different things to different people, but to me, it means trying to make comments superfluous. So, very much in line with what I always thought was the key point behind Literate Programming: code should read like a book (i.e. make sense).

And between you and me, I think that's enough style-name dropping for one day.


def utf16_txt_to_ascii(utf16_txt_file):
    '''Converts utf16 .txt file to ascii (inplace).''' 
    
    with open(utf16_txt_file, 'rb') as f:
        text = f.read()

    text = text.decode('utf-16')
    text = text.encode('ascii', 'ignore')
    text = text.replace('"',' ')
    
    with open(utf16_txt_file, 'wb') as f:
        f.write(text)

Does the doc string add any information that wasn't already provided by the function and parameter names? Probably not.

In the extreme form, I believe Extreme Programming advocates the elimination of comments, but I can't bring myself to do that. But eliminating the need for comments, well, that seems like a worthy goal.

Import Self

Being clever, I thought I'd name my experimental module after the module I was experimenting with, which led to a circular import of the experimental module.

For example, in a user defined module named math.py:


#math.py

import math
print math.cos(3.0)

#AttributeError: 'module' object has no attribute 'cos'

math.py imports itself as math; and as such, the standard math module is not available.

Lambda

I was thinking about defining a group of functions in a list. The solution, of course, uses lambda, which isn't as tricky or clever as I was hoping. But then, cleverness is seldom appreciated in the coding style of others.


func = [lambda x: x + 0,
        lambda x: x + 1,
        lambda x: x + 2,
        ]

result = [f(x)
          for x in [0, 10, 100]
          for f in func]

print result
#[0, 1, 2, 10, 11, 12, 100, 101, 102]

If nothing else, I like how the numbers are all sequential.

Funsies

Sometimes, I take this little project too seriously; to the point that I've thought about just stopping. So, let this next be a bit of fun, a delight in the possible.


a, b = b, a = a, b = 1, 2
print a, b
#1 2

a, b = b, a = a, b = b, a
print a, b
#2 1

It's sort of pleasing when things work how they're supposed to work, even if no one could possibly want them to work that way.

Stacks

Is it the problem domain that I am currently working in? Or has my style of programming shifted? Who knows? Currently, this last week, I've found myself using stack structures quite a bit:


def start_value():
    pass
def process_current(current):
    pass

current = start_value() 
stack = [current]

while stack:
    next_item = process_current(current) 
    if next_item:
        stack.append(next_item)
        current = next_item
    else:
        current = stack.pop()

Of course, in a real stack, to prevent loops, there may well be a list of items_processed = []. But then again, there may well not...
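A runnable sketch of that fuller version, assuming the thing being walked is a dict mapping each node to a list of neighbours (walk and graph are hypothetical names; items_processed is the loop guard mentioned above):

```python
def walk(graph, start):
    '''Depth-first walk over a dict of node -> list of neighbours,
    returning nodes in the order they were first processed.'''
    items_processed = []
    stack = [start]
    while stack:
        current = stack.pop()
        if current in items_processed:
            continue  # already seen, so a cycle cannot trap the loop
        items_processed.append(current)
        # Push neighbours in reverse so the first listed is processed first.
        stack.extend(reversed(graph.get(current, [])))
    return items_processed

# 'c' links back to 'a', which would loop forever without the guard.
graph = {'a': ['b', 'c'], 'b': ['d'], 'c': ['a'], 'd': []}
order = walk(graph, 'a')   # ['a', 'b', 'd', 'c']
```

For anything large, a set would be the better container for the guard; the list is kept here only because it preserves processing order for the return value.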

Lists of Objects

I think I've found my new favourite data structure: a list of objects.


from itertools import count

my_num = count(1)

class my_object():
    def __init__(self):
        self.num = next(my_num)

data_list = [my_object(), my_object(), my_object()] #and so on

In the abstract, it might not seem that useful, but given that my_object could be a node, and the data_list could be contained in another structure called a tree, inverting or flipping said tree becomes trivial; and changing the root, if not trivial, certainly cannot be that difficult (even if I have not implemented the last as of this writing).


from itertools import count

node_num = count(1)

class Node():
    def __init__(self, links=None):
        self.num = next(node_num)
        self.links = links if links else []

class Tree():
    def __init__(self, nodes=None):
        self.nodes = nodes if nodes else []

    def flip(self):
        '''Reverses order of all links in all nodes.'''
        for node in self.nodes:
            node.links = node.links[::-1]

t = Tree([Node(), Node(), Node()])

Filling in the Blanks

It's sort of interesting, revealing, insightful, whatever, looking at the holes in my knowledge. I've never used raw_input(). I can't say that a meaningful use case for it rapidly comes to mind. Sure, I know why others might use it. But since I work out of Eclipse, inside Eclipse, it's easier to change a variable in the code than it is to utilize input. So, I don't know how I would use it. Hence, it's been two, three, four, who knows (such an incentive to lie, to exaggerate), but call it three years since I started keeping track of how long I've been coding (my website has an anniversary), and this is the first time I've used raw_input(); and such an obviously trivial use case at that.


t = raw_input('Hello (type here): ')
print 'Right Back at you: %s' % t

KISS

Keep It Simple, Sherlock!

The past week or two, I've felt my time would be better spent looking at other people's code (in the main distribution, popular libraries, github files, whatever). And it's been very helpful in seeing what other folks do, what calls they make, and what libraries they use that I would never have thought of using on my own. But there is a type of programming that I can't say that I appreciate (call it Java style, like I would know, endless classes and so on). And as a minor example of overly difficult code, I present the following, which at first blush, seems like something almost deliberately confusing.


from operator import __lshift__

print 1<<5,             bin(1<<5)
print __lshift__(1, 5), bin(__lshift__(1, 5))
print 2**5,             bin(2**5)
print pow(2,5),         bin(pow(2,5))
#32 0b100000

Granted, the four lines above are not identical, but I would wager more folks can parse the last two lines more easily than the first two.

Though, now that I look at it, the next step in the progression seems like it might be much-much easier to process if the first two lines are utilized, especially after one has seen this trick before (like modulus, no doubt).


print 32<<5,             bin(32<<5)
print __lshift__(32, 5), bin(__lshift__(32, 5))
print 32*2**5,             bin(32*2**5)
print 32*pow(2,5),         bin(32*pow(2,5))
#1024 0b10000000000

And therein, we have the explanation for why it was done thusly.
And why I am bothering to look at random code snippets in the first place.
The truth is, I would have never come up with this on my own... nor likely figured out its true purpose without ranting a bit, so call this post 'Rubber Ducking' of a sort. The first implementation of a bitshift that I've come close to understanding. I'm sure there will be more.

Forest for the Trees

A simple test run / proof of concept for four different constructors as might be used in a node/leaf tree. I suppose, as much as anything else, I wanted to play with type, setattr, and getattr to get a better feel for them. And my feeling after this is simply that class (IMHO) remains the way to go.


class node_class():
    __slots__ = ('n', 'a', 'b',)
    def __init__(self, n=None, a=None, b=None):
        self.n = n
        self.a = a
        self.b = b
    def __repr__(self):
        return '(n: %s (a:(%s), b:(%s)))' % (
                str(self.n), str(self.a), str(self.b))

w = node_class('base', 'left', 'right')
w.a = node_class(w.a, 'l-l', 'l-r')
print w
#(n: base (a:((n: left (a:(l-l), b:(l-r)))), b:(right)))


def node_func(n, a, b):
    return type('node_func', (object,), dict(n=n, a=a, b=b))

x = node_func('base',
              node_func('left', 'l-l', 'l-r'),
              'right' )
print x.n, x.a.n, x.a.a, x.a.b, x.b
#base left l-l l-r right


from collections import namedtuple

node_tuple = namedtuple('node', ['n', 'a', 'b'])
y = node_tuple('base',
               node_tuple('left', 'l-l', 'l-r'),
               'right')
print y
#node(n='base', a=node(n='left', a='l-l', b='l-r'), b='right')

#The Hard Way with type, setattr, and getattr
dict_1 = dict([('n','base'), ('a', 'left'), ('b', 'right')])
dict_2 = dict([('n','left'), ('a', 'l-l'), ('b', 'l-r')])
z = type('z', (object,), dict_1)
setattr(z, 'a', type('z', (object,), dict_2))
print getattr(z, 'n'), getattr(getattr(z, 'a'), 'n'),
print getattr(getattr(z, 'a'), 'a'), getattr(getattr(z, 'a'), 'b'),
print getattr(z, 'b')
#base left l-l l-r right

fileinput

My first impression of the fileinput module was quite negative. And in truth, the only reason I'm tempering my verbalization of that opinion is because I'm convinced my current frustration derives from my own ignorance.

And for the moment, I'm just going to gloss over why that might be a good rule of thumb for any and all such future frustrations.

Anyhow, the following code 'copies' a file 'inplace'; or in the commented out version, replaces the given text (over all the pages of my website, should I so desire, based on the passed files in the for loop).


import fileinput
for html in html_files():
    for line in fileinput.input(html, inplace=True):
        print line,
        #print line.replace('©', '&#169;'),

In criticism, it seems sort of sloppy (to me, anyway) to redirect print from the screen to a file based on context (as is done in the fileinput module). Me, I would have liked a write method of some sort. Yes, I understand redirecting stdout had to make the implementation easier, but making an implementation easier isn't really the basis of a meaningful design decision.
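To make the redirection concrete, here is a self-contained sketch that runs against a throwaway temporary file rather than a real web page. The names replace_in_place and demo_path are hypothetical, and print is used as a function so the same code also runs on Python 3.

```python
from __future__ import print_function
import fileinput
import os
import tempfile

def replace_in_place(path, old, new):
    '''Rewrites path line by line. Inside the loop, print() writes to the
    file rather than the screen, because fileinput has redirected stdout.'''
    for line in fileinput.input(path, inplace=True):
        print(line.replace(old, new), end='')
    fileinput.close()  # make sure stdout is restored

# Throwaway file, so the sketch touches nothing real.
handle, demo_path = tempfile.mkstemp(suffix='.html')
with os.fdopen(handle, 'w') as f:
    f.write('<p>(c) me</p>\n<p>more (c)</p>\n')

replace_in_place(demo_path, '(c)', '&#169;')

with open(demo_path) as f:
    result = f.read()
os.remove(demo_path)
```

Anything else printed inside that loop (a stray debugging line, say) lands in the file too, which is the practical cost of the design being criticised here.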

Anyhow, I got this far in the code; and then, realized that for the next step, I need to be able to differentiate between the hodgepodge of encodings I use on my website (ANSI, UTF-8, probably more), a hodgepodge which never bothered me before (probably because I never noticed before) not until I wanted to mass convert all my © symbols into &#169;, because to do that, I need to know how the © were encoded before... or do a series of replaces; but really, isn't it time all my webpages had the same encoding? I mean, wasn't that going to be one of the projects for this year, remove the mistakes I made to the site when I was young and coding was new?

Equal or In

Not so much code, as a code design philosophical inquiry, something to think about.

When constructing a case/select is it better to be forgiving or restrictive?


if var == 'equal':
    run_this()

#Or?

if var in 'equal':
    run_this()

Both fire on 'equal', but only the latter fires on 'eq' (or on any other substring of 'equal', such as 'qua' or the empty string).
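A sketch of the three options side by side (the function names are hypothetical). The third, membership in a tuple of accepted spellings, is a middle ground: forgiving, but only about the values listed.

```python
def fires_equal(var):
    return var == 'equal'            # restrictive: exact match only

def fires_in_string(var):
    return var in 'equal'            # forgiving: any substring, even ''

def fires_in_tuple(var):
    return var in ('equal', 'eq')    # forgiving, but only as listed

candidates = ['equal', 'eq', 'qua', '']
table = [(c, fires_equal(c), fires_in_string(c), fires_in_tuple(c))
         for c in candidates]
```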

Print as a Function

I use Python 2.7. Maybe I should switch over to 3.5. But in the meantime, I came across a nifty way to access print from inside a list comprehension.


def prin(x): print x,
a = [prin(a) for a in range(1,6)]
#1 2 3 4 5

And since that's so short, I'll throw in a simple reduce function, as well.


from operator import add, sub, mul
def my_reduce(func, r, xs):
    '''Reduces passed list xs with given func.'''
    x = xs.pop(False)
    return my_reduce(func, func(r, x), xs) if xs else func(r, x)
print my_reduce(add, 0, range(1,1000)) #499500
print my_reduce(sub, 0, range(1,1000)) #-499500
print my_reduce(mul, 1, range(1,1000)) #being suitably long

As production code, I have my doubts (it's too crumpled up); but it's good enough for a simple (x:xs) proof of concept.

x, xs = aList.pop(False), aList

Generating Gotchas

I ran across some interesting Python code last week, which was essentially (or at least, looked that way to me) Lisp in Python. Of course, it wasn't Lisp word for word, but rather the look and feel of Lisp using standard Python syntax and library calls. And since I like endless parentheses (of the aforementioned Lisp aesthetic), I was intrigued and felt the urge to further explore generators (and itertools). But much to my surprise (and therefore the source of a minor bug), a truth test for an empty list returns False, whereas the same test on an empty generator evaluates to True.


lst = [x for x in []]
print bool(lst), lst
#False []

gen = (x for x in [])
print bool(gen), gen
#True <generator object>
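One way around the gotcha, sketched under the assumption that peeking at the first item is acceptable (is_empty is a hypothetical helper): ask for the next item with a sentinel default, and hand back an iterator with that item put back in front.

```python
from itertools import chain

def is_empty(iterator):
    '''Returns (empty, iterator). Peeking consumes one item, so an
    iterator with that item restored is returned alongside the answer.'''
    sentinel = object()
    first = next(iterator, sentinel)
    if first is sentinel:
        return True, iter(())
    return False, chain([first], iterator)

empty, _ = is_empty(x for x in [])
has_none, restored = is_empty(x for x in [1, 2])
remaining = list(restored)   # [1, 2], nothing lost to the peek
```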

Trampolines... and such

A long entry, so perhaps that means nothing of importance. Was thinking about what to write up and recalled an article on trampolines that I'd read. So, figured I'd do that from memory and call it a day. Turns out I haven't done that much with generators... if anything, so the yield took some doing. But once I'd done it, I couldn't see much point in the trampoline. Though, to be honest, not quite sure I understand the implications of this little trick yet, so maybe what's coded isn't really a trampoline at all...


from itertools import count

def fibo():
    '''Generator yielding next number in Fibonacci sequence.'''
    f = [0, 1]
    for _ in count():
        f.append(sum(f))
        yield f.pop(0)

def fibonacci(nth):
    '''Returns nth number in Fibonacci sequence'''
    f = fibo()
    for _ in range(nth - 1):
        f.next()
    return f.next()

def trampoline(fun, nth):
    '''Returns nth number in passed func sequence.'''
    f = fun()
    for _ in range(nth - 1):
        f.next()
    return f.next()

def start_at_tenth(fun):
    '''Wrapper that advances wrapped generator function
    to the tenth number in sequence.'''
    def wrapper():
        f = fun()
        for _ in range(9):
            f.next()
        return f
    return wrapper

def test_fibo_funcs(nth=10):
    '''Tests the different fibo constructs.
        nth must be an int >= 10,
             as that's where the wrapper starts.'''
    assert nth >= 10
    #Set Up Base fibo() generator
    f = fibo()    
    for _ in range(nth - 1):
        f.next()
    f_fibo = f.next()
    #Set Up Wrapper Generator
    @start_at_tenth
    def f10():
        return fibo()
    f_10 = f10()
    for _ in range(nth - 9):
        f_start_ten = f_10.next()
    #All tests should give same value for nth fibonacci number
    tests = [f_fibo, f_start_ten,
             fibonacci(nth), trampoline(fibo, nth)]
    print tests
    assert all([t == tests[0] for t in tests])

test_fibo_funcs()
test_fibo_funcs(45)
#[34, 34, 34, 34]
#[701408733, 701408733, 701408733, 701408733]

Any In

I often find myself filtering multiple OR conditions (this or that or that_other). This function finds such matches a bit more elegantly compared to run-on OR statements:


def any_in(match, against):
    '''Returns True if any in match found in against,
    where both match & against support iteration.'''
    return any(map(lambda x: x in against, match))

In other news, I was surprised to learn about difflib. It would have been exactly what I needed, but it's way too slow to inventory the files on my computer week to week (i.e. 500,000 line long text files). So, I'm working with sets instead. In the end, maybe I'll prefilter the dataset down with sets before utilizing difflib to extract meaning from the remainders. Who knows? It's a work in progress.


import difflib

#Fairly close... or not really, not at all
#(a and b here being two lists of lines)
diff = difflib.Differ().compare(a, b)
diff = set(a) - set(b)
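A small sketch of what each approach reports, using two made-up file listings (last_week and this_week are hypothetical data). The sets give the bare differences quickly; difflib keeps order and context at a much higher cost.

```python
import difflib

last_week = ['a.txt', 'b.txt', 'c.txt']
this_week = ['a.txt', 'c.txt', 'd.txt']

# Set arithmetic: fast, but order and duplicates are lost.
removed = set(last_week) - set(this_week)
added = set(this_week) - set(last_week)

# difflib: slower, but every line is kept in order with a prefix
# ('  ' unchanged, '- ' only in the first, '+ ' only in the second).
diff = list(difflib.Differ().compare(last_week, this_week))
diff_removed = [line[2:] for line in diff if line.startswith('- ')]
diff_added = [line[2:] for line in diff if line.startswith('+ ')]
```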

REGEX

I stacked a bunch of named tuples in a list to hold my regex signatures for a text scanning program that I'm working on. Of course, I'm not so good at regex's that I can just make one off the cuff and be sure it works. Rather, I find scanning some test text with the given regex to be part of the regex development process. And what this structure does is keep all those tests safely packed together alongside their respective regex's, which in turn allows me to better manage the entire list of regex's.


from collections import namedtuple

sig = namedtuple('sig', ['regex', 'feedback',
                         'hit_list', 'miss_list'])
    #regex = regex to find match
    #feedback = debugging notes
    #hit_list = regex matches these strings
    #miss_list = regex DOES NOT match these

#One entry from the list (hence the trailing comma)
sig(regex = r'(\.src(?!(\s!?=+\s|;|\))))',
    feedback = 'JavaScript .src should have " = ", ;, or )',
    hit_list = ['.src=9', '.src=  thisVar', '.src =4;'],
    miss_list = ['.src = i', ' t.src; th', 'a.src === this',
                 '.src) ', 't.src != that']),
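With the tests packed alongside each regex, checking a signature takes only a few lines. The sketch below assumes a hypothetical check_sig helper (not part of the original program) and reuses the .src entry shown above.

```python
import re
from collections import namedtuple

sig = namedtuple('sig', ['regex', 'feedback', 'hit_list', 'miss_list'])

def check_sig(s):
    '''Returns the strings that break the signature's own expectations:
    hits the regex fails to match, plus misses it matches by mistake.'''
    pattern = re.compile(s.regex)
    missed_hits = [t for t in s.hit_list if not pattern.search(t)]
    false_hits = [t for t in s.miss_list if pattern.search(t)]
    return missed_hits + false_hits

src_sig = sig(
    regex=r'(\.src(?!(\s!?=+\s|;|\))))',
    feedback='JavaScript .src should have " = ", ;, or )',
    hit_list=['.src=9', '.src=  thisVar', '.src =4;'],
    miss_list=['.src = i', ' t.src; th', 'a.src === this',
               '.src) ', 't.src != that'])

problems = check_sig(src_sig)   # an empty list means the regex behaves
```

Run over the whole list of signatures, the same helper doubles as a regression test whenever a regex gets tweaked.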

Querying the SQLite Fantastic

What to say? I'm about as proud of (enamoured with, pleased to show off) the (cz_min,) = c.execute(sql).fetchone() syntax as anything else, (cz_min,) looking pretty darn clean to me compared to a [0] tagged on the end of the .fetchone().

Beyond that, I'll just say that sqlite3 has enough error handling capabilities that the sql_commands don't feel like complete 'Blob Text' (i.e. they are easy enough to debug). And at this point, I feel more limited by my desire to do anything with a sql (corn, grain) database, than I am by my knowledge of SQL. So, a good week. Progress!


import sqlite3
from itertools import chain
    
#Open database, get cursor
grainDB = './sql/grain.db'
conn = sqlite3.connect(grainDB)
c = conn.cursor()

#Base string for sql statements
sfw_base = "SELECT %s FROM %s WHERE %s;"

#Creates a cz2015 view, if none exists 
sql = sfw_base % ('*', 'sqlite_master', "type='view'")
if 'cz2015' not in chain.from_iterable(c.execute(sql).fetchall()):
    c.execute('''CREATE VIEW cz2015 AS SELECT * FROM
                    futures WHERE issue=="CZ2015" and year=2015;''')

#MIN: 358.25
sql = sfw_base % ('settle', 'cz2015',
                  'settle == (SELECT min(settle) FROM cz2015)')
(cz_min,) = c.execute(sql).fetchone()
print 'MIN: %.2f' % cz_min

#MAX: 451.75
sql = sfw_base % ('settle', 'cz2015',
                  'settle == (SELECT max(settle) FROM cz2015)')
(cz_max,) = c.execute(sql).fetchone()
print 'MAX: %.2f' % cz_max

#AVE: 393.24
sql = sfw_base % ('sum(settle)/count(settle)', 'cz2015', 'settle')
(cz_ave,) = c.execute(sql).fetchone()
print 'AVE: %.2f' % cz_ave

conn.close()

SQLite: So, SQL-ish

How long have I been programming? I don't actually know since I didn't write down the starting date. So, call it three years... even if there is quite the incentive to exaggerate that number, three years sounds about right. And in that time, even though an SQL book was one of the first books I read, I've been avoiding SQL databases like the plague... maybe still am, since this is done in SQLite. But this doesn't seem too hard. And in many ways, it seems easier than Pandas, easier to wrap my mind around completely. So, maybe the time has come for SQL.


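import sqlite3
from os import remove
from os.path import isfile

#dir_paths, issue, and csv_data are helper functions not shown here

def create_grain_db():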
    '''Creates a new sqlite grain.db using current csv files.
        grain.db includes futures table:
            year, month, day, settle, volume, issue'''
    #Removes Old grain.db if it exists
    name = 'sql/grain.db'
    if isfile(name):
        remove(name)
    #New grain.db is created: note schema 
    conn = sqlite3.connect(name)
    c = conn.cursor()
    c.execute('''CREATE TABLE futures
                        (year integer, month integer, day integer,
                        settle real, volume real, issue text)''')    
    #Data is loaded from all files in these two directories
    path_list = dir_paths('./corn/') + dir_paths('./soy/')
    for file_path in path_list:
        issue_name = issue(file_path)
        for datum in csv_data(file_path):
            data = str(tuple(datum + [issue_name]))
            sql_command = 'INSERT INTO futures VALUES %s' % data
            print sql_command
            c.execute(sql_command)
    conn.commit()
    print '\nCreated: %s' % name

I used %s instead of ? formatting in sql_command, because I want the print feedback. I suppose in production the safety of ? would be important and the feedback wouldn't. Here, the opposite is true.
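For reference, a sketch of the ? form that still keeps some feedback: print the template next to the row instead of the finished string. It runs against an in-memory database, and the two rows are made-up stand-ins for the parsed csv data.

```python
import sqlite3

conn = sqlite3.connect(':memory:')   # throwaway in-memory database
c = conn.cursor()
c.execute('''CREATE TABLE futures
             (year integer, month integer, day integer,
              settle real, volume real, issue text)''')

# Hypothetical rows standing in for the parsed csv data.
rows = [(2015, 6, 1, 358.25, 1000.0, 'CZ2015'),
        (2015, 6, 2, 451.75, 1200.0, 'CZ2015')]

sql_command = 'INSERT INTO futures VALUES (?, ?, ?, ?, ?, ?)'
for row in rows:
    # The template and the row are printed separately; sqlite3 does the
    # quoting, so no SQL is ever assembled by hand.
    print('%s <- %r' % (sql_command, row))
    c.execute(sql_command, row)
conn.commit()

(cz_min,) = c.execute('SELECT min(settle) FROM futures').fetchone()
(row_count,) = c.execute('SELECT count(*) FROM futures').fetchone()
conn.close()
```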

Numpy Where

Is this the best of the week? Yeah, probably. The long range implications for image manipulations are greater than they might at first appear in that a -1 return value for any effect can now mean ignore this effect at this location (to be replaced with some other value, background, or effect in a later step). It will make cascading image effects (feeding one effect into the next) that much easier. So, sort of like an image mask. Sure, it's been done before. But now I have my protocol and that makes all the difference.


img_out = np.where(img_effect == -1, img_background, img_effect)

Python: My Calculator of Choice

So, there's this Brain Teaser question that goes along the lines of: If ten feet were added to a rope that stretched around the world, would a cat be able to walk under said rope? I would have said no, but it turns out the answer (in the spirit in which the question was asked) is yes. This is the proof:


from math import pi

earth_circumference = 24000 * 5280 #miles * feet per mile
earth_circumference_10 = earth_circumference + 10

earth_diam = earth_circumference / pi
earth_diam_10 = earth_circumference_10 / pi

print earth_diam_10 - earth_diam


for c in range(1, 11) + range(1000000, 1000110, 10):
    print c, c / pi
#For all circumferences, c + 10, yields diameter + ~3

This isn't supposed to be complicated... nor even well documented or even explained very well. Rather, it's an example of me using Python (or more precisely, Eclipse with PyDev) to solve a math problem rather than using Excel, a calculator, pen and paper, the back of a napkin, or whatever. I won't say I think in Python. What I will say is, wanting to solve this particular problem, I looked to Python and my IDE. And believe it or not, that's about as good an observation as I came up with in my coding this week... mostly I read JavaScript, to be honest. But whatever. Python: my calculator of choice.

Housekeeping

Over the years, I've changed my naming convention for my web based resources:


THEN:
'./images/202013%205%2015%20images/Some%20Long%20Name%26Such.PNG'

NOW:
'./images/brettcode_13.png'

Yeah, the first was more descriptive, but the latter is easier to maintain and doesn't push the limits of a 255 character path count.

This week's function takes one of those unwieldy file_path's and converts it into something closer to what I'd do now. Not a very useful function by itself, but it's part of a larger script that automatically updates all the file_names along with any associated src/href links for this website all at the same time.


import re

def clean_name(file_path):
    '''Tidies up name_part of file path (dir_path_part/name_part.ext)
            by removing spaces, special characters,
            and miscellaneous key words.
        Returned string is '_' separated lower case.
        Trailing '.ext' if any is converted to lower case.
        Directory Path is completely unaffected.'''
    parts = file_path.split('\\')
    file_name = parts[-1]
    file_name = file_name.replace('.', '_', file_name.count('.') - 1)
    kill_list = ['Copyright Brett Paufler', '(c)', 'Copyright', 
                 ',', '&', '-', "'", '(', ')', '%' ]
    for c in kill_list:
        file_name = re.sub(re.escape(c), ' ',
                    file_name, flags=re.IGNORECASE)
    file_name = '_'.join(file_name.split())
    file_name = file_name.replace('_.', '.') # del trailing '_'
    file_name = file_name.lower()
    file_path = '\\'.join(parts[:-1] + [file_name])
    return file_path
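
One wrinkle with the THEN example above: it is a percent-encoded URL path with forward slashes, whereas clean_name splits on backslashes, so the whole string would be treated as the file name. As a rough sketch only (clean_url_name is a hypothetical helper, not part of the script described, and it swaps the kill_list for a blanket 'anything that is not a letter or digit' rule), decoding the escapes first might look something like this on Python 3:

```python
import re
from urllib.parse import unquote  # Python 3 location of unquote


def clean_url_name(url_path):
    """Hypothetical helper: decode %-escapes, then tidy only the last segment."""
    # Split off the directory part; it is decoded but otherwise left alone.
    head, _, name = unquote(url_path).rpartition('/')
    # Separate the extension (if any) from the rest of the file name.
    stem, dot, ext = name.rpartition('.')
    if not dot:
        stem, ext = name, ''
    # Collapse every run of non-alphanumeric characters into one underscore.
    stem = re.sub(r'[^0-9A-Za-z]+', '_', stem).strip('_').lower()
    cleaned = stem + (dot + ext.lower() if dot else '')
    return head + '/' + cleaned if head else cleaned


print(clean_url_name(
    './images/202013%205%2015%20images/Some%20Long%20Name%26Such.PNG'))
# ./images/202013 5 15 images/some_long_name_such.png
```

The directory part still carries its spaces, which matches the 'Directory Path is completely unaffected' promise, apart from the decoding.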

Taking a Walk

The code below isn't that interesting (could any code from one such as I truly be considered interesting?), though I include it (in an abbreviated version, devoid of prints, etc.), since pushing code to the screen is a main purpose of this page. The other purpose is to discuss the code I push; and what a file-tree-walking function such as this signifies to me (and thus, why it's the least bit interesting) is that tree-walking can be used to explore a problem domain that is new to me: the structure of my computer. Who would have thought it?

Well, maybe me, since I've been reading up on Linux as of late. But since it will be months (or even years) before I transition over, I figure, in the meantime, this little bit of code will serve as the starting point for exploring the Microsoft Windows OS.

First stop? Well, there are darn near a half million files on my system with maybe 7% of those being dll's (equivalent to exe's, I'm told, who would have thought) along with another 6.5% being .png's (certainly seems like a lot of .png's to me). Figuring out the whys, wherefores, and who installed what, when, and where of that file mix along with the corresponding tree-structure (I'm thinking Python's ETE tree exploration library will work well for imaging that last) will be a good place to start.

Of course, we'll see how I feel when I get back from my (yet, another) vacation, so who knows what the next post will be about? I can see this as being a very long term project.


from os import walk, stat
from os.path import join
from csv import writer

def catalog_computer():
    '''Saves list of all files on system,
    along with associated file statistics.'''
    error_list = []
    dir_out = r'C:\whatever'  # raw string, so the backslash stays literal
    with open(join(dir_out, 'catalog.csv'), 'wb') as csv_file:
        csv_writer = writer(csv_file)
        for root, _, names in walk('C:\\'):
            for name in names:
                f = join(root, name)
                try:
                    a = [f] + list(stat(f))
                    csv_writer.writerow(a)
                except WindowsError:
                    error_list.append(f)
        if error_list:
            with open(join(dir_out, 'errors.txt'), 'wb') as error_txt:
                for err in error_list:
                    error_txt.write(err + '\n')  # one failed path per line
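
The percentages quoted above (so many dll's, so many png's) come from counting file extensions across the walk. A minimal sketch of that tally, assuming nothing more than the standard library (extension_share is a made-up name, and this is not the code that produced my numbers):

```python
import os
from collections import Counter


def extension_share(top):
    """Return {extension: fraction of all files} for everything under top."""
    counts = Counter()
    for _root, _dirs, names in os.walk(top):
        for name in names:
            # Lower-case the extension so .PNG and .png count together.
            ext = os.path.splitext(name)[1].lower() or '(none)'
            counts[ext] += 1
    total = sum(counts.values())
    if not total:
        return {}
    return {ext: n / float(total) for ext, n in counts.items()}
```

Calling extension_share on a small folder first is a good idea; walking an entire drive takes a while.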

Toggle

I admit it. I was simply messing around, trying to come up with something to post. I figured I'd implement an __iter__ method in some simple Class example. But in getting the yield to work, I believe I formulated something cleaner, more in keeping with my intent to post wonderful code. Heck, I might even understand how it works...


from itertools import cycle

def toggle(a=cycle([True, False])):
    return next(a)

print toggle(), toggle(), toggle()
#True False True
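
Why it works: a default argument is evaluated once, when the function is defined, so every call to toggle() shares the same cycle object. The flip side is that there is only ever one toggle. If independent toggles are wanted, a closure does the same job; make_toggle below is my own illustrative name, sketched for Python 3:

```python
from itertools import cycle


def make_toggle(values=(True, False)):
    """Return a fresh toggle; each call to the result yields the next value."""
    it = cycle(values)        # one private iterator per toggle
    return lambda: next(it)


t1, t2 = make_toggle(), make_toggle()
print(t1(), t1(), t2())
# True False True   (t2 starts from the beginning, unaffected by t1)
```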

HTML Tag Tutorial

Another Lightning Talk (not given, would be happy to present to your group), having nothing to do with Python, so I feel obligated to put something else up as well. Although, seeing as how I haven't coded much this last week, that means I'll have to think of something first. So, come back in a day or two, if you'd please, for something more...

Functional Programming Lightning Talk

I prepared a longish sort of 'Lightning Talk' (so perhaps not really a 'Lightning Talk', as such) over the week, focusing on Functional Programming. I don't know if I want to say these two lines of code are the highlight (that honor likely goes to the wrapper examples), but I think they're pretty nice bits of code to 'Oh' and 'Ah' over, as examples of currying.


#Pre-filling with a function
curry_func = lambda y: map(lambda x: x*x, y)
print curry_func(range(5))
#[0, 1, 4, 9, 16]

#Pre-filling with data
curry_data = lambda y: map(y, range(5))
print curry_data(lambda x : x*x)
#[0, 1, 4, 9, 16]
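
For comparison, the standard library's functools.partial does the 'pre-filling with a function' half without a lambda. This is a sketch for Python 3, where map returns an iterator and so needs wrapping in list(); square and squares_of are names I made up for the example:

```python
from functools import partial


def square(x):
    return x * x


# Pre-fill map's first argument; the data is supplied later.
squares_of = partial(map, square)

print(list(squares_of(range(5))))
# [0, 1, 4, 9, 16]
```

partial fills arguments from the left, so the 'pre-filling with data' case (fixing map's second argument) still reads more naturally as the lambda above.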

Genetic Programming

The Lightning Talk I prepared last week went splendidly, so I was motivated to put together another talk on Genetic Programming. I haven't presented it, yet. And I don't know when I will. But it's ready to go whenever the group is.

As to more personal projects, I wasn't motivated to work on MutaGenetic Football^TM much this last week. Instead I opted to do a quick (I thought it would be quick) image filter Friday night. It took longer than I thought (obviously). But part of that time was deciding to work on a library of reusable functions for my image work. Typically, I would make a custom name formatter for each project. But from now on, this function should be able to handle the name formatting aspects of all (most every) image manipulation project I pursue... or at least, I hope.


def format_save_name(image, *args):
    '''Given an image name, returns a save_name (sN)
    with any *args inserted as image_arg1_arg2.png.
        format_save_name('this', 3, 'that', 4.1, 'the_other', 6.957)
            ./output/c_this_003_that_00410_the_other_00695.jpg'''
    #TODO: Make platform independent?
    sN = image.replace('input', 'output')
    if args:
        arg_list = []
        for a in args:
            if isinstance(a, bool):  # equality tests would also catch 0 and 1
                if a:
                    a = '_True'
                else:
                    a = 'False'
                arg_list.append(a)
            elif isinstance(a, int):
                arg_list.append('{:>03}'.format(a))
            elif isinstance(a, float):
                arg_list.append('{:>05}'.format(int(round(a * 1000)/10)))
            else:
                arg_list.append('{}'.format(a))
        label = '_' + '_'.join(arg_list)
    else:
        label = ''
    sN = sN[:-4] + label + sN[-4:]
    return sN

Not magic. But I think it's a good step on the way towards becoming more efficient with my time.

Also, who would have thought it, but bool is subclassed from int, so it was important to check isinstance(a, bool) prior to isinstance(a, int).
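
A quick demonstration of that ordering issue, as a sketch for Python 3 (kind is just an illustrative helper). Note that 1 == True is also True, so comparing by equality cannot tell a bool from an int; only isinstance(a, bool) checked first can:

```python
print(isinstance(True, int))   # True: bool is a subclass of int
print(1 == True, 0 == False)   # True True: equality can't separate them


def kind(a):
    """Classify a value, testing bool before int on purpose."""
    if isinstance(a, bool):
        return 'bool'
    elif isinstance(a, int):
        return 'int'
    elif isinstance(a, float):
        return 'float'
    return 'other'


print(kind(True), kind(1), kind(1.0), kind('x'))
# bool int float other
```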

Break's Over

Back from vacation. I'm not motivated to write a long entry. Currently working on my MutaGenetics Football^TM project (an Agent Based Model with the logic powered by a simple Genetic Algorithm), which for the most part, informs these few sparse notes.

I'm finding assert a == b to be more convenient than


if a != b: raise Exception('a != b')

.

I discovered that enumerate(some_list, 1) starts the numbering at 1, which eliminates the need to +1 the enumerated variable, resulting in cleaner code. In fact this entire page (for the most part) is a long winded ode to a quest for cleaner code. But whatever.

Finally, for a Lightning Talk I'm planning on presenting later this week, I managed to put together a rather short (I think) code snippet that modifies an image's colour layers.


def image_crack_demo(image='./input/alcatraz.png',
                     sN='./output/alcatraz.png',
                     factors=(2,3,5)):
    '''Color Layer 'coolness' effect applied to passed image file.
        factors are amount (r,g,b) color layers are multiplied by'''
    img = imread(image)
    img[:,:,0] *= factors[0]
    img[:,:,1] *= factors[1]
    img[:,:,2] *= factors[2]
    imsave(sN, img)

So, I'm pretty stoked about that.
And with that, I shall call this week's entry a wrap.

...
In which time passes, while I travel about the East Coast; next month, on to Germany!!!
...

Boids

Sixty hours (at a wild guess), over two weeks, utilizing three classes to organize the code structure. I didn't intend it to be a study in OOP, but perhaps, that's what it was more than anything else. Complete code at the link, so nothing relating to it further, here.

As for actual code to push to the page, here's a quick script that takes an image (or images) and outputs a bunch of black and white variations. For me, the input are the images from my front page; and the output became the basis for the Business Cards and/or Social Calling Cards I'll be handing out over the summer.

import os
import numpy as np
from skimage.io import imread, imsave

for fN in os.listdir('./imageIn'):
    print "CONVERTING: " + fN
    
    #Black White
    img = imread('./imageIn/'+fN, as_grey=True)
    sN = './imageOut/' + fN[:-4] + '_BW' + fN[-4:]
    imsave(sN,img)
    print sN

    #BW, Posterize (contrast options)
    for x in range(2,5):
        y = 1.0/x
        img = imread('./imageIn/'+fN, as_grey=True)
        img = np.round(img/y) * y 
        sN = './imageOut/' + fN[:-4] + '_BW_P%d' % x + fN[-4:]
        imsave(sN,img)
        print sN

    #BW, Reverse Image (Negative Effect)
    img = imread('./imageIn/'+fN, as_grey=True)
    sN = './imageOut/' + fN[:-4] + '_BW_R' + fN[-4:]
    img = (-1 * img) + 1
    imsave(sN,img)
    print sN
    
    #BW, Reversed, Contrast Options
    for x in range(2,5):
        y = 1.0/x
        img = imread('./imageIn/'+fN, as_grey=True)
        img = (-1 * img) + 1
        img = np.round(img/y) * y 
        sN = './imageOut/' + fN[:-4] + '_BW_R_P%d' % x + fN[-4:]
        imsave(sN,img)
        print sN

Time-wise, it was probably a wash between figuring out how to do this in a traditional photo app and just coding it from scratch: probably less so, considering it's a thirty second job from here on out.

Obviously (or perhaps not, so that's why I'm saying), I didn't spend any time refactoring or creating function calls. And I can't say I ever will. As the effects are fairly straightforward, (in my mind) writing function calls would (in most cases) be overkill and (in many instances) would simply obscure the code. For instance, the Negative effect could be encapsulated as:

def negative(img):
    '''returns negative of a black and white image
        ex: [[0.2,0.5], [0.0,1.0]] becomes [[0.8,0.5],[1.0,0.0]]
    '''
    return (-1 * img) + 1

Truthfully, that just seems like a bit of obscuration to me. But I see this in code bases all the time, so to each their own.
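
For anyone wondering what the np.round(img/y) * y line in the script actually does to a pixel, here is the same arithmetic on single values, as a dependency-free sketch (posterize_value is my own name; the script itself works on whole arrays at once). With x steps, every grey value between 0.0 and 1.0 snaps to the nearest multiple of 1/x:

```python
def posterize_value(v, x):
    """Snap a grey value in [0.0, 1.0] to the nearest multiple of 1/x."""
    y = 1.0 / x
    return round(v / y) * y


def negative_value(v):
    """Same formula as negative(), applied to a single value."""
    return (-1 * v) + 1


# With x=2 every grey level collapses onto one of three values.
levels = sorted({posterize_value(i / 100.0, 2) for i in range(101)})
print(levels)
# [0.0, 0.5, 1.0]
```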

And not that I have an extensive fan base that anxiously awaits my every post, but hopefully there will be some major quality improvements between this post and the next, as I'm taking the summer off; or so, that is the plan...

Revenge of the Boids

Last week I posted the collision logic for my Boids Project; and since then, I've pretty much removed all collision logic from the code base: instead, opting for a default distance degradation formula. I've also switched over to Numpy (for ease of doing full Vector Math); and so, the d looping, of which I was so proud, has been swiftly (and/or firmly) excised from the code. Ah, the joys of being a junior level programmer: the code is so easy to refactor for the good. Eh, and who knows? Maybe next week I'll actually finish this project. Better be, I'm going on vacation soon. And next week's is the last update for... well, the summer, I'm thinking. Got to celebrate my birthday, don't you know: Fifty Years, doing it in style.

class Boid():
    '''Craig Reynolds's Boid Object
    '''

    def __init__(self):
        '''pos=position, vel=velocity,
            delta=proposed changes to velocity
        '''
        self.pos   = np.array([0.0, 0.0])
        self.vel   = np.array([0.0, 0.0])
        self.delta = np.array([0.0, 0.0])

    def distance(self,b2):
        '''scalar distance between two boids, self and b2
        '''
        return np.linalg.norm(self.pos - b2.pos)

    def add_delta(self,vL):
        '''increments delta by up to maximum allowed amount
        '''
        if vector.vec_len(vL) > Boid.max_delta:
            vL = vector.vec_unit(vL) * Boid.max_delta
        self.delta = self.delta + vL
		
def vel_avoidance(b1, boidList):
    '''adjusts delta of b1 based on proximity to any/all in boidList
    '''
    ave = np.array([0.0,0.0])
    for b2 in boidList:
        ave += (b1.pos - b2.pos) / pow(b1.distance(b2), 3)
    b1.add_delta(ave)
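
A note on the math in vel_avoidance: dividing the offset (b1.pos - b2.pos) by the distance cubed is the same as taking the unit vector pointing away from b2 and scaling it by 1/distance squared, i.e. an inverse-square push. Here is a dependency-free sketch of the same arithmetic using plain tuples. The name avoidance and the zero-distance guard are my additions for illustration; the guard is there because a boid compared against itself would otherwise divide by zero:

```python
import math


def avoidance(pos, others):
    """Sum of inverse-square pushes away from each position in others (2D)."""
    ax, ay = 0.0, 0.0
    for ox, oy in others:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy)
        if d == 0.0:
            continue  # skip self (or an exactly coincident boid)
        ax += dx / d ** 3
        ay += dy / d ** 3
    return ax, ay


print(avoidance((0, 0), [(2, 0)]))
# (-0.25, 0.0)   pushed in -x, away from the neighbour, strength 1/2**2
```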

Craig Reynolds's Boids

A work in progress, so just a little snippet this week.

And with that said, the starting if needs to be refactored out eventually. But even without that correction, this function makes quite the filter. But perhaps even more importantly, by implementing the Boid.size and b1.pos variables as lists, I am able to make them arbitrarily long. So, although I'm only planning on doing the x,y axis on this project (2D), if I found a nice 3D imaging library (or wanted to do some research into the Fourth Dimension), turning out fully formed higher dimensional data would be as simple as increasing the static Boid.size variable (from, say, [100,100] to [100,100,100], which just between you and me, seems pretty easy). And it's not like that list implementation came at any additional effort, as implementing the for loop turned out to be easier (in my mind, anyway) than providing repeated hard coded logic for separate x and y controls.

Anyway, long story short, I'm very satisfied with the [d] part and cognizant enough to realize shortcomings of the leading if.

def wrap_collision(b1,b2):
    if Boid.edge=='wrap':
        for d in range(len(Boid.size)):
            if (((b1.pos[d] <= Boid.coll_rad) and 
                (b2.pos[d] >= (Boid.size[d] - Boid.coll_rad))) or
                ((b2.pos[d] <= Boid.coll_rad) and
                (b1.pos[d] >= (Boid.size[d] - Boid.coll_rad)))):
                return True
    return False
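
One possible shape for that eventual refactor, offered as a sketch and not as what the project ended up with: pass the positions, size, and radius in as plain arguments (my assumption, replacing the Boid class attributes) and let the caller decide whether wrapping applies at all. The per-dimension test is unchanged, so it still works for any number of dimensions:

```python
def wrap_collision(p1, p2, size, coll_rad):
    """True if p1 and p2 sit near opposite edges in at least one dimension."""
    return any(
        (a <= coll_rad and b >= s - coll_rad) or
        (b <= coll_rad and a >= s - coll_rad)
        for a, b, s in zip(p1, p2, size))


print(wrap_collision((1, 50), (99, 50), (100, 100), 2))    # True
print(wrap_collision((50, 50), (60, 60), (100, 100), 2))   # False
```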

The following is, perhaps, a much better code snippet. In searching for a toggle, I found much more complicated solutions including such things as itertools.cycle. But the following is much simpler, more straightforward, and works using nothing but built-in tuple assignment (no need to import anything).

print "Two Way Toggle"
a,b = 1,2
for _ in range(50):
    print a
    a,b = b,a
#outputs: 1, 2, 1, 2, etc.
	
print "Three Way Toggle"
a,b,c = 1,2,3
for _ in range(50):
    print a
    a,b,c = b,c,a
#outputs: 1, 2, 3, 1, 2, 3, etc.

Bisect

Mastery is not cultivated in a day, but over time in little spurts and jumps, here and there. So with that in mind, I uncovered the use of the bisect module this week. And although the example is right out of the Python Docs, it works, and is now part of my ever increasing repertoire.

import bisect
import random

grade = 'FDCBA'

name = ['Brett', 'Henry',
        'Steve', 'Bob',
        'Tony', 'Philip',
        'Dave', 'John']

for n in name:
    if n == 'Brett': #my name
        x = 100
    elif n == 'Henry': #my middle name
        x = 93
    else:
        x = random.randint(44,92) #the rest can suck it

    print "%s earned an %s with a score of %d" % (
                n,
                grade[bisect.bisect([60,70,80,90], x)],
                x)
#Yields
#Brett earned an A with a score of 100
#Henry earned an A with a score of 93
#And so on,
#But I can't say I care about the rest

And just in case that's unclear or somehow I forget what this is all about, bisect is amazingly good at grouping values by range (i.e. the grade string 'FDCBA' is indexed by where the score x falls among the breakpoints [60,70,80,90]).
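
The edge cases are worth a look. bisect.bisect is the same as bisect.bisect_right, so a score exactly on a breakpoint lands in the higher band. A small sketch (the grade function and its default arguments are just packaging for the example, written for Python 3):

```python
import bisect


def grade(score, breakpoints=(60, 70, 80, 90), grades='FDCBA'):
    """Return the letter whose band contains score."""
    # bisect returns how many breakpoints are <= score, which is the index.
    return grades[bisect.bisect(list(breakpoints), score)]


for s in (59, 60, 89, 90, 100):
    print(s, grade(s))
# 59 F
# 60 D
# 89 B
# 90 A
# 100 A
```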

Conway's Life Bitwise Operator

The bitwise logic for the game reduces to the following with the link above for those who desire a more complete understanding.

life = (life | life<<1 | life>>1) & ~(life & life<<1 & life>>1)
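
Read bit by bit, that expression switches a cell on when at least one of the cell and its two neighbours is on, but not all three. That is a one-dimensional rule (it has the same truth table as elementary cellular automaton Rule 126). Because Python integers have no fixed width, the sketch below adds a width and a mask so the row doesn't grow leftwards forever; those two, along with the names step and show, are my assumptions and not part of the one-liner:

```python
def step(life, width):
    """Apply the one-line rule to a row of `width` cells stored as an int."""
    mask = (1 << width) - 1
    left = (life << 1) & mask   # neighbour shifted in from the right
    right = life >> 1           # neighbour shifted in from the left
    return (life | left | right) & ~(life & left & right) & mask


def show(life, width):
    """Render the row as text: '#' for live cells, '.' for dead ones."""
    bits = format(life, '0{}b'.format(width))
    return bits.replace('0', '.').replace('1', '#')


row = 0b00010000
for _ in range(4):
    print(show(row, 8))
    row = step(row, 8)
```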

Bit Operators

This week (and most of last), I was probably at the place where most folks jump languages (a little bored with Python and not quite sure where I could take it). So, rather than jumping ship, I decided to move laterally and look at something new. Cryptanalysis caught my eye (perhaps because every time I go to job fairs, look online, or what have you, the law enforcement sector always seems to stand out as a promising path for the future). So, I looked into that: cryptanalysis, which led me to hash functions, which led me to bitwise operators (the first time in two years that I've had the need).

c = [(13,13), ('13<<2',13<<2), ('13>>2',13>>2),
     (' ',0),
     ('13|~13', 13|~13), ('13&~13', 13&~13), ('13^~13', 13^~13),
     ]
for k,v in c:
    if k == ' ':
        print
    else:
        print "{: <10}\tNum: {: >4} \t numpy: \...
               {: >10} \t Python: {: >10} ".format(
                      k, v, np.binary_repr(v,8), bin(v))
        #\... being a line break for web-layout purposes
        #no actual line break in code

Yields:

13        Num:   13  numpy:   00001101  Python:     0b1101 
13<<2     Num:   52  numpy:   00110100  Python:   0b110100 
13>>2     Num:    3  numpy:   00000011  Python:       0b11 

13|~13    Num:   -1  numpy:   11111111  Python:       -0b1 
13&~13    Num:    0  numpy:   00000000  Python:        0b0 
13^~13    Num:   -1  numpy:   11111111  Python:       -0b1
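
The -1 results look odd until you remember that Python integers are signed and unbounded: ~x is defined as -x - 1, so 13 and ~13 have no bits in common and every bit set between them, which is what -1 looks like in two's complement. A short sketch, using a 0xFF mask (my choice, to match the 8 bit column above) to see the raw bit pattern:

```python
x = 13

print(~x)                         # -14
print(~x == -x - 1)               # True

# Mask to 8 bits to view the two's complement pattern.
print(format(x, '08b'))           # 00001101
print(format(~x & 0xFF, '08b'))   # 11110010

# No shared bits, all bits covered: OR and XOR agree, AND is empty.
print(x | ~x, x ^ ~x, x & ~x)     # -1 -1 0
```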

NxN: Pixelated GiF Effect

Full write up above, so just the uncommented highlights, below.

def nxn(image):
    '''returns a pixelated gif and intermediate jpg images
    given an image location address in
    '''
    
    img = imread(image, dtype=int)
    
    p = []
    for n in range(2,6):
        pI = np.power(img,n)
        sN = image[:-4] + '_power_' + str(n) + image[-4:]
        imsave(sN, pI)
        p.append(pI)
    
    sN = image[:-4] + '_powerGIF.gif'
    clip = mpy.ImageSequenceClip(p, fps=5)
    clip.write_gif(sN)

Numpy Test Pattern: Mandelbrot

Reviewing libraries, I have been, as of late (with Numpy the Latest). Somehow, looking into the Mandelbrot Set at the same time seemed like the perfect synergistic fit. Full code at link, the teaser being:

def base_iter(x,c):
    '''This is the Heart of solving the Mandelbrot Set
    Full numpy arrays are both fed in and returned
    '''
    return x*x + c
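
For context, here is how that one line gets used, shown on a single complex number instead of a full numpy array. This is a generic escape-time sketch and not the code at the link; escape_count, max_iter, and the bail-out radius of 2 are the usual textbook choices, assumed here for illustration:

```python
def escape_count(c, max_iter=50):
    """Number of iterations of z = z*z + c before |z| exceeds 2."""
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n       # escaped: c is outside the set
        z = z * z + c      # the same step as base_iter(x, c)
    return max_iter        # never escaped within the budget


print(escape_count(0j))        # 50  (stays at zero forever)
print(escape_count(-1 + 0j))   # 50  (cycles 0, -1, 0, -1, ...)
print(escape_count(2 + 2j))    # 1   (escapes immediately)
```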

The Zebra Problem

This is another one of those logical problems I never solved. At this point, I'm thinking there's some (possibly second order) logical filter that I'm not understanding (or aware of). And since getting a solution was never important to me (it was the journey I cared about; I never cared on any intrinsic level which house the Zebra resided within), I'm OK with posting a partial answer, a playing with the problem for a while.

Anyway, in my play, there are four parts (sub-functions) of the non-solution that I am particularly proud of (fun little optimizations):

Checking to ensure all clues are in the solution set (no misspellings, key-errors)
Extracting negative clues from the positives clues given (code below, logically_different())
Looping the main function so levels of complexity are added incrementally (code below, solve())
And finally, an optimal ordering of the solution set from most clues to least to filter out as much as possible up front.

def logically_different():
    '''Returns a list of all logically different constructs.
    
    Given 'truths' to be a global list of tuples of the form:
    
    truths = [
        #There are five houses.
          
        #The Englishman lives in the red house.
        ('english', 'red', 'same'),
    
        #The Swede has a dog.
        ('swede', 'dog', 'same'),
        ]
    
    Since english=red & swede=dog;
        english!=dog & swede!=red
    
    lD=logicallyDifferent = [
        ('english', 'dog', 'different'),
        ('swede', 'red', 'different'),
        ]
    
    '''
    #tp = [(a,b),(b,a)...] for (a,b) in truths
    tP = [(a,b) for (a,b,c) in truths if c=='same']
    tP += [(b,a) for (a,b) in tP]
    
    #world is a global 5x5 array, listing the solution set
    lD = [(y,b,'different') 
           for i,(a,b) in enumerate(tP[:-1])
           for (x,y) in tP[i+1:]
           for w in world
           if (x in w and a in w)
           and not (a==x or a==y or b==x or b==y)]
    return lD
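
The first item on the list above (checking that every clue actually appears in the solution set) isn't shown, so here is a sketch of how such a check might go. unknown_clue_terms is a made-up name and this is not the original function; it takes truths and world as arguments where the code above uses globals:

```python
def unknown_clue_terms(truths, world):
    """Return clue terms that appear in truths but nowhere in world."""
    known = {term for row in world for term in row}
    return sorted({t for (a, b, _rel) in truths
                     for t in (a, b)
                     if t not in known})


world = [['english', 'swede'], ['red', 'dog']]
truths = [('english', 'red', 'same'),
          ('sweed', 'dog', 'same')]      # deliberate misspelling

print(unknown_clue_terms(truths, world))
# ['sweed']
```

An empty list back means every clue can be matched against the world, which rules out one whole class of silent failures.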

Next is the main solve(), which sequentially adds complexity to the solution set (i.e. solves as best as it can for the type of [cigarette] before moving on to [cigarette, color], then [cigarette, color, nationality], and so forth).

I get 37,554 possible answers when all is said and done, so something is wrong with my apply_truths_to_sol_set(), a something I do not expect to resolve anytime soon. So like, there's a limit to my logical reasoning abilities and/or that's what PhDs are for, right?

def solve(x=False):
    '''Incrementally adds a level from the world
           and reduces solution set
    
    world = [
         ['english', 'swede', 'german', 'norwegian', 'dane'],
         ['pallmall', 'bluemaster', 'dunhill', 'prince', 'blend'],
         ['milk', 'water', 'beer', 'coffee', 'tea'],
         ["zebra", 'cats', 'horse', 'birds', 'dog'],
         ['blue', 'yellow', 'white', 'red', 'green'],
         #[0,1,2,3,4],
         ]
    
    '''

    #Initialization
    solSet = [list(s) for s in permutations(world[0])]
    print '\n world:0'
    print len(solSet)
    print solSet[0]
    # world:0
    #120
    #['pallmall', 'bluemaster', 'dunhill', 'prince', 'blend']

    #Adding the first level
    solSet = [[s] + [list(c)] for s in solSet
                              for c in permutations(world[1])]
    solSet = [s for s in solSet if apply_truths_to_sol_set(s)]
    print '\n world:1'
    print len(solSet)
    print solSet[0]
    #world:1
    #1448
    #[['pallmall', 'bluemaster', 'dunhill', 'prince', 'blend'],
    #  ['blue', 'red', 'yellow', 'green', 'white']]
	
	
    #All subsequent levels, not condensed due to list-alignment issues
    #    [s] above, vs s below
    y = len(world) - 1
    if x and x < y:
        y = x
    for n in range(2,y): #y (not x), so the default x=False still loops
        solSet = [s + [list(c)] for s in solSet 
                                for c in permutations(world[n])]
        solSet = [s for s in solSet if apply_truths_to_sol_set(s)]
        print '\n world: %d' % n
        print len(solSet)
        print solSet[0]
    #world: 2
    #5026
    #[['pallmall', 'bluemaster', 'dunhill', 'prince', 'blend'],
    # ['blue', 'red', 'yellow', 'green', 'white'],
    # ['english', 'swede', 'norwegian', 'german', 'dane']]
	
solve(x=3) #output shown in code above

Never Get Out of the Boat

A full expose of this week's work is available at the link above, so I'll keep this extraordinarily short.

def list_all_brothers(family_tree):
    '''returns a list of all brothers in given family_tree
    '''
    return [(a,w) for (a,b,c,d) in family_tree
                      for (w,x,y,z) in family_tree
                      if are_brothers((a,b,c,d),(w,x,y,z))
                      and a < w]

That '<' on the last line, a thing of beauty. By including that little snippet 'and a < w', I was able to delete two more filters (one for a==w, and the other for when both (a,w) & (w,a) occurred in the list). Magic!
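
The same 'each unordered pair once, never paired with itself' effect is what itertools.combinations provides. A small sketch with made-up names to show the two agree (for unique values, once sorted); it doesn't replace the are_brothers test, only the a < w bookkeeping:

```python
from itertools import combinations

names = ['al', 'bo', 'cy']   # hypothetical stand-ins for the tree entries

# The 'a < w' trick: drops (x, x) and keeps one of (x, y) / (y, x).
pairs_filter = [(a, w) for a in names for w in names if a < w]

# The library equivalent on sorted, unique input.
pairs_comb = list(combinations(sorted(names), 2))

print(pairs_filter)
# [('al', 'bo'), ('al', 'cy'), ('bo', 'cy')]
print(pairs_filter == pairs_comb)
# True
```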

See link for full grandeur of this project. Very proud. Much WoW!

Regex

I'm still (slowly but surely) playing with reddit, slowed down somewhat as I don't really know what I want out of the project anymore...

Anyway, given a dictionary of regex's:

regex_groups = {'bots': '(?<!ro)bot[s]?$',
                'reddit':'^reddit$',
                'redditIn':'reddit',
               }

A simple re.search() finds the relevant matches without having to call separate startswith(), endswith(), in, or not in.

for k,v in regDict.items():
    fA = [a for a in aF if re.search(v,a)]
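
To make the three patterns concrete, here they are run against a handful of invented author names (the names are made up for the example, not taken from my data). The negative lookbehind (?<!ro) is what keeps 'robot' out of the bots group while letting 'autobot' in:

```python
import re

regex_groups = {'bots': r'(?<!ro)bot[s]?$',   # ends in bot/bots, but not robot(s)
                'reddit': r'^reddit$',        # exactly 'reddit'
                'redditIn': r'reddit',        # 'reddit' anywhere
                }

authors = ['autobot', 'robot', 'helperbots', 'reddit', 'ilovereddit']

matches = {k: [a for a in authors if re.search(v, a)]
           for k, v in regex_groups.items()}

print(matches['bots'])      # ['autobot', 'helperbots']
print(matches['reddit'])    # ['reddit']
print(matches['redditIn'])  # ['reddit', 'ilovereddit']
```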

The total conflagration of code being as follows with an equally long regDict for all sorts of mix and match comparisons.

def author_title_review(regDict, test=True):
    '''Prints Raw Data Pre-Analysis for reddit
    (throw away code)

    test=True means use only one data file
    test=False, use them all
    '''
    
    #front_page
    fP = load_data(category='front_page', test_run=test)
    #author
    aF = [a.lower() for a in list(fP['author'])]
    aFL = len(aF)
    #title
    tF = [a.lower() for a in list(fP['title'])]
    tFL = len(tF)
    
    #new
    nP = load_data(category='new', test_run=test)
    #author
    aN =  [a.lower() for a in list(nP['author'])]
    aNL = len(aN)
    #title
    tN = [a.lower() for a in list(nP['title'])]
    tNL = len(tN)
    
    for k,v in regDict.items():
        fA = [a for a in aF if re.search(v,a)] #front author
        nA = [a for a in aN if re.search(v,a)] #new author
        fT = [a for a in tF if re.search(v,a)] #front title
        nT = [a for a in tN if re.search(v,a)] #new title
        
        #report
        print "\n%s" % k
        #author
        print "author: (frontPage, newPage, ratio)" 
        print "fP: %d, %.4f %s" % (len(fA), len(fA)/float(aFL), fA)
        print "nP: %d, %.4f %s" % (len(nA), len(nA)/float(aNL), nA)
        if fA and nA:
            print "Ratio fp/nP: %.4f ::fP(%d/%d), nP(%d/%d)" % (
                   len(fA) * aNL / float(aFL) / len(nA),
                   len(fA), aFL, len(nA), aNL)
        
        #title
        print "title: (frontPage, newPage, ratio)" 
        print "fP: %d, %.4f %s" % (len(fT), len(fT)/float(tFL), fT[:10])
        print "nP: %d, %.4f %s" % (len(nT), len(nT)/float(tNL), nT[:10])
        if fT and nT:
            print "Ratio fp/nP: %.4f ::fP(%d/%d), nP(%d/%d)" % (
                   len(fT) * tNL / float(tFL) / len(nT),
                   len(fT), tFL, len(nT), tNL)

A fair bit of code duplication to be sure, but at some point, code duplication is the clearer, more maintainable, easier to read solution.

Anyway, this really is throwaway code, once I write up the html for the project, I'll likely never use this code again. Either way, using those regex's sure made my life easier. Won't be forgetting that trick anytime soon.

Oh, and perhaps I should mention something about my findings, so here's an interesting pair of statistics. Of the posts in the default subreddits included in the front page, less than 0.1% (or 1/10th of 1%) are made by self-proclaimed bots, whereas for reddit as a whole, +/-3.0% of the posts in the incoming stream are created by bots.

The revolution is nigh, my friends. The revolution is nigh.

Flea Circus

The code is based on a Project Euler question... that I didn't get right and probably never will. The actual answer is fairly close to (tends towards):

import math

print "Project Euler Jumping Fleas Problem: %.7f" % (900.0/math.e)

But tends towards and being correct to six decimal places are two different things. Anyhow, I'm not really a math guy. I have no PhD. I'm OK. Better than the average bear. But it's not why someone is ever going to hire me, as plenty of folks have math degrees. Which is to say, I'm not really stressed about getting the wrong answer as the real reason I was looking at Project Euler in the first place was because that's where I cut my Haskell chops (slim as they might be) and I was looking for inspiration vis-à-vis doing some functional programming in Python. And I think that's pretty much what this is: a fine exercise in functional programming, even if the answer it spits out isn't accurate to six decimal places.
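
Where does 900/e come from? A back-of-the-napkin version, under the simplifying assumption (mine, and not true of the real problem, where fleas only hop to neighbouring squares) that each of the 900 fleas ends up on a uniformly random square: any given square is missed by one flea with probability 1 - 1/900, so missed by all of them with probability (1 - 1/900)**900, which is close to 1/e. A sketch of that arithmetic:

```python
import math

n = 900

# Expected number of empty squares if every flea landed uniformly at random.
uniform_estimate = n * (1 - 1.0 / n) ** n

# The limit that (1 - 1/n)**n approaches as n grows is 1/e.
limit_estimate = n / math.e

print("uniform landing: %.4f" % uniform_estimate)
print("900/e:           %.4f" % limit_estimate)
```

The two agree to within a fraction of a square, which is why 900/e is a decent ballpark and still not the six decimal answer.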

#PE-213 
import random

def fleaJump((x,y)):
    a,b = random.choice([(x+1,y),(x-1,y),(x,y+1),(x,y-1)])
    if a<0 or b<0 or a>29 or b>29:
        return fleaJump((x,y))
    else:
        return (a,b)

def jumpAll(fleaList):
    fleasNext = []
    for flea in fleaList:
        fleasNext.append(fleaJump(flea))
    return fleasNext

fleaStart = [(a,b) for a in range(30) for b in range(30)]

turn = [fleaStart]

moves = 200
for m in range(moves):
    turn.append(jumpAll(turn[-1]))

emptySquaresPerTurn = [900 - len(set(t)) for t in turn]
average = sum(emptySquaresPerTurn[-100:]) / 100.0

print average

And if that's not Functional enough for you, my first draft included such monstrosities as:

moves = lambda : 200

virtualJump = lambda (x,y) : random.choice([
              (x+1,y),(x-1,y),(x,y+1),(x,y-1)])

But I think that's taking things too far.

Anyhow, until next week.

The Simple Joys of Life

I'm really excited about this first code snippet, so much so that instead of saying 'I haven't tested it yet', I paused to take a moment to do just that, so I know it works. And what does it do? It converts a list of numpy type arrays into a gif in two odd lines of code. Or in other words, off the shelf, it scratches an itch in about as few words as it takes to describe what is desired (close to the limits of abstraction, I'd say).

import moviepy.editor as mpy
import skimage.io

img1 = skimage.io.imread("./imageIn/image_01.jpg")
img2 = skimage.io.imread("./imageIn/image_02.jpg")
imgList = [img1,img2]

clip = mpy.ImageSequenceClip(imgList, fps=1) 
clip.write_gif("./imageIn/image_01_02.gif", fps=15)

I think I'm going to love this library.

Continuing on with the theme of high level abstraction that works like a dream out of the box (and continuing my analysis of reddit), this next converts a csv file containing the headers [id, domain, subreddit, author, time] for around 45,000 reddit posts into a pandas dataframe, including conversion of the time into something human readable, in only a few lines of code. Originally, I thought I'd be able to convert the time by passing an argument to .from_csv(), which may well be possible, but I couldn't figure it out. Still, four lines of code, not bad considering if pandas didn't have this function built in, I'd have had to spend at least a few minutes building a fragile custom formatter.

def load_mass_pull():
    '''returns the mass_pull.csv as a pd.Dataframe
    '''
    mPath = r"C:\mass_pull.csv"

    dF = pd.DataFrame.from_csv(mPath)
    dF['time'] = pd.to_datetime(dF['time'], unit='s')

    return dF

The following is pretty sweet as well, as it returns the total timespan that the returned dataframe for the above data pull represents.

def timespan(dF):
    '''returns amount of time the data pull spans
    '''
    start = min(dF['time'])
    finish = max(dF['time'])
    timeDelta = finish - start
    print "Start Time: %s" % start
    print "Finis Time: %s" % finish
    print "TIME SPAN: %s" % timeDelta
    return timeDelta

Which may not be all that clear, so without the print statements, the above reduces to:

def timespan(dF):
    return max(dF['time']) - min(dF['time'])

For which it is hardly worth creating a function -- the best type of function of all. Oh, and just in case you're wondering, for the 5:25:58 (Hour:Minute:Seconds) slice I looked at, there were 44,997 reddit submissions, for a stream rate at the hose of around 8,250 subs/hour or +/- 2.3 posts a second if you prefer your data that way.
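
That rate is just a count divided by a timedelta. A standard-library sketch of the arithmetic using the two figures quoted above (the variable names are mine; with a real dataframe, the span would come from timespan(dF)):

```python
from datetime import timedelta

span = timedelta(hours=5, minutes=25, seconds=58)
submissions = 44997

seconds = span.total_seconds()
per_second = submissions / seconds
per_hour = per_second * 3600

print("%d seconds" % seconds)
print("%.2f posts/second" % per_second)
print("%.0f posts/hour" % per_hour)
```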

POSTER-izing an Image

Saw a cool picture on Reddit. In the comments, someone described what was happening (reduction in the number of color bands). And an hour later, I had a working effect. Very gratifying, I must tell you, to go from inspiration, to finished product on something seemingly so involved in such a short span of time.

def posterize(image, sN, colors=1):
    '''Poster-izes the given image
    a flat cartoon, color-band reduction effect
    
        image is path to image
        sN is save name
        colors is how many jumps from 0-255 in color palette
        
            0 throws an error
            1 sometimes works as an edge detector of sorts
            2-3 pretty standard posterize for most images
            5-7 not bad for low light crap shots
            for a preview set, go with [1,2,3,5,7]
    '''
    print "POSTER-izing:\t %s" % image
    
    img = skimage.io.imread(image)
        
    s = 255 / colors
    r = (img[:,:,0]/s)*s
    g = (img[:,:,1]/s)*s
    b = (img[:,:,2]/s)*s
   
    skimage.io.imsave(sN, np.dstack((r,g,b)))
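
The whole effect rests on integer division: dividing by s throws away the remainder, and multiplying back by s leaves only a few distinct values. The function above relies on Python 2's integer '/'; on Python 3 the same idea needs '//'. A pure-Python sketch on single channel values (band is my own name for illustration):

```python
def band(v, colors):
    """Snap a 0-255 channel value down to the start of its colour band."""
    s = 255 // colors          # width of each band
    return (v // s) * s        # floor to a multiple of s


for colors in (1, 2, 3):
    print(colors, sorted({band(v, colors) for v in range(256)}))
# 1 [0, 255]
# 2 [0, 127, 254]
# 3 [0, 85, 170, 255]
```

With colors=1 only a fully saturated 255 survives and everything else drops to 0, which may be why that setting sometimes reads as an edge detector.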

Hello PHP!

Yeah, I learned PHP over the weekend... or, you know, not. What I did most definitely do was put together a Hello World program. But being able to say 'Hello' and speak a language with any degree of fluency are two completely different things.

The following saves the current date to file, which one could think of as the start of a log file.

<?php
echo file_get_contents("php.txt");
$t = date("Y-m-d H:i:s") . "<br>\n"; // 'i' is minutes; 'm' would repeat the month
file_put_contents("php.txt", $t, FILE_APPEND);
?>

At one time, I had this as working code on my site (with a link herein), but the long term lesson is that given my web-hosting plan it's not worth it to me to deal with special php permissions for a single application. So, I took the page down.

Render an Image as an HTML Table

I've been slow to learn Django, Falcon, Pylons, Flask, or any of the other popular Python web frameworks, since pushing static content to the web, which is all I want for myself, doesn't require the complexities of a relational database. However, since a lot of my web content starts life as Python code (image manipulation, database tables, etc.), using web page templates makes a lot of sense to me (to speed the visual debugging process if nothing else). That being said, this is the main juice for my HTML Tables as Images web page, which I pass in as the body text to a web page template.

# io is assumed to be skimage.io, the same reader posterize uses
from skimage import io

def render_image_as_table(image="work.jpg"):
    '''outputs a formatted html table that looks like the input image
    '''
    
    print "render_image_as_table starting on %s" % image
    img = io.imread(image)

    table = '\n<center>\n<table>'
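    # shape[0] is the image height (rows), shape[1] the width (columns)
    for row in range(img.shape[0]):
        table += '\n<tr>\n'
        for col in range(img.shape[1]):
            r = img[row,col,0]
            g = img[row,col,1]
            b = img[row,col,2]
            # format the whole cell at once, so the % operator applies
            # to the string that actually holds the %d placeholders
            table += ('<td style="background-color:rgb(%d,%d,%d);"'
                      '></td>' % (r,g,b))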
        table += '\n</tr>'
    table += '\n</table>\n</center>\n\n\n'
    table += '<br><br>\n\n\n'
    
    print "render_image_as_table returning table for %s" % image
    return table
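A stripped-down sketch of the same idea, for trying out without an image file or skimage on hand. It takes a plain list of pixel rows, and drops the result into a stand-in page template. Both PAGE and pixels_to_table are hypothetical names, not the template or function the site actually uses.

```python
from string import Template

# hypothetical page template; a real one would carry the site's own markup
PAGE = Template('<html>\n<body>\n$body\n</body>\n</html>')

def pixels_to_table(pixels):
    '''pixels is a list of rows, each row a list of (r, g, b) tuples'''
    cell = '<td style="background-color:rgb(%d,%d,%d);"></td>'
    rows = []
    for row in pixels:
        cells = ''.join(cell % rgb for rgb in row)
        rows.append('<tr>\n%s\n</tr>' % cells)
    return '<table>\n%s\n</table>' % '\n'.join(rows)

# a 2x2 stand-in image: red, green / blue, white
pixels = [[(255, 0, 0), (0, 255, 0)],
          [(0, 0, 255), (255, 255, 255)]]
page = PAGE.substitute(body=pixels_to_table(pixels))
print(page)
```

Collecting the pieces in a list and joining them once at the end is a design choice, not a requirement; for big images it avoids rebuilding one ever-longer string on every pixel.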

Glob in lieu of os.listdir()

I'm currently working on a Reddit crawler using the PRAW library.
Having saved previous crawls as a series of csv data files, this function presorts those files and returns a pandas dataframe for further offline analysis based on the passed arguments.

Of course, after refactoring those long print statements into something that doesn't muck up the rest of the web page (i.e. shortening them down to an 80-character line width, so the rest of the page isn't skewed off center), the code no longer looks as clean and pretty as it once did to my eye. But what are you going to do? Truth is, after using this for a few weeks, I can assure you that what I see most from this function is all those wordy print statements, and I am grateful for them. So, beauty loses out to utility this round. But having said that, I have a feeling I'll trim down the code (refactor more, minimize print statements, etc.) on subsequent posts. Live and learn, I suppose. Live and learn.

import datetime
import glob
import pandas as pd

def thousands_to_dataframe(dfr = "C:\\reddit_data\\",
                           last_first=False,
                           hour_filter=0,
                           max_num_files=1000):
    '''
    assembles files in dfr of form "*thousands.txt" that pass filters
        into one comprehensive pandas dataFrame
    
    dfr = data file repository (where files are kept)
    hour_filter = min number of hours between data sets
    last_first = reverses sort order
    max_num_files = maximum number of files to compose data set from
	
    sample usage:
        thousands_to_dataframe()
            returns the first 1000 csv files in one pandas dataframe
        thousands_to_dataframe(hour_filter=24)
            returns sets sequenced at least 24 hours apart
        thousands_to_dataframe(last_first=True, max_num_files=1)
            returns the last crawl as a pandas dataframe
    '''
    
    #dfr
    rawThousands = glob.glob(dfr + "*thousands.txt")
    
    #last_first
    rawThousands.sort()
    if last_first:
        rawThousands.reverse()

    #hour_filter
    thousands = [rawThousands[0]]
    print "Passing Filter (last_first=%r, hour_filter=%d):\n\t%s" % (
                           last_first, hour_filter, rawThousands[0])
    lastTime = datetime.datetime.strptime(rawThousands[0],
                        dfr + "%Y-%m-%d-%H-%M-%S_thousands.txt")
    for t in rawThousands[1:]:
        thisTime = datetime.datetime.strptime(t,
                          dfr + "%Y-%m-%d-%H-%M-%S_thousands.txt")
        timeFilter = datetime.timedelta(hours=hour_filter)
        if abs(lastTime - thisTime) > timeFilter:
            print "\t%s" % t
            thousands.append(t) 
            lastTime = thisTime 
    
    #max_num_files
    print "max_num_files=%d" % max_num_files
    thousands = thousands[:max_num_files]
    
    #to pandas dataFrame
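    #(DataFrame.from_csv has been removed from current pandas;
    # read_csv with these arguments matches its old defaults)
    dFList = [pd.read_csv(t, index_col=0, parse_dates=True)
              for t in thousands]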
    dF = pd.concat(dFList,ignore_index=True )

    print "%d Files Considered: %d Files Passed Filter" % (
                            len(rawThousands), len(dFList))
    print "Returning Pandas DataFrame (thousands_to_dataframe): %s" % (
                                                    str(dF.shape))
    print dF.head(2)
    print ("End thousands_to_dataframe(last_first=%r, "
           "hour_filter=%d, max_num_files=%d)\n\n" % (
                   last_first, hour_filter, max_num_files))
    return dF
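The glob and hour_filter steps can be tried on their own, without pandas or real crawl data. What follows is a sketch under assumptions: the file names are invented (in the naming scheme the function above expects), they live in a throwaway temp directory, and filter_by_hours is a hypothetical helper rather than part of the function above.

```python
import datetime
import glob
import os
import shutil
import tempfile

STAMP = "%Y-%m-%d-%H-%M-%S_thousands.txt"

def filter_by_hours(paths, hour_filter=0, last_first=False):
    '''keeps each path stamped more than hour_filter hours
    away from the last path kept
    '''
    paths = sorted(paths, reverse=last_first)
    min_gap = datetime.timedelta(hours=hour_filter)
    kept, last_time = [], None
    for p in paths:
        # parse only the file name, so the directory never has to
        # be folded into the strptime format
        this_time = datetime.datetime.strptime(os.path.basename(p), STAMP)
        if last_time is None or abs(last_time - this_time) > min_gap:
            kept.append(p)
            last_time = this_time
    return kept

# invented crawl files, plus one decoy that is not a crawl
dfr = tempfile.mkdtemp()
names = ["2015-03-01-08-00-00_thousands.txt",
         "2015-03-01-20-00-00_thousands.txt",
         "2015-03-02-09-00-00_thousands.txt",
         "2015-03-03-10-00-00_thousands.txt",
         "notes.txt"]
for n in names:
    open(os.path.join(dfr, n), 'w').close()

# os.listdir hands back every bare name, decoy included;
# glob does the pattern match and returns full paths
listed = os.listdir(dfr)
globbed = glob.glob(os.path.join(dfr, "*thousands.txt"))

kept = filter_by_hours(globbed, hour_filter=24)
kept_names = [os.path.basename(p) for p in kept]
print(kept_names)

shutil.rmtree(dfr)
```

Here the 20:00 crawl is dropped, as it falls only 12 hours after the first file kept, while the other three are each 25 hours apart and pass.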