Python Library Review

A file dump concerning my personal review and evaluation of select Python Libraries.

Individual results may vary.

Astropy

I found no use for this.

Base concerns itself mainly with physics values (kg/s, that sort of thing).
Has FITS support (a large image file format used in astronomy).
And multiple sky coordinate systems.

The values package seems the most interesting and likely a very good start for this kind of thing. Alas, I have no need/interest.

Biopython

Not my domain. I found little to extract for other projects.

String manipulation routines for protein sequences central to the functionality.
I'm pretty sure (going by memory at this point) that there's slight tree coverage, but I'd use ete2 for that, instead.

This nifty image was the only real use case I found in the library. And, ironically enough, I just went to the webpage to get it to work right.

Cubes

Data Brewery & Bubbles
It took me longer to figure out what this library was about than it should have.

It's a:

Website Framework

using Flask

Which can Combine Multiple Data Sources

any number of separate SQL instances

different SQL flavours OK

with secondary support for csv, MongoDB, etc.

With Built-In Presentation Support

Graphs, Tables, etc.

I'm not big on web presentation. Certainly a utility such as this makes sense to learn after one has learned Flask (the web framework on which it is built). But mostly, if one already owns the data, then combining all the data into a single SQL server instance seems like the most common use case (a use case which makes this library superfluous), thus relegating the use of Cubes to instances where one does not own the data (a use case I have no personal interest in pursuing).

So, instead, I shall move on... perhaps to SQL, instead.

Kivy

I stopped working on my test Kivy project through a lack of desire to pursue it (or any similar project in the foreseeable future) more than anything else. Honestly, it felt a lot like programming in JavaScript to me (i.e. Event Driven Programming). So, learn a bit of syntax; and from there, it's probably reasonably straightforward once one internalizes the class structures.

Anyway, in short, the reason to use Kivy is that it is a program-once, deploy-everywhere library. So, code once, and deploy as a Linux Distro, iPhone App, Android App, and native Windows exe all with the same base code (just differing distribution packages)... or so is the claim. I never got that far. But I was pleased with where I got, which was mostly dials, buttons, and GUI things of that nature. I doubt this will ever be my strong suit (GUIs), so I don't have much belief I'll be furthering my investigations into this library any time soon. However, I'd give it a solid four stars, maybe five; the documentation being a bit iffy; but then, my knowledge of Event Driven Programming (and GUIs in general) is a bit iffy, so it's really not for me to say.

Bottom line, if I decide to do an App, I'm reaching for this library first.

NLTK: Natural Language Tool Kit

It goes against my philosophy of website design to link to outside pages (as those pages tend to get moved and then a person is left with a bunch of dead links). But suffice to say, I liked my walk-through of NLTK enough that I almost put a bunch of links here:

NLTK has an amazing introductory walkthrough / Python Tutorial

List Comprehensions
Regular Expressions
Tree Structures
And pretty much every idea needed to work NLTK in Python

NLTK has a bunch of linked Word Corpora

Including WordNet

NLTK Specializes in Sentence Structure Analysis

Tokenization of words and sentences

Nouns, Adverbs, Verbs
a.k.a. Chunking
a.k.a. Sequence Classification
N-Gram Tagging
(s:n(cn),v(a,v))

So, tree structures
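
To make the vocabulary above a little more concrete, here is a minimal plain-Python sketch of tokenizing a sentence and pairing the tokens into bigrams. It deliberately sticks to the standard library rather than NLTK's own functions; the regular expression and the function names are assumptions chosen for illustration only.

```python
import re


def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = tokenize("The dog barks.")
bigrams = ngrams(tokens, 2)
print(tokens)
print(bigrams)
```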

NLTK: Doesn't Have, but a good starting point for

Speech to Text (or Text to Speech)
Cognition (sentence meaning, interpretation)
First Order Logic

NLTK: Utility Functions

Confusion Matrix

Error origination tracking for misclassification of words
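
As a rough idea of what such a matrix records, the sketch below counts (correct tag, assigned tag) pairs in plain Python. It is not NLTK's implementation; the tag lists and the function name are made up for illustration.

```python
from collections import Counter


def confusion_counts(gold_tags, predicted_tags):
    """Count (gold, predicted) pairs; pairs that differ are misclassifications."""
    return Counter(zip(gold_tags, predicted_tags))


# Hypothetical tags for a five-word sentence: what they should be vs. what a tagger said.
gold = ["N", "V", "N", "ADV", "V"]
predicted = ["N", "N", "N", "ADV", "V"]

counts = confusion_counts(gold, predicted)
errors = {pair: n for pair, n in counts.items() if pair[0] != pair[1]}
print(errors)
```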

Sentence Generator

Lists of Word types (N,V)
Template of Structure (N-V-N)
Yields either random or all possible sentences
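
The idea is simple enough to sketch with the standard library alone. This is not NLTK's generator, just an assumed stand-in: the word lists, the dash-separated template format, and the function names are all invented for the example.

```python
import itertools
import random

# Hypothetical word lists keyed by word type.
WORDS = {"N": ["dogs", "cats"], "V": ["chase", "see"]}


def all_sentences(template, words=WORDS):
    """Return every sentence that fits a template such as 'N-V-N'."""
    slots = [words[kind] for kind in template.split("-")]
    return [" ".join(choice) for choice in itertools.product(*slots)]


def random_sentence(template, words=WORDS):
    """Return one randomly filled-in sentence for the template."""
    return " ".join(random.choice(words[kind]) for kind in template.split("-"))


sentences = all_sentences("N-V-N")
print(len(sentences))
print(random_sentence("N-V-N"))
```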

The API documentation sort of sucks (though there are extensive samples if one knows what one wants). And NLTK doesn't directly touch on my (previous) interests or scratch a current itch. But the online tutorial/walk-through is amazing! Props! Kudos! More libraries/fields could benefit from such an introduction. And having been exposed, it does seem sort of unlikely that I will never use/build upon the knowledge gained.

In short, I still have no use case (nor bothered to write any sample code), but after 2-4 weeks (I do seem to lose track of time), I consider my foray into the NLTK Library time well spent.

mpmath

Unlimited precision numbers right out of the box.
Matrix Math (True Matrixes: not Numpy compatible arrays)
Convenience functions

linspace() comes to mind

Presumably lots of Gotchas (for the un-math savvy)
Graphing of Complex Numbers Looks Interesting

And then, you got me as to why I'm writing so much on this trivially small library

I guess sometimes that's just what I do...

All that and it's easy enough to get one's feet wet:

import mpmath

#set precision
mpmath.mp.dps = 50
print(mpmath.mp.dps)
     #50

#fun with floats
y = mpmath.mp.mpf('1.00000012000000000021')
print(y)
     #1.00000012000000000021
print(y - 1.0)
     #0.00000012000000000020999999999999999999999999999999974513
print(y - mpmath.mp.mpf('1.0'))
     #0.00000012000000000020999999999999999999999999999999974513

#complex numbers
z = mpmath.mp.mpc('1','0.00000000000009500')
print(z)
     #(1.0 + 0.000000000000095j)
print(y + z)
     #(2.00000012000000000021 + 0.000000000000095j)
print(y * z)
     #(1.00000012000000000021 + 0.00000000000009500001140000000001995j)

I believe Sage is built on mpmath (among lots of others). And if I were to do Project Euler in Python, this would be one of my go-to libraries (along with SymPy and the aforementioned Sage). But in truth, the above is about as far as I've gotten with this library thus far, so what do I know. Pure math really isn't my strong suit.
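
The convenience function and matrix type mentioned in the list further up can be tried in much the same way. A minimal sketch, assuming mpmath is installed; the variable names are arbitrary.

```python
import mpmath

mpmath.mp.dps = 50

# linspace(start, stop, count) returns evenly spaced mpf values, endpoints included
points = mpmath.linspace(0, 1, 5)
print(points)

# mpmath.matrix is its own type, separate from a Numpy array
a = mpmath.matrix([[1, 2], [3, 4]])
product = a * a  # true matrix multiplication, not element-wise
determinant = mpmath.det(a)
print(product)
print(determinant)
```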

Numpy

Though I started this page to record newly reviewed libraries, it makes some sense for me to record my impressions (suggestions, over-reaching advice) on those libraries I already use regularly.

I use Numpy lots. It's the backbone of most of my image manipulation techniques (certainly the skill sets learned transfer easily). I like Numpy lots. And truth be told, I probably (perhaps, really should, but I've got to admit, it sounds sort of boring) should take a break from reviewing new libraries and refresh my knowledge of Numpy's API.

That said, what does Numpy do? Matrixes, arrays, numbers in tabular format, quick and easy. Perhaps doesn't sound like much, but the scikit stack, Pandas, and a whole slew of other libraries depend on Numpy, so it does what it does well, so well, I can't name its nearest competitor off the top of my head. I don't really think it has one.
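
For a taste of the "numbers in tabular format" idea, here is a minimal sketch that treats a small two-dimensional array as if it were a tiny greyscale image. The array contents and variable names are made up for the example.

```python
import numpy as np

# A 3x4 grid of small integers standing in for pixel values.
image = np.arange(12, dtype=np.uint8).reshape(3, 4)

top_left = image[:2, :2]   # slice out the first two rows and columns
brighter = image + 10      # arithmetic applies to every element at once
flipped = image[:, ::-1]   # reverse the column order (a horizontal flip)

print(image.shape)
print(top_left)
print(flipped[0])
```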

Two-Dimensional arrays in Python might as well be called Numpies.

Numpy, how do I love thee...

Pandas

Pandas is sort of like Excel for Python (it even does pivot tables). Have a table of data (or something that should be a table of data or could be a table of data), then Pandas just might be the solution.

If I'm exploring data, Pandas tends to be my first stop. And when working with Pandas, my advice is simple enough: get into the Pandas framework as soon as possible; use Pandas to load the csv, html, or SQL; do your stuff (stay in Pandas as long as possible); and then, at the end, output to whatever you need. While inside, Pandas will slice and dice your data; filter, qualify, and reorder your data; and make nifty graphs, html tables, and other such niceties as I am sure that I have not even begun to scratch the surface upon.
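
That load, work, output pattern looks something like the following minimal sketch. The csv text, column names, and figures are invented for illustration, and an in-memory string stands in for a real file.

```python
import io

import pandas as pd

# Made-up data; in practice this would be a file path, URL, or SQL query.
raw_csv = io.StringIO(
    "city,year,sales\n"
    "Oslo,2014,10\n"
    "Oslo,2015,12\n"
    "Rome,2014,7\n"
    "Rome,2015,9\n"
)

frame = pd.read_csv(raw_csv)             # get into Pandas early
recent = frame[frame["year"] == 2015]    # filter rows
pivot = frame.pivot_table(               # Excel-style pivot table
    index="city", columns="year", values="sales", aggfunc="sum"
)
csv_text = pivot.to_csv()                # leave Pandas only at the end

print(recent)
print(pivot)
```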

In short, if you're interested in data and using Python, I can't think of a single reason not to use Pandas... or the name of another data processing framework that would come close to Pandas' power with so little effort. Bottom line, Pandas has been easy to use and lightning fast for everything I've wanted it to do thus far.

It will be interesting to see the use case (and the solution) when it eventually isn't.

PRAW

Python Reddit API Wrapper
I like reddit. I use PRAW as my API interface to do my crawling. It's easy enough to get a submission object (and the documentation for this is clear enough), but once you have the submission object, you're pretty much on your own.

import praw

#initialize user agent
prawUserAgent = praw.Reddit(user_agent="some_string")

#pull x new submissions (calls as of PRAW 3; later releases changed them)
#  there are other pull methods, like by ID
#  prawUserAgent.get_submission(submission_id=subId)
#  where subId is a six-character base-36 string
submissionLimit = 25  #any int up to about 1000
newSubmissions = prawUserAgent.get_new(limit=submissionLimit)

#explore the submission objects as you see fit, perhaps:
for submission in newSubmissions:
    print(submission)
    print(dir(submission))
    print(vars(submission))

Thumbs up for the library, but very soon, one is completely on their own. More information on the various objects along with default methodologies for traversing comment trees would be nice. Still, it's free, so who's complaining.
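
For what it's worth, walking a comment tree is an ordinary depth-first traversal. The sketch below uses plain nested dicts as a stand-in rather than real PRAW objects, so the `body` and `replies` keys, the sample thread, and the function name are all assumptions for illustration.

```python
def walk_comments(comments, depth=0):
    """Yield (depth, body) for every comment, parents before their replies."""
    for comment in comments:
        yield depth, comment["body"]
        yield from walk_comments(comment.get("replies", []), depth + 1)


# A made-up thread: two top-level comments, one with nested replies.
thread = [
    {"body": "top-level comment", "replies": [
        {"body": "first reply", "replies": [
            {"body": "reply to the reply"},
        ]},
        {"body": "second reply"},
    ]},
    {"body": "another top-level comment"},
]

flattened = list(walk_comments(thread))
for depth, body in flattened:
    print("  " * depth + body)
```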

Scrapy

I seem to be moving more and more away from web development these days (yeah, sure, an odd sort of statement to be found on a website, but there it is). And perhaps even further from web scraping (there are just too many resources available that have APIs or, better yet, that will export their data troves wholesale). Anyhow, for the rest of the Internet, there's Scrapy. I never liked Beautiful Soup, so if I were to scrape a website at random (and downloading the thing en masse with HTTrack wasn't an option), Scrapy would be my first choice of libraries to learn (a 2015 statement). My notes after going over the documentation very briefly include (once again, after a brief review and no use of code, I came away with the impression):

Command Line

I got the feeling Scrapy would prefer to be called as a separate process from the command line (as a subprocess call vs a library import).

Built in Spiders and Crawlers

Web Crawler
Web Scraper
Spiders

Really, don't ask me what the difference is, maybe none. I'm just copying over notes at this point.

Work Flow is Crawl Site Then Process Data

From what I saw (interpreted, made wild assumptions about), there was very little ability to crawl and process data concurrently. I view the default work flow to be load Scrapy, crawl, save data, exit Scrapy, and then, do whatever.
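
That crawl-then-process work flow might be sketched as below. It assumes a Scrapy project already exists in the current directory with a spider defined; the function names, spider name, and file name are hypothetical, and nothing here is run against a real site.

```python
import csv
import subprocess


def build_crawl_command(spider_name, output_path):
    """Build the command line: scrapy crawl <spider> -o <file>."""
    return ["scrapy", "crawl", spider_name, "-o", output_path]


def crawl_then_load(spider_name, output_path):
    """Run the crawl as a separate process, then read back the saved csv rows."""
    subprocess.run(build_crawl_command(spider_name, output_path), check=True)
    with open(output_path, newline="") as handle:
        return list(csv.DictReader(handle))


# Hypothetical spider and file names; only the command is built here, not executed.
command = build_crawl_command("my_spider", "items.csv")
print(" ".join(command))
```

Calling crawl_then_load("my_spider", "items.csv") would do the actual crawl and hand back the rows for whatever comes next.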

Wonderful Data Export Capabilities

JSON
XML
CSV

The last, a definite plus.

And that's about it. I downloaded the documentation thinking I would dig into it, but after an hour or two of reading it over, I realized I just didn't have the need... and therefore the will to look into the library any deeper.

More to be added, whenever...

www.paufler.net

© Copyright 2015 Brett Paufler
All information derived one way or another from docs.python.org
paufler.net@gmail.com
Terms of Service