Python Library Review
A file dump concerning my personal review and evaluation of select Python Libraries.
Individual results may vary.
I found no use for this.
The values package seems the most interesting and likely a very good start for this kind of thing. Alas, I have no need/interest.
- Base concerns itself mainly with physics values (kg/s, that sort of thing).
- Has Fits support (a large image file format used in astronomy).
- And multiple sky coordinate systems.
Not my domain. I found little to extract for other projects.
This nifty image was the only real use case I found in the library. And, ironically enough, I just went to the webpage to get it to work right.
- String manipulation routines for protein sequences central to the functionality.
- I'm pretty sure (going by memory at this point) that there's slight tree coverage, but I'd use ete2 for that, instead.
Data Brewery & Bubbles
It took me longer to figure out what this library was about than it should have.
I'm not big on web presentation. Certainly a utility such as this makes sense to learn after one has learned Flask (the web framework on which it is built). But if one already owns the data, then combining it all into a single SQL server instance seems like the most common use case, and that use case makes this library superfluous. That relegates Cubes to instances where one does not own the data, a use case I have no personal interest in pursuing.
- Website Framework
- Which can Combine Multiple Data Sources
- any number of separate SQL instances
- different SQL flavours OK
- with secondary support for csv, MongoDB, etc.
- With Built-In Presentation Support
So I shall move on... perhaps to SQL, instead.
Anyway, in short, the reason to use Kivy is that it is a program-once, deploy-everywhere library. So, code once, and deploy as a Linux Distro, iPhone App, Android App, and native Windows exe, all with the same base code (just differing distribution packages)... or so is the claim. I never got that far. But I was pleased with where I got, which was mostly dials, buttons, and GUI things of that nature. I doubt this (GUIs) will ever be my strong suit, so I don't have much belief I'll be furthering my investigations into this library any time soon. However, I'd give it a solid four stars, maybe five; the documentation is a bit iffy, but then, my knowledge of Event Driven Programming (and GUIs in general) is a bit iffy, so it's really not for me to say.
Bottom line, if I decide to do an App, I'm reaching for this library first.
NLTK: Natural Language Tool Kit
It goes against my philosophy of website design to link to outside pages (as those pages tend to get moved and then a person is left with a bunch of dead links). But suffice to say, I liked my walk-through of NLTK enough that I almost put a bunch of links here:
The API documentation sort of sucks (though there are extensive samples if one knows what they want). And the NLTK doesn't directly touch on my (previous) interests or scratch a current itch. But the online tutorial/walk-through is amazing! Props! Kudos! More libraries/fields could benefit from such an introduction. And having been exposed, it does seem sort of unlikely that I will never use/build upon the knowledge gained.
- NLTK has an amazing introductory walkthrough / Python Tutorial
- List Comprehensions
- Regular Expressions
- Tree Structures
- And pretty much every idea needed to work NLTK in Python
- NLTK has a bunch of linked Word Corpora
- NLTK Specializes in Sentence Structure Analysis
- Tokenization of words and sentences
- Nouns, Adverbs, Verbs
- a.k.a. Chunking
- a.k.a. Sequence Classification
- N-Gram Tagging
- NLTK: Doesn't Have, but a good starting point for
- Speech to Text (or Text to Speech)
- Cognition (sentence meaning, interpretation)
- First Order Logic
- NLTK: Utility Functions
- Confusion Matrix
- Error origination tracking for misclassification of words
- Sentence Generator
- Lists of Word types (N,V)
- Template of Structure (N-V-N)
- Yields either random or all possible sentences
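The sentence generator idea in the list above (word lists by type, plus a structure template) can be roughed out in plain Python. This is only a minimal sketch and not NLTK's own generator: the word lists, the template, and the function names are all made up for illustration.

```python
import itertools
import random

# Hypothetical word lists keyed by word type (N = noun, V = verb).
WORDS = {
    "N": ["dog", "cat"],
    "V": ["sees", "chases"],
}

# Hypothetical structure template: Noun-Verb-Noun.
TEMPLATE = ["N", "V", "N"]


def all_sentences(template, words):
    """Yield every sentence the template allows, in a fixed order."""
    choices = [words[word_type] for word_type in template]
    for combo in itertools.product(*choices):
        yield " ".join(combo)


def random_sentence(template, words, rng=random):
    """Build one sentence by picking a random word for each slot."""
    return " ".join(rng.choice(words[word_type]) for word_type in template)


sentences = list(all_sentences(TEMPLATE, WORDS))
print(len(sentences))   # 2 * 2 * 2 = 8 combinations
print(sentences[0])     # dog sees dog
print(random_sentence(TEMPLATE, WORDS))
```

Using itertools.product for the "all possible" case keeps the generator lazy, which matters once the word lists grow beyond toy size.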
In short, I still have no use case (nor bothered to write any sample code), but after 2-4 weeks (I do seem to lose track of time), I consider my foray into the NLTK Library time well spent.
All that and it's easy enough to get one's feet wet:
- Unlimited precision numbers right out of the box.
- Matrix Math (True Matrices: not Numpy compatible arrays)
- Convenience functions (linspace() comes to mind)
- Presumably lots of Gotchas (for the un-math savvy)
- Graphing of Complex Numbers Looks Interesting
- And then, you got me as to why I'm writing so much on this trivially small library
- I guess sometimes that's just what I do...
import mpmath

#work with 50 decimal digits of precision
mpmath.mp.dps = 50
#fun with floats
y = mpmath.mp.mpf('1.00000012000000000021')
print(y - 1.0)
print(y - mpmath.mp.mpf('1.0'))
z = mpmath.mp.mpc('1','0.00000000000009500')
#(1.0 + 0.000000000000095j)
print(y + z)
#(2.00000012000000000021 + 0.000000000000095j)
print(y * z)
#(1.00000012000000000021 + 0.00000000000009500001140000000001995j)
I believe Sage is built on mpmath (among lots of others). And if I were to do Project Euler in Python, this would be one of my go-to libraries (along with SymPy and the aforementioned Sage). But in truth, the above is about as far as I've gotten with this library thus far, so what do I know. Pure math really isn't my strong suit.
Though I started this page to record newly reviewed libraries, it makes some sense for me to record my impressions (suggestions, over-reaching advice) on those libraries I already use regularly.
I use Numpy lots. It's the backbone of most of my image manipulation techniques (certainly the skill sets learned transfer easily). I like Numpy lots. And truth be told, I probably should take a break from reviewing new libraries and refresh my knowledge of Numpy's API (though I've got to admit, it sounds sort of boring).
That said, what does Numpy do? Matrices, arrays, numbers in tabular format, quick and easy. Perhaps that doesn't sound like much, but the scikit stack, Pandas, and a whole slew of other libraries depend on Numpy, so it does what it does well; so well that I can't name its nearest competitor off the top of my head. I don't really think it has one.
Two-Dimensional arrays in Python might as well be called Numpies.
Numpy, how do I love thee...
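As a taste of why a two-dimensional array works so well for image manipulation, here is a minimal sketch. The tiny "image" is made-up data (a real one would come from an image loader), and the variable names are only for illustration.

```python
import numpy as np

# A tiny 4x6 grayscale "image": rows by columns, values 0-230.
image = np.arange(24, dtype=np.uint8).reshape(4, 6) * 10

# Slicing crops a region: rows 1-2, columns 2-4.
crop = image[1:3, 2:5]

# Whole-array operations replace explicit loops.
inverted = 255 - image       # photographic negative
flipped = image[:, ::-1]     # mirror left to right

print(image.shape)      # (4, 6)
print(crop.shape)       # (2, 3)
print(inverted[0, 0])   # 255
```

The slice is a view rather than a copy, so cropping costs next to nothing even on large arrays.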
Pandas is sort of like Excel for Python (it even does pivot tables). Have a table of data (or something that should be a table of data or could be a table of data), then Pandas just might be the solution.
If I'm exploring data, Pandas tends to be my first stop. And when working with Pandas, my advice is simple enough: get into the Pandas framework as soon as possible; use Pandas to load the csv, html, or SQL; do your stuff (stay in Pandas as long as possible); and then, at the end, output to whatever you need. While inside, Pandas will slice and dice your data; filter, qualify, and reorder your data; and make nifty graphs, html tables, and other such niceties. I am sure I have not even begun to scratch the surface.
In short, if you're interested in data and using Python, I can't think of a single reason not to use Pandas... or the name of another data processing framework that would come close to Pandas' power with so little effort. Bottom line, Pandas has been easy to use and lightning fast for everything I've wanted it to do thus far.
It will be interesting to see the use case (and the solution) when it eventually doesn't.
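The load, work, output advice above can be sketched in a few lines. This is a minimal example on made-up data: the column names and the in-memory csv text are assumptions standing in for a real file.

```python
import io
import pandas as pd

# Made-up sample data standing in for a csv file on disk.
csv_text = """region,product,units
north,apples,10
north,pears,4
south,apples,7
south,pears,12
north,apples,3
"""

# 1. Get into Pandas as soon as possible.
df = pd.read_csv(io.StringIO(csv_text))

# 2. Do the work inside Pandas: filter, then pivot.
big_orders = df[df["units"] >= 4]
pivot = df.pivot_table(index="region", columns="product",
                       values="units", aggfunc="sum")

# 3. Output at the end, in whatever form is needed.
print(big_orders)
print(pivot)
html_table = pivot.to_html()
```

The pivot table here is the same idea as the Excel one: regions down the side, products across the top, summed units in the cells.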
Python Reddit... something, something (maybe crawler)
I like reddit. I use PRAW as my API interface to do my crawling. It's easy enough to get a submission object (and the documentation for this is clear enough), but once you have the submission object, you're pretty much on your own.
import praw

#initialize user agent
prawUserAgent = praw.Reddit(user_agent="some_string")
#pull x new submissions
# there are other pull classes, like by ID
# where subId is a six character base-36 string
intVarUpTo1000 = 25
newSubmissions = prawUserAgent.get_new(limit=intVarUpTo1000)
#explore the submission objects as you see fit, perhaps:
for submission in newSubmissions:
    print(submission.title)
Thumbs up for the library, but very soon, one is completely on their own. More information on the various objects along with default methodologies for traversing comment trees would be nice. Still, it's free, so who's complaining.
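Since comment tree traversal is the part left as an exercise, here is a minimal sketch of one way to walk such a tree. It runs on stand-in objects rather than live PRAW data: FakeComment and walk_comments are hypothetical names, and the only assumption borrowed from PRAW is that a comment carries its text in `body` and its children in `replies`. Live trees can also hold placeholder "more comments" objects, which this sketch does not handle.

```python
from dataclasses import dataclass, field


@dataclass
class FakeComment:
    """Stand-in for a comment object: some text plus a list of replies."""
    body: str
    replies: list = field(default_factory=list)


def walk_comments(comments, depth=0):
    """Yield (depth, comment) pairs, parents before their replies."""
    for comment in comments:
        yield depth, comment
        yield from walk_comments(comment.replies, depth + 1)


# Made-up tree: two top-level comments, one with nested replies.
tree = [
    FakeComment("first", [FakeComment("reply", [FakeComment("reply to reply")])]),
    FakeComment("second"),
]

# Print the tree with two spaces of indent per level of nesting.
for depth, comment in walk_comments(tree):
    print("  " * depth + comment.body)
```

A depth-first walk like this preserves the reading order of a thread; for very deep threads an explicit stack would avoid Python's recursion limit.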
I seem to be moving more and more away from web development these days (yeah, sure, an odd sort of statement to be found on a website, but there it is). And perhaps even further from web scraping (there are just too many resources available that have APIs or, better yet, that will export their data troves wholesale). Anyhow, for the rest of the Internet, there's Scrapy. I never liked Beautiful Soup, so if I were to scrape a website at random (and downloading the thing en masse with HTTrack wasn't an option), Scrapy would be my first choice of libraries to learn (a 2015 statement).
My notes after going over the documentation very briefly include (once again, after a brief review and no use of code, I came away with the impression):
And that's about it. I downloaded the documentation thinking I would dig into it, but after an hour or two of reading it over, I realized I just didn't have the need... and therefore no will to look into the library any deeper.
- Command Line
- I got the feeling Scrapy would prefer to be called as a separate process from the command line (as a subprocess call vs a library import).
- Built in Spiders and Crawlers
- Web Crawler
- Web Scraper
- Really, don't ask me what the difference is, maybe none. I'm just copying over notes at this point.
- Work Flow is Crawl Site Then Process Data
- From what I saw (interpreted, made wild assumptions about), there was very little ability to crawl and process data concurrently. I view the default work flow to be load Scrapy, crawl, save data, exit Scrapy, and then, do whatever.
- Wonderful Data Export Capabilities
- The last, a definite plus.
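The crawl first, process later work flow noted above might look something like the following when driven from Python as a subprocess. This is a minimal sketch under several assumptions: it would need to run inside an existing Scrapy project, "my_spider" and "items.json" are placeholder names, the output file is assumed not to exist yet, and the helper functions are made up for illustration.

```python
import json
import subprocess


def build_crawl_command(spider_name, output_path):
    """Return the argument list for a `scrapy crawl` run that exports items."""
    return ["scrapy", "crawl", spider_name, "-o", output_path]


def crawl_then_process(spider_name, output_path):
    """Run the crawl to completion, then load the exported items."""
    # Assumes this is called from inside a Scrapy project directory.
    subprocess.run(build_crawl_command(spider_name, output_path), check=True)
    with open(output_path, encoding="utf-8") as handle:
        return json.load(handle)


# "my_spider" and "items.json" are placeholder names.
print(build_crawl_command("my_spider", "items.json"))
```

Only the command is built and printed here; calling crawl_then_process() is what would actually launch the crawl and then hand back the saved items for whatever comes next.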
More to be added, whenever...
© Copyright 2015 Brett Paufler
All information derived one way or another from docs.python.org