Brett Stuff
Judging the Judges
Term Year: 2018

2018 Term Year Analysis
Data Munging

From Raw Data
To Collated Data

Raw Data

The Supreme Court has its own Website, from which one can download all of its Slips.

I downloaded the Slips. Read the Slips. And as I did, I recorded such information as I thought would be pertinent for an End of Year Analysis, which is what this project is all about.


Collated Data

The Supreme Court Slips are text based (in a Natural Language sort of way), which is not overly conducive to analysis.

So at the start of every Webpage in my write-up, I noted the following information:

R-
DATE: 2019-
DOCKET: 17-
NAME: v.
WORTHY: True, False

OPINION: {Court, Concurring, Dissenting}
   AUTHOR: Per Curiam
   JOINING: Roberts, Thomas, Ginsburg, Breyer, Alito, Sotomayor, Kagan, Gorsuch, Kavanaugh
   GOOD: {Yes, No}
PAGES: #

Since the same Slip (the same Court Decision) may include multiple Opinions, that section was repeated as necessary.

Of course, what is shown above is what a Web Browser displays. The incoming source code (i.e. the raw html page that the browser receives) looks like the following:

<code class="summary_analysis">
R-<br>
DATE: 2019-<br>
DOCKET: 17-<br>
NAME: v. <br>
WORTHY: True, False<br>

<br class="opinion">
OPINION: {Court, Concurring, Dissenting}<br>
&nbsp;&nbsp; AUTHOR: Per Curiam<br>
&nbsp;&nbsp; JOINING: Roberts, Thomas, Ginsburg, Breyer, Alito, Sotomayor, Kagan, Gorsuch, Kavanaugh<br>
&nbsp;&nbsp; GOOD: {Yes, No}<br>
PAGES: #<br>
</code>

And this might not look any better than what one can get from a Supreme Court Slip. But it does have the advantage of being consistent. And that consistency allowed me to write a Python Script, which extracted the data from the Raw HTML and Condensed it into a simple (well, relatively simple) Python Object.
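What follows is a minimal sketch of that kind of extraction, not the actual 2018_judges_html_extract script. The function names (parse_fields, parse_cases), the LABEL and OPINIONS keys, and every value in SAMPLE are made up for illustration. It also assumes that each Opinion section starts at a <br class="opinion"> marker and that PAGES belongs to the Opinion it follows.

```python
import html
import re

# Hypothetical sample page: the field values are placeholders, not real data.
SAMPLE = """<code class="summary_analysis">
R-001<br>
DATE: 2019-01-08<br>
DOCKET: 17-0000<br>
NAME: Example v. Sample<br>
WORTHY: True<br>

<br class="opinion">
OPINION: Court<br>
&nbsp;&nbsp; AUTHOR: Per Curiam<br>
&nbsp;&nbsp; JOINING: Roberts, Thomas<br>
&nbsp;&nbsp; GOOD: Yes<br>
PAGES: 5<br>
</code>"""


def parse_fields(chunk):
    """Turn 'KEY: value<br>' lines into a dict.

    A line without a colon (the 'R-' line) is stored under the
    hypothetical key 'LABEL'.
    """
    text = html.unescape(re.sub(r"<br\s*/?>", "\n", chunk))
    fields = {}
    for line in text.splitlines():
        # &nbsp; unescapes to a non-breaking space; treat it as plain indent.
        line = line.replace("\xa0", " ").strip()
        if not line:
            continue
        key, sep, value = line.partition(":")
        if not sep:
            fields["LABEL"] = line
            continue
        key, value = key.strip(), value.strip()
        if key == "JOINING":
            # Split the comma separated Justices into a list of names.
            value = [name.strip() for name in value.split(",") if name.strip()]
        fields[key] = value
    return fields


def parse_cases(page):
    """Return one dict per summary_analysis block, each with a list of Opinions."""
    pattern = r'<code class="summary_analysis">(.*?)</code>'
    cases = []
    for body in re.findall(pattern, page, flags=re.DOTALL):
        header, *opinions = body.split('<br class="opinion">')
        case = parse_fields(header)
        case["OPINIONS"] = [parse_fields(opinion) for opinion in opinions]
        cases.append(case)
    return cases


cases = parse_cases(SAMPLE)
```

Calling parse_cases on a whole page would give a list of plain dictionaries, one per Slip, which is the sort of simple Python Object that is easy to count and filter later on.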


Python

There's not much to say about 2018_judges_html_extract.txt:
The program is included almost exclusively for purposes of transparency.

2018_judges_html_extract outputs three files, all of which contain the same data, utilizing different formats:

2018_judges_text.txt: Most Human Readable
2018_judges_json.txt: A Popular Format
2018_judges_pickle.txt: Easiest For Me

And that's that.

From here on, I will only be using 2018_judges_pickle.txt. It will be the input for all future analysis.

In fact, the other data formats and even the extraction script itself are so specialized they serve no further purpose. And as such, I have already removed the working copies from my file system.

{Here it is the very next day. And I've already reloaded the relevant scripts, as I discovered a discrepancy in the data, owing to a clerical error: a </code> break was misplaced in the html, causing two Opinions in the same Case to be overlooked. Well, I've corrected the html and added a test to my script, leading me to believe it is a one-off. But who knows what the next one-off will be.}
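The test itself is not shown here, so the following is only one guess at what such a check might look like. It assumes that every Opinion is introduced by a <br class="opinion"> marker, so any marker sitting outside a summary_analysis block points to a misplaced </code>. The name overlooked_opinions and both sample pages are hypothetical.

```python
import re

BLOCK = re.compile(r'<code class="summary_analysis">(.*?)</code>', re.DOTALL)
MARKER = '<br class="opinion">'


def overlooked_opinions(page):
    """Count opinion markers that fall outside every summary_analysis block."""
    inside = sum(body.count(MARKER) for body in BLOCK.findall(page))
    return page.count(MARKER) - inside


# Hypothetical pages: one well formed, one with </code> closed too early.
good_page = (
    '<code class="summary_analysis">R-<br>\n'
    '<br class="opinion">\nOPINION: Court<br>\n'
    '<br class="opinion">\nOPINION: Dissenting<br>\n'
    '</code>'
)
bad_page = (
    '<code class="summary_analysis">R-<br>\n'
    '<br class="opinion">\nOPINION: Court<br>\n'
    '</code>\n'
    '<br class="opinion">\nOPINION: Dissenting<br>\n'
)

good_count = overlooked_opinions(good_page)
bad_count = overlooked_opinions(bad_page)
```

A non-zero count does not say which Case is affected, only that the page needs a second look before its data is trusted.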



Life is easy, once you let someone else do all the work...


© copyright 2020 Brett Paufler
paufler.net@gmail.com
A Personal Opinion/Editorial