I thought it would be fun to talk about how I calculated the numbers in this post. If you are a developer, this will probably be pretty routine (although maybe you can tell me how to fix the problem with non-ASCII characters), but others may find it interesting.
The first thing I needed was a list of House members by state and party, preferably in a machine-readable format. The best source I found for this was https://www.govtrack.us/developers/api. I’m not going to attempt a general definition of API (Application Programming Interface), but this is an example of a public web-based API. It works pretty similarly to a web page: you can access it in a browser, and it basically treats the URL in a manner similar to a command line. Instead of the server returning a web page (an HTML document designed to be readable by human beings), it returns a JSON document (a structured text document designed to be readable by computers). The output is still returned in the browser, from which you can copy and paste, save as a file, or interact with it in some other way.
The documentation for the API is a bit tricky to make sense of if you aren’t accustomed to API programming, but in fact it’s not all that complicated. Let’s examine the URL that I used:
https://www.govtrack.us/api/v2/role?current=true&role_type=representative&fields=person__firstname,person__lastname,state,party&limit=500
The first part, https://www.govtrack.us, is the same as any other URL. (The “https” part is called the protocol, and the “www.govtrack.us” part is called the hostname.) In a simple static web page, the stuff after the hostname would identify a particular file on that web server. Here, the “/api/v2/role” part tells the web server that you want to access the API. Everything after the question mark is a list of parameters you are sending to the API, separated by ampersands. Basically, the web server takes those parameters, converts them into a database query, pulls the information you have asked for out of the database, and then structures it for browser-based delivery.
The “current=true” part tells the web server that I’m only interested in currently serving officials.
The “role_type=representative” part says that I’m only interested in members of the House.
The part after “fields=” indicates the particular values I would like to get back. The database has a ton of fields, most of which weren’t relevant for this purpose. (To see them, try removing from the URL everything from “fields” to the next “&”, and then reloading the page.)
The data has a particular structure, and the part after “fields=” reflects that. For instance, data like name and gender live under “person”, and the double underscore is how you say “From within ‘person’, I want the ‘firstname’ field”.
The part after “limit=” tells the server how many records I want returned at once. The current membership of Congress is a small number for a computer, so it is no problem to ask for all of them at once, especially when I just want a handful of fields. For larger data sets, you might have to break up your requests.
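To make the anatomy of that URL concrete, here is a minimal sketch that assembles it from its parameters with Python's standard library and then takes it apart again. The variable names are my own, and nothing here contacts the server; it only builds and inspects the string.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = "https://www.govtrack.us/api/v2/role"

# One entry per parameter discussed above.
params = {
    "current": "true",
    "role_type": "representative",
    "fields": "person__firstname,person__lastname,state,party",
    "limit": 500,
}

# safe="," keeps the commas in the field list readable instead of
# escaping them as %2C.
url = BASE_URL + "?" + urlencode(params, safe=",")
print(url)

# Going the other way: split the URL back into its parts.
parts = urlsplit(url)
print(parts.scheme)    # the protocol
print(parts.hostname)  # the hostname
print(parts.path)      # the part that addresses the API
fields = parse_qs(parts.query)["fields"][0].split(",")
print(fields)
```

Building the query from a dictionary makes it easy to change one parameter, such as the limit, without retyping the whole URL.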
The data comes back in a format called JSON, for JavaScript Object Notation. Although JSON syntax is based on JavaScript, many languages support the import of JSON data. The first part of the return file consists of metadata, i.e., data about the data. The “limit” of 500 corresponds to the limit specified in the URL. Had the number of matching records exceeded the limit, the “offset” would have been needed to track which “page” of results this was. The “total_count” is the total number of matching records; it is 440, rather than 435, because the database includes the non-voting delegates from DC, Guam, etc.
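As a rough illustration of that layout, the sketch below parses a tiny hand-written response. The record is a placeholder rather than real data, and the key name "meta" for the metadata section is an assumption; "objects", "limit", "offset" and "total_count" are the labels mentioned in this post.

```python
import json

# Illustrative response only: placeholder values, and "meta" is an
# assumed name for the metadata section.
sample_response = """
{
  "meta": {"limit": 500, "offset": 0, "total_count": 440},
  "objects": [
    {"party": "Democrat",
     "person": {"firstname": "Jane", "lastname": "Doe"},
     "state": "XX"}
  ]
}
"""

data = json.loads(sample_response)
meta = data["meta"]

# If more records match than fit in one response, more requests
# (with a larger offset) would be needed.
more_pages_needed = meta["offset"] + meta["limit"] < meta["total_count"]
print(more_pages_needed)

# "person__lastname" in the URL corresponds to a nested lookup here.
first = data["objects"][0]
print(first["person"]["lastname"], first["state"], first["party"])
```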
To import the data into Python, I first copy-pasted the text into a text editor, and saved it as a file with the name “house.json”, in my Python working directory. Then I made a few tweaks. First, I deleted the whole metadata section, which I didn’t need. Then, for clarity, I changed the generic label “objects” to the more specific label “representatives”.
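The same two tweaks could also be made in code instead of in a text editor. This is only a sketch of that alternative, not what I actually did; the function name `rewrap` and the sample text are made up for illustration.

```python
import json

def rewrap(raw_text):
    """Drop the metadata and store the records under 'representatives'."""
    raw = json.loads(raw_text)
    return {"representatives": raw["objects"]}

# A stand-in for the text copied from the browser (placeholder record).
raw_text = '{"meta": {"limit": 500}, "objects": [{"state": "XX", "party": "Democrat"}]}'

house_data = rewrap(raw_text)
print(json.dumps(house_data))
```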
The code to import the file is short:
import json
with open('house.json') as data_file:
    house_data = json.load(data_file)
Unfortunately, this generated the following error:
Traceback (most recent call last):
  File "/Users/stein/Documents/Python/House.py", line 12, in <module>
    house_data = json.load(data_file)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/json/__init__.py", line 265, in load
    return loads(fp.read(),
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 6456: ordinal not in range(128)
This is the sort of thing that can make programming very frustrating. The source of the problem turns out to be that the data includes a number of names with accented characters, such as “Gutiérrez”. The accented é is a non-ASCII character, and non-ASCII characters can be quite difficult for data imports to handle. I’m sure there are ways to get it to work, but I tried a few different things, and none of them did. For this purpose, I don’t need representatives’ names to be rendered perfectly. Strictly speaking, I’m not really using them at all, although without them it would be hard to verify that my code was correct. So I resorted to an unusual-sounding command offered by my text editor (TextWrangler): “Zap Gremlins”. This provides the ability to replace all the non-ASCII characters with their closest ASCII equivalents: good enough for this purpose, and it did the trick.
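For anyone hitting the same error, one approach that usually resolves it is to tell open() which encoding the file uses. The 0xc3 byte in the message is consistent with UTF-8, where é is stored as the two bytes 0xC3 0xA9. The sketch below uses a small temporary stand-in file rather than the real house.json, and it assumes the file really is UTF-8. It also includes a rough, hypothetical `to_ascii` helper that approximates the "closest ASCII equivalent" idea in code.

```python
import json
import os
import tempfile
import unicodedata

# Write a tiny stand-in for house.json, encoded as UTF-8.
sample = {"representatives": [{"person": {"lastname": "Gutiérrez"}}]}
path = os.path.join(tempfile.mkdtemp(), "house_sample.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(sample, f, ensure_ascii=False)

# Reading it as ASCII reproduces the kind of error shown above.
try:
    with open(path, encoding="ascii") as f:
        json.load(f)
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Naming the encoding explicitly lets the accented character through.
with open(path, encoding="utf-8") as f:
    house_data = json.load(f)
lastname = house_data["representatives"][0]["person"]["lastname"]

def to_ascii(text):
    """Rough ASCII fallback: split accents off letters, then drop them."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(ascii_failed, to_ascii(lastname))
```

When no encoding is given, open() falls back to a platform-dependent default, which would explain why the same script can work on one machine and fail on another.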
The original data in the file is structured with a “wrapper” around it. This is a flexible mechanism, but if I left it that way, it would make for clunky references. For example, to reference the nth representative in the data set, I would need to refer to:
house_data['representatives'][n]
To avoid that, I created a new variable that is easier to reference:
representatives = house_data['representatives']
Now the nth representative is just:
representatives[n]
Next, I needed a JSON list of all the state abbreviations. It wouldn’t be all that much work to create one from scratch, but of course someone else has already done that. I borrowed one from here: https://gist.github.com/mshafrir/2646763, and imported it using the same technique as discussed for the representatives.
The next step is to create variables to store how many representatives from each state are Republicans, Democrats, or other. To do that, I used a quite cool data structure known in Python as a dictionary; it gets declared like this:
rCount = {}
dCount = {}
oCount = {}
The neat thing about a dictionary is that it permits reference by arbitrary values. If you were going to do this with only the very simplest variables, you might create variables like AL_rCount, GA_rCount, SC_rCount, etc. Iterating through them would be a nightmare! Moving up to a more sophisticated variable type, you could use an array. Then you would have rCount[0], rCount[1], rCount[2], etc. That works a lot better, but you still have to keep track of which state corresponds to each index in the array (e.g., AL is 0, GA is 1, etc.). With a dictionary, you can instead reference the states directly: rCount['AL'], rCount['GA'], etc. Brilliant!
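A small side-by-side sketch of the two approaches, using three of the Republican counts from the results in this post. The variable names are for illustration only.

```python
# With a list, a separate lookup is needed to remember which slot is which.
state_order = ["AL", "GA", "SC"]
r_count_list = [6, 10, 6]
ga_from_list = r_count_list[state_order.index("GA")]
print(ga_from_list)

# With a dictionary, the state abbreviation is the key itself.
r_count = {"AL": 6, "GA": 10, "SC": 6}
ga_from_dict = r_count["GA"]
print(ga_from_dict)

# Iterating gives the key and the value together.
for state, count in r_count.items():
    print(state, count)
```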
The first step is to iterate through the states, initializing all the values to 0:
for x in range(0, len(states)):
    rCount[states[x]['abbreviation']] = 0
    dCount[states[x]['abbreviation']] = 0
    oCount[states[x]['abbreviation']] = 0
Next, we iterate through all the representatives, tallying the counts by party:
for x in range(0, len(representatives)):
    if representatives[x]['party'] == 'Republican':
        rCount[representatives[x]['state']] += 1
    elif representatives[x]['party'] == 'Democrat':
        dCount[representatives[x]['state']] += 1
    else:
        oCount[representatives[x]['state']] += 1
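As an aside, Python's collections.Counter offers another way to do this tally without the separate initialization loop, because a missing key counts as 0. This is an alternative sketch, not the code I ran; the records are placeholders in the same shape as the real ones.

```python
from collections import Counter

# Placeholder records shaped like the entries in "representatives".
sample_representatives = [
    {"state": "ME", "party": "Republican"},
    {"state": "ME", "party": "Democrat"},
    {"state": "VT", "party": "Democrat"},
    {"state": "XX", "party": "Independent"},
]

r_count, d_count, o_count = Counter(), Counter(), Counter()

for rep in sample_representatives:
    if rep["party"] == "Republican":
        r_count[rep["state"]] += 1
    elif rep["party"] == "Democrat":
        d_count[rep["state"]] += 1
    else:
        o_count[rep["state"]] += 1

# A state with no Republicans simply reports 0 instead of raising an error.
print(r_count["ME"], d_count["ME"], r_count["VT"], o_count["XX"])
```

One trade-off: with a Counter, a state that has no representatives in the data never appears as a key, so the explicit initialization used in this post is still handy when every state needs an entry.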
With these counts finished, we can iterate through the states a second time, this time identifying which party a majority of the House delegation belongs to:
majorityParty = {}
for x in range(0, len(states)):
    if rCount[states[x]['abbreviation']] > dCount[states[x]['abbreviation']]:
        majorityParty[states[x]['abbreviation']] = 'Republican'
    elif rCount[states[x]['abbreviation']] < dCount[states[x]['abbreviation']]:
        majorityParty[states[x]['abbreviation']] = 'Democrat'
    elif rCount[states[x]['abbreviation']] == dCount[states[x]['abbreviation']]:
        majorityParty[states[x]['abbreviation']] = 'Tie'
    else:
        print("ERROR")
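The comparison itself can be pulled out into a small function, which makes it easy to check against a few known cases. This is just a sketch with a made-up function name; like the loop above, it compares only the Republican and Democratic counts. The three test states use the counts from the results in this post.

```python
def majority_party(r, d):
    """Return which party has more seats, or 'Tie' if they are equal."""
    if r > d:
        return "Republican"
    if r < d:
        return "Democrat"
    return "Tie"

# (Republican seats, Democratic seats) for three states from the results.
counts = {"AL": (6, 1), "CA": (14, 39), "NJ": (6, 6)}

majority = {state: majority_party(r, d) for state, (r, d) in counts.items()}
print(majority)
```

Since two whole numbers are always greater, less, or equal, three branches cover every case, and the final "ERROR" branch in the loop above is purely a safety net.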
Finally, we need variables to count the number of states for which each party has a majority, so we can iterate through the states to do the count:
rMajorityCount = 0
dMajorityCount = 0
noMajorityCount = 0
for x in range(0, len(states)):
    if majorityParty[states[x]['abbreviation']] == 'Republican':
        rMajorityCount += 1
    elif majorityParty[states[x]['abbreviation']] == 'Democrat':
        dMajorityCount += 1
    elif majorityParty[states[x]['abbreviation']] == 'Tie':
        noMajorityCount += 1
    else:
        print("ERROR")
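This last count is another place where collections.Counter could do the work in one line. A minimal sketch, using a hand-picked subset of states rather than the full dictionary:

```python
from collections import Counter

# A hand-picked subset of the majorityParty dictionary, for illustration.
majority_party = {
    "AL": "Republican",
    "CA": "Democrat",
    "NJ": "Tie",
    "TX": "Republican",
}

# Count how many times each label appears among the values.
tally = Counter(majority_party.values())
print(tally["Republican"], tally["Democrat"], tally["Tie"])
```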
The resulting numbers aren’t quite right, because the initial Representatives data set includes several non-voting delegates, while the States data set includes both the districts and territories that get delegates (e.g., the District of Columbia and Puerto Rico) and entries that do not (e.g., Palau). For this purpose, though, it’s really not worth the effort it would take to fix this programmatically. It’s easier to just count them and subtract them from the computed numbers.
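For the record, a programmatic fix would probably look something like the sketch below: filter both lists against a set of non-state abbreviations before counting. I did not do this, and the set of abbreviations is my assumption about which entries in the states file are not states, so it should be checked against the actual data. The sample records are placeholders.

```python
# Assumption: these postal abbreviations cover the non-state entries.
NON_STATES = {"AS", "DC", "FM", "GU", "MH", "MP", "PW", "PR", "VI"}

# Placeholder data in the same shape as the real lists.
sample_states = [
    {"name": "Alabama", "abbreviation": "AL"},
    {"name": "Guam", "abbreviation": "GU"},
    {"name": "Palau", "abbreviation": "PW"},
]
sample_representatives = [
    {"state": "AL", "party": "Republican"},
    {"state": "GU", "party": "Democrat"},
]

# Keep only real states, and only the members who represent them.
states_only = [s for s in sample_states if s["abbreviation"] not in NON_STATES]
voting_reps = [r for r in sample_representatives if r["state"] not in NON_STATES]

print(len(states_only), len(voting_reps))
```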
The final results are as follows:
Republican majority House delegations:
AL : 6 vs. 1
AK : 1 vs. 0
AZ : 5 vs. 4
AR : 4 vs. 0
CO : 4 vs. 3
FL : 17 vs. 10
GA : 10 vs. 4
ID : 2 vs. 0
IN : 7 vs. 2
IA : 3 vs. 1
KS : 4 vs. 0
KY : 5 vs. 1
LA : 5 vs. 1
MI : 9 vs. 5
MS : 3 vs. 1
MO : 6 vs. 2
MT : 1 vs. 0
NE : 2 vs. 1
NV : 3 vs. 1
NC : 10 vs. 3
ND : 1 vs. 0
OH : 11 vs. 4
OK : 5 vs. 0
PA : 13 vs. 5
SC : 6 vs. 1
SD : 1 vs. 0
TN : 7 vs. 2
TX : 25 vs. 11
UT : 4 vs. 0
VA : 8 vs. 3
WV : 3 vs. 0
WI : 5 vs. 3
WY : 1 vs. 0
Democratic majority House delegations:
CA : 39 vs. 14
CT : 5 vs. 0
DE : 1 vs. 0
HI : 2 vs. 0
IL : 10 vs. 8
MD : 7 vs. 1
MA : 9 vs. 0
MN : 5 vs. 3
NM : 2 vs. 1
NY : 18 vs. 9
OR : 4 vs. 1
RI : 2 vs. 0
VT : 1 vs. 0
WA : 6 vs. 4
Tied House delegations:
ME : 1 vs. 1
NH : 1 vs. 1
NJ : 6 vs. 6
Net figures:
R: 33 states
D: 14 states
Tied: 3 states
The complete code is here:
import json

with open('house.json') as data_file:
    house_data = json.load(data_file)
# The encoding is given to open(); json.load() in Python 3.9 and later rejects an encoding argument.
with open('states.json', encoding='utf-8') as data_file:
    state_data = json.load(data_file)
representatives = house_data['representatives']
states = state_data['states']
#print (states[0]['name'])
#print (len(states))
# Per-state counts of Republicans, Democrats, and others, all starting at 0.
rCount = {}
dCount = {}
oCount = {}
for x in range(0, len(states)):
    rCount[states[x]['abbreviation']] = 0
    dCount[states[x]['abbreviation']] = 0
    oCount[states[x]['abbreviation']] = 0
# Tally each representative under their state and party.
for x in range(0, len(representatives)):
    if representatives[x]['party'] == 'Republican':
        rCount[representatives[x]['state']] += 1
    elif representatives[x]['party'] == 'Democrat':
        dCount[representatives[x]['state']] += 1
    else:
        oCount[representatives[x]['state']] += 1
# Decide which party holds the majority of each delegation.
majorityParty = {}
for x in range(0, len(states)):
    if rCount[states[x]['abbreviation']] > dCount[states[x]['abbreviation']]:
        majorityParty[states[x]['abbreviation']] = 'Republican'
    elif rCount[states[x]['abbreviation']] < dCount[states[x]['abbreviation']]:
        majorityParty[states[x]['abbreviation']] = 'Democrat'
    elif rCount[states[x]['abbreviation']] == dCount[states[x]['abbreviation']]:
        majorityParty[states[x]['abbreviation']] = 'Tie'
    else:
        print("ERROR")
# Count the delegations in each category.
rMajorityCount = 0
dMajorityCount = 0
noMajorityCount = 0
for x in range(0, len(states)):
    if majorityParty[states[x]['abbreviation']] == 'Republican':
        rMajorityCount += 1
    elif majorityParty[states[x]['abbreviation']] == 'Democrat':
        dMajorityCount += 1
    elif majorityParty[states[x]['abbreviation']] == 'Tie':
        noMajorityCount += 1
    else:
        print("ERROR")
# Print the totals, then the per-state breakdown for each category.
print("Republican majority House delegations:", rMajorityCount)
print("Democratic majority House delegations:", dMajorityCount)
print("House delegations with no Majority Party:", noMajorityCount)
print("Republican majority House delegations:")
for x in range(0, len(states)):
    if majorityParty[states[x]['abbreviation']] == 'Republican':
        print(states[x]['abbreviation'], ":", rCount[states[x]['abbreviation']], "vs.", dCount[states[x]['abbreviation']])
print("Democratic majority House delegations:")
for x in range(0, len(states)):
    if majorityParty[states[x]['abbreviation']] == 'Democrat':
        print(states[x]['abbreviation'], ":", dCount[states[x]['abbreviation']], "vs.", rCount[states[x]['abbreviation']])
print("Tied House delegations:")
for x in range(0, len(states)):
    if majorityParty[states[x]['abbreviation']] == 'Tie':
        print(states[x]['abbreviation'], ":", dCount[states[x]['abbreviation']], "vs.", rCount[states[x]['abbreviation']])