NYT articles about school segregation

There are two really interesting NYT articles this weekend about contemporary de facto school segregation.

One is a journalistic piece by Richard Fausset about a two-high-school district in Mississippi.

The other is an intensely personal essay by Nikole Hannah-Jones about choosing a school for her daughter in NYC.

Both grapple with the difficult issue of what it takes to keep affluent White students in public schools in districts that are heavily poor, Black, and Hispanic. It would seem that today in the Mississippi Delta, no less than in Brooklyn and Manhattan, White families are not only willing to send their children to “racially diverse” schools, but even view racial diversity as an asset.

Provided, that is, that a tipping point isn’t reached–that there are “enough” White students. NYC public schools are about 15% White; Cleveland MS’s, a bit under 30%. In both cases, the outcome is that most White students attend racially diverse schools with a markedly higher proportion of White students than the district as a whole, while really distressing numbers of poor Black and Hispanic students attend schools that are almost entirely Black, Hispanic, and poor.

I don’t necessarily agree with all of the policy assertions that are implicit or explicit in Hannah-Jones’s essay, but I think she does a great job of laying out the issues, and that her calling out of Milliken v. Bradley is particularly noteworthy. That decision effectively halted cross-district desegregation efforts, and once working across district boundaries is taken off the table, the problem becomes pretty close to intractable. A great companion piece to these is a report that the Fordham Institute issued back in 2010, called “America’s Private Public Schools”, about public schools where less than 5% of the enrolled students are from low-income families.

538’s Pirate Logician Puzzle

538 had a fun riddler this past week:

Ten Perfectly Rational Pirate Logicians (PRPLs) find 10 (indivisible) gold pieces and wish to distribute the booty among themselves.

The pirates each have a unique rank, from the captain on down. The captain puts forth the first plan to divide up the gold, whereupon the pirates (including the captain) vote. If at least half the pirates vote for the plan, it is enacted, and the gold is distributed accordingly. If the plan gets fewer than half the votes, however, the captain is killed, the second-in-command is promoted, and the process starts over. (They’re mutinous, these PRPLs.)

Pirates always vote by the following rules, with the earliest rule taking precedence in a conflict:

  • Self-preservation: A pirate values his life above all else.
  • Greed: A pirate seeks as much gold as possible.
  • Bloodthirst: Failing a threat to his life or bounty, a pirate always votes to kill.

Under this system, how do the PRPLs divide up their gold?

Extra credit: Solve the generalized problem where there are P pirates and G gold pieces.

Here is my answer:

The Captain will propose the following distribution, which will be accepted by all pirates who are offered a coin: 6 gp for himself, and 1 gp each for pirates 2, 4, 6, and 8 (where lower numbers indicate lower ranks).

To see why, we can work backwards from the end game. If it comes down to just P2 and P1, P2 will propose taking all the gold for himself; since 50% is sufficient to ratify a plan, it will be go into effect even though P1 will vote against it. So what does P3 have to offer to do to get a proposal approved? He clearly can’t get the support of P2. (Even if he offered P2 all the gold, P2 would still vote against the plan on account of bloodthirst!) But 1 gp is sufficient to get a vote from P1. P3 doesn’t need to offer P2 anything, so he can take the balance (9 gp) for himself. Continuing this backwards reasoning, P4 also needs 1 vote other than his own . It is most easily acquired from P2, who will end up with nothing if things proceed to the next stage. So P4 can offer P2 1 gp, keep 9 gp for himself, and offer nothing to P1 or P3. The ideal solution for each pirate is thus as follows:
s(P2) = {0, 10} (meaning 0 gp for P1 and 10 gp for P2)
s(P3) = {1, 0, 9}
s(P4) = {0, 1, 0, 9}
s(P5) = {1, 0, 1, 0, 8}
s(P6) = {0, 1, 0, 1, 0, 8}
s(P7) = {1, 0, 1, 0, 1, 0, 7}
s(P8) = {0, 1, 0, 1, 0, 1, 0, 7}
s(P9) = {1, 0, 1, 0, 1, 0, 1, 0, 6}
s(P10)={0, 1, 0, 1, 0, 1, 0, 1, 0, 6}
In each case, every pirate who is offered a gold piece will vote for the proposal, since the alternative is to end up with nothing. Every pirate who is offered nothing will, of course, vote against the proposal.

What about the extra credit problem, with g gold pieces and P pirates. The same basic reasoning applies:

S(P2) = {0, g}
S(P3) = {1, 0, g-1}
S(P4) = {0, 1, 0, g-1}
S(P5) = {1, 0, 1, 0, g-2}
S(P6) = {0, 1, 0, 1, 0, g-2}

For any even P, the captain will award 1 gp to each even-numbered pirate other than himself, and take all the remaining gold pieces for himself.
For any odd P, the captain will award 1 gp to each odd-numbered pirate other than himself, and take all the remaining gold pieces for himself.

But what if there are not enough gold pieces? E.g. consider 2 gp for 5 or 6 pirates? In this case, the captain will be forced to distribute all of the gold to others just to save his own hide: the rule above still holds but leaves the captain with nothing. 2 gp for 7 pirates is even worse: there is no proposal the captain can make that will get 3 (or more) votes other than his own. There are just too many bloodthirsty pirates with nothing to lose. Whatever P7 proposes, it will be rejected and he will be killed. What about 2 gp for 8 pirates? Now, a new wrinkle is introduced: P7 will vote for P8’s proposal to save his own life, even if P8 offers him no gold. Furthermore, pirates will decide whether to vote for P8’s proposal not by comparing their outcome to P7’s doomed hypothetical proposal, but rather to P6’s proposal. Therefore, P8 needs to offer gold pieces to 2 of P1, P3, and P5. Those 2 votes, plus the votes of P8 and P8, will be enough to ratify.

What about P9? P9 can count on P7’s vote and his own vote. He needs 3 more votes, but with only 2 gps to spread around, he isn’t going to get them.

And P10? He can count on P7 and P9’s vote (even if he offers them nothing), and his own. So he only needs 2 more votes. Pirates will decide by comparing his proposal to P8’s hypothetical proposal. But P8 could propose any of three possible distribution schemes, which complicates the votes of P1, P3, and P5. Each has only a 2/3 probability of getting any gold if P10’s proposal fails (or an expected value of 2/3 of a gp), so if they are offered a gp by P10, they will support the proposal. So P10’s proposal will pass.

11 pirates and 2 gp? P11 can count on the votes of himself, P7, and P9. He needs 3 more votes, and has no way to secure them with only 2 gp. Any proposal he makes will fail.

12 pirates and 2 gp? P12 can count on the votes of himself, P7, P9, and P11. He needs 2 more votes. By offering 1 gp each to 2 of P2, P4, P5, and P6 (each of whom otherwise has only a 50% chance of receiving a gp), he can secure them.

With 2 gp, this odd-even pattern will repeat indefinitely. I’m not sure of the general rule, but things get tricky when there are a lot more pirates than gold pieces.

Crunching the House Delegation Numbers

I thought it would be fun to talk about how I calculated the numbers in this post. If you are a developer, this will probably be pretty routine (although maybe you can tell me how to fix the problem with non-ASCII characters), but others may find it interesting.

The first thing I needed was a list of House members by state and party, preferably in a machine-readable format. The best source I found for this was https://www.govtrack.us/developers/api. I’m not going to attempt a general definition of API (Application Programming Interface), but this is an example of a public web-based API. It works pretty similarly to a web page: you can access it in a browser, and it basically treats the URL in a manner similar to a command line. Instead of of the server returning a web page–an HTML document designed to be readable by human beings–it returns a JSON document–a structured text document designed to be readable by computers. The output is still returned in the browser, from which you can copy and paste, save as a file, or interact with it in some other way.

The documentation for the API is a bit tricky to make sense of if you aren’t accustomed to API programming, but in fact it’s not all that complicated. Let’s examine the URL that I used:

https://www.govtrack.us/api/v2/role?current=true&role_type=representative&fields=person__firstname,person__lastname,state,party&limit=500

The first part–https://www.govtrack.us–is the same as any other URL. (The “https” part is called the protocol, and the “www.govtrack.us” part is called the hostname). In a simple static web page, the stuff after the hostname would be identifying a particular file on that web server. Here, the “/api/v2/role” part is telling the web server that you want to access the API. Everything after the question mark is the parameters you are sending to the API. Basically, the web server takes those parameters, converts them into a database query, pulls the information you have asked for out of the database, and then structures it for browser-based delivery.

The “current=true” part tells the web server that I’m only interested in currently serving officials.
The “role_type=representative” part says that I’m only interested in members of the House.
The part after “fields=” indicates the particular values I would like to get back. The database has a ton of fields, most of which weren’t relevant for this purpose. (To see them, try removing from the URL everything from “fields” to the next “&”, and then reloading the page.)
The data has a particular structure, and the part after “fields=” reflects that. For instance, data like name and gender live under “person”, and the double-underline is how you say “From within ‘person’, I want the ‘firstname’ field”.
The part after “limit=” tells the server how many records I want returned at once. The current membership of Congress is a small number for a computer, so it is no problem to ask for all of them at once, especially when I just want a handful of fields. For larger data sets, you might have to break up your requests.

The data comes back in a format called JSON, for JavaScript Object Notation. Although JSON syntax is based on JavaScript, many languages support the import of JSON data. The first part of the return file consists of metadata, i.e., data about the data. The “limit” of 500 corresponds to the limit specified in the URL. Had the number of matching records exceeded the limit, the “offset” would have been needed to track which “page” of results this was. The “total_count” is the total number of matching records; it is 440, rather than 435, because the database includes the non-voting delegates from DC, Guam, etc.

To import the data into Python, I first copy-pasted the text into a text editor, and saved it as a file with the name “house.json”, in my Python working directory. Then I made a few tweaks. First, I deleted the whole metadata section, which I didn’t need. Then, for clarity, I changed the generic label “objects” to the more specific label “representatives”.

The code to import the file is short:

import json
with open('house.json') as data_file:
    house_data = json.load(data_file)

Unfortunately, this generated the following error:

Traceback (most recent call last):
  File "/Users/stein/Documents/Python/House.py", line 12, in 
    house_data = json.load(data_file)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/json/__init__.py", line 265, in load
    return loads(fp.read(),
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 6456: ordinal not in range(128)

This is the sort of thing that can make programming very frustrating. The source of the problem turns out to be that the data includes a number of names with accented characters, such as “Gutiérrez”. The accented é is a non-ASCII character, and non-ASCII characters can be quite difficult for data imports to handle. I’m sure there are ways to get it to work, but I tried a few different things, and none of them did. For this purpose, I don’t need representatives’ names to be rendered perfectly. Strictly speaking, I’m not really using them at all, although without them it would be hard to verify that my code was correct. So I resorted to an unusual-sounding command offered by my text editor (TextWrangler): “Zap Gremlins”. This provides the ability to replace all the non-ASCII characters with their closest ASCII equivalents: good enough for this purpose, and it did the trick.

The original data in the file is structured with a “wrapper” around it. This is a flexible mechanism, but if I left it that way, it would make for clunky references. For example, to reference the nth representative in the data set, I would need to refer to:
house_data[‘representatives’][n]

To avoid that, I created a new variable that is easier to reference:

representatives = house_data['representatives']

Now the nth representative is just:
representatives[n]

Next, I needed a JSON list of all the state abbreviations. It wouldn’t be all that much work to create one from scratch, but of course someone else has already done that. I borrowed one from here: https://gist.github.com/mshafrir/2646763, and imported it using the same technique as discussed for the representatives.

The next step is to create variables to store how many representatives from each state are Republicans, Democrats, or other. To do that, I used a quite cool data structure known in Python as a dictionary; it gets declared like this:

rCount = {}
dCount = {}
oCount = {}

The neat thing about a dictionary is that it permits reference by arbitrary values. If you were going to do this with only the very simplest variables, you might create variables like AL_rCount, GA_rCount, SC_rCount, etc. Iterating through them would be a nightmare! Moving up to a more sophisticated variable type, you could use an array. Then you would have rCount[0], rCount[1], rCount[2], etc. That works a lot better, but you still have to keep track of which state corresponds to each index in the array (e.g., AL, is 1, GA is 2, etc.) With a dictionary, you can instead reference the states directly: rCount[‘AL’], rCount[‘GA’], etc. Brilliant!

The first step is to iterate through the states, initializing all the values to 0:

for x in range (0,len(states)):
    rCount[states[x]['abbreviation']] = 0
    dCount[states[x]['abbreviation']] = 0                              
    oCount[states[x]['abbreviation']] = 0

Next, we iterate through all the representatives, tallying the counts by party:

for x in range (0,len(representatives)):
    if representatives[x]['party'] == 'Republican':
        rCount[representatives[x]['state']] += 1;
    elif representatives[x]['party'] == 'Democrat':
        dCount[representatives[x]['state']] += 1;
    else:
        oCount[representatives[x]['state']] += 1;

With these counts finished, we can iterate through the states a second time, this time identifying which party a majority of the House delegation belongs to:

majorityParty = {}
for x in range (0,len(states)):
    if rCount[states[x]['abbreviation']]>dCount[states[x]['abbreviation']]:
        majorityParty[states[x]['abbreviation']]='Republican'
    elif rCount[states[x]['abbreviation']]<dCount[states[x]['abbreviation']]:
        majorityParty[states[x]['abbreviation']]='Democrat'
    elif rCount[states[x]['abbreviation']]==dCount[states[x]['abbreviation']]:        
        majorityParty[states[x]['abbreviation']]='Tie'
    else:
        print("ERROR")

Finally, we need variables to count the number of states for which each party has a majority, so we can iterate through the states to do the count:

rMajorityCount = 0
dMajorityCount = 0
noMajorityCount = 0

for x in range (0, len(states)):
    if majorityParty[states[x]['abbreviation']]=='Republican':
        rMajorityCount += 1
    elif majorityParty[states[x]['abbreviation']]=='Democrat':
        dMajorityCount += 1
    elif majorityParty[states[x]['abbreviation']]=='Tie':
        noMajorityCount += 1
    else:
        print("ERROR")

The resulting numbers aren’t quite right, because the initial Representatives data set includes several non-voting delegates, while the States data set includes both the districts and territories that get delegates (e.g., the District of Columbia and Puerto Rico), and those that do not (e.g., the Northern Mariana Islands). For this purpose, though, it’s really not worth the effort it would take to fix this programatically. It’s easier to just count them, and subtract them from the computed numbers:

The final results are as follows:
Republican majority House delegations:
AL : 6 vs. 1
AK : 1 vs. 0
AZ : 5 vs. 4
AR : 4 vs. 0
CO : 4 vs. 3
FL : 17 vs. 10
GA : 10 vs. 4
ID : 2 vs. 0
IN : 7 vs. 2
IA : 3 vs. 1
KS : 4 vs. 0
KY : 5 vs. 1
LA : 5 vs. 1
MI : 9 vs. 5
MS : 3 vs. 1
MO : 6 vs. 2
MT : 1 vs. 0
NE : 2 vs. 1
NV : 3 vs. 1
NC : 10 vs. 3
ND : 1 vs. 0
OH : 11 vs. 4
OK : 5 vs. 0
PA : 13 vs. 5
SC : 6 vs. 1
SD : 1 vs. 0
TN : 7 vs. 2
TX : 25 vs. 11
UT : 4 vs. 0
VA : 8 vs. 3
WV : 3 vs. 0
WI : 5 vs. 3
WY : 1 vs. 0

Democratic majority House delegations:
CA : 39 vs. 14
CT : 5 vs. 0
DE : 1 vs. 0
HI : 2 vs. 0
IL : 10 vs. 8
MD : 7 vs. 1
MA : 9 vs. 0
MN : 5 vs. 3
NM : 2 vs. 1
NY : 18 vs. 9
OR : 4 vs. 1
RI : 2 vs. 0
VT : 1 vs. 0
WA : 6 vs. 4

Tied House delegations:
ME : 1 vs. 1
NH : 1 vs. 1
NJ : 6 vs. 6

Net figures:
R: 33 states
D: 14 states
Tied: 3 states

The complete code is here:

import json
with open('house.json') as data_file:
    house_data = json.load(data_file)
with open('states.json') as data_file:
    state_data = json.load(data_file, encoding='utf-8')
representatives = house_data['representatives']
states = state_data['states']
#print (states[0]['name'])
#print (len(states))

rCount = {}
dCount = {}
oCount = {}

for x in range (0,len(states)):
    rCount[states[x]['abbreviation']] = 0
    dCount[states[x]['abbreviation']] = 0                              
    oCount[states[x]['abbreviation']] = 0
        
for x in range (0,len(representatives)):
    if representatives[x]['party'] == 'Republican':
        rCount[representatives[x]['state']] += 1;
    elif representatives[x]['party'] == 'Democrat':
        dCount[representatives[x]['state']] += 1;
    else:
        oCount[representatives[x]['state']] += 1;

majorityParty = {}
for x in range (0,len(states)):
    if rCount[states[x]['abbreviation']]>dCount[states[x]['abbreviation']]:
        majorityParty[states[x]['abbreviation']]='Republican'
    elif rCount[states[x]['abbreviation']]<dCount[states[x]['abbreviation']]:
        majorityParty[states[x]['abbreviation']]='Democrat'
    elif rCount[states[x]['abbreviation']]==dCount[states[x]['abbreviation']]:        
        majorityParty[states[x]['abbreviation']]='Tie'
    else:
        print("ERROR")
        
rMajorityCount = 0
dMajorityCount = 0
noMajorityCount = 0

for x in range (0, len(states)):
    if majorityParty[states[x]['abbreviation']]=='Republican':
        rMajorityCount += 1
    elif majorityParty[states[x]['abbreviation']]=='Democrat':
        dMajorityCount += 1
    elif majorityParty[states[x]['abbreviation']]=='Tie':
        noMajorityCount += 1
    else:
        print("ERROR")

print ("Republican majority House delegations:", rMajorityCount)
print ("Democratic majority House delegations:", dMajorityCount)
print ("House delegations with no Majority Party:", noMajorityCount)

print ("Republican majority House delegations:")
for x in range (0,len(states)):
    if majorityParty[states[x]['abbreviation']]=='Republican':
        print(states[x]['abbreviation'],":",rCount[states[x]['abbreviation']], "vs. ",dCount[states[x]['abbreviation']])
    
print ("Democratic majority House delegations:")
for x in range (0,len(states)):
    if majorityParty[states[x]['abbreviation']]=='Democrat':
        print(states[x]['abbreviation'],":",dCount[states[x]['abbreviation']], "vs. ",rCount[states[x]['abbreviation']])

print ("Tied House delegations:")
for x in range (0,len(states)):
    if majorityParty[states[x]['abbreviation']]=='Tie':
        print(states[x]['abbreviation'],":",dCount[states[x]['abbreviation']], "vs. ",rCount[states[x]['abbreviation']])

An Electoral College Plurality

The electoral college system by which we in the U.S. elect our president is famously convoluted, but one aspect is largely unknown even to those who otherwise understand it: what happens if there is no electoral college majority? This has happened twice: in 1800 there was a dead tie between Aaron Burr and Thomas Jefferson, and in 1824 there was a 4-way race in which the top candidate received only 40% of the electoral vote.

So what happens then? Under the 12th Amendment, the House of Representatives picks the new president, choosing from the top three candidates (based on electoral votes). But wait! It gets better! The House votes not as individual representatives–the way they vote on every other matter–but rather as state delegations. That’s right: the 53 representatives from California would collectively cast a single vote, as would the lone representative from Wyoming. Although residents of Washington DC get to vote in presidential elections, their solitary non-voting delegate to the House would not receive a vote here.

While the electors meet and vote (in their respective states) on the first Monday after the second Wednesday in December (I swear, I’m not making this up), the operative date for this purpose is January 6th, when the House and Senate meet in joint session to conduct the official tally of electoral votes. It is at this point that, were there no majority, the House would select the next president. The precise timing is critically important, since under the 20th Amendment, old Congressional terms end and new ones begin on January 3rd, at noon. Thus it is the newly elected House that gets to select the President.

What would it take for this to happen? An electoral college tie is theoretically possible in a close two-way race, but extremely unlikely. By contrast, any remotely viable third-party candidacy could easily send an election to the House. Assuming a close race between the Democrat and Republican, a third party challenger who was able to win even just a few states could make it quite difficult for anyone to achieve an electoral majority.

What does the House look like from this perspective? In a few quick searches, I couldn’t find any tallies of House membership broken out in the way that would matter if the House were to pick the President. This is not surprising, since the House doesn’t function this way for any other purpose that I am aware of. Plus, if this were to happen, the relevant House wouldn’t even be the current House. Nonetheless, I thought it would be an interesting exercise to see what the current House looks like through this lens (I’ve thrown in the Senate for the sake of comparison):

House, by state delegation majority House, by representative Senate, by Senator
Republican 33 (66%) 246 (57%) 54 (54%)
Democratic 14 (28%) 188 (43%) 44 (44%)
Tied / Vacant / Ind 3 (6%) 1 (0%) 2 (2%)

To put it another way, under regular voting rules, the current ratio of Republican votes to Democratic votes in the House is about 1.3 to 1. Under the rules by which the House would select a president, and assuming party-line voting (with evenly-divided delegations deadlocked and unable to vote), this changes to an astounding 2.4 to 1!

In a later post, I may go into the underlying data and discuss how I calculated these numbers using an API from www.govtrack.us and a bit of Python. [Update: That post is now up.]

[Update 2: To think about the implications of the House voting in this manner today versus in 1788 or 1804, it is helpful to look at how the relative sizes of state delegations have changed over time. I created a data visualization that shows this.]

A Histogram of the Monster Data

And here is a histogram representing the distribution of the number of monsters that had to be killed in order to acquire all three types of gems, produced with the following code:

import matplotlib.pyplot as plt
%matplotlib inline
plt.hist(monsterCount,bins=maxMonsterCount-1)

Monster-histogram

Note that this is for a different run of the code than the sample output I showed in the previous post. In this particular 100,000 trial run, there was one trial where it was necessary to kill 67 monsters in order to acquire all three gem types!

Also, copying the image produced a .png file with partial transparency, which has resulted in black-on-gray numbers here that aren’t very legible. Sorry about that. I think there are mechanisms for displaying Python output graphs in a browser window without just copying and pasting images, but I haven’t figured them out yet.

538’s Monsters and Gems Puzzle

538 has a weekly riddler post, and I thought I would try this week’s riddle:

A video game requires you to slay monsters to collect gems. Every time you slay a monster, it drops one of three types of gems: a common gem, an uncommon gem or a rare gem. The probabilities of these gems being dropped are in the ratio of 3:2:1 — three common gems for every two uncommon gems for every one rare gem, on average. If you slay monsters until you have at least one of each of the three types of gems, how many of the most common gems will you end up with, on average?

Here is the answer I submitted:

I couldn’t figure out how to reason about this with the probability theory that I know, so I thought I would try to simulate it. I’ve been playing around with Python recently, so I wrote a program that simulates an arbitrary number of trials. Running this repeatedly with 100,000 trials, rounded averages were mostly consistent to 2 decimal places, although sometimes the last decimal place varied by one: 3.65 common gems, 2.43 uncommon gems, 1.22 rare gems, 7.30 monsters killed.

On average it should take 6 dead monsters to get a rare gem. By then, most of the time we would expect that both common and uncommon gems would have appeared. So it’s surprising to me that the average is as much higher than 6 as it is. Intuitively, I think part of what is going on is that because it always takes at least 3 dead monsters to get all 3 gem types, the cases where a rare gem is drawn early on aren’t able to “balance out” the cases where it takes a really long time, as would be the case in a simpler probability distribution. About 15% of the time, it takes 12 or more kills to get all three gem types. I’m no closer, though, to a mathematical derivation of the exact answer.

I have pasted below a sample output from the program, and below that the program itself.

number of trials: 100000
Average number of common gems: 3.64636
Average number of uncommon gems: 2.43568
Average number of rare gems: 1.21557
Average number of monsters killed: 7.29761
max monster count 56
1 dead monsters: 0 (0.000% | 0.000%)
2 dead monsters: 0 (0.000% | 0.000%)
3 dead monsters: 16744 (16.744% | 16.744%)
4 dead monsters: 16700 (16.700% | 33.444%)
5 dead monsters: 13746 (13.746% | 47.190%)
6 dead monsters: 10839 (10.839% | 58.029%)
7 dead monsters: 8446 (8.446% | 66.475%)
8 dead monsters: 6552 (6.552% | 73.027%)
9 dead monsters: 5176 (5.176% | 78.203%)
10 dead monsters: 3937 (3.937% | 82.140%)
11 dead monsters: 3241 (3.241% | 85.381%)
12 dead monsters: 2689 (2.689% | 88.070%)
13 dead monsters: 2104 (2.104% | 90.174%)
14 dead monsters: 1653 (1.653% | 91.827%)
15 dead monsters: 1376 (1.376% | 93.203%)
16 dead monsters: 1200 (1.200% | 94.403%)
17 dead monsters: 907 (0.907% | 95.310%)
18 dead monsters: 807 (0.807% | 96.117%)
19 dead monsters: 620 (0.620% | 96.737%)
20 dead monsters: 529 (0.529% | 97.266%)
21 dead monsters: 443 (0.443% | 97.709%)
22 dead monsters: 382 (0.382% | 98.091%)
23 dead monsters: 312 (0.312% | 98.403%)
24 dead monsters: 305 (0.305% | 98.708%)
25 dead monsters: 221 (0.221% | 98.929%)
26 dead monsters: 178 (0.178% | 99.107%)
27 dead monsters: 139 (0.139% | 99.246%)
28 dead monsters: 129 (0.129% | 99.375%)
29 dead monsters: 95 (0.095% | 99.470%)
30 dead monsters: 99 (0.099% | 99.569%)
31 dead monsters: 60 (0.060% | 99.629%)
32 dead monsters: 58 (0.058% | 99.687%)
33 dead monsters: 46 (0.046% | 99.733%)
34 dead monsters: 50 (0.050% | 99.783%)
35 dead monsters: 33 (0.033% | 99.816%)
36 dead monsters: 31 (0.031% | 99.847%)
37 dead monsters: 25 (0.025% | 99.872%)
38 dead monsters: 25 (0.025% | 99.897%)
39 dead monsters: 19 (0.019% | 99.916%)
40 dead monsters: 19 (0.019% | 99.935%)
41 dead monsters: 14 (0.014% | 99.949%)
42 dead monsters: 15 (0.015% | 99.964%)
43 dead monsters: 6 (0.006% | 99.970%)
44 dead monsters: 5 (0.005% | 99.975%)
45 dead monsters: 6 (0.006% | 99.981%)
46 dead monsters: 4 (0.004% | 99.985%)
47 dead monsters: 4 (0.004% | 99.989%)
48 dead monsters: 1 (0.001% | 99.990%)
49 dead monsters: 2 (0.002% | 99.992%)
50 dead monsters: 0 (0.000% | 99.992%)
51 dead monsters: 3 (0.003% | 99.995%)
52 dead monsters: 3 (0.003% | 99.998%)
53 dead monsters: 1 (0.001% | 99.999%)
54 dead monsters: 0 (0.000% | 99.999%)
55 dead monsters: 0 (0.000% | 99.999%)
56 dead monsters: 1 (0.001% | 100.000%)

Here is my python 3 code:

import random
commonCount=[]
uncommonCount=[]
rareCount=[]
monsterCount=[]
numTrials = 100000

maxMonsterCount = 0;

for i in range (0,numTrials):
    commonCount.append(0)
    uncommonCount.append(0)
    rareCount.append(0)
    monsterCount.append(0)

for i in range (0,numTrials):
    while (commonCount[i]==0 or uncommonCount[i]==0 or rareCount[i]==0):
        monsterCount[i] +=1
        if (monsterCount[i] > maxMonsterCount):
            maxMonsterCount = monsterCount[i]
        monster = random.randrange(1,7)
 #randrange syntax includes the first parameter, but excludes the second       
        if (monster==1 or monster==2 or monster==3):
            commonCount[i] += 1
        elif (monster==4 or monster==5):
            uncommonCount[i] += 1
        elif (monster==6):
            rareCount[i] += 1
        else:
            print("ERROR!")            
# Don't run the following lines unless numTrials is small ...
#    print ("i =",i)
#    print ("common:", commonCount[i])
#    print ("uncommon:", uncommonCount[i])
#    print ("rare:", rareCount[i])
#    print ("total:", monsterCount[i])

commonTotal = 0;
uncommonTotal = 0;
rareTotal = 0;
overallTotal = 0;

for i in range (0, numTrials):
    commonTotal += commonCount[i]
    uncommonTotal += uncommonCount[i]
    rareTotal += rareCount[i]
    overallTotal += monsterCount[i]

commonAverage = commonTotal/numTrials
uncommonAverage = uncommonTotal/numTrials
rareAverage = rareTotal/numTrials
totalAverage = overallTotal/numTrials
print ("number of trials:",numTrials)
print ("Average number of common gems:",commonAverage)
print ("Average number of uncommon gems:",uncommonAverage)
print ("Average number of rare gems:",rareAverage)
print ("Average number of monsters killed:", totalAverage)

print("max monster count", maxMonsterCount);

killDistribution=[]
for a in range (0,maxMonsterCount+2):
    killDistribution.append(0)
for i in range (0,numTrials):
    killDistribution[monsterCount[i]] += 1

cumulativePercentage = 0;
for a in range (1,maxMonsterCount+1):
#1 and 2 should always be zero, but I like to leave them in anyway.
    cumulativePercentage += killDistribution[a]/numTrials*100
    print ('{: d} dead monsters: {: d} ({:04.3f}% | {:04.3f}%)'.format(a,killDistribution[a],killDistribution[a]/numTrials*100,cumulativePercentage))