Scoring

The scoring module of gerrytools is a collection of functions that can be used in conjunction with GerryChain's Partition class to create more complex updaters beyond what the gerrychain library provides natively. This module also provides a number of methods for analyzing the election results of an ensemble generated by a ReCom chain.

For this tutorial, we will be working with the following shapefile of the state of Maryland:

MD Shapefile

Scoring Districting Plans

Let’s start with the imports that we will need for this section:

from gerrychain import Graph, Partition, Election
from gerrytools.scoring import *
import pandas as pd
import geopandas as gpd

All of our scores are functions that take a GerryChain Partition and produce either a numerical (plan-wide) score or a mapping from district or election IDs to numeric scores. For our examples, we will use a 2020 Maryland VTD shapefile to build our underlying dual graph, since the shapefile has demographic and electoral information that our scores will rely on.

graph = Graph.from_file("MD_vtd20/")

elections = ["PRES12", "SEN12", "GOV14", "AG14", "COMP14",
            "PRES16", "SEN16", "GOV18", "SEN18", "AG18", "COMP18"]

# use our list of elections above to create `Election` updaters for each contest
# Ex: in our shapefile, the column `PRES12R` refers to the votes Mitt
# Romney (R) received in the 2012 Presidential general election
updaters = {}
for e in elections:
    updaters[e] = Election(e, {"Dem": e+"D", "Rep": e+"R"})
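Because each `Election` updater relies on the `<election>D` / `<election>R` column-naming convention, it can be worth checking that every expected column exists before building the updaters. The sketch below is plain Python and does not call gerrychain; `missing_election_columns` is a hypothetical helper, and the short column list is a stand-in for the shapefile's real columns.

```python
def missing_election_columns(elections, columns, suffixes=("D", "R")):
    """Hypothetical helper: list expected vote columns absent from `columns`."""
    available = set(columns)
    return [e + s for e in elections for s in suffixes if e + s not in available]

# Stand-in column list; in practice this would come from the shapefile.
columns = ["GEOID20", "PRES12D", "PRES12R", "SEN12D"]
print(missing_election_columns(["PRES12", "SEN12"], columns))  # ['SEN12R']
```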

The demographic_updaters() function returns a dictionary of Tally updaters that track the number of people in a given demographic group. You can pass a list with as many demographic groups as you wish (example below):

demographic_updaters(["TOTPOP20", "VAP20"])

Which should return something like:

{'TOTPOP20': <gerrychain.updaters.tally.Tally at 0x7f261af03a00>,
'VAP20': <gerrychain.updaters.tally.Tally at 0x7f261af03dc0>}
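Conceptually, a Tally sums a node attribute over the nodes assigned to each district. The sketch below reproduces that idea with plain dictionaries; it is not GerryChain's `Tally` implementation, and the node populations and assignment are made up for illustration.

```python
from collections import defaultdict

def tally(assignment, node_values):
    """Sum each node's value into the district it is assigned to."""
    totals = defaultdict(int)
    for node, district in assignment.items():
        totals[district] += node_values[node]
    return dict(totals)

# Toy data: four nodes split between two districts.
assignment = {0: 1, 1: 1, 2: 2, 3: 2}
totpop = {0: 120, 1: 80, 2: 150, 3: 50}
print(tally(assignment, totpop))  # {1: 200, 2: 200}
```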

We can then add these to our partition's updaters and continue as normal:

# add updaters that track total population, total voting age population,
# and Black, Hispanic, and White voting age population
updaters.update(demographic_updaters(["TOTPOP20", "VAP20", "BVAP20", "HVAP20", "WVAP20"]))

# create the partition on which we'll generate scores
# since `MD_CD_example.csv` is a CSV with `GEOID20` -> district assignment,
# we need to replace the `GEOID20`s with integer node labels to match the graph's nodes.
geoid_to_assignment = pd.read_csv("data/MD_CD_example.csv", header=None).set_index(0).to_dict()[1]
assignment = {n: geoid_to_assignment[graph.nodes[n]["GEOID20"]] for n in graph.nodes}
partition = Partition(graph, assignment, updaters)
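The GEOID-to-node relabeling step above can also be sketched without pandas. This version uses the standard-library `csv` module on an in-memory, headerless, two-column CSV shaped like `MD_CD_example.csv`; the GEOIDs and the `nodes` dictionary are hypothetical stand-ins for the file's contents and for `graph.nodes[n]["GEOID20"]`. In this sketch both sides keep GEOIDs as strings, so the lookup keys match.

```python
import csv
import io

# Hypothetical headerless CSV: GEOID20,district
csv_text = "24001000100,1\n24001000200,1\n24003000100,2\n"
geoid_to_assignment = {
    geoid: int(district) for geoid, district in csv.reader(io.StringIO(csv_text))
}

# Stand-in for graph node attributes: integer node label -> GEOID20 string.
nodes = {0: "24001000100", 1: "24001000200", 2: "24003000100"}
assignment = {n: geoid_to_assignment[geoid] for n, geoid in nodes.items()}
print(assignment)  # {0: 1, 1: 1, 2: 2}
```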

Partisan Scores

All our partisan scores require at least a list of elections (we’ll use our elections list defined above). Some of them additionally require the user to specify a POV party (in our case, either Dem or Rep). By default, these partisan scores return a dictionary that maps election names to the score for that election; it is up to the user to aggregate (often by summing or averaging) the scores across elections. For a simple example, let’s use the score function that returns the number of Democratic seats won in each election.

seats(elections, "Dem")

This will return:

Score(name='Dem_seats', apply=functools.partial(<function _seats at 0x7f2625fe0720>, election_cols=['PRES12', 'SEN12', 'GOV14', 'AG14', 'COMP14', 'PRES16', 'SEN16', 'GOV18', 'SEN18', 'AG18', 'COMP18'], party='Dem', mean=False), dissolved=False)

Note that the output of seats(elections, "Dem") is of type Score, which functions like a Python namedtuple: for any object x of type Score, x.name returns the name of the score, and x.apply returns a function that takes a Partition as input and returns the score. See below:

seats(elections, "Dem").name

returns

'Dem_seats'

and

seats(elections, "Dem").apply(partition)

returns

{'PRES12': 6,
'SEN12': 6,
'GOV14': 4,
'AG14': 6,
'COMP14': 6,
'PRES16': 6,
'SEN16': 6,
'GOV18': 4,
'SEN18': 6,
'AG18': 6,
'COMP18': 8}
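To make the `Score` behavior concrete, here is a minimal stand-in built with `collections.namedtuple` and `functools.partial`, mirroring the `name`, `apply`, and `dissolved` fields visible in the repr above. It is a hypothetical re-creation for illustration, not the gerrytools definition, and the "partition" here is just a dictionary of per-election seat counts.

```python
from collections import namedtuple
from functools import partial

# Hypothetical stand-in mirroring the fields shown in the repr of `seats(...)`.
Score = namedtuple("Score", ["name", "apply", "dissolved"], defaults=[False])

def _toy_seats(partition, election_cols, party):
    # A toy "partition": {election: {party: seats}}.
    return {e: partition[e][party] for e in election_cols}

def toy_seats(election_cols, party):
    """Bundle a name with a partially applied scoring function."""
    return Score(f"{party}_seats", partial(_toy_seats, election_cols=election_cols, party=party))

toy_partition = {"PRES12": {"Dem": 6, "Rep": 2}, "GOV14": {"Dem": 4, "Rep": 4}}
score = toy_seats(["PRES12", "GOV14"], "Dem")
print(score.name)                  # Dem_seats
print(score.apply(toy_partition))  # {'PRES12': 6, 'GOV14': 4}
```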

Note that we can easily find the number of Republican seats like so:

seats(elections, "Rep").apply(partition)

This gives us

{'PRES12': 2,
'SEN12': 2,
'GOV14': 4,
'AG14': 2,
'COMP14': 2,
'PRES16': 2,
'SEN16': 2,
'GOV18': 4,
'SEN18': 2,
'AG18': 2,
'COMP18': 0}
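Since every district is won by exactly one of the two parties (ignoring ties), the Democratic and Republican seat counts for each election should sum to the number of districts, as they do above (8 in every election). The sketch below counts seats from per-district two-party vote totals and checks that identity. The vote totals are invented, and treating an exact tie as a win for neither party is an assumption of the sketch, not necessarily how gerrytools handles ties.

```python
def count_seats(district_votes, party):
    """Count districts where `party` strictly out-polls the other party."""
    other = "Rep" if party == "Dem" else "Dem"
    return sum(1 for votes in district_votes if votes[party] > votes[other])

# Invented two-party totals for a three-district plan.
district_votes = [
    {"Dem": 600, "Rep": 400},
    {"Dem": 450, "Rep": 550},
    {"Dem": 700, "Rep": 300},
]
dem = count_seats(district_votes, "Dem")
rep = count_seats(district_votes, "Rep")
print(dem, rep)  # 2 1
assert dem + rep == len(district_votes)
```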

Moreover, we can pass mean=True to return the average of the score over all elections, rather than a dictionary:

seats(elections, "Rep", mean=True).apply(partition)
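With `mean=True`, the score collapses the per-election dictionary into a single number. Assuming this is a simple arithmetic mean over elections, you can reproduce it from the Republican seat counts shown above with the standard library:

```python
from statistics import mean

# Republican seat counts per election, copied from the output above.
rep_seats = {
    'PRES12': 2, 'SEN12': 2, 'GOV14': 4, 'AG14': 2, 'COMP14': 2, 'PRES16': 2,
    'SEN16': 2, 'GOV18': 4, 'SEN18': 2, 'AG18': 2, 'COMP18': 0,
}
print(mean(rep_seats.values()))  # 24 seats / 11 elections, roughly 2.18
```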

Some partisan scores (mean_median, efficiency_gap, partisan_bias, partisan_gini) do not require the user to specify the POV party in the call. This is not because there isn’t a POV party, but because these functions call GerryChain functions that automatically set the POV party to be the first party listed in the updater for that election. Since we always list Dem first in this tutorial, Dem will be the POV party for these scores, but keep this in mind when setting up your updaters and your partition.

# Positive values denote an advantage for the POV party
efficiency_gap(elections).apply(partition)

which will give us

{'PRES12': -0.027366954931038075,
'SEN12': -0.1112428189930485,
'GOV14': -0.016952521996415275,
'AG14': 0.0664089504401374,
'COMP14': -0.03643474212627552,
'PRES16': -0.04564932242915228,
'SEN16': -0.02799189191120642,
'GOV18': 0.09144998629410322,
'SEN18': -0.12475998763996132,
'AG18': -0.06082242557828398,
'COMP18': 0.05664447794898745}
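For intuition about the sign convention, the sketch below computes a two-party efficiency gap from per-district vote totals using the common wasted-votes definition: the winner wastes the votes beyond half the district total, and the loser wastes all of its votes. It is signed so that positive values favor the first party in each pair, matching the comment above. This is an illustrative re-implementation, not the code that gerrytools or GerryChain actually runs, and the vote totals are invented.

```python
def wasted_votes(votes1, votes2):
    """Wasted votes for each party in a two-party district."""
    half = (votes1 + votes2) / 2
    if votes1 > votes2:
        # Party 1 wins: it wastes its surplus, party 2 wastes everything.
        return votes1 - half, votes2
    # Party 2 wins (ties also land here in this sketch).
    return votes1, votes2 - half

def efficiency_gap(districts):
    """Positive values favor the first party in each (votes1, votes2) pair."""
    total = sum(v1 + v2 for v1, v2 in districts)
    gap = 0.0
    for v1, v2 in districts:
        w1, w2 = wasted_votes(v1, v2)
        gap += w2 - w1
    return gap / total

# Invented (first party, second party) vote totals for three districts.
districts = [(60, 40), (60, 40), (20, 80)]
print(efficiency_gap(districts))  # 70 / 300, roughly 0.233
```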

If you know you want to use a lot of scores, it can be helpful to make a list of the scores of interest, like so:

partisan_scores = [
    seats(elections, "Dem"),
    seats(elections, "Rep"),
    # signed_proportionality(elections, "Dem", mean=True),
    # absolute_proportionality(elections, "Dem", mean=True),
    efficiency_gap(elections, mean=True),
    mean_median(elections),
    partisan_bias(elections),
    partisan_gini(elections),
    # Note that `eguia` takes several more arguments; see the documentation for more details
    eguia(elections, "Dem", graph, updaters, "COUNTYFP20", "TOTPOP20"),
]

Now we can use the summarize() function to evaluate all of these scores on the partition at once. It returns a dictionary keyed by each score's name:

partisan_dictionary = summarize(partition, partisan_scores)
partisan_dictionary["mean_median"]

This will return

{'PRES12': 0.02205704780736839,
'SEN12': 0.04184519796735442,
'GOV14': 0.0128224074264629,
'AG14': 0.03372274606966308,
'COMP14': 0.026622499095666607,
'PRES16': 0.03478025159124121,
'SEN16': 0.03829214902714728,
'GOV18': 0.0195942524690087,
'SEN18': 0.037782714199074086,
'AG18': 0.03906798945053658,
'COMP18': 0.036168324606223434}

and

partisan_dictionary["mean_efficiency_gap"]

gives us

-0.02151975008383212
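Conceptually, summarize() evaluates each score on the partition and keys the results by score name, which is why we can index the result with names like "mean_median". Here is a minimal sketch of that pattern using a hypothetical namedtuple stand-in for Score. The `summarize` and `toy_mean_median` functions below are illustrative, not the gerrytools implementations; the toy mean-median is computed as the median minus the mean of the first party's district vote shares (that sign convention is an assumption of this sketch), and the vote shares are invented.

```python
from collections import namedtuple
from statistics import mean, median

# Hypothetical stand-in for the gerrytools Score type.
Score = namedtuple("Score", ["name", "apply", "dissolved"], defaults=[False])

def summarize(partition, scores):
    """Apply each score and collect the results by score name."""
    return {score.name: score.apply(partition) for score in scores}

def toy_mean_median(partition):
    # Toy "partition": {election: [first-party vote share in each district]}.
    return {e: median(shares) - mean(shares) for e, shares in partition.items()}

toy_partition = {"PRES12": [0.30, 0.55, 0.60, 0.65, 0.70]}
scores = [
    Score("mean_median", toy_mean_median),
    Score("num_districts", lambda p: len(next(iter(p.values())))),
]
result = summarize(toy_partition, scores)
print(result)  # mean_median for PRES12 is about 0.04; num_districts is 5
```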

Demographic Scores

Our demographic tally and share scores return dictionaries that map districts to demographic information, either population counts or shares.

# `demographic_tallies()` takes a list of the demographics you'd like to tally
tally_scores = demographic_tallies(["TOTPOP20", "BVAP20", "HVAP20"])
tally_dictionary = summarize(partition, tally_scores)
tally_dictionary

This will return a dictionary that looks like this:

{'TOTPOP20': {1: 771992,
2: 772346,
3: 772421,
4: 771907,
5: 773001,
6: 772893,
7: 771418,
8: 771246},
'BVAP20': {1: 50513,
2: 186256,
3: 84454,
4: 285475,
5: 106681,
6: 258794,
7: 334253,
8: 82315},
'HVAP20': {1: 40466,
2: 36221,
3: 27363,
4: 44099,
5: 45359,
6: 144187,
7: 43594,
8: 110973}}

And

# `demographic_shares()` takes a dictionary where each key is a total demographic column
# that will be used as the denominator in the share (usually either `TOTPOP20` or `VAP20`)
# and each value is a list of demographics on which you'd like to compute shares
share_scores = demographic_shares({"VAP20": ["BVAP20", "HVAP20"]})
share_dictionary = summarize(partition, share_scores)
share_dictionary

returns

{'BVAP20_share': {1: 0.08427654278144459,
2: 0.3075109503392005,
3: 0.1389347687326854,
4: 0.463149987751003,
5: 0.18038569170027308,
6: 0.4331758821894971,
7: 0.5577436821598711,
8: 0.13770530746350554},
'HVAP20_share': {1: 0.06751399798455716,
2: 0.05980131717762746,
3: 0.045014707140366,
4: 0.07154549893977225,
5: 0.07669701811787184,
6: 0.2413438137099663,
7: 0.07274213867961521,
8: 0.1856474650446164}}
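Each share is a district's group tally divided by the same district's tally for the denominator column. The sketch below computes shares from two tally dictionaries; `shares` is a hypothetical helper rather than part of gerrytools, and the numbers are invented for illustration.

```python
def shares(group_tally, total_tally):
    """Divide each district's group count by its total count."""
    return {d: group_tally[d] / total_tally[d] for d in group_tally}

# Invented voting-age population tallies for two districts.
vap = {1: 1000, 2: 800}
bvap = {1: 250, 2: 400}
print(shares(bvap, vap))  # {1: 0.25, 2: 0.5}
```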

Two things to note:

Both demographic_tallies() and demographic_shares() return lists of Scores (one for each demographic of interest), so to score just one demographic, we have to index into the list before calling .apply():

demographic_tallies(["BVAP20"])[0].apply(partition)

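which returns the same BVAP20 tally we saw above:

{1: 50513,
2: 186256,
3: 84454,
4: 285475,
5: 106681,
6: 258794,
7: 334253,
8: 82315}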

Moreover, you can only use these scores on demographic columns that were already tracked as Tally updaters when we instantiated our partition; a column that was never passed to demographic_updaters() won’t work. WVAP20, for example, works here only because we included it in our updaters above:

demographic_tallies(["WVAP20"])[0].apply(partition)

gives us

{1: 457669,
2: 320218,
3: 458845,
4: 234283,
5: 348325,
6: 127814,
7: 178346,
8: 275860}
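Because these scores look up columns that the partition must already track, a quick check against the updaters dictionary can catch an untracked column before you call apply(). The helper below is a hypothetical sketch that works on any mapping of updater names (such as the `updaters` dictionary built above); `UNTRACKED20` is a placeholder column name.

```python
def require_tracked(columns, updaters):
    """Raise a KeyError naming any column that has no updater."""
    missing = [c for c in columns if c not in updaters]
    if missing:
        raise KeyError(f"Not tracked as updaters: {missing}")

# Placeholder updaters mapping; only the keys matter for this check.
tracked = {"TOTPOP20": None, "VAP20": None, "BVAP20": None}
require_tracked(["BVAP20"], tracked)  # passes silently
try:
    require_tracked(["UNTRACKED20"], tracked)
except KeyError as err:
    print(err)
```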

Our last demographic score is gingles_districts(), which takes a dictionary of the same form as the one passed to demographic_shares(), along with a threshold between 0 and 1. Like the other two demographic scores, it returns a list of Scores, but here each Score counts the number of districts where the demographic group’s share is above the threshold. (When the threshold is 0.5, the default, these districts are called Gingles districts.)

gingles_scores = gingles_districts({"VAP20": ["BVAP20", "HVAP20"]}, threshold=0.5)
gingles_dictionary = summarize(partition, gingles_scores)
gingles_dictionary

which returns

{'BVAP20_gingles_districts': 1, 'HVAP20_gingles_districts': 0}
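As a sanity check on the definition, counting the districts whose share exceeds 0.5 in the share output shown earlier reproduces these numbers. The sketch uses plain Python and a strict "greater than" comparison, which is an assumption about how a share exactly at the threshold is treated.

```python
# District-level shares copied from the `demographic_shares` output above.
bvap_share = [0.08427654278144459, 0.3075109503392005, 0.1389347687326854,
              0.463149987751003, 0.18038569170027308, 0.4331758821894971,
              0.5577436821598711, 0.13770530746350554]
hvap_share = [0.06751399798455716, 0.05980131717762746, 0.045014707140366,
              0.07154549893977225, 0.07669701811787184, 0.2413438137099663,
              0.07274213867961521, 0.1856474650446164]

def count_above(shares, threshold=0.5):
    """Count districts whose share is strictly above the threshold."""
    return sum(1 for s in shares if s > threshold)

result = {
    "BVAP20_gingles_districts": count_above(bvap_share),
    "HVAP20_gingles_districts": count_above(hvap_share),
}
print(result)  # {'BVAP20_gingles_districts': 1, 'HVAP20_gingles_districts': 0}
```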