Scoring
The scoring module of gerrytools is a collection of functions that can be
used with GerryChain’s Partition class to build updaters more complex than
those the gerrychain library provides natively. The module also provides
several methods for analyzing the election results of an ensemble generated
by a ReCom chain.
For this tutorial, we will be working with a shapefile of the state of Maryland.
Scoring Districting Plans
Let’s start with the imports that we will need for this section:
from gerrychain import Graph, Partition, Election
from gerrytools.scoring import *
import pandas as pd
import geopandas as gpd
All of our scores are functions that take a GerryChain Partition and produce
either a numerical (plan-wide) score or a mapping from district or election IDs to
numeric scores. For our examples, we will use a 2020 Maryland VTD shapefile to build
our underlying dual graph, since the shapefile has demographic and electoral
information that our scores will rely on.
graph = Graph.from_file("MD_vtd20/")
elections = ["PRES12", "SEN12", "GOV14", "AG14", "COMP14",
             "PRES16", "SEN16", "GOV18", "SEN18", "AG18", "COMP18"]
# use our list of elections above to create `Election` updaters for each contest
# Ex: in our shapefile, the column `PRES12R` refers to the votes Mitt
# Romney (R) received in the 2012 Presidential general election
updaters = {}
for e in elections:
    updaters[e] = Election(e, {"Dem": e + "D", "Rep": e + "R"})
The demographic_updaters() function returns a dictionary of Tally updaters,
each of which tracks the number of people in a given demographic group. You
can pass a list with as many demographic groups as you wish (example below):
demographic_updaters(["TOTPOP20", "VAP20"])
Which should return something like:
{'TOTPOP20': <gerrychain.updaters.tally.Tally at 0x7f261af03a00>,
'VAP20': <gerrychain.updaters.tally.Tally at 0x7f261af03dc0>}
We can then add these to the updaters for our partition and continue as normal:
# add updaters that track total population, total voting age population,
# and Black, Hispanic, and White voting age population
updaters.update(demographic_updaters(["TOTPOP20", "VAP20", "BVAP20", "HVAP20", "WVAP20"]))
# create the partition on which we'll generate scores
# since `MD_CD_example.csv` is a CSV with `GEOID20` -> district assignment,
# we need to replace the `GEOID20`s with integer node labels to match the graph's nodes.
geoid_to_assignment = pd.read_csv("data/MD_CD_example.csv", header=None).set_index(0).to_dict()[1]
assignment = {n: geoid_to_assignment[graph.nodes[n]["GEOID20"]] for n in graph.nodes}
partition = Partition(graph, assignment, updaters)
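The relabeling step above can be sketched on toy data. The GEOIDs, node labels, and district numbers below are made up for illustration; only the dictionary-comprehension pattern mirrors the tutorial code. Checking for missing GEOIDs first is an optional addition that gives a clearer error than a bare KeyError raised inside the comprehension.

```python
# Toy stand-in for `graph.nodes`: integer node label -> node attributes.
# These GEOIDs are hypothetical, not taken from the Maryland shapefile.
nodes = {
    0: {"GEOID20": "24001000001"},
    1: {"GEOID20": "24001000002"},
    2: {"GEOID20": "24003000001"},
}

# Toy stand-in for the CSV contents: GEOID20 -> district assignment.
geoid_to_assignment = {
    "24001000001": 1,
    "24001000002": 1,
    "24003000001": 2,
}

# Fail early if any node's GEOID is missing from the assignment mapping.
missing = [n for n, attrs in nodes.items()
           if attrs["GEOID20"] not in geoid_to_assignment]
if missing:
    raise ValueError(f"No district assignment for nodes: {missing}")

# Re-key the assignment by node label, as the tutorial code does.
assignment = {n: geoid_to_assignment[attrs["GEOID20"]]
              for n, attrs in nodes.items()}
print(assignment)
```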
Partisan scores
All our partisan scores require at least a list of elections (we’ll use our
elections list defined above). Some of them additionally require the user to
specify a POV party (in our case, either Dem or Rep). All of these partisan
scores return a dictionary that maps election names to the score for that election; it
is up to the user to aggregate (often by summing or averaging) the scores across every
election. For a simple example, let’s use the score function that returns the number
of Democratic seats won in each election.
seats(elections, "Dem")
This will return:
Score(name='Dem_seats', apply=functools.partial(<function _seats at 0x7f2625fe0720>, election_cols=['PRES12', 'SEN12', 'GOV14', 'AG14', 'COMP14', 'PRES16', 'SEN16', 'GOV18', 'SEN18', 'AG18', 'COMP18'], party='Dem', mean=False), dissolved=False)
Note that the output of seats(elections, "Dem") is of type Score, which
functions like a Python namedtuple: for any object x of type Score,
x.name returns the name of the score, and x.apply returns a function that
takes a Partition as input and returns the score. See below:
seats(elections, "Dem").name
returns
'Dem_seats'
and
seats(elections, "Dem").apply(partition)
returns
{'PRES12': 6,
'SEN12': 6,
'GOV14': 4,
'AG14': 6,
'COMP14': 6,
'PRES16': 6,
'SEN16': 6,
'GOV18': 4,
'SEN18': 6,
'AG18': 6,
'COMP18': 8}
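To make the name/apply pattern concrete, here is a minimal sketch of a namedtuple that behaves the way the Score object is described above. The field names follow the printed representation, but the definition, the helper functions, and the toy partition (a plain dictionary) are all hypothetical stand-ins, not the library’s own code.

```python
from collections import namedtuple
from functools import partial

# Hypothetical stand-in for the Score type; not the library's definition.
Score = namedtuple("Score", ["name", "apply", "dissolved"])

def _count_wins(partition, party):
    # Toy scoring function: `partition` here is just a dict mapping
    # district -> winning party, not a GerryChain Partition.
    return sum(1 for winner in partition.values() if winner == party)

def wins(party):
    # Bind the party now; the partition is supplied later through `.apply`.
    return Score(name=f"{party}_wins",
                 apply=partial(_count_wins, party=party),
                 dissolved=False)

toy_partition = {1: "Dem", 2: "Rep", 3: "Dem"}
score = wins("Dem")
print(score.name)                  # Dem_wins
print(score.apply(toy_partition))  # 2
```

Separating construction from evaluation in this way lets the same score be reused on every plan in an ensemble.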
Note that we can easily find the number of Republican seats like so:
seats(elections, "Rep").apply(partition)
This gives us
{'PRES12': 2,
'SEN12': 2,
'GOV14': 4,
'AG14': 2,
'COMP14': 2,
'PRES16': 2,
'SEN16': 2,
'GOV18': 4,
'SEN18': 2,
'AG18': 2,
'COMP18': 0}
Moreover, we can pass mean=True to return the average of the score over all
elections, rather than a dictionary:
seats(elections, "Rep", mean=True).apply(partition)
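If you prefer to aggregate by hand, the same kind of average can be computed directly from the per-election dictionary. The sketch below uses the Republican seat counts shown above and assumes that the average is a plain arithmetic mean over elections.

```python
# Republican seat counts copied from the per-election output above.
rep_seats = {
    "PRES12": 2, "SEN12": 2, "GOV14": 4, "AG14": 2, "COMP14": 2,
    "PRES16": 2, "SEN16": 2, "GOV18": 4, "SEN18": 2, "AG18": 2,
    "COMP18": 0,
}

# Plain arithmetic mean across the 11 elections.
mean_rep_seats = sum(rep_seats.values()) / len(rep_seats)
print(round(mean_rep_seats, 3))  # 2.182
```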
Some partisan scores (mean_median, efficiency_gap, partisan_bias,
partisan_gini) do not require the user to specify the POV party in the call. This
is not because there isn’t a POV party, but because these functions call GerryChain
functions that automatically set the POV party to be the first party listed in the
updater for that election. Since we always list Dem first in this tutorial, Dem
will be the POV party for these scores, but you should keep this in mind when
setting up your updaters and your partition.
# Positive values denote an advantage for the POV party
efficiency_gap(elections).apply(partition)
which will give us
{'PRES12': -0.027366954931038075,
'SEN12': -0.1112428189930485,
'GOV14': -0.016952521996415275,
'AG14': 0.0664089504401374,
'COMP14': -0.03643474212627552,
'PRES16': -0.04564932242915228,
'SEN16': -0.02799189191120642,
'GOV18': 0.09144998629410322,
'SEN18': -0.12475998763996132,
'AG18': -0.06082242557828398,
'COMP18': 0.05664447794898745}
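For intuition about the sign convention, here is a small worked sketch of the standard wasted-votes definition of the efficiency gap, with the sign chosen so that positive values favor the POV party. The vote counts are hypothetical, and the function names are illustrative; this is not necessarily line-for-line what the library computes.

```python
def wasted_votes(pov_votes, other_votes):
    """Return (pov_wasted, other_wasted) for a single district."""
    total = pov_votes + other_votes
    if pov_votes > other_votes:
        # The winner wastes votes beyond the 50% needed; the loser wastes all.
        return pov_votes - total / 2, other_votes
    # Otherwise the other party is treated as the winner (ties included).
    return pov_votes, other_votes - total / 2

def efficiency_gap_sketch(districts):
    """districts: list of (pov_votes, other_votes) pairs, one per district."""
    total_votes = sum(p + o for p, o in districts)
    net = 0.0
    for p, o in districts:
        pov_wasted, other_wasted = wasted_votes(p, o)
        # Positive when the other party wastes more votes than the POV party.
        net += other_wasted - pov_wasted
    return net / total_votes

# Hypothetical plan: the POV party wins two of three districts with
# exactly half of the statewide vote.
toy = [(60, 40), (60, 40), (30, 70)]
eg = efficiency_gap_sketch(toy)
print(round(eg, 4))  # 0.1667
```

Here the POV party wastes 50 votes and the other party wastes 100, so the gap is (100 - 50) / 300, a positive value indicating a POV-party advantage.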
If you know you want to use a lot of scores, it can be helpful to make a list of the scores of interest, like so:
partisan_scores = [
    seats(elections, "Dem"),
    seats(elections, "Rep"),
    # signed_proportionality(elections, "Dem", mean=True),
    # absolute_proportionality(elections, "Dem", mean=True),
    efficiency_gap(elections, mean=True),
    mean_median(elections),
    partisan_bias(elections),
    partisan_gini(elections),
    # Note that `eguia` takes several more arguments; see the documentation for more details
    eguia(elections, "Dem", graph, updaters, "COUNTYFP20", "TOTPOP20"),
]
Now, we can make use of the summarize() function to evaluate all the scores on
this partition:
partisan_dictionary = summarize(partition, partisan_scores)
partisan_dictionary["mean_median"]
This will return
{'PRES12': 0.02205704780736839,
'SEN12': 0.04184519796735442,
'GOV14': 0.0128224074264629,
'AG14': 0.03372274606966308,
'COMP14': 0.026622499095666607,
'PRES16': 0.03478025159124121,
'SEN16': 0.03829214902714728,
'GOV18': 0.0195942524690087,
'SEN18': 0.037782714199074086,
'AG18': 0.03906798945053658,
'COMP18': 0.036168324606223434}
and
partisan_dictionary["mean_efficiency_gap"]
gives us
-0.02151975008383212
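Judging from the keys used above ("mean_median", "mean_efficiency_gap"), summarize() appears to return a dictionary keyed by each score’s name. A minimal sketch of that pattern follows; the Score stand-in, the summarize_sketch helper, and the toy partition are all hypothetical and only illustrate the idea.

```python
from collections import namedtuple

# Hypothetical stand-in for the Score type; not the library's definition.
Score = namedtuple("Score", ["name", "apply"])

def summarize_sketch(partition, scores):
    # Evaluate every score on the partition and key the results by name.
    return {score.name: score.apply(partition) for score in scores}

# Toy "partition": district -> population (not a GerryChain Partition).
toy_partition = {1: 100, 2: 120, 3: 80}

scores = [
    Score("total_population", lambda p: sum(p.values())),
    Score("largest_district", lambda p: max(p, key=p.get)),
]

summary = summarize_sketch(toy_partition, scores)
print(summary)  # {'total_population': 300, 'largest_district': 2}
```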
Demographic Scores
Our demographic scores return a dictionary that maps districts to demographic information, either population counts or shares.
# `demographic_tallies()` takes a list of the demographics you'd like to tally
tally_scores = demographic_tallies(["TOTPOP20", "BVAP20", "HVAP20"])
tally_dictionary = summarize(partition, tally_scores)
tally_dictionary
This will return a dictionary that looks like this:
{'TOTPOP20': {1: 771992,
7: 772346,
8: 772421,
6: 771907,
3: 773001,
4: 772893,
5: 771418,
2: 771246},
'BVAP20': {1: 50513,
7: 186256,
8: 84454,
6: 285475,
3: 106681,
4: 258794,
5: 334253,
2: 82315},
'HVAP20': {1: 40466,
7: 36221,
8: 27363,
6: 44099,
3: 45359,
4: 144187,
5: 43594,
2: 110973}}
And
# `demographic_shares()` takes a dictionary where each key is a total demographic column
# that will be used as the denominator in the share (usually either `TOTPOP20` or `VAP20`)
# and each value is a list of demographics on which you'd like to compute shares
share_scores = demographic_shares({"VAP20": ["BVAP20", "HVAP20"]})
share_dictionary = summarize(partition, share_scores)
share_dictionary
returns
{'BVAP20_share': {1: 0.08427654278144459,
7: 0.3075109503392005,
8: 0.1389347687326854,
6: 0.463149987751003,
3: 0.18038569170027308,
4: 0.4331758821894971,
5: 0.5577436821598711,
2: 0.13770530746350554},
'HVAP20_share': {1: 0.06751399798455716,
7: 0.05980131717762746,
8: 0.045014707140366,
6: 0.07154549893977225,
3: 0.07669701811787184,
4: 0.2413438137099663,
5: 0.07274213867961521,
2: 0.1856474650446164}}
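The share computation itself is a per-district division of a subgroup tally by the chosen denominator column. The sketch below works through it on made-up tallies; the numbers and the shares_sketch helper are hypothetical, and only the "_share" naming follows the output above.

```python
# Hypothetical district-level tallies (not the Maryland figures).
tallies = {
    "VAP20": {1: 1000, 2: 800},
    "BVAP20": {1: 250, 2: 480},
    "HVAP20": {1: 100, 2: 40},
}

def shares_sketch(tallies, spec):
    # `spec` maps a denominator column to a list of numerator columns,
    # mirroring the dictionary passed to demographic_shares().
    result = {}
    for total_col, subgroup_cols in spec.items():
        for col in subgroup_cols:
            result[f"{col}_share"] = {
                district: tallies[col][district] / tallies[total_col][district]
                for district in tallies[total_col]
            }
    return result

shares = shares_sketch(tallies, {"VAP20": ["BVAP20", "HVAP20"]})
print(shares["BVAP20_share"])  # {1: 0.25, 2: 0.6}
```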
Two things to note:
Both demographic_tallies() and
demographic_shares() return lists of Scores
(one for each demographic of interest), so if we want to score just one demographic,
we have to index into the list in order to call .apply():
demographic_tallies(["BVAP20"])[0].apply(partition)
which returns
{1: 50513,
7: 186256,
8: 84454,
6: 285475,
3: 106681,
4: 258794,
5: 334253,
2: 82315}
Moreover, you can only use these scores on demographic columns that were already
tracked as Tally updaters when we instantiated our partition. A column that was
never tracked won’t work! WVAP20 works here only because we included it in our
updaters above:
demographic_tallies(["WVAP20"])[0].apply(partition)
gives us
{1: 457669,
7: 320218,
8: 458845,
6: 234283,
3: 348325,
4: 127814,
5: 178346,
2: 275860}
Our last demographic score is gingles_districts(), which
takes in a dictionary of the same type as demographic_shares() as well as a
threshold between 0 and 1. Just like the other two demographic scores, it returns a list
of Scores, but here each Score represents the number of districts where the
demographic group’s share is above the threshold. (When the threshold is 0.5, the
default, these districts are called Gingles districts.)
gingles_scores = gingles_districts({"VAP20": ["BVAP20", "HVAP20"]}, threshold=0.5)
gingles_dictionary = summarize(partition, gingles_scores)
gingles_dictionary
which returns
{'BVAP20_gingles_districts': 1, 'HVAP20_gingles_districts': 0}
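These counts can be checked by hand against the shares computed earlier. The sketch below copies the district-level shares from the demographic_shares() output and counts the districts strictly above the threshold; whether the library uses a strict or non-strict comparison is an assumption here, though it makes no difference for these values.

```python
# District-level shares copied from the demographic_shares() output above.
shares = {
    "BVAP20": {1: 0.08427654278144459, 7: 0.3075109503392005,
               8: 0.1389347687326854, 6: 0.463149987751003,
               3: 0.18038569170027308, 4: 0.4331758821894971,
               5: 0.5577436821598711, 2: 0.13770530746350554},
    "HVAP20": {1: 0.06751399798455716, 7: 0.05980131717762746,
               8: 0.045014707140366, 6: 0.07154549893977225,
               3: 0.07669701811787184, 4: 0.2413438137099663,
               5: 0.07274213867961521, 2: 0.1856474650446164},
}

threshold = 0.5

# Count, for each group, the districts whose share exceeds the threshold.
counts = {
    f"{group}_gingles_districts": sum(
        1 for share in by_district.values() if share > threshold
    )
    for group, by_district in shares.items()
}
print(counts)  # {'BVAP20_gingles_districts': 1, 'HVAP20_gingles_districts': 0}
```

Only district 5 has a Black voting age population share above 0.5, which matches the result above.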