=======
Scoring
=======
The ``scoring`` module of ``gerrytools`` is a collection of functions that can be
used in conjunction with the GerryChain ``Partition`` class to create more complex
updaters beyond what is provided natively in the ``gerrychain`` library. This module
also provides a number of functions for analyzing the election results of an
ensemble generated by a ReCom chain.
For this tutorial, we will be working with a shapefile of the state of Maryland.
Scoring Districting Plans
-------------------------
Let's start with the imports that we will need for this section:
.. code-block:: python
from gerrychain import Graph, Partition, Election
from gerrytools.scoring import *
import pandas as pd
import geopandas as gpd
All of our scores are functions that take a GerryChain ``Partition`` and produce
either a numerical (plan-wide) score or a mapping from district or election IDs to
numeric scores. For our examples, we will use a 2020 Maryland VTD shapefile to build
our underlying dual graph, since the shapefile has demographic and electoral
information that our scores will rely on.
.. code-block:: python
graph = Graph.from_file("MD_vtd20/")
elections = ["PRES12", "SEN12", "GOV14", "AG14", "COMP14",
"PRES16", "SEN16", "GOV18", "SEN18", "AG18", "COMP18"]
# use our list of elections above to create `Election` updaters for each contest
# Ex: in our shapefile, the column `PRES12R` refers to the votes Mitt
# Romney (R) received in the 2012 Presidential general election
updaters = {}
for e in elections:
    updaters[e] = Election(e, {"Dem": e+"D", "Rep": e+"R"})
The :meth:`~gerrytools.scoring.demographic_updaters` function returns a dictionary of
``Tally`` updaters that track the number of people in each given demographic group. You
can pass a list with as many demographic groups as you wish (example below):
.. code-block:: python
demographic_updaters(["TOTPOP20", "VAP20"])
This should return something like:
.. code-block:: console
{'TOTPOP20': ,
'VAP20': }
We can then add these to the updaters for our partition and continue as normal:
.. code-block:: python
# add updaters that track total population, total voting age population,
# and Black, Hispanic, and White voting age population
updaters.update(demographic_updaters(["TOTPOP20", "VAP20", "BVAP20", "HVAP20", "WVAP20"]))
# create the partition on which we'll generate scores
# since `MD_CD_example.csv` is a CSV with `GEOID20` -> district assignment,
# we need to replace the `GEOID20`s with integer node labels to match the graph's nodes.
geoid_to_assignment = pd.read_csv("data/MD_CD_example.csv", header=None).set_index(0).to_dict()[1]
assignment = {n: geoid_to_assignment[graph.nodes[n]["GEOID20"]] for n in graph.nodes}
partition = Partition(graph, assignment, updaters)
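The relabeling step above can be sketched on toy data. In this minimal sketch a plain dictionary stands in for ``graph.nodes``, and the GEOIDs and district numbers are made up for illustration; only the dictionary comprehension mirrors the tutorial code.

```python
# Toy stand-in for `graph.nodes`: integer node label -> node attributes.
# The GEOID20 values and district numbers below are hypothetical.
nodes = {
    0: {"GEOID20": "24001000001"},
    1: {"GEOID20": "24001000002"},
    2: {"GEOID20": "24003000001"},
}

# Stand-in for the CSV contents: GEOID20 -> district assignment.
geoid_to_assignment = {
    "24001000001": 6,
    "24001000002": 6,
    "24003000001": 3,
}

# Re-key the assignment by node label so that it matches the graph's nodes.
assignment = {n: geoid_to_assignment[attrs["GEOID20"]] for n, attrs in nodes.items()}
print(assignment)
```

Note that the lookup raises a ``KeyError`` if the two sources store the identifier with different types (for example, strings in one and integers in the other), so it can be worth checking the types before building the assignment.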
Partisan Scores
---------------
All our partisan scores require at least a list of elections (we'll use our
``elections`` list defined above). Some of them additionally require the user to
specify a POV party (in our case, either ``Dem`` or ``Rep``). All of these partisan
scores return a dictionary that maps election names to the score for that election; it
is up to the user to aggregate (often by summing or averaging) the scores across every
election. For a simple example, let's use the score function that returns the number
of Democratic seats won in each election.
.. code-block:: python
seats(elections, "Dem")
This will return:
.. code-block:: console
Score(name='Dem_seats', apply=functools.partial(, election_cols=['PRES12', 'SEN12', 'GOV14', 'AG14', 'COMP14', 'PRES16', 'SEN16', 'GOV18', 'SEN18', 'AG18', 'COMP18'], party='Dem', mean=False), dissolved=False)
Note that the output of ``seats(elections, "Dem")`` is of type ``Score``, which
functions like a Python ``namedtuple``: for any object ``x`` of type ``Score``,
``x.name`` returns the name of the score, and ``x.apply`` returns a function that
takes a ``Partition`` as input and returns the score. See below:
.. code-block:: python
seats(elections, "Dem").name
returns
.. code-block:: console
'Dem_seats'
and
.. code-block:: python
seats(elections, "Dem").apply(partition)
returns
.. code-block:: console
{'PRES12': 6,
'SEN12': 6,
'GOV14': 4,
'AG14': 6,
'COMP14': 6,
'PRES16': 6,
'SEN16': 6,
'GOV18': 4,
'SEN18': 6,
'AG18': 6,
'COMP18': 8}
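To make the name-plus-function pattern concrete, here is a minimal sketch of how such an object could be built with ``functools.partial``. ``ToyScore``, ``count_wins``, and ``toy_seats`` are hypothetical stand-ins written for this illustration, and the "plan" is a plain dictionary rather than a real ``Partition``; this is not the library's implementation.

```python
from functools import partial
from typing import Callable, NamedTuple


class ToyScore(NamedTuple):
    """Hypothetical stand-in for a score: a name plus a function to apply."""
    name: str
    apply: Callable


def count_wins(plan, party):
    # `plan` is a toy mapping: district -> {party: votes}; not a real Partition.
    return sum(1 for votes in plan.values() if max(votes, key=votes.get) == party)


def toy_seats(party):
    # Bind the party now; the plan is supplied later through `.apply`.
    return ToyScore(name=f"{party}_seats", apply=partial(count_wins, party=party))


# Made-up vote counts for three districts.
plan = {
    1: {"Dem": 60, "Rep": 40},
    2: {"Dem": 45, "Rep": 55},
    3: {"Dem": 70, "Rep": 30},
}
score = toy_seats("Dem")
print(score.name)         # Dem_seats
print(score.apply(plan))  # 2
```

Separating configuration (the party) from evaluation (the plan) is what lets the same score object be reused across many plans.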
Note that we can easily find the number of Republican seats like so:
.. code-block:: python
seats(elections, "Rep").apply(partition)
This gives us
.. code-block:: console
{'PRES12': 2,
'SEN12': 2,
'GOV14': 4,
'AG14': 2,
'COMP14': 2,
'PRES16': 2,
'SEN16': 2,
'GOV18': 4,
'SEN18': 2,
'AG18': 2,
'COMP18': 0}
Moreover, we can pass ``mean=True`` to return the average of the score over all
elections, rather than a dictionary:
.. code-block:: python
seats(elections, "Rep", mean=True).apply(partition)
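For comparison, the per-election dictionary of Republican seats shown above can also be aggregated by hand. The sketch below computes a sum and an unweighted mean; that ``mean=True`` uses exactly this unweighted mean is an assumption here, not something stated in this tutorial.

```python
from statistics import mean

# Republican seats per election, copied from the output above.
rep_seats = {
    "PRES12": 2, "SEN12": 2, "GOV14": 4, "AG14": 2, "COMP14": 2,
    "PRES16": 2, "SEN16": 2, "GOV18": 4, "SEN18": 2, "AG18": 2, "COMP18": 0,
}

# Two common aggregations across elections: the sum and the unweighted mean.
total_rep_seats = sum(rep_seats.values())
average_rep_seats = mean(list(rep_seats.values()))
print(total_rep_seats, round(average_rep_seats, 3))
```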
Some partisan scores (``mean_median``, ``efficiency_gap``, ``partisan_bias``,
``partisan_gini``) do not require the user to specify the POV party in the call. This
is not because there isn't a POV party, but because these functions call GerryChain
functions that automatically set the POV party to be the **first** party listed in the
updater for that election. Since we always list ``Dem`` first in this tutorial,
``Dem`` will be the POV party for these scores, but this is something you should
keep in mind when setting up your updaters and your partition.
.. code-block:: python
# Positive values denote an advantage for the POV party
efficiency_gap(elections).apply(partition)
which will give us
.. code-block:: console
{'PRES12': -0.027366954931038075,
'SEN12': -0.1112428189930485,
'GOV14': -0.016952521996415275,
'AG14': 0.0664089504401374,
'COMP14': -0.03643474212627552,
'PRES16': -0.04564932242915228,
'SEN16': -0.02799189191120642,
'GOV18': 0.09144998629410322,
'SEN18': -0.12475998763996132,
'AG18': -0.06082242557828398,
'COMP18': 0.05664447794898745}
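For intuition about the sign convention, here is a minimal sketch of the usual wasted-votes definition of the efficiency gap, written in plain Python on made-up vote counts. The function names are hypothetical, and the library's own computation may differ in details such as tie handling.

```python
def wasted_votes(pov_votes, other_votes):
    """Return (POV wasted, other wasted) for one district."""
    total = pov_votes + other_votes
    if pov_votes > other_votes:
        # The winner wastes votes beyond 50%; the loser wastes all of its votes.
        return pov_votes - total / 2, other_votes
    return pov_votes, other_votes - total / 2


def efficiency_gap_sketch(districts):
    """`districts` is a list of (POV votes, other votes) pairs."""
    total_votes = sum(p + o for p, o in districts)
    wasted = [wasted_votes(p, o) for p, o in districts]
    # Positive when the other party wastes more votes than the POV party.
    net = sum(other_w - pov_w for pov_w, other_w in wasted)
    return net / total_votes


# Hypothetical vote counts: the POV party wins two of three districts
# with exactly half of the statewide vote.
toy = [(60, 40), (60, 40), (30, 70)]
print(efficiency_gap_sketch(toy))
```

Swapping the two columns flips the sign, which is why the order of parties in the ``Election`` updater matters for these scores.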
If you know you want to use a lot of scores, it can be helpful to make a list of the
scores of interest, like so:
.. code-block:: python
partisan_scores = [
seats(elections, "Dem"),
seats(elections, "Rep"),
# signed_proportionality(elections, "Dem", mean=True),
# absolute_proportionality(elections, "Dem", mean=True),
efficiency_gap(elections, mean=True),
mean_median(elections),
partisan_bias(elections),
partisan_gini(elections),
# Note that `eguia` takes several more arguments — see the documentation for more details
eguia(elections, "Dem", graph, updaters, "COUNTYFP20", "TOTPOP20"),
]
Now, we can make use of the ``summarize()`` function to evaluate all the scores on
this partition:
.. code-block:: python
partisan_dictionary = summarize(partition, partisan_scores)
partisan_dictionary["mean_median"]
This will return
.. code-block:: console
{'PRES12': 0.02205704780736839,
'SEN12': 0.04184519796735442,
'GOV14': 0.0128224074264629,
'AG14': 0.03372274606966308,
'COMP14': 0.026622499095666607,
'PRES16': 0.03478025159124121,
'SEN16': 0.03829214902714728,
'GOV18': 0.0195942524690087,
'SEN18': 0.037782714199074086,
'AG18': 0.03906798945053658,
'COMP18': 0.036168324606223434}
and
.. code-block:: python
partisan_dictionary["mean_efficiency_gap"]
gives us
.. code-block:: console
-0.02151975008383212
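Conceptually, summarizing amounts to evaluating each score and keying the result by the score's name. The sketch below shows that idea with hypothetical stand-ins (``ToyScore``, ``summarize_sketch``, and a dictionary in place of a ``Partition``); it is an illustration of the pattern, not the library's implementation.

```python
from typing import Callable, NamedTuple


class ToyScore(NamedTuple):
    """Hypothetical stand-in for a score: a name plus a function to apply."""
    name: str
    apply: Callable


def summarize_sketch(plan, scores):
    # Evaluate each score on the plan and key the result by the score's name.
    return {score.name: score.apply(plan) for score in scores}


# Toy plan: district -> population (made-up numbers).
plan = {1: 100, 2: 120, 3: 80}
scores = [
    ToyScore("num_districts", len),
    ToyScore("total_population", lambda p: sum(p.values())),
]
summary = summarize_sketch(plan, scores)
print(summary)
```

Because results are keyed by name, two scores with the same name would overwrite each other in a dictionary built this way.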
Demographic Scores
------------------
Our demographic scores return a dictionary that maps districts to demographic
information, either population counts or shares.
.. code-block:: python
# `demographic_tallies()` takes a list of the demographics you'd like to tally
tally_scores = demographic_tallies(["TOTPOP20", "BVAP20", "HVAP20"])
tally_dictionary = summarize(partition, tally_scores)
tally_dictionary
This will return a dictionary that looks like this:
.. code-block:: console
{'TOTPOP20': {1: 771992,
7: 772346,
8: 772421,
6: 771907,
3: 773001,
4: 772893,
5: 771418,
2: 771246},
'BVAP20': {1: 50513,
7: 186256,
8: 84454,
6: 285475,
3: 106681,
4: 258794,
5: 334253,
2: 82315},
'HVAP20': {1: 40466,
7: 36221,
8: 27363,
6: 44099,
3: 45359,
4: 144187,
5: 43594,
2: 110973}}
And
.. code-block:: python
# `demographic_shares()` takes a dictionary where each key is a total demographic column
# that will be used as the denominator in the share (usually either `TOTPOP20` or `VAP20`)
# and each value is a list of demographics on which you'd like to compute shares
share_scores = demographic_shares({"VAP20": ["BVAP20", "HVAP20"]})
share_dictionary = summarize(partition, share_scores)
share_dictionary
returns
.. code-block:: console
{'BVAP20_share': {1: 0.08427654278144459,
7: 0.3075109503392005,
8: 0.1389347687326854,
6: 0.463149987751003,
3: 0.18038569170027308,
4: 0.4331758821894971,
5: 0.5577436821598711,
2: 0.13770530746350554},
'HVAP20_share': {1: 0.06751399798455716,
7: 0.05980131717762746,
8: 0.045014707140366,
6: 0.07154549893977225,
3: 0.07669701811787184,
4: 0.2413438137099663,
5: 0.07274213867961521,
2: 0.1856474650446164}}
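A share is a district-by-district ratio of two tallies. The following minimal sketch uses made-up tallies (not the Maryland data) and a hypothetical helper named ``shares`` to show the arithmetic.

```python
# Hypothetical per-district tallies (not taken from the Maryland data).
vap = {1: 1000, 2: 800}
bvap = {1: 250, 2: 400}


def shares(numerator, denominator):
    # Divide district by district; both mappings must have the same keys.
    return {district: numerator[district] / denominator[district] for district in numerator}


bvap_share = shares(bvap, vap)
print(bvap_share)  # {1: 0.25, 2: 0.5}
```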
Two things to note:
Both :meth:`~gerrytools.scoring.demographic_tallies` and
:meth:`~gerrytools.scoring.demographic_shares` return *lists* of ``Score`` objects
(one for each demographic of interest), so if we want to score just one demographic,
we have to index into the list in order to call ``.apply()``:
.. code-block:: python
demographic_tallies(["BVAP20"])[0].apply(partition)
which returns
.. code-block:: console
{1: 50513,
7: 186256,
8: 84454,
6: 285475,
3: 106681,
4: 258794,
5: 334253,
2: 82315}
Moreover, you can only use these scores on demographic columns that were already
tracked as ``Tally`` updaters when we instantiated our partition. A column that was
not tracked won't work! ``WVAP20`` works below only because we included it in our
updaters above:
.. code-block:: python
demographic_tallies(["WVAP20"])[0].apply(partition)
gives us
.. code-block:: console
{1: 457669,
7: 320218,
8: 458845,
6: 234283,
3: 348325,
4: 127814,
5: 178346,
2: 275860}
Our last demographic score is :meth:`~gerrytools.scoring.gingles_districts`, which
takes in a dictionary of the same type as ``demographic_shares`` as well as a
``threshold`` between 0 and 1. Just like the other two demographic scores, it returns a list
of ``Score`` objects, but here each ``Score`` represents the number of districts where the
demographic group's share is above the ``threshold``. (When the threshold is 0.5, the
default, these districts are called *Gingles' Districts*.)
.. code-block:: python
gingles_scores = gingles_districts({"VAP20": ["BVAP20", "HVAP20"]}, threshold=0.5)
gingles_dictionary = summarize(partition, gingles_scores)
gingles_dictionary
and this returns to us
.. code-block:: console
{'BVAP20_gingles_districts': 1, 'HVAP20_gingles_districts': 0}
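These counts can be checked by hand against the district-level shares printed earlier. The sketch below copies those shares and counts the districts above the threshold; ``count_above`` is a hypothetical helper, and the strict inequality is an assumption about how ties are treated.

```python
# District-level shares copied from the `demographic_shares` output above.
bvap_share = {
    1: 0.08427654278144459, 7: 0.3075109503392005,
    8: 0.1389347687326854, 6: 0.463149987751003,
    3: 0.18038569170027308, 4: 0.4331758821894971,
    5: 0.5577436821598711, 2: 0.13770530746350554,
}
hvap_share = {
    1: 0.06751399798455716, 7: 0.05980131717762746,
    8: 0.045014707140366, 6: 0.07154549893977225,
    3: 0.07669701811787184, 4: 0.2413438137099663,
    5: 0.07274213867961521, 2: 0.1856474650446164,
}


def count_above(shares, threshold=0.5):
    # Strict inequality is an assumption; the library may treat ties differently.
    return sum(1 for share in shares.values() if share > threshold)


print(count_above(bvap_share), count_above(hvap_share))  # 1 0
```

Only district 5 has a Black voting age population share above 0.5, and no district has a Hispanic voting age population share above 0.5, which agrees with the output above.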