=======
Scoring
=======

The ``scoring`` module of ``gerrytools`` is a collection of functions that can be
used in conjunction with the ``Partition`` class to create more complex updaters
beyond what is provided natively in the ``gerrychain`` library. This module
also provides a number of methods for analyzing the election results of an
ensemble generated by a ReCom chain.


For this tutorial, we will be working with the following shapefile of the state of
Maryland:

.. raw:: html

    <div class="center-container">
        <a href="https://github.com/peterrrock2/gerrytools-dev/main/glob/docs/source/_static/MD_vtd20.zip" class="download-badge" download>
            MD Shapefile
        </a>
    </div>
    <br style="line-height: 5px;">

Scoring Districting Plans
-------------------------

Let's start with the imports that we will need for this section:

.. code-block:: python 

    from gerrychain import Graph, Partition, Election
    from gerrytools.scoring import *
    import pandas as pd
    import geopandas as gpd

All of our scores are functions that take a GerryChain ``Partition`` and produce
either a numerical (plan-wide) score or a mapping from district or election IDs to
numeric scores. For our examples, we will use a 2020 Maryland VTD shapefile to build
our underlying dual graph, since the shapefile has demographic and electoral
information that our scores will rely on.


.. code-block:: python

    graph = Graph.from_file("MD_vtd20/")

    elections = ["PRES12", "SEN12", "GOV14", "AG14", "COMP14", 
                "PRES16", "SEN16", "GOV18", "SEN18", "AG18", "COMP18"]

    # use our list of elections above to create `Election` updaters for each contest
    # Ex: in our shapefile, the column `PRES12R` refers to the votes Mitt 
    # Romney (R) received in the 2012 Presidential general election
    updaters = {}
    for e in elections:
        updaters[e] = Election(e, {"Dem": e+"D", "Rep": e+"R"})

The :meth:`~gerrytools.scoring.demographic_updaters` function returns a dictionary of
``Tally`` updaters, each of which tracks the number of people in a given demographic
group. You can pass it a list with as many demographic columns as you wish:

.. code-block:: python

    demographic_updaters(["TOTPOP20", "VAP20"])

Which should return something like:

.. code-block:: console

    {'TOTPOP20': <gerrychain.updaters.tally.Tally at 0x7f261af03a00>,
    'VAP20': <gerrychain.updaters.tally.Tally at 0x7f261af03dc0>}

We can then add these to the updaters for our partition and proceed as usual:

.. code-block:: python

    # add updaters that track total population, total voting age population,
    # and Black, Hispanic, and White voting age population
    updaters.update(demographic_updaters(["TOTPOP20", "VAP20", "BVAP20", "HVAP20", "WVAP20"]))

    # create the partition on which we'll generate scores
    # since `MD_CD_example.csv` is a CSV with `GEOID20` -> district assignment,
    # we need to replace the `GEOID20`s with integer node labels to match the graph's nodes.
    geoid_to_assignment = pd.read_csv("data/MD_CD_example.csv", header=None).set_index(0).to_dict()[1]
    assignment = {n: geoid_to_assignment[graph.nodes[n]["GEOID20"]] for n in graph.nodes}
    partition = Partition(graph, assignment, updaters)
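
To make the relabeling and tallying steps concrete, here is a minimal pure-Python
sketch on made-up data. The node labels, GEOIDs, and population figures are
hypothetical, and the ``tally`` helper is only an illustrative stand-in for what a
``Tally`` updater computes, not GerryChain's implementation.

.. code-block:: python

    from collections import defaultdict

    # Hypothetical node attributes: integer node label -> attribute dictionary
    nodes = {
        0: {"GEOID20": "A1", "TOTPOP20": 100},
        1: {"GEOID20": "A2", "TOTPOP20": 250},
        2: {"GEOID20": "B1", "TOTPOP20": 175},
    }

    # Hypothetical CSV contents: GEOID20 -> district
    geoid_to_assignment = {"A1": 1, "A2": 1, "B1": 2}

    # Re-key the assignment by node label, as in the tutorial code above
    assignment = {n: geoid_to_assignment[attrs["GEOID20"]] for n, attrs in nodes.items()}

    def tally(column):
        """Sum a node attribute over the nodes of each district."""
        totals = defaultdict(int)
        for n, district in assignment.items():
            totals[district] += nodes[n][column]
        return dict(totals)

    print(tally("TOTPOP20"))  # {1: 350, 2: 175}

The per-district dictionary printed here has the same shape as the tallies that the
demographic scores later in this tutorial return.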

Partisan Scores
---------------


All our partisan scores require at least a list of elections (we'll use our
``elections`` list defined above). Some of them additionally require the user to
specify a POV party (in our case, either ``Dem`` or ``Rep``). All of these partisan
scores return a dictionary that maps election names to the score for that election; it
is up to the user to aggregate (often by summing or averaging) the scores across every
election. For a simple example, let's use the score function that returns the number
of Democratic seats won in each election.

.. code-block:: python

    seats(elections, "Dem")

This will return:

.. code-block:: console

    Score(name='Dem_seats', apply=functools.partial(<function _seats at 0x7f2625fe0720>, election_cols=['PRES12', 'SEN12', 'GOV14', 'AG14', 'COMP14', 'PRES16', 'SEN16', 'GOV18', 'SEN18', 'AG18', 'COMP18'], party='Dem', mean=False), dissolved=False)

Note that the output of ``seats(elections, "Dem")`` is of type ``Score``, which
functions like a Python ``namedtuple``: for any object ``x`` of type ``Score``,
``x.name`` returns the name of the score, and ``x.apply`` returns a function that
takes a ``Partition`` as input and returns the score. See below:

.. code-block:: python

    seats(elections, "Dem").name

returns

.. code-block:: console

    'Dem_seats'

and 

.. code-block:: python

    seats(elections, "Dem").apply(partition)

returns

.. code-block:: console

    {'PRES12': 6,
    'SEN12': 6,
    'GOV14': 4,
    'AG14': 6,
    'COMP14': 6,
    'PRES16': 6,
    'SEN16': 6,
    'GOV18': 4,
    'SEN18': 6,
    'AG18': 6,
    'COMP18': 8}
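
The ``Score`` pattern can be sketched in a few lines of plain Python. This is an
illustrative stand-in rather than the library's actual definition: the two-field
``Score`` below, the ``_count_wins`` and ``seats_sketch`` helpers, and the toy
partition dictionary are all assumptions made for the example.

.. code-block:: python

    from collections import namedtuple
    from functools import partial

    # Simplified stand-in for the library's Score type (the real one has more fields)
    Score = namedtuple("Score", ["name", "apply"])

    def _count_wins(partition, election_cols, party):
        """Count, per election, the districts in which `party` has the most votes."""
        return {
            e: sum(1 for votes in partition[e].values() if max(votes, key=votes.get) == party)
            for e in election_cols
        }

    def seats_sketch(election_cols, party):
        # Freeze the configuration now; the partition is supplied later via `.apply`
        return Score(f"{party}_seats", partial(_count_wins, election_cols=election_cols, party=party))

    # Toy "partition": election -> district -> votes by party
    toy_partition = {"PRES12": {1: {"Dem": 60, "Rep": 40}, 2: {"Dem": 45, "Rep": 55}}}

    score = seats_sketch(["PRES12"], "Dem")
    print(score.name)                  # Dem_seats
    print(score.apply(toy_partition))  # {'PRES12': 1}

Separating configuration from evaluation in this way lets one score object be reused
on every partition in an ensemble.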

Note that we can easily find the number of Republican seats like so:

.. code-block:: python

    seats(elections, "Rep").apply(partition)

This gives us

.. code-block:: console

    {'PRES12': 2,
    'SEN12': 2,
    'GOV14': 4,
    'AG14': 2,
    'COMP14': 2,
    'PRES16': 2,
    'SEN16': 2,
    'GOV18': 4,
    'SEN18': 2,
    'AG18': 2,
    'COMP18': 0}

Moreover, we can pass ``mean=True`` to return the average of the score over all
elections, rather than a dictionary:

.. code-block:: python

    seats(elections, "Rep", mean=True).apply(partition)
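
As a sanity check, the same aggregate can be computed by hand from the per-election
dictionary shown above. This sketch averages the Republican seat counts listed
earlier; it assumes that ``mean=True`` takes an unweighted arithmetic mean over
elections.

.. code-block:: python

    from statistics import mean

    # Republican seat counts per election, copied from the output above
    rep_seats = {
        "PRES12": 2, "SEN12": 2, "GOV14": 4, "AG14": 2, "COMP14": 2,
        "PRES16": 2, "SEN16": 2, "GOV18": 4, "SEN18": 2, "AG18": 2, "COMP18": 0,
    }

    total_seats = sum(rep_seats.values())
    average_seats = mean(list(rep_seats.values()))
    print(total_seats, round(average_seats, 3))  # 24 2.182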

Some partisan scores (``mean_median``, ``efficiency_gap``, ``partisan_bias``,
``partisan_gini``) do not require the user to specify the POV party in the call. This
is not because there isn't a POV party, but because these functions call GerryChain
functions that automatically set the POV party to be the **first** party listed in the
updater for that election. Since we always list ``Dem`` first in this tutorial,
``Dem`` will be the POV party for these scores, but this is something you should
keep in mind when setting up your updaters and your partition.

.. code-block:: python

    # Positive values denote an advantage for the POV party
    efficiency_gap(elections).apply(partition)

which will give us

.. code-block:: console

    {'PRES12': -0.027366954931038075,
    'SEN12': -0.1112428189930485,
    'GOV14': -0.016952521996415275,
    'AG14': 0.0664089504401374,
    'COMP14': -0.03643474212627552,
    'PRES16': -0.04564932242915228,
    'SEN16': -0.02799189191120642,
    'GOV18': 0.09144998629410322,
    'SEN18': -0.12475998763996132,
    'AG18': -0.06082242557828398,
    'COMP18': 0.05664447794898745}
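
To see where the sign comes from, here is a small self-contained sketch of the
efficiency gap on made-up two-party vote counts. It follows the usual wasted-votes
definition and treats the first argument as the POV party. The helper names are
hypothetical, and GerryChain's own implementation may differ in details such as
tie handling.

.. code-block:: python

    def wasted_votes(pov_votes, other_votes):
        """Return (POV wasted, other wasted) for a single district."""
        needed = (pov_votes + other_votes) / 2
        if pov_votes > other_votes:
            # The winner wastes its surplus; the loser wastes every vote
            return pov_votes - needed, other_votes
        return pov_votes, other_votes - needed

    def efficiency_gap_sketch(pov_by_district, other_by_district):
        """(other party's wasted votes - POV party's wasted votes) / total votes."""
        pairs = [wasted_votes(p, o) for p, o in zip(pov_by_district, other_by_district)]
        total = sum(pov_by_district) + sum(other_by_district)
        return sum(other - pov for pov, other in pairs) / total

    # Hypothetical votes in three districts
    dem = [60, 60, 30]
    rep = [40, 40, 70]

    print(efficiency_gap_sketch(dem, rep))  # positive: this toy map favors Dem
    print(efficiency_gap_sketch(rep, dem))  # same magnitude, opposite sign

Swapping the argument order flips the sign, which is why the order of the parties in
each ``Election`` updater matters for these scores.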

If you know you want to use a lot of scores, it can be helpful to make a list of the
scores of interest, like so:

.. code-block:: python

    partisan_scores = [
        seats(elections, "Dem"),
        seats(elections, "Rep"),
        # signed_proportionality(elections, "Dem", mean=True),
        # absolute_proportionality(elections, "Dem", mean=True),
        efficiency_gap(elections, mean=True),
        mean_median(elections),
        partisan_bias(elections),
        partisan_gini(elections),
        # Note that `eguia` takes several more arguments; see the documentation for details
        eguia(elections, "Dem", graph, updaters, "COUNTYFP20", "TOTPOP20"),
    ]

Now, we can make use of the ``summarize()`` function to evaluate all the scores on
this partition:

.. code-block:: python

    partisan_dictionary = summarize(partition, partisan_scores)
    partisan_dictionary["mean_median"]

This will return

.. code-block:: console

    {'PRES12': 0.02205704780736839,
    'SEN12': 0.04184519796735442,
    'GOV14': 0.0128224074264629,
    'AG14': 0.03372274606966308,
    'COMP14': 0.026622499095666607,
    'PRES16': 0.03478025159124121,
    'SEN16': 0.03829214902714728,
    'GOV18': 0.0195942524690087,
    'SEN18': 0.037782714199074086,
    'AG18': 0.03906798945053658,
    'COMP18': 0.036168324606223434}

and 

.. code-block:: python

    partisan_dictionary["mean_efficiency_gap"]

gives us

.. code-block:: console

    -0.02151975008383212
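
Conceptually, ``summarize()`` evaluates each score and keys the result by the score's
name. The sketch below illustrates that idea with a simplified ``Score`` stand-in and
toy scores; ``summarize_sketch`` is a hypothetical helper, not the library's
implementation.

.. code-block:: python

    from collections import namedtuple

    # Simplified stand-in for the library's Score type
    Score = namedtuple("Score", ["name", "apply"])

    def summarize_sketch(partition, scores):
        """Evaluate each score on the partition and key the results by score name."""
        return {score.name: score.apply(partition) for score in scores}

    # Toy scores that only inspect the district list of a toy "partition"
    toy_partition = {"districts": [1, 2, 3]}
    toy_scores = [
        Score("num_districts", lambda p: len(p["districts"])),
        Score("max_district_id", lambda p: max(p["districts"])),
    ]

    print(summarize_sketch(toy_partition, toy_scores))
    # {'num_districts': 3, 'max_district_id': 3}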

Demographic Scores
------------------

Our demographic scores return a dictionary that maps districts to demographic
information, either population counts or shares.

.. code-block:: python

    # `demographic_tallies()` takes a list of the demographics you'd like to tally
    tally_scores = demographic_tallies(["TOTPOP20", "BVAP20", "HVAP20"])
    tally_dictionary = summarize(partition, tally_scores)
    tally_dictionary

This will return a dictionary that looks like this:

.. code-block:: console

    {'TOTPOP20': {1: 771992,
    7: 772346,
    8: 772421,
    6: 771907,
    3: 773001,
    4: 772893,
    5: 771418,
    2: 771246},
    'BVAP20': {1: 50513,
    7: 186256,
    8: 84454,
    6: 285475,
    3: 106681,
    4: 258794,
    5: 334253,
    2: 82315},
    'HVAP20': {1: 40466,
    7: 36221,
    8: 27363,
    6: 44099,
    3: 45359,
    4: 144187,
    5: 43594,
    2: 110973}}

And 

.. code-block:: python

    # `demographic_shares()` takes a dictionary where each key is a total demographic column
    # that will be used as the denominator in the share (usually either `TOTPOP20` or `VAP20`)
    # and each value is a list of demographics on which you'd like to compute shares
    share_scores = demographic_shares({"VAP20": ["BVAP20", "HVAP20"]})
    share_dictionary = summarize(partition, share_scores)
    share_dictionary

returns

.. code-block:: console

    {'BVAP20_share': {1: 0.08427654278144459,
    7: 0.3075109503392005,
    8: 0.1389347687326854,
    6: 0.463149987751003,
    3: 0.18038569170027308,
    4: 0.4331758821894971,
    5: 0.5577436821598711,
    2: 0.13770530746350554},
    'HVAP20_share': {1: 0.06751399798455716,
    7: 0.05980131717762746,
    8: 0.045014707140366,
    6: 0.07154549893977225,
    3: 0.07669701811787184,
    4: 0.2413438137099663,
    5: 0.07274213867961521,
    2: 0.1856474650446164}} 
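
Each share is the group tally divided by the denominator tally in the same district.
The sketch below reproduces that arithmetic on made-up tallies (the numbers are
illustrative, not Maryland data) and mirrors the ``<column>_share`` naming seen in the
output above. The ``share`` helper is hypothetical.

.. code-block:: python

    # Hypothetical per-district tallies
    tallies = {
        "VAP20": {1: 1000, 2: 800},
        "BVAP20": {1: 250, 2: 480},
    }

    def share(numerator_col, denominator_col):
        """Divide one tally by another, district by district."""
        return {
            district: tallies[numerator_col][district] / total
            for district, total in tallies[denominator_col].items()
        }

    shares = {"BVAP20_share": share("BVAP20", "VAP20")}
    print(shares)  # {'BVAP20_share': {1: 0.25, 2: 0.6}}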


Two things to note:

Both :meth:`~gerrytools.scoring.demographic_tallies` and
:meth:`~gerrytools.scoring.demographic_shares` return *lists* of ``Score`` objects
(one for each demographic of interest), so if we want to score just one demographic,
we have to index into the list before calling ``.apply()``:

.. code-block:: python

    demographic_tallies(["BVAP20"])[0].apply(partition)

which returns

.. code-block:: console

    {1: 50513,
    7: 186256,
    8: 84454,
    6: 285475,
    3: 106681,
    4: 258794,
    5: 334253,
    2: 82315}


Moreover, you can only use these scores on demographic columns that are already
tracked as ``Tally`` updaters on the partition. ``WVAP20`` works here because we
included it in the ``demographic_updaters`` call when building ``partition``; a column
that was never added as an updater will not work.

.. code-block:: python

    demographic_tallies(["WVAP20"])[0].apply(partition)

gives us

.. code-block:: console

    {1: 457669,
    7: 320218,
    8: 458845,
    6: 234283,
    3: 348325,
    4: 127814,
    5: 178346,
    2: 275860}

Our last demographic score is :meth:`~gerrytools.scoring.gingles_districts`, which
takes a dictionary of the same form as the one passed to ``demographic_shares``, as
well as a ``threshold`` between 0 and 1. Like the other two demographic scores, it
returns a list of ``Score`` objects, but here each ``Score`` gives the number of
districts where the demographic group's share is above the ``threshold``. (When the
threshold is 0.5, the default, these districts are called *Gingles' Districts*.)

.. code-block:: python

    gingles_scores = gingles_districts({"VAP20": ["BVAP20", "HVAP20"]}, threshold=0.5)
    gingles_dictionary = summarize(partition, gingles_scores)
    gingles_dictionary

which returns

.. code-block:: console

    {'BVAP20_gingles_districts': 1, 'HVAP20_gingles_districts': 0}
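
These counts can be checked by hand against the shares computed earlier. The sketch
below copies the ``BVAP20`` and ``HVAP20`` shares from the ``demographic_shares``
output above (rounded to four decimal places) and counts the districts strictly above
the threshold. The ``count_above`` helper is hypothetical, and whether the library
uses a strict or non-strict comparison is an assumption here.

.. code-block:: python

    # Shares copied from the `demographic_shares` output above, rounded to 4 places
    shares = {
        "BVAP20": {1: 0.0843, 7: 0.3075, 8: 0.1389, 6: 0.4631,
                   3: 0.1804, 4: 0.4332, 5: 0.5577, 2: 0.1377},
        "HVAP20": {1: 0.0675, 7: 0.0598, 8: 0.0450, 6: 0.0715,
                   3: 0.0767, 4: 0.2413, 5: 0.0727, 2: 0.1856},
    }

    def count_above(district_shares, threshold=0.5):
        """Number of districts whose share exceeds the threshold."""
        return sum(1 for s in district_shares.values() if s > threshold)

    counts = {f"{group}_gingles_districts": count_above(s) for group, s in shares.items()}
    print(counts)  # {'BVAP20_gingles_districts': 1, 'HVAP20_gingles_districts': 0}

Only district 5, with a ``BVAP20`` share of roughly 0.56, clears the 0.5 threshold.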