Full Package Reference

Ben

Data

Facilities for processing data and districting plans in a standardized fashion.

class gerrytools.data.AssignmentCompressor(identifiers, window=10, location='compressed.ac')[source]

Bases: object

A class for compressing and decompressing lots of assignments very, very quickly. Intended for use with jsonlines-like libraries (where assignments are read in line-by-line) or for network requests (where assignments are retrieved one-by-one). When decompressing, yields dicts whose keys are in sorted order.

The compression schema considers the set of unique identifiers, imposes an ordering (lexicographic order) on the identifiers, and matches the assignment to that ordering. We assign all unassigned units to "-1" and, once the default cache size is hit (or assignments are no longer being read in), compress all assignments in the cache. Assignments are read in and out in the same order, and the keys for each assignment are in the same order.
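
The schema described above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the class's actual implementation: the helper names `encode` and `decode` and the "," delimiter are invented for the example.

```python
import zlib


def encode(assignment, identifiers, delimiter=","):
    """Compress one assignment against a fixed, sorted identifier ordering."""
    ordered = sorted(identifiers)
    # Units missing from the assignment are recorded as "-1".
    districts = [assignment.get(i, "-1") for i in ordered]
    return zlib.compress(delimiter.join(districts).encode("utf-8"))


def decode(blob, identifiers, delimiter=","):
    """Invert encode(), returning a dict whose keys are in sorted order."""
    ordered = sorted(identifiers)
    districts = zlib.decompress(blob).decode("utf-8").split(delimiter)
    return dict(zip(ordered, districts))


geoids = ["003", "001", "002"]
blob = encode({"002": "A", "001": "B"}, geoids)
restored = decode(blob, geoids)
```

Because both sides agree on the identifier ordering, only the district labels need to be stored; the identifiers themselves never enter the compressed payload.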

Example

To compress assignments, we need a set of unique identifiers such that each identifier maps one geometric unit to one district.

    from gerrytools.data import AssignmentCompressor

    geoids = blocks["GEOID20"].astype(str)
    ac = AssignmentCompressor(geoids, location="compressed-assignments.ac")

    with ac as compressor:
        for assignment in assignments:
            # Here, ensure that all assignments have string keys and
            # string values; also ensure that an assignment's keys are
            # a subset of geoids (or whatever IDs you're passing).
            compressor.compress(assignment)

To decompress assignments, we again must have a set of unique geometric identifiers which match the assignments. We can then iterate over the decompressed assignments as they’re read out of the file.

    geoids = blocks["GEOID20"].astype(str)
    ac = AssignmentCompressor(geoids, location="compressed-assignments.ac")

    for assignment in ac.decompress():
        ...  # <do whatever!>

DISTRICT_DELIMITER

A bytestring which separates district identifiers in an assignment.

Type:

bytes

ASSIGNMENT_DELIMITER

A bytestring which separates assignments from each other.

Type:

bytes

CHUNK_DELIMITER

A bytestring which separates assignment chunks from each other.

Type:

bytes

CHUNK_SIZE

Default number of bytes read in from the IO stream at each step.

Type:

int

ENCODING

Default string encoding style.

Type:

str

identifiers

A sortable, iterable collection of unique items corresponding to geographic identifiers.

compressed

A pandas Index containing the identifiers; this is used to quickly perform vectorized identifier matchings, rather than using traditional iterative methods.

cache

Collection of assignments to be compressed. Assignments are loaded into the cache every time the .compress() method is called, and the cache is cleared whenever its length exceeds the window width.

window

Maximum cache length before the cache is compressed, written to file, and emptied.

default

The default assignment which is updated each time an assignment is passed to the compressor.

location

The place to which compressed data is written or read.
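
The interplay between cache and window can be pictured as a buffered writer. The class below is a hypothetical stand-in (the name `WindowedBuffer` and its `flushed` list are invented for illustration); it assumes the cache is flushed as soon as it reaches the window size, and once more when input ends.

```python
class WindowedBuffer:
    """Hypothetical buffer that flushes in batches of at most `window` items."""

    def __init__(self, window=10):
        self.window = window
        self.cache = []
        self.flushed = []  # stands in for chunks written to a file

    def add(self, item):
        self.cache.append(item)
        # Assumption: flush as soon as the cache reaches the window size.
        if len(self.cache) >= self.window:
            self.flush()

    def flush(self):
        if self.cache:
            self.flushed.append(list(self.cache))
            self.cache.clear()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Write out whatever remains once no more items are being read in.
        self.flush()
        return False


with WindowedBuffer(window=2) as buffer:
    for item in ["a", "b", "c"]:
        buffer.add(item)
chunks = buffer.flushed
```

Flushing on exit is what makes the context-manager form convenient: a final, partially filled cache is not lost when the loop ends.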

compress(assignment)[source]

Compresses the provided assignment using zlib.

Parameters:

assignment (dict) – Dictionary which matches geometric identifiers to districts. All keys and values in this dictionary must be strings.

compress_all(assignments)[source]

Compresses all assignments in assignments.

Parameters:

assignments (list) – List of dictionaries which match geometric identifiers to districts. All keys and values in these dictionaries must be strings.

decompress()[source]

Decompresses the data at location. A generator which yields assignments.

Yields:

Decompressed assignment dictionaries.

match(assignment) SortedDict[source]

Matches an assignment to an index (the set of geometric identifiers) and returns a SortedDict.

Parameters:

assignment (dict) – Dictionary which matches geometric identifiers to districts. All keys and values in this dictionary must be strings.

Returns:

A SortedDict with identifiers matched to district assignments.

class gerrytools.data.Submission(*, link: str, plan: dict, id: str, units: str, unitsType: str, tileset: str, type: str)[source]

Bases: BaseModel

Provides a base model for data retrieved from districtr. Allows us to use dot notation when accessing properties rather than dict notation.

id: str

districtr identifier.

link: str

A districtr URL.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

plan: dict

districtr plan object.

tileset: str

Mapbox tileset URL.

type: str

Not sure.

units: str

Unit identifier (e.g. GEOID).

unitsType: str

Unit type (e.g. blocks20, blockgroup, etc.)

gerrytools.data.acs5(state, geometry='tract', year=2020, columns=[], white='NHWHITEVAP') DataFrame[source]

Retrieves ACS 5-year population estimates for the provided state, geometry level, and year. Also retrieves ACS-reported CVAP data, which closely matches that reported by the CVAP special tabulation; CVAP data are only returned at the tract level, and are otherwise reported as 0.

Parameters:
  • state (us.State) – State object for the desired state.

  • geometry (str, optional) – Geometry level at which data is retrieved. Acceptable values are “tract” and “block group”. Defaults to “tract”, so data is retrieved at the 2020 Census tract level.

  • year (int, optional) – Year for which data is retrieved. Defaults to 2020.

  • columns (list, optional) – Columns to retrieve. If no columns are specified, a default set of columns including total populations by race and ethnicity and voting-age populations by race and ethnicity is returned, along with a GEOID column.

  • white (str, optional) – The column removed from totals when calculating POC populations.

Returns:

A DataFrame containing the formatted data.

gerrytools.data.census10(state, table='P8', columns={}, geometry='block')[source]

Retrieves geometry-level 2010 Summary File 1 data via the Census API.

Parameters:
  • state (State) – us.State object (e.g. us.states.WI).

  • table (string, optional) – Table from which we retrieve data. Defaults to the P8 table, which contains population by race regardless of ethnicity.

  • columns (dict, optional) – Dictionary which maps Census column names (from the correct table) to human-readable names. We require this to be a dictionary, _not_ a list, as specifying human-readable names will implicitly protect against incorrect column names and excessive API calls.

  • geometry (string, optional) – Geometry level at which we retrieve data. Defaults to “block” to retrieve block-level data for the state provided. Accepted values are “block”, “block group”, and “tract”.

Returns:

A DataFrame with columns renamed according to their Census description designation and a unique identifier column for joining to geometries.

gerrytools.data.census20(state, table='P1', columns={}, geometry='block', key='75c0c07e6f0ab7b0a9a1c14c3d8af9d9f13b3d65') DataFrame[source]

Retrieves geometry-level 2020 Decennial Census PL94-171 data via the Census API.

Parameters:
  • state (State) – us.State object (e.g. us.states.WI).

  • table (string, optional) – Table from which we retrieve data. Defaults to the P1 table, which gets populations by race regardless of ethnicity.

  • columns (dict, optional) – Dictionary which maps Census column names (from the correct table) to human-readable names. We require this to be a dictionary, _not_ a list, as specifying human-readable names will implicitly protect against incorrect column names and excessive API calls.

  • geometry (string, optional) – Geometry level at which we retrieve data. Defaults to “block” to retrieve block-level data for the state provided. Accepted values are “block”, “block group”, and “tract”.

  • key (string, optional) – Census API key.

Returns:

A DataFrame with columns renamed according to their Census description designation and a GEOID20 column for joining to geometries.

gerrytools.data.csvs(state, ptype='plan')[source]

URL for accessing districtr plan metadata.

Parameters:
  • state – us.State object (e.g. us.states.WI).

  • ptype – Type of plan we’re retrieving; defaults to “plan”.

Returns:

String with the appropriate URL.

gerrytools.data.cvap(state, geometry='tract', year=2020) DataFrame[source]

Retrieves and CSV-formats 5-year CVAP data for the provided state at the specified geometry level. Data are reported on geometries from the 2010 Census. Variables and descriptions are [listed here](https://tinyurl.com/3mnrm56s).

Parameters:
  • state (us.State) – The State object for which we’re retrieving 2019 ACS CVAP Special Tab.

  • geometry (str, optional) – Level of geometry for which we’re getting data. Accepted values are “block group” for 2010 Census Block Groups, and “tract” for 2010 Census Tracts. Defaults to “tract”.

  • year (int, optional) – Year for which data is retrieved. Defaults to 2020.

Returns:

A DataFrame with a GEOID column and corresponding CVAP columns from the ACS CVAP Special Tab for the specified year.

gerrytools.data.dualgraphs20(state, filepath, geometry='block group')[source]

Retrieves Lab-processed dual graph data for the provided state and geometry level and writes it to the provided filepath.

Parameters:

state (us.State) – State for which we retrieve data.

gerrytools.data.estimatecvap2010(base, state, groups, ceiling, zfill, geometry10='tract', year=2019) DataFrame[source]

Function for turning old (2019) CVAP data on 2010 geometries into estimates for current CVAP data on 2020 geometries. This method serves a different purpose than `gerrytools.data.estimatecvap.estimatecvap2020()`: this method is intended to put 2010-era CVAP data on 2020-era geometries, and uses geometric properties to do so.

Users must supply a base GeoDataFrame representing their chosen U.S. state. Additionally, users must specify the demographic groups whose CVAP statistics are to be estimated. For each group, users specify a triple \((X, Y, Z)\) where \(X\) is the old CVAP column for that group, \(Y\) is the old VAP column for that group, and \(Z\) is the new VAP column for that group, which must be an existing column on base. The estimated new CVAP for that group is then constructed by computing \((X / Y) \cdot Z\) for each new geometry.

[Figure: CVAP estimation diagram (../images/cvap-estimation.png)]

Parameters:
  • base (GeoDataFrame) – A GeoDataFrame with the appropriate columns for estimating CVAP.

  • state (State) – The us.State object for which CVAP data is retrieved.

  • groups (list) – (X, Y, Z) triples for each desired CVAP group to be estimated, where each parameter is a column name: X is the column on the 2010 geometries which contains the relevant CVAP data; Y is the column on the 2010 geometries which contains the relevant VAP data; Z is the column on the 2020 geometries to be weighted by the per-unit ratio of X to Y. For example, if we wish to estimate Black CVAP, this triple would be (NHBCVAP19, BVAP19, BVAP20), which takes the ratios of the NHBCVAP19 and BVAP19 columns on the 2010 geometries, and multiplies the 2020 geometries’ respective BVAP20 values by these ratios.

  • ceiling (float) – Number representing where to cap the weighting ratio of CVAP to VAP20. After this percentage ceiling is passed, the percentage will be set to 1. We recommend setting this to 1.

  • zfill (float) – Fill in ratio for CVAP to VAP20 when there is 0 CVAP in the area. We recommend setting this parameter to 0.1.

  • geometry10 (str, optional) – The 2010 geometry on which cvap will be pulled. Acceptable values are “tract” or “block group”. As tracts are less susceptible to change across Census vintages, setting this parameter to “tract” is recommended, as it is more likely that the 2020 Census blocks fit neatly into the 2010 Census tracts.

Returns:

base geometries with 2019 CVAP-weighted 2020 CVAP estimates attached.
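
The arithmetic behind the estimate can be sketched for a single geometry. The function below is an illustration, not the library's code: the name `estimate_cvap` is hypothetical, and falling back to zfill when the old VAP is zero (to avoid dividing by zero) is an added assumption.

```python
def estimate_cvap(x, y, z, ceiling=1.0, zfill=0.1):
    """Estimate new CVAP as (X / Y) * Z for a single geometry.

    x: old CVAP, y: old VAP, z: new VAP.
    """
    if x == 0 or y == 0:
        # Assumption: use zfill when the ratio is zero or undefined.
        ratio = zfill
    else:
        ratio = x / y
    if ratio > ceiling:
        # Past the ceiling, the ratio is set to 1.
        ratio = 1.0
    return ratio * z


typical = estimate_cvap(x=80, y=100, z=120)  # ratio of 0.8
capped = estimate_cvap(x=150, y=100, z=120)  # ratio of 1.5 exceeds the ceiling
filled = estimate_cvap(x=0, y=100, z=120)    # no CVAP recorded, so zfill applies
```

With the recommended settings (ceiling of 1 and zfill of 0.1), the estimate never exceeds the new VAP and is nonzero wherever the new VAP is nonzero.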

gerrytools.data.estimatecvap2020(state) DataFrame[source]

Estimates 2020 CVAP on 2020 blocks using 2020 PL94 data. This method serves a different purpose than `gerrytools.data.estimatecvap.estimatecvap2010()`: rather than using geometric procedures to put CVAP data on old geometries, this method takes advantage of the Census’s geographic hierarchy, and associates finer-grained 2020 CVAP data with 2020 blocks. No geometric data or procedures are used here. The resulting data can then be adjoined to 2020 block geometries (or assigned to VTDs, assigned to districts, etc.) and be used to build other units of varying size.

Parameters:

state (State) – The us.State for which CVAP will be estimated.

Returns:

A DataFrame of combined Census and ACS data at the Census block level.

gerrytools.data.fetchgeometries(state, geometry) GeoDataFrame[source]

Fetches the 2010 Census geometries on which ACS data are reported.

Parameters:
  • state (State) – The us.State for which geometries are fetched.

  • geometry (str) – Level of geometry we’re fetching. Accepted values are “tract” and “block group”.

Returns:

A GeoDataFrame of 2010 geometries.

gerrytools.data.geometries20(state, filepath, geometry='tract')[source]

Retrieves Lab-processed geometric data for the provided state and geometry level and writes it to the provided filepath.

Parameters:
  • state (us.State) – State for which we retrieve data.

  • filepath (str) – Location to which we write the compressed data.

  • geometry (str, optional) – Geometry level at which we retrieve data. Accepted values are block group, block, congress, county, cousub, place, senate, house, tract, and vtd. Defaults to tract.

gerrytools.data.ids(state)[source]

URL for accessing districtr identifiers.

Parameters:

state – Name of the state (e.g. “wisconsin”) for which we’re retrieving districtr identifiers.

Returns:

String with the appropriate URL.

gerrytools.data.one(identifier)[source]

URL for accessing an individual districtr plan.

Parameters:

identifier – districtr identifier.

Returns:

String with the appropriate URL.

gerrytools.data.remap(plans, unitmaps, popmap=None) Callable[source]

Re-maps assignments to the specified set of units.

Parameters:
  • plans (DataFrame) – The Pandas DataFrame produced by tabularized().

  • unitmaps (dict) – A dictionary whose keys are unit types appearing in the unitsType column, and whose values are dictionaries mapping unique identifiers of one set of geometries to unique identifiers (or lists of unique identifiers) of another set of geometries; these correspond to mappings generated by unitmap() and the inverse mapping generated by invert().

  • popmap (dict, optional) – A mapping from unit unique identifiers to population values. Only applies when we are mapping from smaller units to larger ones.

Returns:

A function

gerrytools.data.submissions(state, sample=None) List[Submission][source]

Retrieves raw districtr objects; this includes both plan- and COI-based submissions.

Parameters:
  • state (State) – us.State object (e.g. us.states.WI).

  • sample (int, optional) – The number of sample plans to retrieve.

Returns:

A list of Submissions, either to be interpreted raw or tabularized.

gerrytools.data.tabularized(state, submissions) Tuple[DataFrame, DataFrame, DataFrame][source]

Returns districtr submission information in a tabular format.

Parameters:
  • state (State) – us.State object (e.g. us.states.WI).

  • submissions (list) – List of Submission objects returned from a call to submissions.

Returns:

Three dataframes corresponding to plan-based submissions, COI-based submissions, and written submissions to the provided state.

Example

Prototypical example usage.

    import us
    from gerrytools.data import submissions, tabularized

    # Set the state.
    state = us.states.WI

    # Retrieve the raw districtr submissions, then tabularize them.
    subs = submissions(state)
    plans, cois, written = tabularized(state, subs)

gerrytools.data.variables(table) dict[source]

Produces variable names for the 2020 Census PL94-171 tables. Variables are determined from patterns apparent in PL94 variable [lists for tables P1 through P4](https://tinyurl.com/2s3btptn).

Parameters:

table (string) – The table for which we’re generating variables.

Returns:

A dictionary mapping Census variable codes to human-readable ones.

gerrytools.data.vtds20(state, filepath)[source]

Retrieves Lab-processed VTD geometric data for the provided state and writes it to the provided filepath.

Parameters:

state (us.State) – State for which we retrieve data.

Geometry

Provides ease-of-use functionality for geographic and geometric operations.

gerrytools.geometry.arealoverlap(left: GeoDataFrame, right: GeoDataFrame, assignment: str = 'DISTRICT', crs=None) DataFrame[source]

Given two GeoDataFrames, each encoding districting plans, computes the areal overlap between each pair of districts. left is the districting plan to be relabeled (e.g. a proposed districting plan) and right is the districting plan with district labels we’re trying to match (e.g. an enacted districting plan). If left (denoted \(L\)) has \(n\) districts and right (denoted \(R\)) has \(m\) districts, an \(n \times m\) matrix \(C\) is computed, where the entry \(C_{ij}\) represents the area of the intersection of the districts \(L_i\) and \(R_j\). \(C\) is represented as a pandas DataFrame, where the row indices are the labels in left, and are the preimage of the label mapping; column indices are the labels in right, and are the image of the label mapping.

Parameters:
  • left (pd.DataFrame) – GeoDataFrame whose labels are the preimage of the relabeling.

  • right (pd.DataFrame) – GeoDataFrame whose labels are the image of the relabeling.

  • assignment (str) – Column on left and right which contains the district identifier.

Returns:

Cost matrix \(C\), represented as a DataFrame.

gerrytools.geometry.calculate_dispersion(units: GeoDataFrame, enacted_col: str, proposed_col: str, pop_col: str) int[source]

Calculates core dispersion in a state given a column with enacted districts and a column with proposed numberings. Used in WI.

Parameters:
  • units – The units to optimize on. E.g. Census blocks.

  • enacted_col – The column in the GeoDataFrame with the enacted districts.

  • proposed_col – The column in the GeoDataFrame with the proposed districts.

  • pop_col – The column in the GeoDataFrame with population counts.

Returns:

An integer of the absolute number of people who changed districts.
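
As a rough illustration of the quantity returned, the sketch below sums the population of units whose proposed label differs from their enacted label. It uses plain dictionaries rather than a GeoDataFrame, the name `count_moved_population` is hypothetical, and whether the library computes dispersion in exactly this way is an assumption.

```python
def count_moved_population(units, enacted_col, proposed_col, pop_col):
    """Sum the population of units whose district label changed."""
    return sum(
        unit[pop_col]
        for unit in units
        if unit[enacted_col] != unit[proposed_col]
    )


units = [
    {"ENACTED": "1", "PROPOSED": "1", "TOTPOP20": 100},
    {"ENACTED": "1", "PROPOSED": "2", "TOTPOP20": 40},
    {"ENACTED": "2", "PROPOSED": "2", "TOTPOP20": 75},
]
moved = count_moved_population(units, "ENACTED", "PROPOSED", "TOTPOP20")
```

Only the second unit changes districts here, so the count depends entirely on how the proposed districts are numbered; that is why relabeling can reduce it.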

gerrytools.geometry.dataframe(P: Partition, index: str = 'id', assignment: str = 'DISTRICT', columns: list = None) DataFrame[source]

Converts a Partition into a DataFrame.

Parameters:
  • P (Partition) – GerryChain Partition object to have its data framed.

  • index (str, optional) – Graph attribute to use as an index. The networkx default name is “id”.

  • assignment (str, optional) – Column name for assignment.

  • columns (list, optional) – List of columns to add to the dataframe, not including the index. If None (or another falsy value), gets all columns.

Returns:

DataFrame with attached graph data.

gerrytools.geometry.dispersion_updater_closure(units: GeoDataFrame, enacted_col: str, pop_col: str, verbose: bool = False)[source]

An updater to calculate best possible dispersion for a gerrychain.Partition object.

Parameters:
  • units – The units to optimize on. E.g. Census blocks.

  • enacted_col – The column in the GeoDataFrame with the enacted districts.

  • proposed_col – The column in the GeoDataFrame with the proposed districts.

  • extra_constraints – Optional; A function that can add extra constraints to the model, such as parity (in the case of WI).

  • verbose – If true, do not suppress solver output. Otherwise, stay quiet.

Returns:

An updater that calculates the minimal core dispersion of a Partition object.

gerrytools.geometry.dissolve(geometries, by='DISTRICTN', reset_index=True, keep=[], aggfunc='sum') GeoDataFrame[source]

Dissolves geometries on the column by. Intended to dissolve a set of source geometries (e.g. VTDs, blocks, block groups, etc.) to district geometries.

Parameters:
  • geometries (GeoDataFrame) – Set of geometries to be dissolved.

  • by (str) – Name of the column used to group objects.

  • reset_index (boolean, optional) – If true, the index of the resulting GeoDataFrame will be set to an integer index, not by. Defaults to True.

  • keep (list, optional) – Additional columns to keep beyond the geometry and by columns. Defaults to an empty list, so no additional columns are kept.

  • aggfunc (str, optional) – Pandas groupby function type when aggregating; defaults to “sum”.

Returns:

A GeoDataFrame containing dissolved geometries and kept columns computed by the function designated by aggfunc.

gerrytools.geometry.dualgraph(geometries, index=None, geometrycolumn='geometry', colmap={}, buffer=0, edges_to_add=[], edges_to_cut=[]) Graph[source]

Generates a graph dual to the provided geometric data.

Parameters:
  • geometries (GeoDataFrame) – Geometric data represented as a GeoDataFrame.

  • index (str, optional) – Unique identifiers; indexing column of geometries. If this value is not set, vertex labels are integer indices; otherwise, vertex labels are the values of this column. Defaults to None.

  • colmap (dict, optional) – Maps old column names to new column names.

  • buffer (float, optional) – Geometric buffer distance; defaults to 0.

  • edges_to_add (list, optional) – Edges to add to the graph object. Assumed to be a list of pairs of objects, e.g. [(u, v), …] where u and v are vertex labels consistent with index.

  • edges_to_cut (list, optional) – Edges to cut from the graph object. Assumed to be a list of pairs of objects, e.g. [(u, v), …] where u and v are vertex labels consistent with index.

Returns:

A gerrychain Graph object dual to the geometric data.

gerrytools.geometry.invert(unitmap: Dict[A, B]) Dict[B, List[A]][source]

Inverts the provided unit mapping.

Parameters:

unitmap – Dictionary taking source unique identifiers to target unique identifiers.

Returns:

A dictionary mapping target unique identifiers to _lists_ of source unique identifiers.
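
The inversion can be sketched in a few lines of plain Python. This is an illustration of the idea rather than the library's implementation, and the name `invert_mapping` is hypothetical.

```python
from collections import defaultdict


def invert_mapping(unitmap):
    """Map each target identifier to the list of sources that point to it."""
    inverse = defaultdict(list)
    for source, target in unitmap.items():
        inverse[target].append(source)
    return dict(inverse)


blocks_to_vtds = {"b1": "v1", "b2": "v1", "b3": "v2"}
vtds_to_blocks = invert_mapping(blocks_to_vtds)
```

The values must be lists because several source units (e.g. blocks) can share one target unit (e.g. a VTD), so the inverse is one-to-many.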

gerrytools.geometry.minimize_dispersion(units: GeoDataFrame, enacted_col: str, proposed_col: str, pop_col: str, extra_constraints=None, verbose: bool = False) Dict[str, str][source]

Minimize core dispersion in a state given a column with enacted districts and a column with proposed numberings. Returns a dictionary relabeling the proposed cols. Used in WI. Assumes that district labels are 1-indexed.

Parameters:
  • units – The units to optimize on. E.g. Census blocks.

  • enacted_col – The column in the GeoDataFrame with the enacted districts.

  • proposed_col – The column in the GeoDataFrame with the proposed districts.

  • pop_col – The column in the GeoDataFrame with population counts.

  • extra_constraints – Optional; A function that can add extra constraints to the model, such as parity (in the case of WI).

  • verbose – If true, do not suppress solver output. Otherwise, stay quiet.

Returns:

A dictionary mapping proposed labels to optimized labels.

gerrytools.geometry.minimize_dispersion_with_parity(units: GeoDataFrame, enacted_col: str, proposed_col: str, pop_col: str, extra_constraints=None) Dict[str, str][source]

Minimize dispersion and odd->even parity shift in a state given a column with enacted districts and a column with proposed numberings. Returns a dictionary relabeling the proposed cols. Used in WI. Assumes that district labels are 1-indexed.

Parameters:
  • units – The units to optimize on. E.g. Census blocks.

  • enacted_col – The column in the GeoDataFrame with the enacted districts.

  • proposed_col – The column in the GeoDataFrame with the proposed districts.

  • pop_col – The column in the GeoDataFrame with population counts.

  • extra_constraints – Optional; A function that can add extra constraints to the model, such as parity (in the case of WI).

Returns:

A dictionary mapping proposed labels to optimized labels.

gerrytools.geometry.minimize_parity(units: GeoDataFrame, enacted_col: str, proposed_col: str, pop_col: str, verbose: bool = False) Dict[str, bool][source]

Minimize odd->even parity shift in a state given a column with enacted districts and a column with proposed numberings. Returns a dictionary with the parity of the proposed cols. Used in WI. Assumes that district labels are 1-indexed.

Parameters:
  • units – The units to optimize on. E.g. Census blocks.

  • enacted_col – The column in the GeoDataFrame with the enacted districts.

  • proposed_col – The column in the GeoDataFrame with the proposed districts.

  • pop_col – The column in the GeoDataFrame with population counts.

  • verbose – If true, do not suppress solver output. Otherwise, stay quiet.

Returns:

A dictionary mapping proposed labels to boolean values representing the optimal parity (True if even, False if odd).

gerrytools.geometry.optimalrelabeling(left: Any, right: Any, maximize: bool = True, costmatrix: Callable = <function populationoverlap>) dict[source]

Finds the optimal relabeling for two districting plans.

Parameters:
  • left (Any) – Data structure which can be passed to costmatrix to construct a cost matrix. District labels will be the preimage of the relabeling. If the default costmatrix function is used, these must be pandas DataFrames where one row corresponds to one atomic unit (e.g. Census blocks), with at least three columns: one denoting a unique geometric identifier (e.g. GEOID20), one denoting the districting assignment, and another denoting the population of choice. If gerrytools.geometry.optimize.arealoverlap() is used, these must be GeoDataFrames where one row corresponds to one district, and one column denotes the districts’ unique identifiers.

  • right (Any) – Data structure which can be passed to costmatrix to construct a cost matrix. District labels will be the image of the relabeling. If the default costmatrix function is used, these must be pandas DataFrames where one row corresponds to one atomic unit (e.g. Census blocks), with at least three columns: one denoting a unique geometric identifier (e.g. GEOID20), one denoting the districting assignment, and another denoting the population of choice. If gerrytools.geometry.optimize.arealoverlap() is used, these must be GeoDataFrames where one row corresponds to one district, and one column denotes the districts’ unique identifiers.

  • maximize (bool) – Are we finding the largest or smallest linear sum over the cost matrix? Defaults to maximize=True.

  • costmatrix (Callable) – The function (or partial function) which consumes left and right and spits out a cost matrix. This cost matrix is assumed to be a pandas DataFrame, with row indices old district labels and column names new district labels. Examples of these are gerrytools.geometry.optimize.populationoverlap() and gerrytools.geometry.optimize.arealoverlap().

Returns:

A dictionary which maps district labels in left to district labels in right, according to the weighting scheme applied in costmatrix.

This is an assignment problem and is equivalently a (min/max)imal bipartite matching problem. Consider two districting plans \(L\) and \(R\), with \(n\) and \(m\) districts respectively. Set \(V_L\) and \(V_R\) to be sets of vertices such that a vertex \(l_i\) in \(V_L\) corresponds to the district \(L_i\) in \(L\), and similarly for vertices \(r_j\) in \(V_R\); draw edges \((l_i, r_j)\) for each \(i\) from \(1\) to \(n\), and each \(j\) from \(1\) to \(m\). In doing so, we construct the complete bipartite graph <https://bit.ly/39rDldy> \(K_{n,m}\).

We then assign each edge a weight according to some function \(f: L \times R \to \mathbb{R}\), which consumes a pair of districts and returns a number. For example, this function could be the amount of area shared by the districts \(L_i\) and \(R_j\), like in gerrytools.geometry.optimize.arealoverlap(), or the amount of population the districts share, like in gerrytools.geometry.optimize.populationoverlap().

We then seek to find the set of weighted edges \(M\) such that all vertices \(l_i\) and \(r_j\) appear at most once in \(M\), and that the sum of \(M\)’s weights is as small (or as large) as possible. To do so, we take the adjacency matrix \(A\) of our graph \(K_{n,m}\), where the \((i, j)\)th entry records the weight of the edge \((l_i, r_j)\). Then, we want to select at most one entry in each row and column, and ensure those entries have the smallest (or greatest) possible sum. Using the Jonker-Volgenant algorithm (as implemented by scipy), we can find the row and column indices of these entries, and retrieve the district label pairs corresponding to each. The algorithm achieves \(\textbf{O}(N^3)\) worst-case running time, where \(N = \max(n, m)\).
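
To make the matching concrete, the sketch below solves a tiny instance by brute force over all possible label pairings. Note the swap: this is exhaustive search, not the Jonker-Volgenant algorithm, and it takes factorial time, so it is only suitable for a handful of districts. The name `best_relabeling` and the nested-dict cost matrix are assumptions made for the example.

```python
from itertools import permutations


def best_relabeling(cost, maximize=True):
    """Brute-force the assignment problem on a nested-dict cost matrix.

    cost[l][r] is the weight of the edge between left label l and right
    label r. Assumes at least as many right labels as left labels.
    """
    left = list(cost)
    right = list(cost[left[0]])
    pick = max if maximize else min
    # Try every way of giving each left label a distinct right label.
    best = pick(
        permutations(right, len(left)),
        key=lambda choice: sum(cost[l][r] for l, r in zip(left, choice)),
    )
    return dict(zip(left, best))


overlap = {
    "A": {"1": 10, "2": 90},
    "B": {"1": 80, "2": 20},
}
relabeling = best_relabeling(overlap)
smallest = best_relabeling(overlap, maximize=False)
```

Here district A shares most of its weight with district 2 and B with district 1, so maximizing pairs them that way, while minimizing picks the opposite pairing.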

gerrytools.geometry.populationoverlap(left: DataFrame, right: DataFrame, identifier: str = 'GEOID20', population: str = 'TOTPOP20', assignment: str = 'DISTRICT') DataFrame[source]

Given two unit-level DataFrames (i.e. two dataframes where each row represents an atomic unit like Census blocks or VTDs, and each row contains a district assignment), computes the amount of population shared by each pair of districts. left is the districting plan to be relabeled (e.g. a proposed districting plan) and right is the districting plan with district labels we’re trying to match (e.g. an enacted districting plan). If left (denoted \(L\)) has \(n\) districts and right (denoted \(R\)) has \(m\) districts, an \(n \times m\) matrix \(C\) is computed, where the entry \(C_{ij}\) represents the population shared by the districts \(L_i\) and \(R_j\). \(C\) is represented as a pandas DataFrame, where the row indices are the labels in left, and are the preimage of the label mapping; column indices are the labels in right, and are the image of the label mapping.

Parameters:
  • left (pd.DataFrame) – DataFrame whose labels are the preimage of the relabeling.

  • right (pd.DataFrame) – DataFrame whose labels are the image of the relabeling.

  • identifier (str) – Column on left and right which contains the unique identifier for each unit.

  • population (str) – Column on left and right which contains the population total for each unit. This can be modified to be any population.

  • assignment (str) – Column on left and right that denotes district membership.

Returns:

A DataFrame whose row names are the preimage of the relabeling, column names are the image of the relabeling, and values edge weights; a cost matrix.
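
The shape of this cost matrix can be illustrated without pandas. The sketch below is not the library's implementation: the name `population_overlap` is hypothetical, rows are plain dictionaries, the result is a nested dict in which zero entries are simply absent, and every unit in left is assumed to appear in right.

```python
from collections import defaultdict


def population_overlap(left, right, identifier="GEOID20",
                       population="TOTPOP20", assignment="DISTRICT"):
    """Return C[l][r]: population in both left district l and right district r."""
    right_district = {row[identifier]: row[assignment] for row in right}
    matrix = defaultdict(lambda: defaultdict(int))
    for row in left:
        l = row[assignment]
        r = right_district[row[identifier]]
        matrix[l][r] += row[population]
    return {l: dict(cols) for l, cols in matrix.items()}


proposed = [
    {"GEOID20": "001", "TOTPOP20": 50, "DISTRICT": "A"},
    {"GEOID20": "002", "TOTPOP20": 30, "DISTRICT": "A"},
    {"GEOID20": "003", "TOTPOP20": 20, "DISTRICT": "B"},
]
enacted = [
    {"GEOID20": "001", "TOTPOP20": 50, "DISTRICT": "1"},
    {"GEOID20": "002", "TOTPOP20": 30, "DISTRICT": "2"},
    {"GEOID20": "003", "TOTPOP20": 20, "DISTRICT": "2"},
]
shared = population_overlap(proposed, enacted)
```

Each unit contributes its population to exactly one cell, the one indexed by its left district and its right district, so the entries sum to the total population of the units in left.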

gerrytools.geometry.unitmap(source, target) dict[source]

Creates a mapping from source units to target units.

Parameters:
  • source (tuple) – 2-tuple containing a GeoDataFrame and an index name corresponding to the unique identifiers of the units, e.g. (vtds, “GEOID20”). Unique identifiers will be keys in the resulting dictionary.

  • target (tuple) – 2-tuple containing a GeoDataFrame and an index name corresponding to the unique identifiers of the units, e.g. (districts, “DISTRICTN”). Unique identifiers will be values in the resulting dictionary.

Returns:

A dictionary mapping source unique identifiers to target unique identifiers.

MGRP

Plotting

Makes pretty pictures of districting plans, dual graphs, histograms, boxplots, and violin plots 🎻.

gerrytools.plotting.arrow(ax, text, orientation='horizontal', color='#5c676f', padding=0.1) Axes[source]

For some partisan metrics, we want to draw an arrow showing where the POV-party’s advantage is. Depending on the orientation of the scores (histograms have scores arranged horizontally, violinplots have scores arranged vertically), we either place the arrow at the bottom left, pointing rightward, or in the middle of the y-axis, pointing up.

Parameters:
  • ax (Axes) – Axes object onto which the arrow’s plotted.

  • text (str) – String plotted on top of the arrow.

  • orientation (str, optional) – Direction the arrow’s pointing; acceptable values are “horizontal” and “vertical”. Defaults to “horizontal”.

  • color (str, optional) – Color of the arrow.

  • padding (float, optional) – Spacing between the arrow and its axis. Defaults to 0.1.

Returns:

matplotlib Axes.

gerrytools.plotting.bins(scores, width=None, labels=8) Tuple[array, List, List, float | int][source]

Get necessary information for histograms. If we’re working with only a few discrete, floating point values, then set the bin width to be relatively thin. Otherwise, adaptively set the bin width to the scale of our data.

Parameters:
  • scores (list) – The collection of all observations.

  • width (int, optional) – The width of the bins.

  • labels (int, optional) – The number of bins to be labeled.

Returns:

A tuple consisting of the histogram bins, the bins that are ticked, the labels for the bins that are ticked, and the bin width.

gerrytools.plotting.boxplot(ax, scores, xticklabels=None, labels=None, proposed_info={}, percentiles=(1, 99), rotation=0, ticksize=12, jitter=0.3333333333333333) Axes[source]

Plot boxplots from scores, a dictionary in which each value (corresponding to an ensemble, a citizens’ ensemble, or proposed plans) is a list of lists and each sublist becomes its own box. Proposed scores are plotted as colored circles on their respective boxes. Boxplots are colored according to the kind of scores (ensemble or citizen), and each sublist is trimmed to the values between the specified percentiles.

Parameters:
  • ax (Axes) – Axes object on which the boxplots are plotted.

  • scores (dict) – Dictionary with keys of ensemble, citizen, proposed which map to lists of numerical scores.

  • proposed_info (dict, optional) – Dictionary with keys of colors, names; the \(i\)th color in colors corresponds to the \(i\)th name in names.

  • percentiles (tuple, optional) – Observations outside this range of percentiles are ignored. Defaults to (1, 99), such that observations between the 1st and 99th percentiles (inclusive) are included, and all others are ignored.

  • rotation (float, optional) – Tick labels are rotated rotation degrees _counterclockwise_.

  • ticksize (float, optional) – Font size for tick labels.

  • jitter (float, optional) – When there is more than one proposed plan, adjust its detail points by a value drawn from \(\mathcal{U}(-\epsilon, \epsilon)\), where \(\epsilon =\) jitter.

  • labels (list, optional) – x- and y-axis labels, if desired.

  • xticklabels (list, optional) – Labels for the boxes, default to integers.

Returns:

Axes object on which the boxplots are plotted.
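
The percentile trimming and jitter described above can be sketched in plain Python. This is an illustration of the idea only, not the library's implementation: the helper names (percentile, trim, jittered) are hypothetical, and the linear-interpolation percentile used here is an assumption that may differ from the method the library uses.

```python
import random


def percentile(sorted_values, q):
    """Linearly interpolated q-th percentile (0 <= q <= 100) of a sorted list."""
    position = (len(sorted_values) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def trim(values, percentiles=(1, 99)):
    """Keep only observations between the two percentiles, inclusive."""
    ordered = sorted(values)
    low = percentile(ordered, percentiles[0])
    high = percentile(ordered, percentiles[1])
    return [v for v in values if low <= v <= high]


def jittered(position, jitter=1 / 3, rng=random):
    """Shift a marker's position by a draw from U(-jitter, jitter)."""
    return position + rng.uniform(-jitter, jitter)


# Hypothetical scores: one sublist per box.
scores = {"ensemble": [list(range(101)), list(range(50, 151))]}
trimmed = [trim(sublist) for sublist in scores["ensemble"]]
```

With the default (1, 99), the extreme observations in each sublist are dropped before the box is drawn, so a handful of outliers does not stretch the axis.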

gerrytools.plotting.choropleth(geometries, districts=None, assignment=None, demographic_share_col=None, overlays=[], cmap='Purples', cbartitle=None, numbers=False, base_lw=0.125, base_linecolor='lightgray', district_linecolor='black', overlay_lw=0.25, district_lw=1.5, fontsize=15, min=0, max=1, interval=0.1, colorbar=True, figsize=(10, 10)) Axes[source]

Visualization of population shares or totals in a state’s map.

Parameters:
  • geometries (GeoDataFrame) – Base geometries for the state. Population shares or totals will be drawn at this level (i.e. statistics are reported at this base geometric level).

  • districts (GeoDataFrame, optional) – Geometries for the districting plan. Assumes one geometry per district.

  • assignment (str, optional) – Required argument when districts are provided. Column of districts which defines the districting plan.

  • demographic_share_col (str, optional) – The string representing the demographic to be shown on the map. The string should specify a column in geometries. This column must contain values in \([0,1]\).

  • overlays (list, optional) – A list of GeoDataFrames to be overlaid on the map. Some options would include overlaying district assignments, blocks, VTDs, or counties. The first set of geometries in the list will be overlaid in the lightest color, and the last will be overlaid in the darkest color.

  • cmap (string/ListedColormap, optional) – Defines which colormap to use. Defaults to matplotlib’s Purples colormap. Can be a string which specifies a named matplotlib colormap or a ListedColormap with the appropriate number of bins; by default, this is 10.

  • cbartitle (string, optional) – Title for the colorbar. Defaults to demographic.

  • numbers (bool, optional) – If True, plot district names (as defined by assignment) at districts’ centroids. May only be True when districts is not None.

  • base_lw (float, optional) – The base geometries’ line widths.

  • min (float, optional) – The lower limit of the data points; defaults to 0.

  • max (float, optional) – The upper limit of the data points; defaults to 1.

  • interval (float, optional) – The width of the interval; a bin.

  • colorbar (bool, optional) – Do we include the color bar?

Returns:

A matplotlib Axes object visualizing a choropleth map with the provided overlays.

gerrytools.plotting.districtnumbers(base, districts, assignment='DISTRICTN', boxstyle='circle,pad=0.2', fc='wheat', ec='black', lw=0.16666666666666666, fontsize=15) Axes[source]

Plots district numbers on top of overlaid district geometries.

TODO: change (x,y) coordinate pairs to representative points rather than centroids.

Parameters:
  • base (Axes) – Base Axes object for the plot.

  • districts (GeoDataFrame) – Geometries for the districting plan. Assumes there is one geometry for each district.

  • assignment (str, optional) – Column of districts which defines the districting plan.

  • boxstyle (str, optional) – Sets the box style for the district number markers. Defaults to circles with 0.2pt padding.

  • fc (str, optional) – District marker face color. Defaults to “wheat”.

  • ec (str, optional) – District marker edge color. Defaults to “black”.

  • lw (float, optional) – District marker edge width. Defaults to 1/6pt.

  • fontsize (float, optional) – District marker font size. Defaults to 15pt.

Returns:

Base axes object.

gerrytools.plotting.districtr(N)[source]

gerrytools.plotting.drawgraph(G, ax=None, x='INTPTLON20', y='INTPTLAT20', components=False, node_size=1, **kwargs) Axes | List[Tuple[Figure, Axes]][source]

Draws a gerrychain Graph object. Returns a single Axes object (for dual graphs drawn whole) and lists of (Figure, Axes) pairs for graphs drawn component-wise.

Parameters:
  • G (Graph) – The dual graph to draw.

  • ax (Axes, optional) – matplotlib.axes.Axes object. If not passed, one is created.

  • x (str, optional) – Vertex property used as the horizontal (E-W) coordinate.

  • y (str, optional) – Vertex property used as the vertical (N-S) coordinate.

  • components (bool, optional) – If True, the graph is assumed to have more than one connected component (e.g. Michigan) and is drawn component-wise; rather than a single Axes object, a list of (Figure, Axes) pairs is returned. If something is passed to ax, the same Axes instance is used for each new Figure.

  • node_size (float, optional) – Specifies the default size of a vertex.

  • kwargs (dict, optional) – Arguments to be passed to nx.draw().

Returns:

A single matplotlib Axes object or, if components is True, a list of (Figure, Axes) pairs, one for each component.

gerrytools.plotting.drawplan(districts, assignment, overlays=[], colors=None, numbers=False, lw=0.5, fontsize=15, edgecolor='black') Axes[source]

Visualizes the districting plan defined by assignment.

Parameters:
  • districts (GeoDataFrame) – Geometries for the districting plan. Assumes there is one geometry for each district.

  • assignment (str) – Column of districts which defines the districting plan.

  • overlays (list, optional) – A list of GeoDataFrames to be plotted over the districts.

  • colors (str, optional) – Column name which specifies colors for each district.

  • numbers (bool, optional) – If True, plots district names (as defined by assignment) at districts’ centroids. Defaults to False.

  • lw (float, optional) – Line thickness if there are more than 20 districts.

  • fontsize (float, optional) – District-number font size; passed to districtnumbers.

Returns:

A matplotlib Axes object for the geometries attached to districts.

gerrytools.plotting.flare(n) list[source]

Returns a list of colors based on the flare Matplotlib/seaborn colormap.

Parameters:

n (int) – Number of colors to generate.

Returns:

List of RGB triples.

gerrytools.plotting.gif_multidimensional(data, proposed_info={}, labels=['X values', 'Y values', 'Histogram values'], filename='testfile', folder='test', limits=None, DPI=150, figsize=(12, 8))[source]

Plot many multidimensional figures in their own {folder}/{filename}/ directory. Each file will represent one ensemble of plans, and this will be stitched together to create a gif.

Parameters:
  • data (dict) – Dictionary with keys of xs, ys, hists, each one a list of length number of frames/ensembles, where each element is a list of all values in the ensemble.

  • proposed_info (dict, optional) – Dictionary with keys of colors, names, x, y, hist; the \(i\)th color in colors corresponds to the \(i\)th name in names, which corresponds to the \(i\)th value in x, y, and hist.

  • filename (str) – Name for the final gif, and for the folder the gif’s frames are stored in.

  • folder (str) – Folder containing all the frames/gifs.

  • figsize (tuple, optional) – Figure size.

Returns: None.

gerrytools.plotting.histogram(ax, scores, label=None, limits=(), proposed_info={}, ticksize=12, fontsize=24, jitter=False, bin_width=None) Axes[source]

Plot a histogram with the ensemble scores in bins and the proposed plans’ scores as vertical lines. If there are many unique values, use a white border on the bins to distinguish them; otherwise, reduce the bin width to 80%.

TODO: refactor proposed_info later to use more python builtin tools.

Parameters:
  • ax (Axes) – Axes object on which the histogram is plotted.

  • scores (dict) – Dictionary with keys of ensemble, citizen, proposed which map to lists of numerical scores.

  • label (str, optional) – String for x-axis label.

  • limits (tuple, optional) – X-axis limits (specify to force histogram to extend to these limits).

  • proposed_info (dict, optional) – Dictionary with keys of colors, names; the \(i\)th color in colors corresponds to the \(i\)th name in names.

  • ticksize (float, optional) – Font size of tick labels.

  • fontsize (float, optional) – Font size of x-axis label.

  • jitter (bool, optional) – If True, horizontally jitter proposed plans if they share the same value.

  • bin_width (float, optional) – Manually set histogram bin width, if preferred.

Returns:

Axes object on which the histogram is plotted.

gerrytools.plotting.ideal(ax, label, placement, orientation, color='#5c676f', alpha=0.1)[source]

Adds a vertical line, horizontal line, or band indicating the ideal value (or range of values) for the provided score.

Parameters:
  • ax (Axes) – Axes object onto which the line’s plotted.

  • label (str) – Label for the ideal score.

  • placement (float,tuple) – If plotting a line, a single value; if plotting a band, a tuple of (start, end) values.

  • orientation (str) – Indicates the direction of the line or band. Acceptable values are “horizontal” or “vertical”.

  • color (str, optional) – Color of the line or band. Defaults to defaultGray.

  • alpha (float, optional) – Opacity of the line or band. Defaults to 0.1.

gerrytools.plotting.multidimensional(x, y, hist, labels=['X values', 'Y values', 'Histogram values'], bin_width=1, limits=None, proposed_info={}, figsize=(12, 8)) Tuple[Axes, Axes][source]

Plot a multidimensional figure, comparing two metrics as a scatterplot above and one metric as a histogram, below.

Parameters:
  • ax (Axes) – Axes object on which the histogram is plotted.

  • x (list) – Score on the x-axis of the scatterplot.

  • y (list) – Score on the y-axis of the scatterplot.

  • hist (list) – Score to be plotted as a histogram below.

  • limits (list, optional) – x, y, and histogram limits, if wanted.

  • proposed_info (dict, optional) – Dictionary with keys of colors, names, x, y, hist; the \(i\)th color in colors corresponds to the \(i\)th name in names, which corresponds to the \(i\)th value in x, y, and hist.

  • figsize (tuple, optional) – Figure size.

Returns:

The scatterplot and histogram axes.

gerrytools.plotting.purples(n) list[source]

Returns a list of colors based on the Purples Matplotlib/seaborn colormap.

Parameters:

n (int) – Number of colors to generate.

Returns:

List of RGB triples.

gerrytools.plotting.redbluecmap(n) List[Tuple][source]

Generates a red/white/blue color palette in n colors with white at the middle index.

Parameters:

n (int) – The number of colors to generate.

Returns:

List of RGB tuples.
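
A diverging palette of this kind can be built by linear interpolation. The sketch below is a hypothetical stand-in written for illustration (red_white_blue is not part of the package); the endpoint colors and the direction (red first, blue last) are assumptions, since they are not specified above.

```python
def red_white_blue(n):
    """Return n RGB tuples running from red through white to blue."""
    red, white, blue = (1.0, 0.0, 0.0), (1.0, 1.0, 1.0), (0.0, 0.0, 1.0)
    if n == 1:
        return [white]

    def blend(start, end, t):
        # Linear interpolation between two RGB tuples, t in [0, 1].
        return tuple(s + t * (e - s) for s, e in zip(start, end))

    colors = []
    for i in range(n):
        t = 2 * i / (n - 1)  # 0 at the red end, 1 at white, 2 at the blue end
        colors.append(blend(red, white, t) if t <= 1 else blend(white, blue, t - 1))
    return colors


palette = red_white_blue(5)
```

For odd n, pure white lands exactly on the middle index; for even n, the two central colors are equally faint tints of red and blue.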

gerrytools.plotting.scatterplot(ax, x, y, labels=None, limits={}, bins=None, axis_range=None, show_legend=True) Axes[source]

Plot a scatterplot comparing two scores, with the proposed plans’ scores as points.

Parameters:
  • ax (Axes) – Axes object on which the scatterplot is plotted.

  • x (list) – Score on the x-axis. This will be a list of lists, where each sub-list corresponds to the scores for an individual plan.

  • y (list) – Score on the y-axis. This will be a list of lists, where each sub-list corresponds to the scores for an individual plan.

  • labels (list, optional) – Strings for x- and y-axis labels.

  • limits (tuple, optional) – Axis limits (specify to force plot to extend to these limits).

  • colors (list, optional) – A list of colors where the ith color corresponds to the ith score sub-list.

  • show_legend (bool, optional) – If True, show the legend. Generally helpful when trying to distinguish relationships between blocs within congressional districts, but can be cumbersome when there are many districts (e.g., 20+). Defaults to True.

Returns:

Axes object on which the scatterplot is plotted.

gerrytools.plotting.sealevel(ax, scores, num_districts, proposed_info, ticksize=12) Axes[source]

Plot a sea level plot: Each plan is a line across our elections on the x-axis, with Democratic vote share on the y-axis. The statewide Dem. vote share (proportionality) is plotted as a thick blue line.

Parameters:
  • ax (Axes) – Axes object on which the sea level plot is plotted.

  • scores (dict) – Dictionary with keys of each plan plus a statewide key for proportionality. Each value is another dictionary whose keys are elections and whose values are the number of seats.

  • proposed_info (dict, optional) – Dictionary with keys of colors, names; the \(i\)th color in colors corresponds to the \(i\)th name in names.

  • ticksize (float, optional) – Font size for tick labels.

gerrytools.plotting.violin(ax, scores, xticklabels=None, labels=None, proposed_info={}, percentiles=(1, 99), rotation=0, ticksize=12, jitter=0.3333333333333333) Axes[source]

Plot a violin plot from scores, a dictionary in which each value (corresponding to an ensemble, a citizens’ ensemble, or proposed plans) is a list of lists and each sublist becomes its own violin. Proposed scores are plotted as colored circles on their respective violins. Violins are colored according to the kind of scores (ensemble or citizen), and each sublist is trimmed to the values between the specified percentiles.

Parameters:
  • ax (Axes) – Axes object on which the violins are plotted.

  • scores (dict) – Dictionary with keys of ensemble, citizen, proposed which map to lists of numerical scores.

  • proposed_info (dict, optional) – Dictionary with keys of colors, names; the \(i\)th color in colors corresponds to the \(i\)th name in names.

  • percentiles (tuple, optional) – Observations outside this range of percentiles are ignored. Defaults to (1, 99), such that observations between the 1st and 99th percentiles (inclusive) are included, and all others are ignored.

  • rotation (float, optional) – Tick labels are rotated rotation degrees _counterclockwise_.

  • ticksize (float, optional) – Font size for tick labels.

  • jitter (float, optional) – When there is more than one proposed plan, adjust its detail points by a value drawn from \(\mathcal{U}(-\epsilon, \epsilon)\), where \(\epsilon =\) jitter.

  • labels (list, optional) – x- and y-axis labels, if desired.

  • xticklabels (list, optional) – Labels for the violins, default to integers.

Returns:

Axes object on which the violins are plotted.

Scoring

Basic functionality for evaluating districting plans.

gerrytools.scoring.aggregate_seats(election_cols: Iterable[str], party: str) Score[source]

Score representing how many total seats (districts) within a given plan the POV party won across elections.

Parameters:
  • election_cols (Iterable[str]) – The names of the election updaters over which to compute results.

  • party (str) – The “point of view” political party.

Returns:

A score object with name “aggregate_{party}_seats” and associated function that takes a partition and returns a PlanWideScoreValue for the total number of seats won by the POV party across elections.

gerrytools.scoring.competitive_contests(election_cols: Iterable[str], party: str, points_within: float = 0.03, alias: str = None) Score[source]

Score representing the number of competitive contests in a plan.

Parameters:
  • election_cols (Iterable[str]) – The names of the election updaters over which to compute results.

  • party (str) – The “point of view” political party.

  • points_within (float, optional) – The margin from 0.5 that is considered competitive. Default is 0.03, corresponding to a competitive range of 47%-53%.

Returns:

A score object with name “competitive_contests_0.03” and associated function that takes a partition and returns a PlanWideScoreValue for the number of competitive districts.
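
As a rough illustration of the competitive range, the sketch below counts contests whose POV-party vote share lies within points_within of 0.5. It is not the library's code: count_competitive and the election names are hypothetical, the shares are invented, and treating the boundary as inclusive is an assumption.

```python
def count_competitive(shares_by_election, points_within=0.03):
    """Count district contests whose POV-party share is within the margin of 0.5."""
    return sum(
        1
        for district_shares in shares_by_election.values()
        for share in district_shares.values()
        if abs(share - 0.5) <= points_within
    )


# Hypothetical two-party vote shares for the POV party: election -> district -> share.
shares = {
    "PRES20": {1: 0.52, 2: 0.61, 3: 0.46},
    "SEN18": {1: 0.49, 2: 0.58, 3: 0.39},
}
competitive = count_competitive(shares)
```

Here only district 1 falls inside the 47%-53% band, once in each election, so two contests are counted.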

gerrytools.scoring.contiguous(P: Partition) bool[source]

Determines whether the districting plan defined by the partition is contiguous.

Parameters:

P (Partition) – GerryChain Partition object.

Returns:

Whether the districting plan defined by the partition is contiguous.

gerrytools.scoring.convex_hull() Score[source]

Returns the convex-hull score for each district in a plan.

Returns:

A dictionary with districts as keys and convex-hull scores as values.

gerrytools.scoring.cut_edges() Score[source]

Returns the number of cut edges in a plan.

gerrytools.scoring.demographic_shares(population_cols: Mapping[str, Iterable[str]]) List[Score][source]

A list of scores representing subgroup population shares.

Parameters:

population_cols (Mapping[str, Iterable[str]]) – A mapping encoding the total population group divisor as well as the subgroups to create shares for. The mapping has the format \(\{P : [P_1, P_2, \ldots, P_k], \ldots\}\), where \(P\) is the population and \(P_i \subseteq P\) for all subgroups \(P_i\).

Returns:

A list of score objects named with the pattern “{column}_share” and with associated functions that take a partition and return a DistrictWideScoreValue for the demographic share of each district.
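
To make the mapping format concrete, the following sketch computes shares from per-district tallies using the same {total: [subgroups]} structure. The function shares, the column names, and the tallies are hypothetical; in the package itself the tallies come from a partition rather than from a plain dictionary.

```python
def shares(population_cols, district_totals):
    """Return {"<column>_share": {district: subgroup / total}} for each subgroup."""
    result = {}
    for total_col, subgroup_cols in population_cols.items():
        for col in subgroup_cols:
            result[f"{col}_share"] = {
                district: totals[col] / totals[total_col]
                for district, totals in district_totals.items()
            }
    return result


# Hypothetical column names and district tallies.
population_cols = {"VAP20": ["BVAP20", "HVAP20"]}
district_totals = {
    1: {"VAP20": 1000, "BVAP20": 250, "HVAP20": 100},
    2: {"VAP20": 800, "BVAP20": 400, "HVAP20": 200},
}
by_column = shares(population_cols, district_totals)
```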

gerrytools.scoring.demographic_tallies(population_cols: Iterable[str]) List[Score][source]

A list of scores representing population tallies.

Parameters:

population_cols (Iterable[str]) – The population column to create tallies for.

Returns:

A list of score objects named by “{column}” and with associated functions that take a partition and return a DistrictWideScoreValue for the demographic totals of each district.

gerrytools.scoring.demographic_updaters(demographic_keys: Iterable[str], aliases: Iterable[str] = None)[source]

gerrytools.scoring.deviations(P, popcolumn) dict[source]

Determines the districting plan’s population deviation percentages.

Parameters:
  • P (Partition) – GerryChain Partition object.

  • popcolumn (str) – Column for tallying the desired population.

Returns:

A dictionary which maps district names to population deviation percentages.
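
One common way to define population deviation is relative to the ideal district size, i.e. total population divided by the number of districts. The sketch below assumes that definition and reports signed fractions; deviation_percentages is a hypothetical helper, and whether the package scales by 100 or takes absolute values is not stated here.

```python
def deviation_percentages(populations):
    """Map each district to (population - ideal) / ideal, where ideal is the mean."""
    ideal = sum(populations.values()) / len(populations)
    return {d: (pop - ideal) / ideal for d, pop in populations.items()}


populations = {"1": 1050, "2": 1000, "3": 950}  # hypothetical district totals
deviations = deviation_percentages(populations)

# The largest absolute deviation is a convenient single-number summary.
largest = max(abs(dev) for dev in deviations.values())
```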

gerrytools.scoring.efficiency_gap(election_cols: Iterable[str], mean: bool = False) Score[source]

Score representing the efficiency gap metric of a plan with respect to a set of elections.

Parameters:
  • election_cols (Iterable[str]) – The names of the election updaters over which to compute results.

  • mean (bool) – Whether to return the mean of the score over all elections, or a dictionary of the score for each election.

Returns:

A score object with name “efficiency_gap” and associated function that takes a partition and returns a PlanWideScoreValue for efficiency gap metric.

gerrytools.scoring.eguia(election_cols: Iterable[str], party: str, graph: Graph, updaters: Mapping[str, Callable[[Partition], float | int | Mapping[int | str, float | int] | Mapping[str, float | int]]], county_col: str, totpop_col: str = 'population', mean: bool = False) Score[source]

Score representing the Eguia metric of a plan with respect to a set of elections.

Parameters:
  • election_cols (Iterable[str]) – The names of the election updaters over which to compute results.

  • party (str) – The “point of view” political party.

  • graph (gerrychain.Graph) – The underlying dual graph of a partition. Used to generate a plan of the counties.

  • updaters (Mapping[str, Callable[[gerrychain.Partition], ScoreValue]]) – A set of updaters that contains a tally for the total population by district and the election updaters whose names are listed in election_cols.

  • county_col (str) – The column name in the dual graph that encodes the county assignment of each unit.

  • totpop_col (str, optional) – The name of the updater that computes total population by district.

  • mean (bool) – Whether to return the mean of the score over all elections, or a dictionary of the score for each election.

Returns:

A score object with name “eguia” and associated function that takes a partition and returns a PlanWideScoreValue for the eguia metric.

gerrytools.scoring.gingles_districts(population_cols: Mapping[str, Iterable[str]], threshold: float = 0.5) List[Score][source]

A list of scores representing the number of districts where a sub-population share is above a given threshold. When the threshold is 50% these are commonly called Gingles’ Districts.

Parameters:

population_cols (Mapping[str, Iterable[str]]) – A mapping encoding the total population group divisor as well as the subgroups to create gingles district counters for. The mapping has the format \(\{P : [P_1, P_2, \ldots, P_k], \ldots\}\), where \(P\) is the population and \(P_i \subseteq P\) for all subgroups \(P_i\).

Returns:

A list of score objects named with the pattern “{column}_gingles_districts” and with associated functions that take a partition and return a PlanWideScoreValue for the number of districts above the population share threshold.
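
Given per-district shares for one subgroup, the count itself is a one-line threshold test. This sketch uses invented shares and a hypothetical helper name; reading "above" as a strict inequality is an assumption.

```python
def districts_above(shares, threshold=0.5):
    """Count districts whose subgroup share is strictly above the threshold."""
    return sum(1 for share in shares.values() if share > threshold)


# Hypothetical subgroup shares of the total population: district -> share.
subgroup_shares = {1: 0.55, 2: 0.50, 3: 0.31, 4: 0.62}
majority_districts = districts_above(subgroup_shares)
```

Under the strict reading, district 2 (exactly 50%) is not counted, so the result for these shares is two districts.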

gerrytools.scoring.max_deviation(totpop_col: str, pct: bool = False) Score[source]

Returns the maximum deviation from ideal population size among all the districts. If pct, return the deviation as a percentage of ideal population size.

Parameters:
  • totpop_col (str, optional) – The name of the updater that computes total population by district.

  • pct (bool) – Whether to return the maximum deviation as a count or as a percentage of ideal district size.

gerrytools.scoring.mean_median(election_cols: Iterable[str], mean: bool = False) Score[source]

Score representing the mean median metric of a plan with respect to a set of elections.

Parameters:
  • election_cols (Iterable[str]) – The names of the election updaters over which to compute results.

  • mean (bool) – Whether to return the mean of the score over all elections, or a dictionary of the score for each election.

Returns:

A score object with name “mean_median” and associated function that takes a partition and returns a PlanWideScoreValue for the mean median metric.
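
For a single election, the mean-median metric compares a party's median district vote share with its mean district vote share. The sketch below uses invented shares and a hypothetical helper; the sign convention (median minus mean, from the POV party's perspective) is an assumption and may be reversed in a given implementation.

```python
from statistics import mean, median


def mean_median_gap(shares):
    """Median minus mean of the POV party's district vote shares."""
    values = list(shares.values())
    return median(values) - mean(values)


# Hypothetical POV-party vote shares: district -> share.
shares = {1: 0.70, 2: 0.45, 3: 0.44, 4: 0.43, 5: 0.42}
gap = mean_median_gap(shares)
```

In this example the party's voters are concentrated in district 1, so its median district (0.44) sits below its mean (0.488) and the gap is negative.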

gerrytools.scoring.opp_party_districts(election_cols: Iterable[str], party: str) Score[source]

Score representing the number of districts in a plan that are always won by the opposition party over a set of elections. Note that this assumes that all elections are two-party races. In the case where elections have more than two parties running, this score represents the number of districts that are never won by the POV party.

Parameters:
  • election_cols (Iterable[str]) – The names of the election updaters over which to compute results.

  • party (str) – The “point of view” political party.

Returns:

A score object with name “opp_party_districts” and associated function that takes a partition and returns a PlanWideScoreValue for the number of safe opposition party districts.

gerrytools.scoring.partisan_bias(election_cols: Iterable[str], mean: bool = False) Score[source]

Score representing the partisan bias metric of a plan with respect to a set of elections.

Parameters:
  • election_cols (Iterable[str]) – The names of the election updaters over which to compute results.

  • mean (bool) – Whether to return the mean of the score over all elections, or a dictionary of the score for each election.

Returns:

A score object with name “partisan_bias” and associated function that takes a partition and returns a PlanWideScoreValue for partisan bias metric.

gerrytools.scoring.partisan_gini(election_cols: Iterable[str], mean: bool = False) Score[source]

Score representing the partisan gini metric of a plan with respect to a set of elections.

Parameters:
  • election_cols (Iterable[str]) – The names of the election updaters to compute results for.

  • party (str) – The “point of view” political party.

  • mean (bool) – Whether to return the mean of the score over all elections, or a dictionary of the score for each election.

Returns:

A score object with name “partisan_gini” and associated function that takes a partition and returns a PlanWideScoreValue for the partisan gini metric.

gerrytools.scoring.party_districts(election_cols: Iterable[str], party: str) Score[source]

Score representing the number of districts in a plan that are always won by the POV party over a set of elections.

Parameters:
  • election_cols (Iterable[str]) – The names of the election updaters over which to compute results.

  • party (str) – The “point of view” political party.

Returns:

A score object with name “party_districts” and associated function that takes a partition and returns a PlanWideScoreValue for the number of safe POV party districts.

gerrytools.scoring.party_wins_by_district(election_cols: Iterable[str], party: str) Score[source]

Score representing how many elections the POV party won in each district in a given plan.

Parameters:
  • election_cols (Iterable[str]) – The names of the election updaters over which to compute results.

  • party (str) – The “point of view” political party.

Returns:

A score object with name “party_wins_by_district” and associated function that takes a partition and returns a DistrictWideScoreValue for the number of elections won by the POV party in each district.

gerrytools.scoring.pieces(unit: str, names: bool = False, popcol: str = None, how: str = 'pandas', alias: str = None) Score[source]

Score representing the number of “unit pieces” produced by the plan. For example, consider a state with 100 counties. Suppose that one county is split twice, and another once. Then, there are 3 + 2 = 5 “pieces,” disregarding the counties kept whole.

Bear in mind that this calculates the number of unit splits, not the number of units split: for example, if a district divides a county into three pieces, the former reports two splits (as a unit divided into three pieces is cut twice), while the latter would report one split (as there is one county being split).

Parameters:
  • unit (str) – Data column; each assigns a vertex to a unit. Generally, these units are counties, VTDs, precincts, etc.

  • popcol (str, optional) – The population column on the Partition’s dual graph. If this is passed, then a unit is only considered “split” if the _populated_ base units end up in different districts.

  • how (str, optional) – How do we perform these calculations on the back end? Acceptable values are “pandas” and “gerrychain”; defaults to “pandas”.

  • names (bool, optional) – Whether we return the identifiers of the things being split.

Returns:

A score object with the name “{alias}_pieces” and associated function that takes a partition and returns a PlanWideScoreValue for the number of pieces.
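
The three related counts discussed above (pieces, unit splits, and units split) can be told apart with a small worked example. The sketch assumes a simplified model in which a unit contributes one piece per district it touches; split_counts and the county data are hypothetical and are not the package's implementation.

```python
def split_counts(districts_by_unit):
    """Return (pieces, unit splits, units split) for a unit -> districts mapping."""
    touched = [len(set(districts)) for districts in districts_by_unit.values()]
    split = [count for count in touched if count > 1]  # ignore units kept whole
    pieces = sum(split)
    unit_splits = sum(count - 1 for count in split)
    units_split = len(split)
    return pieces, unit_splits, units_split


# Hypothetical counties: A is cut twice (three pieces), B once (two pieces), C is whole.
districts_by_unit = {"A": [1, 2, 3], "B": [3, 4], "C": [4]}
counts = split_counts(districts_by_unit)
```

For these counties the function returns 5 pieces (3 + 2), 3 unit splits (2 + 1), and 2 units split, matching the arithmetic in the description.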

gerrytools.scoring.polsby_popper() Score[source]

Returns the polsby-popper score for each district in a plan.

Returns:

A dictionary with districts as keys and polsby-popper scores as values.
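
The Polsby-Popper score of a shape with area \(A\) and perimeter \(P\) is conventionally defined as \(4\pi A / P^2\), which equals 1 for a circle and decreases as the boundary becomes more contorted. A minimal sketch of that formula follows; polsby_popper_score is a hypothetical helper that takes raw measurements, whereas the package's score operates on district geometries.

```python
import math


def polsby_popper_score(area, perimeter):
    """Ratio of a shape's area to the area of a circle with the same perimeter."""
    return 4 * math.pi * area / perimeter ** 2


square = polsby_popper_score(area=1.0, perimeter=4.0)  # unit square
circle = polsby_popper_score(area=math.pi, perimeter=2 * math.pi)  # unit circle
```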

gerrytools.scoring.pop_polygon(block_gdf: GeoDataFrame, pop_col: str = 'TOTPOP20') Score[source]

Returns the population polygon compactness metric for each district in a plan.

Parameters:
  • block_gdf (GeoDataFrame) – Block level shapefile for the state.

  • pop_col (str) – Population column reflected in block_gdf and gdf.

Returns:

A dictionary with districts as keys and population polygon scores as values.

gerrytools.scoring.reock() Score[source]

Returns the reock score for each district in a plan.

Returns:

A dictionary with districts as keys and reock scores as values.

gerrytools.scoring.responsive_proportionality(election_cols: Iterable[str], party: str) Score[source]

Score representing the responsive proportionality across a set of elections.

Parameters:
  • election_cols (Iterable[str]) – The names of the election updaters over which to compute results.

  • party (str) – The “point of view” political party.

Returns:

A score object with name “responsive_proportionality” and associated function that takes a partition and returns a PlanWideScoreValue for the responsive proportionality across the elections.

gerrytools.scoring.schwartzberg() Score[source]

Returns the schwartzberg score for each district in a plan.

Returns:

A dictionary with districts as keys and schwartzberg scores as values.

gerrytools.scoring.seats(election_cols: Iterable[str], party: str, mean: bool = False) Score[source]

Score representing how many seats (districts) within a given plan the POV party won in each election.

Parameters:
  • election_cols (Iterable[str]) – The names of the election updaters over which to compute results.

  • party (str) – The “point of view” political party.

  • mean (bool) – Whether to return the mean of the score over all elections, or a dictionary of the score for each election.

Returns:

A score object with name “{party}_seats” and associated function that takes a partition and returns an ElectionWideScoreValue for the number of seats won by the POV party for each election.
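
The per-election seat count, its total across elections, and its mean can be illustrated with plain dictionaries. The sketch assumes two-party contests in which a district is won when the POV party's share exceeds one half; seats_by_election, the election names, and the shares are hypothetical.

```python
def seats_by_election(shares_by_election):
    """Count districts where the POV party's two-party share exceeds one half."""
    return {
        election: sum(1 for share in districts.values() if share > 0.5)
        for election, districts in shares_by_election.items()
    }


# Hypothetical POV-party vote shares: election -> district -> share.
shares = {
    "PRES20": {1: 0.52, 2: 0.61, 3: 0.46},
    "SEN18": {1: 0.49, 2: 0.58, 3: 0.39},
}
per_election = seats_by_election(shares)
aggregate = sum(per_election.values())  # total seats across all elections
average = aggregate / len(per_election)  # what mean=True would summarize
```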

gerrytools.scoring.simplified_efficiency_gap(election_cols: Iterable[str], party: str, mean: bool = False) Score[source]

Score representing the simplified efficiency gap metric of a plan with respect to a set of elections. The original formulation of efficiency gap quantifies the difference in “wasted” votes for the two parties across the state, as a share of votes cast. This is sensitive to turnout effects. The simplified score is equal to standard efficiency gap when the districts have equal turnout.

Parameters:
  • election_cols (Iterable[str]) – The names of the election updaters over which to compute results.

  • party (str) – The “point of view” political party.

  • mean (bool) – Whether to return the mean of the score over all elections, or a dictionary of the score for each election.

Returns:

A score object with name “efficiency_gap” and associated function that takes a partition and returns a PlanWideScoreValue for efficiency gap metric.
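
The relationship between the two formulations can be checked numerically. The sketch below assumes two-party contests, a sign convention in which positive values favor the POV party, and the commonly cited simplified form (seat margin minus twice the vote margin); the function names and vote counts are hypothetical and this is not the package's implementation.

```python
def wasted_vote_gap(votes):
    """Efficiency gap from wasted votes; votes maps district -> (pov, opponent)."""
    wasted_pov = wasted_opp = total = 0
    for pov, opp in votes.values():
        threshold = (pov + opp) / 2  # votes needed to win, ignoring ties
        if pov > opp:
            wasted_pov += pov - threshold  # winner wastes its surplus
            wasted_opp += opp  # loser wastes every vote
        else:
            wasted_pov += pov
            wasted_opp += opp - threshold
        total += pov + opp
    return (wasted_opp - wasted_pov) / total


def simplified_gap(votes):
    """Seat margin minus twice the vote margin, using only shares."""
    seat_share = sum(1 for pov, opp in votes.values() if pov > opp) / len(votes)
    all_votes = sum(pov + opp for pov, opp in votes.values())
    vote_share = sum(pov for pov, _ in votes.values()) / all_votes
    return (seat_share - 0.5) - 2 * (vote_share - 0.5)


# Hypothetical results: every district casts 100 votes in the first plan only.
equal_turnout = {1: (60, 40), 2: (55, 45), 3: (45, 55), 4: (30, 70)}
unequal_turnout = {1: (120, 80), 2: (55, 45), 3: (45, 55), 4: (30, 70)}
```

With equal turnout both functions give approximately 0.05. Doubling turnout in district 1 moves the wasted-vote version to about 0.1 while the simplified version drops to 0, which is the turnout sensitivity the description refers to.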

gerrytools.scoring.splits(unit: str, names: bool = False, popcol: str = None, how: str = 'pandas', alias: str = None) Score[source]

Score representing the number of units split by the districting plan.

Bear in mind that this calculates the number of unit splits, not the number of units split: for example, if a district divides a county into three pieces, the former reports two splits (as a unit divided into three pieces is cut twice), while the latter would report one split (as there is one county being split).

Parameters:
  • unit (str) – Data column that assigns each vertex to a unit. Generally, these units are counties, VTDs, precincts, etc.

  • popcol (str, optional) – The population column on the Partition’s dual graph. If this is passed, then a unit is only considered “split” if the _populated_ base units end up in different districts.

  • how (str, optional) – Which back end performs the calculation. Acceptable values are “pandas” and “gerrychain”; defaults to “pandas”.

  • names (bool, optional) – Whether we return the identifiers of the things being split.

Returns:

A score object with the name “{alias}_splits” and associated function that takes a partition and returns a PlanWideScoreValue for the number of splits.
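The distinction between “unit splits” and “units split” can be sketched in plain Python. The sketch assumes that a “piece” of a unit corresponds to a distinct district touching that unit, which is an assumption about the counting rule rather than a description of the library’s internals. The helper name count_splits_sketch and the sample data are hypothetical.

```python
from collections import defaultdict


def count_splits_sketch(unit_of, district_of):
    """Hypothetical helper: return (unit splits, units split).

    unit_of maps each base unit to its containing unit (e.g. a county);
    district_of maps each base unit to its district.
    """
    districts_in_unit = defaultdict(set)
    for node, unit in unit_of.items():
        districts_in_unit[unit].add(district_of[node])

    # A unit touched by k districts is cut k - 1 times.
    unit_splits = sum(len(d) - 1 for d in districts_in_unit.values())
    # A unit counts as "split" if more than one district touches it.
    units_split = sum(1 for d in districts_in_unit.values() if len(d) > 1)
    return unit_splits, units_split


# County X is divided among three districts; County Y is kept whole.
unit_of = {"a": "County X", "b": "County X", "c": "County X", "d": "County Y"}
district_of = {"a": 1, "b": 2, "c": 3, "d": 1}
unit_splits, units_split = count_splits_sketch(unit_of, district_of)
```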

gerrytools.scoring.stable_proportionality(election_cols: Iterable[str], party: str) -> Score

Score representing the stable proportionality across a set of elections.

Parameters:
  • election_cols (Iterable[str]) – The names of the election updaters over which to compute results.

  • party (str) – The “point of view” political party.

Returns:

A score object with name “stable_proportionality” and associated function that takes a partition and returns a PlanWideScoreValue for the stable proportionality across the elections.

gerrytools.scoring.summarize(part: Partition, scores: Iterable[Score], gdf: GeoDataFrame | None = None, join_on: str | None = None) -> Dict[str, float | int | Mapping[int | str, float | int] | Mapping[str, float | int]]

Summarize the given partition by the passed scores.

Parameters:
  • part (Partition) – The plan to summarize.

  • scores (Iterable[Score]) – Which scores to include in the summary.

  • gdf (GeoDataFrame) – Geometries of nodes in the dual graph used by part. Only necessary when using scoring functions that rely on dissolved district geometries (most geometric scoring functions).

  • join_on (str) – Field used to join part.graph to gdf. If not specified, geometries are joined by matching the index of gdf to the node keys of part.graph.

Raises:

ValueError – If gdf is not specified and at least one score in scores is dissolved.

Returns:

A dictionary that maps score names to the corresponding ScoreValues of the score functions applied to the plan.

For example: {"cut_edges": 4050, "num_party_seats": 3, … }
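The shape of a summary can be sketched with a stand-in score type. SketchScore, its fields, and summarize_sketch are hypothetical names introduced for illustration; they are not the library’s Score class or summarize function. The sketch also skips the geometry handling that real dissolved scores need and only mirrors the documented ValueError.

```python
from typing import Any, Callable, NamedTuple


class SketchScore(NamedTuple):
    """Hypothetical stand-in for a score: a name plus a function of a plan."""
    name: str
    apply: Callable[[Any], Any]
    dissolved: bool = False  # assumed flag: needs dissolved district geometries


def summarize_sketch(part, scores, gdf=None):
    """Hypothetical helper: map each score's name to its value on one plan."""
    scores = list(scores)
    if gdf is None and any(score.dissolved for score in scores):
        raise ValueError("gdf is required when a score uses dissolved geometries")
    return {score.name: score.apply(part) for score in scores}


# A toy "plan": base unit -> district. Real plans are GerryChain Partitions.
toy_plan = {"a": 1, "b": 1, "c": 2}
toy_scores = [
    SketchScore("num_districts", lambda plan: len(set(plan.values()))),
    SketchScore("num_units", len),
]
summary = summarize_sketch(toy_plan, toy_scores)
```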

gerrytools.scoring.summarize_many(parts: Iterable[Partition], scores: Iterable[Score], gdf: GeoDataFrame | None = None, join_on: str | None = None, plan_names: List[str] = None, output_file: str = None, compress: bool = False, verbose: bool = False) -> List[Dict[str, float | int | Mapping[int | str, float | int] | Mapping[str, float | int]]] | None

Summarize the given partitions by the passed scores.

Parameters:
  • parts (Iterable[Partition]) – The plans to summarize.

  • scores (Iterable[Score]) – Which scores to include in the summaries.

  • gdf (GeoDataFrame) – Geometries of nodes in the dual graph used by each partition in parts. Only necessary when using scoring functions that rely on dissolved district geometries (most geometric scoring functions).

  • join_on (str) – Field used to join the graph associated with the partitions in parts to gdf. If not specified, geometries are joined by matching the index of gdf to the node keys of the graph.

  • plan_names (Iterable[str], optional) – Plan identifiers, matched to the plans in parts by index. If no plan name exists for a given plan’s index, the plan’s index is used as the identifier. Defaults to None, in which case all plans are identified by index.

  • output_file (str, optional) – Name of the file in which to save the JSONL encoding of the scores. If None, a list of the dictionary summaries of each plan is returned instead. Defaults to None.

  • compress (bool, optional) – Whether to compress the output file with gzip. Default is False.

Returns:

If no output file is passed, a list of dictionaries, each mapping score names to the corresponding ScoreValues of the score functions applied to one plan. If an output file is specified, the plan summaries are written to that file and the function returns None.
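The JSONL output with optional gzip compression can be sketched with the standard library alone. This is a minimal sketch of the file format described above (one JSON object per line), not the library’s writer; the helper names, the file name, and the second summary’s values are hypothetical.

```python
import gzip
import json
import os
import tempfile


def write_jsonl_sketch(summaries, output_file, compress=False):
    """Hypothetical helper: write one JSON object per line, optionally gzipped."""
    opener = gzip.open if compress else open
    with opener(output_file, "wt", encoding="utf-8") as f:
        for summary in summaries:
            f.write(json.dumps(summary) + "\n")


def read_jsonl_sketch(path, compress=False):
    """Hypothetical helper: read the summaries back, one per line."""
    opener = gzip.open if compress else open
    with opener(path, "rt", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Illustrative summaries only; the values are not real results.
summaries = [
    {"cut_edges": 4050, "num_party_seats": 3},
    {"cut_edges": 3990, "num_party_seats": 4},
]
path = os.path.join(tempfile.mkdtemp(), "scores.jsonl.gz")
write_jsonl_sketch(summaries, path, compress=True)
round_trip = read_jsonl_sketch(path, compress=True)
```

Writing line by line keeps memory use flat for large ensembles, since no more than one plan’s summary needs to be held at a time.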

gerrytools.scoring.swing_districts(election_cols: Iterable[str], party: str) -> Score

Score representing the number of swing districts in a plan. A swing district is one that is not won by the same party in every election in a given set.

Parameters:
  • election_cols (Iterable[str]) – The names of the election updaters over which to compute results.

  • party (str) – The “point of view” political party.

Returns:

A score object with name “swing_districts” and associated function that takes a partition and returns a PlanWideScoreValue for the number of swing districts.
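The swing-district definition can be sketched as follows: a district is counted when more than one party wins it across the elections considered. The helper name, election names, and winners are hypothetical, and the sketch takes precomputed winners rather than election updaters.

```python
def count_swing_districts_sketch(winners_by_election):
    """Hypothetical helper: count districts not won by a single party throughout.

    winners_by_election maps election name -> {district: winning party}.
    """
    winners_per_district = {}
    for winners in winners_by_election.values():
        for district, party in winners.items():
            winners_per_district.setdefault(district, set()).add(party)
    # A district swings if more than one party won it across the elections.
    return sum(1 for parties in winners_per_district.values() if len(parties) > 1)


# Hypothetical results: only district 2 changes hands.
winners = {
    "PRES16": {1: "Dem", 2: "Rep", 3: "Dem"},
    "SEN18": {1: "Dem", 2: "Dem", 3: "Dem"},
}
n_swing = count_swing_districts_sketch(winners)
```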

gerrytools.scoring.unassigned_population(P, popcolumn)

Determines the number of unassigned people in the districting plan.

Parameters:
  • P – Partition object.

  • popcolumn – Column for tallying the desired population.

Returns:

The number of unassigned people in the districting plan.

gerrytools.scoring.unassigned_units(P: Partition, raw: bool = False) -> float | int

Determines the proportion (or raw number) of units without a district assignment. An unassigned unit is a unit with no district assignment or with an empty/corrupted assignment.

Parameters:
  • P (Partition) – GerryChain Partition object.

  • raw (bool, optional) – If True, report the raw number of unassigned units. Defaults to False.

Returns:

float representing the proportion of units that are unassigned (or the whole number of unassigned units).
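The proportion-versus-raw behavior can be sketched on a plain dictionary. Which values count as “empty/corrupted” is an assumption here (None, NaN, and blank strings); the library’s own test may differ, and the helper name unassigned_units_sketch is hypothetical.

```python
import math


def unassigned_units_sketch(assignment, raw=False):
    """Hypothetical helper: share (or count) of units lacking a usable district.

    assignment maps each unit to a district label. None, NaN, and blank
    strings are treated as unassigned; this rule is an assumption.
    """
    def is_unassigned(value):
        if value is None:
            return True
        if isinstance(value, float) and math.isnan(value):
            return True
        return isinstance(value, str) and value.strip() == ""

    count = sum(1 for value in assignment.values() if is_unassigned(value))
    if raw:
        return count
    # Guard against an empty plan to avoid dividing by zero.
    return count / len(assignment) if assignment else 0.0


toy_assignment = {"a": 1, "b": None, "c": float("nan"), "d": ""}
proportion = unassigned_units_sketch(toy_assignment)
raw_count = unassigned_units_sketch(toy_assignment, raw=True)
```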

Utilities

Oft-used utilities for working with geometric data.

class gerrytools.utilities.JSONtoObject(*, column: str, locator: str, title: Any = None, type: str = None)

Bases: BaseModel

Plan specification models. To better work with multiple plans at once, this plan specification allows users to specify information which should remain consistent across all operations; for example, the column field should be the name of the column in which the corresponding plan’s assignment is stored across all data products.

column: str

Column on all data products which contains district assignment information for this districting plan.

locator: str

File/directory name for all resources which contain information about this districting plan.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

title: Any

Official title of the plan.

type: str

The “type” of plan; could denote a party affiliation, a chamber, whatever.

gerrytools.utilities.jsonify(location) -> list

Reads in JSON data and creates a Python object out of it. If the JSON data read in is a list of JSON objects, a list of Python objects is returned.

Parameters:

location (string) – Filepath.

Returns:

A list of pydantic dot-notation-accessible objects, which should contain information about districting plans.
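The idea of reading plan specifications into dot-accessible objects can be sketched with the standard library. The sketch uses types.SimpleNamespace as a stand-in for the pydantic model, so it performs no validation; the helper name jsonify_sketch, the file name, and the field values are hypothetical.

```python
import json
import os
import tempfile
from types import SimpleNamespace


def jsonify_sketch(location):
    """Hypothetical helper: read JSON and return a list of dot-accessible objects.

    SimpleNamespace stands in for the pydantic model, so no validation is done.
    """
    with open(location, encoding="utf-8") as f:
        data = json.load(f)
    # Accept either a single JSON object or a list of them.
    records = data if isinstance(data, list) else [data]
    return [SimpleNamespace(**record) for record in records]


# Hypothetical plan specification using the fields documented above.
plans = [
    {"column": "CD", "locator": "congress", "title": "Congressional plan", "type": "congress"}
]
path = os.path.join(tempfile.mkdtemp(), "plans.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(plans, f)

specs = jsonify_sketch(path)
first_column = specs[0].column
```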

gerrytools.utilities.rename(old, new)

Renames all files in the path specified by old to new; intended for use with directories containing shapefiles. For example, a directory called blocks20/ that contains no shapefile called blocks20.shp is (a) bad practice and (b) prevents GeoPandas from reading the shapefile from a partial filepath (e.g. gpd.read_file("blocks20/")).

Example

Basic usage.

from gerrytools.utilities import rename

old = "./data/geometries/to-be-renamed"
new = "blocks20"
rename(old, new)

Parameters:
  • old (str) – _Directory_ where files to be renamed are located.

  • new (str) – New name to be applied to the directory and all files in it.
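The renaming behavior described above can be sketched with pathlib. This is a minimal sketch assuming that each file keeps its extension(s) and that the directory itself is renamed last; it is not the library’s implementation, and the helper name rename_sketch and the file names are hypothetical.

```python
import tempfile
from pathlib import Path


def rename_sketch(old, new):
    """Hypothetical helper: rename a directory and every file directly inside it.

    Each file keeps its extension(s), so blocks.shp and blocks.dbf become
    new.shp and new.dbf. Everything after the first dot in a file name is
    treated as the extension, which suits typical shapefile sidecar names.
    """
    old_dir = Path(old)
    # Materialize the listing first, since files are renamed while looping.
    for path in list(old_dir.iterdir()):
        if path.is_file():
            path.rename(path.with_name(new + "".join(path.suffixes)))
    new_dir = old_dir.with_name(new)
    old_dir.rename(new_dir)
    return new_dir


# Build a throwaway directory holding empty shapefile components.
root = Path(tempfile.mkdtemp())
old = root / "to-be-renamed"
old.mkdir()
for ext in (".shp", ".shx", ".dbf"):
    (old / f"tl_blocks{ext}").touch()

new_dir = rename_sketch(old, "blocks20")
names = sorted(p.name for p in new_dir.iterdir())
```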