====
Data
====

The ``data`` module of ``gerrytools`` is designed to handle the loading and processing of data. In particular, it provides methods for loading and managing data from the census. Below is a basic tutorial on what we view to be a common workflow for someone using this module.

.. note::

    Sometimes, when calling functions that work with the ``us`` module (and ``data`` does do this), you may see the following error:

    .. code-block:: console

        ValueError: Unexpected response (URL: ...): Sorry, the system is currently undergoing maintenance or is busy. Please try again later.

    This is due to a Census API issue and cannot be fixed from the Python side of things. However, re-running the code generally fixes the issue.

For this tutorial, we will assume that the reader is working through it in a Jupyter notebook. All of the packages needed to run this tutorial can be found in the `tutorial\_requirements.txt <../_static/tutorial_requirements.txt>`_ file.

Census
------

There are two different methods that ``data`` provides for loading census data: :meth:`~gerrytools.data.census20` and :meth:`~gerrytools.data.census10`. As the names suggest, the former is linked to the US census data collected in 2020, and the latter is linked to the US census data collected in 2010. There are significant differences between the two methods, so please be sure to refer to the documentation. For the purposes of this tutorial, we will be using the 2020 census data.

The first thing to do is to load all of the necessary packages:

.. code-block:: python

    from gerrytools.data import *
    import geopandas as gpd
    import pandas as pd
    import us

Now we would like to load the census data for the state of Massachusetts. When we go to load this data, we should be aware that there are 5 different tables available on the Census Bureau's API for retrieving the 2020 Decennial Census PL 94-171 data at the stated level of geography.
These tables are:

- **P1**: Race
- **P2**: Hispanic or Latino, and Not Hispanic or Latino by Race
- **P3**: Race for the Population 18 Years and Over (Race by VAP)
- **P4**: Hispanic or Latino, and Not Hispanic or Latino by Race for the Population 18 Years and Over
- **P5**: Group Quarters Population by Group Quarters Type

.. code-block:: python

    df = census20(
        us.states.MA,
        table="P3",
        columns={},
        geometry="tract",
    )
    df[["GEOID20", "VAP20", "WHITEVAP20", "BLACKVAP20", "ASIANVAP20", "OTHVAP20"]].head()

In Jupyter, this will display the following table:

+---+------------+------+-----------+-----------+-----------+---------+
|   | GEOID20    | VAP20| WHITEVAP20| BLACKVAP20| ASIANVAP20| OTHVAP20|
+===+============+======+===========+===========+===========+=========+
| 0 | 25001012601| 2657 | 1868      | 153       | 122       | 172     |
+---+------------+------+-----------+-----------+-----------+---------+
| 1 | 25001012602| 4564 | 2444      | 517       | 147       | 547     |
+---+------------+------+-----------+-----------+-----------+---------+
| 2 | 25001012700| 4059 | 3445      | 119       | 49        | 144     |
+---+------------+------+-----------+-----------+-----------+---------+
| 3 | 25001012800| 3464 | 2971      | 86        | 42        | 84      |
+---+------------+------+-----------+-----------+-----------+---------+
| 4 | 25001012900| 3568 | 3011      | 101       | 47        | 103     |
+---+------------+------+-----------+-----------+-----------+---------+

Of course, anyone who is familiar with the way that census data is organized will notice that the column names here are not the same as the ones that the Census Bureau uses. This is because the ``census20`` method has its own mapping from the Census Bureau's column names to ones that are a bit easier to understand. If you would like to see the mapping, you can use the :meth:`~gerrytools.data.variables` method; so, for the "P3" table, we would call:

.. code-block:: python

    variables("P3")

Which outputs the following:

.. code-block:: console

    {'P3_001N': 'VAP20',
     'P3_003N': 'WHITEVAP20',
     'P3_004N': 'BLACKVAP20',
     'P3_005N': 'AMINVAP20',
     'P3_006N': 'ASIANVAP20',
     'P3_007N': 'NHPIVAP20',
     'P3_008N': 'OTHVAP20',
     'P3_011N': 'WHITEBLACKVAP20',
     'P3_012N': 'WHITEAMINVAP20',
     'P3_013N': 'WHITEASIANVAP20',
     'P3_014N': 'WHITENHPIVAP20',
     'P3_015N': 'WHITEOTHVAP20',
     'P3_016N': 'BLACKAMINVAP20',
     'P3_017N': 'BLACKASIANVAP20',
     'P3_018N': 'BLACKNHPIVAP20',
     'P3_019N': 'BLACKOTHVAP20',
     'P3_020N': 'AMINASIANVAP20',
     'P3_021N': 'AMINNHPIVAP20',
     'P3_022N': 'AMINOTHVAP20',
     'P3_023N': 'ASIANNHPIVAP20',
     'P3_024N': 'ASIANOTHVAP20',
     'P3_025N': 'NHPIOTHVAP20',
     'P3_027N': 'WHITEBLACKAMINVAP20',
     'P3_028N': 'WHITEBLACKASIANVAP20',
     'P3_029N': 'WHITEBLACKNHPIVAP20',
     'P3_030N': 'WHITEBLACKOTHVAP20',
     'P3_031N': 'WHITEAMINASIANVAP20',
     'P3_032N': 'WHITEAMINNHPIVAP20',
     'P3_033N': 'WHITEAMINOTHVAP20',
     'P3_034N': 'WHITEASIANNHPIVAP20',
     'P3_035N': 'WHITEASIANOTHVAP20',
     'P3_036N': 'WHITENHPIOTHVAP20',
     'P3_037N': 'BLACKAMINASIANVAP20',
     'P3_038N': 'BLACKAMINNHPIVAP20',
     'P3_039N': 'BLACKAMINOTHVAP20',
     'P3_040N': 'BLACKASIANNHPIVAP20',
     'P3_041N': 'BLACKASIANOTHVAP20',
     'P3_042N': 'BLACKNHPIOTHVAP20',
     'P3_043N': 'AMINASIANNHPIVAP20',
     'P3_044N': 'AMINASIANOTHVAP20',
     'P3_045N': 'AMINNHPIOTHVAP20',
     'P3_046N': 'ASIANNHPIOTHVAP20',
     'P3_048N': 'WHITEBLACKAMINASIANVAP20',
     'P3_049N': 'WHITEBLACKAMINNHPIVAP20',
     'P3_050N': 'WHITEBLACKAMINOTHVAP20',
     'P3_051N': 'WHITEBLACKASIANNHPIVAP20',
     'P3_052N': 'WHITEBLACKASIANOTHVAP20',
     'P3_053N': 'WHITEBLACKNHPIOTHVAP20',
     'P3_054N': 'WHITEAMINASIANNHPIVAP20',
     'P3_055N': 'WHITEAMINASIANOTHVAP20',
     'P3_056N': 'WHITEAMINNHPIOTHVAP20',
     'P3_057N': 'WHITEASIANNHPIOTHVAP20',
     'P3_058N': 'BLACKAMINASIANNHPIVAP20',
     'P3_059N': 'BLACKAMINASIANOTHVAP20',
     'P3_060N': 'BLACKAMINNHPIOTHVAP20',
     'P3_061N': 'BLACKASIANNHPIOTHVAP20',
     'P3_062N': 'AMINASIANNHPIOTHVAP20',
     'P3_064N': 'WHITEBLACKAMINASIANNHPIVAP20',
     'P3_065N': 'WHITEBLACKAMINASIANOTHVAP20',
     'P3_066N': 'WHITEBLACKAMINNHPIOTHVAP20',
     'P3_067N': 'WHITEBLACKASIANNHPIOTHVAP20',
     'P3_068N': 'WHITEAMINASIANNHPIOTHVAP20',
     'P3_069N': 'BLACKAMINASIANNHPIOTHVAP20',
     'P3_071N': 'WHITEBLACKAMINASIANNHPIOTHVAP20'}

For more information on the variables that are available in each of these tables, please refer to the Census Bureau website.

ACS5
----

This method loads the 5-year American Community Survey data that the Census Bureau uses for its 5-year population estimates of the United States.

.. warning::

    The ACS5 data uses geometries from the 2010 census, and not the 2020 census.

.. code-block:: python

    acs5_df = acs5(
        us.states.MA,
        geometry="block group",  # data granularity, either "tract" (default) or "block group"
        year=2019,
    )
    acs5_df[["BLOCKGROUP10", "TOTPOP19", "WHITE19", "BLACK19", "ASIAN19", "OTH19"]].head()

This will display the following table:

+---+---------------+---------+--------+--------+--------+------+
|   | BLOCKGROUP10  | TOTPOP19| WHITE19| BLACK19| ASIAN19| OTH19|
+===+===============+=========+========+========+========+======+
| 0 | 250173173012  | 571     | 340    | 15     | 137    | 0    |
+---+---------------+---------+--------+--------+--------+------+
| 1 | 250173531012  | 1270    | 660    | 311    | 93     | 0    |
+---+---------------+---------+--------+--------+--------+------+
| 2 | 250173222002  | 2605    | 2315   | 61     | 96     | 21   |
+---+---------------+---------+--------+--------+--------+------+
| 3 | 250251101035  | 1655    | 1077   | 242    | 82     | 0    |
+---+---------------+---------+--------+--------+--------+------+
| 4 | 250251101032  | 659     | 158    | 225    | 0      | 0    |
+---+---------------+---------+--------+--------+--------+------+

Estimating CVAP
---------------
Sometimes, we might want to estimate the citizen voting age population (CVAP) for a particular demographic group. This is especially true when we are working with potentially new geometries for a particular state, as tends to happen after the Decennial census, and we would like to use them to make projections based on our previous knowledge of the state demographics. In our case, we will be using the :meth:`~gerrytools.data.estimate_cvap10` method to estimate the CVAP for particular geometries in the year 2020 using information from the previous ACS.

The :meth:`~gerrytools.data.estimate_cvap10` method wraps the ``cvap()`` and ``acs5()`` functions to help users pull forward CVAP estimates from 2019 (on 2010 geometries) to estimates for 2020 (on 2020 geometries). To use this, one must supply a base geodataframe with the 2020 geometries on which they want CVAP estimates. Additionally, users must specify the demographic groups whose CVAP statistics are to be estimated. For each group, users specify a triple :math:`(X, Y, Z)` where :math:`X` is the old CVAP column for that group, :math:`Y` is the old VAP column for that group, and :math:`Z` is the new VAP column for that group, which must be an existing column on ``base``. Then, the estimated new CVAP for that group is computed as :math:`X / Y \cdot Z` for each new geometry.

Let's start by grabbing the geometries for Alabama and looking at the ``acs5()`` and ``cvap()`` data:

.. code-block:: python

    base = gpd.read_file("al_bg")
    acs5_cvap19 = acs5(us.states.AL, year=2019)
    cvap_cvap19 = cvap(us.states.AL, year=2019)

.. admonition:: Tips for picking :math:`X`, :math:`Y`, and :math:`Z`
    :class: tip

    Your :math:`X` should be any CVAP column returned by either ``acs5()`` or ``cvap()``, so anything generated by:

    .. code-block:: python

        print([col for col in pd.concat([acs5_cvap19, cvap_cvap19]) if "CVAP" in col])

    Which, in our case, would be:

    .. code-block:: console

        ['WHITECVAP19', 'BLACKCVAP19', 'AMINCVAP19', 'ASIANCVAP19', 'NHPICVAP19',
         'OTHCVAP19', '2MORECVAP19', 'NHWHITECVAP19', 'HCVAP19', 'CVAP19',
         'POCVAP19', 'CVAP19e', 'NHCVAP19', 'NHCVAP19e', 'NHAMINCVAP19',
         'NHAMINCVAP19e', 'NHASIANCVAP19', 'NHASIANCVAP19e', 'NHBLACKCVAP19',
         'NHBLACKCVAP19e', 'NHNHPICVAP19', 'NHNHPICVAP19e', 'NHWHITECVAP19e',
         'NHWHITEAMINCVAP19', 'NHWHITEAMINCVAP19e', 'NHWHITEASIANCVAP19',
         'NHWHITEASIANCVAP19e', 'NHWHITEBLACKCVAP19', 'NHWHITEBLACKCVAP19e',
         'NHBLACKAMINCVAP19', 'NHBLACKAMINCVAP19e', 'NHOTHCVAP19', 'NHOTHCVAP19e',
         'HCVAP19e', 'POCCVAP19']

    Note that the ``acs5()`` method returns things like ``BLACKCVAP19`` or ``HCVAP19`` (Black-alone CVAP and Hispanic CVAP, respectively) while the ``cvap()`` method returns things like ``NHBLACKCVAP19`` (Non-Hispanic Black-alone CVAP). There are also columns like ``NHWHITEBLACKCVAP19``, which refer to all Non-Hispanic citizens of voting age who self-identified as Black and White. However, since our choice of :math:`Y` is restricted to single-race or ethnicity columns, we recommend only estimating CVAP for single-race or ethnicity columns, like ``BLACKCVAP19``, ``HCVAP19``, or ``NHBLACKCVAP19``.

    Lastly, one should choose :math:`Z` to match one's choice for :math:`Y` (say, ``BVAP20`` to match ``BLACKVAP19``). However, in some cases it is reasonable to choose a :math:`Z` that is a close but imperfect match. For example, setting :math:`(X, Y, Z) =` ``(BLACKCVAP19, BLACKVAP19, APBVAP20)`` (where :math:`Z =` ``APBVAP20`` refers to all people of voting age who selected Black alone or in combination with other Census-defined races) would allow one to estimate the 2020 CVAP population of people who selected Black alone or in combination with other races.

One final note: there are some instances in which, due to small Census reporting discrepancies, the ``acs5()`` and the ``cvap()`` methods disagree on ``CVAP19`` estimates (this might happen for total ``CVAP19`` or ``HCVAP19``, for example).
In these cases we default to the ``acs5()`` numbers.

Now we may construct the estimated CVAP for 2020:

.. code-block:: python

    estimates = estimatecvap2010(
        base,
        us.states.AL,
        # Group order goes (Old CVAP, Old VAP, new VAP)
        groups=[
            ("WHITECVAP19", "WHITEVAP19", "WVAP20"),
            ("BLACKCVAP19", "BLACKVAP19", "BVAP20"),
        ],
        ceiling=1,
        zfill=0.1,
        geometry10="tract"
    )

The ``ceiling`` parameter marks when we will cap the CVAP / VAP ratio to 1. Set to 1, this means that if there is ever more ``CVAP19`` in a geometry than ``VAP19``, we will "cap" the ``CVAP20`` estimate to 100% of the ``VAP20``. The ``zfill`` parameter tells us what to do when there is 0 ``CVAP19`` in a geometry. Set to 0.1, this will estimate that 10% of the ``VAP20`` is ``CVAP``.

Now we can print our results:

.. code-block:: python

    print(f"AL BLACKCVAP20: {estimates.BLACKCVAP20_EST.sum()}")
    print(f"AL BLACKCVAP19: {estimates.BLACKCVAP19.sum()}")

Which returns to us:

.. code-block:: console

    AL BLACKCVAP20: 970120.3645540088
    AL BLACKCVAP19: 970239

We can see that our estimate for the Black-alone citizen voting age population in Alabama in 2020 is 970,120, down slightly from 970,239 in 2019.

We can also estimate CVAP among ``APBVAP``, that is, Alabamians who identified as Black alone or in combination with other races. This bumps up the estimate to around 1,007,363, as we can see below:

.. code-block:: python

    estimates = estimatecvap2010(
        base,
        us.states.AL,
        # Changing the new VAP column from BVAP20 -> APBVAP20
        groups=[
            ("BLACKCVAP19", "BLACKVAP19", "APBVAP20"),
        ],
        ceiling=1,
        zfill=0.1,
        geometry10="tract"
    )

    print(f"AL APBCVAP20 estimate: {estimates.BLACKCVAP20_EST.sum()}")

Which returns:

.. code-block:: console

    AL APBCVAP20 estimate: 1007362.5586538106
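
The ratio rule described earlier, :math:`X / Y \cdot Z` with the ``ceiling`` and ``zfill`` adjustments, can be sketched in plain Python for a single geometry. This is a minimal illustration of the arithmetic as this tutorial describes it, not the implementation inside ``gerrytools``: the function name ``estimate_new_cvap``, the sample counts, and the handling of a zero old VAP are assumptions made for the example.

.. code-block:: python

    def estimate_new_cvap(old_cvap, old_vap, new_vap, ceiling=1.0, zfill=0.1):
        """Estimate new CVAP for one geometry as (old_cvap / old_vap) * new_vap."""
        if old_cvap == 0 or old_vap == 0:
            # Assumption: with no usable old ratio, fall back to the zfill share.
            ratio = zfill
        else:
            ratio = old_cvap / old_vap
            if ratio > ceiling:
                # More old CVAP than old VAP: cap the share at 100% of new VAP.
                ratio = 1.0
        return ratio * new_vap


    # Hypothetical (old CVAP, old VAP, new VAP) counts, not real Census data.
    rows = [
        (75, 100, 120),   # ordinary case: ratio 0.75 applied to the new VAP
        (110, 100, 50),   # ratio above the ceiling, capped at 1
        (0, 40, 30),      # zero old CVAP, so the zfill share is used
    ]
    estimates = [estimate_new_cvap(x, y, z) for x, y, z in rows]
    print(estimates)

Summing such per-geometry estimates over a state corresponds to the ``.sum()`` calls on the ``*_EST`` columns shown in the examples above.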