Partner Data Validation
To ensure that the Election Results and Precinct Boundary (ERPB) files that the RDH hosts are accurate and reproducible, our Data Team started the Partner Data Validation (PDV) project. In past redistricting cycles, every organization working on redistricting has done this validation in its own state, often duplicating one another’s work. By tackling the project across the U.S., the RDH is saving these organizations time and effort.
Election Results and Precinct Boundary files are integral to drawing legally compliant maps. However, collecting and processing these files is a complex project. For more detail about how the election results files and precinct boundary files are sourced and merged, and why the merged files are so important, see Election Results and Precinct Boundaries.
The Redistricting Data Hub works with a number of data partners who collect election results from state agencies, usually the Secretary of State, and precinct boundary shapefiles from local election officials, the Census Bureau (through its Voting District Project), or the Secretary of State. Our data partners merge these files so that the election results from a given precinct are tied to the geography of that precinct. This allows an individual to use the data sets while drawing a map. The RDH takes these merged files from our data partners and begins the PDV process.
The PDV process has two key parts:
- Ensuring that our partners edited and merged the precinct boundary files and election results files without error
- Ensuring that the documentation explaining how those files were edited and merged allows for exact reproduction, which includes preserving the source files
Following the documentation written by our data partners, the RDH Data Team takes the original election results files and precinct boundary shapefile and attempts to replicate the entire merging process from start to finish. At each step, the Data Team compares its results to the results of our data partners.
The first check is whether the candidate vote totals in the state’s election results file equal the vote totals in our data partner’s file. The Data Team checks these vote totals at the statewide, county, and precinct levels. A state often does not allocate its absentee or early votes to the precinct where the voter lives, and instead reports these votes separately. If this occurs, our data partners reallocate these votes to the correct precinct. The RDH Data Team independently performs this reallocation and then confirms that, after the reallocation, every precinct has candidate vote totals equal to those reported by our data partners. If our Data Team’s results match those reported by our data partners, the Data Team moves on to merging the precinct boundaries with the precinct-level election results.
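The three-level comparison described above can be sketched in a few lines of Python. This is only an illustration, not the RDH's actual scripts: the row layout `(county, precinct, candidate, votes)` and the function names `totals_by_level` and `compare_totals` are assumptions made for the example.

```python
from collections import defaultdict


def totals_by_level(rows):
    """Sum votes per candidate at the statewide, county, and precinct levels.

    Each row is a (county, precinct, candidate, votes) tuple. This layout is
    a hypothetical one chosen for illustration, not an RDH file format.
    """
    totals = {
        "state": defaultdict(int),
        "county": defaultdict(int),
        "precinct": defaultdict(int),
    }
    for county, precinct, candidate, votes in rows:
        totals["state"][candidate] += votes
        totals["county"][(county, candidate)] += votes
        totals["precinct"][(county, precinct, candidate)] += votes
    return totals


def compare_totals(source_rows, partner_rows):
    """Return {level: {key: (source_votes, partner_votes)}} for each mismatch."""
    source = totals_by_level(source_rows)
    partner = totals_by_level(partner_rows)
    mismatches = {}
    for level in ("state", "county", "precinct"):
        keys = set(source[level]) | set(partner[level])
        diff = {}
        for key in keys:
            pair = (source[level].get(key, 0), partner[level].get(key, 0))
            if pair[0] != pair[1]:
                diff[key] = pair
        if diff:
            mismatches[level] = diff
    return mismatches
```

An empty result means the two files agree at every level. A result that contains only a "precinct" entry suggests that votes were assigned to different precincts while the county and statewide sums still match, which is the kind of discrepancy a reallocation of absentee or early votes could produce.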
This part of the process is more involved. Usually, local election officials, rather than state-level agencies, keep track of precinct boundaries in their county. These officials often create unique names for the precincts within their county, such as “01 – Boston” and “02 – Boston.” However, the state agency that holds election results may label those same precincts differently, such as “1 – Boston, 2016” and “2 – Boston, 2016,” when storing the election results. Because of this variation, the election results for each precinct cannot be automatically merged with the precinct shapefiles.
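To make the naming problem concrete, here is a minimal sketch of how two differently formatted labels for the same precinct could be reduced to a common comparison key. The normalization rules and the function name `precinct_key` are assumptions chosen to fit the Boston example above; they are not the rules the RDH or its data partners apply, and real files typically need case-by-case review.

```python
import re


def precinct_key(name):
    """Build a comparison key from a precinct label.

    Assumed, illustrative rules: drop a trailing ", YYYY" year, treat
    hyphens and dashes alike, strip leading zeros from the precinct
    number, collapse whitespace, and ignore case.
    """
    text = re.sub(r",\s*\d{4}\s*$", "", name.strip())
    text = re.sub(r"\s*[-\u2013\u2014]\s*", " - ", text)
    text = re.sub(r"^0+(?=\d)", "", text)
    return re.sub(r"\s+", " ", text).lower()
```

With these rules, "01 – Boston" from a county shapefile and "1 – Boston, 2016" from a state results file both reduce to the same key, while "02 – Boston" stays distinct.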
Additionally, sometimes the number of precinct boundaries exceeds the number of precincts in the election results file because a precinct boundary was split. If this occurs, our data partners will recombine the split precinct if it can be determined which one was split, and if not, our data partners contact the municipality directly to understand what occurred.
Our data partners comb through the precinct shapefiles and election results files to give each precinct a unique identifier. This ensures that the votes from a given precinct are applied to the corresponding precinct in the shapefile. The RDH’s Data Team then goes through the exact same process of giving each precinct a unique identifier, ensuring that no errors occurred during this process. After every precinct is given a unique identifier, the election results for each precinct are merged with that precinct’s shapefile. Again, the Data Team checks that its results match those of the data partner.
One final check is performed against a third-party source, such as Ballotpedia. The Data Team compares candidate vote totals at the state level to the totals reported by the third party to make sure that, while processing the ERPB file, the team did not duplicate an error made by our data partner. Once this last check is performed, the final Election Results and Precinct Boundary file is hosted on the RDH website.
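The identifier-based merge and the final statewide check can be sketched as follows. This is a simplified illustration under stated assumptions: the dictionaries mapping identifiers to geometries and to vote counts are hypothetical stand-ins for a shapefile and a results table, and `join_by_identifier` and `statewide_totals` are names invented for the example.

```python
def join_by_identifier(boundaries, results):
    """Attach vote counts to precinct boundaries that share an identifier.

    `boundaries` maps identifier -> geometry (any object) and `results`
    maps identifier -> {candidate: votes}. Both layouts are hypothetical.
    Returns (merged, boundaries_only, results_only).
    """
    shared = boundaries.keys() & results.keys()
    merged = {
        uid: {"geometry": boundaries[uid], "votes": results[uid]}
        for uid in shared
    }
    boundaries_only = sorted(boundaries.keys() - results.keys())
    results_only = sorted(results.keys() - boundaries.keys())
    return merged, boundaries_only, results_only


def statewide_totals(results):
    """Sum votes per candidate across all precincts."""
    totals = {}
    for votes in results.values():
        for candidate, count in votes.items():
            totals[candidate] = totals.get(candidate, 0) + count
    return totals
```

Identifiers that appear on only one side are returned rather than silently dropped, since an unmatched boundary or result is exactly what needs a manual look (for example, a split precinct). The output of `statewide_totals` can then be compared with the figures published by a third-party source.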
While going through the entire validation process, the Data Team creates a Validation Report, such as the example for Georgia. This report documents the PDV process for that specific Election Results and Precinct Boundary file, and includes information about how to access the raw data used to create the ERPB file, processing steps to create the ERPB file, and additional information relevant to the validation process. Indicating the raw files the Data Team began with and documenting any changes that were performed allows our data users to identify errors and trace them back to where they occurred. Additionally, by outlining the methodologies for creating the ERPB files in the Validation Report, the Data Team gives users the ability to review the methodologies employed and determine whether they would like to use our data.
A visual summary of this validation report can be found just below the Validation Report download button on a specific ERPB’s page. The visual summary provides Yes, No, or N/A answers to six criteria of the validation:
- Raw Data available?
- Processing steps available?
- Able to replicate joining election data and shapefiles?
- Able to replicate by joining demographic data?
- Able to replicate by joining boundary data?
- Successfully ran validation?
For an expanded explanation of how each of these six criteria is answered, please see the Checkbox Criteria Explainer.
The scripts used to create a specific Election Results and Precinct Boundary file can be found on the RDH’s GitHub page.
Do you have more questions?
Our help desk team can answer your questions about redistricting data and the redistricting process. Send a message and they will respond within one business day!