Leading zero problems in the NVRA Dataset

In the EAC’s NVRA 2010 dataset, the FIPS code is stored as a numeric value, and this will cause problems with anyone who is trying to match / merge this with other data because 10 state FIPS codes start with a “0”.  Most statistical programs will strip that leading zero, making it hard to identify the states properly.

For instance, lines 2-4 of the Excel file look like this:

AK ALASKA 200000000 560146 only active voters
AL AUTAUGA COUNTY 0100100000 34727 active and inactive registered voters
AL BALDWIN COUNTY 0100300000 114952 active and inactive registered voters

The third column contains the FIPS code; columns 1-2 contain the state code, columns 3-5 the county code, an additional columns refer to smaller jurisdictions (mainly townships in NE and the Midwest).

The first problem you’ll notice is that Alaska’s FIPS is already artificially shortened to “2” not “02” an users will need to fix this manually.  Next, additional FIPS codes will come across a “10010000” (for example) without the leading zero.

The solution for now is a kludge; insert a dummy line into the Excel file and include a string value such as an “x” in column 3.  This will trick Excel into outputting the values as text, and statistical programs will input the data as character strings.  Then the process is straightforward; capture substrings (01, 02, etc) and match / merge away!