More geeky information on the NVRA data

Watch your denominator and watch your unit of analysis!

There are two ongoing points of confusion with the EAC data. First, the unit of analysis is the reporting jurisdiction (most commonly, the county). This means that many statistics are a mean of the means, which does not weight each county's figure by the size of the county. The result is that the proportion of forms processed in LA County, 110005/1468772 = .0749, is given the same weight in the statewide mean as that of its sister county of Madera (313/8157 = .0384).
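To make the weighting issue concrete, here is a minimal Python sketch using only the two county figures quoted above. The variable names are my own for illustration, not EAC field names, and two counties are of course not a statewide estimate.

```python
# Counts quoted in the text: (forms in the numerator, total forms).
# The dictionary layout and names are illustrative assumptions.
counties = {
    "Los Angeles": (110005, 1468772),
    "Madera": (313, 8157),
}

# Per-county proportions (about .0749 and .0384).
rates = {name: num / total for name, (num, total) in counties.items()}

# Mean of the means: each county counts equally, regardless of size.
mean_of_rates = sum(rates.values()) / len(rates)

# Pooled (weighted) proportion: sum the counts first, then divide.
pooled_rate = (
    sum(num for num, _ in counties.values())
    / sum(total for _, total in counties.values())
)

print(f"mean of rates: {mean_of_rates:.4f}")  # about 0.0566
print(f"pooled rate:   {pooled_rate:.4f}")    # about 0.0747
```

Because LA County accounts for nearly all of the forms, the pooled figure sits almost on top of LA's own rate, while the mean of the means is pulled well down toward Madera's.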

Second, states do not report all data for all jurisdictions. This is reasonable, but when calculating the percentage of forms received by the DMV out of all forms, wouldn’t you only consider the jurisdictions for which you have complete data?

The result is that base statistics in the tables are off, and it’s impossible to calculate by how much because you don’t know the correct denominator OR the correct weight. For instance, there are 65 reporting jurisdictions (counties) in Alabama, but only 63 reported information for QA5_Total AND only 61 reported on the percentage of forms that were processed via the DMV (QA6d).

The most accurate estimate of the proportion of registration forms processed by the DMV has to be based on the most complete sample of counties (61) and on the total number of forms reported in each county.

A simple example should suffice. Assume 1500 registration forms processed in four counties. County 1 processed 250 / 500 at the DMV, County 2 processed 100/100 at the DMV, County 3 did not report, and County 4 processed 200/400 at the DMV.

The EAC would report (50% + 100% + 50%) / 4 = 50%, because the non-reporting county still counts in the denominator.

The unweighted mean of the means is (50% + 100% + 50%) / 3 = 67%.

The weighted mean is (250 + 100 + 200) / (500 + 100 + 400) = 550 / 1000 = 55%.
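For anyone who wants to check the arithmetic, the four-county example can be sketched in a few lines of Python. This is only an illustration under the assumptions of the toy example: the field names "dmv" and "total" are made up, and None stands in for a county that did not report.

```python
# Toy data from the example; County 3 did not report.
counties = [
    {"name": "County 1", "dmv": 250, "total": 500},
    {"name": "County 2", "dmv": 100, "total": 100},
    {"name": "County 3", "dmv": None, "total": None},
    {"name": "County 4", "dmv": 200, "total": 400},
]

# Keep only the counties with complete data.
reported = [c for c in counties if c["dmv"] is not None and c["total"]]
rates = [c["dmv"] / c["total"] for c in reported]

# Dividing by ALL jurisdictions, including the one that did not report.
all_jurisdictions_mean = sum(rates) / len(counties)

# Unweighted mean of the means over reporting counties only.
mean_of_means = sum(rates) / len(reported)

# Weighted mean: total DMV forms over total forms in reporting counties.
weighted_mean = (
    sum(c["dmv"] for c in reported) / sum(c["total"] for c in reported)
)

print(f"{all_jurisdictions_mean:.1%}")  # 50.0%
print(f"{mean_of_means:.1%}")           # 66.7%
print(f"{weighted_mean:.1%}")           # 55.0%
```

The three numbers differ only in which denominator is used, which is the whole point: the same raw counts give 50%, 67%, or 55% depending on the choice.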

The figure I presented earlier treated counties as the unit of analysis, and the mean values were calculated without weighting by the number of registration forms. I am hoping my friend from MIT can help produce an alternative graphic, which should display observations scaled by their underlying case count, thus visually “weighting” larger jurisdictions.