EAC "Election Day Survey" underscores need for better data retention, collection and distribution practices

The just-released and complete EAC “Election Day Survey” report has an enormous amount of information in it, and will certainly keep election researchers busy for a long time (especially if and when the EAC releases the jurisdiction-level data).

But before anyone rushes head-first into number-crunching mode, much caution is warranted. In particular, users of the report and the tabulated data must read the material on survey coverage on pages 6-8 of the full report. The coverage rate varies dramatically across the questions in the “Election Day Survey.” For example, 98.8% of jurisdictions appear to have provided responses to item 2a, “Ballots counted,” but only 67.4% provided information for item 10a, “Presidential undervotes,” and only 18.9% for item 11a, “Presidential overvotes.” Both of the latter items are likely to be heavily used by researchers and policymakers.
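Item coverage is simply the share of jurisdictions that reported a value for a given question. For anyone working with jurisdiction-level records, a minimal sketch of that calculation might look like the following. The field names and the four records are made-up toy values for illustration only, not the EAC's actual variables or data.

```python
# Made-up toy records for illustration; None marks a missing item response.
# Field names are hypothetical, not the EAC's actual variable names.
records = [
    {"jurisdiction": "A", "ballots_counted": 1200, "pres_undervotes": 15, "pres_overvotes": None},
    {"jurisdiction": "B", "ballots_counted": 5400, "pres_undervotes": None, "pres_overvotes": None},
    {"jurisdiction": "C", "ballots_counted": 830, "pres_undervotes": 4, "pres_overvotes": 1},
    {"jurisdiction": "D", "ballots_counted": None, "pres_undervotes": None, "pres_overvotes": None},
]


def item_coverage(rows, item):
    """Percentage of jurisdictions that reported a (non-missing) value for `item`."""
    if not rows:
        return 0.0
    reported = sum(1 for r in rows if r.get(item) is not None)
    return 100.0 * reported / len(rows)


for item in ("ballots_counted", "pres_undervotes", "pres_overvotes"):
    print(f"{item}: {item_coverage(records, item):.1f}%")
```

On these toy records the three items come out at 75.0%, 50.0% and 25.0%, mirroring the kind of drop-off the report describes.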

However, what the tabulated data do not tell us (at least I have not been able to find it) is which jurisdictions are not reporting data, either across the whole survey instrument or for specific questions. I hope this can be studied in more detail once the jurisdiction-level data is available, because one wonders whether some form of selection bias is at work. For example, are jurisdictions that perform poorly on some of these dimensions choosing not to report data, or are they under-resourced and thus unable to retain and provide it? If so, such selection bias could distort analyses of the “Election Day Survey” data.
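One simple first check for this kind of selection bias is to compare jurisdictions that answered a low-coverage item with those that did not, using a variable nearly everyone reported (such as ballots counted). The sketch below assumes made-up toy data and uses jurisdiction size as the comparison variable; a large gap only suggests that nonresponse is not random, it does not by itself show how the item's own values are biased.

```python
from statistics import mean

# Made-up toy data for illustration: (ballots_counted, overvotes or None if unreported).
rows = [(1200, 3), (5400, None), (830, 1), (22000, None), (640, 0), (15000, None)]


def compare_by_response(rows):
    """Mean ballots counted among item responders vs. item nonresponders.

    Returns None for a group that has no members.
    """
    responders = [size for size, item in rows if item is not None]
    nonresponders = [size for size, item in rows if item is None]
    return (
        mean(responders) if responders else None,
        mean(nonresponders) if nonresponders else None,
    )


resp, nonresp = compare_by_response(rows)
print(f"responders: {resp:.1f}, nonresponders: {nonresp:.1f}")
```

In this toy example the nonresponders are much larger jurisdictions on average, which is the sort of pattern that would warrant caution before generalizing from the responders.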

Another issue, which will also have to await the distribution of the jurisdiction-level data, is whether statistical procedures can be used either to “plug the gaps” in the survey data or to study their possible influence on survey results. The former strategy looks quite promising, as there has been a great deal of progress in recent years on how to deal with missing data, including some easy-to-use and effective new software tools (see, for example, Gary King’s material on Amelia, his tool for missing data estimation and analysis). Hopefully, once the jurisdiction-level data is released, the EAC or some other entity will provide datasets with the missing data “imputed,” much as the U.S. Census Bureau often does with its data releases; this would certainly help end-users of the EAC “Election Day Survey” data.
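To make the idea of “plugging the gaps” concrete, here is a deliberately simple sketch of single ratio imputation, assuming made-up toy data: each missing overvote count is filled in with the pooled overvote rate among responding jurisdictions times that jurisdiction's ballots counted. This is not what Amelia does. Amelia performs multiple imputation, producing several completed datasets whose differences reflect the uncertainty about the missing values; a single fill-in like this one understates that uncertainty and also assumes responders and nonresponders share the same underlying rate.

```python
# Made-up toy data for illustration: (ballots_counted, overvotes or None if unreported).
rows = [(1200, 3), (5400, None), (830, 1), (640, 0)]


def ratio_impute(rows):
    """Fill missing counts with (pooled responder rate) x ballots counted.

    Returns (ballots, value, was_imputed) tuples so imputed values stay flagged.
    """
    observed = [(b, v) for b, v in rows if v is not None]
    total_ballots = sum(b for b, _ in observed)
    if total_ballots == 0:
        raise ValueError("no responding jurisdictions to estimate a rate from")
    rate = sum(v for _, v in observed) / total_ballots
    return [(b, v if v is not None else rate * b, v is None) for b, v in rows]


for ballots, value, imputed in ratio_impute(rows):
    print(ballots, round(value, 2), "imputed" if imputed else "reported")
```

Keeping an explicit flag on imputed values, as the Census Bureau does with its imputation flags, lets end-users check whether their conclusions depend on the filled-in numbers.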

Finally, the EAC should be congratulated for taking on this task; many (including the Caltech/MIT Voting Technology Project) have called upon the EAC to collect just this type of data.

In conclusion, after a very, very quick read of the 266-page report:

  1. Thanks to the EAC for undertaking this project, and we call upon them to continue this effort in future federal election cycles.
  2. The jurisdiction-level data should be released for further research and analysis, ideally accompanied by work to better understand the impact of nonresponse on this survey and to patch the missing data in the existing dataset.
  3. Users of the data in its current form should be very, very careful about the coverage problems, and should be on notice that inferences drawn from survey questions with low coverage rates might be distorted or incorrect because of nonresponse.

Time to roll up the sleeves and get to work!

11:43AM update: the jurisdiction-level data is now available; please see the subsequent posting for details.