Category Archives: voter registration

Seo-young Silvia Kim’s research on the costs of moving on turnout

Seo-young Silvia Kim, one of our PhD students at Caltech working on our Monitoring the Election project, has recently posted a really interesting working paper online, “Getting Settled in Your New Home: The Costs of Moving on Voter Turnout.” Silvia has recently presented this paper at a couple of conferences and in research seminars at a number of universities. Here’s the paper’s abstract:

What is the dynamic impact of moving on turnout? Moving depresses turnout by imposing various costs on voters. However, movers eventually settle down, and such detrimental effects can disappear over time. I measure these dynamics using United States Postal Services (USPS) data and detailed voter panel data from Orange County, California. Using a generalized additive model, I show that previously registered voters who move close to the election are significantly less likely to vote (at most -16.2 percentage points), and it takes at least six months on average for turnout to recover. This dip and recovery is not observed for within-precinct moves, suggesting that costs of moving matter only when the voter’s environment has sufficiently changed. Given this, can we accelerate the recovery of movers’ turnout? I evaluate an election administration policy that resolves their re-registration burden. This policy proactively tracks movers, updates their registration records for them, and notifies them by mailings. Using a natural experiment, I find that it is extremely effective in boosting turnout (+5.9 percentage points). This success of a simple, pre-existing, and non-partisan safety net is promising, and I conclude by discussing policy implications.

This is important and innovative work, and I highly recommend her paper for readers interested in voter registration and voter turnout. She uses two complementary approaches, one descriptive and one exploiting a natural experiment for causal identification, to show the reduction in the likelihood of turnout for registered voters who move.

Auditing Voter Registration Databases

As many readers know, we’ve been working on a variety of election performance auditing projects, focusing on voter registration database auditing and other types of statistical forensics (you can see a number of examples on our Monitoring The Election project page). We also have been working on post-election ballot auditing, including our recent Election Audit Summit.

Recently the first paper from our new election integrity project in Orange County (CA) was published, in the peer-reviewed journal American Politics Research. This paper, “Evaluating the Quality of Changes in Voter Registration Databases”, was co-authored by myself, Seo-young Silvia Kim (a Ph.D. student here at Caltech), and Spencer Schneider (a Caltech undergrad). Here’s the paper’s abstract:

The administration of elections depends crucially upon the quality and integrity of voter registration databases. In addition, political scientists are increasingly using these databases in their research. However, these databases are dynamic and may be subject to external manipulation and unintentional errors. In this article, using data from Orange County, California, we develop two methods for evaluating the quality of voter registration data as it changes over time: (a) generating audit data by repeated record linkage across periodic snapshots of a given database and monitoring it for sudden anomalous changes and (b) identifying duplicates via an efficient, automated duplicate detection, and tracking new duplicates and deduplication efforts over time. We show that the generated data can serve not only to evaluate voter file quality and election integrity but also as a novel source of data on election administration practices.
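
The two methods in the abstract can be illustrated with a toy sketch (hypothetical record fields and IDs; the paper’s actual record-linkage and deduplication procedures are far more involved than this):

```python
# Toy illustration of the two auditing ideas (invented fields, not the
# paper's pipeline): diff two voter-file snapshots keyed on voter ID,
# and flag records that agree on name and birth date as likely duplicates.

def diff_snapshots(old, new):
    """Count additions, removals, and modifications between two snapshots,
    each a dict mapping voter ID -> record tuple."""
    added = new.keys() - old.keys()
    removed = old.keys() - new.keys()
    modified = {vid for vid in old.keys() & new.keys() if old[vid] != new[vid]}
    return {"added": len(added), "removed": len(removed), "modified": len(modified)}

def find_duplicates(records):
    """Group records that agree on (name, birth date) -- a crude stand-in
    for automated duplicate detection."""
    seen = {}
    for vid, (name, dob, addr) in records.items():
        seen.setdefault((name, dob), []).append(vid)
    return [vids for vids in seen.values() if len(vids) > 1]

old = {1: ("Ann Lee", "1970-01-01", "12 Oak St"),
       2: ("Bo Cruz", "1985-05-05", "3 Elm St")}
new = {1: ("Ann Lee", "1970-01-01", "99 Pine St"),   # address changed
       2: ("Bo Cruz", "1985-05-05", "3 Elm St"),
       3: ("Bo Cruz", "1985-05-05", "7 Ash St")}      # likely duplicate

print(diff_snapshots(old, new))   # {'added': 1, 'removed': 0, 'modified': 1}
print(find_duplicates(new))       # [[2, 3]]
```

Monitoring then amounts to tracking these counts across daily snapshots and asking whether any day’s changes look out of line with the recent baseline.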

An ungated pre-print version of this paper is available from the Caltech/MIT Voting Technology Project’s website, as Working Paper 134.

We are continuing this work with Orange County, and have in recent months been working to explore how these same voter registration database auditing methodologies can work in larger jurisdictions (Los Angeles County) and in states (Oregon). More on those results soon.

The process that led to the development of this project, and to the publication of this paper, is also interesting to recount. In this paper, we make use of daily voter registration “snapshots”, that we obtained from the Orange County Registrar of Voters, starting back in April 2018. This required that we collaborate closely with Neal Kelley, the Orange County Registrar of Voters, and his staff. We are very happy to participate in this collaborative effort, and thank Neal and his team for their willingness to work with us. It’s been a very productive partnership, and we are very excited to continue our collaboration with them going into the 2020 election cycle. This is the sort of academic-election official partnership that we have worked to build and foster at the Caltech/MIT Voting Technology Project since our project’s founding in the immediate aftermath of the 2000 presidential election.

It’s also fun to note that both of my coauthors are Caltech students. Silvia is in her final year in our Social Science Ph.D. program, and she is working on related work for her dissertation (I’ll write later about some of that work, which you can see on Silvia’s website). Spencer worked closely with us on this project in 2018, as he participated in Caltech’s Summer Undergraduate Research Fellowship program. His project was to work with us to help build the methodology for voter registration database auditing. Currently, Spencer is working in computer science and engineering here at Caltech. This paper is a great example of how we like to involve graduate and undergraduate students in our voting technology and election administration research.

How to avoid an election meltdown in 2020: Improve voter registration database security and monitoring

One of the most shocking parts of the Mueller report details the widespread efforts by Russian hackers to attack American election infrastructure in 2016.

Specifically, the report presents evidence that Russian intelligence (the GRU) targeted state and local election administration systems, and that it infiltrated the computer network of the Illinois State Board of Elections and at least one Florida county during the 2016 presidential election, using means such as SQL injection and spear phishing. The GRU also targeted private firms that provide election administration technologies, like software systems for voter registration.

This is stunning news, and a wake-up call for improving the integrity and security of election administration and technology in the United States.

The Mueller report does not provide evidence that these hacking attempts altered the reported results of elections in 2016 or 2018. Instead the report highlights hacking efforts aimed at gaining access to voter registration databases, which might seem surprising to many.

Prior to the 2000 presidential election, voter registration data was maintained in a hodgepodge of ways by county and state election officials. After the passage of the Help America Vote Act in 2002, states were required to centralize voter registration data in statewide electronic databases, to improve the accuracy and accessibility of voter registration data in every state.

But one consequence of building statewide voter registration datasets is that they became attractive targets for hackers. Rather than targeting hundreds or thousands of election administration systems at the county level, hackers can now target a single database system in every state.

Why would hackers want to target voter registration systems?

First, a hacker could alter registration records in a state or county, or delete records, with the goal being to wreak havoc on Election Day. By dropping voters, or by changing voter addresses, names, or partisan affiliations, a hacker could create chaos on Election Day—for instance, voters could go to the right polling place, only to find that their name is not on the roster, and thus be denied the chance to vote.

A hack of this type, if done in a number of counties in a battleground state like Florida, could lead to an election meltdown like we saw in the 2000 presidential election.

Second, a hacker could be more systematic in their efforts. They could add fake voters to the database and, if they had access to the electronic systems used to send absentee ballots, obtain ballots for those fake voters.

This type of hack could enable a large-scale effort to actually change the outcome of an election, if the hackers marked and returned the ballots for these fake voters.

These vulnerabilities are real, and an unintended consequence of the development of centralized electronic statewide voter registration databases in the United States. There is little doubt that the attempts by hackers to target voter registration systems in 2016 and 2018 could have produced widespread disruption of either election, had they been successful.

There is also little doubt that efforts to hack voter registration databases in the United States will continue. The GRU will have better knowledge as to what vulnerabilities exist in our election systems and how to target them. What can we do to secure these databases, to prevent these attacks and to make sure that we can detect them if hackers gain access to registration databases?

Obviously, state and county election officials must continue their efforts to strengthen the security of voter registration databases. They must also ensure that strong electronic security practices are in place, so that hackers cannot gain access to passwords and other administrative systems they might exploit to reach registration data.

There are further steps that can be taken by election officials to secure registration data.

In a pilot project that we at Caltech have conducted with the Orange County (California) Registrar of Voters, we built a set of software applications that monitor the County’s database of registered voters for anomalies. This pilot project was financially supported by a research grant to Caltech from the John Randolph Haynes and Dora Haynes Foundation. Details are available on the project’s website.

Working with the Registrar, we began getting daily snapshots of the County’s dataset of about 1.5 million registered voters about a year ago. We run our algorithms to look for anomalous changes in the database. Our algorithms can detect situations when unexpectedly large numbers of records are removed or added, and when unexpectedly large numbers of records are being changed. Thus, our algorithms can detect attempts to manipulate voter registration data.
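
The anomaly checks described above can be sketched in miniature (an illustrative z-score rule on invented daily counts; our production tooling is considerably more elaborate):

```python
# Minimal sketch of change-count monitoring (invented numbers, illustrative
# threshold): flag days whose record-change counts deviate sharply from
# the overall baseline.
import statistics

def flag_anomalies(daily_changes, k=3.0):
    """Return indices of days whose change count deviates from the overall
    mean by more than k population standard deviations."""
    mean = statistics.mean(daily_changes)
    sd = statistics.pstdev(daily_changes)
    return [i for i, n in enumerate(daily_changes)
            if sd > 0 and abs(n - mean) > k * sd]

# Thirteen days of routine churn, then a suspicious spike in changed records.
changes = [210, 195, 230, 201, 215, 190, 205, 220, 198, 207, 212, 225, 199, 8000]
print(flag_anomalies(changes))  # [13]
```

A flagged day is not proof of manipulation; it is a prompt for election staff to investigate whether the change was, say, a routine list-maintenance batch or something requiring closer scrutiny.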

After running our algorithms, we produce detailed reports that we send to the Registrar, letting them know if we see anomalies that require further investigation. We have developed other data-driven tools to monitor the 2018 elections in Orange County, looking at voting-by-mail patterns, turnout, and social media mentions. The results of this comprehensive monitoring appear on our pilot project’s website, providing transparency that we believe helps voters and stakeholders remain confident that the County’s voter registration data is secure.

This type of database and election system monitoring is critical for detecting and mitigating attempts to hack an election. It also helps isolate other issues that might occur in the administration of an election. By finding problems quickly, election officials can resolve them. By making the results of our monitoring available to the public, voters and stakeholders can be assured of the integrity of the election.

We are now working to build similar collaborations with other state and county election officials, to provide independent third-party monitoring of registration databases, and other related election administration infrastructure. Not only is it critical for election officials to monitor their data systems to make sure they have a high degree of integrity, it is also important that the public know that registration data is being monitored and is secure.

It’s happened again — more DMV-generated registration snafus in California

Late this past week, there were stories in California newspapers about yet another snafu by the DMV in their implementation of the state’s “motor voter” process. This time, DMV seems to have incorrectly put people into the voter registration system — though exactly how that happened is unclear.

For example, the LA Times reported about the new snafus in their coverage, “California’s DMV finds 3,000 more unintended voter registrations”:

Of the 3,000 additional wrongly enrolled voters, DMV officials said that as many as 2,500 had no prior history of registration and that there’s no clear answer as to what mistake was made that caused registration data for them to be sent to California’s secretary of state.

The Secretary of State’s Office is reportedly going to drop these unintended registrations from the state’s database.

As we are nearing the November 2018 midterm elections, and as there is a lot of energy and enthusiasm in California about these elections, there’s no doubt that the voter registration system will come under some stress as we get closer and closer to election day.

Our advice is that if you are concerned about your voter registration status, check it. The Secretary of State provides a service that you can use to check if you are registered to vote. Or if you’d rather not use that service, you can contact your county election official directly (many of them have applications on their websites to verify your registration status).

Voter registration snafus in California

There’s a story circulating today that another round of voter registration snafus have surfaced in California. This story in today’s LA Times, “More than 23,000 Californians were registered to vote incorrectly by state DMV” has some details about what appears to have happened:

“The errors, which were discovered more than a month ago, happened when DMV employees did not clear their computer screens between customer appointments. That caused some voter information from the previous appointment, such as language preference or a request to vote by mail, to be “inadvertently merged” into the file of the next customer, Shiomoto and Tong wrote. The incorrect registration form was then sent to state elections officials, who used it to update California’s voter registration database.”

This comes on the heels of reports before the June 2018 primary in California of potential duplicate voter registration records being produced by the DMV, as well as the snafu in Los Angeles County that left approximately 118,000 registered voters off the election-day voting rolls.

These are the sorts of issues in voter registration databases that my research group is looking into, using data from the Orange County Registrar of Voters. Since earlier this spring, we have been developing methodologies and applications to scan the County’s voter registration database to identify situations that might require additional examination by the County’s election staff. Soon we’ll be releasing more information about our methodology, and some of the results. For more information about this project, you can head to our Monitoring the Election website, or stay tuned to Election Updates.

Let’s not forget the voters

Recently my colleague and co-blogger, Charles Stewart, wrote a very interesting post, “Voters Think about Voting Machines.” His piece reminds me of a point that Charles and I have been making for a long time — that election officials should focus attention on the opinions of voters in their jurisdictions. After all, those voters are one of the primary customers for the administrative services that election officials provide.

Of course, there are lots of ways that election officials can get feedback about the quality of their administrative services, ranging from keeping data on interactions with voters to doing voter satisfaction and confidence surveys.

But as election officials throughout the nation think about upcoming technological and administrative changes to the services they provide voters, they might consider conducting proactive research, to determine in advance of administrative or technological change what voters think about their current service, to understand what changes voters might want, and to see what might be causing their voters to desire changes in administrative services or voting technologies.

This is the sort of question that drove Ines Levin, Yimeng Li, and me to look at what might drive voter opinions about the deployment of new voting technologies in our recent paper, “Fraud, convenience, and e-voting: How voting experience shapes opinions about voting technology.” This paper was recently published in American Politics Research, and we use survey experiments to try to determine what factors seem to drive voters to prefer certain types of voting technologies over others. (For readers who cannot access the published version at APR, here is a pre-publication version at the Caltech/MIT Voting Technology Project’s website.)

Here’s the abstract, summarizing the paper:

In this article, we study previous experiences with voting technologies, support for e-voting, and perceptions of voter fraud, using data from the 2015 Cooperative Congressional Election Study. We find that voters prefer systems they have used in the past, and that priming voters with voting fraud considerations causes them to support lower-tech alternatives to touch-screen voting machines — particularly among voters with previous experience using e-voting technologies to cast their votes. Our results suggest that as policy makers consider the adoption of new voting systems in their states and counties, they would be well-served to pay close attention to how the case for new voting technology is framed.

This type of research is quite valuable for election officials and policy makers, as we argue in the paper. How administrative or technological change is framed to voters — who are the primary consumers of these services and technologies — can really help to facilitate the transition to new policies, procedures, and technologies.

Report on “Voter Fraud” Rife With Inaccuracies

I look forward to a more detailed analysis by voter registration and database matching experts of the GAI report that will be presented to the Presidential Advisory Commission on Election Integrity, but even a cursory reading reveals a number of serious misunderstandings and confusions that call into question the authors’ understanding of some of the most basic facts about voter registration, voting, and elections administration in the United States.

Fair warning: I grade student papers as part of my job, and one of the comments I make most often is “be precise”. Categories and definitions are fundamentally important, especially in a highly politicized environment like the one currently surrounding American elections.

The GAI report is far from precise; it’s not a stretch to say at many points that it’s sloppy and misinformed. I worry that it’s purposefully misleading. Perhaps I overstate the importance of some of the mistakes below. I leave that for the reader to judge.

  • The report uses an overly broad and inaccurate definition of vote fraud.

American voter lists are designed to tolerate invalid voter registration records, which do not equate to invalid votes, because to do otherwise would lead to eligible voters being prevented from casting legal votes.

But the report follows a very common and misleading attempt to conflate errors in the voter rolls with “voter fraud”. Read their “definition”:

Voter fraud is defined as illegal interference with the process of an election. It can take many forms, including voter impersonation, vote buying, noncitizen voting, dead voters, felon voting, fraudulent addresses, registration fraud, elections officials fraud, and duplicate voting.

Where did this definition come from? As the source of the definition, they cite the Brennan Center report “The Truth About Voter Fraud” (https://www.brennancenter.org/sites/default/files/legacy/The%20Truth%20About%20Voter%20Fraud.pdf). 

However, the Brennan Center authors are very careful to define voter fraud in a way that directly warns against an overly broad and imprecise definition. From pg. 4 of their report:

“Voter fraud” is fraud by voters. More precisely, “voter fraud” occurs when individuals cast ballots despite knowing that they are ineligible to vote, in an attempt to defraud the election system.

This sounds straightforward. And yet, voter fraud is often conflated, intentionally or unintentionally, with other forms of election misconduct or irregularities.

To be fair to the authors, they do not conflate in their analysis situations such as being registered in two places at once with “voter fraud”, but the definition is sloppy, isn’t supported by the report they cite, and reinforces a highly misleading claim that voter registration errors are analogous to voter fraud.

David Becker can describe ad nauseam how damaging this misinterpretation has been.

  • The report makes unsubstantiated claims about the efficacy of Voter ID in preventing voter fraud.

Regardless of how you feel about voter ID, if you are going to claim that voter ID prevents in-person vote fraud, you need to provide actual proof, not just a supposition. The report authors write:

GAI also found several irregularities that increase the potential for voter fraud, such as improper voter registration addresses, erroneous voter roll birthdates, and the lack of definitive identification required to vote.

The key term here is “definitive identification”, a term that appears nowhere in HAVA. The authors either purposely or sloppily misstate the legal requirements of HAVA. On pg. 20 of the report, they write that HAVA has a

“requirement that eligible voters use definitive forms of identification when registering to vote”

The word “definitive” appears again, and a bit later in the paragraph we learn that a “definitive” ID, according to the authors, is:

“Valid drivers’ license numbers and the last four digits of an individual’s social security number…”,

But not according to HAVA. HAVA requirements are, as stated in the report:

“Alternative forms of identification include state ID cards, passports, military IDs, employee IDs, student IDs, bank statements, utility bills, and pay stubs.”

The rhetorical turn occurs at the end of the paragraph, when the authors conclude that these other forms of ID are:

“less reliable than the driver’s license and social security number standard”

and apparently not “definitive” and hence prone to fraud.

Surely the authors don’t intend to imply that a passport is “less reliable” than a driver’s license and social security number. In many (most?) states, a “state ID card” is just as reliable as a driver’s license. I’m not familiar with the identification requirements for a military ID (perhaps an expert can help out? [ED NOTE: I am informed by a friend that a civilian ID at the Pentagon requires a retinal scan and fingerprints]), but are military IDs really less “definitive” than a driver’s license?

If you are going to claim that voter fraud is an issue requiring immediate national attention, and that states are not requiring “definitive” IDs, you’d better get some of the most basic details of the most basic laws and procedures correct.

  • The authors claim states did not comply with their data requests, when it appears that state officials were simply following state law

The authors write:

(t)he Help America Vote Act of 2002 mandates that every state maintains a centralized statewide database of voter registrations.

That’s fine, but the authors seem to think this means that HAVA requires that the states make this information available to researchers at little to no cost. Anyone who has worked in this field knows that many states have laws that restrict this information to registered political entities. Most states restrict the number of data items that can be released in the interests of confidentiality.

Rather than acknowledging that state officials are constrained by state law, the authors claim non-compliance:

In effect, Massachusetts and other states withhold this data from the public.

I can just hear the gnashing of teeth in the 50 state capitols. I am sympathetic with the authors’ difficulties in obtaining statewide voter registration and voter history files. Along with the authors, I would like to see all state files made available, including to researchers, for a low or modest fee.

There is no requirement that the database be made available for an affordable fee, nor that the database be available beyond political entities. These choices are left to the states. It is wrong to charge “non-compliance” when an official is following a statute passed by their state legislature.

I don’t know whether the report authors didn’t have subject matter knowledge or were purposefully trying to create a misleading image of non-cooperation with the Commission.

  • The report shows that voter fraud is nearly non-existent, while simultaneously claiming the problem requires “immediate attention”.

But let’s return to the bottom line conclusion of the report: voter fraud is pervasive enough to require “immediate attention.” Do their data support this claim?

The most basic calculation would be the rate of “voter fraud” as defined in the report. The 45,000 figure (total potential illegally cast ballots) is highly problematic, extrapolated from suspect calculations in 21 states to the 29 other states without considering even the most basic rules of statistical inference.

Nonetheless, even if you accept the calculation, it translates into a “voter fraud” rate of 0.000324 (45,000 / 139 million), or roughly three-hundredths of one percent.

This is almost exactly the probability that you will be struck by lightning at some point in your lifetime (a chance of about 1 in 3,000: http://news.nationalgeographic.com/news/2004/06/0623_040623_lightningfacts.html).
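
Taking the report’s own figures at face value, the arithmetic is easy to check:

```python
# Checking the rate implied by the report's own numbers: roughly 3 in
# 10,000 ballots, on the order of the ~1-in-3,000 lifetime lightning odds.
suspect_ballots = 45_000
ballots_cast = 139_000_000

fraud_rate = suspect_ballots / ballots_cast
lightning_odds = 1 / 3_000

print(f"{fraud_rate:.6f}")      # 0.000324
print(f"{fraud_rate:.4%}")      # 0.0324%
print(f"{lightning_odds:.6f}")  # 0.000333
```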

I’m not the first one to notice this comparison—see pg. 4 of the Brennan Center report cited above. And here I thought I found something new!


There are many, many experts in election sciences and election administration who could have helped the Commission conduct a careful scientific review of the probability of duplicate registration and duplicate voting. This report, written by Lorraine Minnite more than a decade ago, lays out precisely the steps that need to be taken to uncover voter fraud and how statewide voter files should be used in this effort. Many others in the field, both those worried about voter fraud and those skeptical of it, have been calling for just such a careful study.

Unfortunately, the Commission instead chose to retain a consulting firm with no experience in the field, which in turn relied on database companies that also had no expertise in the field.

I’m sure that other experts will examine in more detail the calculations about duplicate voting. However, at first look, the report fails the smell test. It’s a real stinker.


Paul Gronke
Professor, Reed College
Director, Early Voting Information Center

http://earlyvoting.net

Estimating Turnout with Self-Reported Survey Data

There’s long been a debate about the accuracy of voter participation estimates that use self-reported survey data. The seminal research paper on this topic, by Rosenstone and Wolfinger, was published in 1978 (available here for those of you with JSTOR access). They pointed out a methodological problem in the Current Population Survey data they used in their early and important analysis: there seemed to be more people in the survey reporting that they voted than likely voted in the federal elections they studied.

In the years since the publication of Rosenstone and Wolfinger’s paper, there’s been a lot of debate among academic researchers about this apparent misreporting of turnout in survey self-reports of behavior, much more than I can easily summarize here. But many survey researchers have been using “voter validation” to try to alleviate these potential biases in their survey data, which involves matching survey respondents who say they voted to administrative voter history records (after the election); this approach has been used in many large-scale academic surveys of political behavior, including many of the American National Election Studies.

In an important new study, recently published in Public Opinion Quarterly, Berent, Krosnick, and Lupia set out to test the validation of self-reports of turnout against post-election voter history data. Their paper, “Measuring Voter Registration and Turnout in Surveys: Do Official Government Records Yield More Accurate Assessments?”, is one that people interested in studying voter turnout using survey data should read. Here are the important results from their paper’s abstract:

We explore the viability of turnout validation efforts. We find that several apparently viable methods of matching survey respondents to government records severely underestimate the proportion of Americans who were registered to vote. Matching errors that severely underestimate registration rates also drive down “validated” turnout estimates. As a result, when “validated” turnout estimates appear to be more accurate than self-reports because they produce lower turnout estimates, the apparent accuracy is likely an illusion. Also, among respondents whose self-reports can be validated against government records, the accuracy of self-reports is extremely high. This would not occur if lying was the primary explanation for differences between reported and official turnout rates.
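
The mechanism the abstract describes can be seen with stylized numbers (invented for illustration; these are not the paper’s estimates): when matching to government records fails for some true voters, “validated” turnout is mechanically pushed below actual turnout even if every self-report is honest.

```python
# Stylized illustration (invented numbers) of how record-matching failures
# deflate "validated" turnout relative to true turnout.
respondents = 1000
true_voters = 700          # respondents who actually voted
match_rate = 0.80          # share of true voters successfully linked to a record

validated_voters = true_voters * match_rate
true_turnout = true_voters / respondents
validated_turnout = validated_voters / respondents

print(true_turnout)        # 0.7
print(validated_turnout)   # 0.56 -- looks "lower", but the gap is pure match error
```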

This is an important paper, which deserves close attention. As it is questioning one of the common means of trying to validate self-reported turnout, not only do we need additional research to confirm their results, we need new research to better understand how we can best adjust self-reported survey participation to get the most accurate turnout estimate that we can, using survey data.

Estimating racial and ethnic identity from voting history data

Researchers who have participated in redistricting efforts, or who for other reasons have used voter history files in their work, know how difficult it is to estimate a voter’s racial and ethnic identity from these data. These files typically contain a voter’s name, date of birth, address, date of registration, and their participation in recent elections. The usual approach that many have taken to estimate each voter’s racial or ethnic identity has been to use “surname dictionaries”, which map many of the last names in a voter history file to racial or ethnic groups.

The obvious problem is that, in an increasingly diverse society, this surname matching procedure may be less and less accurate. The surnames of many Americans are no longer necessarily informative about their racial or ethnic identity.
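
In miniature, the surname-dictionary approach is just a lookup table (toy entries below, not a real dictionary; real work draws on sources like the Census Bureau’s surname frequency files), which is exactly why it degrades as names carry less information about identity:

```python
# Toy version of surname-dictionary classification (invented entries, for
# illustration only): map a surname to the dictionary's modal group, or
# fail entirely when the name is absent.
SURNAME_DICT = {
    "garcia": "Hispanic",
    "nguyen": "Asian",
    "smith": "White",
}

def classify(surname):
    """Return the dictionary's group for a surname, or None if the name
    is missing -- the failure mode described in the paragraph above."""
    return SURNAME_DICT.get(surname.lower())

print(classify("Nguyen"))  # Asian
print(classify("Obama"))   # None -- no dictionary entry, so no estimate
```

The methods in the two papers discussed below improve on this by modeling the statistical relationship between names and group membership rather than relying on a fixed one-to-one lookup.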

Charles recently wrote about one recent paper in Political Analysis on this topic, by Kosuke Imai and Kabir Khanna, “Improving Ecological Inference by Predicting Individual Ethnicity from Voter Registration Records”. Charles provided an excellent summary of this article, but I’d like to point out to readers that the Imai and Khanna article is now available for free reading online, so check it out asap!

The other recent article in Political Analysis on this question is by J. Andrew Harris, “What’s in a Name? A Method for Extracting Information about Ethnicity from Names.” Here’s Harris’s abstract:

Questions about racial or ethnic group identity feature centrally in many social science theories, but detailed data on ethnic composition are often difficult to obtain, out of date, or otherwise unavailable. The proliferation of publicly available geocoded person names provides one potential source of such data—if researchers can effectively link names and group identity. This article examines that linkage and presents a methodology for estimating local ethnic or racial composition using the relationship between group membership and person names. Common approaches for linking names and identity groups perform poorly when estimating group proportions. I have developed a new method for estimating racial or ethnic composition from names which requires no classification of individual names. This method provides more accurate estimates than the standard approach and works in any context where person names contain information about group membership. Illustrations from two very different contexts are provided: the United States and the Republic of Kenya.

Harris’s paper is open access, which means it’s also freely available for people to read online.

There’s a lot of interesting research going on in how to use these types of administrative datasets for innovative research; I encourage readers to take a look at both papers, and I’d also like to note that the code and data for both papers are available on the Political Analysis Dataverse.

Making sure that California election officials are ready for the upcoming primary

California’s statewide primary is approaching rapidly, and it sounds as if voter interest in the primary is building. This could be an important test of the state’s top-two primary system, and it might be the first time that we see strong voter turnout under the top-two. Clearly election officials throughout the state need to be prepared — there might be a lot of last-minute new registrants, a lot of ballots cast by mail, and perhaps many new voters showing up on election day. The LA Times editorialized about exactly this concern, “How do we prevent the California primary from becoming another Arizona?”.