Author Archives: Michael Alvarez

Election forensics and machine learning

We recently published a new paper on election forensics in PLOS ONE, “Election forensics: Using machine learning and synthetic data for possible election anomaly detection.” It’s a paper that I wrote with Mali Zhang (a recent PhD student at Caltech) and Ines Levin at UCI. PLOS ONE is an open access journal, so there is no paywall!

Here’s the paper’s abstract:

Assuring election integrity is essential for the legitimacy of elected representative democratic government. Until recently, other than in-person election observation, there have been few quantitative methods for determining the integrity of a democratic election. Here we present a machine learning methodology for identifying polling places at risk of election fraud and estimating the extent of potential electoral manipulation, using synthetic training data. We apply this methodology to mesa-level data from Argentina’s 2015 national elections.

This new PLOS ONE paper builds on the paper that Ines and I coauthored with Julia Pomares, “Using machine learning algorithms to detect election fraud”, which appeared in the volume I edited, Computational Social Science: Discovery and Prediction. My research group and several of my collaborators are continuing to develop methods to quickly obtain election data and analyze it for anomalies and outliers, similar to our Monitoring the Election project. More on all of this soon!

Auditing Voter Registration Databases

As many readers know, we’ve been working on a variety of election performance auditing projects, focusing on voter registration database auditing and other types of statistical forensics (you can see a number of examples on our Monitoring the Election project page). We also have been working on post-election ballot auditing, including our recent Election Audit Summit.

Recently the first paper from our new election integrity project in Orange County (CA) was published in the peer-reviewed journal American Politics Research. I co-authored this paper, “Evaluating the Quality of Changes in Voter Registration Databases”, with Silvia Seo-young Kim (a Ph.D. student here at Caltech) and Spencer Schneider (a Caltech undergrad). Here’s the paper’s abstract:

The administration of elections depends crucially upon the quality and integrity of voter registration databases. In addition, political scientists are increasingly using these databases in their research. However, these databases are dynamic and may be subject to external manipulation and unintentional errors. In this article, using data from Orange County, California, we develop two methods for evaluating the quality of voter registration data as it changes over time: (a) generating audit data by repeated record linkage across periodic snapshots of a given database and monitoring it for sudden anomalous changes and (b) identifying duplicates via an efficient, automated duplicate detection, and tracking new duplicates and deduplication efforts over time. We show that the generated data can serve not only to evaluate voter file quality and election integrity but also as a novel source of data on election administration practices.
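To make the duplicate-detection idea in part (b) a bit more concrete, here is a minimal Python sketch of one common approach: blocking plus fuzzy string comparison. It is an illustration, not the method or code used in the paper. The field names (`voter_id`, `first`, `last`, `dob`), the blocking key (normalized last name plus date of birth), and the 0.85 similarity threshold are all hypothetical choices made for this example.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text):
    """Lowercase and keep only letters and digits."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def find_likely_duplicates(records, threshold=0.85):
    """Return (voter_id, voter_id) pairs that may be the same person.

    Records are grouped ("blocked") on normalized last name plus date of
    birth (a hypothetical key); within each block, first names are compared
    pairwise with difflib's similarity ratio, and pairs at or above
    `threshold` are kept.
    """
    blocks = defaultdict(list)
    for rec in records:
        blocks[(normalize(rec["last"]), rec["dob"])].append(rec)

    pairs = []
    for block in blocks.values():
        for a, b in combinations(block, 2):
            score = SequenceMatcher(None, normalize(a["first"]),
                                    normalize(b["first"])).ratio()
            if score >= threshold:
                pairs.append((a["voter_id"], b["voter_id"]))
    return pairs


# Hypothetical records; "Jonathon" looks like a typo of "Jonathan".
records = [
    {"voter_id": "A1", "first": "Jonathan", "last": "Smith", "dob": "1980-01-02"},
    {"voter_id": "B7", "first": "Jonathon", "last": "Smith", "dob": "1980-01-02"},
    {"voter_id": "C3", "first": "Maria", "last": "Smith", "dob": "1980-01-02"},
    {"voter_id": "D4", "first": "Jonathan", "last": "Smith", "dob": "1975-06-30"},
]
print(find_likely_duplicates(records))  # [('A1', 'B7')]
```

Blocking keeps the number of comparisons manageable for a large county file, since only records that share a key are compared pairwise. The trade-off is that duplicates whose last name or birth date was itself mistyped will be missed unless additional blocking keys are used.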

An ungated pre-print version of this paper is available from the Caltech/MIT Voting Technology Project’s website, as Working Paper 134.

We are continuing this work with Orange County, and have in recent months been working to explore how these same voter registration database auditing methodologies can work in larger jurisdictions (Los Angeles County) and in states (Oregon). More on those results soon.

The process that led to the development of this project, and to the publication of this paper, is also worth recounting. In this paper, we make use of daily voter registration “snapshots” that we obtained from the Orange County Registrar of Voters, starting back in April 2018. This required close collaboration with Neal Kelley, the Orange County Registrar of Voters, and his staff. We are very happy to participate in this collaborative effort, and thank Neal and his team for their willingness to work with us. It’s been a very productive partnership, and we are very excited to continue our collaboration with them going into the 2020 election cycle. This is the sort of academic-election official partnership that we have worked to build and foster at the Caltech/MIT Voting Technology Project since the project’s founding in the immediate aftermath of the 2000 presidential election.

It’s also fun to note that both of my coauthors are Caltech students. Silvia is in her final year in our Social Science Ph.D. program, and she is working on related work for her dissertation (I’ll write later about some of that work, which you can see on Silvia’s website). Spencer worked closely with us on this project in 2018 through Caltech’s Summer Undergraduate Research Fellowship program, helping us build the methodology for voter registration database auditing. Currently, Spencer is working in computer science and engineering here at Caltech. This paper is a great example of how we like to involve graduate and undergraduate students in our voting technology and election administration research.

The plot thickens: Which Florida counties were targeted by hackers?

Earlier this week I wrote about the recent news that hackers may have gained access to election administration systems in at least one Florida county in 2016: see How to avoid an election meltdown in 2020: Improve voter registration database security and monitoring.

Now in the news are reports that there may have been two Florida counties where hackers gained access to county election administration systems in 2016 (see, for example, the NYT story “Russians Hacked Voter Systems in 2 Florida Counties. But Which Ones?”). This has set off a guessing game: which Florida county election administration systems might have been breached in 2016, and what were the consequences?

I’d like to return attention, though, to what I think is the most important issue here. It’s not whether one or two county systems were breached in 2016; the most important thing is to make sure that, as we go into the 2020 election cycle, security and auditing systems are in place to detect any malicious or accidental manipulation of voter registration databases. It’s now May 2019, and we have plenty of time to evaluate the current security protocols for these critical databases in every state, to improve those protocols where necessary, and to put in place database auditing and monitoring tools like those we have been developing in our Monitoring the Election project.

Now’s the time to act — while we still can improve the security of voter registration systems, and establish auditing procedures to detect any efforts to manipulate the critical information in those systems.

How to avoid an election meltdown in 2020: Improve voter registration database security and monitoring

One of the most shocking parts of the Mueller report details the widespread efforts by Russian hackers to attack American election infrastructure in 2016.

Specifically, the report presents evidence that Russian military intelligence (the GRU) targeted state and local election administration systems, and that it infiltrated the computer network of the Illinois State Board of Elections and at least one Florida county during the 2016 presidential election, using techniques such as SQL injection and spear phishing. The GRU also targeted private firms that provide election administration technologies, such as software systems for voter registration.

This is stunning news, and a wake-up call for improving the integrity and security of election administration and technology in the United States.

The Mueller report does not provide evidence that these hacking attempts altered the reported results of elections in 2016 or 2018. Instead, the report highlights hacking efforts aimed at gaining access to voter registration databases, which might seem surprising to many.

Prior to the 2000 presidential election, voter registration data was maintained in a hodgepodge of ways by county and state election officials. After the passage of the Help America Vote Act in 2002, states were required to centralize voter registration data in statewide electronic databases, to improve the accuracy and accessibility of voter registration data in every state.

But one consequence of building statewide voter registration databases is that they became attractive targets for hackers. Rather than targeting hundreds or thousands of election administration systems at the county level, hackers can now target a single database system in each state.

Why would hackers want to target voter registration systems?

First, a hacker could alter or delete registration records in a state or county, with the goal of wreaking havoc on Election Day. By dropping voters, or by changing voter addresses, names, or partisan affiliations, a hacker could create chaos on Election Day: for instance, voters could go to the right polling place, only to find that their names are not on the roster, and thus be denied the chance to vote.

A hack of this type, if done in a number of counties in a battleground state like Florida, could lead to an election meltdown like we saw in the 2000 presidential election.

Second, a hacker could be more systematic. They could add fake voters to the database and, if they had access to the electronic systems used to send absentee ballots, obtain ballots for these fake voters.

This type of hack could enable a large-scale effort to actually change the outcome of an election, if the hackers marked and returned the ballots for these fake voters.

These vulnerabilities are real, and an unintended consequence of the development of centralized electronic statewide voter registration databases in the United States. There is little doubt that the attempts by hackers to target voter registration systems in 2016 and 2018 could have produced widespread disruption of either election, had they been successful.

There is also little doubt that efforts to hack voter registration databases in the United States will continue. The GRU will have better knowledge of what vulnerabilities exist in our election systems and how to target them. What can we do to secure these databases, to prevent these attacks, and to make sure that we can detect them if hackers gain access to registration databases?

Obviously, state and county election officials must continue their efforts to strengthen the security of voter registration databases. They must also ensure that strong electronic security practices are in place, so that hackers cannot gain access to passwords and other administrative systems they might exploit to reach registration data.

There are further steps that can be taken by election officials to secure registration data.

In a pilot project that we at Caltech have conducted with the Orange County (California) Registrar of Voters, we built a set of software applications that monitor the County’s database of registered voters for anomalies. This pilot project was financially supported by a research grant to Caltech from the John Randolph Haynes and Dora Haynes Foundation. Details are available on the project’s website.

Working with the Registrar, we began getting daily snapshots of the County’s dataset of about 1.5 million registered voters about a year ago. We run our algorithms to look for anomalous changes in the database. Our algorithms can detect situations when unexpectedly large numbers of records are removed or added, and when unexpectedly large numbers of records are being changed. Thus, our algorithms can detect attempts to manipulate voter registration data.
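As a rough illustration of this kind of snapshot monitoring, the Python sketch below compares two daily snapshots keyed by voter ID, counts added, removed, and changed records, and flags a day whose count is far above the recent norm. This is a hypothetical sketch, not the algorithm we run for the Registrar. The record layout, the median-plus-MAD rule, the floor of 1.0, and the multiplier `k=5.0` are all assumptions made for the example.

```python
from statistics import median


def diff_snapshots(old, new):
    """Compare two snapshots, each a dict mapping voter ID -> record tuple.

    Returns counts of records added, removed, and changed between them.
    """
    old_ids, new_ids = set(old), set(new)
    return {
        "added": len(new_ids - old_ids),
        "removed": len(old_ids - new_ids),
        "changed": sum(1 for vid in old_ids & new_ids if old[vid] != new[vid]),
    }


def is_anomalous(history, today, k=5.0):
    """Flag today's count if it exceeds median + k * MAD of past daily counts.

    MAD is the median absolute deviation; the floor of 1.0 (an assumed
    choice) keeps the threshold from collapsing when history is constant.
    """
    med = median(history)
    mad = median([abs(x - med) for x in history])
    return today > med + k * max(mad, 1.0)


# Hypothetical records: (name, address, party).
old = {"1": ("Ana", "12 Elm St", "DEM"),
       "2": ("Bo", "3 Oak Ave", "REP"),
       "3": ("Cy", "9 Pine Rd", "NPP")}
new = {"1": ("Ana", "12 Elm St", "DEM"),
       "2": ("Bo", "77 Bay Ct", "REP"),   # address changed
       "4": ("Di", "5 Ash Ln", "DEM")}    # new registrant; "3" was dropped
print(diff_snapshots(old, new))  # {'added': 1, 'removed': 1, 'changed': 1}

# Hypothetical daily removal counts over the past week.
removals = [120, 95, 130, 110, 105, 98, 125]
print(is_anomalous(removals, 140))   # False: within the usual range
print(is_anomalous(removals, 5000))  # True: a sudden mass deletion
```

Using the median and median absolute deviation rather than the mean and standard deviation means that a single earlier spike (say, a routine bulk list-maintenance run) does not inflate the threshold for later days.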

After running our algorithms, we produce detailed reports that we send to the Registrar, letting them know if we see anomalies that require further investigation. We have developed other data-driven tools to monitor the 2018 elections in Orange County, looking at voting-by-mail patterns, turnout, and social media mentions. The results of this comprehensive monitoring appear on our pilot project’s website, providing transparency that we believe helps voters and stakeholders remain confident that the County’s voter registration data is secure.

This type of database and election system monitoring is critical for detecting and mitigating attempts to hack an election. It also helps isolate other issues that might occur in the administration of an election. By finding problems quickly, election officials can resolve them. By making the results of our monitoring available to the public, voters and stakeholders can be assured of the integrity of the election.

We are now working to build similar collaborations with other state and county election officials, to provide independent third-party monitoring of registration databases, and other related election administration infrastructure. Not only is it critical for election officials to monitor their data systems to make sure they have a high degree of integrity, it is also important that the public know that registration data is being monitored and is secure.

Better methods for informing voters — new research

Outside of the U.S., there has been a great deal of interest in voter-advice applications (VAAs). These tools give voters an opportunity to get unbiased advice about which candidates or parties to support, usually in complex multicandidate or multiparty contexts.

However, the academic research on VAAs has, to date, focused on observational studies and hasn’t shown a clear causal connection between VAA use and changes in voting intentions. Joelle Pianzola, Alexander H. Trechsel, Kristjan Vassil, Guido Schwerdt, and I have just published a paper in the Journal of Politics, “The Impact of Personalized Information on Vote Intention: Evidence from a Randomized Field Experiment.” In the paper we present evidence from a randomized controlled field experiment in Switzerland indicating that VAA use changed voters’ vote intentions.

Here’s the paper’s abstract:

Voting advice applications (VAAs) are voter information tools that millions of individuals have used in recent elections throughout the world. However, little is known about how they affect political behavior. Until now, observational studies of VAA have produced inconclusive results. Here we present the results from a randomized field experiment in Switzerland that estimates the causal effects of VAA use on voters’ vote intentions. Our results suggest that usage of the Swiss VAA smartvote strengthened the vote intention for the most preferred party and also increased the number of parties considered as potential vote options. These results imply that VAAs can influence voting behavior and that they can play an important role in electoral politics.

Residual votes in the 2016 presidential election

After generally declining in the elections following 2000, the national residual vote rate rose in the 2016 presidential election. Why?

We tackle this question in a new VTP working paper, “Residual Votes and Abstention in the 2016 Election,” which Charles Stewart III and I wrote with Stephen Pettigrew and Cameron Wimply. Here’s the paper’s abstract:

We analyze the significant increase in the residual vote rate in the 2016 presidential election. The residual vote rate, which is the percentage of ballots cast in a presidential election that contain no vote for president, rose nationwide from 0.99% to 1.41% between 2012 and 2016. The primary explanation for this rise is an increase in abstentions, which we argue results primarily from disaffected Republicans more than from alienated Democrats. In addition, other factors related to election administration and electoral competition also explain variation in the residual vote rates across states, particularly the use of mail/absentee ballots and the lack of competition at the top of the ticket in non-battleground states. However, we note that the rise in the residual vote rate was not due to changes in voting technologies. The analysis relies on a combination of public opinion and election return data to address these issues.
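The residual vote rate defined above is simple arithmetic: ballots cast minus ballots with a counted vote for president, divided by ballots cast. On the abstract’s own figures, the move from 0.99% to 1.41% is a relative increase of roughly 42 percent. The Python sketch below computes the rate and, as an illustrative choice for this post rather than a description of the paper’s procedure, pools counts across jurisdictions before dividing, so a national figure weights large states more than small ones. The jurisdiction counts in the example are made up.

```python
def residual_vote_rate(ballots_cast, presidential_votes):
    """Percent of ballots cast that contain no vote for president.

    `presidential_votes` is the number of ballots with a counted vote for
    president, so the difference is the count of residual ballots.
    """
    if ballots_cast <= 0:
        raise ValueError("ballots_cast must be positive")
    residual = ballots_cast - presidential_votes
    return 100.0 * residual / ballots_cast


def pooled_rate(jurisdictions):
    """Aggregate rate from (ballots_cast, presidential_votes) pairs.

    Counts are summed before dividing, so each jurisdiction is weighted by
    its number of ballots rather than counted equally.
    """
    pairs = list(jurisdictions)
    total_ballots = sum(b for b, _ in pairs)
    total_pres = sum(p for _, p in pairs)
    return residual_vote_rate(total_ballots, total_pres)


# Hypothetical counts for one large and one small jurisdiction.
states = [(1_000_000, 990_000), (100_000, 95_000)]
print(round(pooled_rate(states), 2))  # 1.36 (pooled)
print(sum(residual_vote_rate(b, p) for b, p in states) / len(states))  # 3.0 (unweighted)
```

The gap between the pooled 1.36% and the unweighted 3.0% shows why it matters to aggregate counts rather than averaging state-level rates when describing a national trend.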

Research on polling place lines and dynamics in PRQ!

Readers may remember that in 2016 a consortium of researchers from across the U.S. (including Caltech) participated in a large study of polling place lines and dynamics in the November general election. The great news is that some of the results have now been published in the journal Political Research Quarterly. The study is a wonderful example of how much progress has been made in developing a science of elections.

The paper, “Waiting to Vote in the 2016 Presidential Election: Evidence from a Multi-county Study”, is now available on the journal’s website. The lead author is Robert M. Stein. Here’s the paper’s abstract:

This paper is the result of a nationwide study of polling place dynamics in the 2016 presidential election. Research teams, recruited from local colleges and universities and located in twenty-eight election jurisdictions across the United States, observed and timed voters as they entered the queue at their respective polling places and then voted. We report results about four specific polling place operations and practices: the length of the check-in line, the number of voters leaving the check-in line once they have joined it, the time for a voter to check in to vote (i.e., verify voter’s identification and obtain a ballot), and the time to complete a ballot. Long lines, waiting times, and times to vote are closely related to time of day (mornings are busiest for polling places). We found the recent adoption of photographic voter identification (ID) requirements to have a disparate effect on the time to check in among white and nonwhite polling places. In majority-white polling places, scanning a voter’s driver’s license speeds up the check-in process. In majority nonwhite polling locations, the effect of strict voter ID requirements increases time to check in, albeit modestly.

Research on UOCAVA voting

In today’s world, the shelf life of a typical academic research article is pretty short. Most papers are published electronically, with a quick and immediate burst of attention (usually fueled by conversation about the paper on social media). After that initial burst of attention, for most academic papers, mentions online and citations quickly wane.

So it was with some pride that I heard of continued interest in a paper that I published over a decade ago with Thad E. Hall and Brian F. Roberts, “Military Voting and the Law: Procedural and Technological Solutions to the Ballot Transit Problem.” In the paper, we looked at voting under the Uniformed and Overseas Citizens Absentee Voting Act (UOCAVA), focusing on how attention to the issue has shifted from concerns about procedures to concerns about technologies.

I’ve gone back and re-read this paper, and thought I’d write about it here because it covers the history of UOCAVA voting quite well. It serves as a good primer on the issues surrounding UOCAVA voting, and it sets the stage for understanding the challenges that UOCAVA voters and election officials face in making sure that UOCAVA voters can easily and securely exercise their voting rights. The basic technological challenges that we discuss in the paper are as real today as they were when we wrote it over a decade ago.

And the good news is that this paper is available online, so give it a read if you are interested in the history of UOCAVA voting.

Mitigating Mischief

As I wrote last week, CSPAN recently profiled research that I’ve been working on with Andy Sinclair. The interview aired over the past weekend, and it’s now online. Here’s the link to the CSPAN interview.

And here is a link to our book, Nonpartisan Primary Election Reform: Mitigating Mischief.

Andy and I are continuing our research on the top-two primary, in collaboration with Christian Grose (USC) and Betsy Sinclair (WUSTL). We hope to finish our next book with Christian and Betsy soon, so stay tuned!