
California’s Super, Super Tuesday: What to Expect

On March 3, California will be one of fourteen states holding primary elections (American Samoa will hold caucuses that day). California's 494 delegates to the Democratic National Convention will be at stake on March 3, making California a very large prize for candidates still seeking the Democratic presidential nomination.

But there's a very good chance that we will not know the winner of California's Democratic presidential primary on the evening of March 3. In fact, we may not know how California's delegates will be allocated until much later in March. This will be especially true if there's no clear front-runner in the Democratic presidential nomination contest by March 3.

So why are we anticipating that we may not know the winner of the Democratic presidential primary in California after polls close on March 3?

California is in the midst of sweeping changes in election administration procedures and voting technologies. While some of these changes started in 2018 in some counties, they are now reaching the larger counties in the state, in particular Orange and Los Angeles Counties. Election officials throughout the state have been working in recent years to make the process of registering, getting a ballot, and returning that marked ballot much easier and more convenient. And it's these changes that are likely to introduce significant delays in the tabulation of ballots after the polls close on March 3, and which could well delay the determination of a winner in California's Democratic primary for days or weeks, if the contest is close statewide.

California election officials have sent out an unprecedented number of ballots by mail. For example, the Orange County Registrar of Voters has mailed just over 1.6 million absentee ballots to registered voters. Many of those ballots (269,690 as of February 27 in Orange County) have been returned, but the vast majority are still in the hands of voters. We expect that many voters will drop their voted absentee ballots in the mail in the coming days, or will drop them off at vote centers between now and Election Day. And if a vote-by-mail ballot is postmarked on or before Election Day and received by the election official no later than 3 days after March 3, it will be included in the tabulation. This means there are likely to be a large number of by-mail ballots received on Election Day and in the 3 days following that will all need to be processed, validated, and included in the tabulation, mostly after March 3.
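To make that timing rule concrete, here is a minimal sketch in Python of the date check that a returned vote-by-mail ballot has to pass; the function and field names are hypothetical, and this is not the actual validation logic used by any California county (which also involves signature verification and other steps).

```python
from datetime import date, timedelta

ELECTION_DAY = date(2020, 3, 3)
RECEIPT_DEADLINE = ELECTION_DAY + timedelta(days=3)  # received no later than 3 days after March 3

def vbm_ballot_is_timely(postmark: date, received: date) -> bool:
    """Return True if a vote-by-mail ballot meets the postmark and receipt deadlines.

    A ballot is timely if it was postmarked on or before Election Day and
    received by the election official within 3 days after Election Day.
    (Other validation steps, such as signature checks, are ignored here.)
    """
    return postmark <= ELECTION_DAY and received <= RECEIPT_DEADLINE

# Example: mailed on Election Day, arrives two days later, so it is still counted.
print(vbm_ballot_is_timely(date(2020, 3, 3), date(2020, 3, 5)))  # True
```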

Californians who for some reason haven't yet registered to vote, but who want to register now and participate in the March primary, can do so using what is called "Conditional Voter Registration" (CVR), which lets them register and vote at many locations in their county (usually the county election headquarters, a vote center, or a polling place). It's unknown how many potentially eligible Californians may take advantage of the CVR opportunity, but it's possible that we will see large numbers of conditional registrations between now and March 3. Of course, many of these voters will not have their registration materials processed, and their ballots included in the tabulation if they are eligible, until after the March 3 primary. If there is a swell of interest in the primary among currently unregistered but eligible voters, this could significantly slow the reporting of final results after March 3.

Finally, there is also a good chance that there will be strong turnout on March 3, potentially resulting in crowded voting locations statewide and producing a very large number of ballots, CVR applications and provisional ballots, and by-mail ballots dropped off on Election Day. If turnout is strong in the March primary, the large amount of election material that will need to be reconciled and examined after Election Day could also slow the tabulation process and introduce significant delays in the reporting of results.

Now that's just on the administrative side. It also turns out that the rules governing the allocation of California's 494 Democratic National Convention (DNC) delegates are exceptionally complex, so complex that they will require another blog post. The important points are that most of the state's DNC delegates are allocated proportionally, both on the basis of the statewide primary vote and on the basis of the vote in each of the state's Congressional Districts, and that in either case only candidates receiving at least 15% of the votes cast get delegates. So in order to know the delegate count from California's Super Tuesday primary, we'll need accurate counts of the votes cast in each Congressional District, and that could take days or even weeks.
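To illustrate how the 15% threshold interacts with proportional allocation, here is a minimal sketch in Python for a single jurisdiction (statewide or one Congressional District), with made-up vote totals. It uses simple largest-remainder rounding and is only an illustration of the general idea, not a reproduction of the DNC's exact allocation rules.

```python
def allocate_delegates(votes: dict, n_delegates: int, threshold: float = 0.15) -> dict:
    """Proportionally allocate delegates among candidates at or above the threshold.

    Uses largest-remainder rounding; the actual DNC rules differ in detail.
    """
    total = sum(votes.values())
    qualified = {c: v for c, v in votes.items() if v / total >= threshold}
    qualified_total = sum(qualified.values())

    # Whole-delegate shares first, then hand out leftovers by largest remainder.
    quotas = {c: n_delegates * v / qualified_total for c, v in qualified.items()}
    seats = {c: int(q) for c, q in quotas.items()}
    leftovers = n_delegates - sum(seats.values())
    for c in sorted(quotas, key=lambda c: quotas[c] - seats[c], reverse=True)[:leftovers]:
        seats[c] += 1
    return seats

# Hypothetical district with 7 delegates: the candidate at 10% wins no delegates.
print(allocate_delegates({"A": 400_000, "B": 300_000, "C": 200_000, "D": 100_000}, 7))
# {'A': 3, 'B': 2, 'C': 2}
```

The point of the example is that the delegate split in every Congressional District depends on near-final vote counts, which is why the full delegate allocation cannot be known until well after election night.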

There's a good chance that we may not know the final delegate count until a few weeks after the primary. So be patient: the process will take time, and let's give our election officials the opportunity to do their jobs and produce an accurate tabulation of the results of California's Super Tuesday primary.

Twitter Monitoring for the 2020 Super Tuesday Primaries

We've launched our Twitter election monitors for the 2020 Super Tuesday primaries; the visualizations are now being posted at Monitoring the Election, and you can also see them on GitHub. These data are being collected using a process similar to the one we tested and deployed in 2018. If you are interested in the code from 2018, here's a link to that GitHub repo.

The major improvement since 2018 is that we've rebuilt the code base, which now runs in the cloud. That should make our collection stream more reliable, and will allow us to scale the collection process to cover additional keywords and hashtags when necessary. These improvements are the subject of a technical paper that is under development and that we hope to release soon.
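For readers curious what keyword-based collection of this sort looks like, here is a minimal sketch using the tweepy streaming API (version 3.x), with placeholder credentials and an invented keyword list. Our actual collection pipeline lives in the GitHub repos linked above and differs in many details, especially in how it persists data in the cloud.

```python
import json
import tweepy

# Placeholder credentials and tracking terms; the real pipeline uses its own configuration.
CONSUMER_KEY, CONSUMER_SECRET = "YOUR_CONSUMER_KEY", "YOUR_CONSUMER_SECRET"
ACCESS_TOKEN, ACCESS_SECRET = "YOUR_ACCESS_TOKEN", "YOUR_ACCESS_SECRET"
TRACK_TERMS = ["#SuperTuesday", "#Election2020", "vote by mail"]

class ElectionStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        # Append each tweet's JSON to a local file; a cloud deployment would
        # instead write to durable storage such as an object store or a queue.
        with open("super_tuesday_tweets.jsonl", "a") as f:
            f.write(json.dumps(status._json) + "\n")

    def on_error(self, status_code):
        # Disconnect politely if Twitter signals rate limiting.
        if status_code == 420:
            return False

auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_SECRET)
stream = tweepy.Stream(auth=auth, listener=ElectionStreamListener())
stream.filter(track=TRACK_TERMS, languages=["en"])
```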

We also continue to work on the geo-classification of the data we are collecting, and we have a few improvements to our process that we'll roll out soon. These improvements should allow us to monitor social media discussion about the Super Tuesday primaries in Southern California.

The team working on this project includes Nicholas Adams-Cohen (now a post-doc at Stanford University) and Jian Cao (a post-doc here at Caltech).

The Iowa Democratic Caucus: Why Elections Need To Be Fully and Independently Audited

It's Friday morning, and I think everyone who follows American elections assumed that by now we'd have a clear sense of the outcome of the Iowa Democratic caucuses.

Instead, we have headlines like this, from the New York Times, “Iowa Caucus Results Riddled With Errors and Inconsistencies.” While it’s not necessarily surprising that there are errors and inconsistencies in the current tabulation reports from the Iowa Democratic caucuses, the issue is that we may never get a clear, trustworthy, and accurate tabulation of the caucus results.

It's helpful that the caucuses produced tabulation results on paper; these paper records can be examined, and they can form the basis for recounting and even auditing the caucus results. But it doesn't seem that there was ever any intention for anyone to audit or validate the results of the caucuses. And I keep scratching my head, wondering why, given how close and competitive the Democratic presidential selection contest has been, no one appears to have considered building a process to audit and validate the caucus results in near real time.

For example, in our Monitoring The Election project, we pilot tested independent and near real-time quantitative auditing of a number of aspects of the election process in Orange County (CA) in 2018. We are now just starting to do that same type of auditing in both Orange County and Los Angeles County for the March Super Tuesday primary (we’ll start releasing some of our auditing reports very soon). A similar process could have been used in the Iowa caucuses.

What would it involve? Quite simply, the Iowa Democratic Party could work out a data provision plan with an independent auditing group (say the Caltech/MIT Voting Technology Project and/or university and college teams in Iowa). They could securely provide encrypted images of the tabulation reports from the caucus sites, and the independent auditing team would then produce auditing reports for each round of tabulation. These reports, like those that we currently produce as part of our project, would of course be provided to the appropriate officials and then posted to a public website. As rounds of tabulation proceed, this process could continue, until the final tabulation is complete, at which time the independent auditing group could provide their evaluation of the final reported tabulation.
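As a very rough illustration of what one of those audit reports could check, here is a hypothetical Python sketch that re-aggregates precinct-level tabulation sheets and compares them against the totals reported by the state party, flagging discrepancies. The file and column names are invented, and this is not the process our project actually uses; it only shows the basic reconciliation step.

```python
import pandas as pd

# Hypothetical inputs:
#   precinct_sheets.csv : one row per precinct per candidate, transcribed from the paper tabulation sheets
#                         (columns: precinct, candidate, final_alignment_votes)
#   reported_totals.csv : statewide totals reported by the party (columns: candidate, reported_votes)
sheets = pd.read_csv("precinct_sheets.csv")
reported = pd.read_csv("reported_totals.csv")

# Re-aggregate the precinct sheets and compare against the reported totals.
recomputed = (
    sheets.groupby("candidate")["final_alignment_votes"].sum().rename("recomputed_votes")
)
audit = reported.set_index("candidate").join(recomputed)
audit["discrepancy"] = audit["reported_votes"] - audit["recomputed_votes"]

# Any nonzero discrepancy is flagged for follow-up in the public audit report.
flagged = audit[audit["discrepancy"] != 0]
print(flagged if not flagged.empty else "No discrepancies between precinct sheets and reported totals.")
```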

This could have been done earlier this week, and had such a system been in place, it might have helped provide an independent perspective on the problems with the initial tabulations on Monday night, and quite likely could have alleviated a lot of the rumors and misinformation about why the tabulation was proceeding so slowly and why the results were riddled with errors and inconsistencies. By announcing, in advance of the caucuses, a plan for independent auditing of the tabulation results by a trustworthy third-party, the Iowa Democratic Party could have relied on the auditing process to help them figure out the issues in the tabulation, and perhaps helped to buttress confidence in the accuracy of the reported results.

At this point, with the data now being released, it's unfortunate that an independent auditing process wasn't established before the crisis hit.

In my opinion, one of the most important lessons from this experience this week is that election processes need to be fully and independently audited. Whether those audits are conducted by academic researchers, or by other third-parties, they need to be a regular component of the administration of any public election process (caucuses, primaries, special elections, and general elections). I think that election officials throughout the United States can learn a lesson about the importance of independent election performance auditing from the chaos of the Iowa Democratic caucuses.

The Iowa Caucus: A Frustrating Start to Election 2020

Like most observers of elections, I got my bowl of popcorn and turned on the TV last night, expecting to learn more about who "won" the Democratic caucuses in Iowa. I enjoyed the popcorn, but got a bit bored watching the pundits speculate endlessly about why they didn't have immediate results from the caucuses.

While, like everyone else, I'd like to learn more about who "won" the Democratic caucuses in Iowa, I'd also like to make sure that when the officials there announce the results, they provide the most accurate count they can, along with a detailed explanation of why the reporting has been so delayed.

As my colleagues on the Caltech/MIT Voting Technology Project and I have said for nearly two decades now, democracy can be a messy business. Elections (including primaries and caucuses) are complex to administer, they inevitably involve new and old technology, and with hundreds of thousands of people participating they take time to get right. I suggest that we all take a deep breath, let the Iowa Democratic Party figure it all out, and be patient. It’s much better for American democracy, and for the confidence of voters and stakeholders, if we get accurate results and an explanation for the delay, rather than hurried and incorrect results.

And for the rest of this election cycle, I suggest continued patience. As we move further into the primary season, and then into the fall general election, issues like what we are now witnessing in Iowa will continue to arise. It’s likely that on Super Tuesday we might not know who “won” California immediately after the polls close that evening, for example. But we should let election officials have the time and space to get the results right, and to be transparent and open with the public about why delays or issues arise in the administration and tabulation of elections.

The 2018 Voting Experience

My fellow VTP Co-director, Charles Stewart III, and some of his research team released an important study last week: "The 2018 Voting Experience: Polling Place Lines." Charles and his team continue to find that long lines can be an issue in many states and election jurisdictions. They estimated that in 2018, 5.7% of those who tried to vote on Election Day waited more than 30 minutes to vote, and that waits were significantly longer than what they had found in the previous federal midterm election, in 2014. Importantly, they also show that wait times are not distributed uniformly across the electorate: nonwhite voters and voters in densely populated areas wait longer to vote than white voters and voters in less densely populated areas. Finally, because wait times are strongly correlated with a voter's overall experience at the polls, long wait times are an issue that needs continued attention in the United States. This is especially true as we head into what may be a very closely contested array of state and federal primary and general elections in 2020, when many states and jurisdictions may see much higher turnout than in 2016 and 2018.

Seo-young Silvia Kim’s research on the costs of moving on turnout

Seo-young Silvia Kim, one of our PhD students at Caltech working on our Monitoring the Election project, has recently posted a really interesting working paper online, "Getting Settled in Your New Home: The Costs of Moving on Voter Turnout." Silvia has presented this paper at a couple of conferences and in research seminars at a number of universities. Here's the paper's abstract:

What is the dynamic impact of moving on turnout? Moving depresses turnout by imposing various costs on voters. However, movers eventually settle down, and such detrimental effects can disappear over time. I measure these dynamics using United States Postal Services (USPS) data and detailed voter panel data from Orange County, California. Using a generalized additive model, I show that previously registered voters who move close to the election are significantly less likely to vote (at most -16.2 percentage points), and it takes at least six months on average for turnout to recover. This dip and recovery is not observed for within-precinct moves, suggesting that costs of moving matter only when the voter’s environment has sufficiently changed. Given this, can we accelerate the recovery of movers’ turnout? I evaluate an election administration policy that resolves their re-registration burden. This policy proactively tracks movers, updates their registration records for them, and notifies them by mailings. Using a natural experiment, I find that it is extremely effective in boosting turnout (+5.9 percentage points). This success of a simple, pre-existing, and non-partisan safety net is promising, and I conclude by discussing policy implications.

This is important and innovative work, and I highly recommend her paper for readers interested in voter registration and voter turnout. She uses two different research designs, one observational and one causal, to show the reduction in the likelihood of turnout for registered voters who move.
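To give a flavor of the kind of smooth turnout-recovery curve the paper estimates, here is a minimal Python sketch that uses a spline-based logistic regression in statsmodels as a rough stand-in for the paper's generalized additive model. The file, column names, and data are hypothetical, and this is not Silvia's actual code.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical voter-level data frame:
#   voted           : 1 if the registrant voted in the focal election, else 0
#   days_since_move : days between the registration-address change and Election Day
df = pd.read_csv("movers_panel.csv")

# Logistic regression with a B-spline in days-since-move, a simple stand-in
# for the generalized additive model used in the paper.
model = smf.glm(
    "voted ~ bs(days_since_move, df=5)",
    data=df,
    family=sm.families.Binomial(),
).fit()

# Trace out the smoothed turnout curve over the first year after a move.
grid = pd.DataFrame({"days_since_move": np.arange(1, 366)})
turnout_curve = model.predict(grid)
print(turnout_curve.head())
```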

Election forensics and machine learning

We recently published a new paper on election forensics in PLOS ONE, "Election forensics: Using machine learning and synthetic data for possible election anomaly detection." It's a paper that I wrote with Mali Zhang (a recent PhD student at Caltech) and Ines Levin at UCI. PLOS ONE is an open access journal, so there is no paywall!

Here’s the paper’s abstract:

Assuring election integrity is essential for the legitimacy of elected representative democratic government. Until recently, other than in-person election observation, there have been few quantitative methods for determining the integrity of a democratic election. Here we present a machine learning methodology for identifying polling places at risk of election fraud and estimating the extent of potential electoral manipulation, using synthetic training data. We apply this methodology to mesa-level data from Argentina’s 2015 national elections.

This new PLOS ONE paper advances the paper that Ines and I coauthored with Julia Pomares, "Using machine learning algorithms to detect election fraud," which appeared in the volume that I edited, Computational Social Science: Discovery and Prediction. This is an area where my research group and some of my collaborators continue to work on methodologies to quickly obtain election data and analyze it for anomalies and outliers, similar to our Monitoring the Election project. More on all of this soon!
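For readers curious about the basic shape of the synthetic-training approach, here is a hypothetical Python sketch: it simulates "clean" polling-place data, injects simulated manipulation into a second set, trains a classifier on the synthetic labels, and then scores (made-up) observed polling places by their predicted risk. The features, magnitudes, and model are purely illustrative; the paper's actual methodology is considerably more involved.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

def simulate_polling_places(n, manipulated=False):
    """Simulate turnout and incumbent vote share for n polling places."""
    turnout = rng.beta(8, 4, n)   # illustrative turnout distribution
    share = rng.beta(5, 5, n)     # illustrative incumbent vote share
    if manipulated:
        # Simulated ballot-box stuffing: inflate both turnout and incumbent share.
        turnout = np.clip(turnout + rng.uniform(0.1, 0.3, n), 0, 1)
        share = np.clip(share + rng.uniform(0.1, 0.3, n), 0, 1)
    return np.column_stack([turnout, share])

# Synthetic training data with known labels (0 = clean, 1 = manipulated).
X_clean, X_manip = simulate_polling_places(5000), simulate_polling_places(5000, manipulated=True)
X_train = np.vstack([X_clean, X_manip])
y_train = np.concatenate([np.zeros(5000), np.ones(5000)])

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Score observed polling places (made-up here) by predicted probability of manipulation.
X_observed = simulate_polling_places(10)
risk = clf.predict_proba(X_observed)[:, 1]
print(np.round(risk, 2))
```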

Auditing Voter Registration Databases

As many readers know, we’ve been working on a variety of election performance auditing projects, focusing on voter registration database auditing and other types of statistical forensics (you can see a number of examples on our Monitoring The Election project page). We also have been working on post-election ballot auditing, including our recent Election Audit Summit.

Recently the first paper from our new election integrity project in Orange County (CA) was published in the peer-reviewed journal American Politics Research. I co-authored this paper, "Evaluating the Quality of Changes in Voter Registration Databases," with Seo-young Silvia Kim (a Ph.D. student here at Caltech) and Spencer Schneider (a Caltech undergraduate). Here's the paper's abstract:

The administration of elections depends crucially upon the quality and integrity of voter registration databases. In addition, political scientists are increasingly using these databases in their research. However, these databases are dynamic and may be subject to external manipulation and unintentional errors. In this article, using data from Orange County, California, we develop two methods for evaluating the quality of voter registration data as it changes over time: (a) generating audit data by repeated record linkage across periodic snapshots of a given database and monitoring it for sudden anomalous changes and (b) identifying duplicates via an efficient, automated duplicate detection, and tracking new duplicates and deduplication efforts over time. We show that the generated data can serve not only to evaluate voter file quality and election integrity but also as a novel source of data on election administration practices.
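To make method (a) a bit more concrete, here is a minimal, hypothetical Python sketch of the core bookkeeping: linking two consecutive snapshots on a voter ID and counting additions, removals, and field changes between them. The file and column names are invented, and the paper's implementation handles many details (record linkage, deduplication, missing values, and so on) that are omitted here.

```python
import pandas as pd

def snapshot_changes(prev: pd.DataFrame, curr: pd.DataFrame, key: str = "voter_id") -> dict:
    """Compare two voter-file snapshots and count additions, removals, and modifications."""
    prev, curr = prev.set_index(key), curr.set_index(key)
    added = curr.index.difference(prev.index)
    removed = prev.index.difference(curr.index)

    # For records present in both snapshots, count those where any tracked field changed.
    # (Illustrative field list; missing-value handling is ignored in this sketch.)
    common = prev.index.intersection(curr.index)
    fields = ["address", "party", "status"]
    changed = (prev.loc[common, fields] != curr.loc[common, fields]).any(axis=1).sum()

    return {"added": len(added), "removed": len(removed), "modified": int(changed)}

# Usage: consecutive daily snapshots as CSVs; the resulting counts feed the anomaly monitoring.
prev = pd.read_csv("snapshot_2018-04-01.csv")
curr = pd.read_csv("snapshot_2018-04-02.csv")
print(snapshot_changes(prev, curr))
```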

An ungated pre-print version of this paper is available from the Caltech/MIT Voting Technology Project’s website, as Working Paper 134.

We are continuing this work with Orange County, and have in recent months been working to explore how these same voter registration database auditing methodologies can work in larger jurisdictions (Los Angeles County) and in states (Oregon). More on those results soon.

The process that led to the development of this project, and to the publication of this paper, is also interesting to recount. In this paper, we make use of daily voter registration "snapshots" that we have obtained from the Orange County Registrar of Voters, starting back in April 2018. This required that we collaborate closely with Neal Kelley, the Orange County Registrar of Voters, and his staff. We are very happy to participate in this collaborative effort, and we thank Neal and his team for their willingness to work with us. It's been a very productive partnership, and we are very excited to continue our collaboration with them going into the 2020 election cycle. This is the sort of academic-election official partnership that we have worked to build and foster at the Caltech/MIT Voting Technology Project since our project's founding in the immediate aftermath of the 2000 presidential election.

It’s also fun to note that both of my coauthors are Caltech students. Silvia is in her final year in our Social Science Ph.D. program, and she is working on related work for her dissertation (I’ll write later about some of that work, which you can see on Silvia’s website). Spencer worked closely with us on this project in 2018, as he participated in Caltech’s Summer Undergraduate Research Fellowship program. His project was to work with us to help build the methodology for voter registration database auditing. Currently, Spencer is working in computer science and engineering here at Caltech. This paper is a great example of how we like to involve graduate and undergraduate students in our voting technology and election administration research.

The plot thickens: Which Florida counties were targeted by hackers?

Earlier this week I wrote about the recent news that hackers may have gained access to election administration systems in at least one Florida county in 2016: see How to avoid an election meltdown in 2020: Improve voter registration database security and monitoring.

Now in the news are reports that there may have been two Florida counties where hackers gained access to county election administration systems in 2016 (see the NYT story, for example, "Russians Hacked Voter Systems in 2 Florida Counties. But Which Ones?"). This has set off a guessing game: which Florida county election administration systems might have been breached in 2016, and what were the consequences?

I'd like to return attention, though, to what I think is the most important issue here. It's not whether one or two county systems were breached in 2016; the most important thing is to make sure that, as we go into the 2020 election cycle, security and auditing systems are in place to detect any malicious or accidental manipulation of voter registration databases. It's now May 2019, and we have plenty of time to evaluate the current security protocols for these critical databases in every state, to improve those protocols where necessary, and to put in place database auditing and monitoring tools like those we have been working on in our Monitoring the Election project.

Now’s the time to act — while we still can improve the security of voter registration systems, and establish auditing procedures to detect any efforts to manipulate the critical information in those systems.

How to avoid an election meltdown in 2020: Improve voter registration database security and monitoring

One of the most shocking parts of the Mueller report details the widespread efforts by Russian hackers to attack American election infrastructure in 2016.

Specifically, the report presents evidence that Russian military intelligence (the GRU) targeted state and local election administration systems, and that the GRU infiltrated the computer network of the Illinois State Board of Elections and of at least one Florida county during the 2016 presidential election, using means such as SQL injection and spear phishing. The GRU also targeted private firms that provide election administration technologies, like software systems for voter registration.

This is stunning news, and a wake-up call for improving the integrity and security of election administration and technology in the United States.

The Mueller report does not provide evidence that these hacking attempts altered the reported results of elections in 2016 or 2018. Instead the report highlights hacking efforts aimed at gaining access to voter registration databases, which might seem surprising to many.

Prior to the 2000 presidential election, voter registration data was maintained in a hodgepodge of ways by county and state election officials. After the passage of the Help America Vote Act in 2002, states were required to centralize voter registration data in statewide electronic databases, to improve the accuracy and accessibility of voter registration data in every state.

But one consequence of building statewide voter registration databases is that they became attractive targets for hackers. Rather than targeting hundreds or thousands of election administration systems at the county level, hackers can now target a single statewide database in each state.

Why would hackers want to target voter registration systems?

First, a hacker could alter registration records in a state or county, or delete records, with the goal being to wreak havoc on Election Day. By dropping voters, or by changing voter addresses, names, or partisan affiliations, a hacker could create chaos on Election Day—for instance, voters could go to the right polling place, only to find that their name is not on the roster, and thus be denied the chance to vote.

A hack of this type, if done in a number of counties in a battleground state like Florida, could lead to an election meltdown like we saw in the 2000 presidential election.

Second, a hacker could be more systematic in their efforts. They could add fake voters to the database and, if they had access to the electronic systems used to send absentee ballots, obtain ballots for these fake voters.

This type of hack could enable a large-scale effort to actually change the outcome of an election, if the hackers marked and returned the ballots for these fake voters.

These vulnerabilities are real, and an unintended consequence of the development of centralized electronic statewide voter registration databases in the United States. There is little doubt that the attempts by hackers to target voter registration systems in 2016 and 2018 could have produced widespread disruption of either election, had they been successful.

There is also little doubt that efforts to hack voter registration databases in the United States will continue. The GRU will have better knowledge of what vulnerabilities exist in our election systems and how to target them. What can we do to secure these databases, to prevent these attacks, and to make sure that we can detect any intrusions that do occur?

Obviously, state and county election officials must continue their efforts to solidify the security of voter registration databases. They must also continue to ensure that strong electronic security practices are in place, so that hackers cannot gain access to passwords and other administrative systems they might exploit to reach registration data.

There are further steps that can be taken by election officials to secure registration data.

In a pilot project that we at Caltech have conducted with the Orange County (California) Registrar of Voters, we built a set of software applications that monitor the County’s database of registered voters for anomalies. This pilot project was financially supported by a research grant to Caltech from the John Randolph Haynes and Dora Haynes Foundation. Details are available on the project’s website.

Working with the Registrar, we began getting daily snapshots of the County's database of about 1.5 million registered voters about a year ago. We run algorithms that look for anomalous changes in the database: they can detect situations where unexpectedly large numbers of records are removed or added, and where unexpectedly large numbers of records are being changed. Thus, our algorithms can detect attempts to manipulate voter registration data.
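As a rough illustration of the kind of check involved, here is a hypothetical Python sketch that flags a day's change counts as anomalous when they fall far outside a trailing historical baseline. The thresholding rule and column names are invented, and our production monitoring is more sophisticated than this.

```python
import pandas as pd

def flag_anomalous_days(daily_counts: pd.DataFrame, window: int = 30, n_sigmas: float = 4.0) -> pd.DataFrame:
    """Flag days where additions, removals, or modifications far exceed the trailing baseline.

    daily_counts: indexed by date, with columns 'added', 'removed', 'modified'
    (e.g., counts of records added, removed, and changed between consecutive daily snapshots).
    """
    baseline_mean = daily_counts.rolling(window, min_periods=window).mean().shift(1)
    baseline_std = daily_counts.rolling(window, min_periods=window).std().shift(1)
    flags = daily_counts > baseline_mean + n_sigmas * baseline_std
    return daily_counts[flags.any(axis=1)]

# Usage: daily change counts accumulated from the snapshot comparisons.
counts = pd.read_csv("daily_change_counts.csv", index_col="date", parse_dates=True)
print(flag_anomalous_days(counts))
```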

After running our algorithms, we produce detailed reports that we send to the Registrar, letting them know if we see anomalies that require further investigation. We have developed other data-driven tools to monitor the 2018 elections in Orange County, looking at voting-by-mail patterns, turnout, and social media mentions. The results of this comprehensive monitoring appear on our pilot project’s website, providing transparency that we believe helps voters and stakeholders remain confident that the County’s voter registration data is secure.

This type of database and election system monitoring is critical for detecting and mitigating attempts to hack an election. It also helps isolate other issues that might occur in the administration of an election. By finding problems quickly, election officials can resolve them. By making the results of our monitoring available to the public, voters and stakeholders can be assured of the integrity of the election.

We are now working to build similar collaborations with other state and county election officials, to provide independent third-party monitoring of registration databases, and other related election administration infrastructure. Not only is it critical for election officials to monitor their data systems to make sure they have a high degree of integrity, it is also important that the public know that registration data is being monitored and is secure.