Author Archives: Michael Alvarez

Research on instant-runoff and ranked-choice elections

Given the interest in Maine’s ranked-choice election tomorrow, I thought that this recent paper with Ines Levin and Thad Hall might be of interest. The paper was recently published in American Politics Research as “Low-information voting: Evidence from instant-runoff elections.” Here’s the paper’s abstract:

How do voters make decisions in low-information contests? Although some research has looked at low-information voter decision making, scant research has focused on data from actual ballots cast in low-information elections. We focus on three 2008 Pierce County (Washington) Instant-Runoff Voting (IRV) elections. Using individual-level ballot image data, we evaluate the structure of individual rankings for specific contests to determine whether partisan cues underlying partisan rankings are correlated with choices made in nonpartisan races. This is the first time that individual-level data from real elections have been used to evaluate the role of partisan cues in nonpartisan races. We find that, in partisan contests, voters make avid use of partisan cues in constructing their preference rankings, rank-ordering candidates based on the correspondence between voters’ own partisan preferences and candidates’ reported partisan affiliation. However, in nonpartisan contests where candidates have no explicit partisan affiliation, voters rely on cues other than partisanship to develop complete candidate rankings.

There’s a good review of the literature on voting behavior in ranked-choice or instant-runoff elections in the paper, for folks interested in learning more about what research has been done so far on this topic.

“Fraud, convenience, and e-voting”

Ines Levin, Yimeng Li, and I recently published our paper “Fraud, convenience, and e-voting: How voting experience shapes opinions about voting technology” in the Journal of Information Technology and Politics. Here’s the paper’s abstract:

In this article, we study previous experiences with voting technologies, support for e-voting, and perceptions of voter fraud, using data from the 2015 Cooperative Congressional Election Study. We find that voters prefer systems they have used in the past, and that priming voters with voting fraud considerations causes them to support lower-tech alternatives to touch-screen voting machines — particularly among voters with previous experience using e-voting technologies to cast their votes. Our results suggest that as policy makers consider the adoption of new voting systems in their states and counties, they would be well-served to pay close attention to how the case for new voting technology is framed.

The substantive results will be of interest to researchers and policymakers. The methodology we use — survey experiments — should also be of interest to those who are trying to determine how to best measure the electorate’s opinions about potential election reforms.

Our Orange County project

It’s been a busy few weeks here in California for election geeks, especially for our research group at Caltech. We’ve launched a pilot test of an election integrity project in collaboration with Orange County, using the recent California primary to test various methodologies for evaluating election administration.

At this point, our goal is to work closely with the Orange County Registrar of Voters to understand what evaluative tools they believe are most helpful to them, and to also determine what sorts of data we can readily obtain during the period immediately before and after a major statewide election.

We recently launched a website that describes the project, and where we are building a dashboard that summarizes the various research products as we produce them.

The website is Monitoring the Election, and if you navigate there you’ll see descriptions of the goals of this project, and some of the preliminary analytics we have produced regarding the June 5, 2018 primary in Orange County. At present, the dashboard has a visualization of the Twitter data we are collecting, an analysis of vote by mail ballot mailing and return, and our observations of early voting in Orange County. In the next day or two we will add some first-pass post-election forensics, a preliminary report on our election day observations, and an observation report regarding the risk-limiting audit that OCRV will conduct early this week.

Again, the project is in pilot phase. We will be evaluating these various analytic tools over the summer, and we will determine which we can produce quickly for the November 2018 general election in Orange County.

Stay tuned!

Deja vu? The National Academies of Science voter registration databases research

Over the past few months, I’ve had this strange sense of deja vu, with all of the news about potential attacks on state voter registration databases, and more recently the questions that have been asked about the security and integrity of state voter registries.

Why? Because many of the questions that are being asked these days about the integrity of US voter registration databases (in particular, by the “Presidential Commission on Election Integrity,” or “Pence commission”) have already been examined in the National Academies of Science (NAS) 2010 study of voter registration databases.

The integrity of state voter registries was exhaustively studied back in 2010, when I was a member of this NAS panel studying how to improve voter registries. In 2010 our panel issued its final report, “Improving State Voter Registration Databases”.

I’d call upon the members of the “Pence commission” to read this report prior to their first meeting next week.

I think that if the commission members read this report, they will find that many of the questions they seem to be asking about the security, reliability, accuracy, and integrity of statewide voter registration databases were studied by the NAS panel back in 2010.

The NAS committee had an all-star roster. It included world-renowned experts on computer security, databases, record linkage and matching, and election administration, as well as a wide range of election administrators. The committee met frequently with additional experts, reviewed a broad body of research, and produced a comprehensive 2010 report on the technical considerations for voter registries (see Chapter 3 of the report, “Technical Considerations for Voter Registration Databases”). The committee also produced a series of short-term and long-term recommendations for improving state registries (Chapters 5 and 6 of the report).

At this point in time, the long-term recommendations from the NAS report bear repeating.

  • Provide funding to support operations, maintenance, and upgrades.
  • Improve data collection and entry.
  • Improve matching procedures.
  • Improve privacy, security, and backup.
  • Improve database interoperability.

As we look towards the 2018 election cycle, my assessment is that scholars and election administrators need to turn their attention to studying matching procedures, improving interoperability, and how to make these datafiles both more secure and more private. States need to provide the necessary funding for this research, and for these improvements. I’d love to see the “Pence commission” engage in a serious discussion of how to improve funding for research and technical improvements of voter registration systems.

So my reaction to the recent requests from the “Pence commission” is that there’s really no need to request detailed state registration and voter information from the states; the basic research on the strengths and weaknesses of state voter registries has been done. Read the 2010 NAS report, and you’ll learn all you need to know about the integrity of state voter registries and the steps that are still needed to improve their security, reliability, and accuracy.

Pre-registration of 16 and 17 year olds in California

California has recently launched a program that allows eligible 16 and 17 year olds in California to pre-register to vote. When they turn 18, their registration becomes active. Here’s more information from the CA Secretary of State’s website.

It’s going to be quite interesting in 2018 and 2020 to evaluate how this initiative works. Will those who pre-register be more likely to turn out to vote than those who do not (obviously, controlling for all of the factors that might lead 18 year olds to register and vote)? Who uses this program, and does it have any consequences for how organizations and campaigns conduct voter registration drives and get-out-the-vote activities? Lots of interesting questions here for future study!

New research on election forensics

Election forensics is a hot topic for research these days, and recently Arturas Rozenas from NYU published an interesting new paper at Political Analysis (the journal I co-edit). His paper, “Detecting Election Fraud from Irregularities in Vote-Shares” should be of interest to folks studying election integrity and election fraud. Here’s the abstract:

I develop a novel method to detect election fraud from irregular patterns in the distribution of vote-shares. I build on a widely discussed observation that in some elections where fraud allegations abound, suspiciously many polling stations return coarse vote-shares (e.g., 0.50, 0.60, 0.75) for the ruling party, which seems highly implausible in large electorates. Using analytical results and simulations, I show that sheer frequency of such coarse vote-shares is entirely plausible due to simple numeric laws and does not by itself constitute evidence of fraud. To avoid false positive errors in fraud detection, I propose a resampled kernel density method (RKD) to measure whether the coarse vote-shares occur too frequently to raise a statistically qualified suspicion of fraud. I illustrate the method on election data from Russia and Canada as well as simulated data. A software package is provided for an easy implementation of the method.

And since Political Analysis requires that authors provide code and data to replicate the work reported in their paper, here’s the replication materials from Arturas.

Tweeting the Election

We are pleased to launch this new website, which provides live, real-time summary analytics of the conversation currently occurring on Twitter about election administration and voting technology, as Americans go to vote in the 2016 Presidential election. This website was developed at Caltech by Emily Mazo, Shihan Su, and Kate Lewis as their project for CS 101, in collaboration with myself and the Caltech/MIT Voting Technology Project.

This website offers two views of some of the discussion about the election that is now occurring on Twitter. These visualizations compare the discussion occurring amongst people with different political ideologies: we have separated the tweets by the predicted ideology of the author.

The first view is geographic, showing the sentiment of incoming tweets by state. In the geographic view, which shows the average sentiment in every state over the past six hours and is updated every few minutes, dots on the map display the most recent tweets, and you can hover over those dots to view the content of the tweet.

At the top of the website, clicking over to the timeline view, you can see the sentiment of the recent incoming tweets by hour, for the past twelve hours.

In each view, we offer data on four different election administration and voting technology issues: tweets about absentee and early voting, polling places, election day voting, and voter identification. We collect these tweets by filtering results from the Public Streaming Twitter API, by keyword, for tweets falling into these categories.

Furthermore, we classify each incoming tweet in two ways. First, we do a sentiment analysis on the incoming tweet, to classify it as positive (given as green in both views) or negative (given as red in both views). We also classify the Twitter users as being more likely to be liberal or conservative in their political orientation, so that viewers can get a sense for how discussion on Twitter of these election administration and voting technology issues breaks by ideology.

We continue to work to improve our analytical tools and the presentation of these data on this website. We welcome comments; please send them to

Finally, this website is currently best viewed in fullscreen mode on a larger laptop or desktop screen. Mobile optimization, and viewing on smaller screens, is left for future development.

Some Background

This is part of a larger project at Caltech, begun in the fall of 2014, in which we have been collecting tweets like these to study election administration. The background of the project, along with some initial validation analyses of these data, is contained in the working paper, Adams-Cohen et al. (2016), “Election Monitoring Using Twitter” (forthcoming).

1. How we collect these tweets
The Tweet data used in this visualization is collected from Twitter’s Public Streaming API. The stream is filtered on keywords for the four different topics of interest:

Election Day Voting: provisional ballot, voting machine, ballot.
Polling Places: polling place line, precinct line, pollworker, poll worker.
Absentee Voting: absentee ballot, mail ballot, vote by mail, voting by mail, early voting.
Voter ID: voter identification, voting identification, voter ID.
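As a sketch, the topic filter can be expressed as simple keyword matching over lowercased tweet text. The function and dictionary names below are illustrative, not the project’s actual code:

```python
# Illustrative sketch of the keyword-based topic filter; the actual
# collection applies these keywords via Twitter's Streaming API.
TOPIC_KEYWORDS = {
    "election_day": ["provisional ballot", "voting machine", "ballot"],
    "polling_places": ["polling place line", "precinct line", "pollworker", "poll worker"],
    "absentee": ["absentee ballot", "mail ballot", "vote by mail", "voting by mail", "early voting"],
    "voter_id": ["voter identification", "voting identification", "voter id"],
}

def topics_for(text):
    """Return the set of topics whose keywords appear in the tweet text."""
    lowered = text.lower()
    return {topic for topic, keywords in TOPIC_KEYWORDS.items()
            if any(kw in lowered for kw in keywords)}
```

Note that a single tweet can match more than one topic (for example, “absentee ballot” also contains “ballot”), so downstream counts of the four categories can overlap.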

2. Sentiment analysis
The Tweets collected from this stream were then fed into two classification models. The first model classified them into positive or negative classes based on their text. This model was created with crowdsourced data: about 5,000 Tweets from a previous collection, gathered in the same manner as those in the visualization, were labeled with sentiment (valence) on a positive-to-negative scale by at least three crowd workers each, and the labels were averaged to create a standard label for each Tweet. This training set of Tweets and labels was then used to create a term frequency-inverse document frequency (TF-IDF) vector for the words in each Tweet in the set. These vectors were used to train a decision tree regression model to predict sentiment for future Tweets (a higher positive predicted value indicating more positive sentiment, and a more negative value indicating more negative sentiment).
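A minimal sketch of this training pipeline in scikit-learn might look like the following. The toy training texts, labels, and model settings here are illustrative assumptions, not the project’s actual data or code:

```python
# Sketch: TF-IDF vectors over tweet text feed a decision-tree regressor
# trained on crowd-averaged valence labels (toy stand-ins for the
# ~5,000 crowd-labeled tweets described above).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.tree import DecisionTreeRegressor

train_texts = [
    "short line friendly poll workers great experience",
    "voting machine broken waited two hours terrible",
    "easy to vote by mail this year",
    "ballot never arrived no help from county",
]
train_valence = [0.9, -0.8, 0.7, -0.6]  # averaged crowd labels

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(train_texts)
model = DecisionTreeRegressor(random_state=0).fit(X, train_valence)

def predict_sentiment(text):
    """Positive values indicate positive sentiment, negative values negative."""
    return model.predict(vectorizer.transform([text]))[0]
```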

The Tweets that appear on tweetingtheelection were streamed from the Twitter API, stripped of stop words, hashtags, URLs, and any other media, then processed into TF-IDF vectors using the same vocabulary as the original model. These vectors were then passed through the decision tree regression model, which predicted sentiment labels for them.

3. Ideology classification
This model classifies tweets as conservative or liberal based on their text.
Training data
Training data for this model were obtained through a two-step process.
First, we obtained ideal point estimates for Twitter users from Pablo Barbera’s work, “Birds of the Same Feather Tweet Together: Bayesian Ideal Point Estimation Using Twitter Data.” In that work, Barbera develops a latent space model to estimate users’ ideal points by studying the follower relationships between users.

Second, we matched the user ids obtained from Barbera with the tweets collected by R. Michael Alvarez’s lab from 2016-04-19 to 2016-06-23. In other words, we labeled each tweet using the label of the user who created it. We matched around 55,000 tweets and used these as our training data.
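The matching step might look like the following sketch, in which each tweet inherits the ideology label of its author. The field names and the zero threshold on the ideal point are assumptions for illustration:

```python
# Hypothetical sketch of the labeling step: tweets inherit the ideology
# label of their author, taken from ideal-point estimates keyed by user id.
def label_tweets(tweets, ideal_points):
    """tweets: list of (user_id, text); ideal_points: user_id -> float.
    Positive ideal points are labeled conservative, non-positive liberal;
    tweets from users without an estimate are dropped."""
    labeled = []
    for user_id, text in tweets:
        theta = ideal_points.get(user_id)
        if theta is None:
            continue
        labeled.append((text, "conservative" if theta > 0 else "liberal"))
    return labeled
```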

4. Classifier
The classifier we use is a convolutional neural network.
The model was originally developed by Yoon Kim; we adopt Denny Britz’s implementation of this model in TensorFlow.

The model we adopted has four layers. The first layer embeds words into 128-dimensional vectors, which are learned within the neural network. The next layer applies convolutions with three filter sizes (3, 4, and 5), i.e., sliding over 3, 4, or 5 words at a time, with 128 filters for each filter size. Next, the model max-pools the output of each convolutional filter over time, yielding a 128-dimensional feature vector per filter size, and applies dropout regularization. A final softmax layer produces the classification.
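The forward pass described above can be sketched at the shape level in NumPy with random, untrained weights. Everything here, including the omission of dropout, is an illustrative simplification of the Kim-style architecture rather than the actual trained model:

```python
# Shape-level sketch of the four-layer architecture: embedding lookup,
# convolutions at three window widths, max-pooling over time, softmax.
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim, num_filters = 1000, 128, 128
filter_sizes = (3, 4, 5)

embedding = rng.standard_normal((vocab_size, embed_dim))
filters = {k: rng.standard_normal((num_filters, k * embed_dim)) for k in filter_sizes}
softmax_w = rng.standard_normal((len(filter_sizes) * num_filters, 2))

def forward(token_ids):
    """Map a sequence of token ids to class probabilities (liberal, conservative)."""
    x = embedding[token_ids]                          # (seq_len, 128) word vectors
    pooled = []
    for k, w in filters.items():
        # Slide a width-k window over the sequence, apply 128 filters, ReLU.
        windows = np.stack([x[i:i + k].ravel() for i in range(len(token_ids) - k + 1)])
        feats = np.maximum(windows @ w.T, 0.0)        # (num_windows, 128)
        pooled.append(feats.max(axis=0))              # max-pool over time -> (128,)
    h = np.concatenate(pooled)                        # (384,) feature vector
    logits = h @ softmax_w                            # dropout omitted in this sketch
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()
```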

5. Geocoding
The incoming tweets are geocoded. We use the coordinates of where the tweet was sent from, if they are provided (if the user opted into geocoding). If that information is unavailable, we use the location of the user from their profile (if provided).
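The fallback logic might be sketched as follows. Field names follow Twitter’s classic JSON payload, and the profile-location geocoder is passed in as an assumed helper:

```python
# Sketch of the geocoding fallback: precise coordinates if the user
# opted in, otherwise geocode the free-text profile location.
def locate_tweet(tweet, geocode_profile_location):
    """Return (lat, lon) for a tweet, or None if no location is recoverable."""
    coords = tweet.get("coordinates")
    if coords:                            # user opted in to precise geotagging
        lon, lat = coords["coordinates"]  # GeoJSON order is (lon, lat)
        return (lat, lon)
    profile_loc = (tweet.get("user") or {}).get("location")
    if profile_loc:
        return geocode_profile_location(profile_loc)
    return None
```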


There are many people to thank. First off, the website was created by Emily Mazo, Shihan Su, and Kate Lewis as their project for CS 101 at Caltech. That class is taught by Yisong Yue and Omer Tamuz, professors at Caltech.

Earlier work done on this project, in particular the ongoing collection of Twitter data like these since 2014 at Caltech, has been done in collaboration with Nailen Matschke (who developed the original python-based Twitter collection tool, as well as the MySQL database where all of the data collected so far is stored); with Clare Hao and Cherie Jia, who worked in the summer of 2016 as SURF students, developing some preliminary python tools to analyze the Twitter data; and Nick Adams-Cohen, a PhD student in the social sciences at Caltech, who is studying the use of Twitter data for public opinion and political behavior research.

We’d like to thank Pablo Barbera for sharing his ideological placement data with us.

Other colleagues and students who have helped with this research project include Thad Hall, Jonathan Nagler, Lucas Nunez, Betsy Sinclair, and Charles Stewart III.

How secure are state voter registration databases?

Over the past few months, there’ve been a number of reports that state voter registration databases have come under cyberattack. Most recently, FBI Director James Comey discussed the attacks in a hearing of the U.S. House Judiciary Committee. While details have been few, between what’s been reported by the FBI recently, and the various attacks on the email accounts of political parties and political leaders in the U.S., it seems clear that the U.S. election infrastructure is being probed for vulnerabilities.

So exactly how secure are state voter registration databases? I’ve been asked about this a number of times recently, by the media, colleagues, and students.

The potential threats to state voter registration databases have been known for a long time. In fact, I wrote a paper on this topic in October 2005 — that’s not a typo, October 2005. The paper, “Potential Threats to Statewide Voter Registration Systems”, is available as a Caltech/MIT Voting Technology Project Working Paper. It’s also part of a collection of working papers in a NIST report, “Developing an Analysis of Threats to Voting Systems: Preliminary Workshop Summary.”

The context for my 2005 paper was that states were then rushing to implement their new computerized statewide voter registries, as required after the passage of the Help America Vote Act. At the time, a number of researchers (myself included) were concerned that, in the rush to develop and implement these databases, important questions about their security and integrity were not being addressed. So the paper was meant to provide some guidance about the potential security and integrity problems, in the hopes that they would be better studied and addressed in the near future.

The four primary types of threats that I wrote about concerned:

  • Authenticity of the registration file: attacks on the transmission path of voter registration data from local election officials to the state database, or attacks on the transmission path of data between the state registry to other state officials (for example, departments of motor vehicles).
  • Security of personal information in the file: state voter files contain a good deal of personal information, including names, birthdates, addresses, and contact information, which could be quite valuable to attackers.
  • Integrity of the file: the primary data files could be corrupted, either by mistakes which enter the data and are difficult to remove, or by systematic attack.
  • System failure: the files could fail at important moments, either due to problems with their architecture or technology, or if they come under systematic “denial of service” attacks.

By 2010, when I was a member of a National Academies panel, “Improving State Voter Registration Databases”, many of these same concerns were raised by panelists, and by the many election officials and observers of elections who provided input to the panel. It wasn’t clear how much progress had been made by 2010, towards making sure that the state voter registration systems then in place were secure.

Fast-forward to 2016, and despite the concerns raised in 2005 and 2010, very little research has focused on the security and integrity of state voter registration databases, certainly nowhere near the amount of research devoted to other components of the American election infrastructure, in particular the security of remote and in-person voting systems. I’d be happy to hear of research that folks have done; I’m aware of only a very few research projects that have looked at state voter registration systems. For example, there’s a paper that I worked on in 2009 with Jeff Jonas, Bill Winkler, and Rebecca Wright, where we matched the voter registration files from Oregon and Washington, in an effort to determine the utility of interstate matching for finding duplicates between the states. Another paper that I know of, by Steve Ansolabehere and Eitan Hersh, looks at the quality of the information in state voter registries. But there’s not been a lot of systematic study of the security of state voter registries; I recommend that researchers (like our Voting Technology Project) direct resources towards studying voter registration systems now, and in the immediate aftermath of the 2016 election.

In addition to calling for more research on the security of state voter registration databases, election officials can take some immediate steps. The obvious step is to take action to make sure that these databases are now as secure as possible, and to determine whether there is any forensic evidence that the files might have been attacked or tampered with recently. A second step is to make sure that the database system will be robust in the face of a systematic denial of service attack. Third, election officials can devise a systematic approach towards providing pre- and post-election audits of their databases, something that I’ve strongly recommended in other work on election administration auditing (with Lonna Atkeson and Thad Hall). If election officials do audit their voter registration databases and processes, those reports should be made available to the public.

Felony Disenfranchisement

I frequently am asked by students, colleagues, and the media, about how many people in the U.S. cannot participate in elections because of felony disenfranchisement laws. Given the patchwork quilt of felony disenfranchisement laws across the states, and a lack of readily available data, it’s often hard to estimate what the rate of felony disenfranchisement might be.

The Sentencing Project has released a report that provides information and data about felony disenfranchisement and the 2016 federal elections in the U.S. Here are their key findings, quoted from their report:

“Our key findings include the following:

– As of 2016, an estimated 6.1 million people are disenfranchised due to a felony conviction, a figure that has escalated dramatically in recent decades as the population under criminal justice supervision has increased. There were an estimated 1.17 million people disenfranchised in 1976, 3.34 million in 1996, and 5.85 million in 2010.

– Approximately 2.5 percent of the total U.S. voting age population – 1 of every 40 adults – is disenfranchised due to a current or previous felony conviction.

– Individuals who have completed their sentences in the twelve states that disenfranchise people post-sentence make up over 50 percent of the entire disenfranchised population, totaling almost 3.1 million people.

– Rates of disenfranchisement vary dramatically by state due to broad variations in voting prohibitions. In six states – Alabama, Florida, Kentucky, Mississippi, Tennessee, and Virginia – more than 7 percent of the adult population is disenfranchised.

– The state of Florida alone accounts for more than a quarter (27 percent) of the disenfranchised population nationally, and its nearly 1.5 million individuals disenfranchised post-sentence account for nearly half (48 percent) of the national total.

– One in 13 African Americans of voting age is disenfranchised, a rate more than four times greater than that of non-African Americans. Over 7.4 percent of the adult African American population is disenfranchised compared to 1.8 percent of the non-African American population.

– African American disenfranchisement rates also vary significantly by state. In four states – Florida (21 percent), Kentucky (26 percent), Tennessee (21 percent), and Virginia (22 percent) – more than one in five African Americans is disenfranchised.”

This looks like a useful resource for those interested in understanding the possible electoral implications of felony disenfranchisement laws across the U.S.