Some thoughts about the reports of supposed evidence of election irregularities in MI, PA, and WI

The Internet lit up on Monday over the news, reported in New York Magazine, that a team of computer scientists and lawyers had reported to the Clinton campaign that “they’ve found persuasive evidence that results in Wisconsin, Michigan, and Pennsylvania may have been manipulated or hacked. ”

A later posting in Medium by University of Michigan computer science professor J. Alex Halderman, who was quoted in the NY Mag piece,  stated that the reporter had gotten the point of the analysis wrong, along with some of the numbers.  As he notes, the important point is that all elections should be audited, and not only if you have statistics suggesting that something might be fishy.

Unfortunately, the cat is out of the bag.  Because of the viral spread of the NY Mag article, the conspiratorially minded now have something to hang their hats on, if they want to believe that the 2016 election (like the 2004 election) was stolen by hacked electronic voting machines.

Many of my friends who are not conspiratorially minded have been asking me if I believe the statistical analysis suggested by the NY Mag piece is evidence that something is amiss.  They’re not satisfied with me echoing Alex Halderman’s point that this is beside the point.  So, here are some thoughts about the statistical analysis.

  1. Some very good commentary about the statistical analysis has already appeared in fivethirtyeight.com and vox.com.  Please read it.  (And, do read Halderman’s Medium post, referenced above.)
  2. I should start my own commentary by saying that I have not seen the actual statistical analysis alluded to by the NY Mag piece.  I know no one else who has seen it, either.  (I’ve asked.)  Therefore, I must make assumptions about what was done.  I’ve been doing analyses such as this for over 16 years, so I have a good idea about what was probably done, but without the actual study and the data on which the analysis was conducted, I can’t claim to be replicating the study.  (By the way, I’m also assuming that a “study” was done, but it’s also not at all clear that this was the case.  It could be that Halderman and his colleagues provided some thoughts to the Clinton campaign, and this communication was misconstrued by the public when word got out.)
  3. The gist of the analysis described by NY Mag appears to be comparing Clinton’s vote share across the types of voting machines used by voters in Michigan, Pennsylvania, and Wisconsin.  To attempt a replication of this analysis, it would be necessary to obtain election returns and voting machine data at the appropriate unit of analysis from these three states.
  4. Voting machine use.  Both Michigan and Wisconsin only use paper ballots for Election Day voting.*  Therefore, one simply cannot compare the performance of electronic and paper systems within these states.  This sentence in the NY Mag article must be false:  “The academics presented findings showing that in Wisconsin, Clinton received 7 percent fewer votes in counties that relied on electronic-voting machines compared with counties that used optical scanners and paper ballots.”  On the other hand, some counties in Pennsylvania do use electronic voting machines, known in the election administration field as “DREs” for “direct recording electronic” devices.  Pennsylvania, therefore, could be used to compare results for voters who used electronic machines with those who used paper.
  5. Voting machine data.  For many decades Kim Brace, the owner of Election Data Services, has collected data about the use of voting technologies as a part of his business.  Every four years I buy Kim’s updated data, which I have done for 2016.  Verified Voting also has a publicly available data set that reports voting machine use at the local level.  I tend to prefer Brace’s data because of his long track record of gathering it.  As I show below, both data sources tell similar stories about the use of voting machines in Pennsylvania.  The comparisons are the same, regardless of the voting machine data set.
  6. Election return data.  Here, I use county-level election return data I purchased from Dave Leip at his wonderful U.S. Election Atlas website. (This is from data release 0.5.)
  7. The Pennsylvania comparison.  Using the Brace voting machine data to classify counties, Clinton received 39.3% of the vote in Pennsylvania counties that used opscans and 49.0% of the vote in counties that used DREs.  However, when standard statistical controls are included to account for the other factors that predict Clinton's vote share in a county — race, population density, and education — the difference in Clinton's vote share between opscan and DRE counties is reduced to 0.095%.  Using the Verified Voting data to classify counties, Clinton received 40.2% of the vote in opscan counties and 52.4% of the vote in DRE counties.  (The Brace and Verified Voting data sets differ in reporting the machines used in four counties.)  In this case, when the statistical controls for race, population density, and education are included, the difference goes down to 0.6%.  (A sketch of what this kind of controlled comparison looks like appears below.)
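
To make concrete what "statistical controls" means here, below is a minimal sketch of this kind of county-level comparison in Python, using pandas and statsmodels. The file name, column names, and exact specification are illustrative assumptions, not the analysis I actually ran; the point is only that the opscan/DRE difference is re-estimated once county demographics are accounted for.

```python
# Hypothetical sketch of a county-level comparison with controls.
# The file name and column names are made up for illustration.
import pandas as pd
import statsmodels.formula.api as smf

# County-level returns merged with a voting-machine classification
# (e.g., Leip election returns plus the Brace or Verified Voting machine data).
counties = pd.read_csv("pa_counties_2016.csv")

# Raw comparison: Clinton's vote share in DRE vs. opscan counties.
print(counties.groupby("uses_dre")["clinton_share"].mean())

# Same comparison with controls for race, population density, and education.
# uses_dre is assumed to be coded 0/1.
model = smf.ols(
    "clinton_share ~ uses_dre + pct_nonwhite + pop_density + pct_college",
    data=counties,
).fit()

# The coefficient on uses_dre is the DRE/opscan difference after controls.
print(model.params["uses_dre"])
```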

To summarize:

  1. Virtually all Michigan and Wisconsin Election Day voters (and absentee voters, for that matter) use paper ballots.  In Michigan, these ballots are counted on scanners; in Wisconsin, some are counted by hand, but most by scanners.  Election returns from these states cannot be used to compare voting patterns using electronic machines and paper-based systems.  The core empirical claim in the NY Mag article that has the Internet all atwitter cannot be true.
  2. The difference in voting patterns between Pennsylvania voters who used  electronic machines and those who used optically scanned ballots is accounted for by the fact that voting machines are not randomly distributed in Pennsylvania.  Clinton received proportionally more votes in counties with electronic machines, but that is because these counties were disproportionately nonwhite and metropolitan — factors that are correlated with using DREs in Pennsylvania.
  3. The importance of advocating for post-election audits to ensure that the ballots were counted correctly is not a matter of electronic vs. paper ballots, or a matter of whether doing so will save the election for one’s favored candidate.  The reason all systems, regardless of technology, should be audited after every election is to ensure that the election was fair and that the equipment and counting procedures functioned properly.  This critical message was unfortunately garbled by playing to conspiratorial fears about the outcome of the 2016 election.
  4. My biggest fear in this episode is that election officials, state legislators, and voters will now regard advocates for post-election audits as part of the movement to discredit the election of Donald Trump as president.  I know that this is not the intention.  My biggest hope is that decisionmakers will look beyond the sensational headlines and recognize that post-election audits are simply a good tool to make sure that the election system has functioned as intended.

*I have learned that between 5% and 10% of Wisconsin voters who are not physically disabled do use the so-called “accessibility machines,” rather than the regular opscan paper ballots.  However, I know of no election returns that have reported the results of ballots cast on these machines alone, nor do I believe that the reports discussed in the NY Mag article were referring to these ballots.

My experience with VoteCastr on Election Day

VoteCastr’s mixed record on election day in providing useful information about turnout and the emerging vote totals in real time is now getting scrutiny from the press, including from its partner, Slate.  I was not involved in the development of VoteCastr, so I don’t have much to say about its difficulties in getting the numbers right.  However, I do have one direct anecdote about the VoteCastr operation, based on my observation work on Election Day, and a few reflections based on that experience.

The anecdote:  I spent election day traveling around Franklin and Delaware Counties in Ohio.  (That’s Columbus and the northern suburbs.)  I visited about 10 voting locations overall, which accounted for something like 30 precincts.  At the first voting location I visited, a union hall on the west side of Columbus, I watched for an hour as the hard-pressed workers did their best to whittle down the line of 100 voters who had greeted them when the polls opened at 6:30. (By the end of the hour the line had grown, owing to the painfully slow work of the poll worker handling check-ins for last names ending in S-Z, but that’s another issue.)

At about 7:15, a young woman carrying an over-stuffed backpack on which a VoteCastr badge had been affixed came in looking for the polling place manager.  She and he talked for a couple of minutes right next to where I was standing, so I listened in.  This was the dialogue, played out in a space the size of a large living room, stuffed full of voting equipment, folding tables, and about 30 people at any one time:

  • VoteCastr person:  I’m from VoteCastr.  I’m here to gather information about the number of people turning out to vote each hour.  How can I get that information?
  • Manager:  (Looking at the table where they are checking in voters using paper poll books):  I would love to help you, but I don’t know how we would do that.  We’d have to stop all operations and count up the number of signatures on all the poll books to get that.
  • VoteCastr person:  But, don’t you have a list of people who have voted attached to the wall over there?  (Pointing to a list of voters tacked to the wall.)
  • Manager:  Those are people who had previously voted absentee or early.  We don’t post the names of voters in real time.  We do issue reports back to the county a couple of times during the day about the number of people who voted, based on machine use.
  • VoteCastr person:  Could you get the count from looking at the machines more frequently than that?
  • Manager:  Maybe I could, but it would take one of my busy people several minutes to do that and, as you can see, we can’t spare anyone right now.
  • VoteCastr person:  Is there any other way you can think of that I could get the information?
  • Manager:  You’re welcome to count people as they come in the door.  I’m afraid that’s the only way you’re going to get the information you need on an hourly basis.

I can’t vouch for the empirical claims made by the manager or the VoteCastr person, but the manager seemed like an accommodating fellow (and amazingly poised) and the VoteCastr person was very professional and polite.  My conclusion was that they were honestly trying to make this work, but there was no easy solution.

The observation:  If the turnout reporting was so important to the VoteCastr model, why was it sending one of its data-gatherers into a precinct an hour after polls had opened with no idea about how the data and check-in processes worked?  This was either an example of poor training, poor advance knowledge among leadership about how Franklin County elections are administered, poor cooperation with local officials, or a combination of all three.

It brings to mind the work I have done for the past four years to gather precinct-level data about polling practices, for my own research and to provide advice to election officials.  One thing I’ve learned is that when you go into a precinct wanting to get data in the rush of an election, you over-prepare and you plan for each county, and indeed each precinct, to operate differently.  From what I observed, it appeared that the VoteCastr folks assumed that Franklin County had electronic poll books, like neighboring Delaware County.  With EPBs, there was a decent chance that hourly data could have been obtained.  With paper poll books, not so much.

I’m intrigued by VoteCastr and wish them well as they work out their business model.  One thing going against them — and everyone else in this space — is that presidential elections only come around every four years.  That’s bad for two reasons.

First, it’s hard raising funds and organizing a business (or a research project) during the 3.5 years before the next presidential election, because no one is thinking about it.  The right thing to do would be to conduct endless trial runs on low-turnout elections, to work out the kinks and to gain the trust of election officials who, after all, are the gatekeepers to the precincts.

Second, presidential elections are qualitatively different from all other elections.  The surge in activity is so much greater than even midterm congressional elections that you don’t know if you have it right until the onslaught hits; if you make mistakes, it’s an eternity until you know if you’ve made the right corrections.  This is a lesson known by election officials for decades, and now it’s a lesson being learned by the new data companies being formed to make sense of elections.

Ballots to be counted probably won’t help Clinton much

Ned Foley and I recently published two pieces of commentary, here and here, about ballots counted following election day.  Most people don’t realize this, but the election results released election night are unofficial, and are subject to updating and correcting.  Important to the updating is the counting of provisional ballots and mail-in ballots that are considered in the days leading up to the official certification of results.

In these two commentaries, we described the so-called “blue shift” that has been evident in vote counts since 2000.  The blue shift is a term given to the pattern we see, which is that the nationwide vote share has tended to shift a little bit toward the Democratic presidential candidate after election night.

A natural question to ask is whether the late-counted ballots are sufficient in this election to switch any of the states that have currently been called for the candidates.  The answer at this point seems to be “no.”  But,  New Hampshire — a state that is currently “too close to call” — has a margin so tight that the race could conceivably go either way.  (As of this writing, Clinton is ahead of Trump in the counting by 1,371 votes, out of roughly 700,000 cast.)

I have done some quick analysis, in which I’ve taken the current vote totals (as of 9:30 Wednesday morning).  I have then gone back to the 2012 presidential returns and compared the final, official results with the unofficial returns reported Wednesday morning following the election.  Taking this as an estimate of the “blue shift” we might expect in the coming days, I then add this to the current unofficial results to see how much the current preliminary tally might change in the coming days.  The following graph summarizes the results.

[Figure: the ten states with the closest margins, showing Clinton's current two-party vote share and the projected final share once the 2012 post-election "blue shift" is added.]

I’ve shown the ten states with the closest vote margins.  The arrows start with the current two-party vote share for Clinton and then add to it the fraction of the vote received by Obama in 2012 during the post-election counting period.
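
For readers who want the mechanics, here is a rough sketch of the calculation in Python. The file and column names are placeholders (not the actual Leip data release); the logic is simply to compute each state's 2012 post-election shift and add it to the current 2016 two-party share.

```python
# Rough sketch of the projection; file and column names are placeholders.
import pandas as pd

r12 = pd.read_csv("returns_2012.csv")   # per state: unofficial and official 2012 totals
r16 = pd.read_csv("returns_2016.csv")   # per state: current unofficial 2016 totals

df = r12.merge(r16, on="state")

# Democratic two-party share in 2012, election night vs. certified.
df["d12_unofficial"] = df.obama_unoff / (df.obama_unoff + df.romney_unoff)
df["d12_official"] = df.obama_off / (df.obama_off + df.romney_off)

# The 2012 "blue shift": movement in the Democratic share during the canvass.
df["blue_shift_2012"] = df.d12_official - df.d12_unofficial

# Current 2016 two-party share, plus the 2012 shift, gives the projection.
df["d16_now"] = df.clinton_now / (df.clinton_now + df.trump_now)
df["d16_projected"] = df.d16_now + df.blue_shift_2012

# The ten states with the closest current margins.
df["closeness"] = (df.d16_now - 0.5).abs()
print(df.nsmallest(10, "closeness")[["state", "d16_now", "d16_projected"]])
```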

Note that only New Hampshire is close enough to 50/50 that the final count could flip the results from one candidate to the other.  The bad news for Clinton here is that New Hampshire actually experienced a “red shift” in 2012, so that this scenario predicts that the vote counted post-election would be to Trump’s advantage. (Because NH is an election day registration state, its later-counted ballots are dominated by absentees and late-arriving counts, not by provisional ballots.)

This is not to suggest that 2016 will be a repeat of 2012.  But, it is to suggest that the presidency in 2016 is probably not going to be decided in the canvass period.

Tweeting the Election

We are pleased to launch this new website, www.tweetingtheelection.com, which provides live, real-time summary analytics of the conversation currently occurring on Twitter about election administration and voting technology, as Americans go to vote in the 2016 Presidential election. This website was developed at Caltech by Emily Mazo, Shihan Su, and Kate Lewis, as their project for CS 101, and in collaboration with me and the Caltech/MIT Voting Technology Project.

This website offers two views of some of the discussion about the election that is now occurring on Twitter. These visualizations compare the discussion occurring amongst people with different political ideologies: we have separated the tweets by the predicted ideology of the author.

The first view is geographic, showing the sentiment of incoming tweets by state. In the geographic view, which shows the average sentiment in every state over the past six hours and is updated every few minutes, dots on the map display the most recent tweets, and you can hover over those dots to view the content of the tweet.

At the top of the website, clicking over to the timeline view, you can see the sentiment of the recent incoming tweets by hour, for the past twelve hours.

In each view, we offer data on four different election administration and voting technology issues: tweets about absentee and early voting, polling places, election day voting, and voter identification. We collect these tweets by filtering results from the Public Streaming Twitter API, by keyword, for tweets falling into these categories.

Furthermore, we classify each incoming tweet in two ways. First, we do a sentiment analysis on the incoming tweet, to classify it as positive (given as green in both views) or negative (given as red in both views). We also classify the Twitter users as being more likely to be liberal or conservative in their political orientation, so that viewers can get a sense for how discussion on Twitter of these election administration and voting technology issues breaks by ideology.

We continue to work to improve our analytical tools and the presentation of this data on this website. We welcome comments; please send them to tweetingtheelection@gmail.com.

Finally, this website is currently best viewed in fullscreen mode, on a larger laptop or computer screen. Mobile optimization, and viewing on smaller sized screens, is for future development.

Some Background

This is part of a larger project at Caltech, which began in the fall of 2014, when we began collecting tweets like these for studying election administration. The background of the project, as well as some initial analyses aimed at validating these data, is contained in the working paper, Adams-Cohen et al. (2016), “Election Monitoring Using Twitter” (forthcoming).

1. How we collect these tweets
The Tweet data used in this visualization is collected from Twitter’s Public Streaming API. The stream is filtered on keywords for the four different topics of interest:

Election Day Voting: Provisional ballot, Voting machine, Ballot.
Polling Places: Polling place line, precinct line, Pollworker, Poll worker.
Absentee Voting: Absentee ballot, mail ballot, vote by mail, voting by mail, early voting.
Voter ID: Voter identification, Voting identification, Voter ID.
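
As an illustration of how such a keyword-filtered collection can be set up, here is a minimal sketch using the tweepy library (3.x-era API). The credentials, output file, and error handling are placeholder assumptions, and this is not the project's actual collection tool (a separate Python/MySQL system, noted in the Thanks section below).

```python
# Minimal sketch of keyword-filtered collection from the public streaming API,
# using tweepy; credentials and storage are placeholder assumptions.
import json
import tweepy

KEYWORDS = {
    "election_day": ["provisional ballot", "voting machine", "ballot"],
    "polling_places": ["polling place line", "precinct line", "pollworker", "poll worker"],
    "absentee": ["absentee ballot", "mail ballot", "vote by mail", "voting by mail", "early voting"],
    "voter_id": ["voter identification", "voting identification", "voter id"],
}

class TopicListener(tweepy.StreamListener):
    def on_status(self, status):
        # Append each incoming tweet to a newline-delimited JSON file.
        with open("tweets.jsonl", "a") as f:
            f.write(json.dumps(status._json) + "\n")

    def on_error(self, status_code):
        # Disconnect if Twitter signals rate limiting (HTTP 420).
        return status_code != 420

auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")

stream = tweepy.Stream(auth=auth, listener=TopicListener())
track = [kw for kws in KEYWORDS.values() for kw in kws]
stream.filter(track=track)
```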

2. Sentiment analysis
The Tweets collected from this stream were then fed into two classification models. The first model classified them into positive or negative classes based on their text. This model was created with crowdsourced data; about 5000 Tweets from a previous collection, gathered in the same manner as those in the visualization, were labeled with sentiment (valence) on a positive-to-negative scale by at least three crowd workers, and the labels were averaged to create a standard label for each Tweet. This training set of Tweets and labels was then used to create a term frequency-inverse document frequency (Tf-Idf) vector for the words in each Tweet in the set. These vectors were used to train a decision tree regression model to predict the sentiment of future Tweets (a more positive predicted value indicating more positive sentiment, and a more negative value indicating more negative sentiment).

The Tweets that appear on tweetingtheelection were streamed from the Twitter API, stripped of stop words, hashtags, URLs, and any other media, then processed to create Tf-Idf vectors to represent each Tweet using the same vocabulary as the original model. These vectors were then passed through the decision tree regression model, which predicted sentiment labels for them.
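
A minimal sketch of this Tf-Idf plus decision-tree-regression pipeline, using scikit-learn, appears below. The example tweets, labels, and tree depth are placeholder assumptions; in practice the model is trained on the roughly 5,000 crowd-labeled Tweets described above.

```python
# Illustrative sketch of the Tf-Idf + decision-tree-regression sentiment step;
# the example tweets, labels, and hyperparameters are placeholder assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.tree import DecisionTreeRegressor

# In practice: ~5,000 crowd-labeled tweets with averaged valence scores.
train_texts = ["polling place line is moving fast", "my voting machine broke"]
train_labels = [0.8, -0.6]

# Fit the Tf-Idf vocabulary on the labeled training tweets.
vectorizer = TfidfVectorizer(stop_words="english")
X_train = vectorizer.fit_transform(train_texts)

model = DecisionTreeRegressor(max_depth=10)
model.fit(X_train, train_labels)

# New streamed tweets are vectorized with the same vocabulary, then scored;
# positive predictions are shown green on the site, negative ones red.
new_tweets = ["no line at my precinct this morning"]
scores = model.predict(vectorizer.transform(new_tweets))
print(scores)
```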

3. Ideology classification
This model classifies tweets as liberal or conservative based on their text.
Training data
Training data for this model was obtained in two steps.
First, we obtained ideal point estimates for Twitter users from Pablo Barbera's work, "Birds of the Same Feather Tweet Together: Bayesian Ideal Point Estimation Using Twitter Data." In that paper, Barbera develops a latent space model that estimates a user's ideal point from the following links between users.

Second, we matched the user ids obtained from Barbera with the tweets collected by R. Michael Alvarez's lab from 2016-04-19 to 2016-06-23. In other words, we labeled each tweet with the ideological label of the user who created it. We matched around 55,000 tweets and used these as our training data.
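
A small sketch of this matching step, assuming the data live in two CSV files with a shared user_id column (the file and column names are illustrative):

```python
# Illustrative sketch of the label-matching step; file and column names are assumptions.
import pandas as pd

tweets = pd.read_csv("lab_tweets_2016.csv")              # tweets collected 2016-04-19 to 2016-06-23
ideal_points = pd.read_csv("barbera_ideal_points.csv")   # user_id, ideal_point

# Keep only tweets whose author has an estimated ideal point.
labeled = tweets.merge(ideal_points, on="user_id", how="inner")

# Each tweet inherits its author's estimated ideology; the sign gives the class label.
labeled["ideology"] = (labeled["ideal_point"] > 0).map({True: "conservative", False: "liberal"})
print(len(labeled))  # roughly 55,000 matched tweets in our case
```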

4. Classifier
The classifier we use is a convolutional neural network.
The model was originally developed by Yoon Kim in his paper "Convolutional Neural Networks for Sentence Classification." We adopt Denny Britz's implementation of this model in TensorFlow.

The model we adopted has four layers. The first layer embeds words into 128-dimensional vectors, which are learned within the neural network. The convolutional layer then applies filters of three sizes (3, 4, and 5), i.e., sliding over 3, 4, or 5 words at a time, with 128 filters of each size. Next, the model max-pools the output of each convolution into a 128-dimensional feature vector, concatenates these vectors, and applies dropout regularization. A final softmax layer classifies the result.
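
For concreteness, here is a minimal Keras rendering of this architecture. We actually run Denny Britz's lower-level TensorFlow code; the vocabulary size, sequence length, and dropout rate below are placeholder assumptions.

```python
# Sketch of the Kim-style text CNN described above (Keras rendering, not the
# implementation we actually run); hyperparameters are placeholders.
from tensorflow.keras import layers, Model

def build_text_cnn(vocab_size=20000, seq_len=50, embed_dim=128,
                   filter_sizes=(3, 4, 5), num_filters=128,
                   dropout_rate=0.5, num_classes=2):
    inputs = layers.Input(shape=(seq_len,), dtype="int32")
    # Layer 1: learn 128-dimensional word embeddings inside the network.
    x = layers.Embedding(vocab_size, embed_dim)(inputs)
    # Convolutions sliding over 3, 4, and 5 words, 128 filters each,
    # each followed by max-pooling over the sequence.
    pooled = [layers.GlobalMaxPooling1D()(
                  layers.Conv1D(num_filters, fs, activation="relu")(x))
              for fs in filter_sizes]
    x = layers.Concatenate()(pooled)
    x = layers.Dropout(dropout_rate)(x)
    # Final softmax layer: liberal vs. conservative.
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    model = Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```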

5. Geocoding
The incoming tweets are geocoded. We use the coordinates of where the tweet was sent from, if they are provided (if the user opted into geocoding). If that information is unavailable, we use the location of the user from their profile (if provided).
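
A sketch of this fallback logic (the geocode_place helper is a hypothetical stand-in for whatever geocoding service resolves a free-text profile location):

```python
# Illustrative fallback for locating a tweet; field names follow the Twitter
# API's tweet object, and geocode_place is a hypothetical helper.
def locate(tweet, geocode_place):
    # Prefer exact coordinates when the user has opted into geotagging.
    if tweet.get("coordinates"):
        lon, lat = tweet["coordinates"]["coordinates"]
        return lat, lon
    # Otherwise fall back to the free-text location in the user's profile.
    profile_location = (tweet.get("user") or {}).get("location")
    if profile_location:
        return geocode_place(profile_location)  # e.g., a gazetteer lookup
    return None
```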

Thanks

There are many people to thank. First off, www.tweetingtheelection.com was created by Emily Mazo, Shihan Su, and Kate Lewis, as their project for CS 101 at Caltech. That class is taught by Yisong Yue and Omer Tamuz, professors at Caltech.

Earlier work done on this project, in particular the ongoing collection of Twitter data like these since 2014 at Caltech, has been done in collaboration with Nailen Matschke (who developed the original python-based Twitter collection tool, as well as the MySQL database where all of the data collected so far is stored); with Clare Hao and Cherie Jia, who worked in the summer of 2016 as SURF students, developing some preliminary python tools to analyze the Twitter data; and Nick Adams-Cohen, a PhD student in the social sciences at Caltech, who is studying the use of Twitter data for public opinion and political behavior research.

We’d like to thank Pablo Barbera for sharing his ideological placement data with us.

Other colleagues and students who have helped with this research project include Thad Hall, Jonathan Nagler, Lucas Nunez, Betsy Sinclair, and Charles Stewart III.

“Rigged election” rhetoric is having an effect on voters — just not in the way you think.

Donald Trump’s relentless messaging about a “rigged election” is having an effect on the confidence voters have that their votes will be counted accurately.  But, it’s not the effect you think.

I came to this conclusion as I was considering yesterday’s Morning Consult poll results about confidence in the vote count.  It so happens that I asked almost exactly the same question on a national poll during the pre-election period in 2012.  (I can’t take all the credit.  My colleague at Reed College, Paul Gronke, joined me in sponsoring a “double-wide” module on the 2012 Cooperative Congressional Election Study.)  I decided to compare what Morning Consult found today with what we found almost exactly four years ago.

The results were surprising.  The percentage of respondents who say that they are “very confident” that their own votes will be counted accurately is virtually unchanged from 2012.  Confidence that votes nationwide will be counted accurately has, if anything, increased since 2012.  Trump’s rhetoric appears not to have reduced Republican confidence in the accuracy of the vote count over the past four years.  Rather, it has increased the confidence of Democrats.  The degree of party polarization over the quality of the vote count has increased since 2012, but it is Democratic shifts in opinion, not Republican, that are leading to this greater polarization.

Let me sketch out the background here.  In 2012, Gronke and I coordinated our modules in the CCES to ask a series of questions about election administration to a representative sample of 2,000 adults.  Two of these questions were:

  • How confident are you that your vote in the General Election will be counted as you intended?
  • How confident are you that votes nationwide will be counted as voters intend?

The first question was asked of respondents who reported they intended to vote in 2012; the second question was asked of all respondents.

The response categories for both questions were (1) very confident, (2) somewhat confident, (3) not too confident, (4) not at all confident, and (5) I don’t know.

The corresponding Morning Consult questions were:

  • How confident are you that your vote will be accurately counted in the upcoming election?
  • How confident are you that votes across the country will be accurately counted in the upcoming election?

The response categories were identical to ours, with the exception of an additional “no opinion” option with Morning Consult.

So, while the questions are not 100% identical, they are close enough to allow some meaningful comparisons.  (For those interested in a more systematic example of how similar survey research questions can be combined in this type of analysis, see the article I co-authored with Mike Sances, which appeared last year in Electoral Studies.)  Both the 2012 and 2016 studies were conducted about three weeks ahead of the general election, so the timing couldn’t be better.

In the table below, I compare Morning Consult’s 2016 results with Gronke’s and my results in 2012.  The numbers in the table are the percentages of the indicated respondents who gave the “very confident” response.

                          Your own vote                                     Votes nationwide
                          2012 (Gronke/Stewart)  2016 (Morning Consult)     2012 (Gronke/Stewart)  2016 (Morning Consult)
All registered voters     41%                    45%                        16%                    28%
Democrats                 47%                    59%                        20%                    43%
Republicans               42%                    41%                        13%                    18%

The 2012 patterns were consistent with what my colleagues and I have regularly reported:  the “winning” party tends to be more confident than the “losing” party, and voters tend to be much more confident that their own votes will be counted accurately than that votes nationwide will be.

The 2016 patterns are similar, with a couple of major differences.  The most important similarity is that respondents in both 2012 and 2016 were more confident their own votes would be counted accurately than votes nationwide.  In 2012, the local-nationwide gap was 25 percentage points (41% vs. 16%); in 2016, the local-nationwide gap dropped to 17 percentage points (45% vs. 28%).

The most important changes come as we look down the table, at the Democratic-Republican differences.  Republican and Democratic opinions have changed in very different ways since 2012.  At the local level, Republicans remain about as confident as they were in 2012, but Democratic confidence has grown.  As a consequence, the Democratic-Republican gap in the confidence about local vote counting has grown from 5 percentage points to a much more substantial 18 percentage points.

In assessing the accuracy of the vote count nationwide, Republicans are actually a little more confident in 2016 than in 2012 (18% vs 13%), but this small change between 2012 and 2016 is likely due to subtle differences between the two studies.  On the other hand, Democrats have become a lot more confident.  They are now a whopping 23 percentage points more confident than in 2012 that votes will be counted accurately nationwide (43% “very confident” vs 20%).

Much more work needs to be done on this issue, but a couple of tentative conclusions seem in order.  The first is that Donald Trump’s complaints about a “rigged” electoral system most clearly reminded his strongest supporters of what they already believed.  It is much less clear that Republicans who were not already convinced of the corruption of the election system have now had a change of heart.

The second conclusion is that Trump’s charges appear to have counter-mobilized Democratic opinion in novel ways.  Democrats have come to the defense of vote counting, not only in their own back yards, but even in other people’s back yards.

Either way, summary judgements about the legitimacy of the electoral process have become more polarized in 2016 than they were in 2012.  One possibility is that as time progresses, support for the electoral process as a whole will become associated with the Democratic Party in the public’s mind, with opposition associated with the Republican Party.  I am hoping that this is not the case, because we have seen important bipartisan improvements in the world of election administration over the past four years, despite continued partisan differences over voter ID laws and amending the Voting Rights Act.

We certainly need to be concerned about undermining the legitimacy of elected officials, especially in circumstances where there is no hard evidence of election rigging going on.  But, we also need to recall that once the November election is done and gone, elections will continue to be administered at the state and local levels.  The danger for election administration with all this unsubstantiated talk about fraud is that it will undermine the comity that has often existed in handling the day-to-day details of running elections.  In other words, the failure to institute improvements to local election administration will become collateral damage of this heightened polarization.

How secure are state voter registration databases?

Over the past few months, there’ve been a number of reports that state voter registration databases have come under cyberattack. Most recently, FBI Director James Comey discussed the attacks in a hearing of the U.S. House Judiciary Committee. While details have been few, between what’s been reported by the FBI recently and the various attacks on the email accounts of political parties and political leaders in the U.S., it seems clear that the U.S. election infrastructure is being probed for vulnerabilities.

So exactly how secure are state voter registration databases? I’ve been asked about this a number of times recently, by the media, colleagues, and students.

The potential threats to state voter registration databases have been known for a long time. In fact, I wrote a paper on this topic in October 2005 — that’s not a typo, October 2005. The paper, “Potential Threats to Statewide Voter Registration Systems”, is available as a Caltech/MIT Voting Technology Project Working Paper. It’s also part of a collection of working papers in a NIST report, “Developing an Analysis of Threats to Voting Systems: Preliminary Workshop Summary.”

The context for my 2005 paper was that states were then rushing to implement their new computerized statewide voter registries, as required after the passage of the Help America Vote Act. At the time, a number of researchers (myself included) were concerned that, in the rush to develop and implement these databases, important questions about their security and integrity were not being addressed. So the paper was meant to provide some guidance about the potential security and integrity problems, in the hopes that they would be better studied and addressed in the near future.

The four primary types of threats that I wrote about were:

  • Authenticity of the registration file: attacks on the transmission path of voter registration data from local election officials to the state database, or attacks on the transmission path of data between the state registry and other state officials (for example, departments of motor vehicles).
  • Security of personal information in the file: state voter files contain a good deal of personal information, including names, birthdates, addresses, and contact information, which could be quite valuable to attackers.
  • Integrity of the file: the primary data files could be corrupted, either by mistakes which enter the data and are difficult to remove, or by systematic attack.
  • System failure: the files could fail at important moments, either due to problems with their architecture or technology, or if they come under systematic “denial of service” attacks.

By 2010, when I was a member of a National Academies panel, “Improving State Voter Registration Databases”, many of these same concerns were raised by panelists, and by the many election officials and observers of elections who provided input to the panel. It wasn’t clear how much progress had been made by 2010, towards making sure that the state voter registration systems then in place were secure.

Fast-forward to 2016, and very little research has been done on the security and integrity of state voter registration databases; despite the concerns raised in 2005 and 2010, there’s not been a great deal of research focused on the security of these systems, certainly nowhere near the amount of research that has focused on the security of other components of the American election infrastructure, in particular, the security of remote and in-person voting systems. I’d be happy to hear of research that folks have done; I’m aware of only a very few research projects that have looked at state voter registration systems. For example, there’s a paper that I worked on in 2009 with Jeff Jonas, Bill Winkler, and Rebecca Wright, where we matched the voter registration files from Oregon and Washington, in an effort to determine the utility of interstate matching to find duplicates between the states. Another paper that I know of is by Steve Ansolabehere and Eitan Hersh, which looks at the quality of the information in state voter registries. But there’s not been a lot of systematic study of the security of state voter registries; I recommend that researchers (like our Voting Technology Project) direct resources towards studying voter registration systems now, and in the immediate aftermath of the 2016 election.

In addition to calling for more research on the security of state voter registration databases, election officials can take some immediate steps. The obvious step is to take action to make sure that these databases are now as secure as possible, and to determine whether there is any forensic evidence that the files might have been attacked or tampered with recently. A second step is to make sure that the database system will be robust in the face of a systematic denial of service attack. Third, election officials can devise a systematic approach towards providing pre- and post-election audits of their databases, something that I’ve strongly recommended in other work on election administration auditing (with Lonna Atkeson and Thad Hall). If election officials do audit their voter registration databases and processes, those reports should be made available to the public.

Felony Disenfranchisement

I frequently am asked by students, colleagues, and the media, about how many people in the U.S. cannot participate in elections because of felony disenfranchisement laws. Given the patchwork quilt of felony disenfranchisement laws across the states, and a lack of readily available data, it’s often hard to estimate what the rate of felony disenfranchisement might be.

The Sentencing Project has released a report that provides information and data about felony disenfranchisement and the 2016 federal elections in the U.S. Here are their key findings, quoted from their report:

“Our key findings include the following:

– As of 2016, an estimated 6.1 million people are disenfranchised due to a felony conviction, a figure that has escalated dramatically in recent decades as the population under criminal justice supervision has increased. There were an estimated 1.17 million people disenfranchised in 1976, 3.34 million in 1996, and 5.85 million in 2010.

– Approximately 2.5 percent of the total U.S. voting age population – 1 of every 40 adults – is disenfranchised due to a current or previous felony conviction.

– Individuals who have completed their sentences in the twelve states that disenfranchise people post-sentence make up over 50 percent of the entire disenfranchised population, totaling almost 3.1 million people.

– Rates of disenfranchisement vary dramatically by state due to broad variations in voting prohibitions. In six states – Alabama, Florida, Kentucky, Mississippi, Tennessee, and Virginia – more than 7 percent of the adult population is disenfranchised.

– The state of Florida alone accounts for more than a quarter (27 percent) of the disenfranchised population nationally, and its nearly 1.5 million individuals disenfranchised post-sentence account for nearly half (48 percent) of the national total.

– One in 13 African Americans of voting age is disenfranchised, a rate more than four times greater than that of non-African Americans. Over 7.4 percent of the adult African American population is disenfranchised compared to 1.8 percent of the non-African American population.

– African American disenfranchisement rates also vary significantly by state. In four states – Florida (21 percent), Kentucky (26 percent), Tennessee (21 percent), and Virginia (22 percent) – more than one in five African Americans is disenfranchised.”

This looks like a useful resource for those interested in understanding the possible electoral implications of felony disenfranchisement laws across the U.S.

Questions about postal voting

Since the origins of the Caltech/MIT Voting Technology Project in 2000, the VTP has noted a number of concerns about postal voting. Our original report in 2001 noted that postal voting represents clear tradeoffs, with benefits including convenience, but with potential risks, especially regarding the reliability and security of balloting by mail.

Our most recent report reiterated these same concerns, but added another: there is new research indicating that many of the reductions in residual votes (a key measure of voting system reliability and accuracy) are at risk because of the increase in postal voting. One of these papers studies residual votes in California (“Voting Technology, Vote-by-Mail, and Residual Votes in California, 1990-2010”). The other is a national-level study, “Losing Votes by Mail.” There is an important signal in the residual vote data from recent elections: increased postal voting is associated with increased residual votes.

Now comes word of a new concern about the reliability of postal voting. Upcoming Austrian elections might be postponed due to faulty glue used in the ballot envelopes. This video helps explain the problem.

While we’ve raised questions in the past about the reliability of the mail system for balloting (in particular, noting that there’s always a risk that balloting materials might be delayed or misdirected, especially for overseas and military voters covered by the UOCAVA and MOVE Acts), a basic malfunction of postal voting material is not an issue that we’ve heard much of in the past. But clearly it may be an issue in the future, so researchers will need to keep an eye on what is learned from this Austrian postal ballot problem and how it is resolved, and to determine how to prevent problems like these from happening again.

California’s massive 2016 voter guide

I’m glad that I recently had a large and sturdy mailbox installed at the end of our driveway. Our previous mailbox was small, rusty, and was starting to lean to one side — had the mail carrier tried to leave California’s massive, 224-page, 2016 general election voter information guide in our old mailbox, I have no doubt it would have immediately toppled over.

The LA Times has a fun video that shows the printing of this super-sized voter information guide:
http://www.latimes.com/la-pol-vn-printing-the-california-voter-information-guide-2-20160909-premiumvideo.html

Don’t get me wrong, I think that it’s great that California voters receive the voter information guide from our Secretary of State (it’s available online in pdf format as well, which might be more easily usable for many voters). The information guide helps remind voters about the upcoming election, provides useful information about voter rights and resources about registration and voting, and gives lots and lots of detailed information about all of the ballot measures that we will have on our ballots in California this fall.

But with seventeen statewide measures on the ballot (this does not include county or local measures), the information guide is a bit intimidating this election season. Californians are being asked to provide their input into a wide range of statewide issues, including fiscal matters like school and revenue bonds, tax extensions and new taxes, the death penalty, and marijuana legalization. These are important issues, and this fall voters will need to take a close look at the voter information guide to get a better understanding of these issues and to figure out how to cast their ballots.

With so many issues on the ballot, and with a lot of important candidate races (a presidential race, the U.S. Senate contest, and lots of competitive congressional and state legislative races), it’s a long ballot. Combine the long ballot with a lot of interest in this election, and there’s a good chance we will see strong turnout throughout the state this fall, which even with widespread voting by mail will likely mean long waits at polling places on election day.

In any case, Californians should be on the lookout for their massive voter information guide in their mailboxes, or take a look at the online version. Just make sure that you have a sturdy mailbox, and don’t drop it on your toes when your copy arrives soon.