Tweeting the Election

We are pleased to launch this new website, www.tweetingtheelection.com, which provides real-time summary analytics of the conversation occurring on Twitter about election administration and voting technology as Americans go to vote in the 2016 Presidential election. This website was developed at Caltech by Emily Mazo, Shihan Su, and Kate Lewis as their project for CS 101, in collaboration with me and the Caltech/MIT Voting Technology Project.

This website offers two views of some of the discussion about the election that is now occurring on Twitter. These visualizations compare the discussion occurring amongst people with different political ideologies: we have separated the tweets by the predicted ideology of the author.

The first view is geographic, showing the sentiment of incoming tweets by state. In the geographic view, which shows the average sentiment in every state over the past six hours and is updated every few minutes, dots on the map display the most recent tweets, and you can hover over those dots to view the content of the tweet.

Clicking over to the timeline view at the top of the website, you can see the sentiment of recent incoming tweets by hour for the past twelve hours.

In each view, we offer data on four different election administration and voting technology issues: tweets about absentee and early voting, polling places, election day voting, and voter identification. We collect these tweets by filtering results from the Public Streaming Twitter API, by keyword, for tweets falling into these categories.

Furthermore, we classify each incoming tweet in two ways. First, we run a sentiment analysis on the incoming tweet to classify it as positive (shown in green in both views) or negative (shown in red in both views). We also classify Twitter users as more likely to be liberal or conservative in their political orientation, so that viewers can get a sense of how discussion on Twitter of these election administration and voting technology issues breaks down by ideology.

We continue to work to improve our analytical tools and the presentation of the data on this website. We welcome comments; please send them to tweetingtheelection@gmail.com.

Finally, this website is currently best viewed in fullscreen mode on a larger laptop or computer screen. Mobile optimization and viewing on smaller screens are left for future development.

Some Background

This is part of a larger project at Caltech, begun in the fall of 2014, in which we collect tweets like these to study election administration. The background of the project, as well as some initial analyses validating these data, is contained in the working paper, Adams-Cohen et al. (2016), “Election Monitoring Using Twitter” (forthcoming).

1. How we collect these tweets
The Tweet data used in this visualization is collected from Twitter’s Public Streaming API. The stream is filtered on keywords for the four different topics of interest:

Election Day Voting: Provisional ballot, Voting machine, Ballot.
Polling Places: Polling place line, precinct line, Pollworker, Poll worker.
Absentee Voting: Absentee ballot, mail ballot, vote by mail, voting by mail, early voting.
Voter ID: Voter identification, Voting identification, Voter ID.
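
As a rough illustration of how a tweet could be assigned to these four topics, here is a minimal Python sketch. The keyword lists are copied from above; the function name `match_topics` and the simple case-insensitive substring matching are assumptions made for this example, and the Streaming API's own keyword matching may behave differently.

```python
# Keyword lists copied from the four topics above (lowercased).
# The substring matching below is an illustrative assumption, not the site's actual code.
TOPIC_KEYWORDS = {
    "Election Day Voting": ["provisional ballot", "voting machine", "ballot"],
    "Polling Places": ["polling place line", "precinct line", "pollworker", "poll worker"],
    "Absentee Voting": ["absentee ballot", "mail ballot", "vote by mail",
                        "voting by mail", "early voting"],
    "Voter ID": ["voter identification", "voting identification", "voter id"],
}


def match_topics(text):
    """Return the topics whose keywords appear in the tweet text (case-insensitive)."""
    lowered = text.lower()
    return [
        topic
        for topic, keywords in TOPIC_KEYWORDS.items()
        if any(keyword in lowered for keyword in keywords)
    ]
```

Note that with this kind of matching a tweet can fall into more than one topic: any tweet mentioning an "absentee ballot" also contains the keyword "ballot".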

2. Sentiment analysis
The Tweets collected from this stream were then fed into two classification models. The first model classified them into positive or negative classes based on their text. This model was created with crowdsourced data: about 5000 Tweets from a previous collection, gathered in the same manner as those in the visualization, were each labeled with sentiment (valence) on a positive-to-negative scale by at least three crowd workers, and the ratings were averaged to create a standard label for each Tweet. This training set of Tweets and labels was then used to create a term frequency-inverse document frequency vector for the words in each Tweet in the set. These vectors were used to train a decision tree regression model to predict the sentiment of future Tweets (a high positive predicted value indicating more positive sentiment, and a more negative value indicating more negative sentiment).
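
To make the label averaging and the term frequency-inverse document frequency step concrete, here is a small pure-Python sketch. The function names and the particular weighting (relative term frequency times the log of inverse document frequency) are assumptions; the exact variant used for the site is not stated here. The decision tree regression step is not shown; in practice a library implementation would be fitted on these vectors and the averaged labels.

```python
import math
from collections import Counter


def average_label(ratings):
    """Average the valence ratings several crowd workers gave to one tweet."""
    return sum(ratings) / len(ratings)


def tfidf_vectors(documents):
    """Return (vocabulary, vectors): one TF-IDF vector per tokenized document."""
    vocabulary = sorted({word for doc in documents for word in doc})
    n_docs = len(documents)
    # Document frequency: the number of documents that contain each word.
    doc_freq = Counter(word for doc in documents for word in set(doc))
    idf = {word: math.log(n_docs / doc_freq[word]) for word in vocabulary}
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid dividing by zero for an empty document
        vectors.append([counts[word] / total * idf[word] for word in vocabulary])
    return vocabulary, vectors
```

For example, `tfidf_vectors([["long", "line"], ["short", "line"]])` gives "line" a weight of zero in both vectors, because a word that appears in every document carries no information for telling them apart.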

The Tweets that appear on tweetingtheelection were streamed from the Twitter API, stripped of stop words, hashtags, URLs, and any other media, then processed to create Tf-Idf vectors to represent each Tweet using the same vocabulary as the original model. These vectors were then passed through the decision tree regression model, which predicted sentiment labels for them.
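
The cleaning step described above could look roughly like the following Python sketch. The stop-word list here is a tiny placeholder (the list actually used is not given in this post), the regular expressions and the function name `clean_tweet` are assumptions, and attached media are not handled.

```python
import re

# A tiny illustrative stop-word list; the real list is not given in this post.
STOP_WORDS = {"a", "an", "and", "the", "is", "at", "to", "of", "in", "my", "i"}

URL_PATTERN = re.compile(r"https?://\S+")
HASHTAG_PATTERN = re.compile(r"#\w+")
TOKEN_PATTERN = re.compile(r"[a-z']+")


def clean_tweet(text):
    """Lowercase a tweet, drop URLs, hashtags, and stop words, and return its tokens."""
    text = text.lower()
    text = URL_PATTERN.sub(" ", text)
    text = HASHTAG_PATTERN.sub(" ", text)
    return [token for token in TOKEN_PATTERN.findall(text) if token not in STOP_WORDS]
```

The resulting tokens would then be turned into a Tf-Idf vector over the vocabulary fixed at training time, so words the original model never saw are simply ignored.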

3. Ideology classification
This model classifies tweets as liberal or conservative based on their text.
Training data
Training data for this model is obtained in two steps.
First, we obtain ideal point estimates for Twitter users from Pablo Barbera’s work, “Birds of the Same Feather Tweet Together. Bayesian Ideal Point Estimation Using Twitter Data”. In that work, Barbera develops a latent space model that estimates each user’s ideal point from the following links between users (who follows whom).

Second, we match the user IDs obtained from Barbera with the tweets collected by R. Michael Alvarez’s lab from 2016-04-19 to 2016-06-23. In other words, we label each tweet with the label of the user who created it. We matched around 55,000 tweets and use these as our training data.
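
The matching step can be pictured with the short Python sketch below. The field names (`user_id`, `text`), the function name `label_tweets`, and the rule that a positive ideal point means conservative with a cutoff at zero are all assumptions made for illustration; the actual cutoff and data layout are not described in this post.

```python
def label_tweets(tweets, ideal_points):
    """Give each tweet its author's ideology label; skip authors without an estimate.

    tweets: list of dicts with hypothetical keys "user_id" and "text".
    ideal_points: dict mapping user_id to an ideal point estimate.
    """
    labeled = []
    for tweet in tweets:
        score = ideal_points.get(tweet["user_id"])
        if score is None:
            continue  # no ideal point estimate for this author
        # Assumed convention: positive scores are conservative, the rest liberal.
        label = "conservative" if score > 0 else "liberal"
        labeled.append((tweet["text"], label))
    return labeled
```

Only tweets whose authors appear in the ideal point data survive this step, which is why the training set is smaller than the full collection for those dates.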

4. Classifier
The classifier we use is a convolutional neural network.
The model was originally developed by Yoon Kim. We adopt Denny Britz’s implementation of this model in TensorFlow.

The model we adopted has four layers. The first layer embeds words into 128-dimensional vectors, which are learned within the neural network. The next layer performs convolutions with three filter sizes (3, 4, 5), i.e. sliding over 3, 4, or 5 words at a time, and applies 128 filters for each of the three filter sizes. Next, the model max-pools the result of the convolutional layer into a 128-feature vector and adds dropout regularization. The final softmax layer then classifies the result.
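
To show what sliding a filter over a few words at a time and max-pooling means, here is a toy pure-Python sketch for a single filter size. It is not the TensorFlow implementation: the function name `conv_max_pool` is hypothetical, and bias terms, the nonlinearity, dropout, and the softmax layer are all left out.

```python
def conv_max_pool(embedded, filters, width):
    """Slide each filter over `width` consecutive word vectors and keep its largest response.

    embedded: list of word vectors (one list of floats per word).
    filters: list of filters, each a flat list of width * embedding_dim weights.
    Returns one pooled feature per filter.
    The sentence must contain at least `width` words (real implementations pad).
    """
    # Flatten every window of `width` consecutive word vectors into one list.
    windows = [
        [value for vector in embedded[start:start + width] for value in vector]
        for start in range(len(embedded) - width + 1)
    ]
    pooled = []
    for weights in filters:
        # Dot product of the filter with each window, then max over all positions.
        responses = [sum(w * x for w, x in zip(weights, window)) for window in windows]
        pooled.append(max(responses))
    return pooled
```

With 128 filters and 128-dimensional embeddings, a call such as `conv_max_pool(sentence, filters, 3)` would return 128 pooled features for the filter size of 3 words, regardless of how long the tweet is.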

5. Geocoding
The incoming tweets are geocoded. We use the coordinates of where the tweet was sent from, if they are provided (if the user opted into geocoding). If that information is unavailable, we use the location of the user from their profile (if provided).
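
The fallback logic can be sketched in Python as follows. It assumes tweets arrive as dictionaries with a GeoJSON-style `coordinates` field (longitude first) and a free-text `user.location` field; the function name `tweet_location` is hypothetical, and turning a profile location string into a state would need a further lookup that is not shown.

```python
def tweet_location(tweet):
    """Return ("coordinates", (lat, lon)), ("profile", text), or None for a tweet dict."""
    coordinates = tweet.get("coordinates")
    if coordinates and coordinates.get("coordinates"):
        # GeoJSON points list longitude before latitude.
        longitude, latitude = coordinates["coordinates"]
        return ("coordinates", (latitude, longitude))
    # Fall back to the free-text location from the user's profile, if any.
    profile_location = (tweet.get("user") or {}).get("location")
    if profile_location:
        return ("profile", profile_location)
    return None
```

Returning the source of the location alongside the value makes it easy to treat exact coordinates and self-reported profile locations differently later on.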

Thanks

There are many people to thank. First off, www.tweetingtheelection.com was created by Emily Mazo, Shihan Su, and Kate Lewis, as their project for CS 101 at Caltech. That class is taught by Yisong Yue and Omer Tamuz, professors at Caltech.

Earlier work on this project, in particular the ongoing collection of Twitter data like these since 2014 at Caltech, has been done in collaboration with Nailen Matschke (who developed the original Python-based Twitter collection tool, as well as the MySQL database where all of the data collected so far is stored); with Clare Hao and Cherie Jia, who worked in the summer of 2016 as SURF students, developing some preliminary Python tools to analyze the Twitter data; and with Nick Adams-Cohen, a PhD student in the social sciences at Caltech, who is studying the use of Twitter data for public opinion and political behavior research.

We’d like to thank Pablo Barbera for sharing his ideological placement data with us.

Other colleagues and students who have helped with this research project include Thad Hall, Jonathan Nagler, Lucas Nunez, Betsy Sinclair, and Charles Stewart III.