The purpose of this course project is to predict the votes of Supreme Court Justices by applying statistical learning techniques to transcripts of Supreme Court oral arguments. Each transcript contains the petitioner's oral argument and the respondent's oral argument, and as the petitioner and respondent present their cases, the Justices ask questions. By mining the text of the Justices' comments and questions, we can learn patterns and predict how the Justices will vote.
The vote classifier relies on the assumption that a Justice's questions and comments to an arguing attorney (called an advocate) differ subtly (or overtly) depending on whether the Justice agrees with the advocate's position. Thus, when training the classifier on cases with known outcomes, the oral arguments are divided into arguments on behalf of the petitioner and arguments on behalf of the respondent. Knowing whether a Justice decided in favor of the petitioner or the respondent allows us to analyze the Justices' comments and questions and identify correlations between decisions and transcript text.
Some identified correlations are intuitively appealing: for example, if Justice Ginsburg uses the two-word phrase "I don't" during oral arguments, she is less likely to vote in favor of the arguing party. Less obvious correlations exist as well: for example, if Justice Scalia uses the word "certainly" during oral arguments, Justice Ginsburg is more likely to vote in favor of the arguing party. The classifier identifies tens of thousands of such correlations, each of which contributes a small piece to an overall prediction. By combining these small contributions, the classifier produces a prediction for each Justice's decision in a case.
It's worth noting that the classifier knows nothing about political affiliation or the political impact of a particular case. These predictions are based solely on what the Justices say during oral arguments and how they say it. While we like to think of Supreme Court Justices as masters of impartiality, it is clear that many subtle clues lie in these oral arguments.
Below are predictions for some high profile cases that have been argued in front of the Supreme Court but have yet to receive decisions.
We consider the one thousand most influential words and phrases for predicting each Justice's decision, and we determine the fraction of these most influential tokens that originates from each of the nine Justices. Below are pie charts indicating these fractions, and the values displayed in the pie wedge labels indicate the average weight of these influential tokens. For example, approximately 25% of the most influential tokens for predicting Justice Ginsburg's decision come from Justice Ginsburg herself, and these tokens contribute an average weight of -15.6 to the classification decision. The fact that this weight is negative indicates that these tokens, on average, contribute to a less favorable outcome for the arguing party, and the relatively large weight magnitude of 15.6 indicates that these are fairly predictive tokens.
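The computation behind these pie charts can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes the learned weights are stored in a dictionary keyed by (source Justice, token), and the function name is hypothetical.

```python
from collections import defaultdict

def influence_by_source(weights, top_n=1000):
    """weights: dict mapping (source_justice, token) -> learned weight.

    Returns, for each source Justice, the fraction of the top-N most
    influential tokens that come from that Justice, along with the
    average (signed) weight of those tokens.
    """
    # Influence is measured by weight magnitude, regardless of sign.
    top = sorted(weights.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top_n]
    by_source = defaultdict(list)
    for (source, _token), w in top:
        by_source[source].append(w)
    return {justice: (len(ws) / len(top), sum(ws) / len(ws))
            for justice, ws in by_source.items()}
```

A negative average weight for a Justice's own wedge (as with Justice Ginsburg) means those tokens push the prediction toward an unfavorable outcome for the arguing party.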
It is interesting to observe that Justice Kennedy is the only Justice whose own share of the most influential words and phrases contributes positively to a favorable outcome for the arguing party.
Each transcript is divided into arguments on behalf of the petitioner and arguments on behalf of the respondent. Each Justice's comments and questions are tokenized using a separate token for each Justice. We limit tokens to the 30,000 most frequent unigram tokens and about 50,000 bigram tokens with the highest mutual information, resulting in approximately 80,000 tokens per Justice. The features are considered to be Bernoulli variables, thus we record the presence or lack thereof of each token for each arguing party in each case.
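The feature extraction described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions: tokens are tagged with the speaking Justice so the same word from two Justices yields two distinct features, and presence is recorded as a set (the Bernoulli assumption). The real pipeline additionally caps the vocabulary at the 30,000 most frequent unigrams and roughly 50,000 highest-mutual-information bigrams, which is omitted here; the function name is hypothetical.

```python
def bernoulli_features(utterances):
    """utterances: list of (justice, text) pairs from one side's argument.

    Returns the set of tokens present in the argument. Each token is a
    (justice, term) pair, so identical words spoken by different
    Justices are separate features, as described in the text.
    """
    present = set()
    for justice, text in utterances:
        words = text.lower().split()
        for w in words:                      # unigram presence
            present.add((justice, w))
        for a, b in zip(words, words[1:]):   # bigram presence
            present.add((justice, f"{a} {b}"))
    return present
```

Because the features are Bernoulli variables, only presence or absence matters; repeating a phrase does not change the feature vector.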
We construct a classifier for each of the current nine Justices using feature data from all nine Justices. The Justices' comments and questions to each arguing party in each case are treated as a single document belonging to either the winning category or the losing category. We then use logistic regression to create a classifier for each Justice. For test cases in which we wish to predict a decision, we apply the classifier to both the petitioner's argument and the respondent's argument. Ideally, one argument will produce a positive value and the other a negative value. Regardless, we take the difference between the petitioner's value and the respondent's value to form a decision prediction. If the difference is positive, the Justice is likely to vote in favor of the petitioner, while a negative value indicates a decision favorable to the respondent. The magnitude of the difference indicates the confidence of the prediction.
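The prediction step described above can be sketched with a few lines of Python. This is a hedged illustration, not the project's code: it assumes a trained logistic model is available as a dictionary of token weights, and it scores each side as the weighted sum of its present tokens (the linear part of logistic regression, which is all that matters for comparing the two sides). The function names are hypothetical.

```python
def score(weights, present_tokens, bias=0.0):
    """Linear score for one side's argument: the sum of learned weights
    for the tokens present in that argument (absent tokens contribute 0)."""
    return bias + sum(weights.get(tok, 0.0) for tok in present_tokens)

def predict_vote(weights, petitioner_tokens, respondent_tokens):
    """Take the petitioner-minus-respondent score difference.

    A positive difference predicts a vote for the petitioner, a negative
    one a vote for the respondent; the magnitude gauges confidence.
    """
    diff = score(weights, petitioner_tokens) - score(weights, respondent_tokens)
    return ("petitioner" if diff > 0 else "respondent"), abs(diff)
```

Because only the difference matters, any bias term shared by the two sides cancels out; the prediction depends entirely on which side's transcript tokens carry more favorable weight.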
Below are predictions for all cases that are or were on the 2012 docket with transcripts available at oyez.org. Some cases have had rulings; others have not. Predictions for Justices Kagan and Sotomayor tend to be the least accurate; this is because these Justices have spent relatively little time on the bench, providing fewer data with which to train the classifier. Each case's scores are rescaled for clarity, so scores should not be compared across cases.
In fairness, our classifier would have predicted different outcomes for the healthcare ruling: