Predicting Supreme Court Outcomes Using AI?
Is it possible to predict the outcomes of legal cases – such as Supreme Court decisions – using Artificial Intelligence (AI)? I recently had the opportunity to consider this question in a talk that I gave at Stanford entitled “Machine Learning Within Law.”

At that talk, I discussed a very interesting new paper entitled “Predicting the Behavior of the Supreme Court of the United States” by Prof. Dan Katz (Mich. State Law), Data Scientist Michael Bommarito, and Prof. Josh Blackman (South Texas Law). Katz, Bommarito, and Blackman used machine-learning AI techniques to build a computer model capable of predicting the outcomes of arbitrary Supreme Court cases with an accuracy of about 70% – a strong result. This post will discuss their approach and why it is an improvement over prior research in this area.
Quantitative Legal Prediction
The general idea behind such approaches is to use computer-based analysis of existing data (e.g., data on past Supreme Court cases) to predict the outcome of future legal events (e.g., pending cases). This use of data to inform legal predictions (as opposed to pure lawyerly analysis) has been largely championed by Prof. Katz – something he has dubbed “Quantitative Legal Prediction” in recent work.

Legal prediction is an important function that attorneys perform for clients. Attorneys predict all sorts of things, ranging from the likely outcome of pending cases, the risk of liability, and estimates of damages, to the importance of various laws and facts to legal decision-makers. Attorneys use a mix of legal training, problem-solving, analysis, experience, analogical reasoning, common sense, intuition, and other higher-order cognitive skills to produce sophisticated, informed assessments of likely outcomes.

By contrast, the quantitative approach takes a different tack: analyzing data with advanced algorithms to produce data-driven predictions of legal outcomes (instead of, or in addition to, traditional legal analysis). These data-driven predictions can provide additional information to support attorney analysis.
Predictive Analytics: Finding Useful Patterns in Data
Outside of law, predictive analytics has been widely applied to produce automated predictions in many contexts. Real-world examples include the product recommendations made by Amazon.com, the movie recommendations made by Netflix, and the search terms automatically suggested by Google.
Scanning Data for Patterns that Are Predictive of Future Outcomes
In general, predictive analytics approaches use advanced computer algorithms to scan large amounts of data to detect patterns. These patterns can often be used to make intelligent, useful predictions about never-before-seen future data. Many of these approaches employ “machine learning” techniques to engage in prediction. I have written about some of the ways that machine-learning-based analytical approaches are starting to be used within law and the legal system.

Broadly speaking, machine learning refers to a research area studying computer systems that are able to improve their performance on some task over time with experience. Such algorithms are specifically designed to detect patterns in data that highlight non-obvious relationships or that can be predictive of future outcomes (for example, detecting that Netflix users who like movie X tend also to like movie Y, and concluding that because you like movie X, you are likely to like movie Y). Importantly, these algorithms are designed to “learn” – in the sense that they can change their own behavior to get better at some task, like predicting movie preferences, over time by detecting new, useful patterns within additional data. Thus, the general idea behind predictive legal analytics is to examine data concerning past legal cases and use machine learning algorithms to detect and learn patterns that could be predictive of future case outcomes.

In one such machine learning approach – called supervised learning – we “train” the algorithm by providing it with examples of past data that have been definitively classified. For example, there may be a body of existing data about Supreme Court cases, along with confirmed data indicating whether the outcome was affirm or reverse, together with other potentially predictive data, such as the lower circuit and the subject matter at issue. Such an algorithm examines this training data to detect patterns and statistical correlations between variables and outcomes (e.g., that 9th Circuit cases are more likely to be reversed) and builds a computer model that will be predictive of future outcomes.

It is helpful to briefly review some earlier research using data analytics to predict Supreme Court outcomes in order to understand the contribution of Katz, Bommarito, and Blackman’s paper.
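The supervised-learning idea above can be sketched with a deliberately simple “model” – one that just tallies past outcomes per lower circuit in the training data and predicts the majority outcome for that circuit. All case counts below are invented for illustration; the actual research uses far richer features and more sophisticated algorithms.

```python
from collections import Counter, defaultdict

# Toy "training data": (lower circuit, issue area, outcome).
# These records are invented for illustration only.
train = [
    ("9th Cir.", "civil rights", "reverse"),
    ("9th Cir.", "criminal",     "reverse"),
    ("9th Cir.", "economic",     "affirm"),
    ("5th Cir.", "criminal",     "affirm"),
    ("5th Cir.", "civil rights", "affirm"),
]

# "Training": tally affirm/reverse outcomes per lower circuit.
by_circuit = defaultdict(Counter)
for circuit, issue, outcome in train:
    by_circuit[circuit][outcome] += 1

def predict(circuit):
    """Predict the most common outcome seen for this circuit in training."""
    counts = by_circuit.get(circuit)
    if not counts:
        return "affirm"  # fall back to the historical base rate
    return counts.most_common(1)[0][0]

print(predict("9th Cir."))  # "reverse" – most 9th Cir. cases in the toy data were reversed
```

A real supervised learner differs mainly in scale and sophistication: many more variables, and an algorithm that weighs combinations of them rather than a single tally.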
Prior Work in Analytical Supreme Court Prediction
Pioneering work in the area of quantitative legal prediction began in 2004 with a seminal project by Prof. Ted Ruger (U Penn), Andrew D. Martin (now dean at U Michigan), and other collaborators, employing statistical methods to predict Supreme Court outcomes. That project pitted experts in legal prediction – law professors and attorneys – against a statistical model that had analyzed data about hundreds of past Supreme Court cases.

Somewhat surprisingly, the computer model significantly outperformed the experts in predictive ability. The computer model correctly forecast 75% of Supreme Court outcomes, while the experts had only a 59% success rate in predicting affirm-or-reverse decisions. (The computer and the experts performed roughly the same in predicting the votes of individual justices – as opposed to the ultimate outcome – with the computer getting 66.7% of predictions correct vs. the experts’ 67.9%.)
Improvements by Katz, Bommarito, and Blackman (2014)
The work by Ruger, Martin, et al. – while pioneering – left some room for improvement. One issue was that their predictive model – while highly accurate for the relatively short time frame examined (the October 2002 term) – was thought not to be broadly generalizable to predicting arbitrary Supreme Court cases across any timespan. A primary reason was that the period of Supreme Court cases they examined to build their model – roughly 1994–2000 – involved an unusually stable court. Notably, this period exhibited no change in personnel (i.e., no justices leaving the court and no new justices being appointed).

A model that was “trained” on data from an unusually stable period of the Supreme Court, and tested on a short caseload with relatively little fluctuation, might not perform as accurately when applied to a broader or less homogeneous examination period, and might not handle changes in court composition in a robust manner. Ideally, we would want any such predictive model to be flexible and generalizable enough to handle significant changes in personnel and still produce accurate predictions. Additionally, such a model should be general enough to predict case outcomes with a relatively consistent level of accuracy regardless of the term or period of years examined.
Katz, Bommarito, and Blackman: Machine Learning And Random Forests
While building upon Ruger et al.’s pioneering work, Katz, Bommarito, and Blackman improve upon it by employing a relatively new machine learning approach known as “Random Forests.” Without getting into the details, it is important to note that Random Forest approaches have been shown to be quite robust and generalizable compared to other modeling approaches in contexts such as this. The authors applied this algorithmic approach to data about past Supreme Court cases found in the Supreme Court Database. In addition to the outcome (e.g., affirm or reverse), this database contains hundreds of variables about nearly every Supreme Court decision of the past 60 years.

Recall that machine learning approaches often work by providing an algorithm with existing data (such as data concerning past Supreme Court case outcomes and potentially predictive variables such as the lower circuit) in order to “train” it. The algorithm looks for patterns and builds an internal computer model that can hopefully be used to make predictions about future, never-before-seen data – such as a pending Supreme Court case.

Katz, Bommarito, and Blackman did this and produced a robust new machine-learning-based computer model that correctly forecast roughly 70% of Supreme Court affirm/reverse decisions. This was actually a significant improvement over prior work. Although Ruger et al.’s model had a 75% prediction rate on the period it was tested against, Katz et al.’s model is much more robust and generalizable. The new model is able to withstand changes in Supreme Court composition and still produce accurate results, even when applied across widely variable Supreme Court terms with varying levels of case predictability. In other words, it is unlikely that the Ruger model – focused only on the 2002 term – would produce a 75% rate across a 50-year range of Supreme Court jurisprudence. By contrast, the model produced by Katz et al. consistently delivered a 70% prediction rate across nearly 8,000 cases spanning 50+ years.
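To make the Random Forest idea concrete, here is a toy, from-scratch sketch: an ensemble of one-split “trees,” each trained on a bootstrap resample of the data, whose predictions are combined by majority vote. This is an illustration of the general technique, not the authors’ actual model, and the case records are invented; a real implementation would use a library such as scikit-learn and the full Supreme Court Database.

```python
import random
from collections import Counter

random.seed(0)  # make the sketch reproducible

# Toy case records: feature dict plus an affirm/reverse label (invented data).
cases = [
    ({"circuit": "9th", "issue": "criminal"}, "reverse"),
    ({"circuit": "9th", "issue": "civil"},    "reverse"),
    ({"circuit": "9th", "issue": "economic"}, "affirm"),
    ({"circuit": "5th", "issue": "criminal"}, "affirm"),
    ({"circuit": "5th", "issue": "civil"},    "affirm"),
    ({"circuit": "2d",  "issue": "economic"}, "reverse"),
]

def train_stump(sample):
    """A one-split 'tree': pick a random feature, then learn the majority
    outcome for each value of that feature in the bootstrap sample."""
    feature = random.choice(list(sample[0][0]))
    tallies = {}
    for feats, label in sample:
        tallies.setdefault(feats[feature], Counter())[label] += 1
    rule = {value: c.most_common(1)[0][0] for value, c in tallies.items()}
    return feature, rule

def train_forest(data, n_trees=25):
    """Train each stump on a bootstrap resample (sampling with replacement)."""
    forest = []
    for _ in range(n_trees):
        sample = [random.choice(data) for _ in data]
        forest.append(train_stump(sample))
    return forest

def predict(forest, feats):
    """Majority vote across all stumps in the ensemble."""
    votes = Counter()
    for feature, rule in forest:
        votes[rule.get(feats[feature], "affirm")] += 1
    return votes.most_common(1)[0][0]

forest = train_forest(cases)
print(predict(forest, {"circuit": "9th", "issue": "criminal"}))
```

The robustness the paper relies on comes from exactly this structure: because each tree sees a different resample of the data, no single idiosyncratic period of the Court dominates the ensemble’s behavior.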
Conclusion: Prediction in Law Going Forward
Katz, Bommarito, and Blackman’s paper is an important contribution. In the not-too-distant future, such data-driven approaches to legal prediction are likely to become more common within law. Outside of law, data analytics and machine learning have been transforming industries ranging from medicine to finance, and it is unlikely that law will remain as comparatively untouched by such sweeping changes as it is today. In future posts I will discuss machine learning within law more generally, principles for understanding what such AI techniques can and cannot do within law given the state of current technology, and some implications of these technological changes.
Patent Law’s Definiteness Requirement Has New Bite
The Supreme Court may have shaken up patent law quite a bit with its recent opinion in the Nautilus v. Biosig case (June 2, 2014).
At issue was patent law’s “definiteness” requirement, which is related to patent boundaries. As I (and others) have argued, uncertainty about patent boundaries (due to vague, broad and ambiguous claim language), and lack of notice as to the bounds of patent rights, is a major problem in patent law.
I will briefly explain patent law’s definiteness requirement, and then how the Supreme Court’s new definiteness standard may prove to be a significant change in patent law. In short – many patent claims – particularly those with vague or ambiguous language – may now be vulnerable to invalidity attacks following the Supreme Court’s new standard.
Patent Claims: Words Describing Inventions
In order to understand “definiteness”, it’s important to start with some patent law basics. Patent law gives the patent holder exclusive rights over inventions – the right to prevent others from making, selling, or using a patented invention. How do we know what inventions are covered by a particular patent? They are described in the patent claims.
Notably, patent claims describe the inventions that they cover using (primarily) words.
For instance, in the Supreme Court case at issue, the patent holder – Biosig – patented an invention: a heart-rate monitor. Their patent used the following claim language to delineate their invention:
“I claim a heart rate monitor for use in association with exercise apparatus comprising…
a live electrode
and a first common electrode mounted on said first half
in spaced relationship with each other…”
So basically, the invention claimed was the kind of heart rate monitor that you might find on a treadmill. The portion of the claim above described one part of the overall invention – two electrodes separated by some amount of space. Presumably the exercising person holds on to these electrodes as she exercises, and the device reads the heart rate.
(Note: only a small part of the patent claim is shown – the actual claim is much longer.)
Patent Infringement: Comparing Words to Physical Products
So what is the relationship between the words of a patent claim and patent infringement?
In a typical patent infringement lawsuit, the patent holder alleges that the defendant is making or selling some product or process (here a product) that is covered by the language of a patent claim (the “accused product”). To determine literal patent infringement, we compare the words of the patent claim to the defendant’s product, to see if the defendant’s product corresponds to what is delineated in the plaintiff’s patent claims.
For instance, in this case, Biosig alleged that Nautilus was selling a competing, infringing heart-rate monitor. Literal patent infringement would be determined by comparing the words of Biosig’s patent claim (e.g. “a heart rate monitor with a live electrode…”) to a physical object – the competing heart-rate monitor product that Nautilus was selling (e.g., does Nautilus’ heart rate monitor have a part that can be considered a “live electrode”?).
Literal patent infringement is determined by systematically marching through each element (or described part) in Biosig’s patent claim, and comparing it to Nautilus’s competing product. If Nautilus’ competing product has every one of the “elements” (or parts) listed in Biosig’s patent claim, then Nautilus’s product would literally infringe Biosig’s patent claim.
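The element-by-element “march” described above has a simple logical structure: the accused product infringes literally only if every element of the claim is present. A minimal sketch, with hypothetical element labels chosen for illustration (real claim construction is, of course, a contested legal question, not a string match):

```python
def literally_infringes(claim_elements, product_features):
    """Literal infringement sketch: the accused product must contain
    every element recited in the claim (the 'all elements' rule)."""
    return all(element in product_features for element in claim_elements)

# Hypothetical element labels, loosely based on the Biosig claim.
claim = {
    "heart rate monitor",
    "live electrode",
    "common electrode",
    "electrodes in spaced relationship",
}

# A hypothetical accused product; extra features beyond the claim don't matter.
accused_product = claim | {"lcd display"}

print(literally_infringes(claim, accused_product))  # True: every claim element present
print(literally_infringes(claim, accused_product - {"live electrode"}))  # False: one element missing
```

Note the asymmetry the sketch captures: extra features in the accused product are irrelevant, but a single missing claim element defeats literal infringement.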
If patent infringement is found, a patent holder can receive damages or in some cases, use the power of the court to prevent the competitor from selling the product through an injunction.
Patent Claims – A Delicate Balance with Words
Writing patent claims involves a delicate balance. On the one hand, a patent claim must be written in broad enough language that such a patent claim will cover competitors’ future products.
Why? Well, imagine that Biosig had written their patent claim narrowly. This would mean that, in place of the broad language actually used (e.g. “electrodes in a spaced relationship”), Biosig had instead described the particular characteristics of the heart-rate monitor product that Biosig sold. For instance, if Biosig’s heart-rate monitor product had two electrodes that were located exactly 4 inches apart, Biosig could have written their patent claim with language saying, “We claim a heart rate monitor with two electrodes exactly 4 inches apart,” rather than the general language they actually used – two electrodes separated by a “spaced relationship.”
However, had Biosig written such a narrow patent, it might not be commercially valuable. Competing makers of heart rate monitors such as Nautilus could easily change their products to “invent around” the claim so as not to infringe. A competitor might be able to avoid literally infringing by creating a heart-rate monitor with electrodes that were 8 inches apart. For literal infringement purposes, a device with electrodes 8 inches apart would not literally infringe a patent that claims electrodes “exactly 4 inches apart.”
From a patent holder’s perspective, it is not ideal to write a patent claim too narrowly, because for a patent to be valuable, it has to be broad enough to cover the future products of your competitors in such a way that they can’t easily “invent around” and avoid infringement. A patent claim is only as valuable (trolls aside) as the products or processes that fall under the patent claim words. If you have a patent, but its claims do not cover any actual products or processes in the world because it is written too narrowly, it will not be commercially valuable.
Thus, general or abstract words (like “spaced relationship”) are often beneficial for patent holders, because they are often linguistically flexible enough to cover more variations of competitors’ future products.
Patent Uncertainty – Bad for Competitors (and the Public)
By contrast, general, broad, or abstract claim words are often not good for competitors (or the public generally). Patent claims delineate the boundaries or “metes-and-bounds” of patent legal rights. Other firms would like to know where their competitors’ patent rights begin and end. This is so that they can estimate their risk of patent liability, know when to license, and in some cases, make products that avoid infringing their competitors’ patents.
However, when patent claim words are abstract, highly uncertain, or open to multiple plausible interpretations, firms cannot easily determine where their competitors’ patent rights end and where they have the freedom to operate. This can create a zone of uncertainty around research and development in certain areas of invention, perhaps reducing overall inventive activity to the public’s detriment.