Could Machines Do Criminal Justice Better?

-by Hon. Katherine B. Forrest (fmr.), Partner at Cravath, Swaine & Moore

What if machines, really smart machines, could do criminal justice better? That is, apply algorithms that really eliminate unwanted sentencing disparities and ensure that defendants with similar profiles are truly treated similarly; that ignore race when deciding guilt or innocence; that accurately assess a likelihood of recidivism; that accurately assess a propensity for violence; that accurately assess whether incarceration does more harm than good; that make decisions based on scientific evaluations of evidence – eliminating gamesmanship and obfuscation. Machines do not have a race, age, gender, economic position – they do not have innate bias (unless programmed in). Can they achieve what sensitivity training has not been able to: eliminate implicit biases that some argue permeate our criminal justice system?

Smart machines are doing some of this already – and can certainly do more in the near future, if we want them to. It is time we ask ourselves whether we want them to and if they are any good at it. As artificially intelligent machines are used with more frequency in courthouses across the country, we should be asking ourselves whether they are part of a solution to criminal justice reform, or a way of entrenching existing problems. What if smart machines used in the criminal justice area end up making things worse?

Artificial intelligence is a non-biological intelligence, resident in software, designed to learn, make rational decisions, and solve problems. Unlike conventional software that simply executes programmed instructions, AI uses its initial instructions as a starting point and exceeds them, learning and applying what it learns along the way. As I use the term here, “intelligence” means being able to acquire knowledge and apply it to solve a problem.

Narrow AI—AI deployed to fulfill a single purpose—is all around us today. Numerous applications on our cell phones use artificial intelligence to bring us functionality we take for granted: facial recognition software that unlocks the device; apps that finish our sentences with words we might want to use, based upon what we have said before and our relationship to the recipient; apps that pick music for us based on what we like to listen to, our age, what we have liked on Instagram, or what we have purchased; and apps that predict what goods or services we might want or need. Narrow AI machines are sold as robots that vacuum our floors, devices that regulate the temperature of our homes based on usage patterns and weather events, and disks that sit on countertops and can be instructed to turn our lights on or off, make a restaurant reservation or a dental appointment, order a car service, and find our favorite playlist.

In the criminal justice area, AI is used in all phases of a matter: as an investigatory tool to determine from grainy camera footage, using facial recognition software, whether a suspect was present in a particular location; to isolate and identify a voice on recordings; to analyze documents, recordings and other materials; to create a matrix of social connections between people that may assist in identifying witnesses or targets, or to trace a flow of funds in a money laundering investigation; or to analyze behavior and flag instances when someone may be engaged in shoplifting or other unlawful conduct.

AI is also used in the pre-trial phase to “score” an arrestee and predict whether he or she will comply with conditions of pre-trial release; during the trial to identify documents and materials; and during the sentencing phase to predict a likelihood of violence or recidivism. A significant portion of what we call “AI” is based on algorithms that work with large data sets. Algorithms, however, are themselves based on very human choices as to what inputs to include or exclude, and data sets are only as good as the underlying data. An example of how this can be problematic is readily apparent in AI used to predict whether a defendant will recidivate. The algorithm may have inputs such as age, gender, race, education, history of violence, drug use, socio-economic status, number of prior arrests, and gang affiliation. It can also include parental characteristics, such as a parent’s history of arrests and incarceration.

Who is making the choices about what inputs to select? The answer is: whomever the company that designs and licenses the software chooses. There are no standards or rules around who makes the input choices. Prior to the AI software being utilized in a court system, there is no application of national or regional standards that require the chooser of the inputs to have a degree in psychology, criminology, sociology, or anything else. Each licensor can choose to vet the software however it deems appropriate. This means that we do not have easy insight into the choice of inputs, or into whether those inputs, versus any others that might be utilized, have any scientifically demonstrated correlation with (let alone causal impact on) recidivism. It could be, in other words, that the inputs are plucked out of the air by a youthful computer programmer with a B.A. in computer science.

But at least as important as the selection of inputs is their weighting: how much weight is given to age? To arrest history? To race? To education? To socio-demographic factors? Why is one weight selected versus another? Is race weighted as 10% relevant? 1%? 55%? 99%? Who chose that weighting, and why? Are there variations in weightings by gender, and should there be? Again, no national or regional standard is required. Once the algorithm is designed, it has to run against a data set. There is no uber-data set that is conveniently designed to have just the right data in it – a data set that we would all be comfortable relying on. Algorithms do not yet invent the data that they evaluate and weigh in connection with their output. Therefore, the choice of data against which an algorithm is run is critically important. Today, data sets are historical. Many would argue that they will always be historical – while some caution that a time may come when data sets are themselves synthetically created.
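To make the point concrete, the input-and-weight structure described above can be sketched in a few lines of code. This is a deliberately simplified hypothetical, not any vendor’s actual algorithm; the feature names, their values, and the weights are all invented for illustration. What it shows is that the score a court sees is entirely determined by two unregulated design choices: which inputs are included, and how heavily each is weighted.

```python
# A hypothetical risk-score sketch -- not any vendor's actual algorithm.
# All feature names, values, and weights are invented for illustration.

def risk_score(features, weights):
    """Weighted sum of selected inputs, scaled to a 0-10 'risk' scale.

    Assumes each feature value has been scaled to the range 0-1."""
    total = sum(weights[name] * value for name, value in features.items())
    max_total = sum(weights.values())  # score if every feature were 1.0
    return round(10 * total / max_total, 1)

# The designer's (unregulated) choices: which inputs, and how heavy.
weights = {"prior_arrests": 0.40, "age_under_25": 0.25,
           "education_gap": 0.20, "neighborhood_poverty": 0.15}

# One hypothetical defendant's scaled feature values.
defendant = {"prior_arrests": 0.6, "age_under_25": 1.0,
             "education_gap": 0.5, "neighborhood_poverty": 0.8}

print(risk_score(defendant, weights))  # → 7.1
```

Change the weight on any one input – or swap an input in or out – and the same defendant receives a different score, which is precisely why the questions above about who chooses the weights, and on what basis, matter.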

The limitations of historical data sets used in connection with AI in the criminal justice area are real. An easy example is the use of historical arrest records as a primary data set against which to run algorithms seeking to predict recidivism. Many interested in criminal justice reform point to a pattern of arrests in communities that reflects embedded racial biases. If one accepts some element of racial bias in historical arrest patterns, then use of an historical data set reflective of disproportionate arrests may simply perpetuate those biases. Moreover, other factors may double down on that bias – and further entrench it. For instance, to the extent race in a particular community is already correlated with education and economic factors (that is, an over-arrest of black or Hispanic men from a poor community with lower-than-average high school graduation rates), then using race, education, and economic factors as separate inputs may result in modelling error: the same underlying disparity is effectively counted more than once.
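The double-counting concern can be illustrated with a toy calculation. The numbers below are synthetic, invented solely for this sketch: two “separate” inputs are both derived from one shared underlying disparity, and a standard correlation measure confirms they carry essentially the same signal – so a model that includes both effectively counts that disparity twice.

```python
# A toy illustration (synthetic numbers, not real data) of how correlated
# inputs can double-count one underlying disparity. Here two nominally
# distinct features are both driven by the same hypothetical factor.

shared_disparity = [0.9, 0.8, 0.85, 0.2, 0.1, 0.15]  # invented values
education_gap = [0.9 * d + 0.05 for d in shared_disparity]
neighborhood_poverty = [0.8 * d + 0.10 for d in shared_disparity]

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Near-perfect correlation: two "separate" inputs, one underlying signal.
print(round(pearson(education_gap, neighborhood_poverty), 3))  # → 1.0
```

An algorithm given both features as independent inputs does not know they measure the same thing; the combined weight it places on the shared factor is larger than the designer may have intended.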

The term “data sets” also obscures the scope of the data used. A “data set” might be a single community, it might be only one region of the country, it might be derived from a specific area that had a specific and time-limited issue that was later found to be aberrational. In short, data sets only tell us what happened somewhere at a particular moment in time. There is a legitimate question as to whether a particular data set can tell us an answer to a question relevant to a person from an entirely different area of the country at a different moment in time.

In State v. Loomis, the trial court utilized an AI software package to predict the defendant’s likelihood of recidivism. The data set that the algorithm was run against is not discussed in the case. The algorithm was being used as a risk assessment tool. The defendant sought insight into the underlying software – into the inputs and weightings. The court’s denial of the application was challenged as a violation of due process. The highest court in Wisconsin affirmed the ruling below, relying on the fact that the court made the ultimate decision on sentencing and had allowed the defendant access to a pamphlet that described the inputs and gave non-specific descriptions of testing. Many courts use similar software for similar purposes today. Defendants will mount increasingly sophisticated challenges to the algorithms – probing who has chosen some inputs and not others, what the basis for those choices was, how the weightings are picked, and whether regional, gender, age, or other factors should affect how weightings are adjusted for a particular defendant. There are serious issues lurking in the answers to these questions.

The utilization of AI to assist courts in determining a defendant’s risk raises legitimate questions about who designed the algorithm and who determined the data set it was using to learn. Certain uses of AI in the criminal justice area also require us to consider how we can or should program software to adjust for changing notions of what constitutes a crime. Back to our example of historical data sets: if the predictive software is used in Colorado or Massachusetts, it might be important to consider whether a previous arrest for marijuana possession should be considered at all, or should be considered as having the same import today as it had five years ago. Some judges might view an arrest for conduct no longer constituting a crime as nonetheless a violation of the social compact in existence at the time of the act – in other words, that a person’s anti-social behavior is judged according to laws in effect at the time of arrest and not later de-criminalization. However, whether we view this as a question that has a normative answer, or as one that may be answered by the company that designs the software, is something we should be addressing.
At the start – and before we reach the point when we no longer control how AI is designed or how it learns – AI is only as good as its human progenitors allow it to be. By the same token, it is only as limited as its progenitors make it.