
Just Court ADR

The blog of Resolution Systems Institute

Archive for the ‘Artificial Intelligence (AI)’ Category

How Should AI Be Used in Criminal Justice?

Stephen Sullivan, February 25th, 2026

A pair of policy briefs by the Center for Justice Innovation offers recommendations for when AI could be used, and when it should not be used, in the criminal legal space. The briefs call for leaders in the field to deploy AI responsibly: to foreground values and commit to mitigating harm. Their publication preceded the Center’s announcement of the AI and Justice Consortium, which the Center describes as convening justice practitioners, researchers, tech companies and communities to discuss and develop AI infrastructure in criminal justice. The briefs’ focus on prioritizing values and assessing risk is germane to court ADR.

Leading with Values and Lessons Learned


The first brief urges criminal justice leaders to learn from previous efforts to deploy algorithm-based technologies. The authors offer three main recommendations: 1) Prioritize values over technology; 2) Stay active, curious and informed about technology; and 3) Resist the siren song of efficiency long enough to weigh unintended consequences.

Risk assessment systems and electronic monitoring are two examples to learn from. The authors note that early legal adopters positioned these technologies as “evidence-based” and “neutral” tools that could effectively predict recidivism. However, the field did not sufficiently establish a values framework to evaluate and constrain the use of these technologies prior to deployment, ignoring those who warned that risk scores would encode inherent biases. The warnings were prescient. These technologies perpetuated race- and class-based inequities and harms present in the criminal justice system; efforts to address these issues came too late and are still catching up.

The authors thus call for a front-end commitment to values such as transparency and the well-being of communities. Through a sustained effort to understand why and how predictive technologies such as AI can and will fail, practitioners can make informed, proactive decisions to mitigate harm.

We need to proceed with caution, the authors contend, to determine what purposes AI tools might serve, for whom and for what intended outcomes. The report outlines potentially positive uses of AI tools, such as surfacing racial biases in charging and sentencing patterns and expediting reviews of applicant case files to potentially get people out of prison sooner. The authors describe these uses as not bearing direct negative impact on people’s liberties.

Establishing Constraints and Safeguards  

The second brief discusses the practical application of the first brief’s values recommendations; it includes recommendations to constrain the use of AI in high-risk scenarios and establish safeguards for its use in low-risk scenarios. This set of recommendations synthesizes the insights of criminal legal and technology leaders who attended a working session at the Center. Most notably, the authors call for a moratorium on AI use in high-stakes contexts to “allow for a thorough assessment of AI’s impact on liberty and safety, and for a proper consideration of whether it should be deployed at all in certain higher-stakes contexts.”

The authors distinguish high-risk scenarios, in which AI should not be used, from low-risk scenarios, in which AI tools could be used with human oversight and decision-making. High-risk scenarios involve the potential for significant harm: spaces where people are intensely vulnerable (such as jails and prisons) and decisions such as detention versus release. In contrast, examples of lower-risk scenarios in which AI tools are potentially better suited include supporting case managers to disseminate community resources and housing services, summarizing case notes, or analyzing case patterns to match services to client needs. They also identify an opportunity for court staff and researchers to use AI to identify disparities in court policies and programs.

The authors call for mandatory comprehensive evaluations before any deployment of AI in justice settings. Conducting trial runs with representative, real-world data sets is the minimum needed to anticipate real-world consequences and identify biases. Lastly, the authors call for a greater push for standards to “put the brakes on hasty experimentation” and safeguard AI implementation in criminal justice. Taken together, the reports urge criminal justice leaders to carefully consider what purposes they use AI for, when to draw firm lines to mitigate harm, and how to be guided by the needs and values of people in the justice system.

The value of these briefs to the ADR community is the emphasis on the risks that seemingly neutral AI uses might entail and the call for real-world evaluation of potential consequences and biases prior to deployment. To ensure that AI is adopted in ADR responsibly, we need to check that guardrails are in place and that the risks to parties are not ignored.

How Well Can an AI Facilitator Recognize Emotions During Dispute Resolution?

Jennifer Shack, December 1st, 2025

Recent research suggests that large language models (LLMs) acting as facilitators in text-based dispute resolution can be trained to accurately identify human emotions and to intervene to change the trajectory of a dispute when the emotions might otherwise lead to an impasse.

The authors[1] of the August 2025 paper “Emotionally-Aware Agents for Dispute Resolution” recruited students to act as disputants regarding the sale of a basketball jersey. The ultimate dataset included 2,025 disputes, with an average 10.7 messages per dispute.

To allow for comparison with prior research, the researchers initially categorized emotions as others had done to assess LLM capacity to identify emotions in negotiations.[2] The emotions tracked were joy, sadness, fear, love, anger and surprise. The study also used a self-reported frustration scale as an indication of “ground truth” to serve as a benchmark for comparison with the LLMs’ identification of emotions. The dispute participants assessed their level of frustration during the dispute exchange, as well as their perception of the other party’s level of frustration.

To set a baseline against prior emotion models, the researchers first ran the disputants’ text exchanges through T5-Twitter, a large fine-tuned model adapted for recognizing emotions.[3] They found that T5-Twitter (T5) failed to recognize anger in conversations that participants had reported as frustrating. The researchers hypothesized that this was because T5 was classifying each dialogue turn in isolation, rather than within the context of the entire interaction. Although they had adopted the emotions used for negotiation research, the researchers also noted that those emotions were more relevant to negotiation than to dispute resolution, which concerned them.

Testing Other LLMs

The next phase of the study was to test the researchers’ hypothesis that general LLMs could better identify emotions than T5 had. The researchers prompted a variety of LLMs to analyze the same dialogues, using various prompting strategies and a slightly modified set of emotions. They changed the emotion “love” to “compassion” and added a “neutral” category so the LLMs were not forced to choose an emotion when none was apparent. They also prompted the LLMs to consider each dialogue turn within the context of previous turns. Finally, they helped the LLMs to learn within context by including in the prompt several sample dialogue turns with hand-annotated emotions.
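As a rough illustration of this setup, the sketch below assembles a classification prompt that combines the modified label set, a few hand-annotated example turns for in-context learning, and the conversation so far as context. The label list, example turns and helper function are invented for illustration; they are not the paper’s actual prompt.

```python
# Hypothetical sketch of a few-shot, context-aware emotion-classification
# prompt. Labels, examples and function names are assumptions for
# illustration only.

EMOTIONS = ["joy", "sadness", "fear", "compassion", "anger", "surprise", "neutral"]

# A few hand-annotated turns supplied for in-context learning.
FEW_SHOT_EXAMPLES = [
    ("I already told you the jersey was authentic!", "anger"),
    ("I understand why you're upset; let's find a fix.", "compassion"),
    ("The package should arrive on Tuesday.", "neutral"),
]

def build_prompt(prior_turns, current_turn):
    """Assemble a prompt: label set, annotated examples, conversation
    context, then the final turn to classify."""
    lines = [
        "Classify the emotion of the final turn. "
        f"Choose one of: {', '.join(EMOTIONS)}.",
        "",
        "Examples:",
    ]
    for text, label in FEW_SHOT_EXAMPLES:
        lines.append(f'Turn: "{text}" -> {label}')
    lines.append("")
    lines.append("Conversation so far:")
    for speaker, text in prior_turns:
        lines.append(f"{speaker}: {text}")
    lines.append(f'Final turn to classify: "{current_turn}"')
    return "\n".join(lines)

prompt = build_prompt(
    [("Buyer", "This jersey arrived with a tear."),
     ("Seller", "It was fine when I shipped it.")],
    "So now it's my fault? Unbelievable.")
```

The key ideas from the study are all visible here: the “neutral” escape hatch, the annotated examples, and classifying the turn within the context of prior turns rather than in isolation.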

Again comparing self-reported frustration with each LLM’s classification of emotions, the researchers found that GPT-4o outperformed T5 (as well as other LLMs). T5 skewed toward annotating utterances as joy or anger, while GPT-4o was more diverse in its assessments and used “neutral” as a dampener by not assigning emotions to unemotional statements. GPT-4o also recognized compassion where T5 did not recognize love.

The researchers then used multiple linear regression[4] to predict participants’ subjective feelings about the result of the dispute resolution effort (as measured by the Subjective Value Inventory) based upon the emotions that T5 and GPT-4o assigned to each dialogue turn. They found that GPT-4o provided the biggest improvement in predicting participants’ feelings about the result, even when accounting for changes in prompts to T5. They also found that buyers were more straightforward to predict than sellers.
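A toy version of this regression step can clarify the idea: per-dialogue counts of emotion labels are the predictors, and a subjective-outcome score is the target. The data and the two-predictor setup below are made up for illustration; the paper used the full Subjective Value Inventory and all emotion categories.

```python
# Toy multiple linear regression: predict a subjective-outcome score from
# per-dialogue counts of "anger" and "compassion" turns. Data are invented.

def fit_linear(X, y):
    """Ordinary least squares via the normal equations (A^T A) b = A^T y,
    solved with Gaussian elimination. Prepends an intercept column."""
    A = [[1.0] + list(row) for row in X]
    n = len(A[0])
    # Build the normal-equation system.
    M = [[sum(a[i] * a[j] for a in A) for j in range(n)] for i in range(n)]
    v = [sum(a[i] * yi for a, yi in zip(A, y)) for i in range(n)]
    # Forward elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        v[col], v[piv] = v[piv], v[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= f * M[col][c]
            v[r] -= f * v[col]
    # Back substitution.
    b = [0.0] * n
    for r in range(n - 1, -1, -1):
        b[r] = (v[r] - sum(M[r][c] * b[c] for c in range(r + 1, n))) / M[r][r]
    return b  # [intercept, coef_anger, coef_compassion]

# Made-up data: (anger turns, compassion turns) -> outcome score.
X = [(4, 0), (3, 1), (1, 3), (0, 4), (2, 2), (5, 0)]
y = [2.0, 3.0, 5.5, 6.5, 4.5, 1.5]
coefs = fit_linear(X, y)
```

With data like these, the fitted coefficient on anger comes out negative and the coefficient on compassion positive, mirroring the direction of the study’s finding that emotional expressions alone carry predictive signal about subjective outcomes.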

Preventing Impasse

The researchers then examined whether GPT-4o could determine when to intervene to de-escalate anger before it leads to impasse. This would require GPT-4o to identify a pattern of escalation. GPT-4o’s automatic identification of emotion showed that when sellers respond to buyers’ anger with anger in these dialogues, the anger spirals, and impasse results. They found something similar with compassion. When sellers began with compassion, buyers responded with compassion, and the dialogue more often resulted in agreement.
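The escalation pattern the finding suggests can be sketched as a simple trigger: scan the per-turn emotion labels (as an LLM facilitator might assign them) and flag the dialogue for intervention once anger is reciprocated, before the spiral deepens. The window size and rule here are invented for illustration, not the study’s method.

```python
# Hypothetical intervention trigger: flag the first point at which the
# last `window` turns are all "anger" (the parties trading anger).

def should_intervene(turn_emotions, window=2):
    """Return the turn index at which a facilitator should intervene,
    or None if no escalation pattern is detected."""
    run = 0
    for i, emo in enumerate(turn_emotions):
        run = run + 1 if emo == "anger" else 0
        if run >= window:
            return i  # intervene here, before the spiral deepens
    return None

dialogue = ["neutral", "anger", "compassion", "anger", "anger", "anger"]
idx = should_intervene(dialogue)  # -> 4 (second consecutive angry turn)
```

A real facilitator agent would presumably combine a trigger like this with the context-aware emotion classification described earlier, which is the direction the authors flag for future work.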

In sum, the researchers demonstrated that properly prompted LLMs with in-context learning can accurately assign emotions to text. Additionally, they found that they could predict subjective dispute outcomes from emotional expressions alone, without knowing the actual content of a conversation.

Researchers can use GPT-4o emotion assignment to reveal how emotions can shape disputes over time: Anger spirals, but so does compassion when it comes early in the dispute. This indicates that LLMs can be trained to know when to intervene in order to change the trajectory of a dispute. Future work will look at how they can do this.


[1] The authors are Sushrita Rakshit, James Hale, Kushal Chawla, Jeanne M. Brett and Jonathan Gratch.

[2] They characterize negotiations as a coming together to create a new relationship (e.g., car salesman and customer), while disputes involve an existing relationship that has gone badly.

[3] I’m extrapolating here, based on the context and what I could find about fine-tuned LLMs.

[4] Multiple linear regression uses several independent variables to predict a specific outcome.
