Recent research suggests that large language models (LLMs) acting as facilitators in text-based dispute resolution can be trained to identify human emotions accurately and to intervene to change the trajectory of a dispute when emotions might otherwise lead to an impasse.
The authors[1] of the August 2025 paper “Emotionally-Aware Agents for Dispute Resolution” recruited students to act as disputants regarding the sale of a basketball jersey. The ultimate dataset included 2,025 disputes, with an average of 10.7 messages per dispute.
To allow for comparison with prior research, the researchers initially categorized emotions as others had done to assess LLM capacity to identify emotions in negotiations.[2] The emotions tracked were joy, sadness, fear, love, anger and surprise. The study also used a self-reported frustration scale as a “ground truth” benchmark against which the LLMs’ emotion identifications could be compared. The dispute participants assessed their own level of frustration during the exchange, as well as their perception of the other party’s level of frustration.
To set a baseline against prior emotion models, the researchers first ran the disputants’ text exchanges through T5-Twitter, a fine-tuned model adapted for recognizing emotions.[3] They found that T5-Twitter (T5) failed to recognize anger in conversations that participants had reported as frustrating. The researchers hypothesized that this was because T5 was classifying each dialogue turn in isolation rather than within the context of the entire interaction. Although they had adopted the emotion categories used in negotiation research, the researchers also noted that those categories were more relevant to negotiation than to dispute resolution, a limitation that concerned them.
Testing Other LLMs
The next phase of the study tested the researchers’ hypothesis that general LLMs could identify emotions better than T5 had. The researchers prompted a variety of LLMs to analyze the same dialogues, using different prompting strategies and a slightly modified set of emotion categories. They changed the emotion “love” to “compassion” and added a “neutral” category so the LLMs were not forced to choose an emotion when none was apparent. They also prompted the LLMs to consider each dialogue turn within the context of previous turns. Finally, they helped the LLMs learn in context by including in the prompt several sample dialogue turns with hand-annotated emotions.
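The prompting strategy described above can be sketched as follows. This is a minimal illustration, not the authors’ actual prompt: the instruction wording, the exemplar turns, and their labels are all hypothetical, and only the assembled prompt text is shown (no model call).

```python
# Sketch of the paper's prompting strategy: classify the emotion of the
# latest turn given prior turns as context, with a few hand-annotated
# exemplars for in-context learning. All example text is invented.

EMOTIONS = ["joy", "sadness", "fear", "compassion", "anger", "surprise", "neutral"]

# Hypothetical hand-annotated exemplars for in-context learning.
FEW_SHOT = [
    ('Buyer: You shipped me a fake jersey. This is unacceptable.', "anger"),
    ("Seller: I'm so sorry this happened; let's make it right.", "compassion"),
    ('Buyer: The package arrived on Tuesday.', "neutral"),
]

def build_prompt(context_turns, current_turn):
    """Assemble a classification prompt: instructions, exemplars,
    dialogue context, and the turn to label."""
    lines = [
        "Classify the emotion of the FINAL turn, considering the prior turns.",
        f"Answer with exactly one of: {', '.join(EMOTIONS)}.",
        "",
        "Examples:",
    ]
    for text, label in FEW_SHOT:
        lines.append(f'Turn: "{text}" -> {label}')
    lines.append("")
    lines.append("Dialogue so far:")
    lines.extend(context_turns)
    lines.append(f"FINAL turn to classify: {current_turn}")
    return "\n".join(lines)

prompt = build_prompt(
    ["Buyer: This jersey is a knockoff!", "Seller: It is 100% authentic."],
    "Buyer: I want a full refund right now.",
)
print(prompt)
```

Note the two design points the researchers emphasized: the “neutral” option (so the model is not forced to pick an emotion) and the inclusion of prior turns as context rather than classifying each utterance in isolation.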
Again comparing self-reported frustration with each LLM’s classification of emotions, the researchers found that GPT-4o outperformed T5 (as well as other LLMs). T5 skewed toward annotating utterances as joy or anger, while GPT-4o was more diverse in its assessments and used “neutral” as a dampener by not assigning emotions to unemotional statements. GPT-4o also recognized compassion where T5 did not recognize love.
The researchers then used multiple linear regression[4] to predict participants’ subjective feelings about the result of the dispute resolution effort (as measured by the Subjective Value Inventory) based upon the emotions that T5 and GPT-4o assigned to each dialogue turn. They found that GPT-4o provided the biggest improvement in predicting participants’ feelings about the result, even when accounting for changes in prompts to T5. They also found that buyers were more straightforward to predict than sellers.
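In the spirit of footnote [4], the regression step can be illustrated with a toy example: predict a subjective-outcome score from per-emotion turn counts using ordinary least squares. The data below are synthetic, not the study’s, and the coefficients are chosen purely for illustration.

```python
import numpy as np

# Illustrative only: regress a subjective-outcome score on per-emotion
# turn counts, loosely mirroring the paper's analysis. Synthetic data.

rng = np.random.default_rng(0)
emotions = ["joy", "sadness", "fear", "compassion", "anger", "surprise", "neutral"]

# X[i, j] = number of turns in dispute i labeled with emotion j.
X = rng.integers(0, 5, size=(200, len(emotions))).astype(float)

# Synthetic "truth" for the demo: compassion helps, anger hurts, plus noise.
true_beta = np.array([0.2, -0.1, -0.1, 0.8, -0.9, 0.0, 0.1])
y = 4.0 + X @ true_beta + rng.normal(0, 0.3, size=200)

# Ordinary least squares with an intercept column.
X1 = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)

coef = dict(zip(emotions, beta[1:]))
print({k: round(v, 2) for k, v in coef.items()})
```

With enough disputes, the fitted coefficients recover the direction of each emotion’s effect, which is what lets this kind of model predict outcomes from emotion labels alone.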
Preventing Impasse
The researchers then examined whether GPT-4o could determine when to intervene and de-escalate anger before it hardened into an impasse. This would require GPT-4o to identify a pattern of escalation. GPT-4o’s automatic identification of emotion showed that when sellers responded to buyers’ anger with anger in these dialogues, the anger spiraled and impasse resulted. The researchers found something similar with compassion: when sellers began with compassion, buyers responded with compassion, and the dialogue more often resulted in agreement.
In sum, the researchers demonstrated that properly prompted LLMs with in-context learning can accurately assign emotions to text. Additionally, they found that emotional expressions alone, without the actual content of a conversation, were enough to predict participants’ subjective feelings about the outcome of a dispute.
Researchers can use GPT-4o’s emotion assignments to reveal how emotions shape disputes over time: anger spirals, but so does compassion when it comes early in the dispute. This indicates that LLMs can be trained to know when to intervene in order to change the trajectory of a dispute. Future work will look at how they can do this.
[1] The authors are Sushrita Rakshit, James Hale, Kushal Chawla, Jeanne M. Brett and Jonathan Gratch.
[2] They characterize negotiations as a coming together to create a new relationship (e.g., car salesman and customer), while disputes involve an existing relationship that has gone badly.
[3] I’m extrapolating here, based on the context and what I could find about fine-tuned LLMs.
[4] Multiple linear regression uses several independent variables to predict a specific outcome.
