Real Judges More Easily Swayed by “Legally Irrelevant” Factors Than Artificial Intelligence
- blamlaw
- Apr 8
Eric Posner and Shivam Saran's research paper, "Judge AI: Assessing Large Language Models in Judicial Decision-Making," evaluates whether large language models (LLMs) like GPT-4o can replace human judges. Their study replicates a prior experiment involving 31 U.S. federal judges by asking GPT-4o to decide the same international war crimes case.
The study varied two factors. First, some versions of the case included sympathetic background information about the defendant that had no legal relevance, while other versions made the defendant seem unsympathetic. Second, in some versions the lower court's ruling followed legal precedent, while in others it went against precedent.
One major finding is that the human judges were significantly influenced by how sympathetically the defendant was portrayed, even though these emotional factors had no legal relevance to the case. GPT-4o, by contrast, followed legal precedent more strictly than the judges and was not sensitive to how sympathetic the defendant appeared.
While GPT-4o was more accurate in applying the law, its performance raises a philosophical question: does strict adherence to precedent make it a "better" or a "worse" judge?
Despite the potential of artificial intelligence to serve as an impartial adjudicator free of bias, the authors conclude that current AI cannot replace human judges and should not be relied on for analyzing judicial decision-making.