AI beats law professors in blind law school test with 75 percent win rate

The study showed that AI can be competitive as an educational support tool even in fields like law, where judgment and reasoning matter more than a single correct answer. [Photo: Shutterstock]

A study has found that artificial intelligence was rated more highly than human law professors in an experiment comparing the quality of answers to law students’ questions. The results suggest that AI has become a viable educational support tool even in fields like law, where there is often no single correct answer and complex reasoning is required.

On June 4 local time, online media outlet Gigazine reported that Stanford University law school researchers said AI-generated answers received better overall ratings than answers written by human professors in blind evaluations conducted with U.S. law school professors.

The study was jointly conducted by Julian Nyarko, a Stanford Law School professor who leads the Legal Innovation and Frontier Technology Lab, and researchers at Yale University and New York University. The team recruited 16 law professors from U.S. law schools and selected 40 representative questions that students could realistically raise during a contracts course. The professors wrote answers themselves, and the same questions were also given to an AI model to generate responses.

The evaluation used a blind format that did not reveal who wrote each answer. Participating professors rated quality without knowing whether a response was written by a human or generated by AI. To improve fairness, the researchers adjusted the length and format of the AI answers to be similar to the human answers.

The results favored AI. Professors evaluated a total of 2,918 answers and rated AI-written responses statistically significantly higher than those written by human professors. In direct comparisons, AI answers posted a win rate of about 75 percent.
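
For readers unfamiliar with the metric, a win rate in a head-to-head comparison is simply the share of comparisons in which one side received the better rating. The sketch below illustrates the idea only; the article does not describe the study's exact procedure. The function name `pairwise_win_rate`, the choice to exclude ties, and the scores are all assumptions made for illustration and are not data from the study.

```python
from typing import List, Tuple


def pairwise_win_rate(pairs: List[Tuple[float, float]]) -> float:
    """Share of decided comparisons in which the AI answer outscored the human answer.

    Each pair is (ai_score, human_score) for the same question.
    Assumption: ties are dropped, since the article does not say how they were handled.
    """
    ai_wins = sum(1 for ai, human in pairs if ai > human)
    human_wins = sum(1 for ai, human in pairs if ai < human)
    decided = ai_wins + human_wins
    if decided == 0:
        raise ValueError("no decided comparisons")
    return ai_wins / decided


# Hypothetical scores, invented for illustration only; not data from the study.
example_pairs = [(4.0, 3.0), (5.0, 4.0), (3.0, 4.0), (4.0, 2.0), (3.0, 3.0)]
rate = pairwise_win_rate(example_pairs)
print(f"AI win rate: {rate:.0%}")
```

In this invented example the AI answer wins three of the four decided comparisons, which is how a figure such as 75 percent would arise.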

AI answers were also less likely to be judged educationally harmful. About 12 percent of human professors’ answers were classified as responses that could hinder students’ understanding or cause misunderstanding, compared with 3.5 percent of AI answers. The researchers explained that AI generated relatively fewer responses that could negatively affect student learning.

The researchers attached significance to the fact that the results came from the field of law rather than simple memorisation or multiple-choice problem solving. Nyarko said, "Law is a field that requires not simple recall of facts but judgment, nuanced reasoning and the ability to handle uncertainty." He added, "The questions used in the experiment were not composed only of problems with clear correct answers."

The study is also expected to affect an ongoing debate in U.S. legal academia over the use of AI. Many law schools are currently reviewing how to introduce AI tools into their curricula, but many have also raised concerns about hallucinations, excessive reliance and a possible decline in critical thinking.

The researchers stressed that the results do not mean a wholesale replacement by AI. Rather, they said, the findings confirm AI's potential as a support tool that provides students with high-quality personalised learning assistance.

Alejandro Salinas, a researcher at the Legal Innovation and Frontier Technology Lab, said, "An AI-based tutor can become a means of high-quality learning support that is accessible whenever needed." He added, "It could contribute to expanding access to expert knowledge."

Nyarko also said, "This study shows there is a need to re-examine both unconditional optimism and scepticism about AI." He added, "Future discussions should focus not on whether AI can produce good answers, but on how to use AI responsibly to improve students’ learning outcomes."

Jinju Hong (홍진주) hongjj@d-today.co.kr
