New Research Shows That AI Models Taught Legalese Are Surprisingly Efficient
How good of a lawyer should an artificial intelligence system be? More importantly, how good do you want one to be?
When it comes to teaching machines how to understand and utilize language, it turns out that the more legalese they know, the better, according to a new paper co-authored by Chicago-Kent College of Law Professor and Law Lab Director Daniel Martin Katz.
"It is pretty clear that a legally trained AI system is just going to perform better, but the open question is to identify the precise information diet to feed these models," Katz says.
His paper explores how different large language models (LLMs) were used to solve a variety of tasks.
Pioneered by organizations such as Google, OpenAI, and the Allen Institute, LLMs such as BERT, ELMo, and GPT-3, among others, have grown increasingly popular in the field of natural language processing. Many LLMs have been trained on general language, but the question that Katz and his colleagues sought to explore is how to apply these LLMs to legal tasks. They analyzed several different models to evaluate the performance of LLMs on tasks such as reviewing contracts, including determining whether such contracts were unfair under European Union consumer law.
"A lot of effort in computer science goes into making machines understand language broadly," Katz says. "How do you train a machine in the language of law? Well, how do you train a person? You send them [to law school] for three years, and you say a lot of words at them. You use words in a variety of contexts. In a real sense, you are training a student's neural network (their brain)."
That is what the experiments in Katz's paper did: they exposed machines to a large corpus of different words and measured how effective that exposure was at getting the machines to solve tasks.
It turned out that, of the seven different models tested, the model trained on legal language performed tasks better on average, not just legal tasks, but tasks of any type.
"The diet of getting legal information when it's being trained makes it better across all tasks," Katz says.
The paper has been deemed intriguing enough to be accepted for presentation at the Association for Computational Linguistics' 2022 annual meeting in May.
"It's a rare thing to see a law professor get a paper accepted into a computer science conference," Katz notes. "It's the type of place you should take this type of work, a group of people that can actually evaluate its technical merits."
It's a research area on the cutting edge of both computer science and the law, and one that Illinois Institute of Technology and Chicago-Kent are uniquely situated to excel in, Katz notes.
"Even though machines are getting good at understanding basic language, it's a much harder problem to understand specialist languages: medical English or law," Katz says. "We're trying to answer: How do we build the scientific infrastructure to have machines understand legal language?"
"Laws and their interpretations, legal arguments and agreements, are typically expressed in writing, leading to the production of vast corpora of legal text. Their analysis, which is at the center of legal practice, becomes increasingly elaborate as these collections grow in size," the authors note in the paper, adding that "natural language understanding technologies can be a valuable tool to support legal practitioners in these endeavors."
Along with Katz, the paper is co-authored by Ilias Chalkidis of the University of Copenhagen, Denmark; Abhik Jana of the Universität Hamburg, Germany; Dirk Hartung of Bucerius Law School, Hamburg, Germany; Michael Bommarito of CodeX, Stanford Law School; Ion Androutsopoulos of the Athens University of Economics and Business, Greece; and Nikolaos Aletras of the University of Sheffield, United Kingdom.
Photo: Chicago-Kent College of Law Professor and Law Lab Director Daniel Martin Katz