Trial Magazine
Theme Article
Large Language Model Fundamentals
Attorneys are beginning to use these AI applications for research, drafting, website-based communications, and more. Understand the basics and how to get started.
March 2024
With the explosion in artificial intelligence technologies, one type—large language models (LLMs)—has the potential to be particularly useful in the practice of law. Many of us have heard a lot about these models recently, even if we’re not familiar with how the underlying programming works. ChatGPT, for instance, is built on an LLM.
A large language model is a type of artificial intelligence model designed to understand, generate, and interact with human language. LLMs generate text at approximately five words per second, or 300 words per minute. That is more than 7.5 times faster than a typical human types: A 3,000-word document that would take a person over an hour to type could be generated by an LLM in about 10 minutes.
However, this comparison only goes so far. Writing involves much more than typing speed, and there is much to consider and understand about LLMs when using them in your law practice. They can be used to summarize documents, conduct research, draft interrogatories, serve as chatbots on firm websites, and more. Let’s delve deeper into how LLMs generate text and into the common terminology.
How It Works
Before exploring how you may be able to use an LLM in your practice, it is helpful to understand a bit about how these models work. Just as we sometimes need to understand unfamiliar topics and terminology in medicine and science to build our clients’ claims, we need to know how LLMs use data to begin to understand their potential and limitations.
At its core, an LLM is based on a neural network architecture—a type of computer programming used for machine learning. (The specific type used here is called a “generative pre-trained transformer.”) This type of machine learning is loosely modeled on the human brain: Data is entered into the model, passes through layers of hidden computing units called “neurons” that mimic aspects of human brain processing, and emerges as a set of outputs interpreting the data.1 This architecture is particularly good at handling sequential data, such as sentences.2
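To make the idea concrete, here is a toy sketch of a single “neuron” in Python. The numbers are invented for illustration; a real LLM chains billions of these simple operations together.

```python
import math

def neuron(inputs, weights, bias):
    # Combine each input with its learned weight, then "squash" the sum
    # into a value between 0 and 1 (a sigmoid activation function).
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-total))

# Hypothetical inputs, weights, and bias, chosen only for illustration.
print(neuron(inputs=[0.5, 0.8], weights=[0.9, -0.3], bias=0.1))
```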
Training and weights adjustment. LLMs undergo “training” much like law students study various cases and texts. This stage is foundational, and it allows the LLM to generate human-like text. During training, the LLM processes terabytes of text, adjusting internal parameters known as “weights,” which number in the billions for large models. The weights determine the output of the LLM.
These adjustments enable the LLM to mimic patterns in human language—in other words, this training teaches the model to predict the next word in a sentence, given the words that come before it. Imagine a scale balancing different objects. Each piece of text alters the “balance” of the LLM’s weights, so that it can generate text that is similar to patterns derived from its training data. A parrot provides a better analogy than a law student. The LLM says things that mimic human language with surprising accuracy, but what it says is not always useful or true. (See “hallucinations” later in this article.)
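The following drastically simplified sketch captures the spirit of this training. It merely counts which word tends to follow which in a tiny invented corpus; a real LLM instead nudges billions of neural weights, but the goal is the same: predict the next word.

```python
from collections import Counter, defaultdict

# A tiny invented "training corpus."
corpus = "the court granted the motion and the court denied the appeal".split()

# Count how often each word follows each other word.
next_word_counts = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_word_counts[current][following] += 1

# After this crude "training," the best guess for the word after "the":
print(next_word_counts["the"].most_common(1))  # [('court', 2)]
```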
Probabilities and predictions. When generating text, the model calculates probabilities for what the next word should be based on the context provided by the previous words. It selects words based on these probabilities, generating coherent and contextually relevant text.
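A minimal sketch of that selection step, with probabilities invented for illustration:

```python
import random

# Assumed probabilities for the next word after "The judge entered the..."
candidates = {"courtroom": 0.55, "order": 0.25, "room": 0.15, "banana": 0.05}

# Sample in proportion to the probabilities: "courtroom" wins most often,
# but not always, which is one reason LLM outputs vary from run to run.
word = random.choices(list(candidates), weights=list(candidates.values()), k=1)[0]
print(word)
```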
Contextual understanding. Unlike simpler models, LLMs can understand and generate complex sentences while maintaining coherence over longer passages. The LLM represents each word as a list of numbers called a “vector,” so relationships between words become mathematical relationships between vectors. Consider this fascinating (and famous) example: If you take the vector for the word “King,” subtract the vector for the word “Man,” and add the vector for the word “Woman,” you get the vector for the word “Queen.”3
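Here is a toy version of that arithmetic. Real word vectors have hundreds of learned dimensions; these three-number vectors are invented so the famous result falls out.

```python
# Invented toy vectors; real embeddings are learned from data.
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "man":   [0.1, 0.8, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "queen": [0.9, 0.1, 0.9],
}

# Compute king - man + woman, dimension by dimension.
target = [k - m + w for k, m, w in
          zip(vectors["king"], vectors["man"], vectors["woman"])]

# Find the stored word whose vector is closest to the result.
def distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

print(min(vectors, key=lambda word: distance(vectors[word], target)))  # queen
```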
Hallucinations. A word of caution: LLMs often produce “hallucinations”—text emitted by an LLM that seems plausible and coherent yet is factually incorrect or nonsensical. LLMs rely on statistical patterns from their training data, not verified facts. For example, an LLM might produce citations with proper formatting and syntax for a case that does not exist. The LLM, much like a parrot, can repeat phrases it has “heard” in its training data without any grasp of their accuracy or relevance.
This limitation is crucial to understand, as it underscores the need for careful evaluation of LLM-generated content, especially when factual correctness is paramount. Despite the real risk of hallucinations, LLMs nonetheless are proving to be useful, and they are constantly evolving to become more reliable.
How to Optimize the Model
Once the model is set up and starts processing data and creating text outputs, it still needs to be refined—similar to how you don’t file the first draft of a brief; you edit, reword, add better cases, and keep honing your arguments. The initial training process creates the “base model,” which writes more like the internet’s stream of consciousness than like a useful tool. There are several ways that an LLM can be refined to be more useful, reliable, and controllable.
Fine-tuning. Since an LLM is trained on a vast, generalized dataset drawn from publicly available and private data, it needs to be “fine-tuned” to optimize it for a particular use. This is similar to how you might gather a set of cases in your research for a brief and then go through them more carefully to see which ones are most on point for your argument. Fine-tuning happens after the first training phase: The LLM trains on a specific and focused dataset instead of broad swaths of text.
One common approach is to continue the training process of the LLM on a more specialized dataset—think case law, pleadings, and other legal documents. For example, companies such as Westlaw and LexisNexis use their significant datasets to tune their models, while large firms may use internal documents for their datasets.
One important technique is called “instruction fine-tuning.” This technique hones the LLM’s ability to complete certain tasks and to follow the user’s instructions. A fine-tuning dataset typically consists of successful answers to various instructions. An example instruction might be: “Write a topic sentence for a given paragraph.” The answer should demonstrate a good topic sentence for that paragraph. Human experts curate datasets for fine-tuning, ensuring that the examples are correct and at a certain quality level. The aim is to refine the LLM’s responses, making them more relevant and accurate for an intended application, such as a helpful chatbot. The fine-tuning technique addresses the problem of an LLM’s reliability by further training the LLM on quality examples vetted by humans and focused on specific tasks.
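What might such a dataset look like? A hedged sketch, with invented field names and examples (real formats vary by vendor):

```python
import json

# Hypothetical instruction fine-tuning examples, written and vetted by
# human experts. The field names here are assumptions for illustration.
examples = [
    {
        "instruction": "Write a topic sentence for the given paragraph.",
        "input": "The defendant ran the red light. Two witnesses saw it. "
                 "The traffic camera recorded it.",
        "output": "Multiple independent sources confirm the defendant ran the red light.",
    },
    {
        "instruction": "Rewrite this sentence without legalese.",
        "input": "The party of the first part shall indemnify and hold harmless...",
        "output": "You agree to cover any losses the other side suffers because of you.",
    },
]

# Training data is often stored one example per line ("JSONL").
for example in examples:
    print(json.dumps(example))
```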
Prompt engineering. Another method to further improve the usefulness and reliability of the LLM’s outputs, which does not require altering the model itself, is prompt engineering. The text given to an LLM to elicit an output is called a “prompt.” Different prompting techniques influence the usefulness and reliability of the LLM. Unlike training and fine-tuning, prompt engineering does not alter the weights of the model’s parameters. Think of it like choosing your words carefully to make yourself understood.
Essentially, you prompt the model with detailed instructions. For example, a lawyer might prompt an LLM with this: “You are an experienced lawyer. Draft a simple settlement agreement that avoids legalese and contains the following terms: a confidentiality provision and a non-disparagement provision.” The inclusion of “experienced lawyer” signals to the LLM what you expect from its outputs. All of the details included in the prompt will influence the quality of the generated text. Asking the model to role-play as an experienced attorney will likely get better results than asking it to role-play as an inexperienced one. In general, the prompt sets the tone of the LLM’s response. The more context, the better. Prompting techniques allow the user to provide specific and relevant context that may not have been in the training data.
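In practice, the role and the instructions are often sent as separate messages. A sketch, assuming the OpenAI Python library and an OPENAI_API_KEY environment variable (model names change, so check current documentation):

```python
from openai import OpenAI

client = OpenAI()  # reads the API key from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # an assumed model name
    messages=[
        # The "system" message sets the role and tone for every response.
        {"role": "system",
         "content": "You are an experienced lawyer. Avoid legalese."},
        {"role": "user",
         "content": "Draft a simple settlement agreement containing "
                    "confidentiality and non-disparagement provisions."},
    ],
)
print(response.choices[0].message.content)
```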
The base case is called “zero-shot prompting.” The user provides a single prompt with no examples. The LLM relies solely on its training to generate the output. “Few-shot prompting” provides the LLM with some examples to guide its response. For example, a prompt might include some examples of haikus before asking the LLM to generate its own haiku. If you want the model to generate something with very specific requirements, then giving it examples of what the form of the response should look like will increase the likelihood that its output follows that form.
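A few-shot prompt for the haiku example might look like this (the haikus are invented):

```python
# Two examples establish the exact form before the real request.
few_shot_prompt = """Write a haiku about the given topic.

Topic: autumn
Haiku:
Red leaves drift and fall
The courthouse steps grow quiet
Summer's docket closed

Topic: discovery
Haiku:
Boxes of paper
Each page a possible key
Truth hides in the stack

Topic: settlement
Haiku:"""

# Sent to an LLM, this prompt makes a three-line haiku far more likely
# than the zero-shot request "write a haiku about settlement."
print(few_shot_prompt)
```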
Chatbots are one example of how LLM applications rely on prompting techniques. In chatbot-style applications, the LLM engages in a dialogue with the user and generates appropriate responses based on the context of the conversation. A “system prompt” inserts specific instructions into each interaction between the user and the LLM. “Chat-formatted prompting” feeds the history of the conversation back into the LLM for each subsequent prompt. The LLM appears to “remember” the conversation because each prompt includes the entire conversation.
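A sketch of that bookkeeping, with a stand-in for the actual model call:

```python
# The full history is re-sent on every turn; the model itself is stateless.
messages = [{"role": "system", "content": "You are a helpful legal assistant."}]

def ask(user_text, generate):
    # 'generate' stands in for any LLM call that accepts a message list.
    messages.append({"role": "user", "content": user_text})
    reply = generate(messages)
    messages.append({"role": "assistant", "content": reply})
    return reply

# A fake model that just reports how much context it received.
fake_llm = lambda msgs: f"(a reply informed by {len(msgs)} messages of context)"
print(ask("What is a tort?", fake_llm))      # sees 2 messages
print(ask("Give me an example.", fake_llm))  # sees 4 messages
```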
Despite these techniques to improve the reliability of LLMs, the fundamental limitations still apply. The LLM is still mimicking patterns. Each technique provides the LLM with a more specific and relevant context of which patterns to mimic.
Retrieval augmented generation. Retrieval augmented generation (RAG) goes a step further than simple prompting techniques. RAG integrates external information sources into the text-generation process. These LLMs can then use information beyond their training, even if the user does not supply it in the prompt. For example, the application would scan a database for relevant sources before answering a user’s prompt. If the application finds relevant information, then this information is incorporated into the prompt. Think of it like how a human might read a passage from a book to inform their answer to a question.
The previously described prompting techniques all rely on the user to supply the relevant context. RAG instead enables the LLM application to find the information by itself. Typically, RAG requires the database to be configured in advance, although some applications allow the LLM to browse the internet. Prompting techniques attempt to mitigate the risk of hallucinations by providing context relevant to a successful LLM response. RAG complements these prompting techniques by automatically providing relevant context from external sources.
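A toy sketch of the retrieve-then-prompt flow (real systems use vector databases and semantic search; this crude keyword match only shows the shape):

```python
# An invented mini "database" of case documents.
documents = {
    "deposition_smith.txt": "Smith testified that the light was green.",
    "police_report.txt": "The responding officer recorded the light as red.",
}

def retrieve(question):
    # Return every document sharing a word with the question (crude on purpose).
    words = set(question.lower().replace("?", "").split())
    return [text for text in documents.values()
            if words & set(text.lower().rstrip(".").split())]

question = "What color was the light?"
context = "\n".join(retrieve(question))

# The retrieved passages are stitched into the prompt before the LLM runs.
augmented_prompt = f"Using only these sources:\n{context}\n\nAnswer: {question}"
print(augmented_prompt)
```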
Reinforcement learning from human feedback. This is a more advanced method by which the model is fine-tuned based on feedback from human evaluators. The evaluators rate the model’s responses, and these ratings are used to adjust the model’s parameters. As a comparison, this is a bit like a senior attorney asking a junior attorney to write two versions of a document and then selecting the version they prefer. Over time, the junior attorney learns to write in the preferred style. The outputs that are given higher ratings signal to the model that those are better answers, and the model’s parameters adjust so that it is more likely to generate responses like the ones marked as preferred.
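A sketch of the kind of comparison data involved (everything here is invented, and the actual training math is more involved):

```python
# One human preference judgment. Thousands of these train a "reward model"
# that scores outputs; the LLM is then tuned toward higher-scoring answers.
comparison = {
    "prompt": "Explain 'res ipsa loquitur' to a client.",
    "response_a": "Res ipsa loquitur is a Latin evidentiary doctrine whereby...",
    "response_b": "Sometimes an accident speaks for itself, like a surgical "
                  "tool left inside a patient. The law calls that res ipsa loquitur.",
    "preferred": "response_b",  # the evaluator picked the clearer answer
}
print("Human preferred:", comparison["preferred"])
```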
Your Homework
The easiest way to get started is to try some of the LLMs out there. Before you use any program, however, always read the terms of service to understand how the LLM will use and store any data you enter—you do not want to risk any confidential client or case information. And review your jurisdiction’s ethics rules.4
Head to chat.openai.com, for instance. Create an account and just start typing. Ask it to draft a confidentiality policy for your firm. Ask for a laptop policy. Ask for 10 interrogatories in a hypothetical motor vehicle claim. None of those prompts implicate confidential client information.
If you’re intrigued, next head over to the website Claude.AI. Create another account. Start with a publicly available document from PACER (perhaps an expert report from an expert you’re dealing with, but, crucially, not an expert report from a case where you represent a party). Ask the LLM to read the document and highlight inconsistent facts. Then check it against the actual report.
Today, LLMs are somewhere between a parrot and a superhuman typist. The technology is still evolving, but it already promises gains in efficiency. Test it out. Make mistakes. Learn by doing.
LLMs are not infallible. In fact, they are quite the opposite. A basic understanding of the technology can help you anticipate their limitations and avoid pitfalls—but also appreciate their usefulness. Learn how the technology works, and try it out for yourself.
Alex Freeburg is the founder of and Erik Dahl is the digital operations director at Freeburg Law in Jackson, Wyo. They can be reached at alex@tetonattorney.com and erik@tetonattorney.com. The views expressed in this article are the authors’ and do not constitute an endorsement of any product or service by AAJ or Trial.
Notes
1. Pragati Baheti, The Essential Guide to Neural Network Architecture, V7, July 8, 2021, http://tinyurl.com/4mnr4b5p.
2. Amazon Web Servs., What Are Large Language Models?, https://aws.amazon.com/what-is/large-language-model/; Amazon Web Servs., What Is a Neural Network?, https://aws.amazon.com/what-is/neural-network/; Nvidia, Large Language Models Explained, https://www.nvidia.com/en-us/glossary/data-science/large-language-models/; Nvidia, What Is a Transformer Model?, https://blogs.nvidia.com/blog/what-is-a-transformer-model/; Rishi Bommasani et al., On the Opportunities and Risks of Foundation Models, Ctr. for Rsch. on Found. Models, Stanford Inst. for Human-Centered Artificial Intelligence, https://arxiv.org/pdf/2108.07258.pdf.
3. Emerging Technology from the arXiv, King – Man + Woman = Queen: The Marvelous Mathematics of Computational Linguistics, MIT Tech. Rev., Sept. 17, 2015, https://www.tinyurl.com/ytps2efh.
4. While still very much an evolving topic, some states are starting to delve into how AI may implicate legal ethics. Be sure to stay up to date with what’s happening in your jurisdiction.