I have been using LangChain’s output parser to structure the output of language models. I found it to be a useful tool, as it let me get output in the exact format I wanted.
In this article, I will share my experience with the output parser, show how I used it to structure the output of a language model, and highlight some of the benefits I found.
Here are some of the benefits of using the output parser:
- It makes the output of language models more structured and easier to consume.
- It lets you get structured data back instead of plain text.
- It can be customized to meet the specific needs of a particular application.
In Practice
Let’s say we want to use an LLM to create a simple TODO web API server in Go.
First, we’ll define the output structure. In this case, it’s a ‘SourceCode’ model with a ‘source_code’ field holding the code and a ‘file_name’ field for the file name.
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field

class SourceCode(BaseModel):
    source_code: str = Field(description="The current source code")
    file_name: str = Field(description="The file name with extension for this code")

parser = PydanticOutputParser(pydantic_object=SourceCode)
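As a quick, illustrative sanity check (not part of the original walkthrough), the parser can turn a hand-written JSON string into a typed SourceCode instance:

# Illustrative only: parse a hand-written JSON string into the model.
example = parser.parse('{"source_code": "package main", "file_name": "main.go"}')
print(example.file_name)  # -> main.go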
Then we prepare the prompt we’ll send to the LLM:
from langchain.prompts import PromptTemplate

prompt = PromptTemplate(
    template="Provide the source code for the following requirement.\n{format_instructions}\n{requirement}\n",
    input_variables=["requirement"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)
And use the prompt template to create a prompt input.
todo_requirement = (
    "Create a TODO web API server in Go with CRUD operation endpoints."
)
_input = prompt.format_prompt(requirement=todo_requirement)
We can also check how our input is formatted, including the injected format instructions, before sending it to the LLM:
print(_input.to_string())
Then we should decide which LLM to use. I tried a few of them and found that ‘text-davinci-003’ produced the most accurate output. Feel free to do your own research and find the one that best suits your needs.
from langchain.llms import OpenAI

model_name = "text-davinci-003"
# model = OpenAI(model_name="text-ada-001", n=2, best_of=2)
temperature = 0.0
model = OpenAI(model_name=model_name, temperature=temperature)
output = model(_input.to_string())
# checking the output
# print(output)
This didn’t work as expected: the output was cut short, producing an invalid JSON string that couldn’t be parsed. After some research, I found that LangChain sets a default limit of 500 total tokens for the OpenAI LLM, and that limit covers both input and output, which isn’t enough for the generated code. To get around this, I used the tiktoken library to count the prompt tokens and maximize the completion budget.
import tiktoken

# Count how many tokens the prompt consumes for this model.
encoding = tiktoken.encoding_for_model(model_name)
prompt_tokens = len(encoding.encode(_input.to_string()))
# ...

# text-davinci-003 has a total limit of 4097 tokens (prompt + completion),
# so give the completion everything the prompt doesn't use.
model = OpenAI(model_name=model_name, temperature=temperature,
               max_tokens=4097 - prompt_tokens)
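If you want to see the budget explicitly, a quick print (an illustrative addition, not part of the original code) makes it visible:

# Illustrative: show how much of the 4097-token window remains for the completion.
print(f"Prompt: {prompt_tokens} tokens; completion budget: {4097 - prompt_tokens} tokens")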
This time, the LLM generated output in the expected format:
{
"source_code": "package main
import (
"fmt"
"net/http"
)
func main() {
http.HandleFunc("/todos", todosHandler)
http.ListenAndServe(":8080", nil)
}
func todosHandler(w http.ResponseWriter, r *http.Request) {
switch r.Method {
case "GET":
// Handle GET request
case "POST":
// Handle POST request
case "PUT":
// Handle PUT request
case "DELETE":
// Handle DELETE request
default:
fmt.Fprintf(w, "Method not supported")
}
}",
"file_name": "todo.go"
}
Great! The output is now consumable by another program. We have achieved our goal in just a few lines of code.
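To actually consume it from Python, we can run the raw output through the parser we built earlier. Here’s a minimal sketch (the file-writing step is my own illustration, not part of the original walkthrough):

# Parse the raw model output into a typed SourceCode instance,
# then write the generated code to disk (illustrative).
result = parser.parse(output)
with open(result.file_name, "w") as f:
    f.write(result.source_code)
print(f"Wrote {result.file_name}")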
In summary
If you are interested in using the output parser, you can find more information on the LangChain website.
Here are some additional tips for using the output parser:
- Make sure that you understand the different types of output that the language model can produce.
- Experiment with different settings, such as temperature, to see how they affect the output (see the sketch after this list).
- Use the output parser to structure the output of different language models to see how it affects the results.
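For example, here is a small, illustrative sketch (not from the original article) that reruns the same prompt at a few temperatures so you can compare the results:

# Illustrative: compare completions across temperature settings.
for temp in (0.0, 0.5, 1.0):
    m = OpenAI(model_name=model_name, temperature=temp,
               max_tokens=4097 - prompt_tokens)
    print(f"--- temperature={temp} ---")
    print(m(_input.to_string())[:300])  # preview the first 300 characters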