I have been using LangChain’s output parser to structure the output of language models. I found it to be a useful tool, as it let me get output in the exact format I wanted.
In this article, I will share my experience with the output parser, show how I used it to structure the output of a language model, and highlight some of the benefits I found.
Here are some of the benefits of using the output parser:
- It makes the output of language models more structured and easier to consume.
- It lets you get structured data back instead of plain text.
- It can be customized to meet the specific needs of a particular application.
In Practice
Let’s say we want to use an LLM to create a simple TODO web API server in Go.
First, we’ll define the output structure. In this case, it’s a ‘SourceCode’ model with a ‘source_code’ field holding the code and a ‘file_name’ field for the file name.
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field

class SourceCode(BaseModel):
    source_code: str = Field(description="The current source code")
    file_name: str = Field(description="The file name with extension for this code")

parser = PydanticOutputParser(pydantic_object=SourceCode)
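As a quick, illustrative sanity check (not part of the original walkthrough), the parser can turn a hand-written JSON string into a typed SourceCode instance:

# Illustrative only: parse a hand-written JSON string into the model.
example = parser.parse('{"source_code": "package main", "file_name": "main.go"}')
print(example.file_name)  # -> main.go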
Then we prepare the prompt we’ll send to the LLM:
from langchain.prompts import PromptTemplate

prompt = PromptTemplate(
    template="Provide the source code for the following requirement.\n{format_instructions}\n{requirement}\n",
    input_variables=["requirement"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)
And use the prompt template to create a prompt input.
todo_requirement = (
    "Create a TODO web API server in Go with CRUD operation endpoints."
)
_input = prompt.format_prompt(requirement=todo_requirement)
We can also check how our input is formatted, including the injected format instructions, before sending it to the LLM:
print(_input.to_string())
Then we should decide which LLM to use. I tried a few of them and found that ‘text-davinci-003’ produced the most accurate output. Feel free to do your own research and find the one that best suits your needs.
from langchain.llms import OpenAI

model_name = "text-davinci-003"
# model = OpenAI(model_name="text-ada-001", n=2, best_of=2)
temperature = 0.0
model = OpenAI(model_name=model_name, temperature=temperature)
output = model(_input.to_string())
# checking the output
# print(output)
This didn’t work as expected: the output was cut short, producing an invalid JSON string that couldn’t be parsed. After some research, I found that LangChain sets a default limit of 500 total tokens for the OpenAI LLM, and that limit covers both input and output, which isn’t enough for the generated code. To get around this, I used the tiktoken library to count the prompt tokens and maximize the completion budget.
import tiktoken

# Count how many tokens the prompt consumes for this model.
encoding = tiktoken.encoding_for_model(model_name)
prompt_tokens = len(encoding.encode(_input.to_string()))
# ...

# text-davinci-003 has a total limit of 4097 tokens (prompt + completion),
# so give the completion everything the prompt doesn't use.
model = OpenAI(model_name=model_name, temperature=temperature,
               max_tokens=4097 - prompt_tokens)
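If you want to see the budget explicitly, a quick print (an illustrative addition, not part of the original code) makes it visible:

# Illustrative: show how much of the 4097-token window remains for the completion.
print(f"Prompt: {prompt_tokens} tokens; completion budget: {4097 - prompt_tokens} tokens")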
This time, the LLM generated output in the expected format:
{
"source_code": "package main
import (
"fmt"
"net/http"
)
func main() {
http.HandleFunc("/todos", todosHandler)
http.ListenAndServe(":8080", nil)
}
func todosHandler(w http.ResponseWriter, r *http.Request) {
switch r.Method {
case "GET":
// Handle GET request
case "POST":
// Handle POST request
case "PUT":
// Handle PUT request
case "DELETE":
// Handle DELETE request
default:
fmt.Fprintf(w, "Method not supported")
}
}",
"file_name": "todo.go"
}
Great! The output is now consumable by another program. We have achieved our goal in just a few lines of code.
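To actually consume it from Python, we can run the raw output through the parser we built earlier. Here’s a minimal sketch (the file-writing step is my own illustration, not part of the original walkthrough):

# Parse the raw model output into a typed SourceCode instance,
# then write the generated code to disk (illustrative).
result = parser.parse(output)
with open(result.file_name, "w") as f:
    f.write(result.source_code)
print(f"Wrote {result.file_name}")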
In summary
If you are interested in using the output parser, you can find more information on the LangChain website.
Here are some additional tips for using the output parser:
- Make sure that you understand the different types of output that the language model can produce.
- Experiment with different settings, such as temperature, to see how they affect the output (see the sketch after this list).
- Use the output parser to structure the output of different language models to see how it affects the results.
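For example, here is a small, illustrative sketch (not from the original article) that reruns the same prompt at a few temperatures so you can compare the results:

# Illustrative: compare completions across temperature settings.
for temp in (0.0, 0.5, 1.0):
    m = OpenAI(model_name=model_name, temperature=temp,
               max_tokens=4097 - prompt_tokens)
    print(f"--- temperature={temp} ---")
    print(m(_input.to_string())[:300])  # preview the first 300 characters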