AdventureGPT: Using LLM-Backed Agents to Play Text-Based Adventure Games
Unlocking the magic of text-based adventure games with the power of language models
Recently, I decided to take some time to learn how to utilize ChatGPT and other OpenAI models. Like much of the world, I had played with OpenAI’s chat interface and had some interesting and silly conversations with ChatGPT. However, I wanted to dig deeper and really understand the development tools available.
I have always enjoyed, but have never been any good at, text-based adventure games. There is a certain magic to being given a text prompt and using your imagination to create/explore a location. The games’ pure textual nature made them a natural choice for hooking up with an LLM. With that, the first iteration of AdventureGPT was born.
Initially, I forked a Python port of Colossal Cave Adventure and started hacking away at the main game loop. I was mainly interested in taking game output, feeding it to ChatGPT, and returning the resulting command to the game. Honestly, it was just some basic plumbing. In an evening, I hacked together the first version, and it was here that I realized something: I had no clue how to make ChatGPT intelligently interact with the game world.
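That first version really was just plumbing. Here is a minimal sketch of the idea, assuming the pre-1.0 `openai` Python package (with `OPENAI_API_KEY` in the environment); the `get_command` helper and the `game` object are illustrative stand-ins, not the actual AdventureGPT code:

```python
import openai  # pre-1.0 openai package; reads OPENAI_API_KEY from the environment

def get_command(game_output: str) -> str:
    """Hand the latest game output to ChatGPT and get back a single command."""
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "You are playing Colossal Cave Adventure. "
                        "Reply with exactly one game command."},
            {"role": "user", "content": game_output},
        ],
    )
    return response.choices[0].message.content.strip()

# The loop itself: read game output, ask the model, feed the command back.
# (`game` stands in for the text-adventure interface.)
# while not game.is_finished():
#     output = game.step(command)
#     command = get_command(output)
```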
The problem was evident in the way ChatGPT initially tried to play the game. It used long, full sentences, while the limited parser could only understand the first five letters of each word and only recognized a handful of short phrases. This resulted in a lot of “I DON’T UNDERSTAND THAT” and “PLEASE REPHRASE.” An early observer commented, “Do you want terminators? Because this is how you get terminators.”
Around the same time I was initially updating the game loop, I was made aware of AutoGPT, an implementation of an autonomous agent, and its baby sibling, BabyAGI. Out of curiosity, I loaded up AutoGPT with the objective “Create a bitcoin trading bot that reinvests dividends” and watched in amazement as it began scouring the web for trading algorithms and eventually prompted me for a GitHub API key so it could fork repos under my username. It was about then that I turned it off and realized I was dealing with something special.
AutoGPT is somewhat complicated and uses its own framework for dealing with different foundation models. In other words, picking it apart requires a lot of know-how. BabyAGI, on the other hand, fits into a single Python file (~600 LOC at the time of writing). I really wanted to bring some of this autonomous-agent functionality into my game bot, so I took an afternoon and read through the BabyAGI code base.
What I found was incredibly approachable and absolutely worth one’s time to read. There were really only a handful of functions driving the agents’ interactions, with everything else falling into the category of utility functions. There were four main agents:
- The execution agent, which carried out tasks
- The task creation agent, which dreamed up new tasks needed to complete the objective
- The task prioritization agent, which reordered the task list to reach the objective most efficiently
- The context agent, which enriched prompts with context from previously executed tasks
Each agent was a single function with a call to the foundation model (BabyAGI supports OpenAI models and Meta AI’s LLaMA model). After reading through the code, I copied the task creation agent, the execution agent, and the OpenAI call code into a blank Python file and began hacking.
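To give a sense of the shape, here is roughly what one of those agents looks like: a single function that builds a prompt and makes one model call. This is a paraphrase, not the verbatim BabyAGI source:

```python
import openai  # pre-1.0 openai package

def openai_call(prompt: str, model: str = "gpt-3.5-turbo") -> str:
    """Thin wrapper around the chat completion endpoint."""
    response = openai.ChatCompletion.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip()

def execution_agent(objective: str, task: str) -> str:
    """Carry out a single task in service of the overall objective."""
    prompt = (
        f"You are an AI who performs one task based on the following "
        f"objective: {objective}.\nYour task: {task}\nResponse:"
    )
    return openai_call(prompt)
```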
I also grabbed a walkthrough of the game, which I initially embedded into my source code. My plan was to feed the walkthrough to the task creation agent to create game tasks. Hence, the task creation agent became a game task creation agent. The prompt remained largely the same, replacing the objective with the phrase “win the game” and removing the section on previous tasks, as the agent was now being fed a chunk of natural language.
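Reconstructed from that description (the gist, not the verbatim prompt), and reusing the `openai_call` wrapper from the sketch above, the game task creation agent looked something like this:

```python
def game_task_creation_agent(walkthrough_chunk: str) -> list[str]:
    """Turn a chunk of walkthrough text into an ordered list of game tasks."""
    prompt = (
        "You are a task creation AI. The objective is to win the game "
        "Colossal Cave Adventure.\n"
        "Based on the following walkthrough excerpt, list the tasks that "
        "must be completed, in order, one per line:\n\n"
        f"{walkthrough_chunk}"
    )
    response = openai_call(prompt)
    return [line.strip("- ").strip() for line in response.splitlines() if line.strip()]
```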
I also modified the execution agent into a player agent, whose output was fed directly into the game’s engine. These modifications, again, were minimal at first, mainly telling the agent it was playing Colossal Cave Adventure and trying to win the game. The main game loop didn’t need many changes either: it generated the game tasks into a custom todo list class, popped the first task off the list, and handed it to the player agent. Once the player agent had executed that task, the next task was popped, and the game continued until either the task list was empty or the game was finished.
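In sketch form, that first version of the loop looked something like this, where `TodoList`, `player_agent`, and the `game` object stand in for the real code:

```python
def play(game, walkthrough_chunks):
    """First pass: one task, one command, then straight on to the next task."""
    todo = TodoList()
    for chunk in walkthrough_chunks:
        todo.extend(game_task_creation_agent(chunk))

    output = game.start()
    while todo and not game.is_finished():
        task = todo.pop()                     # take the next task off the list
        command = player_agent(task, output)  # a single command per task
        output = game.step(command)
```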
The first time I ran the game loop with the agents, the spark of magic quickly faded once I saw how anemic the player agent was. It simply entered a single game command for each task, which wouldn’t do at all. I needed some way of knowing when a task was finished. This led to the birth of the task-completion agent.
The task-completion agent would look at the game history and decide whether the current task had been completed. Then, and only then, would the next task be pulled off the todo list and become the player agent’s focus. Once the completion agent was added to the game loop, the agents started playing the game with something that resembled purpose. There were still moments when the player agent would get stuck in a loop of bad input that confused the game parser, but there were also moments when the agents would actually work through tasks off the list and truly play the game.
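A sketch of the completion check and the revised loop, with the same hypothetical stand-ins as before:

```python
def task_completion_agent(task: str, game_history: str) -> bool:
    """Ask the model whether the current task has been accomplished."""
    prompt = (
        f"Current task: {task}\n"
        f"Recent game transcript:\n{game_history}\n"
        "Has the task been completed? Answer YES or NO."
    )
    return openai_call(prompt).strip().upper().startswith("YES")

def play(game, walkthrough_chunks):
    """Revised loop: a task stays current until the completion agent signs off."""
    todo = TodoList()
    for chunk in walkthrough_chunks:
        todo.extend(game_task_creation_agent(chunk))

    history = []
    output = game.start()
    task = todo.pop()
    while task and not game.is_finished():
        command = player_agent(task, output)
        output = game.step(command)
        history.append(output)
        # Only move on once the model judges the current task done.
        if task_completion_agent(task, "\n".join(history[-10:])):
            task = todo.pop() if todo else None
```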
This is when I started some cleanup. Not wanting to include a specific walkthrough in the source code, I moved the partial walkthrough into an external text file that gets read into the program and turned into game tasks incrementally (in chunks of approximately 500 tokens). Naively, I had first tried feeding the foundation model the entire walkthrough in one go, but that resulted in a timeout that froze the game.
Cohere stated in their blog that one should feed a foundation model work in small, discrete chunks, especially when it comes to model output. Five hundred tokens seemed like a good place to start, so that became the chunk size, and the chunked walkthrough produced a composite list of game tasks for the player agent to churn through.
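Chunking by tokens rather than characters is straightforward with a tokenizer. Here is a sketch using `tiktoken`, which is my assumption for the tokenizer; any encoder matching the target model would do:

```python
import tiktoken

def chunk_by_tokens(text: str, chunk_size: int = 500,
                    model: str = "gpt-3.5-turbo"):
    """Yield successive chunks of roughly chunk_size tokens each."""
    encoding = tiktoken.encoding_for_model(model)
    tokens = encoding.encode(text)
    for start in range(0, len(tokens), chunk_size):
        yield encoding.decode(tokens[start:start + chunk_size])

# Example usage (the file name is illustrative):
# walkthrough_chunks = list(chunk_by_tokens(open("walkthrough.txt").read()))
```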
As a review: we started with a task creation agent, which became a walkthrough summarization/task creation agent; an execution agent, which became a player agent; and a new task completion agent. This combination was potent enough to be worth sharing. When I started the project, I had forked the python-adventure repository onto my private git server and kept everything on my own hardware (ask me about my self-hosting setup in the comments).
Open sourcing these changes was high on my priority list. I was lucky because (a) I only used a single file from the python-adventure repository and therefore didn’t need a proper fork but could simply import the Python package, and (b) both python-adventure and BabyAGI were permissively licensed: python-adventure under the Apache 2.0 license and BabyAGI under an MIT license.
After a night of reading about license interoperability, I took matters into my own hands and licensed all my code and changes under Apache 2.0, which grants essentially the same permissions as the MIT license while adding an explicit patent grant (one that terminates for anyone who brings patent litigation over the code). I am not a lawyer, but this seemed like the least controversial choice, as the Apache License also provides for a NOTICE file where an author can document the copyrighted works incorporated into the new work.
With licensing sorted, I uploaded the code to GitHub and considered the next steps. The repository contains an up-to-date TODO list, but the major next step is to make the agents play the game more like a human player. This will involve removing the dependency on the walkthrough and likely adding a location-aware agent, such as a map agent. I’m working on this as of the time of writing, and I’ll report back once I’ve made more progress.
In conclusion, AdventureGPT has come a long way from its initial version. It started as basic plumbing between Colossal Cave Adventure and ChatGPT but quickly evolved by incorporating concepts from AutoGPT and BabyAGI. The task creation, task completion, and player agents transformed the bot from a source of confused parser errors into something that plays with purpose.
While improvements are still being made, such as making the agents play more like human players and reducing the reliance on walkthroughs, AdventureGPT has reached a point where it’s worth sharing with the community. By open-sourcing the code and licensing it under Apache 2.0, I hope to encourage further development and collaboration in enhancing the game-playing capabilities of AI models.
If you’re interested in following the progress of AdventureGPT or contributing to its development, please check out the GitHub repository and feel free to leave comments or ask questions. Together, we can continue to unlock the magic of text-based adventure games with the power of language models.