And you don’t even need to know the command syntax
The idea
The other night I had a fleeting idea that turned into a fun technical challenge. I have them all the time.
The fleeting ideas, that is.
They rarely turn into actual technical challenges that I complete because of life’s myriad of other distractions (like season 2 of Foundation or my cat giving me feline stank-eye because he wants to be fed). But this particular night, the idea just wouldn’t leave my head until I sat down at my computer and began to stitch code and cloud components together until I had something working.
The idea? Could I create a solution where I could execute AWS CLI commands on my AWS account using only my voice via Amazon Alexa while also not having to know the actual syntax required (but instead relying on natural language)?
The answer is yes! And this article will show you how (with a bonus video at the end showing the solution in action).
To follow along at home, you’ll need the following:
- An Amazon Alexa Developer Console login
- An AWS Account
- An Azure OpenAI service or other endpoint that allows chat completion via a GPT-x model
- A programming language of your choice (though for best results, C#, because that’s what I’m working with)
The voice bit
Our entry point for the application is going to be Amazon Alexa. Why? Because Alexa provides an easy-to-use development experience for the creation of voice skills, and as you’ll see from my example, you don’t need to finalise and publish a skill to experiment with it, nor do you need an actual Alexa device like an Echo.
The voice skill I’ve set up is called — rather unimaginatively — VoiceToCommand — and as far as Alexa skills go, it’s really very simple. That’s because most of the brains of our solution will come when we plug in the chat completions API that exposes our large language model. But more on that later.
For now, let’s take a look at our Alexa skill. We’ve set up a very simple launch phrase called ‘voice command’:
And we’ve created a custom slot called NaturalLanguageCommand. This slot will be filled with our natural language request (for example, “show me my s3 buckets”).
You can see we’ve added some sample slot values, but these don’t actually matter. It wouldn’t be possible to cover all the slot values for the ways we might talk about the things we want from the AWS CLI, so these are really just some samples so that we don’t have 0 slot values which can cause issues saving your skill.
Also, make sure you turn Multi-Value on. This will ensure our entire natural language request (typically a short sentence) will be passed along to our yet-to-be-built backend.
Now, to give Alexa an idea of how we might start our natural language command, we’ve created an intent called ExecuteCommand. We’ve then combined that intent with some sample utterances as typical ways to introduce a request for our AWS CLI (each followed by the {CommandSlot}), and voilà! We have the starting point of our skill.
A quick test in the developer console, and we’re ready to move on.
The skill backend
To take our spoken words and do something useful with them, we need to create a backend to accept the {CommandSlot}. For this, we will keep things in the Amazon family and create an AWS Lambda function using C#.
We will name this Lambda VoiceToCommand, but it can really be thought of as our orchestrator.
It’ll be responsible for receiving our voice command, sending it to our LLM endpoint to convert it to a usable AWS CLI command, sending that command to our local computer by way of the WebSocket server (don’t worry, we’ll get to that too!), and finally responding to Alexa that the command has been executed.
Put simply, our Lambda — and the services it talks to — will be responsible for doing something like:
We’ll put the entire Lambda function code here, including its three methods, then go through them one at a time (note we’re introducing a couple of NuGet packages here, namely Alexa.NET and Azure.AI.OpenAI).
using Alexa.NET;
using Alexa.NET.Request;
using Alexa.NET.Request.Type;
using Alexa.NET.Response;
using Amazon.Lambda.Core;
using Azure.AI.OpenAI;
using Azure;
using System.Text.Json;
using System.Net.WebSockets;
using System.Text;
[assembly: LambdaSerializer(typeof(Amazon.Lambda.Serialization.Json.JsonSerializer))]
namespace VoiceToCommand;
public class Function
{
private readonly string _azureOpenAiUrl;
private readonly string _azureOpenAiKey;
private readonly OpenAIClient _openAiClient;
public Function()
{
_azureOpenAiKey = "<Your Azure OpenAI Key>";
_azureOpenAiUrl = "<Your Azure OpenAI URL>";
_openAiClient = new OpenAIClient(
new Uri(_azureOpenAiUrl),
new AzureKeyCredential(_azureOpenAiKey));
}
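/// <summary>
/// Lambda entry point: handles the Alexa skill request, converts the spoken
/// request into an AWS CLI command and forwards it to the WebSocket server.
/// </summary>
/// <param name="input">The skill request sent by Alexa.</param>
/// <param name="context">The Lambda execution context.</param>
/// <returns>The response Alexa speaks back to the user.</returns>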
public async Task<SkillResponse> FunctionHandler(SkillRequest input, ILambdaContext context)
{
Console.WriteLine($"Executing Skill Handler...");
Console.WriteLine($"Skill Request Object: {JsonSerializer.Serialize(input)}");
if (input.Request is LaunchRequest)
{
return ResponseBuilder.Ask("Ready to accept commands", null);
}
else if (input.Request is IntentRequest intentRequest)
{
if (intentRequest.Intent.Name == "ExecuteCommand")
{
var command = intentRequest.Intent.Slots["CommandSlot"].Value;
var response = ConstructAwsCommand(command);
await SubmitCommandToWebSocketServer(response);
var r = ResponseBuilder.Tell($"Command executed.");
r.Response.ShouldEndSession = false; // Keep the session open so we can continue to listen for commands
return r;
}
}
return ResponseBuilder.Tell("I don't understand");
}
public async Task<bool> SubmitCommandToWebSocketServer(string command)
{
try
{
var cws = new ClientWebSocket();
var cancelSource = new CancellationTokenSource();
var connectionUri = new Uri("wss://your-server-here.amazonaws.com/Prod");
await cws.ConnectAsync(connectionUri, cancelSource.Token);
ArraySegment<byte> message = new ArraySegment<byte>(UTF8Encoding.UTF8.GetBytes($"{{\"message\":\"sendmessage\", \"data\":\"{command}\"}}"));
await cws.SendAsync(message, WebSocketMessageType.Text, true, cancelSource.Token);
}
catch (Exception ex)
{
Console.WriteLine($"{ex.Message}");
return false; // Report the failure instead of always returning true
}
return true;
}
/// <summary>
/// Use an Azure OpenAI chat model to convert natural language to an AWS CLI command.
/// </summary>
/// <param name="command"></param>
/// <returns></returns>
public string ConstructAwsCommand(string command)
{
try
{
Response<ChatCompletions> responseWithoutStream = _openAiClient.GetChatCompletions(
"shiny-new-model", // The name of your Azure OpenAI model deployment
new ChatCompletionsOptions()
{
Messages =
{
new ChatMessage(ChatRole.System, @"You are an AI assistant that converts natural language requests to perform AWS actions into executable aws cli commands that are suitable to be run by other processes. You will only ever respond with the command itself. No other text should be included in your response."),
new ChatMessage(ChatRole.User, @$"What is the command for: {command}. Please provide only the command itself in your response and make sure the output is suitable to be passed as a string literal. Make sure that your JMESPath expression has balanced and correctly matched single quotes ('). If you have any literal strings or conditions in your expression, enclose them with single quotes, and ensure that they are properly closed.")
},
Temperature = (float)0.7,
MaxTokens = 800,
NucleusSamplingFactor = (float)0.95,
FrequencyPenalty = 0,
PresencePenalty = 0,
});
ChatCompletions completions = responseWithoutStream.Value;
Console.WriteLine($"Response: {JsonSerializer.Serialize(completions)}");
return completions.Choices[0].Message.Content;
}
catch (Exception ex)
{
Console.WriteLine($"Error: {ex.Message}");
return "error";
}
}
}
So, let’s break down our function and its three key methods.
FunctionHandler is our main Lambda handler. It intercepts our SkillRequest from Alexa and determines whether it’s the one we’re interested in: ExecuteCommand.
If it is, it’ll grab out the CommandSlot value (our natural language request, like ‘tail the log group for my lambda called x’) and pass it to our ConstructAwsCommand method.
The ConstructAwsCommand method is where things get more interesting. Here we will use prompt engineering to construct just the right kind of prompt to elicit a suitable response from our LLM endpoint (in this case, a chat completion endpoint featuring the gpt-35-turbo-16k model).
It took several iterations to get the prompt to work best for this situation. Typically it’s beneficial to receive a well-rounded response from an LLM, including introductory text and explanations. This isn’t what we’re after, though, because we want the exact command to be executed.
To this end, I set up the system message to create a sort of ‘constrained’ personality for the model, instructing it that
“You are an AI assistant that converts natural language requests to perform AWS actions into executable AWS CLI commands that are suitable to be run by other processes. You will only ever respond with the command itself. No other text should be included in your response.”
This helped prevent ‘fluff’ from being sent back in the response. I still found issues where the command was sent back with special characters or string literal-breaking characters that caused issues trying to get that command sent anywhere else as JSON. To that end, I helped the model along by giving it a fairly lengthy prompt:
“What is the command for: {command}. Please provide only the command itself in your response and make sure the output is suitable to be passed as a string literal. Make sure that your JMESPath expression has balanced and correctly matched single quotes ('). If you have any literal strings or conditions in your expression, enclose them with single quotes, and ensure that they are properly closed.”
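Prompting alone can’t guarantee a clean reply, so one option is to check the model’s output before it goes anywhere near a shell. The sketch below is not part of the original Lambda: CommandSanitizer and TryNormalize are hypothetical names, and the rules it applies (strip Markdown fences and backticks, accept only a single line that starts with aws) are assumptions about what a usable reply looks like.

```csharp
using System;

public static class CommandSanitizer
{
    // Hypothetical helper (not in the original code): tidies up a model reply
    // and reports whether it looks like a single AWS CLI command.
    public static bool TryNormalize(string reply, out string command)
    {
        command = string.Empty;
        if (string.IsNullOrWhiteSpace(reply)) return false;

        var text = reply.Trim();
        var fence = new string('`', 3); // a Markdown code fence marker

        // If the model wrapped its answer in a fenced block, keep only the inside.
        if (text.StartsWith(fence, StringComparison.Ordinal))
        {
            var firstNewline = text.IndexOf('\n');
            var lastFence = text.LastIndexOf(fence, StringComparison.Ordinal);
            if (firstNewline < 0 || lastFence <= firstNewline) return false;
            text = text.Substring(firstNewline + 1, lastFence - firstNewline - 1).Trim();
        }

        // Remove single backticks around an inline command.
        text = text.Trim('`').Trim();

        // Reject multi-line replies and anything that is not an aws invocation.
        if (text.Contains('\n') || !text.StartsWith("aws ", StringComparison.OrdinalIgnoreCase))
        {
            return false;
        }

        command = text;
        return true;
    }
}
```

In FunctionHandler, a check like this could sit between ConstructAwsCommand and SubmitCommandToWebSocketServer, with Alexa replying that it couldn’t build a command whenever TryNormalize returns false. It only checks the shape of the reply; it doesn’t decide whether a given aws command is one you want to run.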
With that done, and with the expectation we’d be getting back suitable commands, I now needed a way to get that command from my VoiceToCommand Lambda in my AWS account to my local laptop environment where it could be executed.
There are a few ways to do this, but I settled on using a WebSocket server. WebSocket is a great protocol for establishing a client/server connection where you want to broadcast data in near real time to and from multiple clients. In this case, the VoiceToCommand Lambda is client 1, and my laptop is client 2. That’s where the final method, SubmitCommandToWebSocketServer, comes in. It connects to our yet-to-be-created WebSocket server and sends it our command; the server then broadcasts that command to the other connected clients, including the laptop that is listening for it.
So, with the orchestrator Lambda taken care of, it was time to set up that WebSocket server.
Socket to ’em.
Now full disclosure here, I took a very neat shortcut to get my WebSocket server set up (and don’t we love working smarter, not harder?). As primarily a C# developer and a heavy Visual Studio user, I have access to all kinds of really useful boilerplate templates for various programs. One of those happens to be a simple WebSocket server using Lambda and a Serverless Application Model (SAM) template:
Creating this blueprint gives us a ready-to-go project containing a Lambda function acting as a WebSocket server and a SAM template that’ll create the connection, disconnection, and message Lambdas, an API Gateway configured for WebSocket communication, and a DynamoDB table to store the sessions from the client(s) that connect. I won’t go into the code or the template here. You can check it out for yourself.
The important thing to note is that once deployed to your AWS account, you’ll have a WebSocket server designed to send and receive messages in the following format:
{
"message":"sendmessage",
"data":"{command}"
}
That {command} bit, that’s where we’re going to put our AWS CLI command. It’ll be sent to the WebSocket server to broadcast to other clients, and our laptop will be listening for just such a message.
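Because the command travels inside a JSON string, any double quote or backslash in it has to be escaped, which hand-built strings tend to get wrong. As a sketch, assuming you’re free to change how the Lambda builds its payload, System.Text.Json can take care of the escaping. CommandPayload, Build, and ReadCommand are hypothetical names, not part of the blueprint or the Lambda code above.

```csharp
using System;
using System.Text.Json;

public static class CommandPayload
{
    // Builds the JSON frame in the format the WebSocket server expects.
    // JsonSerializer escapes quotes and backslashes inside the command.
    public static string Build(string command) =>
        JsonSerializer.Serialize(new { message = "sendmessage", data = command });

    // Reads the command back out of a frame, handy for checking the round trip.
    public static string ReadCommand(string json)
    {
        using var doc = JsonDocument.Parse(json);
        return doc.RootElement.GetProperty("data").GetString() ?? string.Empty;
    }
}
```

For example, CommandPayload.Build("aws s3api list-buckets --query \"Buckets[].Name\"") should produce a frame that still parses as JSON, and ReadCommand on that frame returns the original command text unchanged.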
Deploy the complete stack to your AWS account, and take note of the API Gateway WSS URL it produces; you’ll need to set it in your VoiceToCommand Lambda and in the console application on your local machine.
(A note: The blueprint deploys an unauthenticated WebSocket server. This is fine for experimentation, but if you have any thoughts about using this in a production setting, make sure you apply proper authentication and authorisation to anything that can be reached over the internet).
The local client
The final part of our puzzle! The engine for executing our command locally. For this, we need a simple C# console application that will be set up to listen to our WebSocket server, receive our AWS CLI command, and execute it locally by wrapping it in a PowerShell execution. Let’s take a quick look at the code:
using System.Diagnostics;
using System.Net.WebSockets;
using System.Text;
namespace VoiceToCommandConsole
{
internal class Program
{
private static async Task Main(string[] args)
{
Console.WriteLine("Connecting to the AWS Command Listener...");
var connectionUri = new Uri("wss://<your websocket server>.amazonaws.com/Prod");
var cancelSource = new CancellationTokenSource();
while (!cancelSource.IsCancellationRequested) // Reconnection loop
{
// A ClientWebSocket cannot be reused once it has closed, so create a new one per attempt
using (var cws = new ClientWebSocket())
{
try
{
await cws.ConnectAsync(connectionUri, cancelSource.Token);
Console.WriteLine("Connected to the WebSocket server.");
// Listen until the connection is lost or closed; this also keeps the console application running
await ListenForMessagesAsync(cws, cancelSource.Token);
}
catch (Exception ex)
{
Console.WriteLine($"Error: {ex.Message}");
}
}
// Wait briefly before reconnecting to avoid excessive connection attempts
await Task.Delay(1000);
}
}
private static async Task ListenForMessagesAsync(ClientWebSocket cws, CancellationToken cancellationToken)
{
try
{
while (!cancellationToken.IsCancellationRequested)
{
var buffer = new byte[1024];
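var result = await cws.ReceiveAsync(new ArraySegment<byte>(buffer), cancellationToken);
// Stop listening when the server closes the connection
if (result.MessageType == WebSocketMessageType.Close)
{
break;
}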
if (result.MessageType == WebSocketMessageType.Text)
{
var message = Encoding.UTF8.GetString(buffer, 0, result.Count);
Console.WriteLine($"Received message: {message}");
// Display the progress bar animation
var progressBarAnimationTask = ShowProgressBarAnimation();
await progressBarAnimationTask;
// Run the received message as a command and display the output
var output = RunCommand(message);
Console.WriteLine($"{output}");
}
}
}
catch (Exception ex)
{
Console.WriteLine($"Error while preparing commands: {ex.Message}");
}
}
private static string RunCommand(string command)
{
try
{
var processInfo = new ProcessStartInfo("powershell", "-Command " + "\"" + command + "\"")
{
CreateNoWindow = true,
UseShellExecute = false,
RedirectStandardOutput = true,
RedirectStandardError = true
};
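using var process = Process.Start(processInfo);
// Read stderr in the background while draining stdout so neither pipe can fill up and block the process
var errorTask = process.StandardError.ReadToEndAsync();
var output = process.StandardOutput.ReadToEnd();
process.WaitForExit();
return output + errorTask.Result;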
}
catch (Exception ex)
{
return $"Error executing command: {ex.Message}";
}
}
private static async Task<string> RunCommandAsync(string command)
{
try
{
var processInfo = new ProcessStartInfo("powershell", "-Command " + "\"" + command + "\"")
{
CreateNoWindow = true,
UseShellExecute = false,
RedirectStandardOutput = true,
RedirectStandardError = true,
RedirectStandardInput = true // Allows writing to the process's input stream
};
var process = new Process();
process.StartInfo = processInfo;
var outputBuilder = new StringBuilder();
process.OutputDataReceived += (sender, e) =>
{
if (e.Data != null)
{
outputBuilder.AppendLine(e.Data);
}
};
process.ErrorDataReceived += (sender, e) =>
{
if (e.Data != null)
{
outputBuilder.AppendLine(e.Data);
}
};
process.Start();
process.BeginOutputReadLine();
process.BeginErrorReadLine();
// The following line writes input to the process's standard input (if needed)
// process.StandardInput.WriteLine("Input for the process");
await process.WaitForExitAsync(); // Await process completion asynchronously
return outputBuilder.ToString();
}
catch (Exception ex)
{
return $"Error executing command: {ex.Message}";
}
}
private static async Task ShowProgressBarAnimation()
{
// Simple ASCII progress bar animation
const string animationFrames = "|/-\\";
int frameIndex = 0;
for (int i = 0; i < 10; i++)
{
Console.Write($"\rProcessing... {animationFrames[frameIndex]}");
frameIndex = (frameIndex + 1) % animationFrames.Length;
await Task.Delay(100); // Adjust the delay time to control animation speed
}
Console.WriteLine("\rProcessing... Done!");
}
}
}
At its simplest, the above runs in a console window, keeps a connection open to our WebSocket server, and listens for the all-important AWS CLI command. Once a command is received, it’s shown, a nifty little progress animation plays (just for effect), the command is run, and hopefully, some meaningful output is produced!
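Wrapping the command in double quotes works until the command itself contains double quotes, which can happen with --query expressions. One alternative, sketched here rather than taken from the original client, is PowerShell’s -EncodedCommand switch, which accepts the command as Base64-encoded UTF-16LE text so no manual quoting is needed. PowerShellLauncher and its methods are hypothetical names.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class PowerShellLauncher
{
    // -EncodedCommand expects Base64 of the command's UTF-16LE bytes
    // (Encoding.Unicode is UTF-16LE in .NET).
    public static string Encode(string command) =>
        Convert.ToBase64String(Encoding.Unicode.GetBytes(command));

    // Hypothetical alternative to building "-Command \"...\"" by hand.
    public static ProcessStartInfo CreateStartInfo(string command) =>
        new ProcessStartInfo("powershell", "-NoProfile -EncodedCommand " + Encode(command))
        {
            CreateNoWindow = true,
            UseShellExecute = false,
            RedirectStandardOutput = true,
            RedirectStandardError = true
        };
}
```

The returned ProcessStartInfo could be passed to Process.Start in place of the one RunCommand creates. How PowerShell formats redirected error output in this mode may differ from the -Command form, so treat this as a starting point to try out rather than a drop-in fix.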
So, after all that, what have we ended up with? Let’s take a look at the overall architecture. A picture is always handy:
The test
So, did it actually work? After all that code and tomfoolery, was the hypothesis proven? I’m happy to report that, yes, it did indeed work. Now granted, it’s a little bit shaky and needs work in the skill space and the prompt engineering space to scale it beyond simple commands, but the core of the idea works! And that’s awesome. For proof, here’s a short video of me testing a couple of commands. I hope you enjoyed this little journey from ideation to solution, and as always, feel free to reach out via the comments if you have any questions :).
Using Amazon Alexa To Run AWS CLI Commands via Large Language Models (LLMs) was originally published in Better Programming on Medium.