It’s time to revisit Web Intents.
“The ability to imagine is the largest part of what you call intelligence. You think the ability to imagine is merely a useful step on the way to solving a problem or making something happen. But imagining it is what makes it happen.”
— Sphere, by Michael Crichton.
Last week, I was introduced to the world of Gacha Life. Bear with me. It’s a place where a dedicated community of grade, middle, and early-high schoolers spend an exorbitant amount of time using anime characters, stickers, and graphics to create … well … anything they can think of. I was genuinely impressed at the dexterity of my 15-year-old guide as she walked me through one of the community’s routines: creating GIFs from scenes generated using the app’s content. At lightning speed, she jumped from app to app, capturing screenshots, assembling them in a GIF generator, exporting them, editing them again, and sharing them to all the necessary locations. It was quite a production pipeline. I was impressed on many levels — with the depth and complexity of this community, as well as the creative skills of my guide. However, something about the process tickled my sensibilities as a systems engineer. There was something odd about it.
The very next day was WWDC. I, along with the rest of the world, was too distracted by Apple’s vision for “Spatial Computing” to think much more about my Gacha-gotcha moment. However, somewhere during Allessandra McGinnis’ demo justifying the headset’s existence in the office, I had the same feeling again. This time, we were watching a screen lift up and away from a MacBook (something akin to a very expensive and not-so-remote desktop) and into the air. The hypothetical office worker was trying to use an application that ran on their MacBook, but not on the headset. Again, something didn’t quite seem right.
Recently, I spent an hour or two working on some scripts for a personal analytics system. (I might be writing about it soon, so I’ll spare the details.) I was trying to automatically capture some data that I had been collecting manually at my day job over the last couple of weeks. There didn’t seem to be an existing system that did everything I wanted, so I decided to just build it. It collected information from sources on my local machine, parsed it into spreadsheet rows, and stored the rows in Google Sheets for later analysis. Somewhere in the process, I remembered both of my confusions from the week prior.
The “odd thing” that caught my attention in these scenarios was the boundaries that were being crossed — boundaries of our own making. Much of the work that we do on our devices takes place within the well-defined boundaries of specific programs, otherwise known as applications or apps. Each of the three experiences I had over the last week was one that broke down those boundaries. They concerned a task — a series of steps that affect the state of a system over time — that could not be completed within the confines of any one environment. For the Gacha community, they wanted to create something with their images the developers hadn’t anticipated. For the Visionary Apple office worker, it was a function — a part of his normal workflow — that he was trying to include in his new environment. For me, it was taking applications that had no awareness of the existence of others on the system and getting them to process my information, in sequence.
We can better facilitate all of these workflows without creating purpose-built applications for all of them. We can ease the friction of switching between different applications and, with it, the negative side effects of switching contexts. We can ease the overload that comes from maintaining outrageously large libraries of applications on our devices, each only for the sake of that one thing that it does. Furthermore, we can take full advantage of the new mediums that are opening up to us. It starts with intent, and it started a long time ago.
Before the App
If you ask most computer programmers or system administrators, they’ll tell you that they still prefer working with a shell, such as bash or zsh. In many cases, when you work with servers, you don’t have a choice, but the reasons are pragmatic, as well. One of the most important features of bash and other CLIs is the pipe. It’s a fundamental part of CLI commands that allows you to take the output of one command and use it, automatically, as the input to another command. In a shell, we don’t think of software programs as things with large walls separating them from each other. You don’t enter into them, per se, but you can pass your information into them. In a shell, we think of most commands as simple black boxes: you put something in, it does some work, and you get something else out. By chaining these together, you can very quickly automate intensely complex workloads. This is why developers continue to do so much work here. It’s the same piping process that makes for the fruitful, limitless expression within most general-purpose programming languages. In the parlance of language or interaction design, we refer to this as a “means of combination.” CLIs have always offered a robust means of combination.
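As a rough illustration of this "means of combination," here is a minimal Python sketch of the pipe idea. The names pipe, split_words, keep_long, and count are hypothetical, chosen only for this example. Each step is a black box that takes one input and returns one output, and pipe hands the output of each step to the next, much as a shell does with `a | b | c`.

```python
from functools import reduce
from typing import Any, Callable


def pipe(value: Any, *steps: Callable[[Any], Any]) -> Any:
    """Feed `value` through each step in order, like `a | b | c` in a shell."""
    return reduce(lambda acc, step: step(acc), steps, value)


# Hypothetical single-purpose "commands": something in, something out.
def split_words(text: str) -> list[str]:
    return text.split()


def keep_long(words: list[str]) -> list[str]:
    # Keep only words longer than three characters.
    return [word for word in words if len(word) > 3]


def count(items: list[str]) -> int:
    return len(items)


result = pipe("the pipe is a means of combination", split_words, keep_long, count)
print(result)  # prints 3 ("pipe", "means", "combination")
```

None of the three steps knows about the others. The only thing they share is the shape of the data passed between them, which is what makes them easy to rearrange.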
Now, there certainly are commands that don’t work this way. For instance, a whole genre of text-based games, such as Zork, developed long before GUIs came along. These games and other commands work by taking up the whole screen and manipulating its presentation for the sake of the experience. Inputs and outputs were of little concern.
As we all know, GUIs such as windowing systems and app launchers quickly took over our screens from traditional shells. In the process of writing software to accommodate this shift, we seem to have kept all of the characteristics of this latter class of interaction and ignored the former. We might refer to these as combinable interactions and stand-alone interactions. Applications and GUI-heavy systems feature plenty of stand-alone processes. You open them, you do your work in them, and then you leave. In the meantime, all of your attention and the work you are doing takes place within that application context. Your attention and your sense of context are limited by the application’s walls. Combinable interactions, on the other hand, put your focus exclusively on what you want to get done, rather than where or how it gets done. These pieces of software, which used to be part and parcel of the experience of computation, are now almost exclusively used by computer programmers and other specialists.
It isn’t that stand-alone programs are bad. They’re just different. Sometimes we want a piece of software to occupy our entire focus. This happens when we read a story on a website, play a video game, or watch a movie. It’s important anywhere we are actively consuming content. However, when it comes to doing things, it can be a limiting form of interaction.
The App is Dead
This situation we’ve got ourselves into, where we keep a different stand-alone piece of software and all of its accouterments for each of our daily tasks, is the equivalent of a great chef keeping a different kitchen for each of the meals he prepares. It’s a lot of unnecessary stuff. On top of that, the drawers aren’t always fully stocked. Saidat Giwa-Osagie, of The Next Web, succinctly said that “in less than two decades, we’ve gone from wishing there was ‘an app for that’ to having an abundance of choices … there are so many apps, but very few of them actually do what we want them to do for us.”
This has led to a growing chorus of computer users, application developers, and platform developers who are actively seeking to replace the application formula. Just a couple of months ago, the folks at the Browser Company declared that The Age of the App is Over. Part of their solution, baked into the Arc browser, is a feature called an easel, which lets web users take snippets from different websites and arrange them in a centralized location. It assumes that a website’s visitors don’t all care about the same pieces of information. Going the other direction, they just launched Boosts 2.0, a feature that lets website visitors remove the sections of a website they don’t care for, along with a myriad of other customizations.
The Browser Company, along with many other teams, is looking for ways to help people focus on their workflows instead of their workplace. Their goal is to empower people by letting them focus on what they want to do, not where they need to go to do it; that is, to focus on intent. This is a beautiful goal.
Long Live the App
For better or worse, though, applications don’t seem to be going anywhere. None of us, including those of us comfortable working with CLIs and obscure scripting languages, want to give up the advancements that have been made in UX over the last several decades. We can’t pin the blame on the App Economy, either. Centralized repositories, such as the App Store and Google Play, are no more responsible for our current predicament than npm or GitHub. As long as we can create recipes for our machines to follow, we will have centralized forums in which to share them, regardless of their granularity.
So instead of throwing the baby out with the bathwater, several initiatives are currently under way to make apps more combinable, allowing us to take single aspects of their functionality and mix and match them as needed. On the Android platform, there has been a mechanism known as Intents for quite some time. Apple, by contrast, introduced App Intents only recently. App Intents can be used to break an application into smaller pieces. For instance, every platform has a camera app. It usually does a lot more than take photos. We also use it to start and stop video recordings. In the language of App Intents, even though both of these live in the same code base, they would be different “intents.” Each one is defined by its input and its output, either of which is optional. The output of a “take a photo” intent is a photo, as a file. The output of a “take a video” intent is a video, as a file. Two completely different outputs. Two completely different intents.
Very quickly, one starts to see the potential of such a system. It’s just like piping inputs and outputs from shell commands into one another. As long as each intent follows some shared standard — file formats, data types, etc. — you can mix and match them in any way you can imagine. Indeed, that’s probably why Apple’s Shortcuts, which is built on App Intents, is my favorite app, and it’s only getting better. Apple is doubling down on App Intents, with more changes to Shortcuts coming in iOS 17. However, the feature has been slow to catch on. As a result, its usefulness still lags far behind that of its textual predecessors.
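To make the "shared standard" point concrete, here is a minimal Python sketch. It is not Apple's App Intents API. The Intent class, the chain function, and the three sample intents are all hypothetical, and a "photo" is represented by plain bytes. The idea it tries to show is that two intents can be connected only when the output type of one matches the input type of the next.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Intent:
    """Hypothetical model of an intent: a description plus typed input/output."""
    description: str
    input_type: type
    output_type: type
    run: Callable[[Any], Any]


def chain(*intents: Intent) -> Callable[[Any], Any]:
    """Combine intents into one workflow, refusing mismatched neighbours."""
    for left, right in zip(intents, intents[1:]):
        if not issubclass(left.output_type, right.input_type):
            raise TypeError(
                f"'{left.description}' outputs {left.output_type.__name__}, "
                f"but '{right.description}' expects {right.input_type.__name__}"
            )

    def workflow(value: Any) -> Any:
        # Pass the value through every intent in order.
        for intent in intents:
            value = intent.run(value)
        return value

    return workflow


# Stand-ins for real media: a "photo" is just bytes in this sketch.
take_photo = Intent("take a photo", type(None), bytes, lambda _: b"raw-photo")
add_caption = Intent("add a caption", bytes, bytes, lambda p: p + b"+caption")
describe = Intent("describe a file", bytes, str, lambda p: f"{len(p)} bytes")

workflow = chain(take_photo, add_caption, describe)
summary = workflow(None)
print(summary)  # prints "17 bytes"
```

Calling chain(describe, add_caption) raises a TypeError before anything runs, because a text description cannot be fed into an intent that expects a photo. That early check is what a shared set of data types buys you.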
Several sources of friction are slowing the adoption of App Intents. For one thing, it’s very difficult to know what intents an app contains before you download it. This makes searching for discrete actions to add to your workflow very difficult. There is no sub-section on an App Store page that tells you about the Intents an app enables. Additionally, developer adoption appears to be thin. Most of the apps on my iPhone don’t expose intents, though this observation is obviously subject to selection bias. This might be because there isn’t a clear path to monetize their use. App developers depend on any combination of (1) one-time purchases, (2) subscriptions, and (3) in-app purchases. It’s not entirely clear to me, yet, how application developers can effectively monetize intents in marketplaces that are optimized for stand-alone applications. In the world of language-specific package indexes, we’re accustomed to this problem: you either give your work away for free and ask for a donation or you wrap it in a pay-per-use API.
On top of all of this, people have become used to the stand-alone model. It would take time and training to bring the average computer user to the point where they could take full advantage of these features, even if they did exist at scale. But what would happen if we overcame these frictions? What would happen if the Intents architecture were universal?
Web Intents (Again)
Incidentally, there was a feature for the web called Web Intents, introduced simultaneously by Mozilla and Google, back in 2011. It was closer to the Android concept of Intents than Apple’s vision of Intents. The standard outlined a small set of actions, such as “sharing” or “editing this type of file” that individual web apps could register to handle. When a user indicated that they wanted to invoke an intent, they would be prompted to select from one of those registered websites. In general, it worked more like the “share” sheets that are common in both operating systems, now. The feature didn’t seem to catch on, fizzling out shortly after its introduction.
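The register-then-choose flow described above can be modeled in a few lines. The sketch below is a Python simulation for illustration only, not the Web Intents draft API. The action names, the example sites, and the register and invoke functions are all assumptions made for this example.

```python
from collections import defaultdict

# A small, fixed vocabulary of actions. These two names are illustrative,
# not the identifiers used by the actual proposal.
ALLOWED_ACTIONS = {"share", "edit"}

# Maps each action to a list of (site, handler) pairs. The sites are made up.
registry = defaultdict(list)


def register(action, site, handler):
    """A web app declares that it can handle one of the allowed actions."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {action}")
    registry[action].append((site, handler))


def invoke(action, payload, choose):
    """Run an action; `choose` stands in for the picker shown to the user."""
    candidates = registry.get(action, [])
    if not candidates:
        raise LookupError(f"no site registered for: {action}")
    site, handler = choose(candidates)
    return handler(payload)


register("share", "photos.example", lambda url: f"photos.example shared {url}")
register("share", "notes.example", lambda url: f"notes.example shared {url}")

# Simulate the user picking the first entry in the list.
result = invoke("share", "https://example.com/cat.gif", lambda options: options[0])
print(result)  # prints "photos.example shared https://example.com/cat.gif"
```

The fixed ALLOWED_ACTIONS set is the important detail. A site can only volunteer for an action that someone else has already defined.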
But what if we tried again? Apple’s App Intents work differently. Instead of setting a fixed limit on the actions an app can handle, they let apps tell you what they can do. Intents keep track of three simple things: a description of what a chunk of code does, the exact data type it takes as an input, and the exact data type that it provides as an output. It is this simplicity that makes the model so powerful, a concept we refer to in language design as atomicity. What if we built web applications using this style of Intents? If we did this right, we could leverage the work that developers are already doing to architect their applications.
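One way to picture apps "telling you what they can do" is a catalog of self-described intents that can be searched by data type. The Python sketch below is hypothetical throughout: the IntentRecord fields, the example sites, and the use of MIME types to name data types are assumptions for illustration, not part of any existing standard.

```python
from typing import NamedTuple


class IntentRecord(NamedTuple):
    """The three things an intent declares (field names are hypothetical)."""
    description: str
    accepts: str   # input data type, written here as a MIME type
    produces: str  # output data type, written here as a MIME type


# A made-up catalog: each app advertises whatever it is able to do.
catalog = {
    "gif-maker.example": [
        IntentRecord("Assemble frames into a GIF", "image/png", "image/gif"),
    ],
    "editor.example": [
        IntentRecord("Crop an image", "image/png", "image/png"),
        IntentRecord("Add a caption", "image/gif", "image/gif"),
    ],
}


def find(accepts: str, produces: str) -> list[tuple[str, str]]:
    """Return (app, description) pairs for intents matching both types."""
    return [
        (app, intent.description)
        for app, intents in catalog.items()
        for intent in intents
        if intent.accepts == accepts and intent.produces == produces
    ]


matches = find("image/png", "image/gif")
print(matches)  # prints [('gif-maker.example', 'Assemble frames into a GIF')]
```

Nothing here limits what an app may advertise. The question asked of the catalog is "what turns screenshots into a GIF?" rather than "which app should I open?"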
This isn’t such a far-fetched idea. Apple’s platforms have always prioritized web applications, featuring home-screen optimization features for websites even before the iPhone had an App Store. In the last several months, the Safari and Mac teams have demonstrated a renewed zeal for the platform, such as by implementing the Notifications API and the Badging API. Last week, at WWDC, they announced that Web Applications would live on the Mac Dock as first-class citizens, just like on iOS. These features have been around on other platforms, such as Google’s Chromebooks and Android devices, for some time. The line between web applications and native applications has never been so blurry.
So what if web application developers were to register the pieces of their applications that they are already building? Most of us already architect our software into reusable pieces. What if these combinable services were exposed to operating systems and users? Suddenly, the openness and ease of discoverability that is inherent to the web would make these bite-sized computational pieces available at a massive scale. The act of doing something would become almost as simple as searching for how to do it. The source of information and the source of computation would become one and the same.
But let’s take this one step further. In true-to-Web fashion, these bite-sized computational pieces and the recipes we make with them could run on any device. Writing, running, and sharing them between machines would be trivial. Just as shell scripts are easily portable across computers, as long as the commands they depend on are available, we could create chains of Web Intents that would be instantly reusable for any web user.
Think, too, what this would mean for a single user with multiple pieces of hardware. If a Web Intent were instantly available to run on any one of their devices, it wouldn’t matter which device they use to invoke the intent. It wouldn’t even have to be the same one that runs the intent. A user could trigger a workflow from their phone and distribute the workload across all of their available devices. A batch process for manipulating images, for example, could be invoked from a phone and run on a desktop or server.
This is what I think of when I think of a world where we focus on what we want to do, instead of on where we need to do it. It’s one where we can express individual actions, automated workflows, and whole programs, in terms of intents. Instead of focusing on programming languages, environments, and obtuse syntaxes, the act of programming becomes that of drawing a map. We select a starting point, our data, and then we draw a route that transforms that data incrementally through many small steps until it reaches the form and location we want.
Imagine something like Apple’s Shortcuts, but more inclusive, more far-reaching. Imagine selecting and arranging these pieces right in front of your eyes on an Apple Vision Pro, pulling threads between them as one might weave a web or paint on a canvas. Imagine using AI tools, for which the analysis of intent is bread and butter, to compose and manipulate these computational building blocks. Computational thinking would no longer be the express domain of those brave few of us who have stuck around on a CLI. Computer programming would become no different than playing with Legos. The very act of thinking a thing would become the majority of the work in making it so. More succinctly, as Michael Crichton put it, in Sphere: “Imagining it is what makes it happen.”
In researching this article, I came across the work of Charles Simonyi. He is a Berkeley graduate and a veteran of several amazing teams, such as Xerox PARC and Microsoft’s Application Development team. In the early 2000s, Simonyi left Microsoft to start a company called Intentional Software, which focused on his concept of “Intentional Computing.” It was a surprising discovery for me, as I had already chosen the title for this article. It doesn’t look as though the project ever saw the light of day, though. Microsoft purchased the company in 2017, and I haven’t been able to find any later references.
In 2007, Jason Pontin wrote about the project in a New York Times article titled Awaiting the Day When Everyone Writes Software. First, he listed all of the reasons why we should want to make this kind of computation accessible to more people: professional, low-level programs are hard to write and their programmers are expensive to hire. These are often framed as “business problems,” but the consequences go far beyond matters of business. To this point, there was a quote by Simonyi that caught my eye. He said: “Software is truly the bottleneck in the high-tech horn of plenty.”
Intentful Computation is not just a fun science project. It’s a tool for turning thoughts into actions. It’s a tool that helps us to easily express ourselves in concrete, rigorous, actionable language. It goes further and makes this kind of power accessible to everyone, not just experienced developers. It will take the creation of new infrastructure, such as a universal, web-oriented Intents system. It will take training and a re-thinking of the ways we interact with computers. Neither of these is a simple problem. I believe, though, that if we can imagine it, we can make it happen.