Microsoft and Facebook both recently announced that they will be investing heavily in bots. Each company has created an accompanying framework for developing chatbots, which you can use to deploy your own bots on their respective services—Skype and Messenger. Microsoft’s Satya Nadella is so bullish on bots that he spun up a Domino’s chatbot in real time when he made the company’s announcement.
x.ai has long believed that the future is not apps and that a new paradigm will emerge in which you use natural language to interact with software. These new frameworks make that case even more strongly than we ever could.
But in the rush to bring forth this new reality, what a bot is and what it can do are not so obvious.
When people talk about bots they might be referring to one of several things: a conversational UI, a simple software program (bot) that you interact with via a conversational UI, or an intelligent agent.
Let’s break these down: the first is pretty simple. Since the advent of the graphical user interface (GUI) in the early seventies, which Apple popularized a decade later, we’ve relied primarily on visual metaphors to interact with software; you click and drag a “file,” you select a “button,” you pull down a “menu” and then “highlight” a phrase in order to accomplish whatever task you are working on, whether that’s formatting a cell in a spreadsheet or ordering your lunch on Seamless. These days, you rarely (and for most people, never) enter a string of machine-readable text into a command line to operate software.
Conversational User Interfaces dispense with the pointing, swiping, dragging, tapping, and toggling that we’ve grown accustomed to. Instead, you interact with software on platforms like email, SMS, Slack, and, now, Skype and Messenger, using only text or voice (which is then translated into text). But unlike the command line of the past, which required machine syntax, these conversational interfaces let you interact with software using only natural language (more or less). Many different types of software can exploit conversational UI—from simple chatbots like the Domino’s bot to more complex intelligent agents like ours (more on that later).
Chatbots are simple pieces of software that you interact with via a conversational UI. Statsbot, for example, is one of several dozen chatbots that you can use on Slack. Basically, you type @statsbot to summon the bot, and then ask it questions about your key metrics. Statsbot then delivers the results into Slack (in the form of charts and tables generated by Google Analytics). Statsbot is essentially a conversational layer on GA. It doesn’t add any real intelligence; rather, it gives the user a new way to access analytics data (simple text in a Slack channel) and a new place to view it (that same channel).
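To make the “conversational layer” idea concrete, here is a minimal sketch of how such a chatbot works: it listens for its @mention, parses the request, looks up an answer in a backing data source, and formats a reply. Everything here—the function names, the canned metrics—is invented for illustration; it is not Statsbot’s actual implementation or the Google Analytics API.

```python
import re

# Stand-in for an analytics backend (in Statsbot's case, Google Analytics).
# Hypothetical data, purely for illustration.
FAKE_METRICS = {
    "pageviews": 12840,
    "sessions": 5312,
    "bounce rate": "47%",
}

def handle_message(text: str) -> str:
    """Respond only when the bot is summoned with @statsbot."""
    match = re.match(r"@statsbot\s+(.*)", text.strip(), re.IGNORECASE)
    if not match:
        return ""  # message isn't addressed to the bot; stay silent
    query = match.group(1).lower()
    # The "understanding" is just keyword matching against known metrics.
    for metric, value in FAKE_METRICS.items():
        if metric in query:
            return f"{metric}: {value}"
    return "Sorry, I don't know that metric."

print(handle_message("@statsbot how many pageviews did we get?"))
# -> pageviews: 12840
```

Note how thin the layer is: one question in, one answer out, with no memory of the conversation and no goal beyond the current request—which is exactly the limitation described next.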
As Statsbot suggests, these bots are really question-and-answer machines that run in real time. They might help with a task, but they don’t understand the job you are trying to get done, and they can’t run off and work through multiple steps over many hours or days to finish it.
The Microsoft and Facebook bot development tools are quite impressive, and I have no doubt that ingenious engineers will build chatbots to do all sorts of things we haven’t imagined yet. Right now, though, they’re brand new, and so the bots we’ve seen thus far have been very basic. Facebook’s shopping chatbot (Spring), for example, shifts the interaction from the web to Messenger without adding any additional functionality—and in fact you are still doing a fair bit of pointing and clicking. There’s nothing wrong with that, and it may be far easier to order shoes or flowers via text than via the web (it certainly is if you’re already texting with friends).
These early chatbots speak to the challenges of building real intelligence behind a conversational UI—which brings us to Amy Ingram.
Intelligent agents, which get lumped into this “bot” universe, often operate through a conversational UI, but not always. The Google self-driving car is an intelligent agent that you can operate by pushing a start button as well as by speaking to it.
No matter the interface, intelligent agents share one key characteristic: they are fully autonomous. x.ai is building Amy Ingram, an intelligent agent that schedules meetings for you. All you do is cc firstname.lastname@example.org, and she’ll take over the tedious email ping pong that comes with setting up a meeting. You won’t hear from her again until she has successfully negotiated a day, time, and location with your guest. She doesn’t help you set up a meeting; she does the whole job for you, just like a human assistant would.
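The difference from a chatbot can be sketched in code. Where the chatbot answers one question and stops, a goal-based agent owns the whole job and keeps going until the goal is met. The toy below negotiates a meeting by proposing slots in turn until one is accepted; the negotiation logic and all names are hypothetical simplifications, not how Amy actually works.

```python
def negotiate_meeting(host_slots, guest_responses):
    """Pursue the goal (a confirmed time slot) across multiple rounds.

    host_slots: ordered list of slots the host can do.
    guest_responses: simulated guest replies, keyed by proposed slot.
    Returns the first accepted slot, or None if negotiation fails.
    """
    for slot in host_slots:
        reply = guest_responses.get(slot, "decline")
        if reply == "accept":
            return slot  # goal achieved: meeting confirmed
        # Otherwise the agent autonomously proposes the next slot,
        # without checking back in with the host at each step.
    return None

host_slots = ["Tue 10:00", "Wed 14:00", "Thu 09:00"]
guest_responses = {"Tue 10:00": "decline", "Wed 14:00": "accept"}
print(negotiate_meeting(host_slots, guest_responses))
# -> Wed 14:00
```

The loop is the point: the agent carries state toward a goal across many exchanges, which in Amy’s case means whole email threads unfolding over hours or days rather than a dictionary lookup.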
Intelligent agents themselves come in a wide variety. Amy is goal based, which means she is designed to achieve a specific objective (setting up a meeting). I’ve written before about the technical challenges of training a machine to understand scheduling conversations. Suffice it to say, to achieve their goals, intelligent agents must understand all the nuances of their given domain, and this makes them hard to build (which is probably a major understatement if you ask the x.ai Data Science teams). It’s taken us nearly three years to develop Amy (and her brother Andrew), and we are still training her.
I would like to add a few more examples of intelligent agents here. The reason I can’t is simply that we’ve only very recently reached the moment when it’s technically possible to create them. I expect that chatbots will proliferate quickly, and some might even turn into intelligent agents over time. But it will take years before we see more than a handful of truly intelligent agents. That said, if you know of any intelligent agents—ones that can take over an entire job start to finish—please post them in the comments. Amy could use some friends.