Voice Assistant Concepts
The basic building blocks of Slang Assistants
The role of the Slang Voice Assistant is to let users of the app it is integrated with complete one or more User Journeys by voice, either fully or partially, aided by touch. Technically, this works through back-and-forth interactions between the Assistant and the app. The Assistant repeatedly notifies the app of the User Journey the user requested, along with the accompanying data. The app responds by informing the Assistant of the App State transitions it makes, which lets the Assistant speak appropriate messages or ask the user for more information.
To integrate Slang's Voice Assistant with your app effectively, you need to be familiar with the following concepts.
Slang CONVA provides Voice Assistants that are specific to a Domain, which corresponds to a real-world domain in which businesses and apps exist, such as Retail, Travel, Hospitality, and Finance. Each Voice Assistant in a Domain supports a specific set of User Journeys for which it has been pre-trained.
Each Domain consists of one or more Sub-domains. A Sub-domain corresponds to a data variant that can exist within a Domain. For example, the Retail Domain could have Sub-domains for Grocery and Pharmacy, and the Travel Domain could have Sub-domains for Flights, Trains, and Buses.
Typically, Slang's Voice Assistants for each Domain provide out-of-the-box support for one or more Sub-domains, and some also allow Custom Sub-domains, for which the customer is expected to upload data specific to that Sub-domain.
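To make the Domain and Sub-domain hierarchy concrete, here is a minimal Python sketch that models it as plain data classes. The class names, the `allows_custom` flag, and the `add_custom_sub_domain` helper are hypothetical illustrations of the concepts above, not part of any Slang SDK.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SubDomain:
    name: str
    # True when the customer uploads the data for this Sub-domain.
    custom: bool = False


@dataclass
class Domain:
    name: str
    sub_domains: List[SubDomain] = field(default_factory=list)
    # Hypothetical flag: only some Domains accept Custom Sub-domains.
    allows_custom: bool = False

    def add_custom_sub_domain(self, name: str) -> SubDomain:
        """Register a Custom Sub-domain whose data the customer supplies."""
        if not self.allows_custom:
            raise ValueError(f"{self.name} does not accept Custom Sub-domains")
        sub = SubDomain(name, custom=True)
        self.sub_domains.append(sub)
        return sub

    def sub_domain_names(self) -> List[str]:
        return [s.name for s in self.sub_domains]


# The examples from the text, expressed as data (out-of-the-box Sub-domains).
retail = Domain("Retail", [SubDomain("Grocery"), SubDomain("Pharmacy")])
travel = Domain("Travel", [SubDomain("Flights"), SubDomain("Trains"), SubDomain("Buses")])
```

Keeping the `custom` flag on each Sub-domain makes it easy to tell pre-trained data apart from customer-uploaded data.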
In general, a user journey is a path a user may take to reach their goal when using a web or mobile app.
In Slang's context, User Journeys represent the journeys or functionalities supported by the Assistant, each of which leads the user to some logical end-point inside an app from that Domain.
A User Journey can consist of an initial App State, a final App State, and zero or more intermediate App States.
A User Journey begins in one of two ways:
The user speaks a command that the Assistant recognizes as the beginning of a new User Journey, e.g. "show me onions" or "where is my last order". The Assistant analyzes the utterance, determines which User Journey it belongs to, and calls the appropriate method (e.g. onSearch or onOrderManagement) that the app is expected to implement (more details in the Getting Started section).
The app launches an explicit User Journey programmatically, e.g. to let the user fill a form, optionally by voice, as soon as they land on a relevant page.
Sample user journeys: Search, Order management, Offers
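The two entry points can be sketched in Python as a toy Assistant that either classifies an utterance and calls the matching app handler, or simply records a journey the app started itself. Everything here is hypothetical: `on_search` and `on_order_management` only mirror the onSearch and onOrderManagement names from the text, the keyword matcher is a stand-in for the Assistant's pre-trained recognition, and the real SDK callback interface is described in Getting Started.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple


class MyAppActions:
    """Hypothetical app-side handlers, one per User Journey."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Dict[str, Any]]] = []

    def on_search(self, context: Dict[str, Any]) -> None:
        self.calls.append(("search", context))

    def on_order_management(self, context: Dict[str, Any]) -> None:
        self.calls.append(("order_management", context))


def recognize_journey(utterance: str) -> str:
    """Toy keyword matcher; NOT how the Assistant actually classifies speech."""
    return "order_management" if "order" in utterance.lower() else "search"


class ToyAssistant:
    def __init__(self, app: MyAppActions) -> None:
        self.active_journey: Optional[str] = None
        self._handlers: Dict[str, Callable[[Dict[str, Any]], None]] = {
            "search": app.on_search,
            "order_management": app.on_order_management,
        }

    def handle_utterance(self, utterance: str) -> None:
        # Way 1: the user speaks and the Assistant decides which journey it is.
        journey = recognize_journey(utterance)
        self.active_journey = journey
        self._handlers[journey]({"utterance": utterance})

    def start_journey(self, journey: str) -> None:
        # Way 2: the app starts a journey itself, e.g. when a form page opens.
        if journey not in self._handlers:
            raise ValueError(f"unknown journey: {journey}")
        self.active_journey = journey


app = MyAppActions()
assistant = ToyAssistant(app)
assistant.handle_utterance("show me onions")          # routed to on_search
assistant.handle_utterance("where is my last order")  # routed to on_order_management
```

In this sketch the programmatic path does not call a handler; it only sets the journey the Assistant has an affinity for.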
Context is the data associated with every User Journey: the data the Assistant extracted from what the user spoke, from what the user did on the UI, or that the app wants to pre-inform the Assistant of.
Sample Context items: Item, brand, quantity, unit, source & destination stations
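As an illustration, a retail Search Context could be modeled as an immutable record whose fields are filled from different sources and merged. This is a hypothetical sketch: the `SearchContext` class and its `merged_with` rule (newer non-empty values win) are assumptions, and only the retail sample items are included.

```python
from dataclasses import dataclass, fields, replace
from typing import Optional


@dataclass(frozen=True)
class SearchContext:
    """Hypothetical Context for a retail Search journey; every field is
    optional because the user may not mention all of them."""
    item: Optional[str] = None
    brand: Optional[str] = None
    quantity: Optional[float] = None
    unit: Optional[str] = None

    def merged_with(self, update: "SearchContext") -> "SearchContext":
        """Return a new Context where fields set in `update` override this one."""
        changes = {
            f.name: getattr(update, f.name)
            for f in fields(update)
            if getattr(update, f.name) is not None
        }
        return replace(self, **changes)


# Pre-informed by the app, e.g. the user is already browsing a brand page.
from_app = SearchContext(brand="aashirvaad")
# Extracted by the Assistant from the spoken "2 kg aata".
from_voice = SearchContext(item="aata", quantity=2, unit="kg")
context = from_app.merged_with(from_voice)
```

A later change on the UI, such as the user editing the quantity, can be folded in the same way with another `merged_with` call.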
An app state is a point in the user journey path that represents a specific form and context the app is in or wants to be in, based on the data it received from the Assistant or directly from the end-user via the screen.
That's a mouthful. Let's try to simplify it.
From a developer's perspective, in its simplest form, you can think of the various screens through which a user transitions while completing a User Journey as the various App States for that journey.
As an example, here are some sample App States for the Search User Journey.
The Search User Journey is triggered when the user searches for an item, optionally giving enough details to uniquely identify it (e.g. "aashirvaad 2 kg aata": here the brand and size variant are also specified, so the app can potentially map the request to a specific SKU in a retail e-commerce app).
SEARCH_RESULTS - when the app is showing the search listing based on the input it received from the user
PRODUCT_DETAILS - when the app is showing a product detail page that is specific to the item that the user searched for
CART_DETAILS - when the app wants to automatically add the item to cart based on what the user asked for
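The following Python sketch shows how an app might decide which of these App States to move to after a search command. The enum, the one-entry `CATALOG`, and the `choose_app_state` rules are all hypothetical; each app maps searches onto its own screens in its own way.

```python
from enum import Enum
from typing import FrozenSet, Optional, Tuple


class SearchAppState(Enum):
    SEARCH_RESULTS = "SEARCH_RESULTS"
    PRODUCT_DETAILS = "PRODUCT_DETAILS"
    CART_DETAILS = "CART_DETAILS"


# Hypothetical catalog of SKUs keyed by (brand, item, size).
Sku = Tuple[str, str, str]
CATALOG: FrozenSet[Sku] = frozenset({("aashirvaad", "aata", "2 kg")})


def choose_app_state(item: str,
                     brand: Optional[str] = None,
                     size: Optional[str] = None,
                     auto_add_to_cart: bool = False) -> SearchAppState:
    """Pick the App State (screen) to show for a search command."""
    if brand and size and (brand, item, size) in CATALOG:
        # The request maps to exactly one SKU.
        if auto_add_to_cart:
            return SearchAppState.CART_DETAILS
        return SearchAppState.PRODUCT_DETAILS
    # Not specific enough, or no matching SKU: show a search listing.
    return SearchAppState.SEARCH_RESULTS
```

For instance, `choose_app_state("aata", "aashirvaad", "2 kg")` lands on PRODUCT_DETAILS, while a bare "onions" search lands on SEARCH_RESULTS.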
An App State is fully self-sufficient: it is descriptive enough to give the Slang Assistant all the necessary information by itself. It can optionally be enhanced with more details (conditions) that represent the true state of the application at that point more precisely.
For example, in the SEARCH_RESULTS App State (which should be triggered when the app wants to show a search listing based on a search command the user gave), the following additional conditions are available for the app to give the Assistant a more precise picture of its state:
SearchSuccess - when the app is showing the user the search listing
ItemNotFound - when the app is not able to find the item being searched
ItemInvalid - when the app is not able to match the request to any existing SKU item
As another example, for the PRODUCT_DETAILS App State (which should be triggered when the app was able to narrow the search command down to a specific item), the following conditions are available to the app:
QuantityRequired - when the app wants to know the quantity the user is interested in
AddToCartSuccess - when the app has added the item to cart
In the QuantityRequired case, the Assistant automatically prompts the user to specify the quantity and informs the app of it, while preserving the details of the User Journey up to that point.
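The pairing of App States and conditions can be sketched as a lookup table plus a check on what the app reports. This is a hypothetical model: the plain strings mirror the names in the text, while the real SDK exposes its own types for them. CART_DETAILS is included with no conditions because none are listed for it above.

```python
from typing import Dict, FrozenSet, Optional, Tuple

# Conditions listed in the text for each App State.
CONDITIONS: Dict[str, FrozenSet[str]] = {
    "SEARCH_RESULTS": frozenset({"SearchSuccess", "ItemNotFound", "ItemInvalid"}),
    "PRODUCT_DETAILS": frozenset({"QuantityRequired", "AddToCartSuccess"}),
    "CART_DETAILS": frozenset(),  # no conditions listed in the text
}


def app_state_update(state: str,
                     condition: Optional[str] = None) -> Tuple[str, Optional[str]]:
    """Build the (App State, condition) pair the app reports to the Assistant.

    An App State is self-sufficient, so the condition may be omitted; when it
    is given, it must be one of the conditions defined for that state.
    """
    if state not in CONDITIONS:
        raise ValueError(f"unknown App State: {state}")
    if condition is not None and condition not in CONDITIONS[state]:
        raise ValueError(f"{condition} is not a condition of {state}")
    return (state, condition)
```

Validating the pair on the app side catches mismatches such as reporting QuantityRequired from a search listing.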
Every App State has a default condition. Conditions fall into two categories:
Non-terminal Conditions - Conditions that conceptually represent points in the app where the journey is not complete. Moving ahead requires some input from the user, which the Assistant will collect; the app could also allow collecting the same information from the screen. An example is when the app needs the user to specify the quantity of an item they asked to add to the cart.
Terminal Conditions - Conditions that conceptually represent points in the app where the journey has ended. The Assistant speaks out the details of the state to the user, and the app also shows something related to that App State on the screen. An example is when the app is showing a search result and wants the Assistant to announce that the listing is being shown.
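A toy Python model of one journey on the Assistant side shows how the two categories differ: a non-terminal condition leaves the journey open and waiting for input, which is stored alongside the Context gathered so far, while a terminal condition ends it. `JourneySession`, `REQUIRED_INPUT`, and the classification table are hypothetical; QuantityRequired (non-terminal) and SearchSuccess (terminal) follow the text, and treating AddToCartSuccess as terminal is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

# True = terminal. AddToCartSuccess being terminal is an assumption.
TERMINAL: Dict[str, bool] = {
    "SearchSuccess": True,
    "AddToCartSuccess": True,
    "QuantityRequired": False,
}
# Which input each non-terminal condition asks for (hypothetical).
REQUIRED_INPUT: Dict[str, str] = {"QuantityRequired": "quantity"}


@dataclass
class JourneySession:
    context: Dict[str, Any] = field(default_factory=dict)
    pending_input: Optional[str] = None
    finished: bool = False

    def on_condition(self, condition: str) -> None:
        if TERMINAL[condition]:
            # Terminal: the journey has ended; the Assistant speaks the result.
            self.finished = True
            self.pending_input = None
        else:
            # Non-terminal: more input is needed before the journey can move on.
            self.pending_input = REQUIRED_INPUT[condition]

    def on_user_input(self, value: Any) -> Dict[str, Any]:
        """Add the collected value to the Context gathered so far."""
        if self.pending_input is None:
            raise RuntimeError("no input is pending")
        self.context[self.pending_input] = value
        self.pending_input = None
        return dict(self.context)


# The app reports PRODUCT_DETAILS + QuantityRequired, the user says "2",
# then the app reports AddToCartSuccess.
session = JourneySession(context={"item": "aata", "brand": "aashirvaad"})
session.on_condition("QuantityRequired")
after_input = session.on_user_input(2)
session.on_condition("AddToCartSuccess")
```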
The final piece of the puzzle is what the Assistant speaks back to the user. The Assistant currently handles four types of prompts:
Greeting prompts are the messages the Assistant speaks when the user starts using it, e.g. "Welcome to Retail Assistant. What item do you want to buy?".
There are two types of Greeting prompts:
Global greeting - The greeting that is spoken when the Assistant is invoked without any explicit user journey affinity. Eg: "Welcome to BigBasket. What do you want to do today?"
User journey specific greeting - The greeting that is spoken when the Assistant is invoked having an affinity for a specific user journey. Eg: "What item do you want to buy?"
Clarification prompts are the messages the Assistant speaks whenever it does not understand what the user is saying, e.g. "Sorry I did not understand. Please try again".
Likewise, there are two types of Clarification prompts:
Global clarification - The clarification message that is spoken when the Assistant is invoked without any explicit user journey affinity. Eg: "Sorry I did not understand. Please try again"
User journey specific clarification - The clarification that is spoken when the Assistant is invoked having an affinity for a specific user journey. Eg: "Sorry I did not understand that item. Please try again". Note the word "item", which is what the prompt would mention if the user journey were Search.
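Both Greeting and Clarification prompts follow the same selection pattern, sketched below: use the journey-specific text when the Assistant has an affinity for a journey, otherwise the global text. The prompt strings are the examples above; the dictionary layout is hypothetical, and falling back to the global prompt when a journey defines no specific one is an assumption.

```python
from typing import Dict, Optional

GLOBAL_PROMPTS: Dict[str, str] = {
    "greeting": "Welcome to BigBasket. What do you want to do today?",
    "clarification": "Sorry I did not understand. Please try again",
}
JOURNEY_PROMPTS: Dict[str, Dict[str, str]] = {
    "search": {
        "greeting": "What item do you want to buy?",
        "clarification": "Sorry I did not understand that item. Please try again",
    },
}


def resolve_prompt(kind: str, journey: Optional[str] = None) -> str:
    """Return the journey-specific prompt if one exists, else the global one."""
    if journey is not None:
        specific = JOURNEY_PROMPTS.get(journey, {}).get(kind)
        if specific is not None:
            return specific
    return GLOBAL_PROMPTS[kind]
```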
The third type of prompt is the message the Assistant speaks when asking the user for very specific input, e.g. "Please mention the amount you want to buy?".
This happens when the app sets a non-terminal App State condition.
The fourth type of prompt is the message the Assistant speaks when it wants to tell the user some information, e.g. "Showing you onions".
This happens when the app sets a terminal App State condition.
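The link between condition categories and the last two prompt types can be sketched as a single function: a non-terminal condition produces a request for input, and a terminal one produces an informational message. The type labels "ask" and "inform" and both message templates are hypothetical, modeled loosely on the examples above.

```python
from typing import Optional, Tuple


def prompt_for_condition(terminal: bool,
                         missing_field: Optional[str] = None,
                         result: Optional[str] = None) -> Tuple[str, str]:
    """Return (prompt type, text) for the condition the app just set."""
    if terminal:
        # Terminal condition: tell the user what the app is now showing.
        if result is None:
            raise ValueError("a terminal condition needs a result to announce")
        return ("inform", f"Showing you {result}")
    # Non-terminal condition: ask the user for the missing input.
    if missing_field is None:
        raise ValueError("a non-terminal condition needs the missing field")
    return ("ask", f"Please mention the {missing_field} you want to buy")
```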