What is a voice user interface? A comprehensive guide

Voice is a trendy medium for interacting with computing devices, and it has been for years now. From popular virtual assistants like Siri to voice-controlled speakers, Voice User Interfaces (VUIs) have generated much enthusiasm among brands and their end users alike. For instance, over 80 million adults in the USA use smart speakers, according to a Voicebot survey.

But what is a VUI? And why is this becoming increasingly popular today?

VUIs play a crucial role in how AI is impacting UX and other fields today: the ability to converse with machines as you would with a human is a fascinating capability that offers many benefits to users. And this is only the beginning; studies project that the voice recognition industry will be worth $25 billion by 2025.

In this article, we dive into voice user interfaces, their current limitations and benefits, and important tips on designing them. Let's get started!

What is a Voice User Interface?

A user interface refers to the parts of a digital product that a user interacts with while using it. Designing a user interface requires knowledge of important UI principles and a keen understanding of your potential users.

A voice user interface is, therefore, an interface that relies on verbal interactivity — the use of the human voice as a medium to "communicate" with a digital product. These could be smart speakers, voice-enabled phones, AI assistants, etc.  

The purpose of this interaction is to achieve a goal. At the core of voice-driven interfaces, therefore, is the need to ensure that users achieve their goals with greater ease than traditional means can offer: in a word, convenience.

To fully understand this, let's take a look at the progression of user interfaces over the years.

A brief history of user interfaces

Command line interfaces: the beginning  

The first IBM PC, released in 1981, came with Microsoft's MS-DOS, a command-line interface that required users to type and execute text commands. This was tedious.

Graphic user interfaces: what changed everything

Apple revolutionized the way we interact with machines by popularizing the graphical user interface with the 1984 Apple Macintosh. It can be argued that this played a direct role in the explosion of the tech industry.

Voice user interfaces, virtual and augmented reality: where we are now

With recent developments in technology, notably AI and machine learning, we are beginning to see more sophisticated ways to interact with machines. People can now use voice commands instead of text or tedious gestures.

So, let's look at how voice user interfaces actually work.


How do voice-driven interfaces work?

Humans have been communicating through speech since the dawn of time; it is an inherent part of us. Hence, giving voice commands isn't something alien to us. The problem lies with the receiver.

With computers and other electronic gadgets, going from writing complex commands to speaking to devices as you would to a human is surely a big step.

But how exactly do they achieve this ability and what is the role of the VUI/UX designer in this?

The process of designing conversational interfaces is a complex one involving several professionals, from designers to data analysts and software developers. That said, VUIs are essentially built on AI technology, which includes Natural Language Processing techniques like speech recognition and synthesis, Named Entity Recognition, etc.

A VUI collects voice input and processes it in a backend, usually hosted in the cloud. A response matching the identified intent is then sent back.
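
To make that loop concrete, here is a minimal, heavily simplified sketch in Python. It assumes the speech has already been transcribed to text and uses naive keyword matching in place of a real NLU model; the intent names, keywords, and responses are purely illustrative.

    # Minimal sketch of the request/response loop described above.
    # Assumes speech has already been transcribed to text; real systems use
    # trained speech-recognition and NLU models, not keyword matching.

    INTENT_KEYWORDS = {
        "play_music": ["play", "song", "music", "tune"],
        "get_weather": ["weather", "forecast", "temperature"],
    }

    RESPONSES = {
        "play_music": "Playing some music for you.",
        "get_weather": "Here is today's forecast.",
        "fallback": "Sorry, I didn't catch that. Could you rephrase?",
    }

    def detect_intent(transcript: str) -> str:
        """Match the transcribed utterance against known intents (very naive)."""
        words = transcript.lower().split()
        for intent, keywords in INTENT_KEYWORDS.items():
            if any(keyword in words for keyword in keywords):
                return intent
        return "fallback"

    def handle_voice_input(transcript: str) -> str:
        """Backend step: map the recognized text to an intent and return a response."""
        return RESPONSES[detect_intent(transcript)]

    print(handle_voice_input("Play me a nice rock tune"))    # -> Playing some music for you.
    print(handle_voice_input("Tell me more about America"))  # -> fallback response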

Designers have to conduct research to understand user pain points and the type of questions or commands they can possibly give.

Benefits of Voice User Interfaces

Speech user interfaces provide some benefits that other interfaces cannot. Some of them include:

Speed

Using voice commands is faster than typing on a keyboard or operating a touch screen. With ordinary speech, you can get devices to play a particular song or search the internet for the meaning of a phrase. VUIs help us interact with computers using the method we know best: natural language.

This streamlines the whole task by eliminating physical effort, thereby increasing the speed with which we achieve our goals.

Convenience

At the core of voice-controlled interfaces is convenience. The comfort of simply using natural speech to interact with sophisticated technology cannot be overstated. VUIs ensure we can achieve different objectives without having to perform complex work. We simply speak, perhaps half asleep or while driving, and our command is interpreted anyway.

Accessible design

VUIs eliminate a host of accessibility concerns. In essence, they offer a touch-free and vision-free way of interacting with computers. For users with visual impairments or other limitations, VUIs provide a medium where they can simply speak and get a response. In regular interfaces, people can struggle with poor color contrast, making sense of visual clutter, and so on. VUIs eliminate these problems and many more.

Ease of use

Since your voice is the main thing you need when using VUIs, there's no special learning curve or set of rules you need to stick to. You can use VUIs comfortably without sitting through a complex onboarding flow or finding a skills shortcut, because speech is something you are naturally equipped to handle.

Productivity

To be productive as a UX designer, especially as a freelancer, you need to devise ways to manage your time and resources properly.

For instance, with the developments in virtual assistants, you can easily request information of interest using voice commands while focusing on other work.


Limitations of voice interfaces

Though VUIs have been widely welcomed, this exciting technology still faces challenges and can pose problems for the people who use it.

They are resource intensive

From receiving voice data to processing it and returning a suitable answer or carrying out a command, effective voice-controlled devices and virtual AI assistants require serious computing power and resources. One of the limitations of AI-based technologies is the cost of running such systems.

Thanks to developments in cloud computing, the costs of storage and hosting services are improving steadily, but a lot still needs to be done.

Adapting UX processes to voice design

The field of user experience design has, from its inception, focused almost solely on visual interfaces. This has led to the development of processes and principles centered on such interfaces.

Voice design, as it is sometimes called, poses new challenges, as not much has been done in this field yet. As with every new field, time is needed to fully develop best practices around it. For UX designers who are already accustomed to GUIs, switching from visual to voice will require more time.

Conversational systems are complex

VUIs have gradually progressed from simple control systems to conversational ones that attempt to hold human-like conversations. Building conversational interfaces involves a lot of complexity and massive amounts of data.

For instance, there are thousands of human languages and dialects out there to worry about. While developing a virtual assistant in English, which variety should it understand: British or American?

The complexity of such systems also means that exceptionally skilled professionals are required to develop and maintain them.

Data privacy

As with everything AI-based, you need a lot of data. The data is what the AI systems are trained on. The higher the quantity and quality of the data, the more fine-tuned and impressive the responses of a voice interface will be.

Accessing this data raises serious challenges for designers. Different countries have strict privacy regulations, and violating them in the course of obtaining data can result in costly lawsuits.

Popular Voice User Interfaces

Some of the well-known voice-driven interfaces include:

  • Apple's Siri
  • Amazon's Alexa
  • Google Assistant
  • Microsoft's Cortana

Important UX steps in designing a Voice User Interface

Regardless of the nature of the interface you want to develop, the human-centered innovation process known as the design thinking framework is important. In VUI design, designers can carry out their tasks by closely following its iterative design process.

Just like in UX design, the goal of this endeavor is to place the user front and center of your solution. It is important to remember this fundamental requirement. There may be a temptation to introduce your own opinions, but doing so will only result in flawed data and an AI bot that delivers wrong answers in a confident voice.

The design thinking framework consists of five distinct stages of product design best practices. Here are the most notable ways designing for voice differs from regular UX design.

Conducting research

User research

The fundamental goal of user research in voice design is to identify the questions the users are asking and how they are generally using voice features in their day-to-day activities.

  • What do users use voice features for?
  • Do they ask questions or give commands, and what type of questions/commands?

These could be important questions to factor into the design of a voice app.

A user journey map is one tool that can help designers make sense of users' goals, motivations and frustrations. In more complex voice products, it can help them recognise not just the appropriate questions to support but also where to add voice features.

Competitive analysis

The next important piece of research a voice designer should perform is competitive analysis.

Competitive analysis simply means researching a product's rivals and analysing their strengths and shortcomings, in the hope of gaining insight into what works and how to raise the standard with your own unique product offering.

In VUI design, the goal is to focus on other similar voice-based products and identify their use cases, the commands they use, customer reviews, etc.


If you are overwhelmed or confused by the above, take a look at this article about what UX research is.

Designing the VUI

VUI designers have to create dialog flows: different conversational scenarios based on the requirements they obtain from research. These dialog flows represent the interaction model between the users and the AI assistant.

The voice commands which make up these dialog flows contain three components:

The intent

The intent refers to the objective of the command. Why is the user giving the command, and what response do they expect from the interface?

For instance, if a user says "Siri, play some rock music", this is a pretty straightforward command, and the user expects to be provided with some rock music. This kind of command is called a high-utility request.

On the other hand, a request like "tell me more about America" isn't so specific: is the user asking about the capital, the population, or the economy? These sorts of requests are called low-utility requests.

The utterance

This refers to how the user frames the command. Knowing and planning for various ways a user can make a request is essential for a robust interface.

"Play a rock song" can still come as "play me a rock tune" or "I want to hear a nice rock song."

Preparing voice assistants for diverse word phrasings can help them deliver more effective responses.

The slot

Some parts of the command are variables, sometimes optional ones, that add more depth to the command or better describe the request in a way that is necessary for obtaining a meaningful response from the AI assistant.

For instance, if a user wants to book a taxi, a slot to be filled could be the "time": instead of the default "as soon as possible", the user can opt for a specific time.
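
To tie the three components together, here is a hypothetical sketch of how a single dialog-flow command could be modelled in Python. The field names and the taxi-booking example are illustrative assumptions rather than any specific platform's schema.

    # Hypothetical model of one dialog-flow command, combining intent,
    # utterances and slots. Field names are illustrative, not a real schema.
    book_taxi_command = {
        "intent": "book_taxi",                  # the user's objective
        "sample_utterances": [                  # different phrasings of the same request
            "book me a taxi",
            "I need a cab",
            "get me a ride to the airport",
        ],
        "slots": {                              # variables that refine the request
            "pickup_time": {"type": "time", "default": "as soon as possible"},
            "destination": {"type": "place", "required": True},
        },
    }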

Prototyping and Testing

Once the designer creates and maps out the dialog flows between the users and the AI assistant, they can use voice prototyping tools like Sayspring to test them in a voice-enabled Amazon or Google app.

Amazon and Google also offer their own tools and SDKs for building and testing these voice solutions for apps on their platform.

Here's the interesting part: these tools can help you obtain analytics after you have deployed your system. Some important UX metrics to monitor include:

  • Behaviour flows
  • Commands, intents and utterances
  • Sessions per user
  • Languages used
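
As a rough illustration of how such metrics could be derived, here is a small Python sketch that logs each interaction as a structured event and then computes sessions per user and intent frequencies. The event fields are assumptions; in practice, Amazon's and Google's tools expose this data through their own dashboards.

    # Illustrative logging of voice interactions so the metrics above can be computed.
    import time
    from collections import Counter

    interaction_log = []

    def log_interaction(user_id, session_id, utterance, intent, language):
        """Record one voice interaction as a structured event."""
        interaction_log.append({
            "timestamp": time.time(),
            "user_id": user_id,
            "session_id": session_id,
            "utterance": utterance,
            "intent": intent,
            "language": language,
        })

    def sessions_per_user():
        """Count distinct sessions for each user from the logged events."""
        sessions = {}
        for event in interaction_log:
            sessions.setdefault(event["user_id"], set()).add(event["session_id"])
        return {user: len(ids) for user, ids in sessions.items()}

    log_interaction("user-1", "s-1", "play me a rock tune", "play_music", "en-US")
    log_interaction("user-1", "s-2", "what's the weather", "get_weather", "en-GB")
    print(sessions_per_user())                            # {'user-1': 2}
    print(Counter(e["intent"] for e in interaction_log))  # intent frequencies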

Tips for designing VUIs

Here are some tips to ensure you design speech user interfaces that meet user expectations and needs:

  • Use feedback to show when a task is completed; don't keep users wondering what happens next.
  • Implement a solid error control strategy and make provisions to bridge gaps in communication (see the sketch after this list).
  • Consider employing additional security measures for your interface, e.g. additional authentication methods like fingerprint or face recognition.
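
As a small sketch of the feedback and error-control tips above, the snippet below confirms completed tasks explicitly and reprompts with guidance when the input isn't understood. The intents and phrasings are illustrative assumptions.

    # Sketch of feedback and error control: confirm completions, reprompt on failure.
    from typing import Optional

    KNOWN_INTENTS = {
        "play_music": "Okay, playing some rock music now.",
        "set_alarm": "Done. Your alarm is set for 7 a.m.",
    }

    MAX_REPROMPTS = 2

    def respond(intent: Optional[str], reprompt_count: int = 0) -> str:
        if intent not in KNOWN_INTENTS:
            if reprompt_count < MAX_REPROMPTS:
                # Bridge the communication gap instead of leaving the user guessing.
                return "Sorry, I didn't catch that. You can say things like 'play some music'."
            return "I'm still having trouble understanding. Let's try again later."
        # Explicit feedback confirms that the task was completed.
        return KNOWN_INTENTS[intent] + " Anything else?"

    print(respond("play_music"))  # confirmation of a completed task
    print(respond(None))          # graceful reprompt instead of silence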

Conclusion

VUIs are not new, but the recent growth in AI technology and related fields has contributed to the surge of interest in this subject, and data indicates more migration from visual to voice interfaces in the future. It is therefore pivotal to understand your place as a UX designer in making voice products that users will be delighted to use.