EDITOR’S NOTE: In a previous post, Michael Carpenter discussed the voice interface features of Voice Typing and Voice Commands and their impact on smartphone use. In this post, he covers two more interfaces: Voice Actions and S Voice.
In a recent interview with Walt Mossberg and Kara Swisher at the D10 Conference, Apple CEO Tim Cook stated that Apple intends to “double down” on Siri, which analysts predict may result in expanded voice control for the iPad, as well as for the OS X desktop experience. With this in mind, new voice control features are worth exploring. Here is a quick survey of two voice interfaces, their features, and their limitations.
Voice Actions are Android OS-level commands that are integrated into the Google Voice Search app. Voice search is one of the oldest voice control features in the Android ecosystem, having launched with the original Android OS. It has not stood still, however: additional controls, known as Voice Actions, have been added as the software has matured. Voice Actions use the Google apps by default, so if you say, “Email,” the Gmail app launches. If you say, “Navigate to,” Google Maps launches. Unlike Voice Commands, Voice Actions can be used to create and edit alarms, calendar invites, and emails, but they lack the ability to launch or use non-Google apps.
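To make the default-app behavior concrete, here is a minimal sketch of how a spoken phrase might be routed to an app. It is illustrative only and is not Google's code: the function name `route_voice_action` and the `DEFAULT_APPS` table are hypothetical, and the table holds just the two phrases mentioned above.

```python
# Hypothetical table: spoken prefix -> default app, using only the two
# examples from the article. A real system would have many more entries.
DEFAULT_APPS = {
    "email": "Gmail",
    "navigate to": "Google Maps",
}


def route_voice_action(transcript):
    """Return (app, remainder) for a recognized phrase, or None otherwise."""
    text = transcript.strip().lower()
    # Try longer prefixes first so multi-word actions are not shadowed.
    for prefix in sorted(DEFAULT_APPS, key=len, reverse=True):
        if text == prefix or text.startswith(prefix + " "):
            remainder = text[len(prefix):].strip()
            return DEFAULT_APPS[prefix], remainder
    # Anything outside the table (for example, a non-Google app) is not handled.
    return None


print(route_voice_action("Navigate to the airport"))
print(route_voice_action("launch Instagram"))
```

The second call returns `None`, which mirrors the limitation described above: a phrase that does not map to one of the default Google apps has nowhere to go.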
Additional features of voice search allow the service to be configured to block (more accurately, asterisk-out) offensive words and to perform image searches using Google’s SafeSearch, which filters explicit images from the search results. Personalized Recognition, which launched in December 2010 and is therefore also relatively old, is a voice search feature that improves speech recognition accuracy by storing your voice recordings in the cloud, where they are analyzed by Google. Personalized Recognition offers a glimpse into how important both high-speed, always-on Internet access and the cloud have become. Across all of the speech functions described here, the software on the device is merely a conduit to the Internet; most of the heavy processing is performed in the cloud.
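As a rough illustration of what “asterisk-out” means in practice, the sketch below masks any word found on a block list. The word list, the function name `mask_offensive`, and the choice to mask every letter are all assumptions made for the example; the article does not say how Google's filter decides what to mask or how many characters it hides.

```python
import re

# Placeholder block list for illustration; not an actual filter list.
BLOCKED_WORDS = {"darn", "heck"}


def mask_offensive(text, blocked=BLOCKED_WORDS):
    """Replace each blocked word with asterisks of the same length."""

    def mask(match):
        word = match.group(0)
        # Compare case-insensitively; leave all other words untouched.
        return "*" * len(word) if word.lower() in blocked else word

    # Match runs of letters and apostrophes as candidate words.
    return re.sub(r"[A-Za-z']+", mask, text)


print(mask_offensive("Well, darn it"))
```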
S Voice is a speech-controlled virtual assistant that is similar to Apple’s Siri. S Voice can be invoked in two ways: by launching the app or by double-clicking the Home button and announcing one of the standard wake-up messages. Wake-up messages can also be configured to trigger workflow actions, including checking for missed calls and messages, opening the camera app, checking the schedule, playing music, and making a voice recording.
S Voice goes beyond Voice Commands and Voice Actions by offering deep and unique integration into the OS, including the ability to change some simple settings, such as toggling Wi-Fi and Driving Mode on or off. Creating memos, calendar invites, and navigation commands through S Voice launches the appropriate apps. A key differentiator between S Voice and Voice Actions is that S Voice can launch any installed app on the device when you say “launch” followed by the app name. Which voice controls are available inside the app varies from developer to developer. If you wanted to take a picture with Instagram, for example, instead of the default Camera app, S Voice is the only way to launch the app using speech control alone. Although you can launch any app from S Voice, the apps that are associated with phrases within S Voice cannot be edited.
One strange limitation is the inability to send emails from S Voice. Any request to send an email is met with a negative response, indicating that S Voice is not allowed to send email. We know that the Vlingo software underpinning S Voice is capable of sending email, if only in the premium version of the app, which leads us to conclude that the limitation isn’t technological. That makes the exclusion of this common function from S Voice all the more baffling. The Verge, a noted tech blog, has published a video that offers a side-by-side comparison of Siri and S Voice.
Voice Control’s Future: Full Speed Ahead
Voice control, like many aspects of the Android OS ecosystem, suffers from fragmentation that is a result of the open nature of the operating system. Variance in the implementation of voice control manifests itself in different behavior across apps that are part of the manufacturer’s preloaded software (e.g., Email, Messaging, Navigation) versus the Google apps that come as part of the Android OS itself (e.g., Gmail, Google+ Messenger, Maps) or third-party apps downloaded from the Google Play store (formerly the Android Market). The ability to send emails using Voice Actions but not S Voice serves as one functional example of the downstream impact fragmentation has on the user experience.
Despite the individual quirks and limitations of the various voice interface options, smartphones incorporating this technology offer the cutting edge of speech recognition. Like so much else in the rapidly evolving world of emerging technologies, we can anticipate that the pace of enhancements will only increase as more and more users adopt the technology.