CSE 134A Discussion Section
Designing a speech interface
Friday, 2002-11-01
TA: Dana Dahlstrom
During section we had an open discussion about designing speech interfaces. These notes summarize the issues that came up, many of which were raised by students. I steered part of the discussion using the paper "Designing SpeechActs: Issues in Speech User Interfaces" by Nicole Yankelovich and others. Naturally these notes overlap with the paper, but the paper is still worth a look.
Common problems with speech interfaces
I asked what problems students had had with telephone-based speech interfaces they'd used. Here are some of the responses:
- Can't go back. At some point you accidentally press the wrong key and suddenly you're stuck in the wrong part of the menu system. You can't figure out how to undo your mistake, so you end up having to hang up and call back.
- Long, uninterruptible recordings. You have to listen to so much unrelated information that you get distracted, and it's frustrating that you can't skip past it.
- Can't repeat information. You finally wade through the menu system to find what you need, but you miss that crucial piece of information. Next thing you know, the system hangs up on you. "Thanks for calling! Good-bye."
- Vicious cycles. You press one key and get routed to one department, then press another and end up back where you came from. Déjà vu!
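Two of the problems above, not being able to go back and not being able to repeat information, come down to the system forgetting where the caller has been. As a minimal sketch (not taken from the paper or from any real telephony system), a menu could keep a history stack so that "back" and "repeat" are always available. The class name MenuNavigator, its methods, and the sample prompts are all hypothetical.

```python
class MenuNavigator:
    """Tracks the caller's position in a menu tree using a history stack."""

    def __init__(self, prompts, start="main"):
        self.prompts = prompts    # hypothetical: menu name -> text spoken on entry
        self.history = [start]    # menus visited so far; the last one is current

    def current(self):
        return self.history[-1]

    def go_to(self, menu):
        """Enter a menu and return the prompt to speak."""
        self.history.append(menu)
        return self.prompts[menu]

    def back(self):
        """Undo the last move; at the top level, just re-announce it."""
        if len(self.history) > 1:
            self.history.pop()
        return self.prompts[self.current()]

    def repeat(self):
        """Say the current prompt again without changing position."""
        return self.prompts[self.current()]


# Made-up prompts for illustration only.
PROMPTS = {
    "main": "Main menu. Say billing or support.",
    "billing": "Billing. Say balance or payments.",
    "support": "Support. Say hours or agent.",
}

nav = MenuNavigator(PROMPTS)
nav.go_to("billing")          # the caller meant to say "support"
print(nav.back())             # Main menu. Say billing or support.
print(nav.go_to("support"))   # Support. Say hours or agent.
print(nav.repeat())           # Support. Say hours or agent.
```

With something like this in place, a wrong key press costs one "back" instead of a hang-up and a second call.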
Principles for speech interface design
Some of these are the flip side of the common problems above. Here are some things to keep in mind to avoid mistakes often made in speech interfaces:
- GUIs don't translate well to speech. It's inadvisable to create a GUI first and quickly convert it to a speech interface after the fact. Imagine listening to someone read Yahoo's front page aloud! See the next point.
- Keep menus small. In a GUI it is easier to focus on what's interesting and ignore what's irrelevant (such as banner ads and long parenthetical phrases), so it makes sense to present more information and options at once. In a speech interface equal time is given to each item presented; you can't simply scan for what you want.
- Make messages concise. Keep menu items, prompts, and feedback as brief and informative as possible. This is part of an overall theme of keeping up the pace of interaction. Users are waiting to speak; the longer they have to wait, the more frustrated they become.
- Give clear context cues. A speech interface should clearly communicate important state, for example by announcing "main menu" or some other title at every transition. Unannounced state changes can be disorienting.
- Prompt for input. Make sure it's clear when input is expected. This doesn't necessarily mean enumerating all available options every time. When the context is clear it may be appropriate to simply say, "What now?"
- Give immediate feedback. When a user issues a command, it is usually a good idea to acknowledge that command by repeating it to the user. For example, "OK, next item." Remember, users make mistakes, and so do speech recognition engines. It's confusing when the response seems inappropriate but you're not sure why. Immediate feedback lets the user recognize mistakes right away.
- Confirm irreversible actions. If something is difficult or impossible to undo, ask the user for confirmation before doing it. For example, "I heard you say, 'Launch intercontinental ballistic missile.' Is this correct?"
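The last two principles, immediate feedback and confirming irreversible actions, can be sketched together as a small dialog loop. This is only an illustration under stated assumptions: the Dialog class, the IRREVERSIBLE set, and the command strings are hypothetical, and plain strings stand in for what a speech recognizer would return.

```python
# Hypothetical set of commands that are hard or impossible to undo.
IRREVERSIBLE = {"delete message"}


class Dialog:
    """Echoes each command back and asks before doing anything irreversible."""

    def __init__(self):
        self.pending = None    # irreversible command awaiting a yes/no answer
        self.performed = []    # commands actually carried out

    def hear(self, utterance):
        """Take the recognized text and return what the system says next."""
        text = utterance.strip().lower()

        # A confirmation question is outstanding: only "yes" goes ahead.
        if self.pending is not None:
            command, self.pending = self.pending, None
            if text == "yes":
                self.performed.append(command)
                return f"OK, {command}. What now?"
            return "Cancelled. What now?"

        # Irreversible command: confirm first instead of acting.
        if text in IRREVERSIBLE:
            self.pending = text
            return f"I heard you say, '{text}.' Is this correct?"

        # Ordinary command: act, and acknowledge by repeating it.
        self.performed.append(text)
        return f"OK, {text}."


dialog = Dialog()
print(dialog.hear("Next item"))        # OK, next item.
print(dialog.hear("Delete message"))   # I heard you say, 'delete message.' Is this correct?
print(dialog.hear("no"))               # Cancelled. What now?
```

Treating anything other than "yes" as a cancellation is a deliberate choice in this sketch: if the recognizer mishears the answer, the safer outcome is that nothing irreversible happens.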