How Things Work: Personal Digital Assistants

Credit: Alan Vangpat/Senior Staff

Each year, companies announce a new slew of cell phone features. Personal digital assistants, like Siri on Apple’s newest iPhone and Iris for Android phones, have made cell phones more intuitive and easier to use than ever before: instead of pressing buttons to give commands, one can simply talk to the phone as if it were another person.

The technology underlying these digital assistants has some of its roots on our campus.

In May 2003, the Defense Advanced Research Projects Agency initiated a project called Cognitive Assistant that Learns and Organizes (CALO) under its Personal Assistant that Learns program. The CALO program brought together researchers from several universities, including Carnegie Mellon.

Alex Rudnicky, a research professor in the computer science department who was involved with the project, described CALO as having “revolved around the notion of having a computer system that would know about you, things you liked, and provide you access to information.” CALO was run by SRI International, an independent non-profit research institute that conducts contract research for government agencies and businesses. Siri began as a company spun off from SRI International; it released its assistant as an iPhone app in early 2010, and Apple acquired the company later that year.

Personal digital assistants rely on two technologies: voice recognition and information synthesis.

Voice recognition converts the sounds a user makes into words, while information synthesis works out what the user wants and finds the information needed to complete the request.

According to the website SmartPlanet, voice recognition technologies have been around since 1940. Bell Laboratories scientist Homer Dudley patented a machine called the “Parallel Bandpass Vocoder,” which could recognize and output sounds based on what it heard. The first technologies of this kind could recognize only a few words, by comparing what they heard to signals stored in memory and selecting the closest match.
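To make the matching idea concrete, here is a toy nearest-template recognizer in Python. It is a minimal sketch of the general comparison approach described above, not a reconstruction of Dudley’s machine or any historical system: each word is assumed to be stored as a short list of made-up numbers standing in for a recorded signal, and an incoming sound is labeled with whichever stored word it is closest to. The names `TEMPLATES`, `distance` and `recognize` are hypothetical.

```python
import math

# Hypothetical stored "signals": each known word is represented by a few
# made-up numbers (think: loudness in a few frequency bands). Real systems
# used much richer representations of sound.
TEMPLATES = {
    "yes": [0.9, 0.2, 0.1, 0.4],
    "no": [0.3, 0.8, 0.6, 0.1],
    "stop": [0.1, 0.3, 0.9, 0.7],
}


def distance(a, b):
    """Euclidean distance between two equal-length lists of numbers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize(signal, templates=TEMPLATES):
    """Return the stored word whose template is closest to the input signal."""
    return min(templates, key=lambda word: distance(signal, templates[word]))


# A slightly "noisy" version of the stored "yes" signal still matches "yes".
print(recognize([0.85, 0.25, 0.15, 0.35]))
```

Because every input is forced onto the nearest stored word, a system like this can only ever output the handful of words it has templates for, which is one reason early recognizers had such small vocabularies.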

Since then, the technology has become more flexible and accurate, but its basic approach has remained the same. Because machines do not actually understand human speech, the best they can do is make informed guesses about what particular words sound like and which words people are likely to say in a given context.
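One common textbook way to frame these two kinds of guesses is to score each candidate word by how well it matches the sound (an acoustic score) and by how likely it is to follow the previous word (a language-model probability), then pick the word with the best combined score. The sketch below assumes that framing; all probabilities are invented for illustration, and the names `ACOUSTIC`, `LANGUAGE` and `best_guess` are hypothetical rather than taken from any real assistant.

```python
import math

# Hypothetical acoustic scores, P(sound | word). "two", "to" and "too" sound
# identical, so the sound alone cannot separate them.
ACOUSTIC = {"two": 0.3, "to": 0.3, "too": 0.3, "tea": 0.1}

# Hypothetical language-model probabilities, P(word | previous word).
LANGUAGE = {
    "want": {"to": 0.6, "two": 0.1, "too": 0.05, "tea": 0.05},
    "buy": {"two": 0.4, "tea": 0.2, "to": 0.05, "too": 0.01},
}


def best_guess(previous_word):
    """Pick the word maximizing log P(sound|word) + log P(word|previous word)."""
    context = LANGUAGE[previous_word]
    return max(
        ACOUSTIC,
        # Unseen words get a tiny probability instead of zero (log(0) fails).
        key=lambda w: math.log(ACOUSTIC[w]) + math.log(context.get(w, 1e-6)),
    )


print(best_guess("want"))  # context "I want ..." favors "to"
print(best_guess("buy"))   # context "buy ..." favors "two"
```

The words “two,” “to” and “too” sound the same, so the acoustic score alone ties; the previous word breaks the tie. Adding logarithms rather than multiplying raw probabilities is a standard way to avoid numerical underflow when many small probabilities are combined.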

Madison Calhoun, a sophomore chemical and biomedical engineering major, is a longtime iPhone user. Calhoun said that “while the technology is an improvement over what came before and is very convenient, it’s still not very good at recognizing voices.” However, the more people use voice recognition technologies, the more data these systems gather and the more accurate they become.

After voice recognition, personal digital assistants synthesize the information that a user requests. This step essentially involves filling out a form with several slots, each tied to a question. “For each slot you have a question, like ‘Where do you want to fly?’ ” Rudnicky explained.

The digital assistant’s software comes with many such forms and attempts to fit the words it hears into one of them.
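The form-based approach Rudnicky describes can be sketched as a set of forms whose slots each carry a question and a list of words that can fill them. In this minimal Python sketch, which assumes simple word matching rather than whatever Siri or CALO actually used, the assistant picks the form whose slots the heard words fill most completely, then lists the questions it would still need to ask. The forms, slot names and functions are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Form:
    """Hypothetical task form: each slot maps to (question, accepted words)."""
    name: str
    slots: dict[str, tuple[str, set[str]]]


FORMS = [
    Form("flight", {
        "destination": ("Where do you want to fly?", {"boston", "chicago", "denver"}),
        "day": ("What day do you want to leave?", {"monday", "friday"}),
    }),
    Form("weather", {
        "city": ("Which city?", {"boston", "chicago", "pittsburgh"}),
    }),
]


def fill(form, words):
    """Fill each slot with the first heard word that the slot accepts."""
    filled = {}
    for slot, (_question, accepted) in form.slots.items():
        match = next((w for w in words if w in accepted), None)
        if match is not None:
            filled[slot] = match
    return filled


def best_form(utterance, forms=FORMS):
    """Choose the form whose slots the heard words fill most completely."""
    words = utterance.lower().split()
    return max(forms, key=lambda f: len(fill(f, words)))


def questions_for_missing(form, filled):
    """Return the questions the assistant still needs to ask."""
    return [q for slot, (q, _accepted) in form.slots.items() if slot not in filled]


request = "I want to fly on Friday"
form = best_form(request)
filled = fill(form, request.lower().split())
print(form.name, filled)
print(questions_for_missing(form, filled))
```

Ties between forms are broken here by list order; a real system would need a more careful way to decide between forms that fit equally well.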

For example, the software might have a form for looking up restaurants nearby. This form might have a blank space to be filled with the type of cuisine that the user wants.

If the digital assistant hears the phrase “Japanese food” in the user’s request, it could use the keyword “food” to recognize that the user wants to look up restaurants. Then it could select this form, fill in the cuisine slot with “Japanese,” and search for local Japanese restaurants. If the form is incomplete, the digital assistant might ask follow-up questions to get the missing details.
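The restaurant example can be sketched the same way: a trigger word such as “food” selects the restaurant form, the cuisine slot is filled from the request, and a follow-up question comes back if the slot is still empty. This is a hypothetical illustration only; the trigger table, cuisine list and `handle` function are invented, and the “search” is just a message string rather than a call to any real service.

```python
# Hypothetical trigger words that point to a form (only one form here).
TRIGGERS = {"food": "restaurant", "eat": "restaurant", "hungry": "restaurant"}

# Hypothetical values that can fill the restaurant form's cuisine slot.
CUISINES = {"japanese", "italian", "thai", "mexican"}


def handle(request):
    """Select a form by keyword, fill the cuisine slot, or ask for it."""
    words = request.lower().replace("?", "").replace(".", "").split()
    # Step 1: a keyword such as "food" selects the restaurant form.
    form_name = next((TRIGGERS[w] for w in words if w in TRIGGERS), None)
    if form_name is None:
        return "Sorry, I didn't understand that."
    # Step 2: try to fill the cuisine slot from the same request.
    cuisine = next((w for w in words if w in CUISINES), None)
    if cuisine is None:
        # Step 3: the form is incomplete, so ask a follow-up question.
        return "What kind of food would you like?"
    # Stand-in for an actual local search.
    return f"Searching for nearby {cuisine.capitalize()} restaurants"


print(handle("Find me Japanese food"))
print(handle("I want some food"))
```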

Digital assistants also continuously improve by tailoring themselves to their users: when a user tells the phone it is wrong, the phone remembers the correction and fine-tunes its algorithms to better match the user’s needs. While this might sound complicated, Rudnicky said that “a lot of these things whittle down to a fairly simple process.”
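A very simple way to model learning from corrections is to keep a per-user weight for each interpretation of a phrase and adjust it whenever the user says the assistant got it wrong. The sketch below assumes that approach; it is not how Siri or any particular assistant is known to work, and the class `UserPreferences`, the base scores and the step size are all hypothetical.

```python
from collections import defaultdict


class UserPreferences:
    """Hypothetical per-user memory of corrections.

    Each correction raises the weight of what the user meant and lowers the
    weight of the wrong guess, so the right answer wins next time.
    """

    def __init__(self):
        # phrase -> interpretation -> learned weight (starts at 0.0)
        self.weights = defaultdict(lambda: defaultdict(float))

    def interpret(self, phrase, candidates):
        """Pick the candidate with the highest base score plus learned weight."""
        learned = self.weights[phrase]
        return max(candidates, key=lambda c: candidates[c] + learned[c])

    def correct(self, phrase, wrong, right, step=1.0):
        """Record that the user meant `right`, not `wrong`, for this phrase."""
        self.weights[phrase][wrong] -= step
        self.weights[phrase][right] += step


# Hypothetical base scores from the recognizer for the phrase "call jon".
candidates = {"John Smith": 0.6, "Jon Lee": 0.4}
prefs = UserPreferences()
print(prefs.interpret("call jon", candidates))  # first guess: "John Smith"
prefs.correct("call jon", wrong="John Smith", right="Jon Lee")
print(prefs.interpret("call jon", candidates))  # after correction: "Jon Lee"
```

In the spirit of Rudnicky’s remark, nothing here is retrained: the stored weights simply nudge future choices toward what this particular user meant.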

The technology behind personal digital assistants is still fairly new, and one limitation Rudnicky would like to see it overcome is its exclusivity to smartphones. The assistants could also become more intuitive and better understand a wider range of accents and colloquialisms. Until then, Calhoun said she will stick to “typing with her fingers because it’s much faster most of the time.”