Robots use eye movement for greater social interaction
Imagine a world where robots walk and talk indistinguishably among us. Robotics professor Yaser Sheikh and mechanical engineering Ph.D. student Hyun Soo Park are working on research that will one day allow robots to understand and adapt to social interactions.
“Currently, robots have a very good understanding of the environment around them, the geometric relationships,” Park said. “But we think that they need more information. In order for robots to be collaborative with humans as a team member, they need to understand the social context they are in. Robots can do tasks, but truly interacting with humans requires some sort of empathy.”
But how can robots possibly understand social interactions? Sheikh and Park have developed an algorithm that takes into account where people are looking in order to analyze the social context of a situation.
“The gaze is a very interesting social signal,” Park said. “Where you are looking at is a very strong cue to understand[ing] what is going on around you.” By using head-mounted cameras to track where people look, one can determine the point where the most gazes intersect.
Sheikh and Park use video captured by head-mounted cameras to estimate the direction each person is looking. Given two people's positions and gaze directions, the algorithm extends those directions as lines and finds where they meet. The point with the most intersections is determined to be the most important, or most "socially salient," event in the room.
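The core geometric idea can be sketched in a few lines. This is not the authors' actual algorithm: where they look for the point crossed by the most gaze rays, the sketch below simply computes the single least-squares point closest to all rays, assuming everyone shares one focus of attention. The function name and the example positions are illustrative.

```python
import numpy as np

def salient_point(origins, directions):
    """Least-squares point closest to a set of gaze rays.

    Each ray starts at a camera position (origin) and points along a
    gaze direction. For each ray, (I - d d^T) projects onto the plane
    perpendicular to it; summing these projections and solving
    A p = b gives the point minimizing squared distance to all rays.
    """
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for o, d in zip(origins, directions):
        d = np.asarray(d, float) / np.linalg.norm(d)  # unit gaze direction
        P = np.eye(3) - np.outer(d, d)                # projector perpendicular to ray
        A += P
        b += P @ np.asarray(o, float)
    return np.linalg.solve(A, b)

# Two people two metres apart, both looking toward the point (1, 1, 0)
origins = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
directions = [(1.0, 1.0, 0.0), (-1.0, 1.0, 0.0)]
print(salient_point(origins, directions))  # -> [1. 1. 0.]
```

With only two rays this reduces to ordinary line intersection; with many noisy gaze estimates, the least-squares point is a simple stand-in for the densest cluster of pairwise intersections the article describes.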
One application of this concept is integrating robots into the workforce. "A number of professionals already wear head-mounted cameras, such as soldiers, search and rescue teams, and police officers," Sheikh said.
Park also mentioned that this method of analyzing social behavior could be used to study children with autism. The clinician, a parent, or the child would wear the head-mounted camera.
However, Sheikh believes that head-mounted cameras will soon become popular among ordinary consumers as well. “I consider it likely that head-mounted sensors and displays will be as commonplace in a decade as smartphones are today,” he said.
In fact, the prototype of Google Glass is scheduled to come out soon. Google Glass is an aluminum strip with a display, a camera, and two nose pads that one can wear like a pair of glasses. Park explained that the cameras originally used for the project were relatively large and bulky, while more recent models are smaller, more accurate, and can be worn conveniently as a headset.
“Google Glass could improve the accuracy of our research because of its sleekness,” Park said. “Although the downside is that its resolution will probably not be as good.” High resolution is important to their research because it helps them better estimate a camera’s position and orientation in three-dimensional space.
Efforts to help robots interpret human interactions can be improved further by incorporating inputs beyond gaze intersections, such as facial expressions or audio signals. Sheikh explained that small gestures, such as a casual shrug or a slight nod of the head, are easily interpreted by humans.
“Human perception is incredibly good at picking these up, and we’re trying to understand how to develop algorithms that can do so too,” he said. “To this end, we’ve built a massive sensing dome in the basement of Newell-Simon with hundreds of cameras, and are developing algorithms to learn from what these cameras see.”
Sheikh and Park have been using head-mounted cameras to understand social interactions for the past year and are funded by the National Science Foundation. As they continue their research, their findings will not only help humans better understand social dynamics and behavioral disorders, but may one day help robots understand us too.