Making faces
November 9, 2016
Lynn Abbott
Samples of the RGB and corresponding depth images of the six basic facial expressions (happiness, surprise, anger, sadness, fear, disgust) in different poses.
Imagine an inexpensive, at-home therapy program that could help children with autism spectrum disorder (ASD) become more fluent in emotional expression. ECE associate professor Lynn Abbott is developing a technology-based tool that could provide this service. Abbott, his students, and postdoctoral researcher Amira Youssef have been working closely with a group from the Virginia Tech Department of Psychology to build an interactive computer-assisted system that teaches children with ASD to recognize and reciprocate facial expressions of emotion.
Autism statistics from the U.S. Centers for Disease Control and Prevention identify around 1 in 68 American children as on the autism spectrum. Studies have shown that young children with ASD have trouble recognizing and expressing certain emotions, which can affect their ability to infer emotions expressed by others and communicate nonverbally over the course of their entire lives.
In their proposal, Abbott and his interdisciplinary team described how a better understanding of the normal development trajectory of facial expression recognition might help in early identification and possible treatment of affective disorders such as autism, depression, and anxiety disorders.
Before they became involved in this project, Abbott and his doctoral student, Sherin Aly, were applying principles of machine learning to train a computer to recognize facial expressions.
"We were looking for an application where this would be useful. Who needs an expression-recognition system" said Abbott. "We talked to several parties only to discover that the Child Support Center, which is just about a block from here, was developing therapies or interventions that work with children who have developmental disabilities."
After strategizing with Susan White, the assistant director of the Child Study Center and a faculty member in the Department of Psychology, they submitted their proposal. In 2015, the National Institute of Child Health & Human Development granted them funding for a two-year feasibility study to investigate if the technology was mature enough to support this kind of intervention.
"Most interventions with children who have ASD involve sessions with psychologists or therapists, which can get expensive," said Abbott. "Our long-term goal is a system that could be set up in your living room to be used as often as needed and tailored to individuals."
The exploratory study is piloting a system that requires only a computer and a Microsoft Kinect, an inexpensive motion-sensing device with a video camera and a depth sensor for capturing 3-D visual data. In the study, children are asked to watch and respond to images and videos on a computer equipped with a Kinect, which records and processes their facial expressions and then provides feedback.
The Facial Expression Emotion Training system asks children to make the same expression as a face on the screen. A Microsoft Kinect takes video of the children and reads their facial expressions, giving them feedback on whether or not their expression matches the emotion on the screen. There are four levels, where the child sees anything from a cartoon to a video of a real person.
The Facial Expression Emotion Training (FEET) system has four levels, all built around the wireframe representation of the face that the Kinect senses. The training starts with a simple, entertaining 2-D cartoon character making a happy, angry, frightened, or neutral face. The researchers ask the child to mimic the cartoon's expression while the Kinect records them. The system "reads" the child's expression and tells them whether they are making the right face. Level two uses an animated avatar exhibiting one of the four selected emotions, and level three uses video clips of a human actor. In every scenario, the child tries to mimic the expression seen on the screen and receives feedback. While the first three levels show increasingly realistic faces, level four presents an emotionally charged situation and prompts the child to react: a picture of a birthday cake should elicit a smile, and a broken toy might call for a sad face.
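As a rough sketch of that interaction loop (the FEET software itself is not public, so every name and call below is an assumption), each trial presents a stimulus, captures the child's expression, classifies it, and reports whether it matched the target emotion:

```python
# Hypothetical sketch of a FEET-style trial; the stimulus descriptions, the
# capture call, and the classifier are placeholders, not the actual system.
LEVELS = {
    1: "2-D cartoon character",
    2: "animated avatar",
    3: "video clip of a human actor",
    4: "emotionally charged situation (e.g., a birthday cake)",
}

def run_trial(level, target_emotion, show, capture_expression, classify):
    """Show a stimulus, read the child's expression, and give feedback."""
    show(LEVELS[level], target_emotion)     # present the prompt on screen
    landmarks = capture_expression()        # 3-D wireframe points from the Kinect
    guess = classify(landmarks)             # predicted emotion label
    if guess == target_emotion:
        return "Great job!"
    return f"Try again: show me a {target_emotion} face."
```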
To determine whether a child's expression is emotionally appropriate, Abbott and his team must first teach the computer which expressions correspond to which emotions: a smile usually means happiness, a frown usually means anger, and so forth. To do this, they use "human coding," meaning that their colleagues manually label the emotions in a large database of facial expressions. The labeled results are then fed into machine learning algorithms.
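In outline, the human-coding step turns each recorded expression plus its coder-assigned label into one training example for the learning algorithm. The sketch below is illustrative only; the variable names and the four-emotion label set are assumptions based on the levels described above.

```python
# Hypothetical sketch of assembling human-coded training data.
import numpy as np

EMOTIONS = ["happy", "angry", "frightened", "neutral"]  # labels used in the study

def build_training_set(recordings, coder_labels):
    """recordings: per-example feature arrays extracted from the Kinect data;
    coder_labels: the emotion a human coder assigned to each recording."""
    X = np.vstack(recordings)                                    # one row per example
    y = np.array([EMOTIONS.index(lbl) for lbl in coder_labels])  # numeric class labels
    return X, y
```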
"We have spent a lot of time thinking about what parts of the face change when you display different emotions," said Abbott. "Sherin Aly's dissertation focuses on the methods to decide which changes in the face are significant."
Aly applied support vector machines, a standard machine learning technique, and extracted three types of geometric features from the wireframe that indicate the emotion on a face: triangular surface areas, distances between points, and angle measurements.
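A minimal sketch of those three feature types, computed from 3-D landmark points like the ones in the Kinect wireframe, might look like the following; the specific landmark indices, triangles, and pairs would come from the face model, and the SVM call at the end simply illustrates the standard technique named above.

```python
# Hypothetical sketch: geometric features from 3-D face landmarks, then an SVM fit.
import numpy as np
from sklearn.svm import SVC

def triangle_area(p1, p2, p3):
    """Surface area of the triangle spanned by three landmarks."""
    return 0.5 * np.linalg.norm(np.cross(p2 - p1, p3 - p1))

def distance(p1, p2):
    """Euclidean distance between two landmarks (e.g., the mouth corners)."""
    return np.linalg.norm(p2 - p1)

def angle(p1, p2, p3):
    """Angle at vertex p2 formed by landmarks p1, p2, p3, in radians."""
    v1, v2 = p1 - p2, p3 - p2
    c = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.arccos(np.clip(c, -1.0, 1.0))

def geometric_features(landmarks, triangles, pairs, triples):
    """Concatenate areas, distances, and angles into one feature vector."""
    feats  = [triangle_area(*landmarks[list(t)]) for t in triangles]
    feats += [distance(*landmarks[list(p)]) for p in pairs]
    feats += [angle(*landmarks[list(a)]) for a in triples]
    return np.array(feats)

# X, y = build_training_set(...)        # labeled examples from the human-coding step
# clf = SVC(kernel="rbf").fit(X, y)     # support vector machine classifier
```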
Before the study began, Aly conducted training sessions with more than 30 participants, who made facial expressions for the Kinect. The 3-D wireframe representations of their faces now populate one of the largest publicly available datasets of Kinect facial expressions.