Children with autism spectrum conditions often have trouble recognizing the emotional states of people around them — distinguishing a happy face from a fearful face, for instance. To remedy this, some therapists use a kid-friendly robot to demonstrate those emotions and to engage the children in imitating the emotions and responding to them in appropriate ways.
This type of therapy works best, however, if the robot can smoothly interpret the child’s own behavior — whether he or she is interested and excited or paying attention — during the therapy. Researchers at the MIT Media Lab have now developed a type of personalized machine learning that helps robots estimate the engagement and interest of each child during these interactions, using data that are unique to that child.
Armed with this personalized “deep learning” network, the robots produced estimates of the children’s responses that agreed with assessments by human experts, with a correlation score of 60 percent, the scientists report June 27 in Science Robotics.
It can be challenging for human observers to reach high levels of agreement about a child’s engagement and behavior. Their correlation scores are usually between 50 and 55 percent. The researchers suggest that robots trained on human observations, as in this study, could someday provide more consistent estimates of these behaviors.
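To make the idea of a correlation score concrete, here is a minimal sketch of how agreement between a robot's estimates and a human expert's ratings could be measured. The article does not say which agreement statistic the study used, so Pearson correlation is an assumption here, and the rating values are invented purely for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant ratings")
    return cov / sqrt(var_x * var_y)


# Hypothetical engagement ratings on a continuous 0-1 scale (not study data).
human_ratings = [0.1, 0.4, 0.5, 0.9, 0.7]
robot_estimates = [0.2, 0.3, 0.6, 0.8, 0.5]

score = pearson(human_ratings, robot_estimates)
print(f"agreement: {score:.2f}")
```

A value near 1 means the two sets of ratings rise and fall together, while a value near 0 means they are unrelated.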
“The long-term goal is not to create robots that will replace human therapists, but to augment them with key information that the therapists can use to personalize the therapy content and also make more engaging and naturalistic interactions between the robots and children with autism,” explains Oggi Rudovic, a postdoc at the Media Lab and first author of the study.
Rosalind Picard, a co-author on the paper and professor at MIT who leads research in affective computing, says that personalization is especially important in autism therapy: A famous adage is, “If you have met one person with autism, you have met one person with autism.”
“The challenge of creating machine learning and AI [artificial intelligence] that works in autism is particularly vexing, because the usual AI methods require a lot of data that are similar for each category that is learned. In autism where heterogeneity reigns, the normal AI approaches fail,” says Picard.
Rudovic, Picard, and their teammates have also been using personalized deep learning in other areas, finding that it improves results for pain monitoring and for forecasting Alzheimer’s disease progression.
Robot-assisted therapy for autism often works something like this: A human therapist shows a child photos or flash cards of different faces meant to represent different emotions, to teach them how to recognize expressions of fear, sadness, or joy. The therapist then programs the robot to show these same emotions to the child, and observes the child as she or he engages with the robot. The child’s behavior provides valuable feedback that the robot and therapist need to go forward with the lesson.
The researchers used SoftBank Robotics NAO humanoid robots in this study. Almost 2 feet tall and resembling an armored superhero or a droid, NAO conveys different emotions by changing the color of its eyes, the motion of its limbs, and the tone of its voice.
The 35 children with autism who participated in this study, 17 from Japan and 18 from Serbia, ranged in age from 3 to 13. They reacted in various ways to the robots during their 35-minute sessions, from looking bored and sleepy in some cases to jumping around the room with excitement, clapping their hands, and laughing or touching the robot.
Most of the children in the study reacted to the robot “not just as a toy but related to NAO respectfully as if it was a real person,” especially during storytelling, where the therapists asked how NAO would feel if the children took the robot for an ice cream treat, according to Rudovic.
One 4-year-old girl hid behind her mother while participating in the session but became much more open to the robot and ended up laughing by the end of the therapy. The sister of one of the Serbian children gave NAO a hug and said “Robot, I love you!” at the end of a session, saying she was happy to see how much her brother liked playing with the robot.
“Therapists say that engaging the child for even a few seconds can be a big challenge for them, and robots attract the attention of the child,” says Rudovic, explaining why robots have been useful in this type of therapy. “Also, humans change their expressions in many different ways, but the robots always do it in the same way, and this is less frustrating for the child because the child learns in a very structured way how the expressions will be shown.”
Personalized machine learning
The MIT research team realized that a kind of machine learning called deep learning would help the therapy robots perceive the children’s behavior more naturally. A deep-learning system processes data through multiple hierarchical layers to improve its performance on a task, with each successive layer amounting to a slightly more abstract representation of the original raw data.
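The layered processing described above can be illustrated with a toy forward pass. This is a minimal sketch, not the network used in the study: the article does not describe the architecture, so the layer sizes, weights, and input values below are all hypothetical.

```python
def relu(values):
    """Keep positive values and clip negative ones to zero."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One fully connected layer: a weighted sum per output unit plus a bias."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def forward(raw, layers):
    """Return the raw input followed by the output of every layer."""
    activations = [raw]
    for weights, biases in layers:
        activations.append(relu(dense(activations[-1], weights, biases)))
    return activations


# Hypothetical raw features, e.g. four normalized sensor readings.
raw_features = [0.8, 0.1, 0.5, 0.3]

# Hypothetical weights: 4 inputs -> 3 units -> 2 units -> 1 output.
layers = [
    (
        [[0.5, -0.2, 0.1, 0.4], [0.3, 0.8, -0.5, 0.2], [-0.6, 0.1, 0.9, 0.0]],
        [0.0, 0.1, 0.0],
    ),
    ([[0.7, -0.3, 0.2], [0.1, 0.6, 0.4]], [0.05, 0.0]),
    ([[0.9, 0.5]], [0.0]),
]

activations = forward(raw_features, layers)
for depth, values in enumerate(activations):
    print(f"layer {depth}: {values}")
```

Each layer consumes only the output of the layer before it, so the final single number is a compressed summary of the four raw inputs. In a trained network the weights would be learned from data rather than written by hand.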
Although the concept of deep learning has been around since the 1980s, says Rudovic, it’s only recently that there has been enough computing power to implement this kind of artificial intelligence. Deep learning has been used in automatic speech and object-recognition programs, making it well suited for a problem such as making sense of the multiple features of the face, body, and voice that go into understanding a more abstract concept such as a child’s engagement.
“In the case of facial expressions, for instance, what parts of the face are the most important for estimation of engagement?” Rudovic says. “Deep learning allows the robot to directly extract the most important information from that data without the need for humans to manually craft those features.”
For the therapy robots, Rudovic and his colleagues took the idea of deep learning one step further and built a personalized framework that could learn from data collected on each individual child. The researchers captured video of each child’s facial expressions, head and body movements, poses, and gestures; audio recordings; and data on heart rate, body temperature, and skin sweat response from a monitor on the child’s wrist.
The robots’ personalized deep learning networks were built from layers of these video, audio, and physiological data, along with information about the child’s autism diagnosis and abilities, culture, and gender. The researchers then compared the networks’ estimates of the children’s behavior with estimates from five human experts, who coded the children’s video and audio recordings on a continuous scale to determine how pleased or upset, how interested, and how engaged the child seemed during the session.
The networks were trained on these personalized data coded by the humans and tested on data not used in training or tuning the models. For most of the children in the study, they significantly improved the robot’s automatic estimation of the child’s behavior beyond what would be estimated if the network combined all the children’s data in a “one-size-fits-all” approach, the researchers found.
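The contrast between a personalized model and a “one-size-fits-all” model can be sketched with a deliberately simple stand-in. The example below is an assumption-laden illustration, not the study's method: it swaps the deep network for a one-variable straight-line fit, and the two children, their "movement" feature, and their engagement scores are all invented. The children are given opposite movement-engagement relationships so the effect of pooling is easy to see, and each model is scored on held-out points it was not fitted on.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(model, xs, ys):
    """Mean squared error of a fitted line on the given points."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: (movement, engagement) pairs, split into train and test.
children = {
    "child_a": {"train": ([0.1, 0.5, 0.9], [0.2, 0.5, 0.85]),
                "test": ([0.3, 0.7], [0.35, 0.7])},
    "child_b": {"train": ([0.1, 0.5, 0.9], [0.9, 0.55, 0.15]),
                "test": ([0.3, 0.7], [0.7, 0.3])},
}

# One-size-fits-all: a single model fitted on every child's training data.
pooled_x = [x for c in children.values() for x in c["train"][0]]
pooled_y = [y for c in children.values() for y in c["train"][1]]
pooled_model = fit_line(pooled_x, pooled_y)

# Personalized: one model per child, fitted on that child's data only.
personal_error = {}
pooled_error = {}
for name, data in children.items():
    personal_model = fit_line(*data["train"])
    personal_error[name] = mse(personal_model, *data["test"])
    pooled_error[name] = mse(pooled_model, *data["test"])
    print(name, "personal:", personal_error[name], "pooled:", pooled_error[name])
```

Because the pooled line has to average over two opposite trends, it fits neither child well, while each per-child line tracks its own child's held-out points closely. Real data would be far noisier, and a personalized model can also overfit when a child has little data.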
Rudovic and colleagues were also able to probe how the deep learning network made its estimations, which uncovered some interesting cultural differences between the children. “For instance, children from Japan showed more body movements during episodes of high engagement, while in Serbs large body movements were associated with disengagement episodes,” Rudovic says.
The study was funded by grants from the Japanese Ministry of Education, Culture, Sports, Science and Technology; Chubu University; and the European Union’s HORIZON 2020 grant (EngageME).
Massachusetts Institute of Technology
Ognjen Rudovic, Jaeryoung Lee, Miles Dai, Björn Schuller, Rosalind W. Picard. Personalized machine learning for robot perception of affect and engagement in autism therapy. Science Robotics (2018). DOI: 10.1126/scirobotics.aao6760
An example of a therapy session augmented with humanoid robot NAO [SoftBank Robotics], which was used in the EngageME study. Tracking of limbs/faces was performed using the CMU Perceptual Computing Lab’s OpenPose utility.
Credit: MIT Media Lab