A GelSight sensor attached to a robot’s gripper enables the robot to determine precisely where it has grasped a small screwdriver as it removes the tool from a slot and inserts it back, even when the gripper hides the screwdriver from the robot’s camera.
Credit: Robot Locomotion Group at MIT
Eight years ago, Ted Adelson’s research group at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) unveiled a new sensor technology, called GelSight, that uses physical contact with an object to provide a remarkably detailed 3-D map of its surface.
Now, by mounting GelSight sensors on the grippers of robotic arms, two MIT teams have given robots greater sensitivity and dexterity. The researchers presented their work in two papers at the International Conference on Robotics and Automation last week.
In one paper, Adelson’s group uses the data from the GelSight sensor to enable a robot to judge the hardness of surfaces it touches — a crucial ability if household robots are to handle everyday objects.
In the other, Russ Tedrake’s Robot Locomotion Group at CSAIL uses GelSight sensors to enable a robot to manipulate smaller objects than was previously possible.
The GelSight sensor is, in some ways, a low-tech solution to a difficult problem. It consists of a block of transparent rubber — the “gel” of its name — one face of which is coated with metallic paint. When the paint-coated face is pressed against an object, it conforms to the object’s shape.
The metallic paint makes the object’s surface reflective, so its geometry becomes much easier for computer vision algorithms to infer. Mounted on the sensor opposite the paint-coated face of the rubber block are three colored lights and a single camera.
“[The system] has colored lights at different angles, and then it has this reflective material, and by looking at the colors, the computer … can figure out the 3-D shape of what that thing is,” explains Adelson, the John and Dorothy Wilson Professor of Vision Science in the Department of Brain and Cognitive Sciences.
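The idea Adelson describes resembles a general technique known as photometric stereo. The following is only a minimal sketch of that technique for a single pixel, not the GelSight software itself. It assumes an idealized surface whose brightness under each light is proportional to the cosine of the angle between the light and the surface normal, and it assumes each color channel responds to exactly one light. The light directions and all function names are hypothetical, chosen for illustration.

```python
import math

# Hypothetical directions from the surface toward the red, green, and blue
# lights. Illustrative values only, not measured from a real sensor.
RAW_LIGHT_DIRS = [
    (0.8, 0.0, 0.6),
    (-0.4, 0.7, 0.6),
    (-0.4, -0.7, 0.6),
]


def normalize(v):
    """Scale a 3-vector to unit length."""
    length = math.sqrt(sum(x * x for x in v))
    return tuple(x / length for x in v)


LIGHT_DIRS = [normalize(d) for d in RAW_LIGHT_DIRS]


def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(m, rhs):
    """Solve the 3x3 linear system m @ x = rhs with Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("light directions must not be coplanar")
    solution = []
    for col in range(3):
        replaced = [list(row) for row in m]
        for r in range(3):
            replaced[r][col] = rhs[r]
        solution.append(det3(replaced) / d)
    return solution


def estimate_normal(rgb, light_dirs=LIGHT_DIRS):
    """Return (unit surface normal, reflectance) for one pixel.

    Assumed model: channel k is lit only by light k, and its brightness is
    I_k = reflectance * dot(light_k, normal).
    """
    scaled = solve3(light_dirs, rgb)  # equals reflectance * normal
    reflectance = math.sqrt(sum(x * x for x in scaled))
    if reflectance == 0:
        raise ValueError("pixel is completely dark")
    return tuple(x / reflectance for x in scaled), reflectance


def render(normal, reflectance, light_dirs=LIGHT_DIRS):
    """Forward model used to make synthetic test pixels."""
    return [reflectance * sum(l * n for l, n in zip(light, normal))
            for light in light_dirs]


# Round trip on a synthetic pixel: render a known normal, then recover it.
true_normal = normalize((0.1, -0.2, 1.0))
pixel = render(true_normal, 0.9)
normal, reflectance = estimate_normal(pixel)
```

Repeating this for every pixel gives a field of surface orientations, which can then be integrated into a height map. A real sensor would likely need calibration rather than relying on this idealized lighting model.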
In both sets of experiments, a GelSight sensor was mounted on one side of a robotic gripper, a device somewhat like the head of a pincer, but with flat gripping surfaces rather than pointed tips.
For an autonomous robot, gauging objects’ softness or hardness is essential to deciding not only where and how hard to grasp them but how they will behave when moved, stacked, or laid on different surfaces. Tactile sensing could also aid robots in distinguishing objects that look similar.
In previous work, robots have attempted to assess objects’ hardness by laying them on a flat surface and gently poking them to see how much they give. But this is not the chief way in which humans gauge hardness. Rather, our judgments seem to be based on the degree to which the contact area between the object and our fingers changes as we press on it. Softer objects tend to flatten more, increasing the contact area.
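The contact-area cue can be made concrete with a small sketch. It assumes, hypothetically, that each sensor reading has already been converted into a grid of indentation depths, and that a pixel counts as "in contact" when its depth exceeds a chosen threshold. The function names, the threshold, and the toy data are illustrative assumptions, not details from the paper.

```python
def contact_area(depth_map, threshold=0.1):
    """Count the pixels whose indentation depth exceeds the threshold."""
    return sum(1 for row in depth_map for depth in row if depth > threshold)


def area_growth(frames, threshold=0.1):
    """Return the contact area of each frame and the growth from first to last."""
    areas = [contact_area(frame, threshold) for frame in frames]
    return areas, areas[-1] - areas[0]


# Toy 3x3 depth maps for a press that gets firmer over three frames.
soft_press = [
    [[0, 0, 0], [0, 1, 0], [0, 0, 0]],
    [[0, 1, 0], [1, 1, 1], [0, 1, 0]],
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],
]
hard_press = [
    [[0, 0, 0], [0, 1, 0], [0, 0, 0]],
    [[0, 0, 0], [0, 1, 0], [0, 0, 0]],
    [[0, 1, 0], [0, 1, 0], [0, 0, 0]],
]

soft_areas, soft_growth = area_growth(soft_press)
hard_areas, hard_growth = area_growth(hard_press)
```

In this toy data the soft object's contact area grows much more than the hard object's, which is the qualitative pattern the paragraph above describes.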
The MIT researchers adopted the same approach. Wenzhen Yuan, a graduate student in mechanical engineering and first author on the paper from Adelson’s group, used confectionery molds to create 400 groups of silicone objects, with 16 objects per group. In each group, the objects had the same shapes but different degrees of hardness, which Yuan measured using a standard industrial hardness scale.
Then she pressed a GelSight sensor against each object manually and recorded how the contact pattern changed over time, essentially producing a short movie for each object. To both standardize the data format and keep the size of the data manageable, she extracted five frames from each movie, evenly spaced in time, that together described how the pressed object deformed.
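Picking evenly spaced frames from recordings of different lengths is simple to sketch. The version below assumes that the first and last frames are always included, which the article does not specify, and the function name is hypothetical.

```python
def evenly_spaced_indices(num_frames, num_samples=5):
    """Return `num_samples` frame indices spread evenly across a recording.

    Assumption: the first and last frames are always included.
    """
    if num_samples < 1:
        raise ValueError("num_samples must be at least 1")
    if num_frames < num_samples:
        raise ValueError("recording is shorter than the number of samples")
    if num_samples == 1:
        return [0]
    # Integer arithmetic keeps the result exact and reproducible.
    return [i * (num_frames - 1) // (num_samples - 1)
            for i in range(num_samples)]


# Recordings of different lengths all reduce to five frames.
short_clip = evenly_spaced_indices(12)
long_clip = evenly_spaced_indices(41)
```

Because every recording is reduced to the same number of frames, presses of different durations produce inputs of the same size.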
Finally, she fed the data to a neural network, which automatically looked for correlations between changes in contact patterns and hardness measurements. The resulting system takes frames of video as inputs and produces hardness scores with very high accuracy. Yuan also conducted a series of informal experiments in which human subjects palpated fruits and vegetables and ranked them according to hardness. In every instance, the GelSight-equipped robot arrived at the same rankings.
Yuan is joined on the paper by her two thesis advisors, Adelson and Mandayam Srinivasan, a senior research scientist in the Department of Mechanical Engineering; Chenzhuo Zhu, an undergraduate from Tsinghua University who visited Adelson’s group last summer; and Andrew Owens, who did his PhD in electrical engineering and computer science at MIT and is now a postdoc at the University of California at Berkeley.
The paper from the Robot Locomotion Group was born of the group’s experience with the Defense Advanced Research Projects Agency’s Robotics Challenge (DRC), in which academic and industry teams competed to develop control systems that would guide a humanoid robot through a series of tasks related to a hypothetical emergency.
Typically, an autonomous robot will use some kind of computer vision system to guide its manipulation of objects in its environment. Such systems can provide very reliable information about an object’s location — until the robot picks the object up. Especially if the object is small, much of it will be occluded by the robot’s gripper, making location estimation much harder. Thus, at exactly the point at which the robot needs to know the object’s location precisely, its estimate becomes unreliable. This was the problem the MIT team faced during the DRC, when their robot had to pick up and turn on a power drill.
“You can see in our video for the DRC that we spend two or three minutes turning on the drill,” says Greg Izatt, a graduate student in electrical engineering and computer science and first author on the new paper. “It would be so much nicer if we had a live-updating, accurate estimate of where that drill was and where our hands were relative to it.”
That’s why the Robot Locomotion Group turned to GelSight. Izatt and his co-authors — Tedrake, the Toyota Professor of Electrical Engineering and Computer Science, Aeronautics and Astronautics, and Mechanical Engineering; Adelson; and Geronimo Mirano, another graduate student in Tedrake’s group — designed control algorithms that use a computer vision system to guide the robot’s gripper toward a tool and then turn location estimation over to a GelSight sensor once the robot has the tool in hand.
In general, the challenge with such an approach is reconciling the data produced by a vision system with data produced by a tactile sensor. But GelSight is itself camera-based, so its data output is much easier to integrate with visual data than the data from other tactile sensors.
In Izatt’s experiments, a robot with a GelSight-equipped gripper had to grasp a small screwdriver, remove it from a holster, and return it. Of course, the data from the GelSight sensor don’t describe the whole screwdriver, just a small patch of it. But Izatt found that, as long as the vision system’s estimate of the screwdriver’s initial position was accurate to within a few centimeters, his algorithms could deduce which part of the screwdriver the GelSight sensor was touching and thus determine the screwdriver’s position in the robot’s hand.
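A toy, one-dimensional analogy shows why a rough prior estimate helps. This is not Izatt's algorithm. It assumes, purely for illustration, that the tool is described by a known width profile along its length and that the tactile sensor reports the widths under a short patch. The search then looks for the best match only near the position suggested by the vision system. All names and numbers are hypothetical.

```python
def locate_patch(profile, patch, prior_index, search_radius):
    """Return the start index in `profile` where `patch` fits best.

    Only start positions within `search_radius` samples of `prior_index`
    (the rough estimate from vision) are considered.
    """
    lo = max(0, prior_index - search_radius)
    hi = min(len(profile) - len(patch), prior_index + search_radius)
    if hi < lo:
        raise ValueError("search window does not fit inside the profile")

    def mismatch(start):
        # Sum of squared differences between the patch and the profile.
        return sum((profile[start + k] - value) ** 2
                   for k, value in enumerate(patch))

    return min(range(lo, hi + 1), key=mismatch)


# Hypothetical tool widths in millimeters, sampled from handle to tip.
tool_profile = [30, 30, 30, 30, 28, 20, 10, 6, 6, 6, 6, 6, 6, 4, 2]
touched = [28, 20, 10]   # widths felt under the sensor
vision_guess = 6         # rough start index suggested by the camera

best_start = locate_patch(tool_profile, touched, vision_guess, search_radius=3)
```

Restricting the search to a window around the prior keeps a small patch from being matched to a similar-looking but distant part of the tool, such as another stretch of a uniform shaft.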