CMU Team Trains Autonomous Drones Using Cross-Modal Simulated Data

Researchers Break Down End-to-End Learning for Safe, Real-World Drone Deployment

A method developed at CMU teaches drones to learn perception and action separately, which creates a way to safely deploy them in the real world even though they're trained entirely on simulated data.

To fly autonomously, drones need to understand what they perceive in the environment and make decisions based on that information. A novel method developed by Carnegie Mellon University researchers allows drones to learn perception and action separately. The two-stage approach overcomes the "simulation-to-reality gap" and creates a way to safely deploy drones trained entirely on simulated data into real-world course navigation.

"Typically drones trained on even the best photorealistic simulated data will fail in the real world because the lighting, colors and textures are still too different to translate," said Rogerio Bonatti, a doctoral student in the School of Computer Science's Robotics Institute. "Our perception module is trained with two modalities to increase robustness against environmental variabilities."

The first modality that helps train the drone's perception is image. The researchers used a photorealistic simulator to create an environment that included the drone, a soccer field, and red square gates raised off the ground and positioned randomly to create a track. They then built a large dataset of simulated images from thousands of randomly generated drone and gate configurations.
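As a rough illustration of how such a dataset of random configurations could be generated, the sketch below samples gate poses relative to the drone. The function names, the pose parameterization (distance, azimuth, elevation, yaw) and the numeric ranges are assumptions made for this example, not details reported by the researchers. In a full pipeline, each sampled pose would also be rendered into an image by the simulator.

```python
import math
import random


def sample_configuration(rng):
    """Sample one hypothetical gate pose relative to the drone.

    The ranges below are illustrative assumptions, not values from the paper.
    """
    return {
        "distance_m": rng.uniform(2.0, 10.0),                  # how far away the gate is
        "azimuth_rad": rng.uniform(-math.pi / 4, math.pi / 4),  # left/right of the camera axis
        "elevation_rad": rng.uniform(-math.pi / 8, math.pi / 8),  # above/below the camera axis
        "gate_yaw_rad": rng.uniform(-math.pi / 2, math.pi / 2),   # gate rotation about vertical
    }


def build_dataset(num_samples, seed=0):
    """Build a reproducible list of random drone-gate configurations."""
    rng = random.Random(seed)
    return [sample_configuration(rng) for _ in range(num_samples)]


dataset = build_dataset(5000)
print(len(dataset), dataset[0])
```

Seeding the generator keeps the dataset reproducible, which makes it easier to compare models trained on the same simulated configurations.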

The second modality is the gates' position and orientation in space, which the researchers obtained from the same dataset of simulated images.

Teaching the model using multiple modalities reinforces a robust representation of the drone's experience, meaning it can understand the essence of the field and gates in a way that translates from simulation to reality. Compressing images to have fewer pixels aids this process. Learning from a low-dimensional representation allows the model to see through the visual noise in the real world and identify the gates.
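One common way to express this kind of two-modality training is a single objective with one term per modality. The formula below is a sketch under stated assumptions rather than the exact loss from the paper: I is an input image, g is the gate's pose, E is an encoder that maps the image to a low-dimensional code z = E(I), D_img and D_gate are decoders for each modality, and lambda is a weighting factor. All of these symbols are introduced here for illustration.

```latex
\mathcal{L}
  = \underbrace{\bigl\lVert I - D_{\text{img}}\bigl(E(I)\bigr) \bigr\rVert^2}_{\text{image reconstruction}}
  + \lambda \,
    \underbrace{\bigl\lVert g - D_{\text{gate}}\bigl(E(I)\bigr) \bigr\rVert^2}_{\text{gate pose prediction}}
```

Because both decoders read from the same code z, the code is pushed to keep the information needed to locate the gate and to drop visual detail that does not help either term.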

With perception learned, researchers deploy the drone within the simulation so it can learn its control policy — or how to physically move. In this case, it learns which velocity to apply as it navigates the course and encounters each gate. Because it's a simulated environment, a program can calculate the drone's optimal trajectory before deployment. This method provides an advantage over manually supervised learning using an expert operator, since real-world learning can be dangerous, time-consuming and expensive.
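Learning a control policy from a computed trajectory can be sketched as ordinary supervised learning: the program's commands are the labels, and the policy is fitted to reproduce them. The example below is a deliberately tiny version of that idea. The expert rule, the single input feature (the gate's lateral offset) and the linear policy are all assumptions made for illustration and stand in for the far richer model used in the actual work.

```python
import random


def expert_velocity(lateral_offset, gain=0.8):
    """Hypothetical expert: sideways velocity proportional to the gate's lateral offset."""
    return gain * lateral_offset


def fit_linear_policy(features, targets):
    """Least-squares fit of velocity = w * feature + b to the expert's commands."""
    n = len(features)
    mean_x = sum(features) / n
    mean_y = sum(targets) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(features, targets))
    var = sum((x - mean_x) ** 2 for x in features)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b


rng = random.Random(0)
offsets = [rng.uniform(-3.0, 3.0) for _ in range(200)]  # simulated observations
labels = [expert_velocity(x) for x in offsets]          # expert commands to imitate
w, b = fit_linear_policy(offsets, labels)
print(f"learned policy: velocity = {w:.3f} * offset + {b:.3f}")
```

Since every label comes from a program inside the simulator, no human pilot or physical drone is needed to collect the training data.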

The drone learns to navigate the course through training steps designed by the researchers. Bonatti said he targets the specific maneuvers and directions the drone will need in the real world. "I make the drone turn to the left and to the right in different track shapes, which get harder as I add more noise. The robot is not learning to recreate going through any specific track. Rather, by strategically directing the simulated drone, it's learning all of the elements and types of movements to race autonomously," Bonatti said.
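The idea of left and right turns that get harder as noise increases can be sketched as a small track generator. Everything in this example is hypothetical: the make_track function, the per-gate turn angle, the gate spacing and the uniform position jitter are assumptions chosen to illustrate the idea, not the researchers' actual procedure.

```python
import math
import random


def make_track(n_gates, turn_deg, spacing, noise, rng):
    """Place gates along a path that turns by turn_deg at every gate.

    Positive turn_deg curves left, negative curves right. Each gate position
    is jittered by up to `noise` metres in x and y.
    """
    x = y = heading = 0.0
    gates = []
    for _ in range(n_gates):
        heading += math.radians(turn_deg)
        x += spacing * math.cos(heading)
        y += spacing * math.sin(heading)
        gates.append((x + rng.uniform(-noise, noise),
                      y + rng.uniform(-noise, noise)))
    return gates


rng = random.Random(0)
# Left- and right-turning tracks at three increasing noise levels.
curriculum = [
    make_track(8, turn, 6.0, noise, rng)
    for noise in (0.0, 0.5, 1.0)
    for turn in (-20, 20)
]
print(len(curriculum), "tracks; first gate of first track:", curriculum[0][0])
```

Training across many such tracks, rather than one fixed layout, is what lets a policy pick up general turning behavior instead of memorizing a single course.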

Bonatti wants to push current technology to approach a human's ability to interpret environmental cues.

"Most of the work on autonomous drone racing so far has focused on engineering a system augmented with extra sensors and software with the sole aim of speed. Instead, we aimed to create a computational fabric, inspired by the function of a human brain, to map visual information to the correct control actions going through a latent representation," Bonatti said.

But drone racing is just one possibility for this type of learning. The method of separating perception and control could be applied to many different artificial intelligence tasks, such as driving or cooking. While this model relies on images and positions to teach perception, other modalities like sounds and shapes could be used for efforts like identifying cars, wildlife or objects.

Contributors to this work include Carnegie Mellon's Sebastian Scherer and Microsoft Corporation's Ratnesh Madaan, Vibhav Vineet and Ashish Kapoor. The paper, "Learning Visuomotor Policies for Aerial Navigation Using Cross-Modal Representations," has been accepted to the International Conference on Intelligent Robots and Systems (IROS) 2020. The paper's code is open-sourced and available for other researchers.

For More Information
Byron Spice | 412-268-9068 | bspice@cs.cmu.edu
Virginia Alvino Young | 412-268-8356 | vay@cmu.edu