Arrangements of objects are commonplace in a myriad of everyday scenarios, such as decorations at one's home, displays at museums, and plates of food at restaurants. An efficient personal robot should be able to learn how to robustly recreate an arrangement using only a few examples, while overcoming difficulties caused by uneven surfaces, minor misplacements, and variations in object sizes. Furthermore, the error when performing a placement should be small relative to the objects being placed; hence, tasks where the objects can be quite small, such as food plating, require greater accuracy. However, robotic food manipulation has its own challenges, especially when modeling the material properties of diverse and deformable food items. In this talk, we first propose a multimodal sensory approach to interacting with food that aids in learning embeddings that capture distinguishing material properties across food items. These embeddings are learned in a self-supervised manner using a triplet loss formulation and a combination of proprioceptive, audio, and visual data. Second, we propose a data-efficient local regression model that is robust to errors and can learn the underlying pattern of an arrangement from visual inputs. To reduce the error this regression model will encounter at execution time, a complementary neural network is trained on depth images to predict whether a given placement will be stable and accurate. We evaluate our overall approach on a real-world arrangement task that requires a robot to plate variations of Caprese salads.
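As a rough illustration of the triplet loss formulation mentioned above, the sketch below computes a standard triplet margin loss on embedding vectors. The distance function, margin value, and the idea of pairing anchor/positive/negative samples by food item are common conventions, not details taken from the talk itself; how the proprioceptive, audio, and visual modalities are actually combined into one embedding is not specified here.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two embedding vectors (plain lists)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet margin loss:
    pulls the anchor toward the positive (same food item) and pushes it
    away from the negative (different food item) by at least `margin`.
    The margin value 0.2 is an illustrative default, not from the talk.
    """
    return max(0.0, euclidean(anchor, positive)
                    - euclidean(anchor, negative) + margin)

# Hypothetical embeddings: anchor and positive come from interactions with
# the same food item, the negative from a different one.
loss = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
```

In self-supervised training, triplets can be formed without manual labels by treating different sensory interactions with the same food item as positives, which is the spirit of the approach described in the abstract.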
Oliver Kroemer (Advisor)
Zoom Participation. See announcement.