Real-time Mapping of Physical Scene Properties with an Autonomous Robot Experimenter

CORL 2022

We demonstrate the first fully-autonomous, neural-scene labelling robot, trained from scratch, in real-time, and capable of operating in the real world. The autonomous robot experimenter discovers and maps dense physical scene properties by providing the outcomes of sparse experiments -- a poke, a spectroscopy measurement or a lateral push -- to a 3D neural field.

Neural fields can be trained from scratch to represent the shape and appearance of 3D scenes efficiently. It has also been shown that they can densely map correlated properties such as semantics, via sparse interactions from a human labeller. In this work, we show that a robot can densely annotate a scene with arbitrary discrete or continuous physical properties via its own fully-autonomous experimental interactions, as it simultaneously scans and maps the scene with an RGB-D camera. A variety of scene interactions are possible, including poking with force sensing to determine rigidity, measuring local material type with single-pixel spectroscopy or predicting force distributions by pushing. Sparse experimental interactions are guided by entropy to enable high efficiency, with tabletop scene properties densely mapped from scratch in a few minutes from a few tens of interactions.


Overview Video

Method

We represent 3D scenes similarly to iMAP, with an MLP that maps a 3D coordinate to colour and volume density. The joint optimisation is extended to include physical scene properties, which are supervised by rendering against the outcomes of sparse experiments. The model is trained from scratch, in real-time and without any prior data.
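
As a rough illustration, the sketch below shows how such a field might be structured in PyTorch: a shared MLP trunk with separate heads for colour, volume density and a physical property. The layer widths, positional encoding and head shapes are illustrative assumptions, not the exact architecture used in the paper.

```python
# Minimal sketch of the scene network, assuming PyTorch. Layer widths, the
# positional encoding and the property head are illustrative choices.
import torch
import torch.nn as nn


class PropertyField(nn.Module):
    """MLP mapping a 3D point to colour, volume density and a physical property."""

    def __init__(self, n_freqs: int = 10, hidden: int = 256, n_property: int = 1):
        super().__init__()
        in_dim = 3 + 3 * 2 * n_freqs          # xyz + sin/cos positional encoding
        self.n_freqs = n_freqs
        self.trunk = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.sigma_head = nn.Linear(hidden, 1)          # volume density
        self.rgb_head = nn.Linear(hidden, 3)            # colour
        self.prop_head = nn.Linear(hidden, n_property)  # property logits / value

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        # Frequency encoding of the 3D coordinate.
        freqs = 2.0 ** torch.arange(self.n_freqs, device=x.device) * torch.pi
        ang = x[..., None] * freqs                       # (..., 3, n_freqs)
        enc = torch.cat([torch.sin(ang), torch.cos(ang)], dim=-1).flatten(-2)
        return torch.cat([x, enc], dim=-1)

    def forward(self, x: torch.Tensor):
        h = self.trunk(self.encode(x))
        sigma = torch.relu(self.sigma_head(h))
        rgb = torch.sigmoid(self.rgb_head(h))
        prop = self.prop_head(h)          # interpreted per property type downstream
        return rgb, sigma, prop
```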

The robot builds an internal representation of its environment via a series of autonomous experiments. First, it actively selects interaction locations that are both feasible and information-rich. Second, the selected locations are mapped to the real-world coordinate system of the robot, and a physical interaction with the scene is planned and executed. Third, the resulting measurement is processed and/or classified to obtain the ground-truth property label. Finally, using the labels obtained in this manner, scene properties are optimised through semantic rendering of the robot-selected keyframe pixels. The resulting joint internal representation of shape, appearance and semantics in the neural field allows the sparsely-annotated scene properties to be propagated efficiently and densely throughout the scene.
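
The entropy-guided selection in the first step can be sketched as follows. The render_property_probs interface it assumes is hypothetical; the idea is simply to score each feasible candidate pixel by the entropy of its rendered property distribution and interact where the model is most uncertain.

```python
# Hedged sketch of entropy-guided interaction selection. Assumes a helper
# render_property_probs() (name is hypothetical) has already volume-rendered
# per-class property probabilities for a batch of candidate pixels.
import torch


def select_interaction(probs: torch.Tensor, feasible: torch.Tensor) -> int:
    """Pick the feasible candidate whose rendered property distribution has
    the highest entropy, i.e. where an experiment is most informative.

    probs:    (N, C) rendered class probabilities per candidate pixel.
    feasible: (N,)   boolean mask of candidates the robot can reach.
    """
    eps = 1e-8
    entropy = -(probs * (probs + eps).log()).sum(dim=-1)     # (N,)
    entropy = entropy.masked_fill(~feasible, float("-inf"))  # exclude infeasible
    return int(entropy.argmax())
```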

Sparse experiments, from left to right: top-down poke, spectroscopy measurement and lateral push.
The corresponding rendered dense scene properties: rigidity, material class and push force distribution.

The measurement of the continuous-valued push force distribution demonstrates the temporal memory characteristics of neural-field representations and their ability to predict dense, continuous-valued semantics from sparse ground truths. Below, the robot is shown interacting with a power drill whose mass distribution is non-uniform; this non-uniformity is clearly visible in the final rendered push force distribution.
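
As a hedged sketch of how a continuous property such as push force can be rendered and supervised, the snippet below composites per-sample property values with standard volume-rendering weights and applies a loss only at the sparse pixels where an experiment was actually performed. Function names and tensor shapes are illustrative, not the paper's code.

```python
# Sketch of rendering a continuous-valued property along camera rays and
# supervising it only at sparsely measured pixels.
import torch


def render_property(sigma: torch.Tensor, prop: torch.Tensor, deltas: torch.Tensor) -> torch.Tensor:
    """sigma, deltas: (R, S) per-sample density and inter-sample distance;
    prop: (R, S, P) per-sample property values; returns (R, P) per-ray property."""
    alpha = 1.0 - torch.exp(-sigma * deltas)                       # (R, S)
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alpha[:, :1]), 1.0 - alpha + 1e-10], dim=1), dim=1
    )[:, :-1]                                                      # accumulated transmittance
    weights = alpha * trans                                        # (R, S)
    return (weights.unsqueeze(-1) * prop).sum(dim=1)               # (R, P)


def property_loss(pred: torch.Tensor, target: torch.Tensor, measured: torch.Tensor) -> torch.Tensor:
    """L2 loss restricted to rays where a measurement exists (boolean mask)."""
    return ((pred - target)[measured] ** 2).mean()
```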

Contact

If you have any questions, please feel free to contact Iain Haughton.