Fig. 5: The pipeline of the visual-tactile joint learning framework. | Nature Communications

From: Capturing forceful interaction with deformable objects using a deep learning-powered stretchable tactile array

This model contains hand reconstructors, feature extractors, a temporal feature fusion module, and a winding number field (WNF) predictor. Global and local features are extracted from the visual and tactile inputs according to the block positions on the hand. The features are fused by a temporal cross-attention module to compute a per-point feature, the WNF is predicted at sampled positions, and the object geometry is reconstructed with the marching cubes algorithm.
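The fusion and prediction steps described above can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: the dimensions, weight initialization, and the two-layer MLP head are assumptions, and the cross-attention is reduced to a single head in which each sampled point's query attends over fused temporal tokens before a small head maps the result to a scalar WNF value.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def temporal_cross_attention(point_queries, temporal_feats):
    """Single-head cross-attention (illustrative).

    point_queries: (P, d) queries, one per sampled spatial position.
    temporal_feats: (T, d) fused visual-tactile tokens over T time steps.
    Returns (P, d) per-point features.
    """
    d = point_queries.shape[-1]
    scores = point_queries @ temporal_feats.T / np.sqrt(d)   # (P, T)
    weights = softmax(scores, axis=-1)                       # rows sum to 1
    return weights @ temporal_feats                          # (P, d)

def predict_wnf(per_point_feats, w1, b1, w2, b2):
    """Hypothetical two-layer MLP head mapping features to scalar WNF values."""
    hidden = np.maximum(per_point_feats @ w1 + b1, 0.0)      # ReLU
    return hidden @ w2 + b2                                  # (P, 1)

# Toy shapes (assumed): P sampled points, T time steps, feature width d.
rng = np.random.default_rng(0)
P, T, d, h = 5, 4, 8, 16
queries = rng.standard_normal((P, d))
tokens = rng.standard_normal((T, d))
feats = temporal_cross_attention(queries, tokens)
wnf = predict_wnf(feats,
                  rng.standard_normal((d, h)), np.zeros(h),
                  rng.standard_normal((h, 1)), np.zeros(1))
print(feats.shape, wnf.shape)  # (5, 8) (5, 1)
```

In the full pipeline, the predicted WNF at sampled positions defines a scalar field whose level set is then meshed by marching cubes; that surface-extraction step is omitted here.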