Abstract
Multimodal perception can capture more precise and comprehensive information than unimodal approaches. However, current sensory systems typically merge multimodal signals at computing terminals following parallel processing and transmission, which risks losing spatial association information and requires time stamps to maintain temporal coherence for time-series data. Here we demonstrate bioinspired in-sensor multimodal fusion, which effectively enhances comprehensive perception and reduces data transfer between the sensory terminal and computation units. By adopting floating gate phototransistors with reconfigurable photoresponse plasticity, we realize agile spatial and spatiotemporal fusion under nonvolatile and volatile photoresponse modes. To realize an optimal spatial estimation, we integrate spatial information from visual-tactile signals. For dynamic events, we capture and fuse spatiotemporal information from visual-audio signals in real time, realizing a dance-music synchronization recognition task without a time-stamping process. This in-sensor multimodal fusion approach has the potential to simplify multimodal integration systems, extending the in-sensor computing paradigm.