Recent advances in artificial intelligence have significantly expanded capabilities in processing language and images. Comprehensive understanding of video content, however, remains an unsolved challenge. The main obstacle is the need to process multidimensional video information in real time at data rates exceeding 1 Tb/s, a demand that current hardware technologies cannot meet. This work introduces a hardware-accelerated integrated optoelectronic platform designed specifically for the real-time analysis of multidimensional video. By performing optical information processing within artificial intelligence hardware and combining it with advanced machine vision networks, the platform achieves data processing speeds of 1.2 Tb/s. This capability supports the analysis of hundreds of frequency bands at megapixel spatial resolution and video frame rates, outperforming existing technologies in speed by three to four orders of magnitude. The platform proves effective for AI-driven tasks, such as video semantic segmentation and object understanding, in both indoor and aerial scenarios. By overcoming current limits on data processing speed, the platform shows promise for real-time AI video understanding, with potential implications for enhancing human-machine interaction and advancing cognitive processing technologies.