Aiming at the research of real-time data transmission and multi-scale image decomposition of embedded optical sensor array, the principle, method and fusion strategy of multi-sensor image fusion are studied comprehensively, thoroughly and systematically by combining the imaging characteristics of source image with multi-scale geometric analysis tools using machine learning algorithm. A new quality scalable video image coding framework is also proposed in this paper, which is implemented by a multi-scale online dictionary learning algorithm based on structured sparse video signals. For the purpose of different types of images and image fusion, a new high quality scalable video image coding framework based on machine learning algorithm is proposed on the basis of comprehensive analysis of prior information such as imaging mechanism of image sensor and imaging characteristics of source image. A multi-scale online dictionary learning algorithm based on machine learning for sparse video signal structure is proposed. Through the hierarchical structure of wavelet decomposition, the searching domain of online learning is optimized to a hierarchical sparse block, and its sparse representation coefficients are obtained by using machine learning sparse coding idea. The real-time data transmission of embedded optical sensor array based on machine learning and multi-scale image decomposition algorithm proposed in this paper have good fusion performance, which is of great significance for further research and engineering application of image fusion technology.