In microscopic imaging, when the object thickness exceeds the depth of field of the microscope, multi-focus image fusion (MFF) is an effective way to generate an all-in-focus image. However, for nonwoven fabric, where the number of captured images can reach 100 or more, existing methods often underperform near the fiber edges, owing to image ghosting and noise accumulation caused by movement of the platform. To address this problem, this paper presents a method for fusing multi-layer micro-images that combines the spectral and spatial features of the images. First, a spectral-domain fusion map is generated by decomposing and reconstructing the high-frequency and low-frequency components of the images, with the aim of capturing edge information. In parallel, a spatial-domain fusion map is built through a sharpness measurement informed by visual perception. Finally, the two maps are combined via an optimized weight to obtain an all-in-focus fused image. Four groups of real-world data consisting of 100 multi-focus nonwoven images are used to verify the superiority of this method. The experimental results demonstrate that the proposed method achieves satisfactory performance in both human visual evaluation and objective evaluation compared with state-of-the-art fusion methods, namely the image fusion framework based on the convolutional neural network, MFF, the region-based image fusion algorithm, and the convolutional neural network.
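The three-step pipeline described above (a spectral-domain map, a spatial-domain map, and a weighted combination) can be illustrated with a minimal NumPy sketch. It is not the implementation used in this paper. It assumes an ideal FFT high-pass filter as the spectral measure, a locally averaged absolute Laplacian as the spatial sharpness measure, and a fixed weight `alpha` in place of the optimized weight. The function names and the parameters `alpha`, `cutoff`, and `radius` are hypothetical.

```python
import numpy as np

def box_filter(img, radius):
    """Mean over a (2r+1)x(2r+1) window, using edge padding and an integral image."""
    k = 2 * radius + 1
    h, w = img.shape
    padded = np.pad(img, radius, mode="edge")
    c = np.cumsum(np.cumsum(padded, axis=0), axis=1)
    c = np.pad(c, ((1, 0), (1, 0)))  # leading zero row/column for window sums
    s = c[k:k + h, k:k + w] - c[:h, k:k + w] - c[k:k + h, :w] + c[:h, :w]
    return s / (k * k)

def spatial_focus(img, radius=2):
    """Assumed spatial measure: locally averaged absolute 4-neighbour Laplacian."""
    p = np.pad(img, 1, mode="edge")
    lap = np.abs(4 * p[1:-1, 1:-1] - p[:-2, 1:-1] - p[2:, 1:-1]
                 - p[1:-1, :-2] - p[1:-1, 2:])
    return box_filter(lap, radius)

def spectral_focus(img, cutoff=0.1, radius=2):
    """Assumed spectral measure: local energy of the high-frequency component."""
    fy = np.fft.fftfreq(img.shape[0])[:, None]
    fx = np.fft.fftfreq(img.shape[1])[None, :]
    high_pass = np.sqrt(fx ** 2 + fy ** 2) > cutoff  # drop low frequencies
    high = np.real(np.fft.ifft2(np.fft.fft2(img) * high_pass))
    return box_filter(high ** 2, radius)

def fuse_stack(stack, alpha=0.5, cutoff=0.1, radius=2):
    """Pick, per pixel, the layer with the highest combined focus score."""
    stack = np.asarray(stack, dtype=float)  # shape (layers, H, W)
    spat = np.stack([spatial_focus(s, radius) for s in stack])
    spec = np.stack([spectral_focus(s, cutoff, radius) for s in stack])
    # Scale both measures to a comparable range before weighting.
    spat /= spat.max() + 1e-12
    spec /= spec.max() + 1e-12
    score = alpha * spec + (1 - alpha) * spat  # fixed weight, not optimized
    idx = np.argmax(score, axis=0)  # index of the selected layer per pixel
    fused = np.take_along_axis(stack, idx[None], axis=0)[0]
    return fused, idx
```

Calling `fuse_stack(layers)` on a list of equally sized grayscale arrays returns the fused image and the per-pixel layer index map. Setting `alpha=1` uses only the spectral measure and `alpha=0` only the spatial one. A hard per-pixel selection like this is the simplest choice and does nothing on its own to suppress the ghosting near fiber edges discussed above.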