Multi-sensor information fusion-based intelligent fault diagnosis (MIF-IFD) using deep learning aims to identify equipment health status more accurately and reliably than single-sensor methods. However, designing the feature extraction modules of a deep network and deciding at which layers to perform information fusion still require expertise and extensive experimentation, which imposes high development costs and impedes the development and application of MIF-IFD. This paper proposes an architecture to mitigate these development challenges. First, the basic module of the network combines searchable feature extraction cells with a mask self-attention mechanism to extract and fuse multi-sensor information. Second, a continuous search space is constructed in which the optimal feature extraction cells and the optimal fusion starting layers are searched; together these compose the optimal fusion diagnostic structure. Finally, the method is validated and analyzed on two experimental datasets. The results demonstrate that the proposed method not only ensures diagnostic accuracy but also significantly improves development efficiency.
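
The idea of searching in a continuous space can be illustrated with a minimal sketch. It assumes a softmax relaxation over candidate choices, as is common in differentiable architecture search; the abstract does not state the exact relaxation used. The candidate operations (`identity`, `moving_average`, `first_difference`), the logit vectors `alpha` and `beta`, and all function names are hypothetical and chosen only for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical candidate operations for one feature extraction cell.
# Each maps a 1-D signal to a signal of the same length.
def identity(x):
    return list(x)

def moving_average(x):
    # Window of 3, with the edge samples repeated at the boundaries.
    n = len(x)
    return [(x[max(i - 1, 0)] + x[i] + x[min(i + 1, n - 1)]) / 3.0
            for i in range(n)]

def first_difference(x):
    return [0.0] + [x[i] - x[i - 1] for i in range(1, len(x))]

CANDIDATE_OPS = [identity, moving_average, first_difference]

def mixed_op(x, alpha):
    """Continuous relaxation: softmax-weighted sum of all candidate outputs.

    alpha holds one logit per candidate (assumed learnable during search).
    """
    weights = softmax(alpha)
    outputs = [op(x) for op in CANDIDATE_OPS]
    return [sum(w * out[i] for w, out in zip(weights, outputs))
            for i in range(len(x))]

def select_op(alpha):
    """Discretize after search: keep the candidate with the largest logit."""
    best = max(range(len(alpha)), key=lambda i: alpha[i])
    return CANDIDATE_OPS[best].__name__

def select_fusion_start(beta):
    """Assumed analogue for the fusion starting layer: one logit per layer
    index, with the largest logit chosen after search."""
    return max(range(len(beta)), key=lambda i: beta[i])

signal = [1.0, 2.0, 4.0]
alpha = [0.1, 2.0, -1.0]   # hypothetical logits after search
beta = [0.2, 1.5, 0.3]     # hypothetical logits over three layers

relaxed = mixed_op(signal, alpha)
chosen_op = select_op(alpha)
fusion_layer = select_fusion_start(beta)
```

During search, `mixed_op` keeps every candidate active so that the logits can be tuned by gradient-based optimization; `select_op` and `select_fusion_start` then reduce the relaxed structure to a single discrete architecture. In practice the logits would be trained together with the network weights in a deep learning framework rather than set by hand.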