In recent years, the number of people suffering from diabetes mellitus (DM) has increased remarkably, and DM detection has attracted much attention. Unlike many existing methods, which are invasive, Traditional Chinese Medicine (TCM) provides a non-invasive strategy for DM diagnosis by exploiting features observed on the body surface, including the tongue, face, sublingual vein, pulse and odor. Since combining these modalities can improve detection performance, we propose a novel multi-modal learning method that learns a latent variable shared among the tongue, face, sublingual vein, pulse and odor information, thereby exploiting the correlations among them. Specifically, the raw images or signals of the five modalities are first captured with our non-invasive devices, and the corresponding features are then extracted from each modality. Finally, a shared auto-encoder Gaussian process latent variable model (SAGP) is introduced to learn a latent variable for the modalities in a non-linear and generative way, and an efficient algorithm is designed to optimize the proposed model. DM detection experiments are conducted on a dataset of 548 healthy and 356 DM samples that we collected, and the results substantiate the superiority of the proposed method.
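To make the idea of a latent variable shared across modalities concrete, the following is a minimal sketch only. It is not the SAGP model: it swaps in a linear projection (an SVD of the standardized, concatenated features) for the non-linear, generative Gaussian process model described above. The feature dimensions, the sample count, the random data and the function name `shared_latent_linear` are all hypothetical and chosen purely for illustration.

```python
import numpy as np


def shared_latent_linear(views, latent_dim):
    """Linear stand-in for a shared latent variable (NOT the SAGP model).

    Each view is an (n_samples, n_features) array. Every view is z-scored,
    the views are concatenated column-wise, and the samples are projected
    onto the top `latent_dim` principal directions of the stacked matrix.
    """
    standardized = []
    for view in views:
        mean = view.mean(axis=0)
        std = view.std(axis=0)
        std[std == 0] = 1.0  # guard against constant features
        standardized.append((view - mean) / std)
    stacked = np.hstack(standardized)
    # Thin SVD: rows of u scaled by s are the principal component scores.
    u, s, _ = np.linalg.svd(stacked, full_matrices=False)
    return u[:, :latent_dim] * s[:latent_dim]


# Hypothetical feature dimensions for the five modalities (assumed values).
feature_dims = {"tongue": 6, "face": 5, "sublingual": 4, "pulse": 3, "odor": 2}

rng = np.random.default_rng(0)
n_samples = 20  # assumed toy sample count, unrelated to the real dataset
views = [rng.normal(size=(n_samples, d)) for d in feature_dims.values()]

latent = shared_latent_linear(views, latent_dim=3)
print(latent.shape)
```

In this sketch each sample receives a single low-dimensional representation built from all five modalities, which a downstream classifier could then use to separate healthy from DM samples. The model named in the abstract would replace the linear projection with Gaussian process mappings and an auto-encoder style mapping between the observations and the latent variable.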