Abstract
Stellar classification and radius estimation are crucial for understanding the structure of the universe and stellar evolution. In the era of astronomical big data, multimodal data have become available and are, in principle, effective for both tasks. However, existing research primarily relies on single-modal data, which leaves open the question of how to improve performance by using multimodal data jointly. To this end, this paper proposes a model, Multi-Modal SCNet (MMSCNet), and its ensemble, Multimodal Ensemble for Stellar Classification and Regression (MESCR), which improve stellar classification and radius estimation by fusing data from two modalities. A typical difficulty in this problem is class imbalance: some stellar types have far more samples than others, and this imbalance degrades model performance. MESCR therefore uses a weighted sampling strategy to mitigate it. Evaluated on a test set, MESCR achieves a classification accuracy of 96.1\%, and its radius estimates have a Mean Absolute Error (MAE) of 0.084 dex and a $\sigma$ of 0.149 $R_{\odot}$. Moreover, we assessed the uncertainty of the model predictions and confirmed good consistency within a reasonable deviation range. Finally, we applied our model to 50,871,534 SDSS stars without spectra and published a new catalog.
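The abstract does not specify how the weighted sampling is defined. A common choice is to weight each training sample by the inverse frequency of its class, so that every stellar type is drawn about equally often. The sketch below assumes that scheme; the function names and the toy labels "G", "K", "M" are hypothetical illustrations, not the authors' code, and it does not implement MMSCNet or MESCR themselves.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Per-sample weights equal to 1 / (number of samples in that sample's class).

    Assumption: inverse class frequency is used as the sampling weight.
    """
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]


def weighted_sample_indices(labels, n_draws, seed=0):
    """Draw sample indices with replacement so each class is drawn about equally often."""
    rng = random.Random(seed)
    weights = inverse_frequency_weights(labels)
    # Each class's weights sum to 1.0, so every class has equal total probability.
    return rng.choices(range(len(labels)), weights=weights, k=n_draws)


if __name__ == "__main__":
    # Hypothetical imbalanced label set: "G" stars are 9x more common than "M" stars.
    labels = ["G"] * 900 + ["K"] * 300 + ["M"] * 100
    idx = weighted_sample_indices(labels, n_draws=30000)
    print(Counter(labels[i] for i in idx))
```

With these toy labels, each of the three classes is drawn roughly 10,000 times out of 30,000, although the raw data are dominated by "G". Sampling with replacement keeps the dataset unchanged and only alters how often minority-class samples are seen during training.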