Accurate segmentation of the right ventricle from cardiac magnetic resonance imaging (MRI) is a critical step in cardiac function analysis and disease diagnosis. It remains an open problem because the right ventricle varies widely in size across images and often has ill-defined borders. In this paper, we present TSU-net, a network that extracts deeper features and captures right-ventricle targets of different sizes through multi-scale cascading and multi-field fusion. TSU-net has two major components: the Dilated-Convolution Block (DB) and the Multi-Layer-Pool Block (MB). DB extracts and aggregates multi-scale features of the right ventricle. MB relies on multiple effective fields of view to detect objects of different sizes and to fill in boundary features. Unlike previous networks, TSU-net replaces the convolution layer in each encoding layer with DB and MB, so that every encoding layer gathers multi-scale information about the right ventricle, detects targets of different sizes, and fills in boundary information. In each decoding layer, DB likewise replaces the convolution layer, so that every decoding layer aggregates multi-scale features of the right ventricle. Furthermore, a two-stage U-net structure improves the utilization of DB and MB by applying them across two encoding/decoding stages. We validate our method on RVSC, a public right ventricle data set. TSU-net achieves an average Dice coefficient of 0.86 on the endocardium and 0.90 on the epicardium, outperforming the other models compared. These results suggest that the method can assist clinicians in diagnosis and contribute to the development of medical image analysis. We also provide an intuitive explanation of the network that accounts for the ability of MB and TSU-net to detect targets of different sizes and to fill in boundary features.
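For readers unfamiliar with the evaluation metric, the Dice coefficient reported above measures the overlap between a predicted mask and a reference mask as twice the size of their intersection divided by the sum of their sizes. The following is a minimal sketch in plain Python, not the evaluation code used in this work; the function name `dice_coefficient`, the nested-list mask format, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration.

```python
def dice_coefficient(pred, target):
    """Dice = 2 * |A and B| / (|A| + |B|) for two binary masks.

    Masks are nested lists of 0/1 values with identical shapes
    (a hypothetical format chosen to keep the sketch dependency-free).
    """
    flat_pred = [bool(v) for row in pred for v in row]
    flat_target = [bool(v) for row in target for v in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("masks must contain the same number of pixels")

    # Count pixels labelled foreground in both masks.
    intersection = sum(1 for p, t in zip(flat_pred, flat_target) if p and t)
    total = sum(flat_pred) + sum(flat_target)

    # Assumed convention: two empty masks agree perfectly.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Toy 2x3 masks: 3 foreground pixels each, 2 of them shared.
pred_mask = [[0, 1, 1],
             [0, 1, 0]]
true_mask = [[0, 1, 0],
             [0, 1, 1]]
score = dice_coefficient(pred_mask, true_mask)  # 2 * 2 / (3 + 3)
```

A score of 1.0 means the two masks coincide exactly and 0.0 means they share no foreground pixel, so the reported averages of 0.86 and 0.90 indicate substantial overlap with the reference contours.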