To address the robustness problems caused by scale variation and the unbalanced distribution of training samples in scene text detection, this paper proposes a new fusion framework, TSFnet. The framework is composed of a Detection Stream, a Judge Stream and a Fusion Stream. In the Detection Stream, a loss balance factor (LBF) is introduced to improve the region proposal network (RPN), and a regression strategy is combined with instance segmentation to predict the global text segmentation map. In the Judge Stream, samples are classified according to the overlap rate computed between the Judge Map and the corresponding labels. To support the Detection Stream, a feature pyramid network is used to extract the Judge Map and to calculate the LBF. In the Fusion Stream, a new fusion algorithm combines the outputs of the two streams so that text regions in natural scenes can be located accurately. The algorithm is evaluated on the standard data sets ICDAR 2015 and ICDAR2017-MLT. The test results show that the [Formula: see text] values are 87.8% and 67.57%, respectively, which are superior to those of state-of-the-art models. This indicates that the algorithm can mitigate the robustness issues caused by scale variation and unbalanced training data.
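As an illustration of the overlap-rate criterion mentioned for the Judge Stream, a minimal sketch is given below. It assumes axis-aligned boxes, intersection over union as the overlap rate, and a threshold of 0.5; none of these are specified in the abstract, and the names `overlap_rate` and `classify_samples` are hypothetical rather than part of TSFnet.

```python
from typing import List, Tuple

# Assumed representation: (x_min, y_min, x_max, y_max), axis-aligned.
Box = Tuple[float, float, float, float]


def overlap_rate(a: Box, b: Box) -> float:
    """Intersection over union (IoU) of two axis-aligned boxes."""
    # Width and height of the intersection, clamped at zero when disjoint.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def classify_samples(
    candidates: List[Box], labels: List[Box], threshold: float = 0.5
) -> List[bool]:
    """Mark a candidate as positive when its best overlap with any
    labelled box reaches the (assumed) threshold."""
    return [
        max((overlap_rate(c, g) for g in labels), default=0.0) >= threshold
        for c in candidates
    ]


if __name__ == "__main__":
    labels = [(0.0, 0.0, 2.0, 1.5)]
    candidates = [(0.0, 0.0, 2.0, 2.0), (10.0, 10.0, 12.0, 12.0)]
    print(classify_samples(candidates, labels))
```

Scene text is often annotated with rotated quadrilaterals or pixel masks, in which case the same ratio would be computed over polygon or mask areas instead of boxes.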