To meet the demand for detecting bilingual Tibetan-Chinese text in large numbers of real-world scene images, a dataset is synthesized and a deep-learning-based model is trained, providing a solution to Tibetan-Chinese text detection in scene images. First, to address the lack of large numbers of annotated samples, a synthesis method based on contour detection and Poisson image editing is proposed, and a bilingual Tibetan-Chinese scene image dataset (BiTCSD) is constructed, comprising 87 680 synthetic images and 5 550 manually annotated images. Second, the effectiveness of training a model on the synthetic dataset is verified. Third, the connectionist text proposal network (CTPN), a deep-learning-based text detector, is trained on different datasets, and its detection performance is evaluated for each language on the test set. The experimental results demonstrate that training with synthetic samples greatly improves the detection performance of CTPN. The trained CTPN detects bilingual Tibetan-Chinese text lines in scene images with high precision and recall: for Tibetan text detection, the precision, recall and <italic>F</italic> value are 0.91, 0.85 and 0.88, respectively; for Chinese text detection, they are 0.89, 0.83 and 0.86, respectively.
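
As a quick consistency check on the reported figures, the sketch below recomputes the <italic>F</italic> value from the stated precision and recall. It assumes the <italic>F</italic> value is the standard harmonic mean of precision and recall (F1); the abstract does not state the definition, and the function name `f_value` is illustrative only.

```python
def f_value(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed F1 definition)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
reported = {
    "Tibetan": (0.91, 0.85),
    "Chinese": (0.89, 0.83),
}

for language, (p, r) in reported.items():
    print(f"{language}: F = {f_value(p, r):.2f}")
```

Under this assumption, the recomputed values round to 0.88 for Tibetan and 0.86 for Chinese, matching the reported <italic>F</italic> values.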