Scene text in video provides important semantic clues for scene understanding, since people rely on it to make sense of their environment. Detecting text in natural scenes is challenging because of low resolution, low contrast, cluttered backgrounds, and illumination changes. In this paper, we propose a new approach that detects video scene text using a saliency edge map (SEM), which combines a saliency map with edge features. The saliency map helps detect text against cluttered backgrounds, whereas the edge map suits text with low resolution and varying illumination. First, we compute the saliency map and the edge map of the video frame or image separately. The saliency map keeps most of the salient regions in the frame and thereby removes some of the complicated background. The edge map captures edge features, which are insensitive to illumination changes and to low-resolution or low-contrast regions. Next, we integrate the edge map and the saliency map into the SEM, which preserves the advantages of both. Finally, using a Gaussian mixture model (GMM), we divide the SEM into three kinds of components (bright characters, dark characters, and background) and perform connected component analysis on them to obtain the text regions. Experimental evaluations on public datasets, including ICDAR 2003 and 2013, MSRA-TD500, and SVT, and on a news video dataset demonstrate that our method significantly outperforms the four other text detection algorithms in terms of recall, precision, F-score, and detection speed, especially under challenges such as text with different alignments, character sizes, languages, and appearances, and uneven illumination.
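The stages described above (saliency map, edge map, fusion into an SEM, three-component GMM, connected component analysis) can be illustrated with a minimal pure-Python sketch on a grayscale image stored as a list of rows with values in [0, 1]. The abstract does not specify the saliency detector, the edge detector, or the fusion rule, so this sketch assumes a global-contrast saliency measure, a central-difference gradient, and an element-wise product; all function names (`saliency_map`, `edge_map`, `saliency_edge_map`, `fit_gmm_1d`, `gmm_label`, `connected_components`, `detect_regions`) are hypothetical. It also does not decide which GMM component corresponds to bright characters, dark characters, or background; it simply returns candidate boxes for each of the three components.

```python
import math
from collections import deque

def normalize(m):
    """Scale a 2-D list to [0, 1]; a constant map becomes all zeros."""
    lo = min(min(r) for r in m)
    hi = max(max(r) for r in m)
    span = (hi - lo) or 1.0
    return [[(v - lo) / span for v in r] for r in m]

def edge_map(img):
    """Central-difference gradient magnitude (assumed stand-in edge detector)."""
    h, w = len(img), len(img[0])
    g = [[math.hypot(img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)],
                     img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x])
          for x in range(w)] for y in range(h)]
    return normalize(g)

def saliency_map(img):
    """Global-contrast saliency: distance from the mean intensity (assumption)."""
    mean = sum(map(sum, img)) / (len(img) * len(img[0]))
    return normalize([[abs(v - mean) for v in r] for r in img])

def saliency_edge_map(img):
    """Fuse the two maps by element-wise product (assumed fusion rule)."""
    s, e = saliency_map(img), edge_map(img)
    return [[sv * ev for sv, ev in zip(sr, er)] for sr, er in zip(s, e)]

def fit_gmm_1d(values, k=3, iters=50, var_floor=1e-4):
    """Fit a k-component 1-D Gaussian mixture with EM."""
    lo, hi = min(values), max(values)
    means = [lo + (hi - lo) * i / max(k - 1, 1) for i in range(k)]
    variances = [max(((hi - lo) / (2 * k)) ** 2, var_floor)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each value.
        resp = []
        for v in values:
            p = [w * math.exp(-(v - m) ** 2 / (2 * s)) / math.sqrt(2 * math.pi * s)
                 for w, m, s in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pi / total for pi in p])
        # M-step: re-estimate weight, mean, and variance per component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            means[j] = sum(r[j] * v for r, v in zip(resp, values)) / nj
            var = sum(r[j] * (v - means[j]) ** 2 for r, v in zip(resp, values)) / nj
            variances[j] = max(var, var_floor)
            weights[j] = nj / len(values)
    return weights, means, variances

def gmm_label(v, params):
    """Index of the component with the highest weighted log-density at v."""
    scores = [math.log(w) - (v - m) ** 2 / (2 * s) - 0.5 * math.log(s)
              for w, m, s in zip(*params)]
    return max(range(len(scores)), key=scores.__getitem__)

def connected_components(mask):
    """Bounding boxes (x0, y0, x1, y1) of 4-connected truthy regions."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(x, y)])
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes

def detect_regions(img):
    """Return one list of candidate boxes per GMM component of the SEM."""
    sem = saliency_edge_map(img)
    params = fit_gmm_1d([v for r in sem for v in r])
    labels = [[gmm_label(v, params) for v in r] for r in sem]
    return [connected_components([[l == j for l in r] for r in labels])
            for j in range(3)]
```

Calling `detect_regions(frame)` on a grayscale frame yields three lists of bounding boxes, one per mixture component. A practical system would still need to filter these candidates (for example by size or aspect ratio) and to identify which components hold characters; those steps are outside what the abstract describes.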