Building a labeled dataset of Arabic text images for scene text detection is inherently difficult and costly. Consequently, only a few small datasets are available for this task. Previous work has focused mainly on augmenting small datasets; however, images generated with these techniques cannot reproduce the complexity and variability of natural images. In this paper, we propose a new dataset of Arabic text images collected with the Google Street View service, named the Tunisia Street View Dataset (TSVD). The dataset contains 7k images collected from different Tunisian cities. It is much more diverse and complex than current image datasets. To take advantage of this dataset for training Convolutional Neural Network (CNN) models, annotation is required to build high-performance models. The annotation task is repetitive and consumes a great deal of researchers' time and effort, and the development time of text detection systems for natural images is valuable and should be used effectively. We therefore develop a deep active learning algorithm for the annotation phase, approaching the annotation suggestion task with a deep learning text detector. CNNs are used to perform text detection in natural scene images. Our deep active learning framework combines CNNs with an active learning approach, which reduces annotation effort by suggesting the most effective areas to annotate. We use the uncertainty provided by the CNN models to determine the most uncertain areas for annotation. Deep active learning is shown to significantly reduce the number of training samples required and to minimize the annotation work on our dataset, up to 1/5. Our dataset is publicly available on IEEE DataPort: https://dx.doi.org/10.21227/extw-0k60.
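
The idea of using detector uncertainty to suggest annotation areas can be illustrated with a minimal sketch. It assumes the text detector outputs a per-pixel text probability map and scores uncertainty with binary entropy averaged over fixed-size blocks. The entropy measure, the block size, and the names `binary_entropy` and `rank_regions` are illustrative assumptions, not details taken from the paper.

```python
import math
from typing import List, Tuple

# Assumed input format: per-pixel text probability in [0, 1].
ProbMap = List[List[float]]


def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli(p) prediction; it peaks at p = 0.5."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def rank_regions(prob_map: ProbMap, cell: int, top_k: int) -> List[Tuple[int, int, float]]:
    """Split the map into cell x cell blocks and return the top_k blocks
    with the highest mean entropy as (row, col, score), most uncertain first."""
    height, width = len(prob_map), len(prob_map[0])
    scored = []
    for r0 in range(0, height, cell):
        for c0 in range(0, width, cell):
            values = [
                binary_entropy(prob_map[r][c])
                for r in range(r0, min(r0 + cell, height))
                for c in range(c0, min(c0 + cell, width))
            ]
            scored.append((r0, c0, sum(values) / len(values)))
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored[:top_k]


# Toy 4x4 map (hypothetical values): the left blocks are confident,
# the bottom-right block hovers around 0.5 and is therefore ambiguous.
toy_map = [
    [0.99, 0.98, 0.10, 0.05],
    [0.97, 0.99, 0.20, 0.02],
    [0.01, 0.03, 0.50, 0.55],
    [0.02, 0.01, 0.45, 0.50],
]
suggestions = rank_regions(toy_map, cell=2, top_k=2)
```

In an active learning loop of this kind, the highest-ranked blocks would be shown to the annotator first, the new labels added to the training set, and the detector retrained before the next round of suggestions.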