The text in a low-resolution (LR) image is usually hard to read. Super-resolution (SR) is an intuitive solution to this issue. Existing single image super-resolution (SISR) models are mainly trained on synthetic datasets whose LR images are obtained by applying bicubic interpolation or Gaussian blur to high-resolution (HR) images. However, these models hardly generalize to practical scenarios because real-world LR images suffer from more complex degradations and are therefore more difficult to super-resolve. The newly proposed TextZoom dataset is the first dataset for real-world text image super-resolution. We propose a new model, termed TSRGAN, trained on this dataset. First, a discriminator is designed to prevent the SR network from generating over-smoothed images. Second, we introduce triplet attention into the SR network for better representational ability. Third, in addition to the L2 and adversarial losses, a wavelet loss is incorporated to help reconstruct sharper character edges. Since TextZoom provides text labels, the recognition accuracy of a scene text recognition (STR) model can be used to evaluate the quality of SR images; it reflects the performance of text image SR models better than traditional SR evaluation metrics such as PSNR and SSIM. Comprehensive experiments show the superiority of our TSRGAN. Compared with the state-of-the-art method, the proposed TSRGAN improves the average recognition accuracy of ASTER, MORAN and CRNN on TextZoom by 0.8%, 1.5% and 3.2%, respectively.
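To make the composition of the generator objective concrete, here is a minimal PyTorch-style sketch combining the three loss terms named above. The single-level Haar transform used for the wavelet loss, the function names, and the weighting coefficients lambda_adv and lambda_wav are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def haar_dwt(x):
    # Single-level Haar wavelet transform via fixed 2x2 filters (stride 2),
    # applied depthwise so each input channel yields LL/LH/HL/HH subbands.
    ll = torch.tensor([[0.5, 0.5], [0.5, 0.5]])
    lh = torch.tensor([[0.5, 0.5], [-0.5, -0.5]])
    hl = torch.tensor([[0.5, -0.5], [0.5, -0.5]])
    hh = torch.tensor([[0.5, -0.5], [-0.5, 0.5]])
    filt = torch.stack([ll, lh, hl, hh]).unsqueeze(1)          # (4, 1, 2, 2)
    c = x.shape[1]
    filt = filt.repeat(c, 1, 1, 1).to(x.device, x.dtype)       # (4*C, 1, 2, 2)
    return F.conv2d(x, filt, stride=2, groups=c)               # (N, 4*C, H/2, W/2)

def generator_loss(sr, hr, d_fake, lambda_adv=1e-3, lambda_wav=1e-1):
    # Hypothetical weighting; the abstract does not state the coefficients.
    l2 = F.mse_loss(sr, hr)                                    # pixel-wise L2 loss
    adv = F.binary_cross_entropy_with_logits(
        d_fake, torch.ones_like(d_fake))                       # fool the discriminator
    wav = F.l1_loss(haar_dwt(sr), haar_dwt(hr))                # wavelet-domain loss
    return l2 + lambda_adv * adv + lambda_wav * wav
```

The wavelet term penalizes discrepancies in the high-frequency subbands directly, which is one plausible way a wavelet loss encourages sharper character edges than a pixel-wise L2 loss alone.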