The quality of the input text image has a clear impact on the output of a scene text recognition (STR) system; however, due to the fact that the main content of a text image is a sequence of characters containing semantic information, how to effectively assess text image quality remains a research challenge. Text image quality assessment (TIQA) can help in picking a hard sample, leading to a more robust STR system and recognition-oriented text image restoration. In this paper, by arguing that the text image quality comes from character-level texture feature and embedding robustness, we propose a learning-based fine-grained, sharp, and recognizable text image quality assessment method (FSR–TIQA), which is the first TIQA scheme to our knowledge. In order to overcome the difficulty of obtaining the character position in a text image, an attention-based recognizer is used to generate the character embedding and character image. We use the similarity distribution distance to evaluate the character embedding robustness between the intra-class and inter-class similarity distributions. The Haralick feature is used to reflect the clarity of the character region texture feature. Then, a quality score network is designed under a label–free training scheme to normalize the texture feature and output the quality score. Extensive experiments indicate that FSR-TIQA has significant discrimination for different quality text images on benchmarks and Textzoom datasets. Our method shows good potential to analyze dataset distribution and guide dataset collection.