Automatic second language (L2) speech fluency assessment has been one of the ultimate goals of several projects aiming to design Computer-Assisted Pronunciation Training (CAPT) tools for L2 learners. Three challenges typically have to be addressed: 1) defining fluency from a threefold interdisciplinary perspective (acoustic and perceptual phonetics, computer science, L2 education); 2) using a cost-effective algorithm; 3) testing the procedure with actual learners’ data. Despite rapid technical developments in automatic speech processing, tools that are actually available to learners remain scarce, and most of them rely on automatic speech recognition (ASR). Moreover, most research on the topic focuses on English as the target L2. Therefore, in this article, we address the following research questions: (a) Is it possible to use a non-ASR-based, low-level signal segmentation algorithm to predict human expert assessment of phonetic fluency in beginner Japanese learners of French in a text-reading task during the first stages of their learning? (b) If the answer to (a) is positive, what are the best predictors of phonetic fluency among a set of available measures (see below for more details)? (c) Is it possible to use this algorithm to monitor the evolution of phonetic fluency (and of its associated predictors) in these learners in a longitudinal study? As a first step, a corpus of French sentences read aloud by 12 Japanese learners with different proficiency levels in French was used to design a prediction system. The read-aloud speech data was perceptually annotated by three human experts on four dimensions: overall speech fluency, speech rate, regularity of speech rate, and speech fluidity (i.e., smoothness of transitions between phones). Inter-rater agreement and reliability were high for all dimensions, and the average human ratings were compared with the scores provided by our prediction system.
The results show strong correlations between human and automatic scores for speech rate and regularity of speech rate, and a weak correlation for speech fluidity. The automatic scores were then combined in a multiple linear regression model to predict overall speech fluency. The best model yielded a correlation coefficient of .92 between automatic and human ratings, with a root-mean-square error of .38. In the second step of this study, a corpus of identical sentences read aloud four times over two years by 12 Japanese learners of French (after 4, 7, 12, and 19 months of French courses in Japan) was fed to the automatic system. The results show regular progress in overall speech fluency, which is consistent with the progress these learners were expected to make each semester through their academic program in French at their university in Japan. Our study suggests a positive answer to our first and third research questions, with speech rate as the best predictor in answer to our second research question. From a pedagogical perspective, it seems that such a simple algorithm could be integrated into a CAPT tool to monitor learners’ progress in phonetic fluency in reading-aloud tasks.
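As a rough illustration of the final modeling step only, the sketch below fits a multiple linear regression that maps automatic scores onto averaged human fluency ratings, then reports the correlation coefficient and root-mean-square error between predicted and human ratings. The function names, the three-column feature layout (speech rate, regularity, fluidity), and the synthetic data are all assumptions made for this example; they are not the study's implementation, data, or results.

```python
import numpy as np


def _design_matrix(auto_scores):
    """Prepend an intercept column to the matrix of automatic scores."""
    auto_scores = np.asarray(auto_scores, dtype=float)
    return np.column_stack([np.ones(len(auto_scores)), auto_scores])


def fit_fluency_model(auto_scores, human_fluency):
    """Ordinary least squares fit: human rating ~ intercept + automatic scores."""
    coefs, *_ = np.linalg.lstsq(
        _design_matrix(auto_scores),
        np.asarray(human_fluency, dtype=float),
        rcond=None,
    )
    return coefs


def predict_fluency(coefs, auto_scores):
    """Predicted overall fluency for each row of automatic scores."""
    return _design_matrix(auto_scores) @ coefs


def evaluate(predicted, human):
    """Pearson correlation and RMSE between predicted and human ratings."""
    predicted = np.asarray(predicted, dtype=float)
    human = np.asarray(human, dtype=float)
    r = np.corrcoef(predicted, human)[0, 1]
    rmse = np.sqrt(np.mean((predicted - human) ** 2))
    return r, rmse


# Synthetic placeholder data (NOT the study's data): 40 hypothetical recordings,
# three hypothetical automatic scores per recording.
rng = np.random.default_rng(0)
auto = rng.normal(size=(40, 3))
human = 3.0 + auto @ np.array([0.8, 0.3, 0.1]) + rng.normal(scale=0.2, size=40)

coefs = fit_fluency_model(auto, human)
pred = predict_fluency(coefs, auto)
r, rmse = evaluate(pred, human)
print(f"r = {r:.2f}, RMSE = {rmse:.2f}")
```

In this sketch the model is evaluated on the same data it was fitted to; with only a small number of speakers, a cross-validated estimate (for example, leaving one speaker out at a time) would give a more conservative picture of predictive accuracy.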