AbstractMany well‐known probabilistic information retrieval models have shown promise for use in document ranking, especially BM25. Nevertheless, it is observed that the control parameters in BM25 usually need to be adjusted to achieve improved performance on different data sets; additionally, the assumption in BM25 on the bag‐of‐words model prevents its direct utilization of rich information that lies at the sentence or document level. Inspired by the above challenges with respect to BM25, we first propose a new normalization method on the term frequency in BM25 (called BM25QL in this paper); in addition, the method is incorporated into CRTER2, a recent BM25‐based model, to construct CRTER2QL. Then, we incorporate topic modeling and word embedding into BM25 to relax the assumption of the bag‐of‐words model. In this direction, we propose a topic‐based retrieval model, TopTF, for BM25, which is then further incorporated into the language model (LM) and the multiple aspect term frequency (MATF) model. Furthermore, an enhanced topic‐based term frequency normalization framework, ETopTF, based on embedding is presented. Experimental studies demonstrate the great effectiveness and performance of these methods. Specifically, on all tested data sets and in terms of the mean average precision (MAP), our proposed models, BM25QL and CRTER2QL, are comparable to BM25 and CRTER2 with the best b parameter value; the TopTF models significantly outperform the baselines, and the ETopTF models could further improve the TopTF in terms of the MAP.