Learning-to-rank models are broadly applied in ad hoc retrieval for scoring and sorting documents based on their relevance to textual queries. The generalizability of a trained learning-to-rank model, however, can affect retrieval performance, particularly when the training data contain noise and outliers or are incorrectly collected or measured. In this paper, we introduce a Self-Distilled Learning to Rank (SDLR) framework for ad hoc retrieval and analyze its performance over a range of retrieval datasets as well as in the presence of feature noise. SDLR assigns a confidence weight to each training sample, aiming to reduce the impact of noisy and outlier data on the training process. Each confidence weight is approximated from the feature distributions derived from the feature values of the documents labeled for a query in a listwise training sample. SDLR includes a distillation process that transfers the underlying patterns of confidence-weight assignment from the teacher model to the student model. We empirically show that SDLR outperforms state-of-the-art learning-to-rank models in ad hoc retrieval. We thoroughly investigate SDLR's performance in different settings, including when no distillation strategy is applied, when different portions of the data are used to train the teacher and student models, and when both the teacher and student models are trained on identical data. We show that SDLR is more effective when the training data are split between the teacher and student models. We also show that SDLR's performance is robust when data features are noisy.
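
To make the idea of list-level confidence weighting concrete, the following is a minimal Python sketch, not the paper's exact formulation: it assumes each feature's distribution within a listwise sample can be summarized by its mean and standard deviation, and it down-weights documents whose features deviate strongly from that distribution. The function name `confidence_weights` and the exponential mapping from deviation to weight are illustrative assumptions.

```python
import numpy as np

def confidence_weights(features, eps=1e-8):
    """Assign a confidence weight to each document in one listwise sample.

    `features` is an (n_docs, n_features) array holding the feature values
    of the documents labeled for a single query. Each per-feature
    distribution is summarized by its mean and standard deviation over the
    list; documents whose features lie far from that distribution receive
    lower confidence, damping the influence of noisy or outlier samples.
    """
    mu = features.mean(axis=0)            # per-feature mean over the list
    sigma = features.std(axis=0) + eps    # per-feature spread over the list
    z = np.abs(features - mu) / sigma     # per-feature deviation scores
    # Average deviation across features, mapped into (0, 1]: a typical
    # document (small deviation) gets a weight near 1; an outlier gets
    # a weight near 0.
    weights = np.exp(-z.mean(axis=1))
    return weights / weights.max()        # normalize so the top weight is 1


# Example: four documents labeled for one query, three features each.
X = np.array([[0.9, 0.20, 0.50],
              [1.0, 0.30, 0.40],
              [0.8, 0.25, 0.45],
              [5.0, 9.00, -3.0]])         # an obvious outlier document
print(confidence_weights(X))              # outlier receives the lowest weight
```

In a training loop, such weights would multiply each sample's contribution to the listwise loss, so that the noisy or outlier samples the abstract describes contribute less to the learned ranker.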