The success of using bone mineral density and/or FRAX to predict femoral osteoporotic fracture risk is modest since they do not account for mechanical determinants that affect bone fracture risk. Computed Tomography (CT)-based geometric, densitometric, and finite element-derived biomarkers have been developed and used as parameters for assessing fracture risk. However, to quantify these biomarkers, segmentation of CT data is needed. Doing this manually or semi-automatically is labor-intensive, preventing the adoption of these biomarkers into clinical practice. In recent years, fully automated methods for segmenting CT data have started to emerge. Quantifying the accuracy, robustness, reproducibility, and repeatability of these segmentation tools is of major importance for research and the potential translation of CT-based biomarkers into clinical practice. A comprehensive literature search was performed in PubMed up to the end of July 2024. Only segmentation methods that were quantitatively validated on human femurs and/or pelvises and on both clinical and non-clinical CT were included. The accuracy, robustness, reproducibility, and repeatability of these segmentation methods were investigated, reporting quantitatively the metrics used to evaluate these aspects of segmentation. The studies included were evaluated for the risk of, and sources of bias, that may affect the results reported. A total of 54 studies fulfilled the inclusion criteria. The analysis of the included papers showed that automatic segmentation methods led to accurate results, however, there may exist a need to standardize reporting of accuracy across studies. Few works investigated robustness to allow for detailed conclusions on this aspect. Finally, it seems that the bone segmentation field has only addressed the concept of reproducibility and repeatability to a very limited extent, which entails that most of the studies are at high risk of bias. Based on the studies analyzed, some recommendations for future studies are made for advancing the development of a standardized segmentation protocol. Moreover, standardized metrics are proposed to evaluate accuracy, robustness, reproducibility, and repeatability of segmentation methods, to ease comparison between different approaches.