Hip-related pain (HRP) affects young to middle-aged active adults and impacts physical activity, finances and quality of life. HRP includes conditions such as femoroacetabular impingement syndrome and labral tears. Lateral hip muscle dysfunction and atrophy in HRP are more pronounced in advanced hip pathology, with limited evidence in younger populations. Although MRI is increasingly used to assess hip muscle morphology and automated deep-learning techniques show promise, studies assessing their accuracy are limited. Therefore, we aimed to compare hip intramuscular fat infiltrate (MFI) and muscle volume in individuals with and without HRP, and to assess the reliability and accuracy of automated machine-learning segmentations compared with human-generated segmentations. This cross-sectional study included sub-elite/amateur football players (Australian football and soccer) with a greater than 6-month history of HRP [n = 180, mean age 28.32 (standard deviation 5.88) years, 19% female] and a control group of sub-elite/amateur football players without pain [n = 48, 28.89 (6.22) years, 29% female]. Muscle volume and MFI of gluteus maximus, medius, minimus and tensor fascia latae were assessed using MRI. Associations between muscle volume and group were explored using linear regression models, controlling for body mass index, age, sport and sex. A convolutional neural network (CNN) machine-learning approach was compared with human-performed muscle segmentations in a subset of participants (n = 52) using intraclass correlation coefficients and the Sorensen-Dice index. After adjustment, muscle volume differed significantly between groups for gluteus medius (adjusted mean difference 23 858 mm3 [95% confidence interval 7563, 40 137]; p = 0.004) and tensor fascia latae (6660 mm3 [2440, 13 075]; p = 0.042).
No differences were observed between groups for gluteus maximus (18 265 mm3 [-21 209, 50 782]; p = 0.419) or minimus (3893 mm3 [-2209, 9996]; p = 0.21). The CNN was trained for 30 000 iterations, and its accuracy and reliability were assessed on an independent testing dataset, where it achieved high segmentation accuracy (mean Sorensen-Dice index >0.900) and excellent muscle volume and MFI reliability (ICC2,1 > 0.900). The CNN outperformed manual raters, who had slightly lower interrater accuracy (Sorensen-Dice index >0.800) and reliability (ICC2,1 > 0.800). The increased muscle volumes in the symptomatic group compared with controls could be associated with increased myofibrillar size, sarcoplasmic hypertrophy or both. These changes may facilitate greater muscular efficiency for a given load, enabling the athlete to maintain their normal level of function. In addition, the CNN for muscle segmentation was more efficient than manual segmentation and demonstrated excellent reliability.
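
For readers unfamiliar with the two agreement metrics named above, the following is a minimal sketch of how a Sorensen-Dice index and an ICC(2,1) could be computed. It is not the study's pipeline. The function names `dice_index` and `icc_2_1` and the toy inputs are hypothetical, the masks are assumed to be binary arrays of equal shape, and the ICC is assumed to follow the standard two-way random-effects, absolute-agreement, single-measure form.

```python
import numpy as np


def dice_index(mask_a, mask_b):
    """Sorensen-Dice overlap between two binary segmentation masks.

    Hypothetical helper: both masks are assumed to have the same shape.
    """
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        # Both masks empty: treated here as perfect agreement (an assumption).
        return 1.0
    return 2.0 * np.logical_and(a, b).sum() / total


def icc_2_1(ratings):
    """ICC(2,1) for an (n_subjects, k_raters) table of measurements.

    Two-way random effects, absolute agreement, single measure,
    computed from the two-way ANOVA mean squares.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()  # between subjects
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()  # between raters
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols  # residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


if __name__ == "__main__":
    # Toy data only; these values are illustrative, not study results.
    manual = np.array([[0, 1, 1], [0, 1, 1], [0, 0, 0]])
    automated = np.array([[0, 1, 1], [0, 1, 0], [0, 0, 0]])
    print("Dice:", dice_index(manual, automated))

    # Rows are participants, columns are raters (e.g. manual vs CNN volumes).
    volumes = [[310.0, 305.0], [402.0, 410.0], [288.0, 290.0], [350.0, 347.0]]
    print("ICC(2,1):", icc_2_1(volumes))
```

In this sketch the Dice index would be applied per muscle to the voxel masks, while the ICC would be applied to the derived muscle volume or MFI values, with one row per participant and one column per rater or method.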