Bone age assessment plays a critical role in the investigation of endocrine, genetic, and growth disorders in children. This process is usually conducted manually, with some drawbacks, such as reliance on the pediatrician’s experience and extensive labor, as well as high variations among methods. Most deep learning models use one neural network to extract the global information from the whole input image, ignoring the local details that doctors care about. In this paper, we propose a global-local feature fusion convolutional neural network, including a global pathway to capture the global contextual information and a local pathway to extract the fine-grained information from local patches. The fine-grained information is integrated into the global context information layer-by-layer to assist in predicting bone age. We evaluated the proposed method on a dataset with 11,209 X-ray images with an age range of 4–18 years. Compared with other state-of-the-art methods, the proposed global-local network reduces the mean absolute error of the estimated ages to 0.427 years for males and 0.455 years for females; the average accuracy rate is within 6 months and 12 months, reaching 70% and 91%, respectively. In addition, the effectiveness and rationality of the model were verified on a public dataset.