Accurate identification and precise localization of cephalometric landmarks provide clinicians with essential insights into craniofacial deformities, aiding in the assessment of treatment strategies for improved patient outcomes. The current methodologies heavily depend on the utilization of multiple CNNs for predicting landmark coordinates, which makes them computationally burdensome and unsuitable for translation to clinical applications. To overcome this limitation, we propose a novel, end-to-end trainable, two-stage regression framework for cephalometric landmark detection. In the initial stage, a single neural network is employed to estimate the locations of all landmarks simultaneously, enabling the identification of potential landmark regions. In the second stage, a semantic fusion block leverages the in-network multi-resolution feature hierarchy to produce high-level semantically rich features. These feature maps are then cropped based on coarsely detected landmark locations and concurrent refinement loss is used to fine-tune and refine the landmark locations. The proposed framework demonstrates the potential for enhancing clinical workflow and treatment outcomes in orthodontics. This is achieved through the utilization of a single CNN backbone augmented with multi-resolution semantically fused anatomical features, which effectively enhances representation learning in a computationally efficient manner. The performance of the proposed framework is evaluated on two publicly available anatomical landmark datasets. The experimental results demonstrate that our framework achieves a state-of-the-art detection accuracy of 87.17% within the clinically accepted range of 2 mm. The source code and the pre-trained weights are made publicly available at this https://github.com/manwaarkhd/CEPHMark-Net, promoting reproducibility and enabling further advancements.