Individuals with malocclusion require an orthodontic diagnosis and treatment plan based on the severity of their condition. Assessing and monitoring changes in periodontal structures before, during, and after orthodontic procedures is crucial, and intraoral ultrasound (US) imaging has shown promise as a diagnostic tool for imaging the periodontium. However, accurately delineating and analyzing periodontal structures in US videos is challenging for clinicians because it is time-consuming and subject to interpretation errors. This paper introduces DetSegDiff, an edge-enhanced diffusion-based network developed to simultaneously detect the cementoenamel junction (CEJ) and segment the alveolar bone structure in intraoral US videos. An edge feature encoder is designed to enhance edge and texture information for precise delineation of periodontal structures. Additionally, we employ a spatial squeeze-attention module (SSAM) to extract more representative features for performing both detection and segmentation at global and local levels. The network was trained on 169 videos from 17 orthodontic patients and tested on 41 videos from 4 additional patients. The proposed method achieved a mean distance difference of 0.17 ± 0.19 mm for the CEJ and an average Dice score of 90.1% for the alveolar bone structure. Because multi-task benchmark networks are lacking, thorough experiments were conducted to benchmark the proposed method against individual state-of-the-art (SOTA) detection and segmentation networks. The experimental results demonstrated that DetSegDiff outperformed the SOTA approaches, confirming the feasibility of automated diagnostic systems for orthodontists.
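The two evaluation metrics reported above can be illustrated with a minimal sketch. The function names, the nested-list mask format, and the `mm_per_pixel` calibration factor are assumptions made for illustration and are not taken from the paper; in particular, the CEJ distance is assumed here to be the Euclidean distance between the predicted and annotated points, which may differ from the definition the authors used.

```python
import math


def dice_score(pred, truth):
    """Dice coefficient between two binary masks (nested lists of 0/1)."""
    intersection = pred_sum = truth_sum = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            intersection += p * t
            pred_sum += p
            truth_sum += t
    if pred_sum + truth_sum == 0:
        # Both masks empty: treated as perfect agreement (a common convention).
        return 1.0
    return 2.0 * intersection / (pred_sum + truth_sum)


def cej_distance_mm(pred_xy, truth_xy, mm_per_pixel):
    """Euclidean distance between two (x, y) pixel locations, in millimetres.

    mm_per_pixel is a hypothetical calibration factor for the US image.
    """
    dx = pred_xy[0] - truth_xy[0]
    dy = pred_xy[1] - truth_xy[1]
    return math.hypot(dx, dy) * mm_per_pixel


# Toy inputs, not data from the study.
pred_mask = [[0, 1, 1],
             [0, 1, 0]]
true_mask = [[0, 1, 1],
             [0, 0, 0]]
print(dice_score(pred_mask, true_mask))
print(cej_distance_mm((3, 4), (0, 0), mm_per_pixel=0.05))
```

Averaging `dice_score` over test frames and taking the mean and standard deviation of `cej_distance_mm` would yield summary figures of the same form as those reported, under the stated assumptions.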