This study used a Latent Dirichlet Allocation (LDA) model to assess how the needs of patients with breast cancer and their families, as expressed in online health communities, differ across treatment phases. Using Python, breast cancer-related posts were collected from two online health communities: one patient-to-patient and one patient-to-doctor. After data cleaning, eligible posts were categorized by treatment phase. An LDA pipeline, comprising data preprocessing and topic modeling, was then built to identify the distinct need-related topics for each treatment phase. Additionally, the demographic and interactive features of the posts were analyzed manually. We collected 84,043 posts, of which 9,504 were included after data cleaning. The initial diagnosis and rehabilitation treatment phases had the highest and lowest numbers of posts, respectively. LDA identified 11 topics: three in the initial diagnosis phase and two in each of the remaining treatment phases. The topics included disease outcomes, diagnosis analysis, treatment information, and emotional support in the initial diagnosis phase; surgical options and outcomes, postoperative care, and treatment planning in the perioperative treatment phase; treatment options and costs, side effect management, and disease prognosis assessment in the non-operative treatment phase; diagnosis and treatment options, disease prognosis, and emotional support in the relapse and metastasis treatment phase; and follow-up and recurrence concerns, physical symptoms, and lifestyle adjustments in the rehabilitation treatment phase. The needs of patients with breast cancer and their families differ across the phases of cancer therapy. Informational and emotional support should therefore be tailored to each treatment phase according to the specific needs of patients and their families.
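The per-phase topic modeling described above can be illustrated with a minimal sketch. It is not the study's implementation: the abstract does not say which LDA library, hyperparameters, or preprocessing steps were used, so this example assumes a small collapsed Gibbs sampler written with the Python standard library only. The function names `fit_lda` and `top_words`, the values of `alpha`, `beta`, and `iterations`, and the toy posts are all hypothetical; the posts are invented placeholders, not data from the study.

```python
import random
from collections import defaultdict


def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling on a list of tokenized documents."""
    rng = random.Random(seed)
    vocab_size = len({word for doc in docs for word in doc})
    doc_topic = [[0] * num_topics for _ in docs]  # tokens per topic in each doc
    topic_word = [defaultdict(int) for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics  # tokens per topic overall

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        assignments.append([rng.randrange(num_topics) for _ in doc])
        for word, z in zip(doc, assignments[d]):
            doc_topic[d][z] += 1
            topic_word[z][word] += 1
            topic_total[z] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][word] -= 1
                topic_total[z] -= 1
                # Resample the topic from its conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][word] + beta)
                    / (topic_total[k] + beta * vocab_size)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][word] += 1
                topic_total[z] += 1
    return topic_word, doc_topic


def top_words(topic_word, n=3):
    """Return up to n of the most frequent words assigned to each topic."""
    return [
        [w for w, c in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])) if c > 0][:n]
        for counts in topic_word
    ]


# Invented placeholder posts, already tokenized; NOT data from the study.
posts_by_phase = {
    "initial diagnosis": [
        ["biopsy", "result", "worried"],
        ["diagnosis", "report", "biopsy"],
        ["treatment", "plan", "worried"],
    ],
    "rehabilitation": [
        ["follow-up", "recurrence", "diet"],
        ["exercise", "diet", "fatigue"],
    ],
}
# Topic counts per phase as stated in the abstract (three, then two).
topics_per_phase = {"initial diagnosis": 3, "rehabilitation": 2}

for phase, docs in posts_by_phase.items():
    topic_word, _ = fit_lda(docs, topics_per_phase[phase])
    print(phase, top_words(topic_word))
```

Fitting a separate model per phase, rather than one model over all posts, mirrors the design the abstract describes: each phase gets its own topic set. On real posts, tokenization and stop-word removal would have to precede `fit_lda`, and an established LDA library would likely be preferable to this hand-written sampler.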