Online medical platforms have developed rapidly in recent years. Because of search inaccuracies and incomplete information provided by doctors, some suitable doctors are excluded during first-stage retrieval. To improve doctor retrieval and reduce users’ search costs, obtaining disease word embeddings is essential. Existing methods rely primarily on external corpora and sentence context, which may not align with downstream tasks, and many languages lack standardized medical corpora. We therefore integrate contrastive learning with medical knowledge, using simulation to generate and augment data, and propose a simple network structure for training on the constructed samples. We construct two task datasets from the platform’s data. Experimental results show that our framework achieves superior results with lower embedding dimensions: an accuracy of 0.854 on the similarity task and an F1 score of 0.853 on the retrieval task, surpassing the current best results. The framework has been successfully implemented on Baidu Health, one of China’s largest online medical platforms, serving over 10 million users. It effectively simulates a doctor retrieval system, optimizing the process to ensure more accurate and comprehensive retrieval of suitable doctors.