Abstract

Medical concept normalization maps health-related mentions in free-form text to standard concepts in a clinical knowledge base. It goes well beyond simple string matching and requires a deep semantic understanding of concept mentions. Recent research approaches concept normalization as either text classification or text similarity. The main drawbacks of existing approaches are that a) text classification approaches ignore valuable target-concept information when learning the input concept mention representation, and b) text similarity approaches must generate target concept embeddings separately, which is time- and resource-consuming. Our proposed model overcomes these drawbacks by jointly learning the representations of the input concept mention and the target concepts. First, we learn the input concept mention representation using RoBERTa. Second, we compute the cosine similarity between the embedding of the input concept mention and the embeddings of all target concepts. Here, the target concept embeddings are randomly initialized and then updated during training. Finally, the target concept with the maximum cosine similarity is assigned to the input concept mention. Our model surpasses all existing methods across three standard datasets, improving accuracy by up to 2.31%.
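As a concrete illustration, the following PyTorch sketch implements the three steps described above: encode the mention with RoBERTa, score it against a trainable target-concept embedding matrix by cosine similarity, and predict the highest-scoring concept. The class name, pooling choice, and sizes are illustrative assumptions, not the authors' released code.

import torch
import torch.nn.functional as F
from transformers import RobertaModel, RobertaTokenizer

class ConceptNormalizer(torch.nn.Module):
    def __init__(self, num_concepts, hidden_size=768):
        super().__init__()
        # Mention encoder: pretrained RoBERTa.
        self.encoder = RobertaModel.from_pretrained("roberta-base")
        # Target concept embeddings: randomly initialized, trained jointly.
        self.concept_embeddings = torch.nn.Embedding(num_concepts, hidden_size)

    def forward(self, input_ids, attention_mask):
        # Pool the mention representation from the first (<s>) token;
        # this pooling choice is an assumption for the sketch.
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        mention = out.last_hidden_state[:, 0]                # (batch, hidden)
        # Cosine similarity between the mention and every target concept.
        scores = F.cosine_similarity(
            mention.unsqueeze(1),                            # (batch, 1, hidden)
            self.concept_embeddings.weight.unsqueeze(0),     # (1, concepts, hidden)
            dim=-1,
        )
        return scores                                        # (batch, concepts)

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = ConceptNormalizer(num_concepts=1000)  # 1000 is an arbitrary example size
batch = tokenizer(["head keeps spinning"], return_tensors="pt")
scores = model(batch["input_ids"], batch["attention_mask"])
prediction = scores.argmax(dim=-1)  # concept with the maximum cosine similarity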

Highlights

  • Internet users turn to social media to voice their views and opinions

  • Medical concept normalization aims to discover standard medical concepts in free-form text

  • Health-related mentions are mapped to standard concepts in a clinical knowledge base

Summary

Background

Internet users turn to social media to voice their views and opinions. Medical social media is the part of social media whose focus is limited to health and related issues (Pattisapu et al., 2017). The drawback of the text similarity approach of Pattisapu et al. (2020) is the need to generate target concept embeddings separately using graph embedding methods. This is time- and resource-consuming when different vocabularies are used for mapping in different datasets (e.g., SNOMED-CT is used in the CADEC (Karimi et al., 2015) and PsyTAR (Zolnoori et al., 2019) datasets, while MedDRA (Mozzicato, 2009) is used in SMM4H2017 (Sarker et al., 2018)). By learning the representations of the target concepts along with the input concept mention, our model a) exploits target-concept information, unlike existing text classification approaches (Tutubalina et al., 2018; Miftahutdinov and Tutubalina, 2019; Kalyan and Sangeetha, 2020a), and b) eliminates the time- and resource-consuming process of separately generating target concept embeddings, unlike the existing text similarity approach (Pattisapu et al., 2020). Our model achieves the best results across three standard datasets, surpassing all existing methods with an accuracy improvement of up to 2.31%.
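To make the contrast concrete, a minimal training step for the ConceptNormalizer sketch above might look as follows. Because the concept embeddings are ordinary trainable parameters, gradients reach both RoBERTa and the concept embedding table in one backward pass, so no separate graph-embedding stage is needed. The loss choice (cross-entropy over cosine scores) and optimizer settings are assumptions, not taken from the paper.

import torch
import torch.nn.functional as F

# `model` is the ConceptNormalizer instance defined in the earlier sketch.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

def train_step(input_ids, attention_mask, gold_concept_ids):
    scores = model(input_ids, attention_mask)        # (batch, concepts)
    loss = F.cross_entropy(scores, gold_concept_ids)
    optimizer.zero_grad()
    loss.backward()   # gradients flow into RoBERTa *and* the concept embeddings
    optimizer.step()
    return loss.item()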

Model Description
Evaluation Metric
Implementation Details
Datasets
Results
Merit Analysis
Demerit Analysis
Conclusion
