We are presently living in the age of intelligent machines: owing to breakthroughs in machine learning, deep learning, and artificial intelligence, machines are rapidly approaching human-like capabilities. In this work, our approach is based on the idea of utilizing a specialized corpus to enhance the performance of a pre-trained language model. We describe each configuration by the triple (V = vocabulary domain, C1 = initial pre-training corpus, C2 = specialization corpus). We applied this approach with different combinations, namely (V = general, C1 = general, C2 = ∅), (V = general, C1 = general, C2 = medical), (V = medical, C1 = medical, C2 = ∅), and (V = medical, C1 = medical, C2 = medical), to compare the performance of a general Bidirectional Encoder Representations from Transformers (BERT) model against BERT models specialized for the medical domain. In addition, we evaluated the models on the Informatics for Integrating Biology and the Bedside (i2b2) and drug-drug interaction (DDI) datasets to measure their effectiveness on medical tasks.
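As a minimal sketch of what one such configuration could look like in practice, the snippet below continues masked-language-model pre-training of a general BERT checkpoint on a specialization corpus, roughly corresponding to the (V = general, C1 = general, C2 = medical) setting. It is not the authors' exact pipeline; the checkpoint name, corpus file, and hyperparameters are illustrative assumptions, and the Hugging Face transformers/datasets libraries are used only as one possible tooling choice.

```python
# Hedged sketch: adapt a general pre-trained BERT (V = general, C1 = general)
# to a medical specialization corpus C2 via continued MLM pre-training.
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

# C1 and V are implicit in the general-domain checkpoint and its vocabulary.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# C2: a hypothetical local file of medical-domain sentences, one per line.
corpus = load_dataset("text", data_files={"train": "medical_corpus.txt"})

def tokenize(batch):
    # Truncate to a fixed length; the real setup may differ.
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = corpus.map(tokenize, batched=True, remove_columns=["text"])

# Standard 15% token masking for the MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="bert-continued-medical",   # illustrative output path
    per_device_train_batch_size=16,
    num_train_epochs=1,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()  # the adapted model can then be fine-tuned on i2b2 or DDI tasks
```

The (V = medical, C1 = medical, ...) configurations would instead start from a tokenizer and checkpoint pre-trained from scratch on medical text, with the rest of the procedure unchanged.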