Automatic Speech Recognition (ASR) systems are quickly becoming a crucial element in supporting healthcare providers, improving the flow of information among medical teams, and enhancing the patient experience. However, to be fully supportive, these ASR systems must meet certain requirements dictated by market realities: high speech recognition accuracy with a low error rate, the possibility of further training the model, and the possibility of on-premise installation. Therefore, the aim of this paper is to perform a comparative analysis of the leading ASR systems available on the Polish market for use in conducting medical interviews. We selected three systems, Google ASR, Microsoft ASR, and Techmo ASR, and compared their performance on a prepared data set of medical expressions spoken in Polish. The results of our analysis indicate only minor differences in speech recognition accuracy among the three evaluated ASR systems, whereas only two of the systems met the stated requirements, and both did so only partially. Still, all three exhibited specific problems in recognising word endings or word boundaries. We were able to group these problems into three categories: Misrecognitions, Quality Problems, and Word Boundaries, which differ in how strongly they influence the further speech recognition process. Our research findings are expected to provide valuable insights to a wide range of stakeholders, facilitating the development of tailored speech recognition solutions that meet the specific needs of the medical sector.