Abstract Speech is a primary means of human communication and one of the most basic features of human conduct. Voice is an important part of its subsystems. A speech disorder is a condition that affects the ability of a person to speak normally, which occasionally results in voice impairment with psychological and emotional consequences. Early detection of voice problems is a crucial factor. Computer-based procedures are less costly and easier to administer for such purposes than traditional methods. This study highlights the following issues: recent studies, methods of voice pathology detection, machine learning and deep learning (DL) methods used in data classification, main datasets utilized, and the role of Internet of things (IoT) systems employed in voice pathology diagnosis. Moreover, this study presents different applications, open challenges, and recommendations for future directions of IoT systems and artificial intelligence (AI) approaches in the voice pathology diagnosis. Finally, this study highlights some limitations of voice pathology datasets in comparison with the role of IoT in the healthcare sector, which shows the urgent need to provide efficient approaches and easy and ideal medical diagnostic procedures and treatments of disease identification for doctors and patients. This review covered voice pathology taxonomy, detection techniques, open challenges, limitations, and recommendations for future directions to provide a clear background for doctors and patients. Standard databases, including the Massachusetts Eye and Ear Infirmary, Saarbruecken Voice Database, and the Arabic Voice Pathology Database, were used in most articles reviewed in this article. The classes, features, and main purpose for voice pathology identification are also highlighted. This study focuses on the extraction of voice pathology features, especially speech analysis, extends feature vectors comprising static and dynamic features, and converts these extended feature vectors into solid vectors before passing them to the recognizer.