The Ayurvedic medical system relies heavily on medicinal plants, which makes accurate retrieval of information about them from the web essential. This survey examines focused web crawling strategies for collecting useful information about Ayurvedic botanicals, with an emphasis on deep learning methods. Crawlers guided by machine learning models retrieve domain-specific content while filtering out irrelevant data. The survey reviews approaches such as the TRES framework, a reinforcement learning-based crawler that discretizes large state and action spaces so that it can select the most promising URLs efficiently. Convolutional neural networks (CNNs) and natural language processing (NLP) have also been used in crawlers to improve page categorization, as demonstrated by successful applications to Turkish-language text. The paper "Learning to Crawl: Comparing Classification Schemes" compares traditional rule-based approaches with newer deep learning classifiers and reports that the latter perform better. In addition, a crawler focused on Ayurvedic plants employs a Naive Bayes classifier together with query expansion over a carefully curated thesaurus to improve the relevance of retrieved web pages. This survey emphasizes the need for more efficient, adaptive, and focused crawlers powered by deep learning to advance Ayurvedic research.
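
To make the last of these approaches more concrete, the following is a minimal Python sketch of thesaurus-based query expansion combined with a Naive Bayes relevance filter. It is an illustration under stated assumptions, not the implementation used by any of the surveyed crawlers: the thesaurus entries, the training snippets, and the names `THESAURUS`, `expand_query`, and `NaiveBayesRelevance` are all hypothetical, and a real crawler would train on a much larger labeled corpus.

```python
import math
import re
from collections import Counter

# Hypothetical thesaurus mapping a plant name to its synonyms.
THESAURUS = {
    "ashwagandha": ["withania somnifera", "indian ginseng"],
    "tulsi": ["holy basil", "ocimum tenuiflorum"],
}

# Hypothetical labeled snippets standing in for crawled page text.
DOCS = [
    "ashwagandha root is used in ayurvedic medicine as a herbal tonic",
    "tulsi or holy basil leaves are a medicinal herb in ayurveda",
    "withania somnifera extract shows medicinal properties",
    "cheap flights and hotel deals for your holiday",
    "latest football scores and match highlights",
    "buy smartphones online with free shipping",
]
LABELS = ["relevant"] * 3 + ["irrelevant"] * 3


def expand_query(query, thesaurus=THESAURUS):
    """Return the query term followed by any synonyms in the thesaurus."""
    key = query.lower()
    return [key] + list(thesaurus.get(key, []))


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesRelevance:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        # Vocabulary is the union of all words seen in any class.
        self.vocab = set().union(*self.word_counts.values())
        return self

    def log_scores(self, text):
        """Return the unnormalized log posterior of each class."""
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for c in self.classes:
            class_total = sum(self.word_counts[c].values())
            score = math.log(self.doc_counts[c] / total_docs)
            for t in tokens:
                count = self.word_counts[c][t]
                score += math.log((count + 1) / (class_total + len(self.vocab)))
            scores[c] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)
```

In a focused crawler built along these lines, the expanded terms from `expand_query` would seed the search or URL frontier, and `predict` would decide whether a fetched page is kept and its outgoing links followed.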