Efficient search of hacking-related information has received considerable attention in recent years, yet it remains difficult. Researchers often encounter unfamiliar terms, concepts, and tools that are specific to hacking, so a search method must handle synonyms and polysemy well. These challenges motivate our work on a productive method for semantic search of hacking information. Semantic search, built on advanced NLP techniques, has transformed information retrieval by improving the accuracy and relevance of search results. Unlike traditional lexical methods, neural models such as sentence-transformers handle synonyms and polysemy effectively. However, processing time increases with model size. This paper proposes a novel ensemble semantic search (NESS) approach that aggregates mini or small neural embedding models to leverage their distinct advantages. Evaluated on a dataset of over 300,000 Hacker News stories, the proposed method significantly improves ranking quality and retrieval accuracy compared with existing techniques, while requiring half the processing time of the best-performing large model. The findings underscore the trade-offs between model complexity, retrieval accuracy, and processing efficiency, and offer insights for optimizing semantic search systems.
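As a rough illustration of what aggregating several small embedding models can look like, the sketch below ranks documents by the mean cosine similarity across an ensemble of embedders. The abstract does not state the aggregation rule, so score averaging is an assumption here, not the NESS method itself. The two toy embedders (`word_embedder`, `trigram_embedder`) are hypothetical stand-ins built on hashed token counts so that the example runs without any model download; in practice each would be replaced by a small neural embedding model.

```python
import math
import zlib
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Embedder = Callable[[str], Vector]

DIM = 1024  # assumed toy dimensionality, large enough to make hash collisions rare


def _hashed_bag(tokens: Sequence[str], dim: int = DIM) -> Vector:
    """Count tokens into a fixed-size vector using a deterministic hash."""
    vec = [0.0] * dim
    for tok in tokens:
        vec[zlib.crc32(tok.encode("utf-8")) % dim] += 1.0
    return vec


def word_embedder(text: str) -> Vector:
    """Hypothetical stand-in for one small model: hashed bag of words."""
    return _hashed_bag(text.lower().split())


def trigram_embedder(text: str) -> Vector:
    """Hypothetical stand-in for a second small model: hashed character trigrams."""
    s = text.lower()
    return _hashed_bag([s[i:i + 3] for i in range(max(len(s) - 2, 1))])


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def ensemble_search(
    query: str,
    docs: Sequence[str],
    embedders: Sequence[Embedder],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank docs by the mean cosine similarity over all embedders (assumed rule)."""
    totals = [0.0] * len(docs)
    for embed in embedders:
        q_vec = embed(query)
        for i, doc in enumerate(docs):
            totals[i] += cosine(q_vec, embed(doc))
    scored = [(doc, total / len(embedders)) for doc, total in zip(docs, totals)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


if __name__ == "__main__":
    stories = [
        "SQL injection in login forms",
        "Baking sourdough bread at home",
        "Cross-site scripting basics",
    ]
    for doc, score in ensemble_search(
        "sql injection attack", stories, [word_embedder, trigram_embedder]
    ):
        print(f"{score:.3f}  {doc}")
```

Averaging scores is only one possible choice; rank-based fusion or weighted combinations would fit the same interface. A real system would also precompute and cache the document embeddings, which matters for the processing-time trade-off the abstract highlights.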