Abstract

The body of mathematical knowledge on the WWW has grown enormously. Despite the clear importance of a mathematical search engine, this research field was neglected until very recently. Although currently available full-text search engines can be used on these documents, they are deficient in almost all cases: they cannot handle structured mathematical text or mathematical operations. Many of the problems stem from the nature of mathematics itself. By applying axioms and equivalence transformations, and by using different notation, each formula can be expressed in numerous ways. Ambiguous queries such as "sin" or "a" return documents containing the sine function as well as the English noun sin, or documents containing the variable a as well as the indefinite article a. Moreover, mathematical operators and special notation cannot be expressed in the query languages of these engines. In this work, we address these issues and present a technique for indexing real-world scientific documents containing mathematical notation by exploiting the current state of the art in full-text search engines. Our approach has several advantages over existing solutions. It is primarily intended for documents on the WWW, which are mostly semantically poor, and it offers an extensible level of mathematical awareness that also supports similarity searches. Furthermore, it is designed as an extension, so any full-text search engine can easily adopt it. Experiments on two real-world document sets showed that performance is highly dependent on several features of the mathematical search engine.
