Abstract

Full-text search combined with access control has a wide range of applications, for example in multi-user systems that let each user create their own content (e.g., blogs or social media). Unfortunately, few if any studies combine the two, and the combination is not supported by DBMSs or modern search engines, so developers must implement full-text search with access control themselves. While inverted indexes are already widely used for full-text search, we use a generalized suffix tree because it can find any substring within a document, not only exact word occurrences. Theoretically, the time and memory needed to index a collection of documents are linear in the total size of the documents. However, our implementation requires more than 1,200 times as much memory as the size of the documents. Further analysis shows that this can be reduced to no less than 32 times the document size, at the cost of longer indexing time. In conclusion, the generalized suffix tree may not be suitable for large amounts of data. On the other hand, searching with the generalized suffix tree is 3 times faster than with an inverted index. A suffix tree is therefore worth using only when substring search is mandatory (e.g., DNA processing) or when time matters significantly more than memory (e.g., a search autocomplete system). Access control itself acts as a filter applied to the documents returned by the index search.
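The pipeline described above (substring index first, access-control filter second) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it swaps in a naive generalized suffix trie (every suffix of every document inserted character by character, quadratic in document length) instead of a linear-size generalized suffix tree, and the ACL representation (a dict mapping hypothetical document IDs to sets of user names) is an assumption. A whitespace-splitting inverted index is included only for contrast.

```python
from collections import defaultdict


class TrieNode:
    """One node per character on some suffix path."""

    __slots__ = ("children", "doc_ids")

    def __init__(self):
        self.children = {}    # char -> TrieNode
        self.doc_ids = set()  # docs containing the string spelled by the root-to-node path


class GeneralizedSuffixTrie:
    """Naive generalized suffix trie over several documents.

    Every suffix of every document is inserted, so building is quadratic in
    document length. A real generalized suffix tree (e.g. built with
    Ukkonen's algorithm) compresses paths to reach linear size.
    """

    def __init__(self):
        self.root = TrieNode()

    def add_document(self, doc_id, text):
        for start in range(len(text)):
            node = self.root
            for ch in text[start:]:
                child = node.children.get(ch)
                if child is None:
                    child = node.children[ch] = TrieNode()
                node = child
                node.doc_ids.add(doc_id)

    def search(self, pattern):
        """Return the IDs of documents containing `pattern` anywhere."""
        if not pattern:
            return set()
        node = self.root
        for ch in pattern:
            node = node.children.get(ch)
            if node is None:
                return set()
        return set(node.doc_ids)


def search_with_acl(index, acl, user, pattern):
    """Search the index first, then keep only documents `user` may read."""
    return {d for d in index.search(pattern) if user in acl.get(d, set())}


def build_inverted_index(docs):
    """Word-level inverted index, for contrast: word -> set of doc IDs."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


# Hypothetical sample data: document texts and per-document read permissions.
docs = {
    1: "Full text search",
    2: "Suffix trees index substrings",
    3: "Private notes on search",
}
acl = {1: {"alice", "bob"}, 2: {"alice"}, 3: {"bob"}}

trie = GeneralizedSuffixTrie()
for doc_id, text in docs.items():
    trie.add_document(doc_id, text.lower())
inverted = build_inverted_index(docs)

print(sorted(search_with_acl(trie, acl, "alice", "arch")))  # [1]
print(sorted(search_with_acl(trie, acl, "bob", "arch")))    # [1, 3]
print(sorted(inverted.get("arch", set())))                  # [] (no word "arch")
```

The trie finds "arch" inside "search", while the word-level inverted index only matches whole words. The one-node-per-character-per-suffix layout also makes the memory cost of suffix-based indexing easy to see, although a compressed suffix tree stores far fewer nodes than this naive trie.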
