Abstract

Research on database optimization in Database Management Systems (DBMSs) is evolving constantly. Programming languages are now integrated into database systems to help professional programmers develop software quickly and meet deadlines. The design of a database must therefore serve both the needs of customers and the efficiency of database processes. In this paper, a database application, novelty detection, is used to identify new documents for readers who do not want to reread redundant ones. This application needs a database to store both history and current documents. The objective of this research is to optimize the database tables for up to 10 million records. Experiments are conducted at both the sentence level and the document level. At both levels, we investigate data optimization and the use of proper indexing. In MySQL, the B-Tree index is used to speed up data selection. In addition, EXPLAIN enables us to index the correct data columns and avoid redundant indexes. Optimizing data types is also investigated to ensure that MySQL does no extra work when selecting data. A technique known as batching is also introduced to speed up the insertion of results after novelty detection has been performed. Overall, the combined optimizations improved speed by up to 90%. We have thus successfully optimized the database for novelty detection, and the techniques have been integrated into a real-time novelty detection application.
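The batching and index-checking ideas mentioned above can be illustrated with a minimal sketch. It rests on several assumptions: Python's built-in sqlite3 module stands in for MySQL so that the example is self-contained, the table novelty_results, its columns, the index name idx_sentence, and the batch size of 1000 are all hypothetical, and SQLite's EXPLAIN QUERY PLAN is used in place of MySQL's EXPLAIN. It is not the paper's implementation.

```python
import sqlite3


def insert_results_batched(conn, rows, batch_size=1000):
    """Insert (sentence_id, score) rows in batches, committing once per batch.

    Grouping rows into a single executemany call per batch avoids paying
    the statement and commit overhead for every individual row.
    """
    cur = conn.cursor()
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        cur.executemany(
            "INSERT INTO novelty_results (sentence_id, score) VALUES (?, ?)",
            batch,
        )
        conn.commit()
    return len(rows)


def build_demo(n=5000):
    """Create a hypothetical results table with one index and fill it."""
    conn = sqlite3.connect(":memory:")
    # Compact column types are chosen deliberately (assumed schema).
    conn.execute(
        "CREATE TABLE novelty_results (sentence_id INTEGER, score REAL)"
    )
    # Index only the column that lookups actually filter on.
    conn.execute("CREATE INDEX idx_sentence ON novelty_results (sentence_id)")
    rows = [(i, i / n) for i in range(n)]
    insert_results_batched(conn, rows)
    return conn


def query_plan(conn):
    """Return the plan text for a lookup by sentence_id."""
    plan = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT score FROM novelty_results WHERE sentence_id = 42"
    ).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " ".join(str(row[-1]) for row in plan)


if __name__ == "__main__":
    demo = build_demo()
    print(query_plan(demo))
```

Inspecting the plan text shows whether the lookup uses the index or falls back to a full table scan, which is the same kind of check that EXPLAIN supports in MySQL before adding or dropping an index.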
