Abstract

We propose a concept of a data lake whose management system has a machine consciousness that controls the parameters of the data lake, resolves emergency situations autonomously, and interacts with administrators. Data storage and processing rely on a universal data model that can handle data in any form: tables, multidimensional cubes, graphs, and complex graphs (hypergraphs, metagraphs, etc.), which provides additional possibilities for information search and transformation. Specially configured relationships in the data lake make it possible to trace the history of data transformations. The universal data model provides for a family of languages that implement operations on metagraphs, including declarative and navigational query and data manipulation languages, with SQL and MDX built in for working with tables and cubes. The subconscious of the management system performs routine standard procedures, such as monitoring the data lake and identifying trends and abnormal situations, and uses an AutoML system to predict the characteristics of the data lake. The data lake management system also uses special agents that search external data sources (the Internet, databases, and data warehouses) for information missing from the data lake, and special bots that interact with administrators and clients in natural language; the bots must be multimodal so that they can work with diagrams and speech. To ensure integration with the data manipulation languages and efficient fulfillment of user requests, the data lake management system includes a library of classical mathematical graph algorithms adapted for use on metagraphs (for example, shortest path algorithms and graph traversal algorithms). The potential application of the data lake management system is wider than analytics alone: a data lake managed by the proposed system can also be used for transactional data processing. The data lake may therefore become the only data storage system in an organization, on top of which all applications of the corporate management system run. Implementing the proposed concept would close the gap by which the Data Lake architecture lags behind the architecture of enterprise management systems.
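
The abstract does not specify how classical graph algorithms are adapted to metagraphs. As a minimal sketch of one possible adaptation, the Python example below assumes that a metavertex is simply a named group of ordinary vertices, flattens containment into traversable links, and then runs a standard breadth-first search for an unweighted shortest path. The `Metagraph` class, its methods, and the `shortest_path` function are hypothetical names introduced for illustration only and are not taken from the proposed system.

```python
from collections import deque


class Metagraph:
    """Toy metagraph (illustrative assumption): vertices, edges, and
    metavertices that group ordinary vertices."""

    def __init__(self):
        self.adjacency = {}  # vertex or metavertex name -> set of neighbours
        self.members = {}    # metavertex name -> set of contained vertex names

    def add_vertex(self, name):
        self.adjacency.setdefault(name, set())

    def add_edge(self, a, b):
        # Undirected edge; endpoints may be vertices or metavertices.
        self.add_vertex(a)
        self.add_vertex(b)
        self.adjacency[a].add(b)
        self.adjacency[b].add(a)

    def add_metavertex(self, name, contained):
        self.add_vertex(name)
        for vertex in contained:
            self.add_vertex(vertex)
        self.members[name] = set(contained)

    def flatten(self):
        """Return an ordinary adjacency map in which containment
        (metavertex <-> member) is treated as a traversable link."""
        flat = {v: set(n) for v, n in self.adjacency.items()}
        for meta, contained in self.members.items():
            for vertex in contained:
                flat[meta].add(vertex)
                flat[vertex].add(meta)
        return flat


def shortest_path(graph, start, goal):
    """Breadth-first search over the flattened metagraph.

    Returns the list of names from start to goal, or None if unreachable.
    """
    flat = graph.flatten()
    if start not in flat or goal not in flat:
        return None
    previous = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbour in sorted(flat[current]):
            if neighbour not in previous:
                previous[neighbour] = current
                queue.append(neighbour)
    return None
```

For instance, if a metavertex `sales_mart` groups the vertices `orders` and `invoices`, an edge links `orders` to `customers`, and another edge links `sales_mart` to `report`, then `shortest_path(g, "customers", "report")` walks from `customers` through `orders` and up into `sales_mart` before reaching `report`. Treating containment as a unit-cost link is a design choice of this sketch; a real library could weight or forbid such hops.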

