Abstract
In big data applications, data are usually stored in data files whose file structures, field structures, data types, and field lengths are not uniform. Storing such data in a traditional relational database therefore makes it difficult to meet the requirements of fast storage and access. To solve this problem, we propose a mapping model between the source data file and the target HBase file. The method handles the heterogeneity of file objects and provides a general-purpose storage conversion. First, based on the mapping model, we design RowKey generation rules and a generation algorithm. Then, according to the rules that map data file fields to HBase table columns, the data in the data file are transformed into HBase. Finally, the retrieval keywords are stored in the RowKey and used to achieve fast data retrieval by prefix matching or keyword matching. The method has been applied in several projects, which shows that it can be used to convert regular row-stored data files to HBase distributed big data storage and that it is broadly general. It can be widely used in HBase big data storage applications.
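The conversion steps above (field-to-column mapping, RowKey generation, loading into HBase) can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes the source is a delimited text file with a header row, that the RowKey is the chosen keyword fields joined by "_" plus a zero-padded row sequence number (to keep keys unique), and that a sorted Python dict stands in for an HBase table. `FieldMapping`, `MappingModel`, `make_rowkey`, and `convert` are hypothetical names.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldMapping:
    """Hypothetical rule: one source field maps to one HBase column family:qualifier."""
    field: str
    family: str
    qualifier: str


@dataclass(frozen=True)
class MappingModel:
    """Hypothetical mapping model for one kind of row-stored source file."""
    key_fields: tuple        # fields whose values become RowKey keywords, in order
    columns: tuple           # FieldMapping entries for fields stored as cells
    delimiter: str = ","     # assumed field delimiter of the source file
    separator: str = "_"     # assumed separator between RowKey components


def make_rowkey(model, record, seq):
    """Retrieval keywords first, then a zero-padded sequence number so that
    rows with identical keywords still receive distinct RowKeys."""
    keywords = [record[f].strip() for f in model.key_fields]
    return model.separator.join(keywords + [f"{seq:010d}"])


def convert(model, text):
    """Convert row-stored text into {rowkey: {"family:qualifier": value}}.
    A real loader would issue HBase Put operations instead of filling a dict."""
    table = {}
    reader = csv.DictReader(io.StringIO(text), delimiter=model.delimiter)
    for seq, record in enumerate(reader):
        rowkey = make_rowkey(model, record, seq)
        table[rowkey] = {f"{m.family}:{m.qualifier}": record[m.field]
                         for m in model.columns}
    # HBase keeps rows sorted by RowKey bytes; for ASCII keys, string order matches.
    return dict(sorted(table.items()))
```

For example, with `key_fields=("station", "date")` a source row `S01,20200101,3.5` becomes RowKey `S01_20200101_0000000000` (for the first data row) with cell `obs:temp = "3.5"` if `temp` is mapped to family `obs`. Putting the retrieval keywords at the front of the key is what later makes prefix lookups cheap; keywords that themselves contain the separator would need escaping, which this sketch does not handle.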
Highlights
For big data applications, massive data are stored in files by rows [1,2]
Our method has been applied in several projects, which shows that it can be used to convert regular row-stored data files to HBase distributed big data storage and that it is broadly general
In this paper, to address the problem mentioned above, we study how to convert row-stored data files into the HBase distributed database, and how to retrieve and access the big data in HBase quickly
Summary
Massive data are stored in files by rows [1,2]. With the continuous development and application of distributed database technology, converting these data files into distributed storage can provide a more convenient application environment [3,4]. Reference [20] describes a storage and retrieval method for massive data based on associated multiple attributes, which solves the secondary index problem for multi-condition queries on HBase dynamic properties [21]. To address the problem mentioned above, this paper studies how to convert row-stored data files into the HBase distributed database, and how to retrieve and access the big data in HBase quickly. The work belongs to the domain of big data storage; its aim is a common tool for the distributed storage, transformation, and retrieval of big data.
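The fast retrieval described in the abstract relies on keywords stored in the RowKey and matched by prefix or by keyword. The following is a minimal sketch under stated assumptions, not the paper's implementation: RowKeys are assumed to be keyword components joined by "_", "keyword matching" is assumed to mean matching one whole component, and a sorted Python list stands in for HBase's sorted RowKey space. `prefix_scan` and `keyword_scan` are hypothetical names.

```python
import bisect


def prefix_scan(sorted_keys, prefix):
    """Return all RowKeys that start with prefix. Because keys are kept sorted
    (as in HBase), the matches form one contiguous run located by binary search."""
    i = bisect.bisect_left(sorted_keys, prefix)
    matches = []
    while i < len(sorted_keys) and sorted_keys[i].startswith(prefix):
        matches.append(sorted_keys[i])
        i += 1
    return matches


def keyword_scan(sorted_keys, keyword, separator="_"):
    """Return RowKeys that contain keyword as one whole component.
    Sorting does not help here, so every key must be examined."""
    return [k for k in sorted_keys if keyword in k.split(separator)]
```

The contrast is the design point: a prefix match on the leading keyword touches only the matching run of rows, while a match on a non-leading keyword needs a full scan. In HBase itself the first case would typically use a `Scan` with a `PrefixFilter` (or start/stop rows) and the second a `RowFilter` with a `SubstringComparator`; the paper does not state which mechanisms it uses.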