Big data is fast becoming an important resource and a hot topic in academic research, business, and government. In this paper, we introduce the concept of big data and review advances in big data research, including technology for big data collection, cloud computing technology such as the Google File System, BigTable, MapReduce, and Hadoop, and data mining and visualization methods for big data. Big data are commonly defined by the so-called 4 Vs: volume, variety, velocity, and value. High-volume data with large variety make the analysis of big data much more difficult. Since velocity is important, fast, high-performance analysis methods are needed for big data. Moreover, the high value of big data is precisely the reason for the importance of, and research activity in, this area. We also summarize various applications of big data in chemistry. Professional information platforms such as the Collaboratory for Multi-scale Chemical Sciences (CMCS) and the Chemical Informatics and Cyberinfrastructure Collaboratory (CICC) have been developed to manage and study chemical big data, while search engines such as the ChemDB Portal have been established to extract chemical information from the internet. Software such as the Integrated Project View and ArQiologist can be used to assist in the design of new medicines in medicinal chemistry. A data management system called BioGames has been proposed to analyze microfluidics big data. Moreover, graphics processing units are widely used to improve the computational capabilities of molecular dynamics simulations, while compressed score plots have been proposed to solve visualization issues in the field of chemometrics. In the era of big data, analytical instruments, chemical data systems, and even research methods may need to change; new strategies and techniques are therefore still needed for the generation and processing of big data.