Abstract

The advent of the big data era has given rise to a new discipline called Data Science. First, a brief history of Data Science is given, along with its interdisciplinarity, theoretical framework, and taxonomy. Then, the differences between domain-general and domain-specific Data Science are outlined, based on a literature review of hot topics in big data-related studies. In addition, ten common debates in Data Science are described: debates on thinking patterns, properties of big data, enablers of intelligence, bottlenecks in data product development, data preparation, quality of services, big data analysis, evaluation of big data algorithms, the fourth paradigm, and the big data skills shortage. Moreover, the emerging trends in Data Science are presented: shifts in data analysis methodologies, adoption of model integration and meta-analysis, introduction of the "data first, schema later or never" paradigm, rethinking of data consistency in big data systems, recognition of data replication and data locality, growth in integrated data applications, changes in the complexity of data computing, the advent of data products, the rise of pro-ams and citizen data science, and the increasing demand for data scientists. Finally, several suggestions for further study are proposed: to avoid misconstruing Data Science, to take advantage of the active property of big data, to balance the three dimensions of Data Science, to introduce Design of Experiments, to embrace causality analysis, and to develop data products.

