Abstract

AbstractThe advent of the Big Data era has necessitated a transformational shift in statistical research, responding to the novel demands of data science. Despite extensive discourse within statistical communities on confronting these emerging challenges, we offer our unique perspectives, underscoring the extended responsibilities of statisticians in pre-analysis and post-analysis tasks. Moreover, we propose a new definition and classification of Big Data based on data sources: Type I Big Data, which is the result of aggregating a large number of small datasets via data sharing and curation, and Type II Big Data, which is the Real-World Data (RWD) amassed from business operations and practices. Each category necessitates distinct data preprocessing and preparation (DPP) methods, and the objectives of analysis as well as the interpretation of results can significantly diverge between these two types of Big Data. We further suggest that the statistical communities should consider adopting and rapidly incorporating new paradigms and cultures by learning from other disciplines. Particularly, beyond Breiman’s (Stat Sci 16(3):199–231, 2021) two modeling cultures, statisticians may need to pay more attention to a newly emerging third culture: the integration of algorithmic modeling with multi-scale dynamic modeling based on fundamental physics laws or mechanisms that generate the data. We draw from our experience in numerous related research projects to elucidate these novel concepts and perspectives.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.