Abstract
We review tasks and methods most relevant to Big Data analysis. Emphasis is made on the conceptual and pragmatic issues of the tasks and methods (avoiding unnecessary mathematical details). We suggest that all scope of jobs with Big Data fall into four conceptual modes (types): four modes of large-scale usage of Big Data: 1) intelligent information retrieval; 2) massive (large-scale) conveyed data processing (mining); 3) model inference from data; 4) knowledge extraction from data (regularities detection and structures discovery). The essence of various tasks (clustering, regression, generative model inference, structures discovery etc.) are elucidated. We compare key methods of clustering, regression, classification, deep learning, generative model inference and causal discovery. Cluster analysis may be divided into methods based on mean distance, methods based on local distance and methods based on a model. The targeted (predictive) methods fall into two categories: methods which infer a model; "tied to data" methods which compute prediction directly from data. Common tasks of temporal data analysis are briefly overviewed. Among diverse methods of generative model inference we make focus on causal network learning because models of this class are very expressive, flexible and are able to predict effects of interventions under varying conditions. Independence-based approach to causal network inference from data is characterized. We give a few comments on specificity of task of dynamical causal network inference from timeseries. Challenges of Big Data analysis raised by data multidimensionality, heterogeneity and huge volume are presented. Some statistical issues related to the challenges are summarized. Problems in programming 2019; 3: 58-85
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.