Extracting knowledge from large, heterogeneous, unstructured, and high-dimensional data is one of the major challenges for large-scale machine learning algorithms. In this talk, I will present our recent results on developing unsupervised machine learning approaches to explore such datasets. Many of these datasets follow heavy-tailed distributions, characterized by long-range dependencies. We quantify the tails of these distributions using higher-order statistics and use tensor-based representations to build data mining algorithms for: (1) online detection of events that signify anomalies in spatio-temporal patterns; (2) building low-dimensional latent variable models to capture the intrinsic multiscale structure; and (3) hierarchical clustering and visual organization of the data to gain relevant insights. We will illustrate these approaches on a variety of applications, including the integration of sparse experimental observations with atomistic-scale information for understanding the function of cellular systems. We will also discuss how these approaches can be applied to other domains.
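
As a rough illustration of what quantifying tails with higher-order statistics can look like, the sketch below compares the excess kurtosis (a fourth-order statistic) of a Gaussian sample with that of a Student's t sample. The choice of excess kurtosis, the helper names `excess_kurtosis` and `student_t_sample`, and the synthetic data are all assumptions made for illustration; the abstract does not specify which statistics or datasets are used in the talk.

```python
import math
import random


def excess_kurtosis(xs):
    """Sample excess kurtosis: fourth central moment over squared variance, minus 3.

    Values near 0 are consistent with Gaussian tails; clearly positive values
    indicate heavier tails than a Gaussian.
    """
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / (m2 ** 2) - 3.0


def student_t_sample(rng, df):
    """Draw one Student's t variate as Z / sqrt(V / df), with V chi-squared(df)."""
    z = rng.gauss(0.0, 1.0)
    v = rng.gammavariate(df / 2.0, 2.0)  # chi-squared with df degrees of freedom
    return z / math.sqrt(v / df)


# Hypothetical synthetic data, used only to contrast light and heavy tails.
rng = random.Random(0)
gaussian = [rng.gauss(0.0, 1.0) for _ in range(20000)]
heavy = [student_t_sample(rng, df=5) for _ in range(20000)]

k_gaussian = excess_kurtosis(gaussian)
k_heavy = excess_kurtosis(heavy)
print(f"Gaussian excess kurtosis:  {k_gaussian:.3f}")
print(f"Student-t(5) excess kurtosis: {k_heavy:.3f}")
```

The Gaussian sample should give a value close to zero, while the Student's t sample (theoretical excess kurtosis 6 for 5 degrees of freedom) should give a clearly positive one. For multivariate data, the analogous fourth-order quantity is a cumulant tensor rather than a scalar, which is one reason tensor-based representations are a natural fit here.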