Networked data are ubiquitous in this era of social, economic and technological revolution resting on the backbone of the internet. With the spread of mobile phones, sensors, embedded devices and industrial robots, the ability to collect and generate interdependent data is at an all-time high. This deluge of data is amplified by social networks such as Facebook, Tumblr, LinkedIn and Twitter, which connect people around the globe and allow them to share the data they collect, generate or distribute in real time. Further, with the 'Internet of Things', appliances, vehicles and wearable devices can communicate with one another. The resulting trend is towards better and bigger ways of "collecting, creating, managing and storing of data", also known as Big Data (White House, 2014).

Most often, such big data capture a rich structure of inter-relationships. This structure is either an explicit graph, as in social and sensor networks, or a set of dependencies that can be modelled as a graph, such as neighbouring pixels sharing similar intensities in real-time 3D scene captures from autonomous vehicle cameras. Moreover, such data are often streamed in real time from multiple sources, as in a network of sensors. The data may be sequential, as in product recommendation based on user click-through rates; dynamic, as in evolving blog communities; or shifting in pattern, as often seen with trending tweets. In reality, even with this abundance of data, only a small percentage carries labels or annotations that can be used to model and categorize the vast quantities of unlabelled data.

Naturally, several questions arise. How can computers be programmed to automatically learn the underlying model and predict on the fly from streaming data? How can an algorithm capture the sequential nature of events? Further, and more challenging: can the algorithms guarantee efficiency regardless of the sequence of data they see, in any order? Are these methods adaptive enough to predict under dynamic, real-time changes? And how can the algorithms learn the unknown labels from the few available labels in very large, sparse networked data?