Abstract
Inference algorithms for big data typically address non-relational data. However, much real-world data, such as social network data and healthcare data, is relational in nature. We therefore need more powerful techniques that can scale up richer inference algorithms on relational data. Markov Logic Networks (MLNs) are arguably one of the most popular statistical relational models and can represent complex, uncertain knowledge succinctly. In this paper, we scale up inference algorithms for MLNs to big relational data. The main difficulty is that the probabilistic graphical model underlying an MLN is typically extremely large even for small problems, so performing inference on this model is highly challenging. A predominant approach to improving scalability is lifted inference, which does not construct the full graphical model underlying the MLN. Instead, lifted inference exploits symmetries in the distribution to reduce the size of the model. A popular way to perform lifting uses clustering techniques to group together variables with similar distributional characteristics. However, for big relational data, identifying these symmetries quickly becomes infeasible. To address this, we design a novel lifted inference system built on top of Spark that takes advantage of parallelism to identify symmetries in the MLN. Our work thus unifies advances in inference for relational data with advances in big data processing technologies. Utilizing the power of Spark, we show that we can perform more accurate inference and scale up relational inference to datasets that are orders of magnitude larger than those currently handled by state-of-the-art MLN systems.
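To make the clustering idea concrete, the following is a minimal sketch and not the system described in this paper. It assumes a hypothetical representation in which evidence is a list of `(predicate, arguments)` tuples, and it treats two constants as symmetric when they occur equally often at the same argument positions of the same predicates. The names `signature` and `cluster_constants` are illustrative only.

```python
from collections import Counter, defaultdict


def signature(constant, evidence):
    """Count how often `constant` occurs at each (predicate, argument position)."""
    counts = Counter()
    for predicate, args in evidence:
        for position, arg in enumerate(args):
            if arg == constant:
                counts[(predicate, position)] += 1
    # A frozenset of (key, count) pairs is hashable, so it can index a dict.
    return frozenset(counts.items())


def cluster_constants(constants, evidence):
    """Group constants whose evidence signatures are identical."""
    groups = defaultdict(list)
    for constant in constants:
        groups[signature(constant, evidence)].append(constant)
    # Sort for a deterministic result.
    return sorted(sorted(group) for group in groups.values())


# Hypothetical toy evidence (assumption: not taken from the paper).
evidence = [
    ("Smokes", ("Ana",)),
    ("Smokes", ("Bob",)),
    ("Friends", ("Ana", "Cat")),
    ("Friends", ("Bob", "Dan")),
]
constants = ["Ana", "Bob", "Cat", "Dan", "Eve"]

clusters = cluster_constants(constants, evidence)
print(clusters)
```

Each signature depends only on one constant and the evidence, so the signatures could in principle be computed independently and in parallel. This is one plausible reading of why a parallel platform helps with symmetry detection, not a description of the authors' implementation.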