Abstract

Machine learning has made great strides in recent years, and its applications are spreading rapidly. Unfortunately, the standard machine learning formulation is a poor match for data management problems. For example, most learning algorithms assume that the data is contained in a single table and consists of i.i.d. (independent and identically distributed) samples. This leads to a proliferation of ad hoc solutions, slow development, and suboptimal results. Fortunately, a body of machine learning theory and practice is being developed that dispenses with such assumptions, and promises to make machine learning for data management much easier and more effective [1]. In particular, representations like Markov logic, which includes many types of deep networks as special cases, allow us to define very rich probability distributions over non-i.i.d., multi-relational data [2]. Despite the generality of these models, learning their parameters is still a convex optimization problem, which allows it to be solved efficiently. Learning structure (in the case of Markov logic, a set of formulas in first-order logic) is intractable, as in more traditional representations, but can be done effectively using inductive logic programming techniques. Inference is performed using probabilistic generalizations of theorem proving, and takes linear time and space in tractable Markov logic, an object-oriented specialization of Markov logic [3]. These techniques have led to state-of-the-art, principled solutions to problems like entity resolution, schema matching, ontology alignment, and information extraction. Using tractable Markov logic, we have extracted from the Web a probabilistic knowledge base with millions of objects and billions of parameters, which can be queried exactly in subsecond time using an RDBMS backend [3]. With these foundations in place, we expect the pace of machine learning applications in data management to continue to accelerate in coming years.
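To make the statements about Markov logic more concrete, the following is a minimal sketch and not the authors' implementation. It assumes the standard formulation of Markov logic, in which a world x has probability P(x) = exp(sum_i w_i n_i(x)) / Z, where n_i(x) is the number of true groundings of formula i and w_i is its weight. The two-person domain, the two formulas, the weights, and all function names are hypothetical and chosen only for illustration. The brute-force enumeration stands in for the theorem-proving-based inference mentioned in the abstract and is feasible only for toy domains.

```python
import itertools
import math

# Hypothetical toy domain: two people, four ground atoms.
PEOPLE = ["A", "B"]
ATOMS = [f"Smokes({p})" for p in PEOPLE] + [f"Cancer({p})" for p in PEOPLE]

# Illustrative weights, one per formula (assumed, not taken from the source).
WEIGHTS = (1.5, 0.8)


def feature_counts(world):
    """Return n_i(world): the number of true groundings of each formula."""
    # Formula 1: Smokes(x) => Cancer(x), grounded once per person.
    smoking_implies_cancer = sum(
        1 for p in PEOPLE
        if (not world[f"Smokes({p})"]) or world[f"Cancer({p})"]
    )
    # Formula 2: Smokes(x) <=> Smokes(y), grounded once per unordered pair.
    same_smoking_habit = sum(
        1 for p, q in itertools.combinations(PEOPLE, 2)
        if world[f"Smokes({p})"] == world[f"Smokes({q})"]
    )
    return (smoking_implies_cancer, same_smoking_habit)


def all_worlds():
    """Enumerate every truth assignment to the ground atoms (2**4 worlds)."""
    for values in itertools.product([False, True], repeat=len(ATOMS)):
        yield dict(zip(ATOMS, values))


def score(world, weights):
    """Unnormalized log-probability: sum_i w_i * n_i(world)."""
    return sum(w * n for w, n in zip(weights, feature_counts(world)))


def log_partition(weights):
    """log Z, computed by brute force over all worlds."""
    return math.log(sum(math.exp(score(x, weights)) for x in all_worlds()))


def log_prob(world, weights):
    """log P(world) under the log-linear model."""
    return score(world, weights) - log_partition(weights)


def expected_counts(weights):
    """E[n_i] under the current weights."""
    log_z = log_partition(weights)
    totals = [0.0] * len(weights)
    for x in all_worlds():
        p = math.exp(score(x, weights) - log_z)
        for i, n in enumerate(feature_counts(x)):
            totals[i] += p * n
    return tuple(totals)


def log_likelihood_gradient(observed_world, weights):
    """Gradient of log P(observed_world): observed minus expected counts."""
    observed = feature_counts(observed_world)
    expected = expected_counts(weights)
    return tuple(o - e for o, e in zip(observed, expected))


if __name__ == "__main__":
    observed = dict.fromkeys(ATOMS, True)
    print(math.exp(log_prob(observed, WEIGHTS)))
    print(log_likelihood_gradient(observed, WEIGHTS))
```

Because the model is log-linear in the weights, the log-likelihood is concave and its gradient is the difference between observed and expected formula counts, which is the sense in which weight learning is a convex problem. The sketch ignores evidence variables, structure learning, and the tractable specialization described in [3].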
