Abstract

Is massively collaborative machine learning possible? Can we share and organize our collective knowledge of machine learning to solve ever more challenging problems? In a way, yes: as a community, we are already very successful at developing high-quality open-source machine learning libraries, thanks to frictionless collaboration platforms for software development. However, code is only one aspect. The answer is much less clear when we also consider the data that goes into these algorithms and the exact models that are produced. A tremendous amount of work and experience goes into the collection, cleaning, and preprocessing of data and the design, evaluation, and fine-tuning of models, yet very little of this is shared and organized in a way that lets others easily build on it. Suppose one had a global platform for sharing machine learning datasets, models, and reproducible experiments in a frictionless way, so that anybody could chip in at any time to share a good model, add or improve data, or suggest an idea. OpenML is an open-source initiative to create such a platform. It allows anyone to share datasets, machine learning pipelines, and full experiments, organizes all of it online with rich metadata, and enables anyone to reuse and build on them in novel and unexpected ways. All data is open and accessible through APIs, and OpenML is readily integrated into popular machine learning tools to allow easy sharing of models and experiments. This openness also enables a budding ecosystem of automated processes that scale up machine learning further, such as discovering similar datasets, creating systematic benchmarks, or learning from all collected results how to build the best machine learning models, and even doing so automatically for any new dataset. We welcome all of you to become a part of it.
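To make the kind of frictionless sharing described above concrete, the sketch below uses the openml-python client to fetch a dataset, evaluate an ordinary scikit-learn pipeline on an OpenML task, and publish the resulting run back to the platform. It is a minimal illustration, not part of the paper: the dataset and task IDs are placeholders chosen for demonstration, publishing requires a configured OpenML API key, and the pipeline should be adapted to the feature types of whichever task is used.

```python
# Minimal sketch of the OpenML sharing workflow (assumptions: openml-python
# and scikit-learn installed; an OpenML API key configured for publishing;
# the IDs below are illustrative placeholders, not prescribed by the paper).
import openml
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline

# Datasets can be fetched directly, together with their rich metadata.
dataset = openml.datasets.get_dataset(61)  # 61 = iris, used here as an example
X, y, _, attribute_names = dataset.get_data(
    target=dataset.default_target_attribute
)

# A task bundles a dataset with an evaluation procedure (e.g. CV splits).
task = openml.tasks.get_task(59)  # placeholder: any supervised classification task ID

# Build an ordinary scikit-learn pipeline; adapt it to the task's feature types.
clf = make_pipeline(SimpleImputer(), RandomForestClassifier(n_estimators=100))

# Evaluate the pipeline on the task's predefined splits, producing a full,
# reproducible experiment record.
run = openml.runs.run_model_on_task(clf, task)

# Share the experiment back to the platform so others can build on it.
run.publish()
print(run)
```

Because every such run is stored with its dataset, pipeline description, and results, the collected experiments can be reused by the automated processes mentioned above, for example to benchmark methods systematically across many datasets.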
