Abstract

Experimenting with different models, documenting results and findings, and repeating these tasks are day-to-day activities for machine learning engineers and data scientists. These users need to keep control of the machine learning pipeline and its metadata so that they can iterate quickly through experiments and retrieve key findings and observations from historical activity. This is the need that Arangopipe serves. Arangopipe is an open-source tool that provides a data model capturing the essential components of any machine learning life cycle. It also provides an application programming interface that lets machine learning engineers record the details of the salient steps in building their models. The components of the data model and an overview of the application programming interface are provided, along with illustrative examples of basic and advanced machine learning workflows. Arangopipe is useful not only for users developing machine learning models but also for users deploying and maintaining them.

Highlights

  • Experimenting with different models, documenting results and findings, and repeating these tasks are day-to-day activities for machine learning engineers and data scientists

  • While these efforts aim to standardize the operations and the data captured when productionizing machine learning applications, the data model can be adapted thanks to the flexibility of the graph representation

  • A comprehensive discussion of the advantages of using a graph data model to capture machine learning meta-data and a narrative detailing the progress of a data science project using Arangopipe is available in arangopipe-overview [7] and collaboration with arangopipe [8]

Summary

Overview

The outlook for the adoption of machine learning-based solutions into technology-enabled aspects of business is strong [25]. Much of model development for machine learning and data analytic applications involves analyzing the activities and findings that went into building earlier models, such as the distribution characteristics of features, effective modeling choices, and results from hyper-parameter tuning experiments. Once these models are deployed, there is a frequent need to review data from previous deployments to verify configuration and deployment steps. A database that supports both a graph and a document-oriented data model is well suited to capturing data from machine learning model development and deployment activity.
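
To make the combination of the two data models concrete, the following is a minimal sketch in plain Python of metadata stored as free-form documents (nodes) connected by labeled edges. It is not Arangopipe's API and does not use a database: the `MetadataGraph` class, its methods, the node keys, the relation names, and all attribute values are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class MetadataGraph:
    """Hypothetical in-memory stand-in for a graph + document store."""

    def __init__(self):
        self.documents = {}             # node key -> free-form document (dict)
        self.edges = defaultdict(list)  # node key -> [(relation, target key)]

    def add_document(self, key, **attributes):
        # Document side: each node carries arbitrary, schema-free attributes.
        self.documents[key] = dict(attributes)

    def link(self, source, relation, target):
        # Graph side: a labeled, directed edge between two existing nodes.
        for key in (source, target):
            if key not in self.documents:
                raise KeyError(f"unknown node: {key}")
        self.edges[source].append((relation, target))

    def lineage(self, start):
        # Depth-first traversal returning every node reachable from `start`.
        seen, order, stack = set(), [], [start]
        while stack:
            key = stack.pop()
            if key in seen:
                continue
            seen.add(key)
            order.append(key)
            # Push in reverse so neighbors are visited in insertion order.
            for _, target in reversed(self.edges[key]):
                stack.append(target)
        return order


# Illustrative (made-up) metadata for a single experiment run.
graph = MetadataGraph()
graph.add_document("dataset/example", source="csv")
graph.add_document("featureset/v1", columns=["x1", "x2"])
graph.add_document("model/linear", algorithm="linear regression")
graph.add_document("run/001", hyperparameters={"alpha": 0.1})

graph.link("run/001", "uses", "model/linear")
graph.link("run/001", "trained_on", "featureset/v1")
graph.link("featureset/v1", "derived_from", "dataset/example")

# Everything that went into run/001, recovered by following edges.
print(graph.lineage("run/001"))
```

The documents hold whatever attributes a given step produces, while the edges record how steps relate, so a question such as "which dataset and features fed this run?" becomes a traversal rather than a join across fixed tables.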

Data science workflow
Software implementation
Illustrative examples of Arangopipe
Basic workflow
Reusing archived steps
Extending the data model
Experimenting and documenting facts about models and data
Using the Arangopipe web user-interface
Storing features from model development
Support for R models
Building and testing
Related work
Conclusion