Abstract

Despite algorithmic advances in machine learning, the need for better infrastructure supporting machine learning development and research has become increasingly apparent. Machine learning experiments tend to be ad hoc in nature, and results are most often communicated in the form of a publication. Experimental details are frequently omitted due to space or time constraints, or simply because the technical setup or parametrization has grown intractably complex. Even when code bases are accessible, they disregard important properties of the environment and experimental setup, such as random number generators or the computing infrastructure. At the same time, tracking and communicating an inherently exploratory scientific process requires considerable effort. We explored different avenues to tackle these issues from a data science engineering point of view. These efforts resulted in PyPads, a framework providing the infrastructure to extend experimental setups with logging, communication, and analysis features in a mostly non-intrusive way. PyPads can be extended to different Python-based frameworks, utilizing community-driven, descriptive metadata in an effort to harmonize library-specific logs in an ontology. We also emphasize similarities to practices in software engineering, which have proven essential in practical applications.
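The "mostly non-intrusive" logging the abstract describes can be achieved in Python by wrapping (monkey-patching) library methods so that calls are recorded without changing user code. The sketch below is a generic illustration of that technique, not PyPads' actual API; the `Estimator` class and `log_calls` helper are hypothetical stand-ins for a framework class and an instrumentation hook.

```python
import functools

def log_calls(cls, method_name, log):
    """Hypothetical instrumentation hook: wrap a method on a class so every
    call is appended to `log`, without modifying the caller's code."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        result = original(self, *args, **kwargs)
        # Record descriptive metadata about the call, not the raw data itself.
        log.append({"method": method_name,
                    "n_args": len(args),
                    "kwargs": sorted(kwargs)})
        return result

    setattr(cls, method_name, wrapper)

class Estimator:
    """Stand-in for a class from an ML library (e.g. a scikit-learn-style estimator)."""
    def fit(self, X, y=None):
        self.fitted_ = True
        return self

log = []
log_calls(Estimator, "fit", log)           # instrument once, up front
Estimator().fit([[1.0], [2.0]], y=[0, 1])  # user code runs unchanged
print(log)  # → [{'method': 'fit', 'n_args': 1, 'kwargs': ['y']}]
```

In a real tracking framework, the wrapper would forward such records to a backend and enrich them with environment details (random seeds, hardware) rather than appending to an in-memory list.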
