Abstract

Data pipelines play an important role throughout the data management process. They automate the steps from data generation to data reception, thereby reducing human intervention. A failure or fault in a single step of a data pipeline can have cascading effects that might result in hours of manual intervention and clean-up. Pipeline failures caused by faults at different stages are a common challenge and eventually lead to significant performance degradation in data-intensive systems. To detect these faults early and to increase the quality of data products, continuous monitoring and fault detection mechanisms should be built into the data pipeline. In this study, we explore the need for incorporating automated fault detection mechanisms and mitigation strategies at different stages of the data pipeline. Further, we identify faults at different stages of the data pipeline and possible mitigation strategies that can be adopted to reduce the impact of these faults, thereby improving the quality of data products. We validate the idea of incorporating fault detection and mitigation strategies by realizing a small part of the data pipeline through action research in the analytics team at a large software-intensive organization in the telecommunications domain.

