Given a large-scale mobile network with a variety of equipment and radio access network technologies serving approximately 20 million subscribers, many types of data can be used for big data analytics and machine learning (ML) tasks in network operations, monitoring, and optimization. However, these data are measured, collected, and propagated through numerous complex data and software systems. Consequently, the people, software components, and data-driven operations involved in big data and ML pipelines face great challenges in dealing with the impacts of data quality. Data quality problems arise and propagate through complex operations involving different types of data, people, software components, and analytics, and they cannot be solved purely through data quality engineering. This paper discusses our TENSAI framework, which provides practical and responsible observability for ensuring data quality in such a mobile network. TENSAI focuses on methods of communication, strategy specifications, and data quality engineering for diverse types of data and analytics across different types of operations. TENSAI presents techniques for (1) capturing and clearly communicating the causes and effects of data quality problems to all relevant stakeholders, (2) developing data quality-aware adaptation strategies for actions on data that can be integrated into analytics processes, and (3) engineering data quality awareness into software and data pipelines. Thus, TENSAI supports full visibility of data quality problems and their impacts across related systems, empowering the use and adaptation of data analytics for different types of operations. We illustrate TENSAI with several real-world data types, pipelines, and cases based on a real-world mobile network.