In recent years, machine-learning (ML) applications have generated considerable interest and shown great potential for optimizing optical network management, in tasks such as quality-of-transmission estimation, traffic prediction, and resource allocation. However, these applications often require large datasets for training, inference, and updating, while network operators are generally reluctant to disclose their data because of privacy concerns and the sensitivity of operational information. Most open-source datasets also lack transparency regarding network specifics, such as topology details and device configurations, which makes data acquisition and ML model training more difficult. In response, this paper presents a unified monitoring and telemetry platform that combines distributed and centralized time-series databases built on InfluxDB, a Kafka-based telemetry pipeline, and advanced ML applications. Separating the distributed and centralized databases improves data management flexibility and scalability. The Kafka-based telemetry pipeline provides high-throughput, low-latency data streaming, with optimized partitioning keeping end-to-end latency under 0.05 s. Additionally, integrating Kafka and InfluxDB allows real-time visualization of data from multiple sources, improving transparency and supporting real-time data streaming for network applications. By implementing this advanced telemetry and ML architecture, network operators can build a more intelligent, responsive, and resilient optical network infrastructure.