Abstract
A Facebook datacenter consists of a large number of servers running diverse Facebook services, which are aggregated to serve any given user request. To allow this aggregation, servers interact with each other via traffic flows managed by the networking fabric. The underlying connectivity powering this fabric consists of a large number of pluggable optical interconnects and On Board Optical (OBO) modules carrying production data. Connectivity at this scale requires fast and reliable detection of link failures so that they can be resolved. In the first generation of deployments, link failure detection was a slow, sequential process. Troubleshooting was equally tedious, as the available tools could characterize only one optical transceiver at a time. Further, failure analysis showed that in a majority of resolutions no failed optic was the root cause, resulting in a high No Trouble Found (NTF) rate. In this paper we introduce a novel link failure detection and resolution method that improves on the previous method across three dimensions: faster resolution, more reliable troubleshooting, and scalable implementation. We introduce the BER Illusion Methodology (BIM), a highly scalable and resource-efficient solution that significantly reduces the time taken to troubleshoot pluggable optical interconnects. It also scales to next-generation OBO modules at Facebook datacenters, with the aim of lowering the NTF rate and making optimal use of available resources. BIM, which is based on Open Compute Platform (OCP) network switches, can troubleshoot 128 QSFP28, 64 QSFP56, or 32 OBO modules simultaneously in under 30 minutes. The tool is easy to implement and also reports transceiver diagnostics such as transmitter power, transmitter bias current, receiver power, case temperature, per-channel bit error rate (BER), vendor information, and manufacturing part number.
This additional test data, together with a true failure indication, helps optics suppliers gain confidence and build credibility with their customers. The open-source nature and universal applicability of the tool offer other users the possibility to adopt it and customize it further for their own networking needs.