|
| 1 | +# Feature Name |
| 2 | +Deterministic Approach for Interface Link bring-up sequence |
| 3 | + |
| 4 | +# High Level Design Document |
| 5 | +#### Rev 0.7 |
| 6 | + |
| 7 | +# Table of Contents |
| 8 | + * [List of Tables](#list-of-tables) |
| 9 | + * [Revision](#revision) |
| 10 | + * [About This Manual](#about-this-manual) |
| 11 | + * [Abbreviation](#abbreviation) |
| 12 | + * [References](#references) |
| 13 | + * [Problem Definition](#problem-definition) |
| 14 | + * [Background](#background) |
| 15 | + * [Objective](#objective) |
| 16 | + * [Plan](#plan) |
| 17 | + * [Pre-requisite](#pre-requisite) |
| 18 | + * [Breakout handling](#breakout-handling) |
| 19 | + * [Proposed Work-Flows](#proposed-work-flows) |
| 20 | + |
| 21 | +# List of Tables |
| 22 | + * [Table 1: Definitions](#table-1-definitions) |
| 23 | + * [Table 2: References](#table-2-references) |
| 24 | + |
| 25 | +# Revision |
| 26 | +| Rev | Date | Author | Change Description | |
| 27 | +|:---:|:-----------:|:----------------------------------:|-----------------------------------| |
| 28 | +| 0.1 | 08/16/2021 | Shyam Kumar | Initial version |
| 29 | +| 0.2 | 12/13/2021 | Shyam Kumar, Jaganathan Anbalagan | Added uses-cases, workflows |
| 30 | +| 0.3 | 01/19/2022 | Shyam Kumar, Jaganathan Anbalagan | Addressed review-comments |
| 31 | +| 0.4 | 01/26/2022 | Shyam Kumar, Jaganathan Anbalagan | Addressed further review-comments |
| 32 | +| 0.5 | 01/28/2022 | Shyam Kumar, Jaganathan Anbalagan | Addressed further review-comments |
| 33 | +| 0.6 | 02/02/2022 | Shyam Kumar | Added feature-enablement workflow |
| 34 | +| 0.7 | 02/02/2022 | Jaganathan Anbalagan | Added Breakout Handling |
| 35 | + |
| 36 | + |
| 37 | +# About this Manual |
| 38 | +This is a high-level design document describing the need for a deterministic approach to the |
| 39 | +interface link bring-up sequence, and the workflows for the use-cases around it. |
| 40 | + |
| 41 | +# Abbreviation |
| 42 | + |
| 43 | +# Table 1: Definitions |
| 44 | +| **Term** | **Definition** | |
| 45 | +| -------------- | ------------------------------------------------ | |
| 46 | +| pmon | Platform Monitoring Service | |
| 47 | +| xcvr | Transceiver | |
| 48 | +| xcvrd | Transceiver Daemon | |
| 49 | +| CMIS | Common Management Interface Specification | |
| 50 | +| gbsyncd | Gearbox (External PHY) docker container | |
| 51 | +| DPInit | Data-Path Initialization | |
| 52 | +| QSFP-DD | QSFP-Double Density (i.e. 400G) optical module | |
| 53 | + |
| 54 | +# References |
| 55 | + |
| 56 | +# Table 2: References |
| 57 | + |
| 58 | +| **Document** | **Location** | |
| 59 | +|---------------------------------------------------------|---------------| |
| 60 | +| CMIS v4 | [QSFP-DD-CMIS-rev4p0.pdf](http://www.qsfp-dd.com/wp-content/uploads/2019/05/QSFP-DD-CMIS-rev4p0.pdf) | |
| 61 | +| CMIS v5 | [CMIS5p0.pdf](http://www.qsfp-dd.com/wp-content/uploads/2021/05/CMIS5p0.pdf) | |
| 62 | + |
| 63 | + |
| 64 | +# Problem Definition |
| 65 | + |
| 66 | +1. Presently in SONiC, there is no synchronization between the Datapath Init operation of a CMIS-compliant optical module and the enabling of ASIC (NPU/PHY) Tx. This may cause link instability when an interface is administratively enabled (“config interface startup Ethernet”) and in bootup scenarios. |
| 67 | + |
| 68 | + For CMIS-compliant active (optical) modules, the Host (NPU/PHY) needs to provide a valid high-speed Tx input signal at the required signaling rate and encoding type before causing a DPSM (Data Path State Machine) to exit the DPDeactivated state and move to the DPInit transient state. |
| 69 | + |
| 70 | + Fundamentally, this means the interface must be brought up using a deterministic approach. |
| 71 | + |
| 72 | + This problem is also listed as outside the scope of the ‘CMIS Application Initialization’ high-level design document |
| 73 | + **(https://github.com/ds952811/SONiC/blob/0e4516d7bf707a36127438c7f2fa9cc2b504298e/doc/sfp-cmis/cmis-init.md#outside-the-scope)** |
| 74 | + |
| 75 | +2. During administrative interface disable (“config interface shutdown Ethernet”), only the ASIC (NPU) Tx is disabled and not the optical module Tx/laser. |
| 76 | + This wastes power, including unnecessary fan power consumed to keep the module temperature within its operating range. |
| 77 | + |
| 78 | +# Background |
| 79 | + |
| 80 | + Per the CMIS spec, the validation and diagnostics done by the HW team, and the agreement with vendors, |
| 81 | + the following bring-up sequence must be followed to enable a port/interface with CMIS-compliant optical modules in an LC/chassis: |
| 82 | + |
| 83 | + a) Enable port on NPU (bring up port and serdes on the NPU; enable signals) : syncd |
| 84 | + b) Enable port on PHY (bring up port and serdes on the PHY; enable signals) : gbsyncd |
| 85 | + - Wait for signal to stabilize on PHY |
| 86 | + c) Enable optical module (data path initialization, turn laser on / enable Tx) : xcvrd |
| 87 | + |
| 88 | + On boards without a PHY, step b) is not needed, but steps a) and c) must still be followed in that order. |
| 89 | + |
| 90 | + ## Clause from CMIS4.0 spec |
| 91 | + |
| 92 | + Excerpt from CMIS4.0 spec providing detailed reasoning for the above-mentioned bring-up sequence |
| 93 | + |
| 94 | +  |
| 95 | + |
| 96 | + |
| 97 | + ## Clause from CMIS5.0 spec |
| 98 | + |
| 99 | + Excerpt from CMIS5.0 spec providing detailed reasoning for the above-mentioned bring-up sequence |
| 100 | + |
| 101 | +  |
| 102 | + |
| 103 | + |
| 104 | +# Objective |
| 105 | + |
| 106 | +Have a deterministic approach for the interface link bring-up sequence for all interface types, i.e. the sequence below is to be followed: |
| 107 | + 1. Initialize and enable NPU Tx and Rx path |
| 108 | + 2. For systems with an 'External' PHY: Initialize and enable PHY Tx and Rx on both line and host sides; ensure host side link is up |
| 109 | + 3. Only then perform optics data path initialization/activation/Tx enable (for CMIS-compliant optical modules) and Tx enable (for SFF-compliant optical modules) |
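
The ordering constraint above can be illustrated with a minimal Python sketch. It is not taken from SONiC code: the `bring_up` helper and the stage names are hypothetical, and each stage is modelled as a callable that reports whether it succeeded. The only point shown is that a later stage (optics) never runs unless every earlier stage has completed.

```python
def bring_up(stages):
    """Run bring-up stages strictly in order.

    stages: ordered list of (name, enable) pairs, where enable() returns
    True on success. Hypothetical helper for illustration only.
    Returns the names of the stages that completed.
    """
    completed = []
    for name, enable in stages:
        if not enable():
            # Stop here: later stages (e.g. optics) must not be enabled
            # before the host side is ready.
            break
        completed.append(name)
    return completed


def stages_for_board(has_external_phy, npu, phy, optics):
    """Build the stage list; the PHY stage is skipped on boards without one."""
    stages = [("npu", npu)]
    if has_external_phy:
        stages.append(("phy", phy))
    stages.append(("optics", optics))
    return stages
```

For a board without an external PHY, `stages_for_board(False, ...)` yields only the NPU and optics stages, still in that order.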
| 110 | + |
| 111 | +# Plan |
| 112 | + |
| 113 | +The plan is to follow this high-level work-flow sequence to accomplish the Objective: |
| 114 | +- xcvrd subscribes to a new field “host_tx_ready” in the port table of state-DB |
| 115 | +- Orchagent sets “host_tx_ready” to true/false based on the SET_ADMIN_STATE attribute return status from syncd/gbsyncd. (As part of SET_ADMIN_STATE attribute enable, the NPU Tx is enabled) |
| 116 | +- xcvrd processes the “host_tx_ready” value change event and performs optics datapath init / de-init using the CMIS API |
| 117 | +- Per the discussion and agreement in the sonic-chassis workgroup and OCP community, the plan is to follow this proposal for all the known interface types: 400G/100G/40G/25G/10G. The reasons are: |
| 118 | + - CMIS-compliant optical modules: |
| 119 | + All CMIS-compliant optical modules will follow this approach as recommended in the CMIS spec. |
| 120 | + - SFF-compliant optical modules: |
| 121 | + - A deterministic approach to bringing up the interface will eliminate link stability issues that would be difficult to chase in a production network, |
| 122 | + e.g. if there is a PHY device in between and this 'deterministic approach' is not followed, the PHY may adapt to a bad signal, or interface flaps may occur when the optics Tx/Rx is enabled during PHY initialization. |
| 123 | + - There is a possibility of interface link flaps with non-quiescent optical modules <QSFP+/SFP28/SFP+> if this 'deterministic approach' is not followed |
| 124 | + - It helps bring down the optical module laser when the interface is administratively shut down. Per the workflow here, this is achieved by xcvrd listening to the host_tx_ready field in PORT_TABLE of STATE_DB. Turning the laser off reduces power consumption and avoids any lab hazard |
| 125 | + - It additionally provides a uniform workflow (from SONiC NOS) across all interface types, with or without module presence. |
| 126 | + - This synchronization will also benefit SFP+ optical modules, as they are "plug N play" and may not have quiescent functionality. (xcvrd can use the optional 'soft tx disable' ctrl reg to disable the Tx) |
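
As a rough illustration of the xcvrd side of this plan, the sketch below shows how a “host_tx_ready” change event could be mapped to a datapath action. It is an assumption-laden model, not the xcvrd implementation: `FakeModule`, `datapath_init`, `datapath_deinit`, and `on_host_tx_ready` are hypothetical names, and the state-DB subscription itself is left out so that the example stays self-contained.

```python
from dataclasses import dataclass, field


@dataclass
class FakeModule:
    """Hypothetical stand-in for an optical module; not a real xcvrd class."""
    tx_enabled: bool = False
    log: list = field(default_factory=list)

    def datapath_init(self):
        # Stands in for CMIS datapath init / SFF Tx enable.
        self.tx_enabled = True
        self.log.append("init")

    def datapath_deinit(self):
        # Stands in for CMIS datapath de-init / SFF Tx disable.
        self.tx_enabled = False
        self.log.append("deinit")


def on_host_tx_ready(port, value, modules):
    """React to a host_tx_ready change for one port.

    modules maps port name -> FakeModule for ports with a module present.
    Returns the action taken: "init", "deinit" or "noop".
    """
    module = modules.get(port)
    if module is None:
        return "noop"  # no transceiver present: nothing to do
    ready = str(value).lower() == "true"
    if ready and not module.tx_enabled:
        module.datapath_init()
        return "init"
    if not ready and module.tx_enabled:
        module.datapath_deinit()
        return "deinit"
    return "noop"  # module already in the requested state
```

Treating an already-satisfied request as a no-op keeps the handler safe to call repeatedly with the same value.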
| 127 | + |
| 128 | +# Pre-requisite |
| 129 | + |
| 130 | +As mentioned above in the 'Background' and 'Plan' sections, the specified bring-up sequence must be followed. |
| 131 | +The work-flows are designed with SONiC NOS operating in sync mode. |
| 132 | + |
| 133 | +If SONiC NOS operates in async mode, the expected behavior is that Orchagent sets host_tx_ready based on the return status of the set ADMIN_STATE attribute update in ASIC-DB (syncd/gbsyncd). |
| 134 | + |
| 135 | +# Breakout Handling |
| 136 | + - The new 'host_tx_ready' field of the Port table in state-DB is created for every interface <regular/breakout interface>. |
| 137 | + - xcvrd processes the 'host_tx_ready' change event and is responsible for disabling Tx/laser on all optical lanes, or in the case of breakout, on the respective optical lane(s) that belong to the interface. |
| 138 | + - Currently the logical mapping between the interface and its optical lanes is not present in xcvrd. Creating this logical mapping in xcvrd will address breakout interface handling. |
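
One possible shape for the interface-to-lane mapping is sketched below. It assumes, purely for illustration, that breakout interfaces occupy consecutive optical lanes and that Tx disable is expressed as a per-lane bitmask; `build_lane_map` and `tx_disable_mask` are hypothetical helpers, not existing xcvrd functions.

```python
def build_lane_map(breakout):
    """Map each interface to the optical lanes it owns.

    breakout: list of (interface_name, lane_count) in lane order,
    e.g. a 2x4-lane breakout of an 8-lane module (assumed layout).
    """
    lane_map = {}
    next_lane = 0
    for name, count in breakout:
        lane_map[name] = list(range(next_lane, next_lane + count))
        next_lane += count
    return lane_map


def tx_disable_mask(current_mask, lanes, disable):
    """Return a new Tx-disable bitmask with only the given lanes changed.

    Bit N set means lane N has Tx disabled; other lanes are untouched,
    so shutting one breakout interface does not affect its siblings.
    """
    for lane in lanes:
        bit = 1 << lane
        current_mask = current_mask | bit if disable else current_mask & ~bit
    return current_mask
```

With this layout, disabling the second of two 4-lane interfaces changes only the upper four bits of the mask.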
| 139 | + |
| 140 | +# Proposed Work-Flows |
| 141 | + |
| 142 | +Please refer to the flow/sequence diagrams, which cover the following required use-cases: |
| 143 | + - Enabling this feature |
| 144 | + - Transceiver initialization |
| 145 | + - admin enable configurations |
| 146 | + - admin disable configurations |
| 147 | + - No transceiver present |
| 148 | + |
| 149 | +# Feature enablement |
| 150 | + This feature (optics interface link bring-up sequence) will be enabled on a per-platform basis. |
| 151 | + Some vendors/platforms may take time to shift from the existing codebase to the model (work-flows) described in this document. |
| 152 | + To avoid any breakage and ensure gradual migration of different platforms/vendors to this model, there will be a new field (flag) in xcvrd to enable/disable this feature. |
| 153 | + When xcvrd spawns on an LC/board, it invokes the platform plugin to check with the platform (hwsku) whether this feature is supported on the underlying platform (board/LC). |
| 154 | + |
| 155 | + Workflow : |
| 156 | +  |
| 157 | + |
| 158 | + |
| 159 | +# Transceiver Initialization |
| 160 | + (at platform bootstrap layer) |
| 161 | + |
| 162 | + |
| 163 | + |
| 164 | +# Applying 'interface admin startup' configuration |
| 165 | + |
| 166 | + |
| 167 | + |
| 168 | + |
| 169 | +# Applying 'interface admin shutdown' configuration |
| 170 | + |
| 171 | + |
| 172 | + |
| 173 | +# No transceiver present |
| 174 | +If a transceiver is not present: |
| 175 | + - All the workflows mentioned above remain the same (and are exercised) up to the host_tx_ready field update |
| 176 | + - xcvrd will not perform any action on receiving the host_tx_ready field update |
| 177 | + |
| 178 | + |
| 179 | +# Out of Scope |
| 180 | +The following items are not in the scope of this document. They will be taken up separately. |
| 181 | +1. xcvrd restart |
| 182 | + - If xcvrd restarts, all the DB events will be replayed. |
| 183 | + Here the Datapath init/activate (for CMIS-compliant optical modules) and the tx-disable register set (for SFF-compliant optical modules) will be a no-op if the optics is already in that state |
| 184 | +2. syncd/gbsyncd/swss docker container restart |
| 185 | + - Cleanup scenario - Check if the host_tx_ready field in STATE-DB needs to be updated to “False” for any use-case, either in the going-down or coming-up path |
| 186 | + - Discuss further on the possible use-cases |
| 187 | +3. The CMIS API feature is not part of this design; its APIs will be used by this design. For the CMIS HLD, please refer to: |
| 188 | + https://github.com/Azure/SONiC/blob/9d480087243fd1158e785e3c2f4d35b73c6d1317/doc/sfp-cmis/cmis-init.md |
| 189 | +4. Error handling of SAI attributes |
| 190 | + a) At present, if there is a set attribute failure, orchagent will exit. |
| 191 | + Refer to the error handling API: https://github.com/Azure/sonic-swss/blob/master/orchagent/orch.cpp#L885 |
| 192 | + b) Error handling for the SET_ADMIN_STATE attribute will be added in the future. |
| 193 | + c) A probable way to handle the failure is to set an error handling attribute on the respective container (syncd/gbsyncd), identifying the attribute that failed. |
| 194 | + The platform layer knows the error better and will try to recover. |
| 195 | + |