Commit 6a344c5

Merge pull request sonic-net#916 from shyam77git/patch-1
HLD for 'Have a deterministic approach in SONiC for Interface Link bring-up sequence'
2 parents 66277d7 + 1bdd505 commit 6a344c5

1 file changed

+195
-0
lines changed

@@ -0,0 +1,195 @@
1+
# Feature Name
2+
Deterministic Approach for Interface Link bring-up sequence
3+
4+
# High Level Design Document
5+
#### Rev 0.7
6+
7+
# Table of Contents
8+
* [List of Tables](#list-of-tables)
9+
* [Revision](#revision)
10+
* [About This Manual](#about-this-manual)
11+
* [Abbreviation](#abbreviation)
12+
* [References](#references)
13+
* [Problem Definition](#problem-definition)
14+
* [Background](#background)
15+
* [Objective](#objective)
16+
* [Plan](#plan)
17+
* [Pre-requisite](#pre-requisite)
18+
* [Breakout handling](#breakout-handling)
19+
* [Proposed Work-Flows](#proposed-work-flows)
20+
21+
# List of Tables
22+
* [Table 1: Definitions](#table-1-definitions)
23+
* [Table 2: References](#table-2-references)
24+
25+
# Revision
26+
| Rev | Date | Author | Change Description |
27+
|:---:|:-----------:|:----------------------------------:|-----------------------------------|
28+
| 0.1 | 08/16/2021 | Shyam Kumar | Initial version
29+
| 0.2 | 12/13/2021 | Shyam Kumar, Jaganathan Anbalagan | Added use-cases, workflows
30+
| 0.3 | 01/19/2022 | Shyam Kumar, Jaganathan Anbalagan | Addressed review-comments
31+
| 0.4 | 01/26/2022 | Shyam Kumar, Jaganathan Anbalagan | Addressed further review-comments
32+
| 0.5 | 01/28/2022 | Shyam Kumar, Jaganathan Anbalagan | Addressed further review-comments
33+
| 0.6 | 02/02/2022 | Shyam Kumar | Added feature-enablement workflow
34+
| 0.7 | 02/02/2022 | Jaganathan Anbalagan | Added Breakout Handling
35+
36+
37+
# About this Manual
38+
This is a high-level design document describing the need for a deterministic approach to the
39+
interface link bring-up sequence, and the workflows for the use-cases around it.
40+
41+
# Abbreviation
42+
43+
# Table 1: Definitions
44+
| **Term** | **Definition** |
45+
| -------------- | ------------------------------------------------ |
46+
| pmon | Platform Monitoring Service |
47+
| xcvr | Transceiver |
48+
| xcvrd | Transceiver Daemon |
49+
| CMIS | Common Management Interface Specification |
50+
| gbsyncd | Gearbox (External PHY) docker container |
51+
| DPInit | Data-Path Initialization |
52+
| QSFP-DD | QSFP-Double Density (i.e. 400G) optical module |
53+
54+
# References
55+
56+
# Table 2: References
57+
58+
| **Document** | **Location** |
59+
|---------------------------------------------------------|---------------|
60+
| CMIS v4 | [QSFP-DD-CMIS-rev4p0.pdf](http://www.qsfp-dd.com/wp-content/uploads/2019/05/QSFP-DD-CMIS-rev4p0.pdf) |
61+
| CMIS v5 | [CMIS5p0.pdf](http://www.qsfp-dd.com/wp-content/uploads/2021/05/CMIS5p0.pdf) |
62+
63+
64+
# Problem Definition
65+
66+
1. Presently in SONiC, there is no synchronization between the Datapath Init operation of a CMIS-compliant optical module and the enabling of ASIC (NPU/PHY) Tx. This may cause link instability when an interface is administratively enabled (“config interface startup Ethernet”) and in bootup scenarios.
67+
68+
For CMIS-compliant active (optical) modules, the host (NPU/PHY) needs to provide a valid high-speed Tx input signal at the required signaling rate and encoding type before causing a Data Path State Machine (DPSM) to exit the DPDeactivated state and move to the DPInit transient state.
69+
70+
Fundamentally, this calls for a deterministic approach to bringing up the interface.
71+
72+
This problem is also listed as 'outside the scope' in the 'CMIS Application Initialization' high-level design document
73+
**(https://github.com/ds952811/SONiC/blob/0e4516d7bf707a36127438c7f2fa9cc2b504298e/doc/sfp-cmis/cmis-init.md#outside-the-scope)**
74+
75+
2. During administrative interface disable (“config interface shutdown Ethernet”), only the ASIC (NPU) Tx is disabled; the optical module Tx/laser is not.
76+
This wastes power, including unnecessary fan power consumed to keep the module temperature within its operating range.
77+
78+
# Background
79+
80+
Per the CMIS spec, the validation and diagnostics done by the HW team, and the agreement with vendors,
81+
the following bring-up sequence must be followed to enable a port/interface with CMIS-compliant optical modules in an LC/chassis:
82+
83+
a) Enable port on NPU (bring-up port, serdes on the NPU ; enable signals) : syncd
84+
b) Enable port on PHY (bring-up port, serdes on the PHY ; enable signals) : gbsyncd
85+
- Wait for signal to stabilize on PHY
86+
c) Enable optical module (data path initialization, turn laser on / enable Tx) : xcvrd
87+
88+
On boards without a PHY, step b) is not needed, but steps a) and c) must still be followed in that order.
89+
90+
## Clause from CMIS4.0 spec
91+
92+
Excerpt from the CMIS 4.0 spec providing the detailed reasoning for the above-mentioned bring-up sequence:
93+
94+
![61f5b485-cf3b-4ca8-beac-9102b6feabfe](https://user-images.githubusercontent.com/69485234/147173702-f124fc9d-ef27-4816-b1a1-b4a44a5833a7.PNG)
95+
96+
97+
## Clause from CMIS5.0 spec
98+
99+
Excerpt from the CMIS 5.0 spec providing the detailed reasoning for the above-mentioned bring-up sequence:
100+
101+
![96a35dc5-618f-418c-9593-5639a90f1b28](https://user-images.githubusercontent.com/69485234/147173164-5ad0123c-479a-4774-b3ee-12a81fdd7d7e.PNG)
102+
103+
104+
# Objective
105+
106+
Have a deterministic approach for the interface link bring-up sequence for all interface types, i.e. the following sequence is to be followed:
107+
1. Initialize and enable NPU Tx and Rx path
108+
2. For system with 'External' PHY: Initialize and enable PHY Tx and Rx on both line and host sides; ensure host side link is up
109+
3. Only then perform optics data path initialization/activation/Tx enable (for CMIS-compliant optical modules) and Tx enable (for SFF-compliant optical modules)
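
The ordering above can be illustrated with a small, purely in-memory sketch. The `Port` class, the `bring_up` function, and the log strings are hypothetical names chosen for this illustration; they are not part of syncd, gbsyncd, or xcvrd.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Port:
    """Hypothetical stand-in for one front-panel interface."""
    name: str
    has_external_phy: bool
    module_type: str  # "CMIS" or "SFF" (labels assumed for this sketch)
    log: List[str] = field(default_factory=list)


def bring_up(port: Port) -> List[str]:
    """Record the bring-up steps in the order the Objective requires."""
    # Step 1: NPU Tx/Rx comes first (owned by syncd in the real system).
    port.log.append("npu: enable tx/rx")
    # Step 2: only systems with an external PHY need this step (gbsyncd).
    if port.has_external_phy:
        port.log.append("phy: enable tx/rx on line and host sides")
        port.log.append("phy: host-side link up")
    # Step 3: optics last (xcvrd), once the host-side signal is valid.
    if port.module_type == "CMIS":
        port.log.append("optics: datapath init/activate, tx enable")
    else:
        port.log.append("optics: tx enable")
    return port.log
```

On a board without an external PHY the second step is simply skipped, so the optics step still runs strictly after the NPU step.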
110+
111+
# Plan
112+
113+
The plan is to follow this high-level workflow sequence to accomplish the Objective:
114+
- xcvrd subscribes to a new field “host_tx_ready” in the port table of state-DB
115+
- Orchagent will set the “host_tx_ready” to true/false based on the SET_ADMIN_STATE attribute return status from syncd/gbsyncd. (As part of SET_ADMIN_STATE attribute enable, the NPU Tx is enabled)
116+
- xcvrd processes the “host_tx_ready” value change event and performs optics datapath init / de-init using the CMIS API
117+
- Per the discussion and agreement in the sonic-chassis workgroup and the OCP community, the plan is to follow this proposal for all known interface types: 400G/100G/40G/25G/10G. The reasons are:
118+
- CMIS-compliant optical modules:
119+
All CMIS-compliant optical modules will follow this approach, as recommended in the CMIS spec.
120+
- SFF-compliant optical modules:
121+
- A deterministic approach to bringing up the interface eliminates link stability issues that would be difficult to chase in a production network
122+
e.g. if there is a PHY device in between and this 'deterministic approach' is not followed, the PHY may adapt to a bad signal, or interface flaps may occur when the optics Tx/Rx is enabled during PHY initialization.
123+
- There is a possibility of interface link flaps with non-quiescent optical modules (QSFP+/SFP28/SFP+) if this 'deterministic approach' is not followed
124+
- It helps bring down the optical module laser when the interface is administratively shut down. Per the workflow here, this is achieved by xcvrd listening to the host_tx_ready field from PORT_TABLE of STATE_DB. Turning the laser off reduces power consumption and avoids any lab hazard
125+
- It additionally provides a uniform workflow (from SONiC NOS) across all interface types, with or without module presence.
126+
- This synchronization will also benefit SFP+ optical modules as they are "plug N play" and may not have quiescent functionality. (xcvrd can use the optional 'soft tx disable' ctrl reg to disable the tx)
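
As a rough illustration of the event flow in this plan, the sketch below models the port table as an in-memory object with change notifications. `FakePortTable`, `orchagent_on_admin_state_result`, and `XcvrdSketch` are hypothetical names and do not reflect the real swsscommon, orchagent, or xcvrd APIs. The rule that host_tx_ready is true only when the admin state is up and the set operation succeeded is an assumption made for this sketch.

```python
from typing import Callable, Dict, List, Tuple

Callback = Callable[[str, str, str], None]


class FakePortTable:
    """In-memory stand-in for the port table of state-DB (not swsscommon)."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, str]] = {}
        self._subscribers: List[Callback] = []

    def subscribe(self, callback: Callback) -> None:
        self._subscribers.append(callback)

    def set(self, port: str, field: str, value: str) -> None:
        row = self._rows.setdefault(port, {})
        if row.get(field) == value:
            return  # value unchanged, so no change event is raised
        row[field] = value
        for callback in self._subscribers:
            callback(port, field, value)


def orchagent_on_admin_state_result(table: FakePortTable, port: str,
                                    admin_up: bool, set_ok: bool) -> None:
    """Orchagent side: publish host_tx_ready from the SET_ADMIN_STATE result.

    Assumption: ready only if admin state is up AND the set succeeded.
    """
    ready = admin_up and set_ok
    table.set(port, "host_tx_ready", "true" if ready else "false")


class XcvrdSketch:
    """xcvrd side: react to host_tx_ready changes (actions are only recorded)."""

    def __init__(self) -> None:
        self.actions: List[Tuple[str, str]] = []

    def on_change(self, port: str, field: str, value: str) -> None:
        if field != "host_tx_ready":
            return
        action = "datapath_init" if value == "true" else "datapath_deinit"
        self.actions.append((port, action))
```

A real implementation would call the CMIS API (or the SFF tx-disable control) where this sketch only records an action name.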
127+
128+
# Pre-requisite
129+
130+
As mentioned in the 'Background' and 'Plan' sections above, the specified bring-up sequence must be followed.
131+
The workflows are designed assuming the SONiC NOS operates in sync mode.
132+
133+
If the SONiC NOS operates in async mode, the expected behavior is that Orchagent uses the return status of the set ADMIN_STATE attribute update in ASIC-DB (syncd/gbsyncd) to set host_tx_ready.
134+
135+
# Breakout Handling
136+
- The new 'host_tx_ready' field of the port table in state-DB is created for every interface (regular or breakout).
137+
- xcvrd processes the 'host_tx_ready' change event and is responsible for disabling Tx/laser on all optical lanes or, in the breakout case, on the optical lane(s) that belong to the interface.
138+
- Currently, the logical mapping between an interface and its optical lanes is not present in xcvrd. Creating this logical mapping in xcvrd will address breakout interface handling.
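
One possible shape for such a mapping is a per-interface lane bitmask. The sketch below assumes that a module's optical lanes are split evenly and contiguously across its breakout interfaces, in the order the interfaces are listed; the function name `build_lane_map` is hypothetical, and real modules may need a mapping driven by the advertised application rather than by this simple rule.

```python
from typing import Dict, List


def build_lane_map(module_lanes: int, subports: List[str]) -> Dict[str, int]:
    """Return a bitmask of optical lanes for each interface on one module.

    Assumption: lanes are divided evenly and contiguously, in list order.
    """
    if not subports or module_lanes % len(subports) != 0:
        raise ValueError("lanes must divide evenly across interfaces")
    per_port = module_lanes // len(subports)
    lane_map: Dict[str, int] = {}
    for index, name in enumerate(subports):
        # per_port consecutive bits, shifted to this interface's position
        lane_map[name] = ((1 << per_port) - 1) << (index * per_port)
    return lane_map
```

With an 8-lane module and no breakout, the single interface owns mask `0xFF`; with a 4-way breakout, each interface owns two lanes, so a 'host_tx_ready' change on one interface would touch only its own two bits.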
139+
140+
# Proposed Work-Flows
141+
142+
Please refer to the flow/sequence diagrams, which cover the following required use-cases:
143+
- Enabling this feature
144+
- Transceiver initialization
145+
- admin enable configurations
146+
- admin disable configurations
147+
- No transceiver present
148+
149+
# Feature enablement
150+
This feature (optics interface link bring-up sequence) would be enabled on a per-platform basis.
151+
There could be cases where vendor(s)/platform(s) may take time to shift from the existing codebase to the model (workflows) described in this document.
152+
To avoid any breakage and ensure gradual migration of different platforms/vendors to this model, there would be a new field (flag) in xcvrd to enable/disable this feature.
153+
When xcvrd spawns on an LC/board, it would invoke the platform plugin to check with the platform (hwsku) whether this feature is supported yet on the underlying platform (board/LC).
154+
155+
Workflow :
156+
![Enabling 'Interface link bring-up sequence' feature(3)](https://user-images.githubusercontent.com/69485234/152266723-050377ce-d4de-4c67-a405-5acc66474d46.png)
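
The enablement check described in this section could be sketched as follows. The configuration key `deterministic_link_bringup`, the dictionary-based platform data, and both function names are assumptions made for this illustration; the document does not define the actual flag name or the plugin interface.

```python
from typing import Mapping

# Hypothetical key name; the real flag is not specified in this document.
FEATURE_KEY = "deterministic_link_bringup"


def is_feature_enabled(platform_config: Mapping[str, object]) -> bool:
    """Treat the feature as disabled unless the platform explicitly opts in."""
    # Only a literal boolean True enables the feature, so a missing key or a
    # stray string such as "false" keeps the existing behavior.
    return platform_config.get(FEATURE_KEY) is True


def select_workflow(platform_config: Mapping[str, object]) -> str:
    """Pick which xcvrd code path to run on this platform."""
    return "host_tx_ready" if is_feature_enabled(platform_config) else "legacy"
```

Defaulting to the legacy path matches the stated goal of gradual migration: a platform that has not opted in sees no behavior change.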
157+
158+
159+
# Transceiver Initialization
160+
(at platform bootstrap layer)
161+
162+
![LC boot-up sequence - optics INIT (platform bootstrap)](https://user-images.githubusercontent.com/69485234/152261613-e20dcda9-2adc-42aa-a1f1-4b8a47dd32af.png)
163+
164+
# Applying 'interface admin startup' configuration
165+
166+
![LC boot-up sequence - 'admin enable' Config gets applied](https://user-images.githubusercontent.com/69485234/147166867-56f3e82d-1b1c-4b7a-a867-5470ee6050e7.png)
167+
168+
169+
# Applying 'interface admin shutdown' configuration
170+
171+
![LC boot-up sequence - 'admin disable' Config gets applied](https://user-images.githubusercontent.com/69485234/147166884-92c9af48-2d64-4e67-8933-f80531d821b4.png)
172+
173+
# No transceiver present
174+
If a transceiver is not present:
175+
- All the workflows mentioned above will remain the same (or get exercised) until the host_tx_ready field update
176+
- xcvrd will not perform any action on receiving host_tx_ready field update
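
A minimal sketch of this behavior, assuming xcvrd tracks the set of ports that currently have a module inserted. The function name, the `present_ports` argument, and the returned action strings are hypothetical.

```python
from typing import Optional, Set


def handle_host_tx_ready(port: str, value: str,
                         present_ports: Set[str]) -> Optional[str]:
    """Return the optics action for a host_tx_ready update, or None.

    None means "no action", which is the case when no transceiver is present.
    """
    if port not in present_ports:
        return None  # field was still updated upstream; xcvrd just ignores it
    return "tx_enable" if value == "true" else "tx_disable"
```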
177+
178+
179+
# Out of Scope
180+
The following items are not in the scope of this document. They would be taken up separately.
181+
1. xcvrd restart
182+
- If xcvrd restarts, all the DB events will be replayed.
183+
Here, the Datapath init/activate (for CMIS-compliant optical modules) and the tx-disable register set (for SFF-compliant optical modules) will be a no-op if the optics is already in that state.
184+
2. syncd/gbsyncd/swss docker container restart
185+
- Cleanup scenario: check whether the host_tx_ready field in STATE-DB needs to be updated to “False” for any use-case, either in the going-down or the coming-up path
186+
- Discuss further on the possible use-cases
187+
3. The CMIS API feature is not part of this design; this design only uses those APIs. For the CMIS HLD, please refer to:
188+
https://github.com/Azure/SONiC/blob/9d480087243fd1158e785e3c2f4d35b73c6d1317/doc/sfp-cmis/cmis-init.md
189+
4. Error handling of SAI attributes
190+
a) At present, if there is a set attribute failure, orchagent will exit.
191+
Refer to the error handling API: https://github.com/Azure/sonic-swss/blob/master/orchagent/orch.cpp#L885
192+
b) Error handling for the SET_ADMIN_STATE attribute will be added in the future.
193+
c) A probable way to handle the failure is to send an error-handling attribute to the respective container (syncd/gbsyncd), identifying the attribute that failed.
194+
The platform layer knows the error better and it will try to recover.
195+
