Comware Support For RoCE Technical White Paper
CONTENTS
Introduction
Versions
Benefits
Deployment
Build a lossless Ethernet for RoCE
Priority-based Flow Control
PFC PAUSE generation mechanism
Mapping between message priority and queue
PFC extended functions
Explicit congestion notification
Data Center Bridging Exchange Protocol
Enhanced Transmission Selection
Configuration example of lossless Ethernet
Device configuration
Resources
Technical white paper Page 2
INTRODUCTION
RDMA over Converged Ethernet (RoCE) carries the Remote Direct Memory Access (RDMA) protocol over Ethernet. RoCE and InfiniBand (IB)
have the same application layer and transport layer; only the network layer and the link layer are different.
[Figure: protocol stack diagram. Surviving labels: IB network layer, UDP, IP.]
VERSIONS
The RoCE protocol has two versions:
• RoCE v1 carries RDMA directly over Ethernet and can be deployed only in a Layer 2 network. A RoCE v1 packet adds a Layer 2
Ethernet header to the original IB packet and is identified by EtherType 0x8915.
• RoCE v2 carries RDMA over UDP/IP and can be deployed in a Layer 3 network. A RoCE v2 packet adds a UDP header, an IP header,
and a Layer 2 Ethernet header to the original IB packet and is identified by UDP destination port 4791. RoCE v2 traffic can be
hashed on the UDP source port and load-shared across ECMP paths, which improves network utilization.
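The two identifiers above (EtherType 0x8915 and UDP destination port 4791) are enough to tell the versions apart on the wire. The following is a minimal Python sketch of that check, not a production parser. The function name `classify_roce` is hypothetical, and the sketch assumes IPv4 and at most one 802.1Q VLAN tag.

```python
import struct

ROCE_V1_ETHERTYPE = 0x8915   # RoCE v1 identifier (from the text above)
ROCE_V2_UDP_PORT = 4791      # RoCE v2 identifier (from the text above)
ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_VLAN = 0x8100      # 802.1Q tag, which carries the 802.1p priority
IPPROTO_UDP = 17


def classify_roce(frame: bytes) -> str:
    """Return 'RoCE v1', 'RoCE v2', or 'other' for a raw Ethernet frame.

    Assumptions of this sketch: IPv4 only, at most one VLAN tag.
    """
    if len(frame) < 14:
        return "other"
    offset = 12                                   # skip destination and source MAC
    (ethertype,) = struct.unpack_from("!H", frame, offset)
    offset += 2
    if ethertype == ETHERTYPE_VLAN:               # skip the 4-byte 802.1Q tag
        if len(frame) < offset + 4:
            return "other"
        (ethertype,) = struct.unpack_from("!H", frame, offset + 2)
        offset += 4
    if ethertype == ROCE_V1_ETHERTYPE:
        return "RoCE v1"
    if ethertype != ETHERTYPE_IPV4 or len(frame) < offset + 20:
        return "other"
    ihl = (frame[offset] & 0x0F) * 4              # IPv4 header length in bytes
    protocol = frame[offset + 9]
    if protocol != IPPROTO_UDP or ihl < 20 or len(frame) < offset + ihl + 8:
        return "other"
    (dst_port,) = struct.unpack_from("!H", frame, offset + ihl + 2)
    return "RoCE v2" if dst_port == ROCE_V2_UDP_PORT else "other"
```

Only the destination port identifies RoCE v2. The source port is left free to vary, which is what allows ECMP hashing to spread flows across paths.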
BENEFITS
RoCE enables Ethernet-based data transmission to:
• Improve data transmission throughput
• Reduce network latency
• Reduce CPU load
DEPLOYMENT
RoCE can be implemented on a network of Ethernet switches. To deploy RoCE, the servers must have network cards that support RoCE,
and the network side must support lossless Ethernet. This is because the IB loss-recovery mechanism responds to the loss of any
packet with a large number of retransmissions, which severely reduces data transmission performance.
FIGURE 2. Schematic diagram of building a lossless Ethernet network with key features
Figure callouts:
• The local queue buffer reaches the buffer threshold, and PFC PAUSE is generated.
• After receiving the PFC PAUSE, the device buffers the packets in the queue. The cache reaches the threshold, and the device
sends PFC PAUSE to the upstream device.
• Arrow labels: Traffic, PFC PAUSE
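The hop-by-hop back-pressure described in Figure 2 can be sketched as a small simulation. This is an illustrative Python model under stated assumptions, not the switch implementation. The `Node` class, the node names, the threshold of 4 packets, and the 2-packets-per-tick rate are all hypothetical, and the sketch models neither resumption after the pause expires nor per-priority queues.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    threshold: int                      # hypothetical buffer threshold, in packets
    events: List[str]                   # shared, ordered event log
    upstream: Optional["Node"] = None
    depth: int = 0                      # packets currently buffered
    paused: bool = False                # True after a PFC PAUSE was received

    def receive(self, packets: int) -> None:
        """Buffer packets; at the threshold, send PFC PAUSE to the upstream node."""
        self.depth += packets
        if (self.depth >= self.threshold and self.upstream is not None
                and not self.upstream.paused):
            self.upstream.paused = True
            self.events.append(f"{self.name} -> {self.upstream.name}: PFC PAUSE")

    def forward(self, downstream: "Node", packets: int) -> int:
        """Send up to `packets` downstream unless paused; return how many were sent."""
        if self.paused:
            return 0
        sent = min(packets, self.depth)
        self.depth -= sent
        if sent:
            downstream.receive(sent)
        return sent


def run_demo(ticks: int = 10) -> List[str]:
    """Chain sender -> switch1 -> switch2, where switch2 never drains its queue."""
    events: List[str] = []
    sender = Node("sender", threshold=10**9, events=events)
    switch1 = Node("switch1", threshold=4, events=events, upstream=sender)
    switch2 = Node("switch2", threshold=4, events=events, upstream=switch1)
    for _ in range(ticks):
        if not sender.paused:
            switch1.receive(2)          # the sender injects 2 packets per tick
        switch1.forward(switch2, 2)     # switch2's egress is congested
    return events
```

Calling `run_demo()` yields two events in order: `switch2` pauses `switch1` first, and once the buffer on `switch1` fills in turn, `switch1` pauses the sender. No packet is dropped at any point, which is the property RoCE depends on.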
DCBX uses LLDP to exchange configuration information such as ETS, PFC, and application priority between the two ends of a link.
To use DCBX, enable LLDP globally and on the interface, allow the interface to advertise DCBX TLVs, and then configure the device
to advertise Application Protocol (APP), ETS, and PFC parameters through the interface according to application requirements.
Device networking
As shown in Figure 7, RoCE network cards are installed on Server 1, Server 2, and Server 3. Server 1 and Server 2 are connected to Server 3
through Ethernet switches Device A and Device B.
To support RoCE, the entire network must be built as a lossless Ethernet. The specific requirements are as follows:
• Enable PFC on all ports along the packet forwarding path. In this example, packets with 802.1p priority 5 are transmitted losslessly.
• Enable DCBX on the switch ports that connect to servers so that the device and the server network card can negotiate ETS and
PFC parameters.
• Configure ETS on Twenty-FiveGigE1/0/3 of Device A and Twenty-FiveGigE1/0/2 of Device B to guarantee the transmission
bandwidth of packets with 802.1p priority 5.
• Configure ECN on Twenty-FiveGigE1/0/3 of Device A so that the device can notify the sender to adjust the
sending rate when congestion occurs.
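The ECN requirement can be illustrated with a short Python sketch of threshold-based marking on a congested queue. It is a generic WRED-style model and makes no claim about the device's internal algorithm. The function names, thresholds, and probability below are hypothetical. The two ECN bits and their codepoints follow RFC 3168. In RoCE v2 deployments the switch only sets the mark; the receiver reflects it back to the sender, which then lowers its rate.

```python
import random
from typing import Tuple

# RFC 3168 ECN codepoints (low two bits of the IPv4 ToS byte)
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


def mark_probability(queue_len: int, low: int, high: int, max_prob: float) -> float:
    """Marking probability: 0 below `low`, linear up to `max_prob`, 1 at or above `high`."""
    if queue_len < low:
        return 0.0
    if queue_len >= high:
        return 1.0
    return max_prob * (queue_len - low) / (high - low)


def apply_ecn(tos: int, queue_len: int, low: int, high: int,
              max_prob: float, rng: random.Random) -> Tuple[int, bool]:
    """Return (new_tos, dropped) for one packet entering the queue.

    ECN-capable packets are marked CE instead of being dropped;
    packets that are not ECN-capable are dropped when selected.
    """
    if rng.random() >= mark_probability(queue_len, low, high, max_prob):
        return tos, False                 # not selected: forward unchanged
    if tos & 0b11 == NOT_ECT:
        return tos, True                  # sender did not negotiate ECN
    return tos | CE, False                # keep the DSCP bits, set CE
```

For example, with hypothetical thresholds `low=10` and `high=50`, a packet with ECT(0) set that arrives at a queue of 100 packets is always forwarded with the CE codepoint, while the same packet arriving at a queue of 5 is forwarded unchanged.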
Device configuration
Device A
1. Configure Twenty-FiveGigE1/0/1, Twenty-FiveGigE1/0/2, and Twenty-FiveGigE1/0/3 to trust the 802.1p priority of packets;
enable PFC on the interfaces; and enable PFC for 802.1p priority 5.
<DeviceA> system-view
[DeviceA-if-range] quit
3. Enable LLDP on Twenty-FiveGigE1/0/1 and Twenty-FiveGigE1/0/2 and allow the interfaces to advertise DCBX TLVs.
Set the DCBX version on Twenty-FiveGigE1/0/1 and Twenty-FiveGigE1/0/2 to 1.01.
[DeviceA] interface range twenty-fivegige 1/0/1 to twenty-fivegige 1/0/2
[DeviceA-if-range] lldp enable
4. Enable byte-count WRR queuing on interface Twenty-FiveGigE1/0/3, so that weights are expressed as the number of bytes sent in
each polling round, and assign port queue 5 to strict priority scheduling (802.1p priority 5 to local priority 5 is the default mapping).
[DeviceA] interface twenty-fivegige 1/0/3
[DeviceA-wred-table-queue-table5] quit
Device B
1. Configure Twenty-FiveGigE1/0/1 and Twenty-FiveGigE1/0/2 to trust the 802.1p priority of packets; enable PFC on the
interfaces; and enable PFC for 802.1p priority 5.
<DeviceB> system-view
[DeviceB] interface range twenty-fivegige 1/0/1 to twenty-fivegige 1/0/2
[DeviceB-if-range] qos trust dot1p
[DeviceB-if-range] priority-flow-control enable
[DeviceB-if-range] priority-flow-control no-drop dot1p 5
[DeviceB-if-range] quit
2. Enable LLDP globally.
[DeviceB] lldp global enable
3. Enable LLDP on Twenty-FiveGigE1/0/2 and allow the interface to advertise DCBX TLVs. Set the DCBX version on
Twenty-FiveGigE1/0/2 to 1.01.
[DeviceB] interface twenty-fivegige 1/0/2
[DeviceB-Twenty-FiveGigE1/0/2] lldp enable
[DeviceB-Twenty-FiveGigE1/0/2] lldp tlv-enable dot1-tlv dcbx
[DeviceB-Twenty-FiveGigE1/0/2] dcbx version rev101
[DeviceB-Twenty-FiveGigE1/0/2] quit
4. Enable byte-count WRR queuing on interface Twenty-FiveGigE1/0/2, so that weights are expressed as the number of bytes sent in
each polling round, and assign port queue 5 to strict priority scheduling (802.1p priority 5 to local priority 5 is the default mapping).
[DeviceB] interface twenty-fivegige 1/0/2
[DeviceB-Twenty-FiveGigE1/0/2] qos wrr byte-count
[DeviceB-Twenty-FiveGigE1/0/2] qos wrr 5 group sp
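The effect of the step 4 queuing configuration, with queue 5 in the strict priority (SP) group and the remaining queues served by byte-count WRR, can be sketched in Python as follows. This is a generic deficit-style model under stated assumptions, not the switch's scheduler. The function name `schedule`, the byte quanta, and the packet sizes are hypothetical, and no new arrivals are modeled while scheduling runs.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


def schedule(queues: Dict[int, Deque[int]], sp_queues: List[int],
             wrr_quantum: Dict[int, int], rounds: int) -> List[Tuple[int, int]]:
    """Return the transmit order as (queue_id, packet_bytes) tuples.

    queues      -- per-queue FIFO of packet sizes in bytes
    sp_queues   -- queue ids in the strict priority group
    wrr_quantum -- bytes each WRR queue may send per polling round
    """
    sent: List[Tuple[int, int]] = []
    deficit = {q: 0 for q in wrr_quantum}
    for _ in range(rounds):
        # Strict priority: SP queues are drained before any WRR queue is served.
        for q in sorted(sp_queues, reverse=True):
            while queues[q]:
                sent.append((q, queues[q].popleft()))
        # Byte-count WRR: each queue earns its byte quantum once per round.
        for q in sorted(wrr_quantum, reverse=True):
            if not queues[q]:
                deficit[q] = 0            # an idle queue does not save up credit
                continue
            deficit[q] += wrr_quantum[q]
            while queues[q] and queues[q][0] <= deficit[q]:
                size = queues[q].popleft()
                deficit[q] -= size
                sent.append((q, size))
    return sent
```

With two 1000-byte packets in queue 5, two 1500-byte packets in queue 1 (quantum 1500), and four 500-byte packets in queue 0 (quantum 500), two rounds send both queue 5 packets first and then alternate between queue 1 and queue 0 in proportion to their byte quanta.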
Configuration verification
1. Display the number of discarded packets on Device B.
<DeviceB> display packet-drop summary
All interfaces:
Packets dropped due to Fast Filter Processor (FFP): 0
Packets dropped due to STP non-forwarding state: 0
Packets dropped due to insufficient data buffer. Input dropped: 0 Output dropped: 0
Packets of ECN marked: 1622267130
Packets of WRED dropped: 0
2. Display the bandwidth utilization of port Twenty-FiveGigE1/0/2 on Device B.
<DeviceB> display counters rate outbound interface Twenty-FiveGigE 1/0/2
Usage: Bandwidth utilization in percentage
Interface      Usage (%)    Total (pps)    Broadcast (pps)    Multicast (pps)
WGE1/0/2             100        2825427                 --                 --
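As a rough sanity check on this output, the reported utilization and packet rate imply an average number of bytes per packet. The Python sketch below does that arithmetic. It assumes a nominal line rate of 25 Gbit/s for the Twenty-FiveGigE port; whether the counter includes preamble and inter-frame gap is not stated in this document, so the result is approximate. The function name is hypothetical.

```python
LINE_RATE_BPS = 25_000_000_000    # assumption: nominal 25 Gbit/s port speed


def implied_bytes_per_packet(usage_percent: float, packets_per_second: int,
                             line_rate_bps: int = LINE_RATE_BPS) -> float:
    """Average bytes per packet implied by a utilization and packet-rate reading."""
    bytes_per_second = line_rate_bps * usage_percent / 100 / 8
    return bytes_per_second / packets_per_second


# Values taken from the display output above
average = implied_bytes_per_packet(usage_percent=100, packets_per_second=2825427)
print(f"{average:.0f} bytes per packet")
```

A result of roughly 1.1 KB per packet is consistent with a link saturated by large data packets rather than by small control frames.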
Additional information
In data center interconnect (DCI) scenarios, QoS settings can be mapped to the VXLAN EVPN underlay. When IP QoS is used, as in
RoCE v2 deployments, the DSCP value of the payload can be mapped to the underlay network. The underlay network must honor the
settings carried over from the overlay to meet the QoS requirements.
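The DSCP mapping described above amounts to copying six bits from the inner IP header to the outer one at encapsulation time. The Python sketch below shows the bit manipulation only. The function names and the DSCP value 26 are hypothetical, and this is not the device's encapsulation code.

```python
def dscp_of(tos: int) -> int:
    """Extract the 6-bit DSCP from an IPv4 ToS byte (the low 2 bits are ECN)."""
    return (tos >> 2) & 0x3F


def outer_tos_from_inner(inner_tos: int, outer_tos: int = 0) -> int:
    """Copy the inner (payload) DSCP into the outer (underlay) ToS byte.

    The outer ECN bits are left as they were.
    """
    return (dscp_of(inner_tos) << 2) | (outer_tos & 0b11)


# Hypothetical example: payload marked DSCP 26 with ECT(0) set
inner = (26 << 2) | 0b10
outer = outer_tos_from_inner(inner)
```

Here `dscp_of(outer)` equals 26, so underlay devices that trust DSCP place the encapsulated packet in the same class as the original payload.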
By default, on inbound ports with XConnect configured, QoS markings are trusted. The following configuration sample can remark the
incoming traffic when needed.
[Figure: DC1 and DC2 connected across a DCI network (VXLAN). Surviving labels: QoS, Trust, Queuing.]
RESOURCES
HPE FlexFabric 5945 Switch Series Fundamentals Configuration Guide
HPE FlexFabric 5945 Switch Series documentation
LEARN MORE AT
hpe.com/us/en/networking/comware.html
© Copyright 2022 Hewlett Packard Enterprise Development LP.
a50005454ENW