
Basics of the PCIe Protocol

Bhabani Shankar Agrawala

© Mirafra Technologies Pvt. Ltd.

1
Introduction
Agenda
1) Application Areas
2) PCIe Features
3) PCIe Topology
4) PCIe Layers
5) PCIe Transaction

Transaction Layer
1) Generic TLP Header
2) Split Transaction
3) Transaction Types
4) Memory Transaction
5) Memory Read Locked Transaction
6) IO Transaction
7) Configuration Transaction
8) Completion Transaction
9) Message Transaction
10) Virtual Channel
11) Transaction Ordering
12) ECRC
2
13) Completion Timeout Mechanism
Agenda Contd
14) MSI
15) Address Routing
16) ID Routing
17) Enumeration

Data Link Layer


1) Generic DLLP Packet Format
2) ACK/NAK DLLP
3) Flow Control DLLP
4) Vendor Defined DLLP
5) TLP Within DLL
6) Flow control protocol
7) Priorities of the packets in DLL

Physical Layer
1) Features of PCIe PhyLayer
2) Generic Packet format rules
3) Scrambling
4) Order Sets
5) LTSSM states 3
Agenda Contd
6) ASPM States

Power Management
1) PM Introduction
2) Device PM

4
Introduction
First Generation Bus Protocols
1. ISA: Industry Standard Architecture
2. EISA: Extended ISA
3. VESA: Video Electronics Standard Association, VL Bus

Second Generation Bus Protocol


1. AGP: Accelerated Graphics Port
2. PCI: Peripheral Component Interconnect
3. PCI-X: PCI eXtended

Third Generation Bus Protocol


1. PCIe: PCI Express

ISA is the oldest of these, and today's computers still have an ISA bus interface in the form of an ISA slot
(connector) on the main board.

ISA has 8-bit and 16-bit standards, along with a 32-bit version (EISA). ISA has a 20-bit address bus.
All three versions operate at 8 MHz.

5
Introduction (Cont’d)
EISA is rarely used -- mainly for disk controllers or video graphics adapters.
Most new EISA connections use the 32-bit data and 32-bit latched address bus.

VESA (VL bus) is a 33 MHz extension of the ISA bus used for high-speed data transfer applications.
It has a 32-bit address and data bus and is mainly used for video and disk interfaces.
It requires a third connector (the VESA connector) to be added behind the standard 16-bit ISA connector.

PCI Limitations:
--> Maximum frequency is 66MHz, because PCI uses reflected-wave signalling with weaker drivers, which
have slower rise and fall times than incident-wave signalling drivers.
The bus must carry fewer loads to keep signal rise and fall times fast. Taking into account
typical board impedances and minimum signal trace lengths, it is possible to interconnect a maximum of 4-5
66MHz PCI devices.
--> Efficiency is on the order of 50% to 60%.
--> PCI does not indicate transfer size, so buffer management within the master and target is inefficient.
--> Delayed transactions are not handled efficiently; the master does not know when to retry.
--> All PCI master accesses to system memory result in a snoop access to the system CPU cache.
--> PCI bus cycles observe strict ordering rules.
--> Interrupt handling is inefficient because multiple devices share a PCI interrupt signal.
--> Error handling is poor.
The processor's NMI interrupt input pin is asserted when a PCI parity or system error is detected; ultimately
the system bus shuts down when an error is detected.
--> Supports a data-bus width of up to 64 bits. 6
Introduction (Cont’d)
PCI-X Features and Limitations:
--> PCI-X supports 8-10 loads or 4 connectors at 66MHz,
3-4 loads or 1-2 connectors at 133MHz.
--> Following the first data phase, PCI-X does not allow wait states during subsequent data phases.
--> Transfer size is specified in the attribute phase, which allows more efficient device buffer management.
--> Requester and completer support the split transaction model (PCI supports only retry).
--> Increased transfer efficiency, on the order of 85%.
--> Supports MSI, resulting in reduced interrupt servicing latency and elimination of interrupt signal pins.
--> Supports Relaxed Ordering.
--> Supports No Snoop accesses.
--> PCI-X supports 66MHz to 533MHz.
--> PCI-X peak bandwidth is 4256 MB per second for a 64-bit 533MHz bus, roughly four times that of PCI-X
133MHz.

7
Introduction (Cont’d)
 High-speed serial interconnect
 Envisioned to replace conventional PCI
 Bidirectional data flow
 Data flows in both the TX and RX directions simultaneously
 Gen1: 2.5 Gb/s per lane per direction
 Gen2: 5.0 Gb/s per lane per direction
 Gen3: 8.0 Gb/s per lane per direction
 Backward compatible with existing OSes, without any changes to current drivers.

 Application Areas:
Mobile
Desktop
Server
Work station
Embedded computing
Communication platforms

8
PCIe Features
 Software backward compatible
 Extra configuration space defined for PCIe specific features.
-- Same configuration space definition as conventional PCI
 Supports chip-to-chip and board-to-board interconnection via cards & connectors
 Serial interconnect reduces pin count, package size & cost.
 Bandwidth is scalable: x1, x2, x4, x8, x12, x16 and x32
 QOS (Quality of Service)
-- Virtual Channel
-- Differentiated traffic flow over the same physical pipe based on Virtual channel concepts
 Hot plug: Software added hot plug capability
 Power management
-- Supports conventional software-based power management
-- New Autonomous Link-level power management
 Error handling: Multiple levels of error handling
 Flow control
-- Credit-based flow control helps the transmitter estimate the receiver's buffering capacity, so packets don't get
dropped due to lack of space

9
PCIe Features Contd
 Data Integrity: CRCs at many layers
-- Link CRC (LCRC),
-- End-to-end CRC (ECRC)
-- Detects and contains data corruption as early as possible
-- Replay Feature: replays packets if there is a CRC error
 Interrupt signalling: legacy pin-based interrupts carried as virtual-wire (INTx) messages
 MSI Interrupts

10
PCIe Topology
[Figure: Example PCIe Topology — the CPU and memory attach to the Root Complex; Root Ports connect downstream to an Endpoint, to a PCIe Switch (one upstream port and several downstream ports) fanning out to Endpoints and a Legacy Endpoint, and to a PCIe-to-PCI bridge leading to a PCI/PCI-X bus.]

11
PCIe Topology Contd
Link:
The logical connection between two devices; data flows over a single lane or multiple lanes.
A PCIe packet is striped across the lanes of the link.

Lane:
Two differential pairs: one for transmit (Tx+, Tx-) and one for receive (Rx+, Rx-)
A lane belongs to a link
The number of lanes in a link can be 1, 2, 4, 8, 12, 16, 32.

Root complex:
In the PCIe fabric, it’s closest to the CPU
Denotes the device connected to CPU & Memory Subsystem.
Supports one or more PCIe ports.
It generates transaction requests on behalf of the CPU:
memory and IO requests, along with locked transaction requests.
It implements central resources such as:
hot plug controller,
power management controller,
interrupt controller,
12
error detection & reporting logic.
PCIe Topology Contd
Endpoint:
It is the farthest from the CPU in the PCIe fabric
It can initiate transaction as a requester and respond to a transaction as a completer.
It is initialized with a device ID consisting of a bus no., device no. and function no. Device number
is always 0 on a bus.
Two types of endpoints exist:
Legacy Endpoints – support IO transactions and locked transactions (as completer only). They do
not support 64-bit memory addressing.
PCIe Endpoints – do not support IO or locked transactions. They support 64-bit memory
addressing.

Switches:
Switches are used to connect multiple devices.

13
PCI Express (PCIe) Layers
• PCIe is Layered Architecture
• Transaction layer
• Data Link Layer
• Phy Logical Layer
MAC Layer (Media Access Control)
PCS Layer (Physical Coding Sublayer)
PMA Layer (Physical Media Attachment)

14
PCIe Transaction Layer
 The uppermost layer in the PCIe protocol stack
 Primary responsibility is the assembly and disassembly of Transaction Layer Packets (TLPs).
 TLPs are used to communicate transactions, such as read and write, as well as certain types of
events.
 Transaction Layer is responsible for managing credit-based flow control for TLPs.
 Every request packet requiring a response packet is implemented as a split transaction.
-- Each packet has a unique identifier that enables response packets to be directed to the correct
originator.
 The Packets may also have attributes such as No Snoop and Relaxed Ordering.

 The Transaction Layer supports four address spaces


-- It includes the three PCI address spaces (memory, I/O, and configuration) and
-- also adds a Message Space. Message Space is used to support all prior sideband
signals, such as interrupts, power-management requests, and so on, as in-band Message
transactions.

15
PCIe Data Link Layer
• The primary responsibilities of the Data Link Layer include Link management and data integrity,
including error detection and error correction.
• The transmission side of the Data Link Layer accepts TLPs assembled by the Transaction Layer,
calculates and applies a data protection code and TLP sequence number, and submits them to
Physical Layer for transmission across the Link.
• The receiving DLL is responsible for checking the integrity of received TLPs and for submitting them
to the TL for further processing.
• On detection of TLP error(s), DL Layer is responsible for requesting retransmission of TLPs until
information is correctly received, or the Link is determined to have failed.
• DLL also generates and consumes packets that are used for Link management functions called
Data Link Layer Packet (DLLP).

16
PCIe Phy Layer
• The interface between the digital controller and the PHY is called the PIPE (Physical interface for
PCI Express)
• Interface initialization, maintenance control, and status tracking
• Reset/Hot-Plug control/status
• Interconnect power management
• Width and Lane mapping negotiation
• Polarity reversal
• Symbol and special ordered set generation:
• 8b/10b encoding/decoding
• Embedded clock tuning and alignment
• Symbol transmission and alignment
• Transmission circuits
• Reception circuits
• Elastic buffer at receiving side
• Multi-Lane de-skew (for widths > x1) at receiving side

17
PCIe Transaction
Terminology:
Requester: a device that originates a transaction.
Completer: the targeted device. It can return up to 4KB of data per Completion packet.

Port: the interface between a PCIe component and the Link.

Transaction: a series of packet transmissions that completes the information transfer between requester and
completer.

Groups of transactions:
Memory
I/O
Configuration
Message

Types of transactions:
Posted
Non-posted
Completion

18
PCIe Transaction Contd

19
TRANSACTION LAYER

20
Generic TLP Header

21
Generic TLP Header Contd
Fmt[0]: 0: 3DW Header 1: 4DW Header
Fmt[1]: 0: Transaction does not contain data
1: Transaction contains data
Type: 5'b0_0000: Memory read/write Request
5'b0_0001: Memory Read Locked Request
5'b0_0010: IO Read/Write Request
5'b0_0100: Config Type0 read/write request
5'b0_0101: Config Type1 read/write request
5'b0_1010: Completion TLP with/with-out data
5'b0_1011: Completion-Locked with/with-out data
5'b1_0RRR: Message TLP (RRR represent routing subfield)

22
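As a rough illustration (not part of the original slides), the sketch below decodes the Fmt/Type encodings listed above from the first TLP header byte; the bit positions assumed here are Fmt in bits [6:5] of byte 0 and Type in bits [4:0], matching the generic TLP header layout.

```python
# Minimal sketch: decoding the Fmt/Type encodings listed above from header byte 0.
FMT_4DW  = 0b01   # Fmt[0]: header size (per the slide above)
FMT_DATA = 0b10   # Fmt[1]: data present

TYPE_NAMES = {
    0b0_0000: "Memory Read/Write Request",
    0b0_0001: "Memory Read Locked Request",
    0b0_0010: "IO Read/Write Request",
    0b0_0100: "Config Type0 Read/Write Request",
    0b0_0101: "Config Type1 Read/Write Request",
    0b0_1010: "Completion",
    0b0_1011: "Completion-Locked",
}

def decode_fmt_type(byte0: int) -> str:
    """byte0 is the first header byte: Fmt in bits [6:5], Type in bits [4:0]."""
    fmt = (byte0 >> 5) & 0x3
    typ = byte0 & 0x1F
    header = "4DW" if fmt & FMT_4DW else "3DW"
    data = "with data" if fmt & FMT_DATA else "without data"
    if (typ >> 3) == 0b10:                      # 5'b1_0RRR
        name = f"Message TLP (routing {typ & 0x7:03b})"
    else:
        name = TYPE_NAMES.get(typ, "Reserved/unknown")
    return f"{name}, {header} header, {data}"

print(decode_fmt_type(0b0100_0000))  # MemWr32: Fmt=10 (3DW, with data), Type=00000
```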
Generic TLP Header Contd

23
Generic TLP Header Contd
Error Poison (EP):
If set, the data accompanying this TLP should be considered invalid, although the transaction is
being allowed to complete normally.
Error Forwarding is only used for Read Completion Data or Write Data, never when the error is in the
header: header errors cannot be forwarded in general, since the true destination cannot be positively known,
and forwarding could therefore cause direct or side effects such as data corruption, system failures, etc.
Error Forwarding is used for controlled propagation of errors through the system, system
diagnostics, etc.
Receipt of a Poisoned TLP is a reported error associated with the Receiving Function.
A Poisoned Configuration Write Request must be discarded by the Completer, and a Completion
with a Completion Status of UR is returned.
A Poisoned I/O or Memory Write Request, or a Message with data (except for vendor-defined
Messages), that addresses a control register or control structure in the Completer must be handled as
an Unsupported Request (UR) by the Completer.
A Switch must route a Poisoned TLP with data in the same way it would route the same Request if it
were not poisoned, unless the Request addresses the Switch itself, in which case the Switch is the
Completer for the Request and must follow the above rule.

24
Generic TLP Header Contd
Here are some examples of cases where Error Forwarding might be used:
 A read from main memory encounters an uncorrectable error.
 A data integrity error occurs on an internal data buffer or cache.

Attr[1:0]: The Attributes field is used to provide additional information that allows modification of the
default handling of Transactions. These modifications apply to different aspects of handling the
Transactions within the system
-- Attr[0]: NS: No Snoop
1) It may be used when accessing system memory.
2) PCI-X bus masters can use the NS bit to indicate whether the region of memory being
accessed is cacheable (NS=0) or not (NS=1).
3) If Transactions with NS bit set, then the host bridge does not snoop the processor cache.
The result is improved system performance during access to non-cacheable memory.
clear: hardware enforced cache coherency expected (default)
set: hardware enforced cache coherency not expected
-- Attr[1]: RO: Relaxed Ordering
 Transactions with the RO bit set can complete on the bus in any order with respect to other
transactions that are pending completion.
 Transactions with the RO bit set are not subject to the PCI strong-ordering model.
clear: Default ordering model (PCI strong ordering)
set: Relaxed ordering model (PCI-X relaxed ordering). 25
Generic TLP Header Contd
Length[9:0]:
-- TLP data payload transfer size in DW.
10'h1: 1DW
10'h2: 2DW
10'h3FF: 1023DW
10'h0: 1024DW
First DW BE[3:0]:
-- These four high-true bits map one-to-one to the bytes with-in the first double word of the payload.
Last DW BE[3:0]:
-- These four high-true bits map one-to-one to the bytes with-in the last double word of the payload.
-- If the Length field is 1DW then:
First DW BE fields can be any value and Last DW BE field must be 0.
-- If the Length field is greater than 1DW then:
First DW and last DW BE fields must not be 0.
-- If the Length field is 2DW and address is Quad-Word aligned:
Non-contiguous BE are permitted in both Byte Enables fields.
-- If the Length field is more than 2DW, or the Length is 2DW and the address is not Quad-Word aligned:
The byte enables must be contiguous for the data between the first and last DW of the Request.

26
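A small hedged sketch (illustrative, not from the slides) of the Length[9:0] encoding above, where the value 10'h0 denotes the maximum of 1024 DW:

```python
# Minimal sketch: the Length[9:0] encoding, where 0 means 1024 DW.
def encode_length(dw_count: int) -> int:
    """Map a payload size in DW (1..1024) to the 10-bit Length field."""
    assert 1 <= dw_count <= 1024
    return dw_count & 0x3FF          # 1024 wraps to 0

def decode_length(length_field: int) -> int:
    """Map the 10-bit Length field back to a payload size in DW."""
    return length_field if length_field != 0 else 1024

assert decode_length(encode_length(1024)) == 1024
assert encode_length(1) == 0x001 and decode_length(0x3FF) == 1023
```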
Generic TLP Header Contd
-- A Write Request with a length of 1 DW with no bytes enabled is permitted, and has no effect at
the Completer.
-- If a Read Request of 1 DW specifies that no bytes are enabled to be read the corresponding
Completion must specify a Length of 1 DW, and include a data payload of 1 DW ( The contents of the
data payload within the Completion packet is unspecified and may be any value).

27
Request Types
Transaction | Types | Posted / Non-posted | Routing | Basic Usage
Memory | Read32, Read64, Write32, Write64 | Reads: Non-posted; Writes: Posted | Address based | Transfer data to or from a memory-mapped location
IO | Read32, Write32 | Non-posted | Address based | Transfer data to or from an IO-mapped location
Configuration | Read (Type0), Read (Type1), Write (Type0), Write (Type1) | Non-posted | ID based | Device function configuration or setup
Message | Baseline, including vendor defined | Posted | Implicit | From event signalling mechanism to general purpose messaging
28
Packet Generic Header Details
Packet Type Fmt Type TC ATTR AT Length
MemRD32 00 00000 XXX XX XX X
MemRD64 01 00000 XXX XX XX X
MemWR32 10 00000 XXX XX XX X
MemWR64 11 00000 XXX XX XX X
IORD 00 00010 000 00 00 1
IOWR 10 00010 000 00 00 1
CfgRd0 00 00100 000 00 00 1
CfgWr0 10 00100 000 00 00 1
CfgRd1 00 00101 000 00 00 1
CfgWr1 10 00101 000 00 00 1
Completion 00 01010 RSVD
Cpl with data 10 01010

29
Split Transaction
 PCI: If the completer is unable to return the requested data immediately, it signals a retry
transaction.
 PCI-X: If the completer is unable to return the requested data immediately, the completer
memorizes the transaction (address, transaction type, byte count, requester ID) and signals a split
response.
-- This prompts the requester to end the bus cycle, and the bus goes idle.
-- The bus is now available for other transactions, resulting in more efficient bus utilization.
-- The requester simply waits for the completer to supply the requested data at a later time.
-- Once the completer has gathered the requested data, it arbitrates for and obtains bus
ownership and initiates a split completion bus cycle during which it returns the requested data. The
requester claims the split completion bus cycle and accepts the data from the completer.
 Split transactions increase transfer efficiency to about 85% for PCI-X, compared to 50%-60% with the PCI
protocol.

30
Transaction Types
Posted Transaction:
The write request including data is sent and the transaction is over from the requester's perspective

as soon as the request is sent out of the egress port; responsibility for delivery is now the problem of
the next device. No completion is sent or expected.
Advantage: higher performance

Disadvantage: failure of the transaction must be handled by a higher-layer protocol.

Non-Posted Transaction:
 Completion is expected.

 Memory read transactions follow the split transaction protocol.

Memory write and message transactions are considered posted transactions, while IO and
configuration transactions are non-posted, because these transactions may change device behavior
and so a completion is always sent to report the status of an IO or configuration write transaction.

31
Memory Transaction
 Memory transfers are never permitted to cross a 4KB boundary.
 Either 32 bits or 64 bits addressing may be used. The 3DW header format supports 32 bit address
and 4DW header format supports 64bit address.
 Memory Read Request length must not exceed the value specified in the max_read_request_size field in the
PCIe capability structure.
 Memory write request length must not exceed the value specified in the max_payload_size in the
PCIe capability structure.

Requester-ID[15:0]:
{Bus_Number[7:0], Device_number[4:0], Function_Number[2:0]}
-- Identifies the requester so a completion may be returned.

Tag[7:0]:
-- It is generated by each Requester, and it must be unique for all outstanding Requests that
require a Completion.
-- For Requests that do not require a Completion (Posted Requests), the value in the Tag[7:0] field
is undefined and may contain any value.
-- Default: only bits [4:0] are used (32 outstanding transactions at a time).
-- If the Extended Tag bit in the PCIe Device Control register is set, all 8 bits may be used. 32
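Two of the rules on this slide lend themselves to a short illustrative sketch (not from the slides): packing the Requester ID as {Bus, Device, Function}, and checking that a memory request does not cross a 4 KB boundary. Function names are made up for illustration.

```python
# Minimal sketch: Requester ID packing and the 4 KB boundary rule described above.
def requester_id(bus: int, device: int, function: int) -> int:
    """{Bus_Number[7:0], Device_Number[4:0], Function_Number[2:0]}."""
    return ((bus & 0xFF) << 8) | ((device & 0x1F) << 3) | (function & 0x7)

def crosses_4k(address: int, length_bytes: int) -> bool:
    """True if [address, address+length) crosses a 4 KB boundary (not allowed)."""
    return (address // 4096) != ((address + length_bytes - 1) // 4096)

assert requester_id(bus=1, device=0, function=0) == 0x0100
assert not crosses_4k(0x1000, 4096)      # exactly fills one 4 KB region
assert crosses_4k(0x1FFC, 8)             # 8-byte access straddling 0x2000
```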
Memory Request Transaction

33
Memory Read Locked
 Only the RC is allowed to initiate locked requests on PCIe.
 PCIe EPs do not support lock and must treat an MRdLk Request as an Unsupported Request.
 Legacy EPs are permitted to support locked accesses, although their use is discouraged.
 Locked transaction sequences are generated by the Host CPU(s) as one or more reads followed by
a number of writes to the same location(s). When a lock is established, all other traffic is blocked
from using the path between the RC and the locked Legacy EP or Bridge.
 All writes for the locked sequence use MWr Requests.
 Locked Requests which are completed with a status other than Successful Completion do not
establish lock.
 If any read associated with a locked sequence is completed unsuccessfully, the Requester must
assume that the atomicity of the lock is no longer assured, and that the path between the Requester
and Completer is no longer locked.
 Regardless of the status of any of the Completions associated with a locked sequence, all locked
sequences and attempted locked sequences must be terminated by the transmission of an Unlock
Message.
 Only one locked transaction sequence attempt may be in progress at a given time within a single
hierarchy domain.

34
Memory Read Locked Contd
 The Unlock Message is sent from the Root Complex down the locked transaction path to the
Completer, and may be broadcast from the Root Complex to all Endpoints and Bridges (Any device
which is not involved in the locked sequence must ignore this Message).
 Locked accesses are limited to TC0, which is always mapped to VC0.
 When a Switch propagates a MRdLk Request from the Ingress Port (closest to the RC) to the Egress
Port, it must block all Requests which map to the default Virtual Channel (VC0) from being
propagated to the Egress Port.
 When the CplDLk for the first MRdLk Request is returned, if the Completion indicates a Successful
Completion status, the Switch must block all Requests from all other Ports from being propagated to
either of the Ports involved in the locked access, except for Requests which map to non-VC0 on the
Egress Port.
 The two Ports involved in the locked sequence must remain blocked until the Switch receives the
Unlock Message.

Message | Message code | Routing

Unlock | 0000_0000 | 011

35
IO Transaction

36
Configuration Transaction
 Register number[5:0]
 Extended Register Number[3:0]
 The extended register number is used in conjunction with the register number to provide the full 10
bits of DW offset needed for the 1024-DW PCIe configuration space.

37
Completion TLP
 Completions are returned for each non-posted TLP.
 Completions route by ID, and use a 3 DW header.
 The routing ID fields correspond directly to the Requester ID supplied with the corresponding
Request. Thus for Completions these fields will be referred to collectively as the Requester ID
instead of the distinct fields used generically for ID routing.
 Completion headers must supply the same values for the Requester ID, Tag, Attribute, and Traffic
Class as were supplied in the header of the corresponding Request.
 A Completion including data must specify the actual amount of data returned in that Completion, and
must include the amount of data specified.

38
Completion TLP Contd
The amount of data specified in the completion must not be more than Max_payload_size of the
PCIe capability structure.

BCM-Byte count modified


Must not be set by PCIe Completers; it may only be set by PCI-X completers.
The BCM bit will never be set in subsequent packets of a Read Completion.

Byte Count[11:0]-
Indicates the remaining number of bytes required to complete the Request, including the number
of bytes returned with this Completion, except when the BCM field is 1b.
If a Memory Read Request is completed using multiple Completions, the Byte Count value for
each successive Completion is the value indicated by the preceding Completion minus the
number of bytes returned with the preceding Completion.
When the BCM bit is set, the Byte Count field reports the size of just the first packet
instead of the entire remaining byte count.
For Memory Read Completions the Byte Count field must be from 1 to 12'hFFF; for all other types
of Completions, the Byte Count field must be 4.
39
Completion TLP Contd
Lower Address Field:
For a Memory Read Request Completion it indicates the lower bits of the byte address of the first
enabled byte of data returned with the Completion.
For the first (or only) Completion, the Completer can generate this field from the least significant 5
bits of the address of the Request concatenated with 2 bits of byte-level address.
For any subsequent Completions, the Lower Address field will always be zero except for
Completions generated by a Root Complex with an RCB value of 64 bytes. In this case the least
significant 6 bits of the Lower Address field will always be zero and the most significant bit of the Lower
Address field will toggle according to the alignment of the 64-byte data Payload.

Completion status:
000: Successful Completion(SC)
001: Unsupported Request (UR)
010: Configuration Request Retry Status (CRS)
100: Completer Abort (CA)
Others: Reserved
40
Completion TLP Contd
Completer Abort:
The completer is offline due to an error much like target abort in PCI.
If the Request violates the programming model of the device Function, the Function may optionally
treat the Request as a Completer Abort and if the Request requires Completion, a Completion Status of
CA is returned.
Examples include unaligned or wrong-size access to a register block and unsupported size of
request to a Memory Space.

CRS:
It indicates the target was temporarily offline and the attempt should be retried.
Ex: initialization delay after reset

UR:
It indicates that the original request failed at the target because it targeted an unsupported address,
used an unsupported request type, etc.
Examples:
If the Request Type is not supported (by design or because of configuration settings) by the device,
the Request is an Unsupported Request and if the Request requires Completion, a Completion Status
of UR is returned. 41
Completion TLP Contd
Examples:
If the Request is a Message, and the Message Code specifies a value that is undefined, or that
corresponds to a Message not supported by the device Function, (other than Vendor_Defined Type 1
which is not treated as an error), the Request is an Unsupported Request
If the Request arrives between the time an FLR(Function Level Reset) has been initiated and the
completion of the FLR by the targeted Function, the Request is permitted to be silently discarded
(following update of flow control credits) without logging or signalling it as an error. It is recommended
that the Request be handled as an Unsupported Request (UR).
Completions with a Reserved Completion Status value are treated as if the Completion Status was
Unsupported Request.

SC:
It indicates the original request completed properly at the target.
The Completion Status for a Completion corresponds only to the status associated with the data
returned with that Completion
A Completion with status other than Successful Completion terminates the Completions for a single
Read Request

42
Completion TLP Contd
Individual Completions for Memory Read Requests may provide less than the full amount of data
Requested so long as all Completions for a given Request when combined return exactly the amount of
data Requested in the Read Request.
Completions for different Requests cannot be combined.
I/O and Configuration Reads must be completed with exactly one Completion.
Completions must not include more data than permitted by the Max_Payload_Size parameter.

RCB: Read completion boundary


Determines the naturally aligned address boundaries on which a Read Request may be serviced with
multiple Completions.
Completions for Requests which do not cross the naturally aligned address boundaries at integer
multiples of RCB bytes must include all data specified in the Request.
Requests which do cross the address boundaries at integer multiples of RCB bytes may be
completed using more than one Completion, and in this case the data must be fragmented at an address
boundary between the start and end of the Request at an integer multiple of RCB bytes.
All Completions between, but not including, the first and final Completions must be an integer multiple
of RCB bytes in length.

43
Completion TLP Contd
Example:
MemRead Request with Address of 1 0020h and Length of 100h bytes could be completed by a
RC with an RCB value of 64 bytes in one of the following combinations of Completions (bytes):
 256 –or–
 32, 224 –or–
 32, 64, 160 –or–
 32, 64, 64, 96 –or–
 32, 64, 64, 64, 32 –or–
 32, 64, 128, 32 –or–
 32, 128, 96 –or–
 32, 128, 64, 32 –or–
 96, 160 –or–
 96, 128, 32 –or–
 96, 64, 96 –or–
 96, 64, 64, 32 –or–
 160, 96 –or–
 160, 64, 32 –or–
 224, 32
44
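The hedged sketch below reproduces one legal choice from the list above by breaking the read at every RCB-aligned boundary; the function name and structure are illustrative only.

```python
# Minimal sketch: splitting a memory read into Completions at every RCB boundary,
# one legal choice among the combinations listed above.
def split_on_rcb(address: int, length_bytes: int, rcb: int = 64):
    """Return Completion sizes when the Completer breaks the data at each RCB boundary."""
    sizes, cur, end = [], address, address + length_bytes
    while cur < end:
        nxt = min((cur // rcb + 1) * rcb, end)   # next RCB boundary or end of request
        sizes.append(nxt - cur)
        cur = nxt
    return sizes

# The example above: address 1_0020h, length 100h, RCB = 64 bytes.
print(split_on_rcb(0x10020, 0x100))   # -> [32, 64, 64, 64, 32]
```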
Completion TLP Contd
Unexpected completion:
A Completion whose Transaction ID does not match any of the outstanding Requests issued by that
device is considered a UC.
If a received Completion is not malformed and matches the Transaction ID of an outstanding
Request, but in some other way does not match the corresponding Request, it is permitted for the
receiver to handle the Completion as a UC.
The agent receiving an Unexpected Completion must discard the Completion.

45
Message Transaction
Message | Fmt | Type | TC | ATTR | Length

INTx | 00 | 10RRR | 000 | RSVD | RSVD
PM | 00 | 10RRR | 000 | RSVD | RSVD
Error Signalling | 00 | 10RRR | 000 | RSVD | RSVD
Locked Transaction | 00 | 10RRR | 000 | RSVD | RSVD
Slot power limit | 10 | 10RRR | 000 | RSVD | 1
Vendor defined | x1 | 10[000-100] | XXX | XXXX | XXXX

46
Message Transaction Contd
Implicit Routing: No ID or address specifies the destination; rather, the destination is implied by
the routing subfield (Type[2:0]).

R[2:0] | Description of Routing Type | Bytes 8 through 15

000 | Routed to Root Complex (RC) | RSVD
001 | Routed by address | Address
010 | Routed by ID | ID
011 | Broadcast from RC | RSVD
100 | Local - Terminate at Receiver | RSVD
101 | Gathered and routed to RC | RSVD
110-111 | Reserved - Terminate at Receiver | RSVD
47
Interrupt Message
 The Assert_INTx/Deassert_INTx Message pairs constitute four “virtual wires” for each of the legacy
PCI interrupts designated A, B, C, and D.
 The components at both ends of each Link must track the logical state of the four virtual wires using
the Assert/Deassert Messages to represent the active and inactive transitions (respectively) of each
corresponding virtual wire.
 When the local logical state of an INTx virtual wire changes at an Upstream Port, the Port must
communicate this change in state to the Downstream Port on the other side of the same Link using
the appropriate Assert_INTx or Deassert_INTx Message.
 If a Downstream Port goes to DL_Down status, the INTx virtual wires associated with that Port must
be deasserted, and the Upstream Port virtual wire state updated accordingly. If this results in de-
assertion of any Upstream INTx virtual wires, the appropriate Deassert_INTx Message(s) must be
sent by the Upstream Port.
 Virtual and actual PCI to PCI Bridges must map the virtual wires tracked on the secondary side of
the Bridge according to the Device Number of the device on the secondary side of the Bridge.
 Switches must track the state of the four virtual wires independently for each Downstream Port, and
present a “collapsed” set of virtual wires on the Upstream Port.
 The RC must track the state of the four INTx virtual wires independently for each of its Downstream
Ports, and map these virtual signals to system interrupt resources.

48
Interrupt Message Contd
 If a Downstream Port of the Root Complex goes to DL_Down status, the INTx virtual wires associated
with that Port must be de-asserted, and any associated system interrupt resource request(s) must be
discarded.

INTx Message Message code Routing[2:0]


Assert_INTA 0010_0000 100
Assert_INTB 0010_0001 100
Assert_INTC 0010_0010 100
Assert_INTD 0010_0011 100
Deassert_INTA 0010_0100 100
Deassert_INTB 0010_0101 100
Deassert_INTC 0010_0110 100
Deassert_INTD 0010_0111 100

49
Power Management Message
PM Message | Message code | Routing | Purpose

PM_Active_State_NAK | 0001_0100 | 100 | ASPM L1 rejected by the upstream component
PM_PME | 0001_1000 | 000 | Activate the link from a software-driven low-power state
PME_Turn_Off | 0001_1001 | 011 | Sent by the RC to request removal of main power
PME_To_Ack | 0001_1011 | 101 | Sent by the downstream component when it is ready for removal of main power

50
Error Message
 Error messages are sent upstream by enabled devices that detect correctable, non-fatal uncorrectable,
or fatal uncorrectable errors.
 The device detecting the error is identified by the Requester ID field in the message header.
 The Root Complex converts error messages into system-specific events.

Error message | Message code | Routing

ERROR_CORRECTABLE | 0011_0000 | 000
ERROR_NONFATAL | 0011_0001 | 000
ERROR_FATAL | 0011_0011 | 000

51
Slot Power Limit Message
 The message is sent from a Downstream Port of a switch or RC to the Upstream Port of the switch or device
attached to it.
 The message is sent automatically anytime the link transitions to DL_Up status or if a configuration
write to the slot capabilities register occur when the data link layer reports DL_Up status.
 If a card in a slot consumes less power than the power limit specified for the card/form factor, it may
ignore the message.

Message | Message code | Routing

Set_Slot_Power_Limit | 0101_0000 | 100

52
VIRTUAL CHANNEL
 Provides support for carrying, throughout the fabric, traffic that is differentiated using TC labels.
 Traffic is associated with VCs by mapping packets with particular TC labels to their corresponding
VCs.
 To allow performance/cost trade-offs, PCI Express provides the capability of mapping multiple TCs
onto a single VC.
 Support for TCs and VCs beyond default TC0/VC0 pair is optional. The association of TC0 with
VC0 is fixed, i.e., “hardwired,” and must be supported by all components.
 Conceptually, traffic that flows through VCs is multiplexed onto a common physical Link resource
on the Transmit side and de-multiplexed into separate VC paths on the Receive side.
 VC ID assignment must be unique per Port – The same VC ID cannot be assigned to different VC
hardware resources within the same Port.
 VC ID assignment must be the same (matching in terms of the number of VCs and their IDs) for
the two Ports on both sides of a Link.
 VC ID 0 is assigned and fixed to the default VC.

53
VIRTUAL CHANNEL Contd
 TC to VC mapping:
1) Every Traffic Class that is supported must be mapped to one of the Virtual Channels. The mapping
of TC0 to VC0 is fixed.
2) One TC must not be mapped to multiple VCs in any Port or Endpoint Function.
3) TC/VC mapping must be identical for Ports on both sides of a Link.
 TC and VC rules:
1) All devices must support the general purpose I/O Traffic Class, i.e., TC0 and must implement the
default VC0.
2) Each Virtual Channel (VC) has independent Flow Control.
3) There are no ordering relationships required between different TCs.
4) There are no ordering relationships required between different VCs.
5) A Switch’s peer-to-peer capability applies to all Virtual Channels supported by the Switch.
6) Transactions with a TC that is not mapped to any enabled VC in an Ingress Port are treated as
Malformed TLPs by the receiving device.
7) For Switches, transactions with a TC that is not mapped to any of the enabled VCs in the target
Egress Port are treated as Malformed TLPs.
8) Switches must support independent TC/VC mapping configuration for each Port.
9) A RC must support independent TC/VC mapping configuration for each RCRB, the associated Root
Ports, and any Root Complex Integrated Endpoints. 54
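The sketch below illustrates the TC-to-VC mapping rules above; the mapping table itself is an arbitrary example port configuration, not a spec requirement.

```python
# Minimal sketch of the TC-to-VC mapping rules: every supported TC maps to exactly one
# enabled VC, TC0 is hardwired to VC0, and a TLP whose TC has no mapping is Malformed.
TC_TO_VC = {0: 0, 1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3, 7: 3}   # example configuration

def vc_for_tc(tc: int, enabled_vcs={0, 1, 2, 3}) -> int:
    if tc == 0:
        return 0                              # TC0/VC0 association is fixed
    vc = TC_TO_VC.get(tc)
    if vc is None or vc not in enabled_vcs:
        raise ValueError("Malformed TLP: TC not mapped to an enabled VC")
    return vc

print(vc_for_tc(5))   # -> 2 with the example mapping above
```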
Transaction Ordering
 Why do we need transaction ordering?
 Ensuring that the completion of transactions is deterministic and in the sequence intended by the
programmer.
 Avoiding deadlock conditions.
 Maintaining compatibility with ordering rules already used in legacy devices.
 Maximizing performance and throughput by minimizing read latencies and maximizing read/write
ordering.

 Rules of transaction ordering:


 The No Snoop bit does not affect the required ordering behavior.

 The ordering rules apply within a single Traffic Class (TC). There is no ordering requirement
among transactions with different TC labels. This also implies that there is no ordering required
between traffic that flows through different Virtual Channels, since transactions with the same TC
label are not allowed to be mapped to multiple VCs on any PCIe Link.
 A Posted Request with the RO (Relaxed Ordering) Attribute bit clear (0b) must not pass any other
Posted Request.
 A Posted Request with the RO Attribute bit set (1b) is permitted to pass any other Posted Request.
 A Posted Request must be allowed to pass Non-Posted Requests to avoid deadlocks.

55
Transaction Ordering Contd
 Posted Requests may be allowed to pass Completions or be blocked by Completions.
 A Non-Posted (NP) Request cannot pass a Posted Request.
 NP Requests are permitted to be blocked by or to pass other NP Requests.
 NP Requests are permitted to be blocked by or to pass Completions.
 If the Relaxed Ordering attribute bit is not set, then a Read Completion cannot pass a previously
enqueued Posted Request.
 If the Relaxed Ordering attribute bit is set, then a Read Completion is permitted to pass a previously
enqueued Posted Request.
 Completions must be allowed to pass Non-Posted Requests to avoid deadlocks.
 Read Completions associated with different Read Requests are allowed to be blocked by or to pass
each other.
 Read Completions are permitted to be blocked by or to pass I/O or Configuration Write Completions.
 I/O or Configuration Write Completions are permitted to be blocked by or to pass Memory Write and
Message Requests. Such Transactions are actually moving in the opposite direction and, therefore,
have no ordering relationship.
 I/O or Configuration Write Completions are permitted to be blocked by or to pass Read Completions
and other I/O or Configuration Write Completions.
 If a single write transaction containing multiple DWORDs and the Relaxed Ordering bit clear is
accepted by a Completer, the observed ordering of the updates to locations within the Completer's data
buffer must be in increasing address order. 56
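A simplified, illustrative lookup of the pass/block rules from the previous two slides (not the complete spec ordering table): "must" means must be allowed to pass, "may" means either behavior is permitted, "no" means must not pass. Row is the later TLP, column is the earlier TLP it wants to pass.

```python
# Minimal sketch of the ordering rules above, collapsed into a small lookup table.
ORDERING = {
    ("Posted,RO=0", "Posted"):      "no",
    ("Posted,RO=1", "Posted"):      "may",   # permitted to pass
    ("Posted",      "Non-Posted"):  "must",
    ("Posted",      "Completion"):  "may",
    ("Non-Posted",  "Posted"):      "no",
    ("Non-Posted",  "Non-Posted"):  "may",
    ("Non-Posted",  "Completion"):  "may",
    ("RdCpl,RO=0",  "Posted"):      "no",
    ("RdCpl,RO=1",  "Posted"):      "may",   # permitted to pass
    ("Completion",  "Non-Posted"):  "must",
}

print(ORDERING[("Non-Posted", "Posted")])   # -> "no": an NP request never passes a Posted one
```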
ECRC
 If the TD field is set, the requester generates a 32-bit ECRC and appends it to the end of the TLP.
 Switches must pass TLPs with ECRC unchanged from the Ingress Port to the Egress Port.
 If a device supports ECRC generation/checking, at least one of its Functions must support Advanced
Error Reporting capability structure.
 If a device Function is enabled to check ECRC, it must do so for all TLPs with ECRC where the device
is the ultimate PCI Express Receiver (Note that it is still possible for the Function to receive TLPs
without ECRC, and these are processed normally – this is not an error)

ECRC calculation:
 The polynomial used has coefficients expressed as 04C1 1DB7h.
 The seed value (initial value for ECRC storage registers) is FFFF FFFFh.
 All invariant fields of the TLP header and the entire data payload (if present) are included in the
ECRC calculation, all bits in variant fields must be set to 1b for ECRC calculations. (bit 0 of the
Type field and EP field are variant)
 ECRC calculation starts with bit 0 of byte 0 and proceeds from bit 0 to bit 7 of each byte of the TLP.
 The result of the ECRC calculation is complemented, and the complemented result bits are
mapped into the 32-bit TLP Digest field.

57
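Below is a hedged sketch of a bit-serial CRC-32 using the polynomial and seed stated above (feed bit 0 of each byte first, complement the result). The exact PCIe bit-to-bit mapping into the TLP Digest field and the masking of variant header bits to 1b are not reproduced here.

```python
# Minimal sketch: bit-serial CRC-32 with polynomial 04C1_1DB7h and seed FFFF_FFFFh.
POLY = 0x04C11DB7

def ecrc_sketch(data: bytes) -> int:
    crc = 0xFFFFFFFF                     # seed value for the ECRC storage registers
    for byte in data:
        for bit in range(8):             # bit 0 first, then up to bit 7
            fb = ((crc >> 31) & 1) ^ ((byte >> bit) & 1)
            crc = ((crc << 1) & 0xFFFFFFFF) ^ (POLY if fb else 0)
    return crc ^ 0xFFFFFFFF              # complement before placing in the TLP Digest

print(hex(ecrc_sketch(b"\x00" * 16)))    # CRC over an all-zero 4DW header, for illustration
```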
ECRC Contd
 The 32-bit ECRC value is placed in the TLP Digest field at the end of the TLP.
 For TLPs including a TLP Digest field used for an ECRC value, Receivers which support end-to-
end data integrity checking, check the ECRC value in the TLP Digest field by:
1) Applying the same algorithm used for ECRC calculation to the received TLP, not including the
32-bit TLP Digest field of the received TLP.
2) Comparing the calculated result with the value in the TLP Digest field of the received TLP.
 Receivers which support end-to-end data integrity checks report violations as an ECRC Error.
 Intermediate Receivers are still required to forward TLPs whose ECRC checks fail.

58
Completion Timeout Mechanism
 PCI Express device Functions that issue Requests requiring Completions must implement the
Completion Timeout mechanism. The Completion Timeout mechanism is activated for each Request
that requires one or more Completions when the Request is transmitted.
 Since Switches do not autonomously initiate Requests that need Completions, the requirement for
Completion Timeout support is limited only to Root Complexes, PCI Express-PCI Bridges, and
Endpoints.
 A Memory Read Request for which there are multiple Completions must be considered completed
only when all Completions have been received by the Requester. If some, but not all, requested data
is returned before the Completion Timeout timer expires, the Requester is permitted to keep or to
discard the data that was returned prior to timer expiration.

59
MSI
MSI: Message Signaled Interrupt

MSI mechanisms use Memory Write Requests to represent interrupt Messages. The Request
format used for MSI transactions is identical to the Memory Write Request format.
MSI Requests are indistinguishable from memory writes with regard to ordering, Flow Control, and

data integrity.
Attr[1:0] must be 0.

TC value can be TC0 to TC7.

Message Control register

RSVD [15:9] | Per-vector masking capable [8] | 64-bit address capable [7] | Multiple message enable [6:4] | Multiple message capable [3:1] | MSI Enable [0]
60
MSI Contd
Multiple message enable:
Software writes to this field to indicate the number of allocated vectors (equal to or less than the
number of requested vectors).
 Number of allocated vectors: 2**(multiple message enable)
MSI Enable:
 If set the function is permitted to use MSI to request service and is prohibited from using its INTx# pin
(if implemented).
Message Data:
 System-specified message data.
 If the Message Enable bit is set, the message data is driven onto the lower word (AD[15::00]) of the
memory write transaction’s data phase. AD[31::16] are driven to zero during the memory write
transaction’s data phase.
 The Multiple Message Enable field defines the number of low order message data bits the function is
permitted to modify to generate its system software allocated vectors.
 For example, a Multiple Message Enable encoding of “010” indicates the function has been allocated
four vectors and is permitted to modify message data bits 1 and 0 (a function modifies the lower message
data bits to generate the allocated number of vectors). If the Multiple Message Enable field is “000”, the
function is not permitted to modify the message data.
Mask bits:
 For each Mask bit that is set, the function is prohibited from sending the associated message. 61
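An illustrative sketch of how the allocated vectors modify the low-order message data bits, following the example above (MME = "010" gives four vectors); the base data value used here is made up.

```python
# Minimal sketch: forming per-vector MSI message data as described above.
def msi_message_data(base_data: int, mme: int, vector: int) -> int:
    vectors = 1 << mme                    # 2**(multiple message enable)
    assert 0 <= vector < vectors, "vector outside the allocated range"
    mask = vectors - 1                    # low-order bits the function may modify
    return (base_data & ~mask) | vector

# MME = "010" -> 4 vectors; the function may modify message data bits 1 and 0.
print([hex(msi_message_data(0x4060, 2, v)) for v in range(4)])
# -> ['0x4060', '0x4061', '0x4062', '0x4063']
```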
Address Routing Mechanism

62
Address Routing Mechanism Contd

63
Address Routing Mechanism Contd

64
Address Routing Mechanism Contd

65
Address Routing Mechanism Contd

66
Address Routing Mechanism Contd

67
Address Routing Mechanism Contd

68
ID based Routing
 Primary bus:
The bus connected to the upstream side of a bridge is referred to as its primary bus. In PCIe, the
primary bus is the one in the direction of the Root Complex and host processor.
 Secondary bus:

The bus connected to its downstream side of a bridge is referred to as its secondary bus.
 Sub-ordinate bus:

The highest-numbered bus reachable downstream of a bridge is referred to as its subordinate bus.

The Subordinate and Secondary Bus Number registers will contain the same value unless there is
another bridge (switch) on the secondary side.

69
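The sketch below (not from the slides) illustrates the ID-routing decision a bridge or switch port makes using the Secondary/Subordinate Bus Number registers described above; the Type 0/Type 1 conversion shown applies specifically to configuration requests.

```python
# Minimal sketch: claiming an ID-routed TLP based on the bus-number registers above.
def route_by_id(target_bus: int, secondary: int, subordinate: int) -> str:
    if target_bus == secondary:
        return "forward downstream as Type 0 config (target is directly on the secondary bus)"
    if secondary < target_bus <= subordinate:
        return "forward downstream as Type 1 config (target is behind another bridge)"
    return "do not claim; TLP stays on the primary side"

print(route_by_id(target_bus=5, secondary=2, subordinate=7))
```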
PCIe Enumeration

Header Type:
 0x00: The device is not a PCI bridge (Type 0 header)
 0x01: The device is a PCI-to-PCI bridge (Type 1 header)
 0x02: The device is a CardBus bridge (Type 2 header)

Capability register device or port_type field encoding:


 0000: PCIe EP
 0001: Legacy PCIe EP
 0100: Root port of PCIe RC
 0101: Upstream port of PCIe switch
 0110: Downstream port of PCIe switch
 0111: PCIe to PCI/PCI-X bridge
 1000: PCI/PCI-X to PCIe bridge 70
PCIe Enumeration Contd

71
PCIe Enumeration Contd

72
Data Link Layer

73
Features
Function of the data link layer...

Flow control: this avoids over-writing the receiver’s buffer by regulating the
amount of data that can be sent.

Error control: it checks the CRC to ensure the correctness of the frame. If
incorrect, it asks for retransmission. Multiple schemes are present: ACK, NAK, and the
go-back-N protocol.

74
Generic DLLP Packet Format

1. ACK/NAK DLLPs
2. Flow control DLLPs
3. Vendor specific DLLPs
4. Power management DLLPs

75
Flow Control
Each device implements credit based link flow control for each virtual channel on each port.

FC guarantees that the transmitter will never send a TLP that the receiver can't accept. This avoids
receiver buffer overflow, so retries of TLPs and wait states on the link will not occur.

FC operates independently for each of the below six buffers..


1. non-posted header
2. non-posted data ( valid only for VC0)
3. posted header
4. posted data
5. completion header
6. completion data

Flow control credits:


Header FC credits: maximum header size + digest
for completion: 4DW
for requests: 5DW
Data FC credits: 1 credit = 4DW (16 bytes)
76
Flow control DLLPs
 DataFC – This field contains the credits associated with the data storage. This is updated in
the flow control counter for the virtual channel indicated in V[2:0] and traffic type indicated by
Byte0 bits 7:4
1 data credit = 16 bytes

 HdrFC – This field contains the credits associated with the header storage. This is updated in
the flow control counter for the virtual channel indicated in V[2:0] and traffic type indicated by
Byte0 bits 7:4
1 header credit = 1 header + digest

•0100: INIT1-Posted
•0101: INIT1-Non-Posted
•0110: INIT1-Completion
•1100: INIT2-Posted
•1101: INIT2-Non-Posted
•1110: INIT2-Completion
•1000: Update-Posted
•1001: Update-Non-Posted
•1010: Update-Completion 77
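A small illustrative sketch of the credit accounting above (1 header credit per TLP header + digest, 1 data credit per 16 bytes of payload):

```python
# Minimal sketch: converting a TLP into header and data credit units.
def credits_needed(payload_bytes: int):
    header_credits = 1
    data_credits = (payload_bytes + 15) // 16      # 1 data credit = 16 bytes, rounded up
    return header_credits, data_credits

print(credits_needed(0))      # a read request or header-only TLP -> (1, 0)
print(credits_needed(256))    # a 256-byte write payload          -> (1, 16)
```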
Flow control DLLP Contd...
VC Flow Control Buffer Organization
•Posted Header
•Posted Data
•Non-Posted Header
•Non-Posted Data
•Completion Header
•Completion data

78
Flow control DLLP Contd...
Flow control mechanism

79
Flow control DLLP Contd...
Flow control protocol(initialization)

Transmitter
Pending transaction Buffer : stores transactions that are pending within the same virtual
channel.
Credit consumed counter : tracks the size of all transactions sent from the VC buffer in flow
control credits.
Credit limit register : Initialized with credit limit of corresponding flow control receive buffer
Flow control gating logic : Performs the calculation to determine if the receiver has sufficient
flow control credits to receive pending TLPs.

Receiver
FC receive buffer : stores incoming header and data
Credit allocated : tracks the total flow control credits that have been made available since initialization.
Credit receive counter : Tracks the total size of all data received from transmitting device and
placed into flow control buffer.

80
Flow Control – When can the Tx send a TLP ?
Each device implements credit based link flow control for each virtual channel on each port.

[CL - (CC + PTLP)] mod 2**Field_Size <= 2**(Field_Size - 1)

CL: credit limit--> the receiver sends update flow control information in a periodic time interval to let the transmitter
know the available buffer space.
CC: credit consumed--> the buffer space consumed by the transmitter
PTLP: pending TLP--> buffer space needed for the pending TLP

The above equation ensures unique results from the unsigned modular arithmetic. For example, unsigned 2's complement
subtraction yields the same result for both 0-127 and 255-127. To ensure this kind of conflict does not occur, the
maximum number of unused credits that can be reported is limited to 2**(Field_Size-1): 128 credits for header and
2048 credits for data.
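A hedged sketch of the gating check above; the field sizes (8 bits for header credits, 12 bits for data credits) and the example numbers are illustrative.

```python
# Minimal sketch: the flow-control gating check using modular arithmetic on the field width.
def may_transmit(cl: int, cc: int, ptlp: int, field_size: int) -> bool:
    """True if the receiver has enough advertised credits for the pending TLP."""
    modulus = 1 << field_size
    return (cl - (cc + ptlp)) % modulus <= (1 << (field_size - 1))

# Header credits use an 8-bit field: with CL=10 advertised and 9 already consumed,
# a 1-credit pending TLP still fits; a 2-credit TLP would not.
print(may_transmit(cl=10, cc=9, ptlp=1, field_size=8))   # True
print(may_transmit(cl=10, cc=9, ptlp=2, field_size=8))   # False
```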

InitFC1: this DLLP advertises the initial buffer space available for the corresponding VC.
If the receiver has infinite flow control for a particular buffer, the corresponding field is zero in InitFC1.
Non-posted data is only available for VC0, so the remaining VCs (VC1-VC7) must advertise infinite FC for it
during the initialization process.
InitFC2: This DLLP carries the same values as InitFC1 and indicates that the FC1 state completed successfully; it
serves as confirmation of successful FC initialization.
81
Flow Control – When can the Tx send a TLP ? (Cont’d)
FC initialization sequence:
1. Each receive interface sends FC credit information to the other end (InitFC1).
2. After receiving InitFC1, the receiver updates its corresponding CL counter.
3. After both sides have received InitFC1, they send InitFC2 DLLP packets.

NOTES:
1. VC0 is hardware initialized; after LinkUp, InitFC1 and InitFC2 DLLPs are sent continuously at the
maximum rate possible.
2. VC1-VC7 are software initialized, so their InitFC1 and InitFC2 DLLPs are sent when no other TLP or DLLP is
scheduled for transmission, and at least once every 17 us.
3. Update FC DLLPs must not be sent for infinite FC: if both the header and data portions advertise infinite
flow control for a VC, the receiver need not send Update DLLPs for that VC.
If the receiver advertised infinite flow control for only the header or only the data portion during InitFC1, it must send
Update FC DLLPs periodically as required, with the finite header/data credit updated and the infinite header/data field as 0.
When the receiver's FC header credit limit has reached zero (the other side is blocked for lack of FC header credits)
and then becomes one, it must send an Update FC immediately.
When the receiver's data FC credit is below MPL (maximum payload size) and then rises above the MPL,
the receiver must send an Update FC DLLP immediately.

82
ACK/NAK DLLPs
 Positive acknowledge DLLPs for good TLPs received use sequence number =
NEXT_REC_SEQ
 Negative acknowledge DLLPs for TLPs that failed LCRC check use sequence number =
NEXT_REC_SEQ - 1

83
go-back-n Protocol
1) maximum size of the sending window is (2**N) -1
• N = number of bits used in the sequence number
2) It only accepts segments that arrive in sequence and discards any out-of-sequence segment that it
receives.
3) the acknowledgment is cumulative.
• When it sends an ACK for a sequence number X, it implicitly acknowledges the reception
of all segments whose sequence number is earlier than X.
4) The sender uses a sending buffer that can store an entire sliding window of segments. The
segments are sent with increasing sequence numbers (modulo the maximum sequence number). The sender must
wait for an ACK once its sending buffer is full. When it receives an ACK, it removes from the
sending buffer all the ACK segments and uses a re-transmission timer to detect segment losses.
5) A sender maintains one transmission timer per connection. The timer is started when the first
segment is sent and when the sender receives an ACK, it restarts the timer only if there are still
unacknowledged segments in the sending buffer.
6) go-back-n provides good performance only when a few segments are lost. However when there
are many losses, the performance quickly drops because:
• the receiver does not accept out-of-sequence segments.
• The sender re-transmits all unacknowledged segments once it has detected a loss.
7) Receiving sliding window is 1.
8) Maximum sending sliding window is (2**N) -1.
84
ACK/NAK DLLP Contd
 ACK DLLPs are used to confirm the successful reception of one or more TLPs
 NAK DLLPs are used to indicate erroneous reception of one or more TLPs. Retransmission of
the TLPs is required in this case.

85
ACK/NAK DLLP
Elements of ACK/NAK

86
Transmitter Elements of the ACK/NAK protocol
1. NEXT_TRANSMIT_SEQUENCE:
--> This is a 12-bit counter, initialized to zero at reset or when the link is inactive.
--> The counter increments by one for each good TLP transmitted (it does not increment for a nullified
TLP).
--> The Data Link Layer must not accept a new TLP from the transaction layer if there is a separation greater than
2047 (i.e., 2**(FIELD_SIZE-1) or more, where FIELD_SIZE is the sequence number width) between the sequence number
of the TLP being transmitted and that of the oldest TLP in the replay buffer still awaiting an ACK/NAK DLLP. This also
means the replay buffer must not contain more than 2048 TLPs, i.e., the number of outstanding TLPs
must not exceed 2048.

2. LCRC Generator
--> The 32-bit LCRC (link-level CRC) is calculated over all the fields of the TLP, including the header, data
payload, ECRC, and sequence number.
--> If the TLP is a nullified TLP, the transmitter inverts the LCRC and sends the packet terminated with an EDB
symbol instead of an END symbol. The receiver checks the LCRC, and if it does not match the calculated LCRC,
it drops the packet and reports an error. If the receiver receives an EDB symbol with the LCRC
inverted, it simply drops the packet without reporting any error.

87
Transmitter Elements of the ACK/NAK protocol
(Cont’d)
3. Replay buffer:
The replay buffer stores TLPs with all fields, including the data link layer sequence number and LCRC fields.
When the transmitter receives an ACK DLLP, it purges the associated TLP(s) from the replay buffer. When the
transmitter receives a NAK DLLP, it replays the contents of the replay buffer.

4. Replay_Num count:
ACK/NAK DLLPs should be received periodically. If the transmitter does not receive an ACK/NAK DLLP before its
replay timer expires, it must retransmit the contents of the replay buffer. If the timer expires while a TLP is still being
received, the transmitter should be intelligent enough not to start the replay immediately just because the ACK/NAK
timer expired; it should wait for the end of that TLP and then retransmit the contents of the replay buffer if it still has
not received an ACK/NAK DLLP.
If the transmitter has replayed the contents of the replay buffer three times and the ACK/NAK timer expires again,
the link must be retrained by entering the Recovery state, after which the contents of the replay buffer are retransmitted.
During Replay, the DLL must not accept new transactions from the transaction layer.

88
Receiver Elements of the ACK/NAK protocol
1. LCRC checker
--> If LCRC matches and sequence number matches with the expected NEXT_RECEIVE_SEQUENCE then the
TLP is considered as good TLP.
--> If the received LCRC is the inverse of the calculated LCRC and the physical layer receives EDB instead of END,
the TLP is considered a nullified TLP and is dropped.

2. NEXT_RECEIVE_SEQUENCE:
--> the 12-bit counter keeps track of the next expected TLP sequence number.
--> the counter is initialized to zero at reset or when the DLL is inactive.
--> the counter is incremented once for each good TLP received that is forwarded to the transaction layer.
--> the counter will not be incremented for nullified TLP.

2.1. Received TLP sequence number == NEXT_RECEIVE_SEQUENCE number


--> If the LCRC check passes, the TLP is considered a good TLP and is forwarded to the transaction
layer.
--> In a switch operating in cut-through mode, when the TLP has already started being forwarded and the
LCRC check then fails, the TLP is marked as a nullified TLP by sending EDB and the inverse of the calculated
LCRC on the egress port.

89
Receiver Elements of the ACK/NAK protocol
(Cont’d)
2.2. (NEXT_RECEIVE_SEQUENCE - Received TLP sequence number) mod 4096 <= 2047
--> The TLP is considered a duplicate TLP.
--> The receiver will send an ACK DLLP immediately with the ACK_NAK sequence number set to
(NEXT_RECEIVE_SEQUENCE - 1).

2.3. For any other case


--> the TLP will be discarded and a NAK will be returned with ACK_NAK Sequence number as
(NEXT_RECEIVE_SEQUENCE - 1).

3. NAK_SCHEDULED flag
--> The flag is set when the receiver schedules a NAK DLLP to return to the remote transmitter.
--> The flag is cleared when the receiver sees the first TLP associated with the replay of a previously NAKed TLP.

90
ACK/NAK DLLP Contd...
Transmitter's Response to an ACK DLLP

91
ACK/NAK DLLP Contd...
Transmitter's Response to a NAK DLLP

92
ACK/NAK DLLP Contd...
Receiver Behavior with Receipt of Good TLPs

93
Switch Cut-Through Mode
 Cut-through is the ability to start streaming a packet through a switch without waiting for the receipt of the tail
end of the packet. If a CRC error is ultimately detected when the CRC is received at the tail end of the packet,
the packet that has already begun transmission from the switch egress port can be nullified.

 A nullified packet is a packet that terminates with an EDB symbol as opposed to an END. It also has an
inverted 32-bit LCRC.

94
Power Management DLLPs
DLLP Code | PM DLLP Type | Purpose
0x20 | PM_Enter_L1 | Sent by the downstream component during entry into the software-driven power management L1 state.
0x21 | PM_Enter_L23 | Sent by the downstream component during the L2/L3 entry process.
0x23 | PM_Active_State_Req_L1 | Sent by the downstream component during ASPM L1 entry.
0x24 | PM_Req_Ack | Sent by the upstream component to acknowledge entry into the PM L1 state (both SW- and ASPM-based).

95
Vendor specific DLLPs

96
Priorities of the packets in DL layer
 Completion of any TLP or DLLP which is currently in progress
 NAK DLLP
 ACK DLLP
 Flow control DLLP
 Replay Buffer, re-transmission of TLPs
 TLPs that are waiting in the transaction layer
 Power management and vendor defined DLLPs

97
Physical layer

98
Physical Layer – Top Level

99
Features of PCIe Phy Layer:

Utilizes 8-bit or 16-bit parallel interface to transmit and receive PCIe data.
Allows integration of high speed components into a single functional block as seen by the endpoint
device designer.
Data and clock recovery from serial stream on the PCI Express bus.

Holding registers to stage transmit and receive data.

Supports direct disparity control for use in transmitting compliance pattern.

8b/10b encode/decode and error indication.

Receiver detection.

Beacon transmission and reception.

100
General Packet Format Rules
 TLPs always start with the STP character.
 DLLPs always start with SDP and are 8 characters long (6 characters + SDP + END)
 All TLPs terminate with either an END or EDB character.
 DLLPs terminate with the END character.
 The total packet length (including Start and End characters) of each packet must be a multiple of four
characters.
 STP and SDP characters must be placed on Lane 0 when starting the transmission of a packet after
the transmission of Logical Idles. If not starting a packet transmission from Logical Idle (i.e. back-to-
back transmission of packets), then STP and SDP must start on a Lane number divisible by 4.
 If a packet doesn't end on the last Lane and there are no more packet transmissions, PAD symbols
are transmitted on the Lanes above the Lane on which the END/EDB character is transmitted. This
keeps the Link aligned so that transmission of the Logical Idle sequence can start on all Lanes at the
same time.
 When a Logical Idle sequence or an Ordered-Set such as the SKIP Ordered-Set is transmitted (SKIP
OS is used for clock compensation in the receiver), it must be sent on all Lanes simultaneously.
 Any violation of these rules may be reported as a Receiver Error to the Data Link Layer.

101
Scrambler
 Repetitive patterns result in large amount of energy concentrated in discrete frequencies which
results in significant EMI noise generated. By scrambling the transmitted data, repetitive patterns—
such as 10101010—are eliminated. As a result, no single frequency component of the signal is
transmitted for significant periods of time.
 On a multi-Lane Link with wires routed in close proximity, a scrambled transmission on one Lane
generates white noise which does not interfere or correlate with another Lane's data transmission.
 Polynomial: G(X) = X^16 + X^5 + X^4 + X^3 + 1
 The initialized value of the 16-bit LFSR is FFFFh.
 On a multi-Lane Link implementation, Scramblers associated with each Lane must operate in
concert, maintaining the same simultaneous value in each LFSR.
 Scrambling is applied to 'D' characters associated with TLP and DLLPs, including the Logical Idle
(00h) sequence. 'D' characters within the TS1 and TS2 Ordered-Set are not scrambled.
 'K' characters and characters within Ordered-Sets—such as TS1, TS2, SKIP, FTS and Electrical Idle
Ordered-Sets—are not scrambled. These characters bypass the scrambler logic.
 Compliance Pattern related characters are not scrambled.

102
Scrambler Contd
 When a COM character exits the Scrambler (COM does not get scrambled), it initializes the LFSR.
Similarly, on the receiver side, when a COM character enters the De-Scrambler, the receiver's LFSR is initialized.
 The LFSR does NOT advance on SKP characters associated with the SKIP OS; otherwise the LFSR
advances for every character (both D and K characters) transmitted.
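As a rough illustration of the rules above, the following Python sketch models a 16-bit LFSR for
G(X) = X^16 + X^5 + X^4 + X^3 + 1, reset to FFFFh by COM and frozen on SKP. It is only a behavioural
sketch: the real LFSR advances serially (eight shifts per character) and the exact mapping of LFSR bits onto
the scrambled data byte follows the specification's implementation figure, whereas this model advances once
per character purely to show the reset/hold behaviour. The COM/SKP placeholders and bit selection are
assumptions for the example.

COM, SKP = "COM", "SKP"          # placeholders for the K28.5 and K28.0 control characters

class Scrambler:
    """Behavioural sketch of a per-Lane PCIe scrambler LFSR."""

    POLY_FEEDBACK = 0x0039        # x^16 reduced to x^5 + x^4 + x^3 + 1

    def __init__(self):
        self.lfsr = 0xFFFF        # initial value after reset (and after COM)

    def _advance(self):
        # Galois-style left shift: one LFSR step per character time in this simplified model.
        msb = (self.lfsr >> 15) & 1
        self.lfsr = (self.lfsr << 1) & 0xFFFF
        if msb:
            self.lfsr ^= self.POLY_FEEDBACK

    def process(self, char, is_k_character):
        """Return the character to transmit for one symbol time."""
        if is_k_character:
            if char == COM:
                self.lfsr = 0xFFFF    # COM re-initializes the LFSR; COM itself is not scrambled
                return char
            if char == SKP:
                return char           # LFSR does NOT advance on SKP characters
            self._advance()           # other K characters advance the LFSR but bypass scrambling
            return char
        # D characters (including Logical Idle 00h) are XORed with 8 bits derived from the LFSR;
        # the particular bit selection used here is illustrative only.
        scramble_byte = (self.lfsr >> 8) & 0xFF
        self._advance()
        return char ^ scramble_byte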

103
Device Link-up
Bit Lock: The receiver PLL uses the transitions in the received bit-stream to synchronize the Rx Clock
with the Tx Clock that was used at the transmitter to clock out the serialized bit stream. When the
receiver PLL locks on to the Tx Clock frequency, the receiver is said to have achieved "Bit Lock".

Symbol Lock: When the receive logic starts receiving a bit stream, there are no markers to differentiate one
symbol from another. The receive logic uses the COM symbol to determine the start and end of a 10-bit
symbol.
Upon detection of the COM symbol, the COM Detector knows that the next bit received after the
COM symbol is the first bit of a valid 10-bit symbol. The De-serializer is then initialized so that it can
henceforth generate valid 10-bit symbols. The De-serializer is said to have achieved 'Symbol Lock'.

Lane-to-Lane De-Skew:
Due to Link wire length variations and the different driver/receiver characteristics on a multi-Lane
Link, each of the parallel bit streams that represent a packet are transmitted simultaneously, but they do
not arrive at the receivers on each lane at the same time. The receiver circuit must compensate for this
skew by adding or removing delays on each Lane so that the receiver can receive and align the serial
bit streams of the packet.
104
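Conceptually, symbol lock is a search for the unique COM (K28.5) 10-bit pattern in the unaligned serial
stream. The sketch below scans a received bit string for either disparity of K28.5 and returns the bit offset at
which 10-bit symbol boundaries begin; it is a software illustration of the idea, not how a hardware
deserializer is built, and the example stream is made up.

# 10-bit codes for K28.5 (COM) in both running disparities, transmitted abcdei fghj.
K28_5_RD_NEG = "0011111010"
K28_5_RD_POS = "1100000101"

def find_symbol_alignment(bitstream: str):
    """Scan an unaligned serial bit string for a COM symbol.

    Returns the index of the first bit of the COM symbol, i.e. the offset at
    which 10-bit symbol boundaries start, or None if no COM is found.
    """
    for i in range(len(bitstream) - 9):
        window = bitstream[i:i + 10]
        if window in (K28_5_RD_NEG, K28_5_RD_POS):
            return i
    return None

# Example: some junk bits followed by COM and one more symbol (D10.2).
stream = "1101" + K28_5_RD_NEG + "0101010101"
offset = find_symbol_alignment(stream)
symbols = [stream[i:i + 10] for i in range(offset, len(stream) - 9, 10)]
print(offset, symbols)   # 4 ['0011111010', '0101010101']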
Device Link-up Contd
Polarity Inversion:
If Lane polarity inversion occurs, the TS1 Symbols 6-15 received will be D21.5 as opposed to the
expected D10.2. Similarly, if Lane polarity inversion occurs, Symbols 6-15 of the TS2 ordered set will be
D26.5 as opposed to the expected D5.2. This provides a clear indication of Lane polarity inversion.
If polarity inversion is detected the Receiver must invert the received data. The Transmitter must never
invert the transmitted data. Support for Lane Polarity Inversion is required on all PCI Express Receivers
across all Lanes independently.
D21.5: 1010101010
D10.2: 0101010101
D26.5: 0101101010
D5.2:  1010010101
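A receiver can apply the rule above mechanically: after symbol lock, look at the decoded data characters in
Symbols 6-15 of an incoming TS1/TS2 and, if they decode as D21.5/D26.5 instead of D10.2/D5.2, invert all
subsequently received bits on that Lane. Below is a hedged sketch that operates on already-decoded
character names rather than raw bits; the function name and list representation are assumptions.

# Decoded identity characters for TS1/TS2 Symbols 6-15 (normal vs. polarity-inverted).
TS1_ID, TS1_ID_INVERTED = "D10.2", "D21.5"
TS2_ID, TS2_ID_INVERTED = "D5.2", "D26.5"

def detect_polarity_inversion(ordered_set_type: str, symbols_6_to_15: list) -> bool:
    """Return True if the Lane's receive polarity must be inverted."""
    expected, inverted = (TS1_ID, TS1_ID_INVERTED) if ordered_set_type == "TS1" \
                         else (TS2_ID, TS2_ID_INVERTED)
    if all(s == inverted for s in symbols_6_to_15):
        return True          # receiver must invert its received data on this Lane
    if all(s == expected for s in symbols_6_to_15):
        return False
    raise ValueError("identity symbols are neither normal nor inverted")

# Example: a TS1 whose identity symbols decoded as D21.5 on this Lane.
print(detect_polarity_inversion("TS1", ["D21.5"] * 10))   # True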

Lane Reversal:
Crisscrossed wires would introduce interference into the Link. If, however, one or both of the devices
support Lane Reversal, the designer could wire the Lanes in parallel fashion. During the Link training and
initialization process, one device reverses the Lane numbering so that the Lane numbers of the two ports
match up.
105
Ordered Sets
EIOS:
This Ordered-Set is transmitted to a receiver prior to the transmitter placing its transmit half of the
Link in the Electrical Idle state. The receiver detects this Ordered-Set, de-gates its error detection logic
and prepares for the Link to go to the Electrical Idle state. Shortly after transmitting the Electrical Idle
Ordered-Set, the transmitter drives a differential voltage of less than 20mV peak.

FTSOS:
A transmitter that wishes to transition the state of its Link from the L0s low power state (Electrical
Idle) to the L0 state sends a defined number of FTS Ordered-Sets to the receiver. The minimum
number of FTS Ordered-Sets that the transmitter must send to the receiver is sent to the transmitter by
the receiver during Link training and initialization.

106
Ordered Sets Contd

107

SKIP OS
Using a PLL (Phase-Locked Loop), the receiver circuit generates the Rx Clock from the data bit
transitions in the input data stream. This recovered clock has the same frequency (2.5GHz) as that
of the Tx Clock used by the transmitting device to clock the data bit stream onto the wire (or fiber).
The Rx Clock is used to clock the inbound serial symbol stream into the Serial-to-Parallel converter
(Deserializer). The 10-bit symbol stream produced by the Deserializer is clocked into the elastic
buffer with a divide by 10 version of the Rx Clock. The Rx Clock is different from the Local Clock that
is used to clock symbols out of the Elastic Buffer to the 10b/8b decoder.
 Tx clock frequency is 2.5GHz and it must be accurate to ±300ppm from a center frequency of
2.5GHz (600ppm total). The two clocks can therefore skew relative to each other by one clock every
1666 clock cycles. (The Tx Clock is different from the local clock of the Physical Layer, which is a much slower clock.)
 In a multi-lane implementation, the SKIP Ordered-Set is periodically transmitted on all Lanes to allow
the receiver clock tolerance compensation logic to compensate for clock frequency variations
between the clock used by the transmitting device to clock out the serial bit stream and the receiver
device's local clock. The receiver adds a SKP symbol to a SKIP Ordered-Set in the receiver elastic
buffer to prevent a potential buffer underflow condition from occurring due to the transmitter clock
being slower than the local receiver clock. Alternately, the receiver deletes a SKP symbol from the
SKIP Ordered-Set in the receiver elastic buffer to prevent a potential buffer overflow condition from
occurring due to the transmitter clock being faster than the local receiver clock.

108
SKIP OS Contd...
 SKIP OS: COM SKIP SKIP SKIP
 The set must be scheduled for insertion at most once every 1180 symbol clocks (i.e., symbol times)
and at least once every 1538 symbol clocks.
 When it's time to insert a SKIP Ordered-Set, it is inserted at the next packet boundary (not in the
middle of a packet). SKIP Ordered-Sets are inserted between packets simultaneously on all Lanes. If
a long packet transmission is already in progress, the SKIP Ordered-Sets are accumulated and then
inserted consecutively at the next packet boundary.
 SKIP Ordered-Sets must not be transmitted while the Compliance Pattern is in progress.
 In order to keep the receiver's PLL sync'd up (i.e., to keep it from drifting), when there are no TLPs,
DLLPs or PLPs to transmit Logical Idle sequence is transmitted during these times.
 The formula for the maximum number of symbols (n) between SKIP Ordered-Sets is:
n = 1538 + (maximum packet payload size + 26)
where
26 is the number of symbols associated with the header (16 bytes), the optional ECRC (4 bytes), the
LCRC (4 bytes), and the sequence number (2 bytes). A worked example follows below.
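As a quick check of the formula (using a hypothetical 128-byte maximum payload, since the actual value is
negotiated per system):

def max_symbols_between_skip(max_payload_bytes: int) -> int:
    """Apply the formula above: n = 1538 + (maximum packet payload size + 26)."""
    overhead = 16 + 4 + 4 + 2          # header + optional ECRC + LCRC + sequence number
    return 1538 + (max_payload_bytes + overhead)

print(max_symbols_between_skip(128))   # 1692 symbols for a 128-byte maximum payload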

109
LTSSM States
1) Detect State:
A device detects the presence or absence of a device connected at the far end of the Link.
2) Polling State:
Bit Lock
Symbol Lock
Lane Polarity
Compliance testing:
The transmitter outputs a specified compliance pattern. This is intended to be used with test
equipment to verify that all of the voltage, noise emission and timing specifications are within
tolerance.
3) Configuration:
The main function of this state is the assignment of Link numbers and Lane numbers to each Link
that is connected to a different device. The Link is also De-skewed in this state.
An upstream device sends TS1 Ordered-Sets on all downstream Lanes. This starts the Link
numbering and Lane numbering process. If the width determination and Lane numbering is
completed successfully, then TS2 Ordered-Sets are transmitted to the neighbouring device to
confirm the Link Width, Link Number and Lane Number for each Link connected to a different
device.
Link Width
Link Number
Lane Reversal
Lane-to-Lane de-skew is performed
110
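The happy-path training flow described so far (Detect, Polling, Configuration, then the fully active L0 state
covered on the following slides) can be summarised in a toy transition table. This sketch deliberately omits the
sub-states and the Recovery/low-power arcs; the names simply mirror the slide text, nothing more.

from enum import Enum, auto

class LtssmState(Enum):
    DETECT = auto()         # detect presence of a device at the far end of the Link
    POLLING = auto()        # bit lock, symbol lock, lane polarity, compliance testing
    CONFIGURATION = auto()  # link/lane numbering, link width, lane reversal, de-skew
    L0 = auto()             # normal, fully active state

# Toy happy-path transitions only; the real LTSSM has sub-states, timeouts and
# arcs into Recovery, L0s/L1/L2, Hot Reset, Loopback and Disable.
HAPPY_PATH = {
    LtssmState.DETECT: LtssmState.POLLING,            # receiver detected on the Lanes
    LtssmState.POLLING: LtssmState.CONFIGURATION,     # bit/symbol lock achieved via TS1/TS2
    LtssmState.CONFIGURATION: LtssmState.L0,          # link width and lane numbers confirmed
}

def train_link():
    state = LtssmState.DETECT
    trace = [state]
    while state is not LtssmState.L0:
        state = HAPPY_PATH[state]
        trace.append(state)
    return trace

print(" -> ".join(s.name for s in train_link()))
# DETECT -> POLLING -> CONFIGURATION -> L0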
LTSSM States Contd

111
LTSSM States Contd
4) L0:
This is the normal, fully active state of a Link during which TLPs, DLLPs and PLPs can be
transmitted and received.
5) Recovery State:
This state is entered from the L0 state due to an error that renders the Link inoperable. Recovery is
also entered from the L1 state when the Link needs re-training before it transitions to the L0 state.
In Recovery, Bit Lock and Symbol Lock are re-established in a manner similar to that used in the
Polling state.
Lane-to-Lane de-skew is performed. The number of FTS Ordered-Sets required to transition from
the L0s state to the L0 state is re-established.
6) Hot Reset:
This state is entered when directed to do so by a device's higher layer, or when a device receives
two consecutive TS1 Ordered-Sets with the Hot Reset bit set in the TS1 Training Control field.

112
LTSSM States Contd
7) Loopback:
This state is used as a test and fault isolation state. Only the entry and exit of this state are specified;
the details of what occurs in this state are unspecified. Testing can occur on a per-Lane basis or on the
entire configured Link. The Loopback Master device sends TS1 OS
to the Loopback Slave with the Loopback bit set in the TS1 Training Control field. The Loopback Slave
enters Loopback when it receives two consecutive TS1OS with the Loopback bit set. How the
Loopback Master enters into the Loopback state is device specific.
Once in the Loopback state, the Master can send any pattern of symbols, as long as the 8b/10b
encoding rules are followed.
8) Disable:
This state allows a configured Link to be disabled (e.g., due to a surprise removal of the remote
device). In this state, the transmitter driver is in the electrical high impedance state and the receiver is
enabled and in the low impedance state.
Software commands a device to enter the Disable state by setting the Disable bit in the Link
Control register. The device then transmits 16 TS1OS with the Disable Link bit set in the TS1 Training
Control field. A connected receiver is Disabled when it receives TS1 Ordered-Sets with the Disable Link
bit set.

113
LTSSM States Contd
9) L1 State (Software driven):
This is a lower power state than L0s and has a longer exit latency than the L0s exit latency.
Power management software may direct a device to place its upstream Link into L1 (both directions
of the Link go to L1) when the device is placed in a lower power device state such as D1, D2, or D3.

Entry process into the L1 State:

Upcomp=0 (Downstream component)
Receives a configuration write TLP to the power management control register commanding entry into a non-D0 state.
Sends the completion for the configuration write TLP and starts entry into the low-power L1 state.
It accumulates the minimum credits for posted, non-posted and completion TLPs on all enabled VCs
and blocks scheduling of new TLPs (disables the Transaction Layer).
It waits to receive an ACK for the last TLP (i.e., the replay buffer must be empty).
PM_Enter_L1 DLLPs are sent repeatedly.
Waits to receive a PM_Request_Ack DLLP.

114
LTSSM States Contd
Upcomp=1 (Upstream component)
It blocks scheduling of new TLPs (disables the Transaction Layer).
It waits to receive an ACK for the last TLP.
Sends PM_Request_Ack DLLPs repeatedly until it sees Electrical Idle.

Upcomp=0 (Downstream component)
On seeing the PM_Request_Ack DLLP, it disables DLLP and TLP transmission and brings the Physical Layer to
Electrical Idle.

Upcomp=1 (Upstream component)
The upstream component completes the L1 transition: it disables DLLP and TLP transmission and brings the
Physical Layer to Electrical Idle.
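The software-driven L1 entry handshake above is a fixed message exchange. Below is a hedged sketch that
models it as an event trace between the downstream and upstream components; the DLLP names follow the
slide, while the queue-based framing and function names are purely illustrative.

from collections import deque

def software_l1_entry():
    """Model the DLLP exchange for software-driven L1 entry as an event trace."""
    trace = []
    to_upstream, to_downstream = deque(), deque()

    # Downstream component: completes the PMCSR config write, quiesces, then
    # repeatedly sends PM_Enter_L1 until it sees PM_Request_Ack.
    trace.append("DS: completion for PMCSR config write (non-D0 state)")
    trace.append("DS: block new TLPs, wait for ACK of last TLP (replay buffer empty)")
    to_upstream.append("PM_Enter_L1")

    # Upstream component: quiesces and answers with PM_Request_Ack until it
    # sees the downstream transmitter go to Electrical Idle.
    assert to_upstream.popleft() == "PM_Enter_L1"
    trace.append("US: block new TLPs, wait for ACK of last TLP")
    to_downstream.append("PM_Request_Ack")

    # Downstream component: on PM_Request_Ack, stop TLP/DLLP transmission and
    # drive its transmit Lanes to Electrical Idle (EIOS).
    assert to_downstream.popleft() == "PM_Request_Ack"
    trace.append("DS: disable TLP/DLLP transmission, send EIOS, enter Electrical Idle")

    # Upstream component: on seeing Electrical Idle, completes its own L1 entry.
    trace.append("US: disable TLP/DLLP transmission, send EIOS, enter Electrical Idle")
    trace.append("Link is now in L1")
    return trace

for step in software_l1_entry():
    print(step)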

Exit process from the L1 state:

Upcomp=1 (Upstream component)
Sends a configuration write TLP to the PMCSR register.

Upcomp=0 (Downstream component)
Sends a PM_PME message to notify the PM software that the function has experienced an event
that requires it to be returned to the full-power state.
115
LTSSM States Contd
10) L2/L3 State
Entry Process into the L2/L3 State
 The RC sends a PME_Turn_Off message to request removal of the reference clock and main power.
 The EP blocks acceptance of TLPs from its Transaction Layer and sends a PME_TO_Ack TLP.
 The Switch upstream port gathers a PME_TO_Ack TLP from each of its downstream ports and then sends a
cumulative PME_TO_Ack TLP towards the RC.
 The EP blocks its Data Link Layer and sends the PM_Enter_L23 DLLP, which indicates that the EP is ready for
the removal of main power and the reference clock.
 The Switch upstream port also blocks its Data Link Layer and sends the PM_Enter_L23 DLLP to the RC.
 After receiving a PM_Enter_L23 DLLP on each root port, the RC can remove the reference clock and
main power.
 If auxiliary power is present then the link enters the L2 state, otherwise it enters the L3 state. L3
is the complete shut-down state; wake-up from this state is not possible.

116
LTSSM States Contd
Exit Process from the L2 State
 When a wakeup event occurs in the EP, it sends an in-band Beacon signal or asserts the sideband WAKE#
signal using auxiliary power.
 Once main power has been restored and the Link is up, the EP sends a PM_PME message to the RC.
PM_PME Messages are posted Transaction Layer Packets (TLPs) that inform the power
management software which agent within the Hierarchy requests a PM state change.
Beacon Signal
The Beacon is a DC-balanced signal of periodic arbitrary data, which is required to contain some pulse
widths >= 2 ns but no larger than 16 μs.
DC balance must always be restored within a maximum time of 32 μs.
Beacon is transmitted in a low impedance mode.
All Beacons must be transmitted and received on at least Lane 0 of multi-Lane Links(Lane 0 as defined
after Link width and Lane reversal negotiations are complete.).
WAKE# Signal
It is an open-drain signal, asserted by components requesting wakeup and observed by the associated
power management controller.
Once WAKE# has been asserted, the asserting Function must continue to drive the signal low until main
power has been restored to the component, as indicated by Fundamental Reset going inactive.
117
ASPM
ASPM: Active State Power Management

What is considered as Link idle?

EP and RC:
 No TLPs are pending transmission, or no flow control credits are available to transmit the pending TLPs.
 No DLLPs are pending transmission.
Upstream port of a switch:
 The receive Lanes of all downstream ports are in the Electrical Idle state.
 No TLPs are pending transmission, or no flow control credits are available to transmit the pending TLPs.
 No DLLPs are pending transmission.
Downstream port of a switch:
 The receive Lanes of the upstream port are in the Electrical Idle state.
 No TLPs are pending transmission, or no flow control credits are available to transmit the pending TLPs.
 No DLLPs are pending transmission.
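The 'Link idle' definition above is a conjunction of conditions that differs slightly by port type. A minimal
sketch follows, assuming a simple port model with hypothetical attribute names (kind, pending_tlps,
credits_available, pending_dllps, other_ports_rx_electrical_idle); none of these names come from the
specification.

from collections import namedtuple

Port = namedtuple(
    "Port",
    "kind pending_tlps credits_available pending_dllps other_ports_rx_electrical_idle")

def link_is_idle(port: Port) -> bool:
    """Hedged sketch of the ASPM 'Link idle' check described above."""
    # Common to all port types: nothing transmittable is pending.
    no_tlp_to_send = (not port.pending_tlps) or (not port.credits_available)
    no_dllp_to_send = not port.pending_dllps
    idle = no_tlp_to_send and no_dllp_to_send

    if port.kind == "switch_upstream":
        # All of the switch's downstream ports must be receiving Electrical Idle.
        idle = idle and all(port.other_ports_rx_electrical_idle)
    elif port.kind == "switch_downstream":
        # The switch's upstream port must be receiving Electrical Idle.
        idle = idle and all(port.other_ports_rx_electrical_idle)
    return idle

# Example: an endpoint with nothing queued is idle.
print(link_is_idle(Port("endpoint_or_rc", [], True, [], [])))   # True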

118
ASPM Contd
When does the Link enter the L0s state?
 ASPM L0s is enabled.
 The Link has been idle for an implementation-specific period of time.

When does the Link start negotiating entry into the ASPM L1 state?
 ASPM L1 is enabled.
 Both sides of the Link must have been in the Electrical Idle state for an implementation-specific period
of time.
 For the upstream port of a switch:
All of the Switch's Downstream Port Links are in the L1 state (or deeper).

L0s:
This is a low-power ASPM (Active State Power Management) state. It takes a very short time (on
the order of 50ns) to transition from the L0s state back to the L0 state, because the LTSSM does not have
to go through the Recovery state.
Entry: after a transmitter sends, and the remote receiver receives, an Electrical Idle Ordered-Set while
in the L0 state.
Exit: the transition from the L0s state to the L0 state involves sending and receiving FTS Ordered-Sets. When
transitioning from L0s to L0, Lane-to-Lane de-skew must be performed, and Bit and Symbol Lock
must be re-established.
119
ASPM L1
Upcomp=0 (Downstream component)
The device blocks scheduling of new TLPs.
The device has received ACKs for all transmitted TLPs (i.e., the replay buffer is empty).
It has accumulated flow control credits sufficient to send a maximum-sized transaction.
PM_Active_State_Request_L1 DLLPs are sent continuously until a response is received from the opposite
end.

Upcomp=1 (Upstream component)
PM_Active_State_Request_L1 is received.
If the upstream component cannot satisfy the transition to the L1 state, it sends a
PM_Active_State_Nak message.
If the upstream component satisfies the conditions to enter the ASPM L1 state, then:
The device blocks scheduling of new TLPs.
The device receives ACKs for its last TLPs and then sends PM_Request_Ack DLLPs.

Upcomp=0 (Downstream component)
If a PM_Request_Ack DLLP is received, the device disables the DLL, sends EIOS and enters
Electrical Idle by disabling the Phy Layer.
If a PM_Active_State_Nak TLP is received and the conditions for L0s entry are satisfied, the downstream
component enters the L0s state; otherwise it stays in the L0 state.
120
ASPM L1 Contd
Upcomp=1 (Upstream component)
While PM_Request_Ack DLLPs are being sent continuously, it waits for an EIOS from the downstream
component.
After receiving the EIOS, the upstream component disables the DLL, sends EIOS and enters
Electrical Idle.
If a PM_Active_State_Nak TLP was sent and the Link satisfies the conditions for L0s entry, it enters
the L0s state; otherwise it stays in the L0 state.

Exit From the ASPM L1 state:

 The procedure is the same for both the upstream and downstream components.

 When switches are involved in exiting from the ASPM L1 state, the other switch ports that are in the ASPM L1
low-power state must also transition to the L0 state.

 When there is a TLP ready to transmit, the Link transitions to the L0 state through Recovery. DLLP
timers are frozen during the L1 state, so the Link never returns to the L0 state merely to transmit a
DLLP.

121
POWER MANAGEMENT

122
PM Introduction
 Sleeping State:
A computer state where the computer consumes a small amount of power, user-mode threads are not
being executed, and the system appears to be off (from an end user's perspective the display is off, etc.).
The latency for returning to the Working state varies depending on the wakeup environment
selected prior to entry into this state (for example, should the system answer phone calls, etc.).
Work can be resumed without rebooting the operating system because large elements of system
context are saved by the hardware and the rest by system software. It is not safe to disassemble the
machine in this state.
 Soft-Off State:
A computer state where the computer consumes a minimal amount of power. No user mode or
system mode code is run. This state requires a large latency in order to return to the Working state.
The system must be restarted to return to the Working state. The system’s context will not be preserved
by the hardware. It is not safe to disassemble the machine.
 Working State:
A computer state where the system dispatches user mode (application) threads and they execute. In
this state, devices (peripherals) are dynamically having their power state changed. The user will be able to
select (through some user interface) various performance/power characteristics of the system to have the
software optimize for performance or battery life. The system responds to external events in real time. It is
not safe to disassemble the machine in this state.
123
PM Introduction Contd
Configuration context:
It consists of data written to the configuration space registers of the PCI device by either BIOS
(system firmware) during POST or by the operating system. Configuration context is additional to PME
Context. Configuration Context is specific register data that is either preserved (returned) or not
preserved (lost) across a D3 state transition. It does not imply or cause an action.
Ex: Device Control (or Command), Device Status, Cache Line Size, Latency Timer, Interrupt Line,
Base Address (all), Message Control, Message Address, Message Data.
PCI Function Context
The variable data held by the PCI function, usually volatile. Function context refers to small
amounts of information held internal to the function. Function context is not limited only to the contents
of the function’s PCI registers, but rather refers also to the operational states of the function including
state machine context, power state, execution stack (in some cases), etc.
PME(Power Management Event):
A PME is the process by which a PCI function can request a change of its power consumption state.
Typically, a device uses a PME to request a change from a power savings state to the fully operational
(and fully powered) state. However, a device could use a PME to request a change to a lower power
state.
A PME is requested via the assertion of the PME# signal when enabled by software. The power
management policies of the system ultimately dictate what action is taken as a result of a PME.
124
PM Introduction Contd
PME Context:
PME Context is defined as the logic responsible for identifying PMEs, the logic responsible for
generating the PME# signal, and fields (PME_En and PME_Status bits) within PM Capability register
that provide the standard system interface for this functionality. PME Context also contains any device
class specific status that must survive the transition to the D0 Uninitialized state as well.
If a function supports PME# generation from D3cold, its PME Context is not affected by a PCI
Bus Segment Reset (hardware component reset). This is because the function's PME functionality itself
may have been responsible for the wake event which caused the transition back to D0. Therefore, the
PME Context must be preserved for the system software to process.
If PME# generation is not supported from D3 cold , then all PME Context is initialized with the
assertion of a bus segment reset.
Wake Up Event:
An event which can be enabled to wake the system from a Sleeping or Soft Off state to a Working
state to allow some task to be performed.

125
Device PM
PCI Function PM:

PM Capability registers:

126
Device PM Contd
PM Capability Register
Bit layout: PME_Support[15:11], D2_Support[10], D1_Support[9], Aux_Current[8:6], DSI[5], RSVD[4:3],
Version[2:0] (= 3'b011)

PME_Support[15:11]:
XXXX1: PME# can be asserted from D0
XXX1X: PME# can be asserted from D1
XX1XX: PME# can be asserted from D2
X1XXX: PME# can be asserted from D3-hot
1XXXX: PME# can be asserted from D3-Cold
Aux_Current[8:6]
For functions that support PME# from D3-Cold and do not implement the Data register, this field reports
the 3.3Vaux auxiliary current requirements for the PCI function.
DSI[5] (Device Specific Initialization):
The Device Specific Initialization bit indicates whether special initialization of this function is required
(beyond the standard PCI configuration header) before the generic class device driver is able to use it.
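As a decoding aid for the field layout above, here is a small Python sketch that unpacks a 16-bit PMC register
value into the named fields; the function name and the example register value are made up for illustration.

def decode_pmc(pmc: int) -> dict:
    """Unpack the 16-bit Power Management Capabilities (PMC) register fields."""
    pme_support = (pmc >> 11) & 0x1F
    return {
        "Version":     pmc & 0x7,          # 3'b011 per the layout above
        "DSI":         (pmc >> 5) & 0x1,   # Device Specific Initialization required
        "Aux_Current": (pmc >> 6) & 0x7,   # 3.3Vaux current requirement encoding
        "D1_Support":  (pmc >> 9) & 0x1,
        "D2_Support":  (pmc >> 10) & 0x1,
        "PME_from_D0":     bool(pme_support & 0x01),
        "PME_from_D1":     bool(pme_support & 0x02),
        "PME_from_D2":     bool(pme_support & 0x04),
        "PME_from_D3hot":  bool(pme_support & 0x08),
        "PME_from_D3cold": bool(pme_support & 0x10),
    }

# Hypothetical value: D1/D2 supported, PME# from D3-hot and D3-cold, version 011b.
print(decode_pmc(0xC603))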
127
Device PM Contd
PMCSR (Power Management Control and Status Register):
Bit layout: PME_Status[15], Data_Scale[14:13], Data_Select[12:9], PME_En[8], Rsvd[7:4], No_Soft_Reset[3],
Rsvd[2], Power_State[1:0]

 power_state:
This field determines the current power state of a function and to set the function to a new power
state. (0: D0, 1: D1, 2:D2, 3:D3-Hot).
 No_Soft_Reset:
When clear (“0”), devices do perform an internal reset upon transitioning from D3 hot to D0 via
software control of the PowerState bits. Configuration Context is lost when performing the soft reset.
Upon transition from the D3 hot to the D0 state, full reinitialization sequence is needed to return the
device to D0 Initialized State.
When set (“1”), Configuration Context is preserved. Upon transition from the D3 hot to the D0 Initialized
state, no additional operating system intervention is required to preserve Configuration Context
beyond writing the PowerState bits.
 PME_En:
This bit defaults to “0” if the function does not support PME# generation from D3 cold .
If the function supports PME# from D3 cold , then this bit is sticky and must be explicitly cleared by
the operating system each time it is initially loaded.
128
Device PM Contd
 PME_Status:
This bit is set when the function would normally assert the PME# signal independent of the state of
the PME_En bit.
This bit defaults to “0” if the function does not support PME# generation from D3 cold.
If the function supports PME# from D3 cold , then this bit is sticky and must be explicitly cleared by
the operating system each time the operating system is initially loaded.
 Data_Scale:
This 2-bit read-only field indicates the scaling factor to be used when interpreting the value of the
Data register. The value and meaning of this field will vary depending on which data value has been
selected by the Data_Select field.
 Data_Select:
This field is used to select which data is to be reported through the Data register and Data_Scale
field. (RW register field).
 Data:
The Data register is an optional, 8-bit read-only register that provides a mechanism for the function to
report state dependent operating data such as power consumed or heat dissipation. Typically the data
returned through the Data register is a static copy (look up table, for example) of the function’s worst case
“DC characteristics” data sheet. This data, when made available to system software, could then be used
to intelligently make decisions about power budgeting, cooling requirements, etc.
129
Device PM Contd
This register is used to report the state dependent data requested by the Data_Select field. The
value of this register is scaled by the value reported by the Data_Scale field.
If the Data register is not implemented, then the Data_Select and Data_Scale fields are treated as reserved.
Power consumption/dissipation reporting:

Data_Select   Data_Register[7:0] reports
0             D0 power consumed
1             D1 power consumed
2             D2 power consumed
3             D3 power consumed
4             D0 power dissipated
5             D1 power dissipated
6             D2 power dissipated
7             D3 power dissipated
8             Common logic power consumption (valid only for function 0 of a multi-function device)
9-15          Reserved

Data_Scale[14:13] encoding: 0 = unknown, 1 = 0.1x, 2 = 0.01x, 3 = 0.001x
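Putting the Data_Select table and the Data_Scale encoding together, the reported power is simply the Data
register value multiplied by the scale factor. A minimal sketch, with a made-up example read-back:

DATA_SELECT_MEANING = {
    0: "D0 power consumed", 1: "D1 power consumed",
    2: "D2 power consumed", 3: "D3 power consumed",
    4: "D0 power dissipated", 5: "D1 power dissipated",
    6: "D2 power dissipated", 7: "D3 power dissipated",
    8: "Common logic power consumption (function 0 of a multi-function device)",
}

DATA_SCALE_FACTOR = {0: None, 1: 0.1, 2: 0.01, 3: 0.001}   # 0 means "unknown"

def reported_power(data_select: int, data_scale: int, data: int):
    """Interpret the Data register using the Data_Select and Data_Scale fields."""
    meaning = DATA_SELECT_MEANING.get(data_select, "Reserved")
    factor = DATA_SCALE_FACTOR[data_scale]
    watts = None if factor is None else data * factor
    return meaning, watts

# Hypothetical read-back: Data_Select=0, Data_Scale=2 (0.01x), Data=150 -> 1.5 W consumed in D0.
print(reported_power(0, 2, 150))   # ('D0 power consumed', 1.5)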
130
Device PM Contd
Device PM State:
 In the D3-hot state, full context may be preserved (if the No_Soft_Reset bit is set) and the function can be
returned to D0 Initialized by writing the D0 state command to the PMCSR register. A system-level reset is
not required here.
 In the device state D3-cold, function context need not be maintained. However, if PMEs are supported from
D3, then PME context must be retained at a minimum. When the function is brought back to D0
Uninitialized from D3-cold (the only legal state transition from D3-cold), software will need to perform a
full re-initialization of the function, including its PCI Configuration Space.
A system-level reset is required here.
Only the auxiliary power source is available; main power is not supplied from the normal
Vcc power plane.
 D1/D2/D3-Hot States
A Function must not initiate any Request TLPs on the Link, with the exception of a PME Message.
Configuration and Message Requests are the only TLPs accepted by a Function; all other received
Requests must be handled as Unsupported Requests, and all received Completions may optionally be
handled as Unexpected Completions.
If an error caused by a received TLP (e.g., an Unsupported Request) is detected while in a non-D0 state, and
reporting is enabled, the Link must be returned to L0 (if it is not already in L0) and an error message must be
sent. If an error caused by an event other than a received TLP (e.g., a Completion Timeout) is detected
while in D1, an error message must be sent when the Function is programmed back to the D0 state.
131
References
1. PCI Express Technology: Comprehensive Guide to Generations 1.x, 2.x and 3.0
Authors: Mike Jackson and Ravi Budruk
ISBN: 978-0-9836465-2-5
Publisher: MindShare Press
Weblink: https://www.mindshare.com/Learn/PCI_Express/Books

2. PCI Express Specifications
Weblink: https://pcisig.com/specifications/pciexpress/

3. PHY Interface for the PCI Express Architecture
Weblink: http://www.applistar.com/wp-content/uploads/apps/pipe2_00.pdf

132
Thank You

133
