Testing PCIe graphics cards on Pi5 #7072
Draft: 6by9 wants to merge 670 commits into raspberrypi:rpi-6.17.y from 6by9:rpi-6.17.y-pcie-gpu
+249,331
−6,546
d352ac5 to d783ebd
This was referenced Oct 1, 2025
Some hardware will implement transpose as a rotation operation, which when combined with X and Y reflect can result in a rotation, but is a discrete operation in its own right. Add an option for transpose only. Signed-off-by: Dave Stevenson <[email protected]>
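The relationship between transpose, reflection and rotation described above can be checked on a small array. This is only an illustrative sketch on plain Python lists; the function names are made up for the example and are not the DRM property names, and the mapping of "X" and "Y" reflection onto a particular axis is an assumption.

```python
def transpose(img):
    """Swap rows and columns (mirror along the main diagonal)."""
    return [list(row) for row in zip(*img)]


def reflect_x(img):
    """Mirror left-to-right by reversing every row (assumed meaning of 'X reflect')."""
    return [row[::-1] for row in img]


def reflect_y(img):
    """Mirror top-to-bottom by reversing the row order (assumed meaning of 'Y reflect')."""
    return img[::-1]


def rotate_90_cw(img):
    """Reference 90 degree clockwise rotation."""
    return [list(row) for row in zip(*img[::-1])]


def rotate_90_ccw(img):
    """Reference 90 degree counter-clockwise rotation."""
    return [list(row) for row in zip(*img)][::-1]


img = [[1, 2, 3],
       [4, 5, 6]]

# Transpose followed by a reflection gives a rotation...
cw = reflect_x(transpose(img))
ccw = reflect_y(transpose(img))
# ...but transpose on its own matches neither rotation.
plain = transpose(img)
```

Here `cw` equals `rotate_90_cw(img)` and `ccw` equals `rotate_90_ccw(img)`, while `plain` is a distinct result, which is why a transpose-only option is a separate case.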
Some connectors, particularly writeback, can implement flip or transpose operations as writing back to memory. Add a connector rotation property to control this. Signed-off-by: Dave Stevenson <[email protected]>
For devices where transfer lengths are not known upfront, there is a danger when the destination is wider than the source that partial words can be lost at the end of a transfer. Ideally the controller would be able to flush the residue, but it can't - it's not even possible to tell that there is any. Instead, allow the client driver to avoid the problem by setting a smaller width. Signed-off-by: Phil Elwell <[email protected]>
SPI transfers are of defined length, unlike some UART traffic, so it is safe to let the DMA controller choose a suitable memory width. Signed-off-by: Phil Elwell <[email protected]>
In order to avoid losing residue bytes when a receive is terminated early, set the destination width to single bytes. Link: raspberrypi#6365 Signed-off-by: Phil Elwell <[email protected]>
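The residue problem in these DMA commits can be pictured with a toy model. This is a simplified sketch, not the controller's real behaviour: it assumes the controller only writes complete destination words and silently drops whatever is left when the transfer stops early. `dma_receive` is a hypothetical helper.

```python
def dma_receive(src_bytes, dst_width):
    """Model a receive where bytes are packed into dst_width-byte writes.

    Only complete destination words reach memory; a partial word left in
    the controller when the transfer is terminated is lost (assumption).
    """
    complete = len(src_bytes) - (len(src_bytes) % dst_width)
    return bytes(src_bytes[:complete])


received = b"hello!!"  # 7 bytes arrive, then the transfer is stopped

wide = dma_receive(received, 4)    # 4-byte destination width
narrow = dma_receive(received, 1)  # single-byte destination width
```

With a 4-byte destination width only `b"hell"` lands in memory and three bytes vanish; with a 1-byte width all seven bytes arrive, which is the reason for forcing single-byte writes when the length is not known up front.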
The xHC may commence Host Initiated Data Moves for streaming endpoints - see USB3.2 spec s8.12.1.4.2.4. However, this behaviour is typically counterproductive as the submission of UAS URBs in {Status, Data, Command} order and 1 outstanding IO per stream ID means the device never enters Move Data after a HIMD for Status or Data stages with the same stream ID. For OUT transfers this is especially inefficient as the host will start transmitting multiple bulk packets as a burst, all of which get NAKed by the device - wasting bandwidth. Also, some buggy UAS adapters don't properly handle the EP flow control state this creates - e.g. RTL9210. Set Host Initiated Data Move Disable to always defer stream selection to the device. xHC implementations may treat this field as "don't care, forced to 1" anyway - xHCI 1.2 s4.12.1. Signed-off-by: Jonathan Bell <[email protected]>
Attempting to start a non-idle channel causes an error message to be logged, and is inefficient. Test for emptiness of the desc_issued list before doing so. Signed-off-by: Phil Elwell <[email protected]>
The Raspberry Pi RP1 includes 2 M3 cores running firmware. This driver adds a mailbox communication channel to them via a doorbell and some shared memory. Signed-off-by: Phil Elwell <[email protected]>
The RP1 firmware runs a simple communications channel over some shared memory and a mailbox. This driver provides access to that channel. Signed-off-by: Phil Elwell <[email protected]>

firmware: rp1: Simplify rp1_firmware_get
Simplify the implementation of rp1_firmware_get, requiring its clients to have a valid 'firmware' property. Also make it return NULL on error. Link: raspberrypi#6593 Signed-off-by: Phil Elwell <[email protected]>

firmware: rp1: Linger on firmware failure
To avoid pointless retries, let the probe function succeed if the firmware interface is configured correctly but the firmware is incompatible. The value of the private drvdata field holds the outcome. Link: raspberrypi#6642 Signed-off-by: Phil Elwell <[email protected]>

firmware: rp1: Rename to rp1-fw to avoid module name collision
There is already the driver in drivers/mfd/rp1.ko, so having drivers/firmware/rp1.ko can cause issues when using modinfo and similar, and we can get errors with "Module rp1 is already loaded" when trying to load it. Rename the module so that the name is unique. Signed-off-by: Dave Stevenson <[email protected]>

mailbox: rp1: Don't claim channels in of_xlate
The of_xlate method saves the calculated event mask in the con_priv field. It also rejects subsequent attempts to use that channel because the mask is non-zero, which causes a repeated instantiation of a client driver to fail. The of_xlate method is not meant to be a point of resource acquisition. Leave the con_priv initialisation, but drop the test that it was previously zero. Signed-off-by: Phil Elwell <[email protected]>
Provide remote access to the PIO hardware in RP1. There is a single instance, with 4 state machines. Signed-off-by: Phil Elwell <[email protected]>

misc: rp1-pio: Support larger data transfers
Add a separate IOCTL for larger transfers with a 32-bit data_bytes field. See: raspberrypi/utils#107 Signed-off-by: Phil Elwell <[email protected]>

misc: rp1-pio: More logical probe sequence
Sort the probe function initialisation into a more logical order. Signed-off-by: Phil Elwell <[email protected]>

misc: rp1-pio: Minor cosmetic tweaks
No functional change. Signed-off-by: Phil Elwell <[email protected]>

misc: rp1-pio: Add in-kernel DMA support
Add kernel-facing implementations of pio_sm_config_xfer and pio_sm_xfer_data. Signed-off-by: Phil Elwell <[email protected]>

misc: rp1-pio: Handle probe errors
Ensure that rp1_pio_open fails if the device failed to probe. Link: raspberrypi#6593 Signed-off-by: Phil Elwell <[email protected]>

misc: rp1-pio: SM_CONFIG_XFER32 = larger DMA bufs
Add an ioctl type - SM_CONFIG_XFER32 - that takes uints for the buf_size and buf_count values. Signed-off-by: Phil Elwell <[email protected]>

misc/rp1-pio: Fix copy/paste error in pio_rp1.h
As per the subject, there was a copy/paste error that caused pio_sm_unclaim from a driver to result in a call to pio_sm_claim. Fix it. Signed-off-by: Phil Elwell <[email protected]>

misc: rp1-pio: Fix parameter checks without client
Passing bad parameters to an API call without a pio pointer will cause a NULL pointer exception when the persistent error is set. Guard against that. Signed-off-by: Phil Elwell <[email protected]>

misc: rp1-pio: Convert floats to 24.8 fixed point
Floating point arithmetic is not supported in the kernel, so use fixed point instead. Signed-off-by: Phil Elwell <[email protected]>

misc: rp1-pio: Error out on incompatible firmware
If the RP1 firmware has reported an error then return that from the PIO probe function, otherwise defer the probing. Link: raspberrypi#6642 Signed-off-by: Phil Elwell <[email protected]>

misc: rp1-pio: Demote fw probe error to warning
Support for the RP1 firmware mailbox API is rolling out to Pi 5 EEPROM images. For most users, the fact that the PIO is not available is no cause for alarm. Change the message to a warning, so that it does not appear with "quiet" in cmdline.txt. Link: raspberrypi#6642 Signed-off-by: Phil Elwell <[email protected]>

misc: rp1-pio: Don't just reuse the same DMA buf
A missing pointer increment meant that not only was the same buffer being reused again and again, there was also no protection against using it simultaneously for multiple transfers. Fix that basic bug, and also move a similar increment to before the transfer is started, which feels less racy. See: raspberrypi#6919 Signed-off-by: Phil Elwell <[email protected]>
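For readers unfamiliar with the 24.8 fixed point format mentioned in the rp1-pio commits, the sketch below shows the general idea: a value is stored as an integer scaled by 2^8, leaving 24 bits for the whole part and 8 bits for the fraction. It illustrates the number format only; which driver parameters use it, and the helper names here, are assumptions.

```python
FRAC_BITS = 8             # 8 fractional bits, leaving 24 integer bits
ONE = 1 << FRAC_BITS      # 1.0 in 24.8 representation (256)


def to_fixed_24_8(value):
    """Convert a real number to 24.8 fixed point (done outside the kernel)."""
    return int(round(value * ONE))


def fixed_parts(fx):
    """Split a 24.8 value into its integer part and 8-bit fraction."""
    return fx >> FRAC_BITS, fx & (ONE - 1)


def fixed_mul(a, b):
    """Multiply two 24.8 values using integer arithmetic only."""
    return (a * b) >> FRAC_BITS


div = to_fixed_24_8(1.5)        # 384
scale = to_fixed_24_8(2.25)     # 576
product = fixed_mul(div, scale) # 864, i.e. 3.375 in 24.8
```

All arithmetic after the initial conversion is done on integers, which is what makes the format usable where floating point is unavailable.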
Use the PIO hardware on RP1 to implement a PWM interface. Signed-off-by: Phil Elwell <[email protected]>

pwm: rp1: use pwmchip_get_drvdata() instead of container_of()
The PWM framework may not embed struct pwm_chip within the driver’s private data. Using container_of() can result in accessing invalid memory or NULL pointers, especially after recent kernel changes. Switch to pwmchip_get_drvdata() to reliably access the driver data. This resolves kernel warnings and probe failures seen after updating from kernel 6.12.28 to 6.12.34 [1]. While at it, remove the now obsolete `struct pwm_chip chip` member from `struct pwm_pio_rp1`. [1] raspberrypi#6971 Signed-off-by: Nicolai Buchwitz <[email protected]>
ws2812-pio-rp1 is a PIO-based driver for WS2812 LEDs. It creates a character device in /dev, the default name of which is /dev/leds<n>, where <n> is the instance number. The number of LEDs should be set in the DT overlay, as should whether it is RGB or RGBW, and the default brightness. Write data to the /dev/* entry in a 4 bytes-per-pixel format in RGBW order: RR GG BB WW RR GG BB WW ... The white values are ignored unless the rgbw flag is set for the device. To change the brightness, write a single byte to offset 0, 255 being full brightness and 0 being off. Signed-off-by: Phil Elwell <[email protected]>
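A userspace sketch of the write format described above. The helper names are hypothetical, and the device path /dev/leds0 assumes the default name with instance number 0; only the 4-bytes-per-pixel RGBW layout and the single-byte brightness write come from the commit message.

```python
import os


def build_frame(pixels):
    """Pack (r, g, b, w) tuples as RR GG BB WW RR GG BB WW ..."""
    frame = bytearray()
    for r, g, b, w in pixels:
        frame += bytes((r, g, b, w))
    return bytes(frame)


def write_frame(path, pixels):
    """Write one full frame of pixel data to the LED character device."""
    fd = os.open(path, os.O_WRONLY)
    try:
        os.write(fd, build_frame(pixels))
    finally:
        os.close(fd)


def set_brightness(path, level):
    """Write a single byte at offset 0: 255 is full brightness, 0 is off."""
    if not 0 <= level <= 255:
        raise ValueError("brightness must be 0..255")
    fd = os.open(path, os.O_WRONLY)
    try:
        os.write(fd, bytes((level,)))
    finally:
        os.close(fd)


# Two pixels: pure red, then pure blue. The W byte is ignored on RGB strips.
frame = build_frame([(255, 0, 0, 0), (0, 0, 255, 0)])

# On a Pi 5 with the overlay loaded, usage might look like:
#   write_frame("/dev/leds0", [(255, 0, 0, 0), (0, 0, 255, 0)])
#   set_brightness("/dev/leds0", 64)
```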
Using increased bit depth for no reason increases power consumption, and differs from the behaviour prior to the conversion to use the HDMI helper functions. Initialise the state max_bpc and requested_max_bpc to the minimum value supported. This only affects Raspberry Pi, as the other users of the helpers (rockchip/inno_hdmi and sun4i) only support a bit depth of 8. Signed-off-by: Dave Stevenson <[email protected]>
DSI0 and DSI1 have different widths for the command FIFO (24bit vs 32bit), but the driver was assuming the 32bit width of DSI1 in all cases. DSI0 also wants the data packed as 24bit big endian, so the formatting code needs updating. Handle the difference via the variant structure. Signed-off-by: Dave Stevenson <[email protected]>
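To make the FIFO width difference concrete, the sketch below packs a payload into 24-bit big-endian words versus 32-bit words. It is an illustration only: the little-endian ordering used for the 32-bit case, the zero padding of a short final word, and the `pack_fifo_words` helper itself are assumptions and are not taken from the driver.

```python
def pack_fifo_words(payload, width_bytes, byteorder):
    """Split payload into FIFO words of width_bytes each.

    A short final chunk is zero-padded at the end (assumption), then each
    chunk is interpreted as an integer with the given byte order.
    """
    words = []
    for i in range(0, len(payload), width_bytes):
        chunk = payload[i:i + width_bytes].ljust(width_bytes, b"\x00")
        words.append(int.from_bytes(chunk, byteorder))
    return words


payload = bytes([0x11, 0x22, 0x33, 0x44])

# 24-bit command FIFO, data packed big endian (as stated for DSI0).
dsi0_words = pack_fifo_words(payload, 3, "big")
# 32-bit command FIFO; little-endian packing is assumed here for contrast.
dsi1_words = pack_fifo_words(payload, 4, "little")
```

The same four payload bytes need two writes to the 24-bit FIFO but only one to the 32-bit FIFO, so code that assumes the wider layout cannot simply be reused.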
The Raspberry Pi RP1 chip has the Cadence GEM ethernet controller, so add a compatible string for it. Signed-off-by: Dave Stevenson <[email protected]>
The RP1 chip has the Cadence GEM block, but wants the tx_clock to always run at 125MHz, in the same way as sama7g5. Add the relevant configuration. Signed-off-by: Dave Stevenson <[email protected]>
During normal operations, the cursor position update is done through an asynchronous plane update, which on the vc4 driver basically just modifies the right dlist word to move the plane to the new coordinates.

However, when we have the overscan margins set up, we fall back to a regular commit when we are next to the edges. And since that commit happens to be on a cursor plane, it's considered a legacy cursor update by KMS. The main difference it makes is that it won't wait for its completion (ie, next vblank) before returning. This means if we have multiple commits happening in rapid succession, we can have several of them happening before the next vblank.

In parallel, our dlist allocation is tied to a CRTC state, and each time we do a commit we end up with a new CRTC state, with the previous one being freed. This means that we free our previous dlist entry (but don't clear it though) every time a new one is being committed.

Now, if we were to have two commits happening before the next vblank, we could end up freeing and reusing the same dlist entries before the next vblank. Indeed, we would start from an initial state taking, for example, the dlist entries 10 to 20, then start a commit taking the entries 20 to 30 and setting the dlist pointer to 20, and freeing the dlist entries 10 to 20. However, since we haven't reached vblank yet, the HVS is still using the entries 10 to 20. If we were to make a new commit now, chances are the allocator is going to give the 10 to 20 entries back, and we would change their content to match the new state. If vblank hasn't happened yet, we just corrupted the active dlist entries.

A first attempt to solve this was made by creating an intermediate dlist buffer to store the current (ie, as of the last commit) dlist content, that we would update each time the HVS is done with a frame. However, if the interrupt handler missed the vblank window, we would end up copying our intermediate dlist to the hardware one during the composition, essentially creating the same issue.

Since making sure that our interrupt handler runs within a fixed, constrained, time window would require making Linux a real-time kernel, this seems a bit out of scope. Instead, we can work around our original issue by keeping the dlist slots allocation longer. That way, we won't reuse a dlist slot while it's still in flight. In order to achieve this, instead of freeing the dlist slot when its associated CRTC state is destroyed, we'll queue it in a list.

A naive implementation would free the buffers in that queue when we get our end of frame interrupt. However, there's still a race since, just like in the shadow dlist case, we don't control when the handler for that interrupt is going to run. Thus, we can end up with a commit adding an old dlist allocation to our queue during the window between our actual interrupt and when our handler will run. And since that buffer is still being used for the composition of the current frame, we can't free it right away, exposing us to the original bug.

Fortunately for us, the hardware provides a frame counter that is increased each time the first line of a frame is being generated. Associating with each allocation the frame counter value at which its image is supposed to go away, and then only deallocating buffers whose counter is below or equal to the one we see when the deallocation code runs, should prevent the above race from occurring. Signed-off-by: Maxime Ripard <[email protected]>
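The deferred-free scheme in this commit message can be modelled in a few lines. This is a toy simulation, not the vc4 code: the class and method names are invented, slots are single integers rather than dlist ranges, and wraparound of the hardware frame counter is ignored.

```python
class DlistSlots:
    """Toy allocator that delays freeing until the hardware frame counter
    shows the old dlist can no longer be in use."""

    def __init__(self, count):
        self.free_slots = set(range(count))
        self.stale = []  # (slot, frame at which the slot stops being used)

    def alloc(self):
        slot = min(self.free_slots)
        self.free_slots.remove(slot)
        return slot

    def retire(self, slot, current_frame):
        # The HVS keeps reading this slot until the next frame starts,
        # so record that frame number instead of freeing immediately.
        self.stale.append((slot, current_frame + 1))

    def reap(self, hw_frame_counter):
        # Free only the slots whose target frame has been reached.
        remaining = []
        for slot, target in self.stale:
            if target <= hw_frame_counter:
                self.free_slots.add(slot)
            else:
                remaining.append((slot, target))
        self.stale = remaining


slots = DlistSlots(4)
first = slots.alloc()                 # slot 0 is now being scanned out
slots.retire(first, current_frame=5)  # a new commit replaces it mid-frame
slots.reap(hw_frame_counter=5)        # handler runs, still in frame 5
second = slots.alloc()                # must not hand slot 0 back yet
slots.reap(hw_frame_counter=6)        # next frame has started
third = slots.alloc()                 # slot 0 is safe to reuse now
```

Because the decision depends on the counter value rather than on when the interrupt handler happens to run, a late handler cannot free a slot that the current frame still needs.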
Users are reporting running out of DLIST memory. Add a debugfs file to dump out all the allocations. Signed-off-by: Dave Stevenson <[email protected]>
We have a read-modify-write race when updating SCALER_DISPCTRL for underrun and end-of-frame interrupts. Ideally it would be fixed via a spinlock or similar, but that will require a reasonable amount of study to ensure we don't get deadlocks. The underrun reporting is only for debug, so disable it for now. Signed-off-by: Dave Stevenson <[email protected]>
The dmabuf import already checks that the backing buffer is contiguous and rejects it if it isn't. vc4 also requires that the buffer is in the bottom 1GB of RAM, and this is all correctly defined via dma-ranges. However the kernel silently uses swiotlb to bounce dma buffers around if they are in the wrong region. This relies on dma sync functions to be called in order to copy the data to/from the bounce buffer. DRM is based on all memory allocations being coherent with the GPU so that any updates to a framebuffer will be acted on without the need for any additional update. This is fairly fundamentally incompatible with needing to call dma_sync_ to handle the bounce buffer copies, and therefore we have to detect and reject mappings that use bounce buffers. Signed-off-by: Dave Stevenson <[email protected]>
DSI0 is misbehaving and needs to action things on vblank to work around it. Add a new hook to call across during vblank. Signed-off-by: Dave Stevenson <[email protected]>
The initialisation sequence differs slightly from the documentation in that the clocks are meant to be running before resets and similar. Signed-off-by: Dave Stevenson <[email protected]>
vc4_dsi_bridge_disable wasn't resetting things during shutdown, so add that in. Signed-off-by: Dave Stevenson <[email protected]>
The block must be enabled for the FIFO resets to be actioned, so ensure this is the case. Signed-off-by: Dave Stevenson <[email protected]>
This is largely to test a previous change that made IOMMU aperture configurable and allocated lazily; it may be useful in its own right. We expect IOMMU2 to be well-utilized e.g. when using 64MPix cameras. Signed-off-by: Nick Hollinghurst <[email protected]>
Register line_length_pix was being written by both the tables of registers and the control handler for V4L2_CID_HBLANK. Remove the duplication in the tables. Signed-off-by: Dave Stevenson <[email protected]>
line_length_pix is a value that the developer wants to know, so write the values in decimal. Signed-off-by: Dave Stevenson <[email protected]>
The frame length default value doesn't change dynamically, and neither do any of the other parameters that configure it, so precompute it instead of working from a frame duration to get to the value. The minimum value was also computed, when actually the sensor will take any value down to 4 lines. Signed-off-by: Dave Stevenson <[email protected]>
This removes a load of boilerplate code around how registers are grouped into multiple word values. Signed-off-by: Dave Stevenson <[email protected]>
There are a fair number of registers duplicated in all the mode tables, so move those into the common table. Signed-off-by: Dave Stevenson <[email protected]>
For 4k30 recording we want 16:9 output, so add a cropped mode to achieve this. Signed-off-by: Dave Stevenson <[email protected]>
Rather than the hard coded PLL settings for fixed frequencies, compute the PLL settings based on device tree, validating that the specified rate can be achieved. Signed-off-by: Dave Stevenson <[email protected]>
As we now support variable link frequency, compute the minimum line_length value that the sensor will work with, and set V4L2_CID_HBLANK based on that number. Signed-off-by: Dave Stevenson <[email protected]>
Now that the link frequency can be varied, write the link bit rate registers to reflect the speed being used. Signed-off-by: Dave Stevenson <[email protected]>
The timing registers configured are for 450MHz. If running at a different link frequency, use the automatic timing control. Signed-off-by: Dave Stevenson <[email protected]>
The sensor supports readout as 10 or 12 bit. As we are now computing the horizontal blanking limits dynamically, adding support for both readout modes falls out trivially, so add them both. Signed-off-by: Dave Stevenson <[email protected]>
8 bit readout is only a reconfiguration of the CSI2 block, and recomputation of horizontal blanking. Enable it. Signed-off-by: Dave Stevenson <[email protected]>
Change-Id: Ic95c8514271d246dd668631810e8dee210f7f1b4 Signed-off-by: Yanghaku <[email protected]>
Signed-off-by: Dave Stevenson <[email protected]>
Taken from https://github.com/chimera-linux/cports/blob/master/main/linux-stable/patches/xe-nonx86.patch Signed-off-by: Dave Stevenson <[email protected]>
…node") We lost a line in the forward port, which meant that it always used /dev/fb0, and complained that the sysfs nodes already existed. Fixes: 5769e04 ("fbdev: Allow client to request a particular /dev/fbN node") Signed-off-by: Dave Stevenson <[email protected]>
Signed-off-by: Dave Stevenson <[email protected]>
Resizing BARs can be blocked when a device in the bridge hierarchy itself consumes resources from the resized range. This scenario is common with Intel Arc DG2 GPUs where the following is a typical topology:

    +-[0000:5d]-+-00.0-[5e-61]----00.0-[5f-61]--+-01.0-[60]----00.0  Intel Corporation DG2 [Arc A380]
                                                \-04.0-[61]----00.0  Intel Corporation DG2 Audio Controller

Here the system BIOS has provided a large 64bit, prefetchable window:

    pci_bus 0000:5d: root bus resource [mem 0xb000000000-0xbfffffffff window]

But only a small portion is programmed into the root port aperture:

    pci 0000:5d:00.0: bridge window [mem 0xbfe0000000-0xbff07fffff 64bit pref]

The upstream port then provides the following aperture:

    pci 0000:5e:00.0: bridge window [mem 0xbfe0000000-0xbfefffffff 64bit pref]

With the missing range found to be consumed by the switch port itself:

    pci 0000:5e:00.0: BAR 0 [mem 0xbff0000000-0xbff07fffff 64bit pref]

The downstream port above the GPU provides the same aperture as upstream:

    pci 0000:5f:01.0: bridge window [mem 0xbfe0000000-0xbfefffffff 64bit pref]

Which is entirely consumed by the GPU:

    pci 0000:60:00.0: BAR 2 [mem 0xbfe0000000-0xbfefffffff 64bit pref]

In summary, iomem reports the following:

    b000000000-bfffffffff : PCI Bus 0000:5d
      bfe0000000-bff07fffff : PCI Bus 0000:5e
        bfe0000000-bfefffffff : PCI Bus 0000:5f
          bfe0000000-bfefffffff : PCI Bus 0000:60
            bfe0000000-bfefffffff : 0000:60:00.0
        bff0000000-bff07fffff : 0000:5e:00.0

The GPU at 0000:60:00.0 supports a Resizable BAR:

    Capabilities: [420 v1] Physical Resizable BAR
        BAR 2: current size: 256MB, supported: 256MB 512MB 1GB 2GB 4GB 8GB

However when attempting a resize we get -ENOSPC:

    pci 0000:60:00.0: BAR 2 [mem 0xbfe0000000-0xbfefffffff 64bit pref]: releasing
    pcieport 0000:5f:01.0: bridge window [mem 0xbfe0000000-0xbfefffffff 64bit pref]: releasing
    pcieport 0000:5e:00.0: bridge window [mem 0xbfe0000000-0xbfefffffff 64bit pref]: releasing
    pcieport 0000:5e:00.0: bridge window [mem size 0x200000000 64bit pref]: can't assign; no space
    pcieport 0000:5e:00.0: bridge window [mem size 0x200000000 64bit pref]: failed to assign
    pcieport 0000:5f:01.0: bridge window [mem size 0x200000000 64bit pref]: can't assign; no space
    pcieport 0000:5f:01.0: bridge window [mem size 0x200000000 64bit pref]: failed to assign
    pci 0000:60:00.0: BAR 2 [mem size 0x200000000 64bit pref]: can't assign; no space
    pci 0000:60:00.0: BAR 2 [mem size 0x200000000 64bit pref]: failed to assign
    pcieport 0000:5d:00.0: PCI bridge to [bus 5e-61]
    pcieport 0000:5d:00.0: bridge window [mem 0xb9000000-0xba0fffff]
    pcieport 0000:5d:00.0: bridge window [mem 0xbfe0000000-0xbff07fffff 64bit pref]
    pcieport 0000:5e:00.0: PCI bridge to [bus 5f-61]
    pcieport 0000:5e:00.0: bridge window [mem 0xb9000000-0xba0fffff]
    pcieport 0000:5e:00.0: bridge window [mem 0xbfe0000000-0xbfefffffff 64bit pref]
    pcieport 0000:5f:01.0: PCI bridge to [bus 60]
    pcieport 0000:5f:01.0: bridge window [mem 0xb9000000-0xb9ffffff]
    pcieport 0000:5f:01.0: bridge window [mem 0xbfe0000000-0xbfefffffff 64bit pref]
    pci 0000:60:00.0: BAR 2 [mem 0xbfe0000000-0xbfefffffff 64bit pref]: assigned

In this example we need to resize all the way up to the root port aperture, but we refuse to change the root port aperture while resources are allocated for the upstream port BAR.

The solution proposed here builds on the idea in commit 91fa127 ("PCI: Expose PCIe Resizable BAR support via sysfs") where the BAR can be resized while there is no driver attached. In this case, when there is no driver bound to the upstream switch port we'll release resources of the bridge which match the reallocation.

Therefore we can achieve the below successful resize operation by unbinding 0000:5e:00.0 from the pcieport driver before invoking the resource2_resize interface on the GPU at 0000:60:00.0.

    pci 0000:60:00.0: BAR 2 [mem 0xbfe0000000-0xbfefffffff 64bit pref]: releasing
    pcieport 0000:5f:01.0: bridge window [mem 0xbfe0000000-0xbfefffffff 64bit pref]: releasing
    pci 0000:5e:00.0: bridge window [mem 0xbfe0000000-0xbfefffffff 64bit pref]: releasing
    pci 0000:5e:00.0: BAR 0 [mem 0xbff0000000-0xbff07fffff 64bit pref]: releasing
    pcieport 0000:5d:00.0: bridge window [mem 0xbfe0000000-0xbff07fffff 64bit pref]: releasing
    pcieport 0000:5d:00.0: bridge window [mem 0xb000000000-0xb2ffffffff 64bit pref]: assigned
    pci 0000:5e:00.0: bridge window [mem 0xb000000000-0xb1ffffffff 64bit pref]: assigned
    pci 0000:5e:00.0: BAR 0 [mem 0xb200000000-0xb2007fffff 64bit pref]: assigned
    pcieport 0000:5f:01.0: bridge window [mem 0xb000000000-0xb1ffffffff 64bit pref]: assigned
    pci 0000:60:00.0: BAR 2 [mem 0xb000000000-0xb1ffffffff 64bit pref]: assigned
    pci 0000:5e:00.0: PCI bridge to [bus 5f-61]
    pci 0000:5e:00.0: bridge window [mem 0xb9000000-0xba0fffff]
    pci 0000:5e:00.0: bridge window [mem 0xb000000000-0xb1ffffffff 64bit pref]
    pcieport 0000:5d:00.0: PCI bridge to [bus 5e-61]
    pcieport 0000:5d:00.0: bridge window [mem 0xb9000000-0xba0fffff]
    pcieport 0000:5d:00.0: bridge window [mem 0xb000000000-0xb2ffffffff 64bit pref]
    pci 0000:5e:00.0: PCI bridge to [bus 5f-61]
    pci 0000:5e:00.0: bridge window [mem 0xb9000000-0xba0fffff]
    pci 0000:5e:00.0: bridge window [mem 0xb000000000-0xb1ffffffff 64bit pref]
    pcieport 0000:5f:01.0: PCI bridge to [bus 60]
    pcieport 0000:5f:01.0: bridge window [mem 0xb9000000-0xb9ffffff]
    pcieport 0000:5f:01.0: bridge window [mem 0xb000000000-0xb1ffffffff 64bit pref]

Link: https://patchwork.kernel.org/project/linux-pci/patch/[email protected]/
Signed-off-by: Alex Williamson <[email protected]>
Signed-off-by: Jonathan Bell <[email protected]>
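The unbind-then-resize sequence described above could be scripted roughly as follows. This is a hedged sketch: it assumes the sysfs encoding in which the value written to resourceN_resize is the bit position N meaning 2^N MB, it assumes no driver is bound to the GPU at the time of the resize, and the helper names are made up. The device addresses are the ones from the example topology and will differ on other systems. The script only builds the list of writes; applying them requires root.

```python
MIB = 1 << 20


def rebar_size_bit(size_bytes):
    """Return the bit position for a BAR size, where bit N means 2^N MB."""
    mb, rem = divmod(size_bytes, MIB)
    if rem or mb == 0 or mb & (mb - 1):
        raise ValueError("size must be a power-of-two number of MB")
    return mb.bit_length() - 1


def resize_steps(switch_port, gpu, bar, size_bytes):
    """Build the (sysfs path, value) writes for the sequence in the text."""
    return [
        # 1. Unbind the upstream switch port so its BAR can be released.
        ("/sys/bus/pci/drivers/pcieport/unbind", switch_port),
        # 2. Ask for the new BAR size on the GPU.
        (f"/sys/bus/pci/devices/{gpu}/resource{bar}_resize",
         str(rebar_size_bit(size_bytes))),
    ]


def apply_steps(steps):
    """Perform the writes (needs root; not called in this sketch)."""
    for path, value in steps:
        with open(path, "w") as f:
            f.write(value)


# 0x200000000 bytes (8GB) is the window size seen in the logs above.
steps = resize_steps("0000:5e:00.0", "0000:60:00.0", 2, 0x200000000)
```

Keeping the path construction separate from `apply_steps` makes it possible to print and review the writes before touching a live system.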
8b1e2b1 to fe3d35c
This was referenced Oct 8, 2025
df370fe to 06b5122
Wanted for the CI builds.