Skip to content

Conversation

6by9
Copy link
Contributor

@6by9 6by9 commented Sep 30, 2025

Wanted for the CI builds.

@geerlingguy
Copy link

👀

6by9 and others added 24 commits October 6, 2025 12:16
Some hardware will implement transpose as a rotation operation,
which when combined with X and Y reflect can result in a rotation,
but is a discrete operation in its own right.

Add an option for transpose only.

Signed-off-by: Dave Stevenson <[email protected]>
Some connectors, particularly writeback, can implement flip
or transpose operations as writing back to memory.

Add a connector rotation property to control this.

Signed-off-by: Dave Stevenson <[email protected]>
For devices where transfer lengths are not known upfront, there is a
danger when the destination is wider than the source that partial words
can be lost at the end of a transfer. Ideally the controller would be
able to flush the residue, but it can't - it's not even possible to tell
that there is any.

Instead, allow the client driver to avoid the problem by setting a
smaller width.

Signed-off-by: Phil Elwell <[email protected]>
SPI transfers are of defined length, unlike some UART traffic, so it is
safe to let the DMA controller choose a suitable memory width.

Signed-off-by: Phil Elwell <[email protected]>
In order to avoid losing residue bytes when a receive is terminated
early, set the destination width to single bytes.

Link: raspberrypi#6365

Signed-off-by: Phil Elwell <[email protected]>
The xHC may commence Host Initiated Data Moves for streaming endpoints -
see USB3.2 spec s8.12.1.4.2.4. However, this behaviour is typically
counterproductive as the submission of UAS URBs in {Status, Data,
Command} order and 1 outstanding IO per stream ID means the device never
enters Move Data after a HIMD for Status or Data stages with the same
stream ID. For OUT transfers this is especially inefficient as the host
will start transmitting multiple bulk packets as a burst, all of which
get NAKed by the device - wasting bandwidth.

Also, some buggy UAS adapters don't properly handle the EP flow control
state this creates - e.g. RTL9210.

Set Host Initiated Data Move Disable to always defer stream selection to
the device. xHC implementations may treat this field as "don't care,
forced to 1" anyway - xHCI 1.2 s4.12.1.

Signed-off-by: Jonathan Bell <[email protected]>
Attempting to start a non-idle channel causes an error message to be
logged, and is inefficient. Test for emptiness of the desc_issued list
before doing so.

Signed-off-by: Phil Elwell <[email protected]>
The Raspberry Pi RP1 includes 2 M3 cores running firmware. This driver
adds a mailbox communication channel to them via a doorbell and some
shared memory.

Signed-off-by: Phil Elwell <[email protected]>
The RP1 firmware runs a simple communications channel over some shared
memory and a mailbox. This driver provides access to that channel.

Signed-off-by: Phil Elwell <[email protected]>

firmware: rp1: Simplify rp1_firmware_get

Simplify the implementation of rp1_firmware_get, requiring its clients
to have a valid 'firmware' property. Also make it return NULL on error.

Link: raspberrypi#6593

Signed-off-by: Phil Elwell <[email protected]>

firmware: rp1: Linger on firmware failure

To avoid pointless retries, let the probe function succeed if the
firmware interface is configured correctly but the firmware is
incompatible. The value of the private drvdata field holds the outcome.

Link: raspberrypi#6642

Signed-off-by: Phil Elwell <[email protected]>

firmware: rp1: Rename to rp1-fw to avoid module name collision

There is already the driver in drivers/mfd/rp1.ko, so having
drivers/firmware/rp1.ko can cause issues when using modinfo
and similar, and we can get errors with "Module rp1 is already
loaded" when trying to load it.

Rename the module so that the name is unique.

Signed-off-by: Dave Stevenson <[email protected]>

mailbox: rp1: Don't claim channels in of_xlate

The of_xlate method saves the calculated event mask in the con_priv
field. It also rejects subsequent attempt to use that channel because
the mask is non-zero, which causes a repeated instantiation of a client
driver to fail.

The of_xlate method is not meant to be a point of resource acquisition.
Leave the con_priv initialisation, but drop the test that it was
previously zero.

Signed-off-by: Phil Elwell <[email protected]>
Provide remote access to the PIO hardware in RP1. There is a single
instance, with 4 state machines.

Signed-off-by: Phil Elwell <[email protected]>

misc: rp1-pio: Support larger data transfers

Add a separate IOCTL for larger transfer with a 32-bit data_bytes
field.

See: raspberrypi/utils#107

Signed-off-by: Phil Elwell <[email protected]>

misc: rp1-pio: More logical probe sequence

Sort the probe function initialisation into a more logical order.

Signed-off-by: Phil Elwell <[email protected]>

misc: rp1-pio: Minor cosmetic tweaks

No functional change.

Signed-off-by: Phil Elwell <[email protected]>

misc: rp1-pio: Add in-kernel DMA support

Add kernel-facing implementations of pio_sm_config_xfer and
pio_xm_xfer_data.

Signed-off-by: Phil Elwell <[email protected]>

misc: rp1-pio: Handle probe errors

Ensure that rp1_pio_open fails if the device failed to probe.

Link: raspberrypi#6593

Signed-off-by: Phil Elwell <[email protected]>

misc: rp1-pio: SM_CONFIG_XFER32 = larger DMA bufs

Add an ioctl type - SM_CONFIG_XFER32 - that takes uints for the buf_size
and buf_count values.

Signed-off-by: Phil Elwell <[email protected]>

misc/rp1-pio: Fix copy/paste error in pio_rp1.h

As per the subject, there was a copy/paste error that caused
pio_sm_unclaim from a driver to result in a call to
pio_sm_claim. Fix it.

Signed-off-by: Phil Elwell <[email protected]>

misc: rp1-pio: Fix parameter checks wihout client

Passing bad parameters to an API call without a pio pointer will cause
a NULL pointer exception when the persistent error is set. Guard
against that.

Signed-off-by: Phil Elwell <[email protected]>

misc: rp1-pio: Convert floats to 24.8 fixed point

Floating point arithmetic is not supported in the kernel, so use fixed
point instead.

Signed-off-by: Phil Elwell <[email protected]>

misc: rp1-pio: Error out on incompatible firmware

If the RP1 firmware has reported an error then return that from the PIO
probe function, otherwise defer the probing.

Link: raspberrypi#6642

Signed-off-by: Phil Elwell <[email protected]>

misc: rp1-pio: Demote fw probe error to warning

Support for the RP1 firmware mailbox API is rolling out to Pi 5 EEPROM
images. For most users, the fact that the PIO is not available is no
cause for alarm. Change the message to a warning, so that it does not
appear with "quiet" in cmdline.txt.

Link: raspberrypi#6642

Signed-off-by: Phil Elwell <[email protected]>

misc: rp1-pio: Don't just reuse the same DMA buf

A missing pointer increment meant that not only was the same buffer
being reused again and again, there was also no protection against
using it simultaneously for multiple transfers. Fix that basic bug, and
also move a similar increment to before the transfer is started, which
feels less racy.

See: raspberrypi#6919

Signed-off-by: Phil Elwell <[email protected]>
Use the PIO hardware on RP1 to implement a PWM interface.

Signed-off-by: Phil Elwell <[email protected]>

pwm: rp1: use pwmchip_get_drvdata() instead of container_of()

The PWM framework may not embed struct pwm_chip within the driver’s
private data. Using container_of() can result in accessing invalid
memory or NULL pointers, especially after recent kernel changes.

Switch to pwmchip_get_drvdata() to reliably access the driver data.
This resolves kernel warnings and probe failures seen after updating
from kernel 6.12.28 to 6.12.34 [1]

While at it remove the now obsolete `struct pwm_chip chip` member from
`struct pwm_pio_rp1`.

[1] raspberrypi#6971

Signed-off-by: Nicolai Buchwitz <[email protected]>
ws2812-pio-rp1 is a PIO-based driver for WS2812 LEDS. It creates a
character device in /dev, the default name of which is /dev/leds<n>,
where <n> is the instance number. The number of LEDS should be set
in the DT overlay, as should whether it is RGB or RGBW, and the default
brightness.

Write data to the /dev/* entry in a 4 bytes-per-pixel format in RGBW
order:

  RR GG BB WW RR GG BB WW ...

The white values are ignored unless the rgbw flag is set for the device.

To change the brightness, write a single byte to offset 0, 255 being
full brightness and 0 being off.

Signed-off-by: Phil Elwell <[email protected]>
Using increased bit depth for no reason increases power
consumption, and differs from the behaviour prior to the
conversion to use the HDMI helper functions.

Initialise the state max_bpc and requested_max_bpc to the
minimum value supported. This only affects Raspberry Pi,
as the other users of the helpers (rockchip/inno_hdmi and
sunx4i) only support a bit depth of 8.

Signed-off-by: Dave Stevenson <[email protected]>
DSI0 and DSI1 have different widths for the command FIFO (24bit
vs 32bit), but the driver was assuming the 32bit width of DSI1
in all cases.
DSI0 also wants the data packed as 24bit big endian, so the
formatting code needs updating.

Handle the difference via the variant structure.

Signed-off-by: Dave Stevenson <[email protected]>
The Raspberry Pi RP1 chip has the Cadence GEM ethernet
controller, so add a compatible string for it.

Signed-off-by: Dave Stevenson <[email protected]>
The RP1 chip has the Cadence GEM block, but wants the tx_clock
to always run at 125MHz, in the same way as sama7g5.
Add the relevant configuration.

Signed-off-by: Dave Stevenson <[email protected]>
During normal operations, the cursor position update is done through an
asynchronous plane update, which on the vc4 driver basically just
modifies the right dlist word to move the plane to the new coordinates.

However, when we have the overscan margins setup, we fall back to a
regular commit when we are next to the edges. And since that commit
happens to be on a cursor plane, it's considered a legacy cursor update
by KMS.

The main difference it makes is that it won't wait for its completion
(ie, next vblank) before returning. This means if we have multiple
commits happening in rapid succession, we can have several of them
happening before the next vblank.

In parallel, our dlist allocation is tied to a CRTC state, and each time
we do a commit we end up with a new CRTC state, with the previous one
being freed. This means that we free our previous dlist entry (but don't
clear it though) every time a new one is being committed.

Now, if we were to have two commits happening before the next vblank, we
could end up freeing reusing the same dlist entries before the next
vblank.

Indeed, we would start from an initial state taking, for example, the
dlist entries 10 to 20, then start a commit taking the entries 20 to 30
and setting the dlist pointer to 20, and freeing the dlist entries 10 to
20. However, since we haven't reach vblank yet, the HVS is still using
the entries 10 to 20.

If we were to make a new commit now, chances are the allocator are going
to give the 10 to 20 entries back, and we would change their content to
match the new state. If vblank hasn't happened yet, we just corrupted
the active dlist entries.

A first attempt to solve this was made by creating an intermediate dlist
buffer to store the current (ie, as of the last commit) dlist content,
that we would update each time the HVS is done with a frame. However, if
the interrupt handler missed the vblank window, we would end up copying
our intermediate dlist to the hardware one during the composition,
essentially creating the same issue.

Since making sure that our interrupt handler runs within a fixed,
constrained, time window would require to make Linux a real-time kernel,
this seems a bit out of scope.

Instead, we can work around our original issue by keeping the dlist
slots allocation longer. That way, we won't reuse a dlist slot while
it's still in flight. In order to achieve this, instead of freeing the
dlist slot when its associated CRTC state is destroyed, we'll queue it
in a list.

A naive implementation would free the buffers in that queue when we get
our end of frame interrupt. However, there's still a race since, just
like in the shadow dlist case, we don't control when the handler for
that interrupt is going to run. Thus, we can end up with a commit adding
an old dlist allocation to our queue during the window between our
actual interrupt and when our handler will run. And since that buffer is
still being used for the composition of the current frame, we can't free
it right away, exposing us to the original bug.

Fortunately for us, the hardware provides a frame counter that is
increased each time the first line of a frame is being generated.
Associating the frame counter the image is supposed to go away to the
allocation, and then only deallocate buffers that have a counter below
or equal to the one we see when the deallocation code should prevent the
above race from occurring.

Signed-off-by: Maxime Ripard <[email protected]>
Users are reporting running out of DLIST memory. Add a
debugfs file to dump out all the allocations.

Signed-off-by: Dave Stevenson <[email protected]>
We have a read-modify-write race when updating SCALER_DISPCTRL for
underrun and end-of-frame interrupts.
Ideally it would be fixed via a spinlock or similar, but that will
require a reasonable amount of study to ensure we don't get deadlocks.

The underrun reporting is only for debug, so disable it for now.

Signed-off-by: Dave Stevenson <[email protected]>
The dmabuf import already checks that the backing buffer is contiguous
and rejects it if it isn't. vc4 also requires that the buffer is
in the bottom 1GB of RAM, and this is all correctly defined via
dma-ranges.

However the kernel silently uses swiotlb to bounce dma buffers
around if they are in the wrong region. This relies on dma sync
functions to be called in order to copy the data to/from the
bounce buffer.

DRM is based on all memory allocations being coherent with the
GPU so that any updates to a framebuffer will be acted on without
the need for any additional update. This is fairly fundamentally
incompatible with needing to call dma_sync_ to handle the bounce
buffer copies, and therefore we have to detect and reject mappings
that use bounce buffers.

Signed-off-by: Dave Stevenson <[email protected]>
DSI0 is misbehaving and needs to action things on vblank to
work around it.
Add a new hook to call across during vblank.

Signed-off-by: Dave Stevenson <[email protected]>
The initialisation sequence differs slightly from the documentation
in that the clocks are meant to be running before resets and
similar.

Signed-off-by: Dave Stevenson <[email protected]>
vc4_dsi_bridge_disable wasn't resetting things during shutdown,
so add that in.

Signed-off-by: Dave Stevenson <[email protected]>
The block must be enabled for the FIFO resets to be actioned,
so ensure this is the case.

Signed-off-by: Dave Stevenson <[email protected]>
njhollinghurst and others added 13 commits October 6, 2025 12:17
This is largely to test a previous change that made IOMMU aperture
configurable and allocated lazily; it may be useful in its own right.
We expect IOMMU2 to be well-utilized e.g. when using 64MPix cameras.

Signed-off-by: Nick Hollinghurst <[email protected]>
Register line_length_pix was being written by both the tables
of registers and the control handler for V4L2_CID_HBLANK.

Remove the duplication in the tables.

Signed-off-by: Dave Stevenson <[email protected]>
line_length_pix is a value that the developer wants to know,
so write the values in decimal.

Signed-off-by: Dave Stevenson <[email protected]>
The frame length default value doesn't change dynamically, and
neither does any of the other parameters that configure it,
so precompute it instead of working from a frame duration to
get to the value.

The minimum value was also computed, when actually the sensor
will take any value down to 4 lines.

Signed-off-by: Dave Stevenson <[email protected]>
This removes a load of boilerplate code around how registers
are grouped into multiple word values.

Signed-off-by: Dave Stevenson <[email protected]>
There are a fair number of registers duplicated in all the mode
tables, so move those into the common table.

Signed-off-by: Dave Stevenson <[email protected]>
For 4k30 recording we want 16:9 output, so add a cropped mode
to achieve this.

Signed-off-by: Dave Stevenson <[email protected]>
Rather than the hard coded PLL settings for fixed frequencies,
compute the PLL settings based on device tree, validating that
the specified rate can be achieved.

Signed-off-by: Dave Stevenson <[email protected]>
As we now support variable link frequency, compute the minimum
line_length value that the sensor will work with, and set
V4L2_CID_HBLANK based on that number.

Signed-off-by: Dave Stevenson <[email protected]>
Now that the link frequency can be varied, write the link bit
rate registers to reflect the speed being used.

Signed-off-by: Dave Stevenson <[email protected]>
The timing registers configured are for 450MHz.
If running at a different link frequency, use the automatic
timing control.

Signed-off-by: Dave Stevenson <[email protected]>
The sensor supports readout as 10 or 12 bit. As we are now
computing the horizontal blanking limits dynamically, adding
support for both readout modes falls out trivially, so add
them both.

Signed-off-by: Dave Stevenson <[email protected]>
8 bit readout is only a reconfiguration of the CSI2 block,
and recomputation of horizontal blanking. Enable it.

Signed-off-by: Dave Stevenson <[email protected]>
yanghaku and others added 9 commits October 8, 2025 11:03
Change-Id: Ic95c8514271d246dd668631810e8dee210f7f1b4
Signed-off-by: Yanghaku <[email protected]>
…node")

We lost a line in the forward port, which meant that it always used
/dev/fb0, and complained that the sysfs nodes already existed.

Fixes: 5769e04 ("fbdev: Allow client to request a particular /dev/fbN node")

Signed-off-by: Dave Stevenson <[email protected]>
Resizing BARs can be blocked when a device in the bridge hierarchy
itself consumes resources from the resized range.  This scenario is
common with Intel Arc DG2 GPUs where the following is a typical
topology:

 +-[0000:5d]-+-00.0-[5e-61]----00.0-[5f-61]--+-01.0-[60]----00.0  Intel Corporation DG2 [Arc A380]
                                             \-04.0-[61]----00.0  Intel Corporation DG2 Audio Controller

Here the system BIOS has provided a large 64bit, prefetchable window:

pci_bus 0000:5d: root bus resource [mem 0xb000000000-0xbfffffffff window]

But only a small portion is programmed into the root port aperture:

pci 0000:5d:00.0:   bridge window [mem 0xbfe0000000-0xbff07fffff 64bit pref]

The upstream port then provides the following aperture:

pci 0000:5e:00.0:   bridge window [mem 0xbfe0000000-0xbfefffffff 64bit pref]

With the missing range found to be consumed by the switch port itself:

pci 0000:5e:00.0: BAR 0 [mem 0xbff0000000-0xbff07fffff 64bit pref]

The downstream port above the GPU provides the same aperture as upstream:

pci 0000:5f:01.0:   bridge window [mem 0xbfe0000000-0xbfefffffff 64bit pref]

Which is entirely consumed by the GPU:

pci 0000:60:00.0: BAR 2 [mem 0xbfe0000000-0xbfefffffff 64bit pref]

In summary, iomem reports the following:

b000000000-bfffffffff : PCI Bus 0000:5d
  bfe0000000-bff07fffff : PCI Bus 0000:5e
    bfe0000000-bfefffffff : PCI Bus 0000:5f
      bfe0000000-bfefffffff : PCI Bus 0000:60
        bfe0000000-bfefffffff : 0000:60:00.0
    bff0000000-bff07fffff : 0000:5e:00.0

The GPU at 0000:60:00.0 supports a Resizable BAR:

	Capabilities: [420 v1] Physical Resizable BAR
		BAR 2: current size: 256MB, supported: 256MB 512MB 1GB 2GB 4GB 8GB

However when attempting a resize we get -ENOSPC:

pci 0000:60:00.0: BAR 2 [mem 0xbfe0000000-0xbfefffffff 64bit pref]: releasing
pcieport 0000:5f:01.0: bridge window [mem 0xbfe0000000-0xbfefffffff 64bit pref]: releasing
pcieport 0000:5e:00.0: bridge window [mem 0xbfe0000000-0xbfefffffff 64bit pref]: releasing
pcieport 0000:5e:00.0: bridge window [mem size 0x200000000 64bit pref]: can't assign; no space
pcieport 0000:5e:00.0: bridge window [mem size 0x200000000 64bit pref]: failed to assign
pcieport 0000:5f:01.0: bridge window [mem size 0x200000000 64bit pref]: can't assign; no space
pcieport 0000:5f:01.0: bridge window [mem size 0x200000000 64bit pref]: failed to assign
pci 0000:60:00.0: BAR 2 [mem size 0x200000000 64bit pref]: can't assign; no space
pci 0000:60:00.0: BAR 2 [mem size 0x200000000 64bit pref]: failed to assign
pcieport 0000:5d:00.0: PCI bridge to [bus 5e-61]
pcieport 0000:5d:00.0:   bridge window [mem 0xb9000000-0xba0fffff]
pcieport 0000:5d:00.0:   bridge window [mem 0xbfe0000000-0xbff07fffff 64bit pref]
pcieport 0000:5e:00.0: PCI bridge to [bus 5f-61]
pcieport 0000:5e:00.0:   bridge window [mem 0xb9000000-0xba0fffff]
pcieport 0000:5e:00.0:   bridge window [mem 0xbfe0000000-0xbfefffffff 64bit pref]
pcieport 0000:5f:01.0: PCI bridge to [bus 60]
pcieport 0000:5f:01.0:   bridge window [mem 0xb9000000-0xb9ffffff]
pcieport 0000:5f:01.0:   bridge window [mem 0xbfe0000000-0xbfefffffff 64bit pref]
pci 0000:60:00.0: BAR 2 [mem 0xbfe0000000-0xbfefffffff 64bit pref]: assigned

In this example we need to resize all the way up to the root port
aperture, but we refuse to change the root port aperture while resources
are allocated for the upstream port BAR.

The solution proposed here builds on the idea in commit 91fa127
("PCI: Expose PCIe Resizable BAR support via sysfs") where the BAR can
be resized while there is no driver attached.  In this case, when there
is no driver bound to the upstream switch port we'll release resources
of the bridge which match the reallocation.  Therefore we can achieve
the below successful resize operation by unbinding 0000:5e:00.0 from the
pcieport driver before invoking the resource2_resize interface on the
GPU at 0000:60:00.0.

pci 0000:60:00.0: BAR 2 [mem 0xbfe0000000-0xbfefffffff 64bit pref]: releasing
pcieport 0000:5f:01.0: bridge window [mem 0xbfe0000000-0xbfefffffff 64bit pref]: releasing
pci 0000:5e:00.0: bridge window [mem 0xbfe0000000-0xbfefffffff 64bit pref]: releasing
pci 0000:5e:00.0: BAR 0 [mem 0xbff0000000-0xbff07fffff 64bit pref]: releasing
pcieport 0000:5d:00.0: bridge window [mem 0xbfe0000000-0xbff07fffff 64bit pref]: releasing
pcieport 0000:5d:00.0: bridge window [mem 0xb000000000-0xb2ffffffff 64bit pref]: assigned
pci 0000:5e:00.0: bridge window [mem 0xb000000000-0xb1ffffffff 64bit pref]: assigned
pci 0000:5e:00.0: BAR 0 [mem 0xb200000000-0xb2007fffff 64bit pref]: assigned
pcieport 0000:5f:01.0: bridge window [mem 0xb000000000-0xb1ffffffff 64bit pref]: assigned
pci 0000:60:00.0: BAR 2 [mem 0xb000000000-0xb1ffffffff 64bit pref]: assigned
pci 0000:5e:00.0: PCI bridge to [bus 5f-61]
pci 0000:5e:00.0:   bridge window [mem 0xb9000000-0xba0fffff]
pci 0000:5e:00.0:   bridge window [mem 0xb000000000-0xb1ffffffff 64bit pref]
pcieport 0000:5d:00.0: PCI bridge to [bus 5e-61]
pcieport 0000:5d:00.0:   bridge window [mem 0xb9000000-0xba0fffff]
pcieport 0000:5d:00.0:   bridge window [mem 0xb000000000-0xb2ffffffff 64bit pref]
pci 0000:5e:00.0: PCI bridge to [bus 5f-61]
pci 0000:5e:00.0:   bridge window [mem 0xb9000000-0xba0fffff]
pci 0000:5e:00.0:   bridge window [mem 0xb000000000-0xb1ffffffff 64bit pref]
pcieport 0000:5f:01.0: PCI bridge to [bus 60]
pcieport 0000:5f:01.0:   bridge window [mem 0xb9000000-0xb9ffffff]
pcieport 0000:5f:01.0:   bridge window [mem 0xb000000000-0xb1ffffffff 64bit pref]

Link: https://patchwork.kernel.org/project/linux-pci/patch/[email protected]/

Signed-off-by: Alex Williamson <[email protected]>
Signed-off-by: Jonathan Bell <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.