Skip to content

[pull] master from torvalds:master #1949

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 69 commits into from
Jun 20, 2025
Merged

Conversation

pull[bot]
Copy link

@pull pull bot commented Jun 20, 2025

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.1)

Can you help keep this open source service alive? 💖 Please sponsor : )

James A. MacInnes and others added 30 commits June 9, 2025 01:31
When widebus was enabled for DisplayPort in commit c7c4122
("drm/msm/dp: enable widebus on all relevant chipsets") it was clarified
that it is only supported on DPU 5.0.0 onwards which includes SC7180 on
DPU revision 6.2.  However, this patch missed that the description
structure for SC7180 is also reused for SDM845 (because of identical
io_start address) which is only DPU 4.0.0, leading to a wrongly enbled
widebus feature and corruption on that platform.

Create a separate msm_dp_desc_sdm845 structure for this SoC compatible,
with the wide_bus_supported flag turned off.

Fixes: c7c4122 ("drm/msm/dp: enable widebus on all relevant chipsets")
Signed-off-by: James A. MacInnes <[email protected]>
[DB: reworded commit text following Marijn's suggestion]
Reviewed-by: Abhinav Kumar <[email protected]>
Reviewed-by: Dmitry Baryshkov <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/636944/
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dmitry Baryshkov <[email protected]>
Type-C DisplayPort inoperable due to incorrect porch settings.
- Re-used wide_bus_en as flag to prevent porch shifting

Fixes: c943b49 ("drm/msm/dp: add displayPort driver support")
Signed-off-by: James A. MacInnes <[email protected]>
Reviewed-by: Dmitry Baryshkov <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/636945/
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dmitry Baryshkov <[email protected]>
Driver unconditionally saves current state on first init in
dsi_pll_10nm_init(), but does not save the VCO rate, only some of the
divider registers.  The state is then restored during probe/enable via
msm_dsi_phy_enable() -> msm_dsi_phy_pll_restore_state() ->
dsi_10nm_pll_restore_state().

Restoring calls dsi_pll_10nm_vco_set_rate() with
pll_10nm->vco_current_rate=0, which basically overwrites existing rate of
VCO and messes with clock hierarchy, by setting frequency to 0 to clock
tree.  This makes anyway little sense - VCO rate was not saved, so
should not be restored.

If PLL was not configured configure it to minimum rate to avoid glitches
and configuring entire in clock hierarchy to 0 Hz.

Suggested-by: Dmitry Baryshkov <[email protected]>
Link: https://lore.kernel.org/r/sz4kbwy5nwsebgf64ia7uq4ee7wbsa5uy3xmlqwcstsbntzcov@ew3dcyjdzmi2/
Signed-off-by: Krzysztof Kozlowski <[email protected]>
Fixes: a4ccc37 ("drm/msm/dsi_pll_10nm: restore VCO rate during
Reviewed-by: Dmitry Baryshkov <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/654796/
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dmitry Baryshkov <[email protected]>
In error paths, we could unref the submit without calling
drm_sched_entity_push_job(), so msm_job_free() will never get
called.  Since drm_sched_job_cleanup() will NULL out the
s_fence, we can use that to detect this case.

Signed-off-by: Rob Clark <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/653584/
Signed-off-by: Rob Clark <[email protected]>
put_unused_fd() doesn't free the installed file, if we've already done
fd_install().  So we need to also free the sync_file.

Signed-off-by: Rob Clark <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/653583/
Signed-off-by: Rob Clark <[email protected]>
Now that we use a threaded IRQ, it should be safe to do this in the
fault handler.

We can also remove fault_info from struct msm_gpu and just pass it
directly.

Signed-off-by: Connor Abbott <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/654889/
Signed-off-by: Rob Clark <[email protected]>
Unused since the previous commit.

Signed-off-by: Connor Abbott <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/654890/
Signed-off-by: Rob Clark <[email protected]>
When things go wrong, the GPU is capable of quickly generating millions
of faulting translation requests per second. When that happens, in the
stall-on-fault model each access will stall until it wins the race to
signal the fault and then the RESUME register is written. This slows
processing page faults to a crawl as the GPU can generate faults much
faster than the CPU can acknowledge them. It also means that all
available resources in the SMMU are saturated waiting for the stalled
transactions, so that other transactions such as transactions generated
by the GMU, which shares translation resources with the GPU, cannot
proceed. This causes a GMU watchdog timeout, which leads to a failed
reset because GX cannot collapse when there is a transaction pending and
a permanently hung GPU.

On older platforms with qcom,smmu-v2, it seems that when one transaction
is stalled subsequent faulting transactions are terminated, which avoids
this problem, but the MMU-500 follows the spec here.

To work around these problems, disable stall-on-fault as soon as we get a
page fault until a cooldown period after pagefaults stop. This allows
the GMU some guaranteed time to continue working. We only use
stall-on-fault to halt the GPU while we collect a devcoredump and we
always terminate the transaction afterward, so it's fine to miss some
subsequent page faults. We also keep it disabled so long as the current
devcoredump hasn't been deleted, because in that case we likely won't
capture another one if there's a fault.

After this commit HFI messages still occasionally time out, because the
crashdump handler doesn't run fast enough to let the GMU resume, but the
driver seems to recover from it. This will probably go away after the
HFI timeout is increased.

Signed-off-by: Connor Abbott <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/654891/
Signed-off-by: Rob Clark <[email protected]>
Based on kgsl.

Fixes: af66706 ("drm/msm/a6xx: Add skeleton A7xx support")
Signed-off-by: Connor Abbott <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/654922/
Signed-off-by: Rob Clark <[email protected]>
Calling this packet is necessary when we switch contexts because there
are various pieces of state used by userspace to synchronize between BR
and BV that are persistent across submits and we need to make sure that
they are in a "safe" state when switching contexts. Otherwise a
userspace submission in one context could cause another context to
function incorrectly and hang, effectively a denial of service (although
without leaking data). This was missed during initial a7xx bringup.

Fixes: af66706 ("drm/msm/a6xx: Add skeleton A7xx support")
Signed-off-by: Connor Abbott <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/654924/
Signed-off-by: Rob Clark <[email protected]>
The files generated by gen_header.py capture the source path to the
input files and the date.  While that can be informative, it varies
based on where and when the kernel was built as the full path is
captured.

Since all of the files that this tool is run on is under the drivers
directory, this modifies the application to strip all of the path before
drivers.  Additionally it prints <stripped> instead of the date.

Signed-off-by: Ryan Eatmon <[email protected]>
Signed-off-by: Bruce Ashfield <[email protected]>
Signed-off-by: Viswanath Kraleti <[email protected]>
Acked-by: Dmitry Baryshkov <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/655599/
Signed-off-by: Rob Clark <[email protected]>
To better match add_gpu_components().

Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Dmitry Baryshkov <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/657700/
We are going to want to re-use this before the component is bound, when
we don't yet have the device pointer (but we do have the of node).

v2: use %pOF

Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Dmitry Baryshkov <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/657705/
If we have a newer dtb than kernel, we could end up in a situation where
the GPU device is present in the dtb, but not in the drivers device
table.  We don't want this to prevent the display from probing.  So
check that we recognize the GPU before adding the GPU component.

v2: use %pOF

Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Dmitry Baryshkov <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/657701/
The number of columns relates to the width, not the height.  Use the
correct variable.

Signed-off-by: John Keeping <[email protected]>
Reviewed-by: Javier Martinez Canillas <[email protected]>
Fixes: fdd591e ("drm/ssd130x: Add support for the SSD132x OLED controller family")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Javier Martinez Canillas <[email protected]>
When checking for unsupported expect an error is printed every time.
This spams the log for platforms where this is expected, e.g. ls1028a
having a Vivante (etnaviv) GPU and Mali display processor.

Signed-off-by: Alexander Stein <[email protected]>
Signed-off-by: Liviu Dudau <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Fix the compile-time warning

  drivers/gpu/drm/ast/ast_mode.c: warning: EXPORT_SYMBOL() is not used, but #include <linux/export.h> is present

Signed-off-by: Thomas Zimmermann <[email protected]>
Reviewed-by: Jocelyn Falempe <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Fix the compile-time warning

  drivers/gpu/drm/mgag200/mgag200_ddc.c: warning: EXPORT_SYMBOL() is not used, but #include <linux/export.h> is present

Signed-off-by: Thomas Zimmermann <[email protected]>
Reviewed-by: Jocelyn Falempe <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Commit 698de82 ("crypto: testmgr - make it easier to enable the
full set of tests") removed support for building kernels that run only
the "fast" set of crypto self-tests by default.  This assumed that
nearly everyone actually wanted the full set of tests, *if* they had
already chosen to enable the tests at all.

Unfortunately, it turns out that both Debian and Fedora intentionally
have the crypto self-tests enabled in their production kernels.  And for
production kernels we do need to keep the testing time down, which
implies just running the "fast" tests, not the full set of tests.

For Fedora, a reason for enabling the tests in production is that they
are being (mis)used to meet the FIPS 140-3 pre-operational testing
requirement.

However, the other reason for enabling the tests in production, which
applies to both distros, is that they provide some value in protecting
users from buggy drivers.  Unfortunately, the crypto/ subsystem has many
buggy and untested drivers for off-CPU hardware accelerators on rare
platforms.  These broken drivers get shipped to users, and there have
been multiple examples of the tests preventing these buggy drivers from
being used.  So effectively, the tests are being relied on in production
kernels.  I think this is kind of crazy (untested drivers should just
not be enabled at all), but that seems to be how things work currently.

Thus, reintroduce a kconfig option that controls the level of testing.
Call it CRYPTO_SELFTESTS_FULL instead of the original name
CRYPTO_MANAGER_EXTRA_TESTS, which was slightly misleading.

Moreover, given the "production kernel" use case, make CRYPTO_SELFTESTS
depend on EXPERT instead of DEBUG_KERNEL.

I also haven't reinstated all the #ifdefs in crypto/testmgr.c.  Instead,
just rely on the compiler to optimize out unused code.

Fixes: 40b9969 ("crypto: testmgr - replace CRYPTO_MANAGER_DISABLE_TESTS with CRYPTO_SELFTESTS")
Fixes: 698de82 ("crypto: testmgr - make it easier to enable the full set of tests")
Signed-off-by: Eric Biggers <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
The left shift int 32 bit integer constants 1 is evaluated using 32 bit
arithmetic and then assigned to a 64 bit unsigned integer. In the case
where the shift is 32 or more this can lead to an overflow. Avoid this
by shifting using the BIT_ULL macro instead.

Fixes: 6c3ac7b ("drm/nouveau/gsp: support deeper page tables in COPY_SERVER_RESERVED_PDES")
Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: Danilo Krummrich <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
The RPC container is released after being passed to r535_gsp_rpc_send().

When sending the initial fragment of a large RPC and passing the
caller's RPC container, the container will be freed prematurely. Subsequent
attempts to send remaining fragments will therefore result in a
use-after-free.

Allocate a temporary RPC container for holding the initial fragment of a
large RPC when sending. Free the caller's container when all fragments
are successfully sent.

Fixes: 176fdcb ("drm/nouveau/gsp/r535: add support for booting GSP-RM")
Signed-off-by: Zhi Wang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[ Rebase onto Blackwell changes. - Danilo ]
Signed-off-by: Danilo Krummrich <[email protected]>
The nouveau_get_backlight_name() function generates a unique name for the
backlight interface, appending an id from 1 to 99 for all backlight devices
after the first.

GCC 15 (and likely other compilers) produce the following
-Wformat-truncation warning:

nouveau_backlight.c: In function ‘nouveau_backlight_init’:
nouveau_backlight.c:56:69: error: ‘%d’ directive output may be truncated writing between 1 and 10 bytes into a region of size 3 [-Werror=format-truncation=]
   56 |                 snprintf(backlight_name, BL_NAME_SIZE, "nv_backlight%d", nb);
      |                                                                     ^~
In function ‘nouveau_get_backlight_name’,
    inlined from ‘nouveau_backlight_init’ at nouveau_backlight.c:351:7:
nouveau_backlight.c:56:56: note: directive argument in the range [1, 2147483647]
   56 |                 snprintf(backlight_name, BL_NAME_SIZE, "nv_backlight%d", nb);
      |                                                        ^~~~~~~~~~~~~~~~
nouveau_backlight.c:56:17: note: ‘snprintf’ output between 14 and 23 bytes into a destination of size 15
   56 |                 snprintf(backlight_name, BL_NAME_SIZE, "nv_backlight%d", nb);
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The warning started appearing after commit ab244be ("drm/nouveau:
Fix a potential theorical leak in nouveau_get_backlight_name()") This fix
for the ida usage removed the explicit value check for ids larger than 99.
The compiler is unable to intuit that the ida_alloc_max() limits the
returned value range between 0 and 99.

Because the compiler can no longer infer that the number ranges from 0 to
99, it thinks that it could use as many as 11 digits (10 + the potential -
sign for negative numbers).

The warning has gone unfixed for some time, with at least one kernel test
robot report. The code breaks W=1 builds, which is especially frustrating
with the introduction of CONFIG_WERROR.

The string is stored temporarily on the stack and then copied into the
device name. Its not a big deal to use 11 more bytes of stack rounding out
to an even 24 bytes. Increase BL_NAME_SIZE to 24 to avoid the truncation
warning. This fixes the W=1 builds that include this driver.

Compile tested only.

Fixes: ab244be ("drm/nouveau: Fix a potential theorical leak in nouveau_get_backlight_name()")
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Suggested-by: Timur Tabi <[email protected]>
Signed-off-by: Jacob Keller <[email protected]>
Link: https://lore.kernel.org/r/20250610-jk-nouveua-drm-bl-snprintf-fix-v2-1-7fdd4b84b48e@intel.com
Signed-off-by: Danilo Krummrich <[email protected]>
GSP message queue docs has been moved following RPC handling split in
commit 8a8b1ec ("drm/nouveau/gsp: split rpc handling out on its
own"), before GSP-RM implementation is versioned in commit c472d82
("drm/nouveau/gsp: move subdev/engine impls to subdev/gsp/rm/r535/").
However, the kernel-doc reference in nouveau docs is left behind, which
triggers htmldocs warnings:

ERROR: Cannot find file ./drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
WARNING: No kernel-doc for file ./drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c

Update the reference.

Fixes: c472d82 ("drm/nouveau/gsp: move subdev/engine impls to subdev/gsp/rm/r535/")
Fixes: 8a8b1ec ("drm/nouveau/gsp: split rpc handling out on its own")
Signed-off-by: Bagas Sanjaya <[email protected]>
Acked-by: Randy Dunlap <[email protected]>
Tested-by: Randy Dunlap <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Danilo Krummrich <[email protected]>
We want to WARN_ON() if info is NULL.

Suggested-by: Konrad Dybcio <[email protected]>
Fixes: 0838fc3 ("drm/msm/adreno: Check for recognized GPU before bind")
Tested-by: Neil Armstrong <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
Reported-by: Alexey Klimov <[email protected]>
Reviewed-by: Konrad Dybcio <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/658631/
syzbot reports that it can trigger a WARN_ON() for kmalloc() attempt
that's too big:

WARNING: CPU: 0 PID: 6488 at mm/slub.c:5024 __kvmalloc_node_noprof+0x520/0x640 mm/slub.c:5024
Modules linked in:
CPU: 0 UID: 0 PID: 6488 Comm: syz-executor312 Not tainted 6.15.0-rc7-syzkaller-gd7fa1af5b33e #0 PREEMPT
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025
pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : __kvmalloc_node_noprof+0x520/0x640 mm/slub.c:5024
lr : __do_kmalloc_node mm/slub.c:-1 [inline]
lr : __kvmalloc_node_noprof+0x3b4/0x640 mm/slub.c:5012
sp : ffff80009cfd7a90
x29: ffff80009cfd7ac0 x28: ffff0000dd52a120 x27: 0000000000412dc0
x26: 0000000000000178 x25: ffff7000139faf70 x24: 0000000000000000
x23: ffff800082f4cea8 x22: 00000000ffffffff x21: 000000010cd004a8
x20: ffff0000d75816c0 x19: ffff0000dd52a000 x18: 00000000ffffffff
x17: ffff800092f39000 x16: ffff80008adbe9e4 x15: 0000000000000005
x14: 1ffff000139faf1c x13: 0000000000000000 x12: 0000000000000000
x11: ffff7000139faf21 x10: 0000000000000003 x9 : ffff80008f27b938
x8 : 0000000000000002 x7 : 0000000000000000 x6 : 0000000000000000
x5 : 00000000ffffffff x4 : 0000000000400dc0 x3 : 0000000200000000
x2 : 000000010cd004a8 x1 : ffff80008b3ebc40 x0 : 0000000000000001
Call trace:
 __kvmalloc_node_noprof+0x520/0x640 mm/slub.c:5024 (P)
 kvmalloc_array_node_noprof include/linux/slab.h:1065 [inline]
 io_rsrc_data_alloc io_uring/rsrc.c:206 [inline]
 io_clone_buffers io_uring/rsrc.c:1178 [inline]
 io_register_clone_buffers+0x484/0xa14 io_uring/rsrc.c:1287
 __io_uring_register io_uring/register.c:815 [inline]
 __do_sys_io_uring_register io_uring/register.c:926 [inline]
 __se_sys_io_uring_register io_uring/register.c:903 [inline]
 __arm64_sys_io_uring_register+0x42c/0xea8 io_uring/register.c:903
 __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
 invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49
 el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132
 do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151
 el0_svc+0x58/0x17c arch/arm64/kernel/entry-common.c:767
 el0t_64_sync_handler+0x78/0x108 arch/arm64/kernel/entry-common.c:786
 el0t_64_sync+0x198/0x19c arch/arm64/kernel/entry.S:600

which is due to offset + buffer_count being too large. The registration
code checks only the total count of buffers, but given that the indexing
is an array, it should also check offset + count. That can't exceed
IORING_MAX_REG_BUFFERS either, as there's no way to reach buffers beyond
that limit.

There's no issue with registrering a table this large, outside of the
fact that it's pointless to register buffers that cannot be reached, and
that it can trigger this kmalloc() warning for attempting an allocation
that is too large.

Cc: [email protected]
Fixes: b16e920 ("io_uring/rsrc: allow cloning at an offset")
Reported-by: [email protected]
Link: https://lore.kernel.org/io-uring/[email protected]/
Signed-off-by: Jens Axboe <[email protected]>
Add missing put_task_struct() in the error path

Cc: [email protected]
Fixes: 0f8baa3 ("io-wq: fully initialize wqe before calling cpuhp_state_add_instance_nocalls()")
Signed-off-by: Penglei Jiang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
i915_pmu.c may fail to build with GCOV and AutoFDO enabled.

../drivers/gpu/drm/i915/i915_pmu.c:116:3: error: call to '__compiletime_assert_487' declared with 'error' attribute: BUILD_BUG_ON failed: bit > BITS_PER_TYPE(typeof_member(struct i915_pmu, enable)) - 1
  116 |                 BUILD_BUG_ON(bit >
      |                 ^

Here is a way to reproduce the issue:
$ git checkout v6.15
$ mkdir build
$ ./scripts/kconfig/merge_config.sh -O build -n -m <(cat <<EOF
CONFIG_DRM=y
CONFIG_PCI=y
CONFIG_DRM_I915=y

CONFIG_PERF_EVENTS=y

CONFIG_DEBUG_FS=y
CONFIG_GCOV_KERNEL=y
CONFIG_GCOV_PROFILE_ALL=y

CONFIG_AUTOFDO_CLANG=y
EOF
)
$ PATH=${PATH}:${HOME}/llvm-20.1.5-x86_64/bin make LLVM=1 O=build \
       olddefconfig
$ PATH=${PATH}:${HOME}/llvm-20.1.5-x86_64/bin make LLVM=1 O=build \
       CLANG_AUTOFDO_PROFILE=...PATH_TO_SOME_AFDO_PROFILE... \
       drivers/gpu/drm/i915/i915_pmu.o

Although not super sure what happened, by reviewing the code, it should
depend on `__builtin_constant_p(bit)` directly instead of assuming
`__builtin_constant_p(config)` makes `bit` a builtin constant.

Also fix a nit, to reuse the `bit` local variable.

Fixes: a644fde ("drm/i915/pmu: Change bitmask of enabled events to u32")
Reviewed-by: Nathan Chancellor <[email protected]>
Signed-off-by: Tzung-Bi Shih <[email protected]>
Signed-off-by: Tvrtko Ursulin <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
(cherry picked from commit 686d773)
Signed-off-by: Joonas Lahtinen <[email protected]>
BXT_MIPI_TRANS_VTOTAL must be programmed with vtotal-1
instead of vtotal. Make it so.

Cc: [email protected]
Signed-off-by: Ville Syrjälä <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Reviewed-by: Jani Nikula <[email protected]>
(cherry picked from commit 7b3685c)
Signed-off-by: Joonas Lahtinen <[email protected]>
The following kernel Oops was recently reported by Mesa CI:

[  800.139824] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000588
[  800.148619] Mem abort info:
[  800.151402]   ESR = 0x0000000096000005
[  800.155141]   EC = 0x25: DABT (current EL), IL = 32 bits
[  800.160444]   SET = 0, FnV = 0
[  800.163488]   EA = 0, S1PTW = 0
[  800.166619]   FSC = 0x05: level 1 translation fault
[  800.171487] Data abort info:
[  800.174357]   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
[  800.179832]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[  800.184873]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[  800.190176] user pgtable: 4k pages, 39-bit VAs, pgdp=00000001014c2000
[  800.196607] [0000000000000588] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
[  800.205305] Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
[  800.211564] Modules linked in: vc4 snd_soc_hdmi_codec drm_display_helper v3d cec gpu_sched drm_dma_helper drm_shmem_helper drm_kms_helper drm drm_panel_orientation_quirks snd_soc_core snd_compress snd_pcm_dmaengine snd_pcm i2c_brcmstb snd_timer snd backlight
[  800.234448] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.12.25+rpt-rpi-v8 #1  Debian 1:6.12.25-1+rpt1
[  800.244182] Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT)
[  800.250005] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  800.256959] pc : v3d_job_update_stats+0x60/0x130 [v3d]
[  800.262112] lr : v3d_job_update_stats+0x48/0x130 [v3d]
[  800.267251] sp : ffffffc080003e60
[  800.270555] x29: ffffffc080003e60 x28: ffffffd842784980 x27: 0224012000000000
[  800.277687] x26: ffffffd84277f630 x25: ffffff81012fd800 x24: 0000000000000020
[  800.284818] x23: ffffff8040238b08 x22: 0000000000000570 x21: 0000000000000158
[  800.291948] x20: 0000000000000000 x19: ffffff8040238000 x18: 0000000000000000
[  800.299078] x17: ffffffa8c1bd2000 x16: ffffffc080000000 x15: 0000000000000000
[  800.306208] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[  800.313338] x11: 0000000000000040 x10: 0000000000001a40 x9 : ffffffd83b39757c
[  800.320468] x8 : ffffffd842786420 x7 : 7fffffffffffffff x6 : 0000000000ef32b0
[  800.327598] x5 : 00ffffffffffffff x4 : 0000000000000015 x3 : ffffffd842784980
[  800.334728] x2 : 0000000000000004 x1 : 0000000000010002 x0 : 000000ba4c0ca382
[  800.341859] Call trace:
[  800.344294]  v3d_job_update_stats+0x60/0x130 [v3d]
[  800.349086]  v3d_irq+0x124/0x2e0 [v3d]
[  800.352835]  __handle_irq_event_percpu+0x58/0x218
[  800.357539]  handle_irq_event+0x54/0xb8
[  800.361369]  handle_fasteoi_irq+0xac/0x240
[  800.365458]  handle_irq_desc+0x48/0x68
[  800.369200]  generic_handle_domain_irq+0x24/0x38
[  800.373810]  gic_handle_irq+0x48/0xd8
[  800.377464]  call_on_irq_stack+0x24/0x58
[  800.381379]  do_interrupt_handler+0x88/0x98
[  800.385554]  el1_interrupt+0x34/0x68
[  800.389123]  el1h_64_irq_handler+0x18/0x28
[  800.393211]  el1h_64_irq+0x64/0x68
[  800.396603]  default_idle_call+0x3c/0x168
[  800.400606]  do_idle+0x1fc/0x230
[  800.403827]  cpu_startup_entry+0x40/0x50
[  800.407742]  rest_init+0xe4/0xf0
[  800.410962]  start_kernel+0x5e8/0x790
[  800.414616]  __primary_switched+0x80/0x90
[  800.418622] Code: 8b170277 8b160296 11000421 b9000861 (b9401ac1)
[  800.424707] ---[ end trace 0000000000000000 ]---
[  800.457313] ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---

This issue happens when the file descriptor is closed before the jobs
submitted by it are completed. When the job completes, we update the
global GPU stats and the per-fd GPU stats, which are exposed through
fdinfo. If the file descriptor was closed, then the struct `v3d_file_priv`
and its stats were already freed and we can't update the per-fd stats.

Therefore, if the file descriptor was already closed, don't update the
per-fd GPU stats, only update the global ones.

Cc: [email protected] # v6.12+
Reviewed-by: Jose Maria Casanova Crespo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Maíra Canal <[email protected]>
Commit 704d3d6 ("drm/etnaviv: don't block scheduler when GPU is still
active") ensured that active jobs are returned to the pending list when
extending the timeout. However, it didn't use the pending list's lock to
manipulate the list, which causes a race condition as the scheduler's
workqueues are running.

Hold the lock while manipulating the scheduler's pending list to prevent
a race.

Cc: [email protected]
Fixes: 704d3d6 ("drm/etnaviv: don't block scheduler when GPU is still active")
Reported-by: Philipp Stanner <[email protected]>
Closes: https://lore.kernel.org/dri-devel/[email protected]/
Reviewed-by: Lucas Stach <[email protected]>
Reviewed-by: Philipp Stanner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Maíra Canal <[email protected]>
Alex Hung and others added 28 commits June 18, 2025 13:05
[WHAT]
Severe video playback corruption is observed in the following setup:

weston 14.0.90 (built from source) + mpv v0.40.0 with command:
mpv bbb_sunflower_1080p_60fps_normal.mp4 --vo=gpu

[HOW]
ABGR16161616 needs to be included in dml2/2.1 translation.

Cc: Mario Limonciello <[email protected]>
Cc: Alex Deucher <[email protected]>
Acked-by: Aurabindo Pillai <[email protected]>
Reviewed-by: Harry Wentland <[email protected]>
Reviewed-by: Austin Zheng <[email protected]>
Signed-off-by: Alex Hung <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit d023de8)
Cc: [email protected]
[WHY & HOW]
Fix RMCM programming sequence errors and mapping issues to pass the RMCM
test.

Cc: Mario Limonciello <[email protected]>
Cc: Alex Deucher <[email protected]>
Reviewed-by: Dmytro Laktyushkin <[email protected]>
Signed-off-by: Yihan Zhu <[email protected]>
Signed-off-by: Alex Hung <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit 11baa49)
Cc: [email protected]
[WHY]
Backlight caps are read already in amdgpu_dm_update_backlight_caps().
They may be updated by update_connector_ext_caps(). Reading again when
registering backlight device may cause wrong values to be used.

[HOW]
Use backlight caps already registered to the dm.

Cc: Mario Limonciello <[email protected]>
Cc: Alex Deucher <[email protected]>
Reviewed-by: Roman Li <[email protected]>
Signed-off-by: Mario Limonciello <[email protected]>
Signed-off-by: Alex Hung <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit 148144f)
Cc: [email protected]
[WHY]
Userspace currently is offered a range from 0-0xFF but the PWM is
programmed from 0-0xFFFF.  This can be limiting to some software
that wants to apply greater granularity.

[HOW]
Convert internally to firmware values only when mapping custom
brightness curves because these are in 0-0xFF range. Advertise full
PWM range to userspace.

Cc: Mario Limonciello <[email protected]>
Cc: Alex Deucher <[email protected]>
Reviewed-by: Roman Li <[email protected]>
Signed-off-by: Mario Limonciello <[email protected]>
Signed-off-by: Alex Hung <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit 8dbd72c)
Cc: [email protected]
1. add kicker device list
2. add kicker device checking helper function

Signed-off-by: Frank Min <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit 09aa2b4)
Cc: [email protected]
1. Add kicker firmwares loading for gfx11/smu13/psp13
2. Register additional MODULE_FIRMWARE entries for kicker fws
   - gc_11_0_0_rlc_kicker.bin
   - gc_11_0_0_imu_kicker.bin
   - psp_13_0_0_sos_kicker.bin
   - psp_13_0_0_ta_kicker.bin
   - smu_13_0_0_kicker.bin

Signed-off-by: Frank Min <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit fb5ec21)
Cc: [email protected]
This commit makes the following improvements to SDMA engine reset handling:

1. Clarifies in the function documentation that instance_id refers to a logical ID
2. Adds conversion from logical to physical instance ID before performing reset
   using GET_INST(SDMA0, instance_id) macro
3. Improves error messaging to indicate when a logical instance reset fails
4. Adds better code organization with blank lines for readability

The change ensures proper SDMA engine reset by using the correct physical
instance ID while maintaining the logical ID interface for callers.

V2: Remove harvest_config check and convert directly to physical instance (Lijo)

Suggested-by: Jonathan Kim <[email protected]>
Signed-off-by: Jesse Zhang <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit 5efa621)
Cc: [email protected]
Simplify SDMA v4_4_2 queue reset and stop operations by:
1. Removing GET_INST(SDMA0) conversion for ring->me
2. Using the logical instance ID (ring->me) directly
3. Maintaining consistent behavior with other SDMA queue operations

This change aligns with the existing queue handling logic where
ring->me already represents the correct instance identifier.

Signed-off-by:  Lijo Lazar <[email protected]>
Signed-off-by: Jesse Zhang <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit 3bab282)
Cc: [email protected]
Add a protection to ensure programming are all complete prior VCPU
starting. This is a WA for an unintended VCPU running.

Signed-off-by: Sonny Jiang <[email protected]>
Acked-by: Leo Liu <[email protected]>
Reviewed-by: Ruijing Dong <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit c29521b)
Cc: [email protected]
[WHAT]

hws was checked for null earlier in dce110_blank_stream, indicating hws
can be null, and should be checked whenever it is used.

Cc: Mario Limonciello <[email protected]>
Cc: Alex Deucher <[email protected]>
Reviewed-by: Aurabindo Pillai <[email protected]>
Signed-off-by: Alex Hung <[email protected]>
Signed-off-by: Aurabindo Pillai <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit 79db436)
Cc: [email protected]
Make sure to release reset domain lock in case of failures.

Signed-off-by: Lijo Lazar <[email protected]>
Signed-off-by: Ce Sun <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Fixes: 11bb337 ("drm/amdgpu: refactor amdgpu_device_gpu_recover")
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit 1ab11a8)
This commit makes two key fixes to SDMA v4.4.2 handling:

1. disable UTC_L1 in sdma_cntl register when stopping SDMA engines
   by reading the current value before modifying UTC_L1_ENABLE bit.

2. Ensure UTC_L1_ENABLE is consistently managed by:
   - Adding the missing register write when enabling UTC_L1 during start
   - Keeping UTC_L1 enabled by default as per hardware requirements

v2: Correct SDMA_CNTL setting (Philip)

Suggested-by: Jonathan Kim <[email protected]>
Signed-off-by: Jesse Zhang <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit 375bf56)
Cc: [email protected]
Use the amdgpu fence container so we can store additional
data in the fence.  This also fixes the start_time handling
for MCBP since we were casting the fence to an amdgpu_fence
and it wasn't.

Fixes: 3f4c175 ("drm/amdgpu: MCBP based on DRM scheduler (v9)")
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit bf1cd14)
Cc: [email protected]
Missing the mutex init.

Fixes: e56d4bf ("drm/amdgpu/: drm/amdgpu: Register the new sdma function pointers for sdma_v5_0")
Reviewed-by: Jesse Zhang <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit 3f4caf0)
q->gws is not updated atomically with qpd->mapped_gws_queue. If a
runlist is created between pqm_set_gws and update_queue it will
contain a queue which uses GWS in a process with no GWS allocated.
This will result in a scheduler hang.

Use q->properties.is_gws which is changed while holding the DQM lock.

Signed-off-by: Jay Cornwall <[email protected]>
Reviewed-by: Harish Kasiviswanathan <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit b983702)
Cc: [email protected]
Missing the mutex init.

Fixes: 47454f2 ("drm/amdgpu: Register the new sdma function pointers for sdma_v5_2")
Reviewed-by: Jesse Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit ea685ff)
…org/drm/i915/kernel into drm-fixes

- Fix MIPI vtotal programming off by one on Broxton
- Fix PMU code for GCOV and AutoFDO enabled build

Signed-off-by: Dave Airlie <[email protected]>
From: Joonas Lahtinen <[email protected]>
Link: https://lore.kernel.org/r/aFJfykDpUwtmpilE@jlahtine-mobl
…op.org/agd5f/linux into drm-fixes

amd-drm-fixes-6.16-2025-06-18:

amdgpu:
- DP tunneling fix
- LTTPR fix
- DSC fix
- DML2.x ABGR16161616 fix
- RMCM fix
- Backlight fixes
- GFX11 kicker support
- SDMA reset fixes
- VCN 5.0.1 fix
- Reset fix
- Misc small fixes

amdkfd:
- SDMA reset fix
- Fix race in GWS scheduling

Signed-off-by: Dave Airlie <[email protected]>

From: Alex Deucher <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Sanity check the values for queue depth and number of queues
we get from userspace when adding a device.

Signed-off-by: Ronnie Sahlberg <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Fixes: 71f28f3 ("ublk_drv: add io_uring based userspace block driver")
Fixes: 62fe99c ("ublk: add read()/write() support for ublk char device")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
This allows for additional L2 caching modes.

Fixes: 01570b4 ("drm/xe/bmg: implement Wa_16023588340")
Cc: Matthew Auld <[email protected]>
Reviewed-by: Matthew Auld <[email protected]>
Signed-off-by: Vinay Belgaumkar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Lucas De Marchi <[email protected]>
(cherry picked from commit 6ab42fa)
Signed-off-by: Thomas Hellström <[email protected]>
It should rather use xe_map_memset() as the BO is created with
XE_BO_FLAG_VRAM_IF_DGFX in xe_guc_pc_init().

Fixes: dd08ebf ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: [email protected]
Reviewed-by: Matthew Brost <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Lucas De Marchi <[email protected]>
(cherry picked from commit 21cf47d)
Signed-off-by: Thomas Hellström <[email protected]>
When the GuC fails to load we declare the device wedged. However, the
very first GuC load attempt on GT0 (from xe_gt_init_hwconfig) is done
before the GT1 GuC objects are initialized, so things go bad when the
wedge code attempts to cleanup GT1. To fix this, check the initialization
status in the functions called during wedge.

Fixes: 7dbe8af ("drm/xe: Wedge the entire device")
Signed-off-by: Daniele Ceraolo Spurio <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Matthew Brost <[email protected]>
Cc: Jonathan Cavitt <[email protected]>
Cc: Lucas De Marchi <[email protected]>
Cc: Zhanjun Dong <[email protected]>
Cc: [email protected] # v6.12+: 1e1981b: drm/xe: Fix taking invalid lock on wedge
Cc: [email protected] # v6.12+
Reviewed-by: Jonathan Cavitt <[email protected]>
Reviewed-by: Lucas De Marchi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Lucas De Marchi <[email protected]>
(cherry picked from commit 0b93b7d)
Signed-off-by: Thomas Hellström <[email protected]>
…rg/drm/misc/kernel into drm-fixes

drm-misc-fixes for v6.16-rc3:
- vivante scheduler fix.
- v3d null pointer crash fix.
- fix backlight, booting GSP-RM, and potential integer shift overflow in nouveau.
- fix compiler warnings about unused linux/export.h
- fix malidp unknown modifier spam.
- fix for ssd130x.

Signed-off-by: Dave Airlie <[email protected]>
From: Maarten Lankhorst <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
…/drm/xe/kernel into drm-fixes

Driver Changes:
- A workaround update (Vinay)
- Fix memset on iomem (Lucas)
- Fix early wedge on GuC Load failure (Daniele)

Signed-off-by: Dave Airlie <[email protected]>

From: Thomas Hellstrom <[email protected]>
Link: https://lore.kernel.org/r/aFQ03kNzhbiNK7gW@fedora
…/herbert/crypto-2.6

Pull crypto fixes from Herbert Xu:
 "This fixes a regression in ahash (broken fallback finup) and
  reinstates a Kconfig option to control the extra self-tests"

* tag 'v6.16-p5' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: ahash - Fix infinite recursion in ahash_def_finup
  crypto: testmgr - reinstate kconfig control over full self-tests
…m/kernel

Pull drm fixes from Dave Airlie:
 "Bit of an uptick in fixes for rc3, msm and amdgpu leading the way,
  with i915/xe/nouveau with a few each and then some scattered misc
  bits, nothing looks too crazy:

  msm:
   - Display:
      - Fixed DP output on SDM845
      - Fixed 10nm DSI PLL init
   - GPU:
      - SUBMIT ioctl error path leak fixes
      - drm half of stall-on-fault fixes
      - a7xx: Missing CP_RESET_CONTEXT_STATE
      - Skip GPU component bind if GPU is not in the device table

  i915:
   - Fix MIPI vtotal programming off by one on Broxton
   - Fix PMU code for GCOV and AutoFDO enabled build

  xe:
   - A workaround update
   - Fix memset on iomem
   - Fix early wedge on GuC Load failure

  amdgpu:
   - DP tunneling fix
   - LTTPR fix
   - DSC fix
   - DML2.x ABGR16161616 fix
   - RMCM fix
   - Backlight fixes
   - GFX11 kicker support
   - SDMA reset fixes
   - VCN 5.0.1 fix
   - Reset fix
   - Misc small fixes

  amdkfd:
   - SDMA reset fix
   - Fix race in GWS scheduling

  nouveau:
   - update docs reference
   - fix backlight name buffer size
   - fix UAF in r535 gsp rpc msg
   - fix undefined shift

  mgag200:
   - drop export header

  ast:
   - drop export header

  malidp:
   - drop informational error

  ssd130x:
   - fix clear columns

  etnaviv:
   - scheduler locking fix

  v3d:
   - null pointer crash fix"

* tag 'drm-fixes-2025-06-20' of https://gitlab.freedesktop.org/drm/kernel: (50 commits)
  drm/xe: Fix early wedge on GuC load failure
  drm/xe: Fix memset on iomem
  drm/xe/bmg: Update Wa_16023588340
  drm/amdgpu/sdma5.2: init engine reset mutex
  drm/amdkfd: Fix race in GWS queue scheduling
  drm/amdgpu/sdma5: init engine reset mutex
  drm/amdgpu: switch job hw_fence to amdgpu_fence
  drm/amdgpu: Fix SDMA UTC_L1 handling during start/stop sequences
  drm/amdgpu: Release reset locks during failures
  drm/amd/display: Check dce_hwseq before dereferencing it
  drm/amdgpu: VCN v5_0_1 to prevent FW checking RB during DPG pause
  drm/amdgpu: Use logical instance ID for SDMA v4_4_2 queue operations
  drm/amdgpu: Fix SDMA engine reset with logical instance ID
  drm/amdgpu: add kicker fws loading for gfx11/smu13/psp13
  drm/amdgpu: Add kicker device detection
  drm/amd/display: Export full brightness range to userspace
  drm/amd/display: Only read ACPI backlight caps once
  drm/amd/display: Fix RMCM programming seq errors
  drm/amd/display: Fix mpv playback corruption on weston
  drm/amd/display: Add more checks for DSC / HUBP ONO guarantees
  ...
Pull io_uring fixes from Jens Axboe:

 - Two fixes for error injection failures. One fixes a task leak issue
   introduced in this merge window, the other an older issue with
   handling allocation of a mapped buffer.

 - Fix for a syzbot issue that triggers a kmalloc warning on attempting
   an allocation that's too large

 - Fix for an error injection failure causing a double put of a task,
   introduced in this merge window

* tag 'io_uring-6.16-20250619' of git://git.kernel.dk/linux:
  io_uring: fix potential page leak in io_sqe_buffer_register()
  io_uring/sqpoll: don't put task_struct on tctx setup failure
  io_uring: remove duplicate io_uring_alloc_task_context() definition
  io_uring: fix task leak issue in io_wq_create()
  io_uring/rsrc: validate buffer count with offset for cloning
Pull block fixes from Jens Axboe:

 - Two fixes for aoe which fixes issues dating back to when this driver
   was converted to blk-mq

 - Fix for ublk, checking for valid queue depth and count values before
   setting up a device

* tag 'block-6.16-20250619' of git://git.kernel.dk/linux:
  ublk: santizize the arguments from userspace when adding a device
  aoe: defer rexmit timer downdev work to workqueue
  aoe: clean device rq_list in aoedev_downdev()
@pull pull bot added the ⤵️ pull label Jun 20, 2025
@pull pull bot merged commit 75f5f23 into AwesomeGitHubRepos:master Jun 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.