forked from torvalds/linux
-
Notifications
You must be signed in to change notification settings - Fork 4
[pull] master from torvalds:master #1949
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
When widebus was enabled for DisplayPort in commit c7c4122 ("drm/msm/dp: enable widebus on all relevant chipsets") it was clarified that it is only supported on DPU 5.0.0 onwards which includes SC7180 on DPU revision 6.2. However, this patch missed that the description structure for SC7180 is also reused for SDM845 (because of identical io_start address) which is only DPU 4.0.0, leading to a wrongly enbled widebus feature and corruption on that platform. Create a separate msm_dp_desc_sdm845 structure for this SoC compatible, with the wide_bus_supported flag turned off. Fixes: c7c4122 ("drm/msm/dp: enable widebus on all relevant chipsets") Signed-off-by: James A. MacInnes <[email protected]> [DB: reworded commit text following Marijn's suggestion] Reviewed-by: Abhinav Kumar <[email protected]> Reviewed-by: Dmitry Baryshkov <[email protected]> Patchwork: https://patchwork.freedesktop.org/patch/636944/ Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Dmitry Baryshkov <[email protected]>
Type-C DisplayPort inoperable due to incorrect porch settings. - Re-used wide_bus_en as flag to prevent porch shifting Fixes: c943b49 ("drm/msm/dp: add displayPort driver support") Signed-off-by: James A. MacInnes <[email protected]> Reviewed-by: Dmitry Baryshkov <[email protected]> Patchwork: https://patchwork.freedesktop.org/patch/636945/ Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Dmitry Baryshkov <[email protected]>
Driver unconditionally saves current state on first init in dsi_pll_10nm_init(), but does not save the VCO rate, only some of the divider registers. The state is then restored during probe/enable via msm_dsi_phy_enable() -> msm_dsi_phy_pll_restore_state() -> dsi_10nm_pll_restore_state(). Restoring calls dsi_pll_10nm_vco_set_rate() with pll_10nm->vco_current_rate=0, which basically overwrites existing rate of VCO and messes with clock hierarchy, by setting frequency to 0 to clock tree. This makes anyway little sense - VCO rate was not saved, so should not be restored. If PLL was not configured configure it to minimum rate to avoid glitches and configuring entire in clock hierarchy to 0 Hz. Suggested-by: Dmitry Baryshkov <[email protected]> Link: https://lore.kernel.org/r/sz4kbwy5nwsebgf64ia7uq4ee7wbsa5uy3xmlqwcstsbntzcov@ew3dcyjdzmi2/ Signed-off-by: Krzysztof Kozlowski <[email protected]> Fixes: a4ccc37 ("drm/msm/dsi_pll_10nm: restore VCO rate during Reviewed-by: Dmitry Baryshkov <[email protected]> Patchwork: https://patchwork.freedesktop.org/patch/654796/ Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Dmitry Baryshkov <[email protected]>
In error paths, we could unref the submit without calling drm_sched_entity_push_job(), so msm_job_free() will never get called. Since drm_sched_job_cleanup() will NULL out the s_fence, we can use that to detect this case. Signed-off-by: Rob Clark <[email protected]> Patchwork: https://patchwork.freedesktop.org/patch/653584/ Signed-off-by: Rob Clark <[email protected]>
put_unused_fd() doesn't free the installed file, if we've already done fd_install(). So we need to also free the sync_file. Signed-off-by: Rob Clark <[email protected]> Patchwork: https://patchwork.freedesktop.org/patch/653583/ Signed-off-by: Rob Clark <[email protected]>
Now that we use a threaded IRQ, it should be safe to do this in the fault handler. We can also remove fault_info from struct msm_gpu and just pass it directly. Signed-off-by: Connor Abbott <[email protected]> Patchwork: https://patchwork.freedesktop.org/patch/654889/ Signed-off-by: Rob Clark <[email protected]>
Unused since the previous commit. Signed-off-by: Connor Abbott <[email protected]> Patchwork: https://patchwork.freedesktop.org/patch/654890/ Signed-off-by: Rob Clark <[email protected]>
When things go wrong, the GPU is capable of quickly generating millions of faulting translation requests per second. When that happens, in the stall-on-fault model each access will stall until it wins the race to signal the fault and then the RESUME register is written. This slows processing page faults to a crawl as the GPU can generate faults much faster than the CPU can acknowledge them. It also means that all available resources in the SMMU are saturated waiting for the stalled transactions, so that other transactions such as transactions generated by the GMU, which shares translation resources with the GPU, cannot proceed. This causes a GMU watchdog timeout, which leads to a failed reset because GX cannot collapse when there is a transaction pending and a permanently hung GPU. On older platforms with qcom,smmu-v2, it seems that when one transaction is stalled subsequent faulting transactions are terminated, which avoids this problem, but the MMU-500 follows the spec here. To work around these problems, disable stall-on-fault as soon as we get a page fault until a cooldown period after pagefaults stop. This allows the GMU some guaranteed time to continue working. We only use stall-on-fault to halt the GPU while we collect a devcoredump and we always terminate the transaction afterward, so it's fine to miss some subsequent page faults. We also keep it disabled so long as the current devcoredump hasn't been deleted, because in that case we likely won't capture another one if there's a fault. After this commit HFI messages still occasionally time out, because the crashdump handler doesn't run fast enough to let the GMU resume, but the driver seems to recover from it. This will probably go away after the HFI timeout is increased. Signed-off-by: Connor Abbott <[email protected]> Reviewed-by: Rob Clark <[email protected]> Patchwork: https://patchwork.freedesktop.org/patch/654891/ Signed-off-by: Rob Clark <[email protected]>
Based on kgsl. Fixes: af66706 ("drm/msm/a6xx: Add skeleton A7xx support") Signed-off-by: Connor Abbott <[email protected]> Patchwork: https://patchwork.freedesktop.org/patch/654922/ Signed-off-by: Rob Clark <[email protected]>
Calling this packet is necessary when we switch contexts because there are various pieces of state used by userspace to synchronize between BR and BV that are persistent across submits and we need to make sure that they are in a "safe" state when switching contexts. Otherwise a userspace submission in one context could cause another context to function incorrectly and hang, effectively a denial of service (although without leaking data). This was missed during initial a7xx bringup. Fixes: af66706 ("drm/msm/a6xx: Add skeleton A7xx support") Signed-off-by: Connor Abbott <[email protected]> Patchwork: https://patchwork.freedesktop.org/patch/654924/ Signed-off-by: Rob Clark <[email protected]>
The files generated by gen_header.py capture the source path to the input files and the date. While that can be informative, it varies based on where and when the kernel was built as the full path is captured. Since all of the files that this tool is run on is under the drivers directory, this modifies the application to strip all of the path before drivers. Additionally it prints <stripped> instead of the date. Signed-off-by: Ryan Eatmon <[email protected]> Signed-off-by: Bruce Ashfield <[email protected]> Signed-off-by: Viswanath Kraleti <[email protected]> Acked-by: Dmitry Baryshkov <[email protected]> Patchwork: https://patchwork.freedesktop.org/patch/655599/ Signed-off-by: Rob Clark <[email protected]>
To better match add_gpu_components(). Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Dmitry Baryshkov <[email protected]> Patchwork: https://patchwork.freedesktop.org/patch/657700/
We are going to want to re-use this before the component is bound, when we don't yet have the device pointer (but we do have the of node). v2: use %pOF Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Dmitry Baryshkov <[email protected]> Patchwork: https://patchwork.freedesktop.org/patch/657705/
If we have a newer dtb than kernel, we could end up in a situation where the GPU device is present in the dtb, but not in the drivers device table. We don't want this to prevent the display from probing. So check that we recognize the GPU before adding the GPU component. v2: use %pOF Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Dmitry Baryshkov <[email protected]> Patchwork: https://patchwork.freedesktop.org/patch/657701/
The number of columns relates to the width, not the height. Use the correct variable. Signed-off-by: John Keeping <[email protected]> Reviewed-by: Javier Martinez Canillas <[email protected]> Fixes: fdd591e ("drm/ssd130x: Add support for the SSD132x OLED controller family") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Javier Martinez Canillas <[email protected]>
When checking for unsupported expect an error is printed every time. This spams the log for platforms where this is expected, e.g. ls1028a having a Vivante (etnaviv) GPU and Mali display processor. Signed-off-by: Alexander Stein <[email protected]> Signed-off-by: Liviu Dudau <[email protected]> Link: https://lore.kernel.org/r/[email protected]
Fix the compile-time warning drivers/gpu/drm/ast/ast_mode.c: warning: EXPORT_SYMBOL() is not used, but #include <linux/export.h> is present Signed-off-by: Thomas Zimmermann <[email protected]> Reviewed-by: Jocelyn Falempe <[email protected]> Link: https://lore.kernel.org/r/[email protected]
Fix the compile-time warning drivers/gpu/drm/mgag200/mgag200_ddc.c: warning: EXPORT_SYMBOL() is not used, but #include <linux/export.h> is present Signed-off-by: Thomas Zimmermann <[email protected]> Reviewed-by: Jocelyn Falempe <[email protected]> Link: https://lore.kernel.org/r/[email protected]
Commit 698de82 ("crypto: testmgr - make it easier to enable the full set of tests") removed support for building kernels that run only the "fast" set of crypto self-tests by default. This assumed that nearly everyone actually wanted the full set of tests, *if* they had already chosen to enable the tests at all. Unfortunately, it turns out that both Debian and Fedora intentionally have the crypto self-tests enabled in their production kernels. And for production kernels we do need to keep the testing time down, which implies just running the "fast" tests, not the full set of tests. For Fedora, a reason for enabling the tests in production is that they are being (mis)used to meet the FIPS 140-3 pre-operational testing requirement. However, the other reason for enabling the tests in production, which applies to both distros, is that they provide some value in protecting users from buggy drivers. Unfortunately, the crypto/ subsystem has many buggy and untested drivers for off-CPU hardware accelerators on rare platforms. These broken drivers get shipped to users, and there have been multiple examples of the tests preventing these buggy drivers from being used. So effectively, the tests are being relied on in production kernels. I think this is kind of crazy (untested drivers should just not be enabled at all), but that seems to be how things work currently. Thus, reintroduce a kconfig option that controls the level of testing. Call it CRYPTO_SELFTESTS_FULL instead of the original name CRYPTO_MANAGER_EXTRA_TESTS, which was slightly misleading. Moreover, given the "production kernel" use case, make CRYPTO_SELFTESTS depend on EXPERT instead of DEBUG_KERNEL. I also haven't reinstated all the #ifdefs in crypto/testmgr.c. Instead, just rely on the compiler to optimize out unused code. Fixes: 40b9969 ("crypto: testmgr - replace CRYPTO_MANAGER_DISABLE_TESTS with CRYPTO_SELFTESTS") Fixes: 698de82 ("crypto: testmgr - make it easier to enable the full set of tests") Signed-off-by: Eric Biggers <[email protected]> Signed-off-by: Herbert Xu <[email protected]>
The left shift int 32 bit integer constants 1 is evaluated using 32 bit arithmetic and then assigned to a 64 bit unsigned integer. In the case where the shift is 32 or more this can lead to an overflow. Avoid this by shifting using the BIT_ULL macro instead. Fixes: 6c3ac7b ("drm/nouveau/gsp: support deeper page tables in COPY_SERVER_RESERVED_PDES") Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: Danilo Krummrich <[email protected]> Link: https://lore.kernel.org/r/[email protected]
The RPC container is released after being passed to r535_gsp_rpc_send(). When sending the initial fragment of a large RPC and passing the caller's RPC container, the container will be freed prematurely. Subsequent attempts to send remaining fragments will therefore result in a use-after-free. Allocate a temporary RPC container for holding the initial fragment of a large RPC when sending. Free the caller's container when all fragments are successfully sent. Fixes: 176fdcb ("drm/nouveau/gsp/r535: add support for booting GSP-RM") Signed-off-by: Zhi Wang <[email protected]> Link: https://lore.kernel.org/r/[email protected] [ Rebase onto Blackwell changes. - Danilo ] Signed-off-by: Danilo Krummrich <[email protected]>
The nouveau_get_backlight_name() function generates a unique name for the backlight interface, appending an id from 1 to 99 for all backlight devices after the first. GCC 15 (and likely other compilers) produce the following -Wformat-truncation warning: nouveau_backlight.c: In function ‘nouveau_backlight_init’: nouveau_backlight.c:56:69: error: ‘%d’ directive output may be truncated writing between 1 and 10 bytes into a region of size 3 [-Werror=format-truncation=] 56 | snprintf(backlight_name, BL_NAME_SIZE, "nv_backlight%d", nb); | ^~ In function ‘nouveau_get_backlight_name’, inlined from ‘nouveau_backlight_init’ at nouveau_backlight.c:351:7: nouveau_backlight.c:56:56: note: directive argument in the range [1, 2147483647] 56 | snprintf(backlight_name, BL_NAME_SIZE, "nv_backlight%d", nb); | ^~~~~~~~~~~~~~~~ nouveau_backlight.c:56:17: note: ‘snprintf’ output between 14 and 23 bytes into a destination of size 15 56 | snprintf(backlight_name, BL_NAME_SIZE, "nv_backlight%d", nb); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The warning started appearing after commit ab244be ("drm/nouveau: Fix a potential theorical leak in nouveau_get_backlight_name()") This fix for the ida usage removed the explicit value check for ids larger than 99. The compiler is unable to intuit that the ida_alloc_max() limits the returned value range between 0 and 99. Because the compiler can no longer infer that the number ranges from 0 to 99, it thinks that it could use as many as 11 digits (10 + the potential - sign for negative numbers). The warning has gone unfixed for some time, with at least one kernel test robot report. The code breaks W=1 builds, which is especially frustrating with the introduction of CONFIG_WERROR. The string is stored temporarily on the stack and then copied into the device name. Its not a big deal to use 11 more bytes of stack rounding out to an even 24 bytes. Increase BL_NAME_SIZE to 24 to avoid the truncation warning. This fixes the W=1 builds that include this driver. Compile tested only. Fixes: ab244be ("drm/nouveau: Fix a potential theorical leak in nouveau_get_backlight_name()") Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Suggested-by: Timur Tabi <[email protected]> Signed-off-by: Jacob Keller <[email protected]> Link: https://lore.kernel.org/r/20250610-jk-nouveua-drm-bl-snprintf-fix-v2-1-7fdd4b84b48e@intel.com Signed-off-by: Danilo Krummrich <[email protected]>
GSP message queue docs has been moved following RPC handling split in commit 8a8b1ec ("drm/nouveau/gsp: split rpc handling out on its own"), before GSP-RM implementation is versioned in commit c472d82 ("drm/nouveau/gsp: move subdev/engine impls to subdev/gsp/rm/r535/"). However, the kernel-doc reference in nouveau docs is left behind, which triggers htmldocs warnings: ERROR: Cannot find file ./drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c WARNING: No kernel-doc for file ./drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c Update the reference. Fixes: c472d82 ("drm/nouveau/gsp: move subdev/engine impls to subdev/gsp/rm/r535/") Fixes: 8a8b1ec ("drm/nouveau/gsp: split rpc handling out on its own") Signed-off-by: Bagas Sanjaya <[email protected]> Acked-by: Randy Dunlap <[email protected]> Tested-by: Randy Dunlap <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Danilo Krummrich <[email protected]>
We want to WARN_ON() if info is NULL. Suggested-by: Konrad Dybcio <[email protected]> Fixes: 0838fc3 ("drm/msm/adreno: Check for recognized GPU before bind") Tested-by: Neil Armstrong <[email protected]> Signed-off-by: Rob Clark <[email protected]> Reported-by: Alexey Klimov <[email protected]> Reviewed-by: Konrad Dybcio <[email protected]> Patchwork: https://patchwork.freedesktop.org/patch/658631/
syzbot reports that it can trigger a WARN_ON() for kmalloc() attempt that's too big: WARNING: CPU: 0 PID: 6488 at mm/slub.c:5024 __kvmalloc_node_noprof+0x520/0x640 mm/slub.c:5024 Modules linked in: CPU: 0 UID: 0 PID: 6488 Comm: syz-executor312 Not tainted 6.15.0-rc7-syzkaller-gd7fa1af5b33e #0 PREEMPT Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025 pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : __kvmalloc_node_noprof+0x520/0x640 mm/slub.c:5024 lr : __do_kmalloc_node mm/slub.c:-1 [inline] lr : __kvmalloc_node_noprof+0x3b4/0x640 mm/slub.c:5012 sp : ffff80009cfd7a90 x29: ffff80009cfd7ac0 x28: ffff0000dd52a120 x27: 0000000000412dc0 x26: 0000000000000178 x25: ffff7000139faf70 x24: 0000000000000000 x23: ffff800082f4cea8 x22: 00000000ffffffff x21: 000000010cd004a8 x20: ffff0000d75816c0 x19: ffff0000dd52a000 x18: 00000000ffffffff x17: ffff800092f39000 x16: ffff80008adbe9e4 x15: 0000000000000005 x14: 1ffff000139faf1c x13: 0000000000000000 x12: 0000000000000000 x11: ffff7000139faf21 x10: 0000000000000003 x9 : ffff80008f27b938 x8 : 0000000000000002 x7 : 0000000000000000 x6 : 0000000000000000 x5 : 00000000ffffffff x4 : 0000000000400dc0 x3 : 0000000200000000 x2 : 000000010cd004a8 x1 : ffff80008b3ebc40 x0 : 0000000000000001 Call trace: __kvmalloc_node_noprof+0x520/0x640 mm/slub.c:5024 (P) kvmalloc_array_node_noprof include/linux/slab.h:1065 [inline] io_rsrc_data_alloc io_uring/rsrc.c:206 [inline] io_clone_buffers io_uring/rsrc.c:1178 [inline] io_register_clone_buffers+0x484/0xa14 io_uring/rsrc.c:1287 __io_uring_register io_uring/register.c:815 [inline] __do_sys_io_uring_register io_uring/register.c:926 [inline] __se_sys_io_uring_register io_uring/register.c:903 [inline] __arm64_sys_io_uring_register+0x42c/0xea8 io_uring/register.c:903 __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline] invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49 el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132 do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151 el0_svc+0x58/0x17c arch/arm64/kernel/entry-common.c:767 el0t_64_sync_handler+0x78/0x108 arch/arm64/kernel/entry-common.c:786 el0t_64_sync+0x198/0x19c arch/arm64/kernel/entry.S:600 which is due to offset + buffer_count being too large. The registration code checks only the total count of buffers, but given that the indexing is an array, it should also check offset + count. That can't exceed IORING_MAX_REG_BUFFERS either, as there's no way to reach buffers beyond that limit. There's no issue with registrering a table this large, outside of the fact that it's pointless to register buffers that cannot be reached, and that it can trigger this kmalloc() warning for attempting an allocation that is too large. Cc: [email protected] Fixes: b16e920 ("io_uring/rsrc: allow cloning at an offset") Reported-by: [email protected] Link: https://lore.kernel.org/io-uring/[email protected]/ Signed-off-by: Jens Axboe <[email protected]>
Add missing put_task_struct() in the error path Cc: [email protected] Fixes: 0f8baa3 ("io-wq: fully initialize wqe before calling cpuhp_state_add_instance_nocalls()") Signed-off-by: Penglei Jiang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
i915_pmu.c may fail to build with GCOV and AutoFDO enabled. ../drivers/gpu/drm/i915/i915_pmu.c:116:3: error: call to '__compiletime_assert_487' declared with 'error' attribute: BUILD_BUG_ON failed: bit > BITS_PER_TYPE(typeof_member(struct i915_pmu, enable)) - 1 116 | BUILD_BUG_ON(bit > | ^ Here is a way to reproduce the issue: $ git checkout v6.15 $ mkdir build $ ./scripts/kconfig/merge_config.sh -O build -n -m <(cat <<EOF CONFIG_DRM=y CONFIG_PCI=y CONFIG_DRM_I915=y CONFIG_PERF_EVENTS=y CONFIG_DEBUG_FS=y CONFIG_GCOV_KERNEL=y CONFIG_GCOV_PROFILE_ALL=y CONFIG_AUTOFDO_CLANG=y EOF ) $ PATH=${PATH}:${HOME}/llvm-20.1.5-x86_64/bin make LLVM=1 O=build \ olddefconfig $ PATH=${PATH}:${HOME}/llvm-20.1.5-x86_64/bin make LLVM=1 O=build \ CLANG_AUTOFDO_PROFILE=...PATH_TO_SOME_AFDO_PROFILE... \ drivers/gpu/drm/i915/i915_pmu.o Although not super sure what happened, by reviewing the code, it should depend on `__builtin_constant_p(bit)` directly instead of assuming `__builtin_constant_p(config)` makes `bit` a builtin constant. Also fix a nit, to reuse the `bit` local variable. Fixes: a644fde ("drm/i915/pmu: Change bitmask of enabled events to u32") Reviewed-by: Nathan Chancellor <[email protected]> Signed-off-by: Tzung-Bi Shih <[email protected]> Signed-off-by: Tvrtko Ursulin <[email protected]> Link: https://lore.kernel.org/r/[email protected] (cherry picked from commit 686d773) Signed-off-by: Joonas Lahtinen <[email protected]>
BXT_MIPI_TRANS_VTOTAL must be programmed with vtotal-1 instead of vtotal. Make it so. Cc: [email protected] Signed-off-by: Ville Syrjälä <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Reviewed-by: Jani Nikula <[email protected]> (cherry picked from commit 7b3685c) Signed-off-by: Joonas Lahtinen <[email protected]>
The following kernel Oops was recently reported by Mesa CI: [ 800.139824] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000588 [ 800.148619] Mem abort info: [ 800.151402] ESR = 0x0000000096000005 [ 800.155141] EC = 0x25: DABT (current EL), IL = 32 bits [ 800.160444] SET = 0, FnV = 0 [ 800.163488] EA = 0, S1PTW = 0 [ 800.166619] FSC = 0x05: level 1 translation fault [ 800.171487] Data abort info: [ 800.174357] ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000 [ 800.179832] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 [ 800.184873] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 800.190176] user pgtable: 4k pages, 39-bit VAs, pgdp=00000001014c2000 [ 800.196607] [0000000000000588] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000 [ 800.205305] Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP [ 800.211564] Modules linked in: vc4 snd_soc_hdmi_codec drm_display_helper v3d cec gpu_sched drm_dma_helper drm_shmem_helper drm_kms_helper drm drm_panel_orientation_quirks snd_soc_core snd_compress snd_pcm_dmaengine snd_pcm i2c_brcmstb snd_timer snd backlight [ 800.234448] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.12.25+rpt-rpi-v8 #1 Debian 1:6.12.25-1+rpt1 [ 800.244182] Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT) [ 800.250005] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 800.256959] pc : v3d_job_update_stats+0x60/0x130 [v3d] [ 800.262112] lr : v3d_job_update_stats+0x48/0x130 [v3d] [ 800.267251] sp : ffffffc080003e60 [ 800.270555] x29: ffffffc080003e60 x28: ffffffd842784980 x27: 0224012000000000 [ 800.277687] x26: ffffffd84277f630 x25: ffffff81012fd800 x24: 0000000000000020 [ 800.284818] x23: ffffff8040238b08 x22: 0000000000000570 x21: 0000000000000158 [ 800.291948] x20: 0000000000000000 x19: ffffff8040238000 x18: 0000000000000000 [ 800.299078] x17: ffffffa8c1bd2000 x16: ffffffc080000000 x15: 0000000000000000 [ 800.306208] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 [ 800.313338] x11: 0000000000000040 x10: 0000000000001a40 x9 : ffffffd83b39757c [ 800.320468] x8 : ffffffd842786420 x7 : 7fffffffffffffff x6 : 0000000000ef32b0 [ 800.327598] x5 : 00ffffffffffffff x4 : 0000000000000015 x3 : ffffffd842784980 [ 800.334728] x2 : 0000000000000004 x1 : 0000000000010002 x0 : 000000ba4c0ca382 [ 800.341859] Call trace: [ 800.344294] v3d_job_update_stats+0x60/0x130 [v3d] [ 800.349086] v3d_irq+0x124/0x2e0 [v3d] [ 800.352835] __handle_irq_event_percpu+0x58/0x218 [ 800.357539] handle_irq_event+0x54/0xb8 [ 800.361369] handle_fasteoi_irq+0xac/0x240 [ 800.365458] handle_irq_desc+0x48/0x68 [ 800.369200] generic_handle_domain_irq+0x24/0x38 [ 800.373810] gic_handle_irq+0x48/0xd8 [ 800.377464] call_on_irq_stack+0x24/0x58 [ 800.381379] do_interrupt_handler+0x88/0x98 [ 800.385554] el1_interrupt+0x34/0x68 [ 800.389123] el1h_64_irq_handler+0x18/0x28 [ 800.393211] el1h_64_irq+0x64/0x68 [ 800.396603] default_idle_call+0x3c/0x168 [ 800.400606] do_idle+0x1fc/0x230 [ 800.403827] cpu_startup_entry+0x40/0x50 [ 800.407742] rest_init+0xe4/0xf0 [ 800.410962] start_kernel+0x5e8/0x790 [ 800.414616] __primary_switched+0x80/0x90 [ 800.418622] Code: 8b170277 8b160296 11000421 b9000861 (b9401ac1) [ 800.424707] ---[ end trace 0000000000000000 ]--- [ 800.457313] ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]--- This issue happens when the file descriptor is closed before the jobs submitted by it are completed. When the job completes, we update the global GPU stats and the per-fd GPU stats, which are exposed through fdinfo. If the file descriptor was closed, then the struct `v3d_file_priv` and its stats were already freed and we can't update the per-fd stats. Therefore, if the file descriptor was already closed, don't update the per-fd GPU stats, only update the global ones. Cc: [email protected] # v6.12+ Reviewed-by: Jose Maria Casanova Crespo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Maíra Canal <[email protected]>
Commit 704d3d6 ("drm/etnaviv: don't block scheduler when GPU is still active") ensured that active jobs are returned to the pending list when extending the timeout. However, it didn't use the pending list's lock to manipulate the list, which causes a race condition as the scheduler's workqueues are running. Hold the lock while manipulating the scheduler's pending list to prevent a race. Cc: [email protected] Fixes: 704d3d6 ("drm/etnaviv: don't block scheduler when GPU is still active") Reported-by: Philipp Stanner <[email protected]> Closes: https://lore.kernel.org/dri-devel/[email protected]/ Reviewed-by: Lucas Stach <[email protected]> Reviewed-by: Philipp Stanner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Maíra Canal <[email protected]>
[WHAT] Severe video playback corruption is observed in the following setup: weston 14.0.90 (built from source) + mpv v0.40.0 with command: mpv bbb_sunflower_1080p_60fps_normal.mp4 --vo=gpu [HOW] ABGR16161616 needs to be included in dml2/2.1 translation. Cc: Mario Limonciello <[email protected]> Cc: Alex Deucher <[email protected]> Acked-by: Aurabindo Pillai <[email protected]> Reviewed-by: Harry Wentland <[email protected]> Reviewed-by: Austin Zheng <[email protected]> Signed-off-by: Alex Hung <[email protected]> Tested-by: Daniel Wheeler <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit d023de8) Cc: [email protected]
[WHY & HOW] Fix RMCM programming sequence errors and mapping issues to pass the RMCM test. Cc: Mario Limonciello <[email protected]> Cc: Alex Deucher <[email protected]> Reviewed-by: Dmytro Laktyushkin <[email protected]> Signed-off-by: Yihan Zhu <[email protected]> Signed-off-by: Alex Hung <[email protected]> Tested-by: Daniel Wheeler <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 11baa49) Cc: [email protected]
[WHY] Backlight caps are read already in amdgpu_dm_update_backlight_caps(). They may be updated by update_connector_ext_caps(). Reading again when registering backlight device may cause wrong values to be used. [HOW] Use backlight caps already registered to the dm. Cc: Mario Limonciello <[email protected]> Cc: Alex Deucher <[email protected]> Reviewed-by: Roman Li <[email protected]> Signed-off-by: Mario Limonciello <[email protected]> Signed-off-by: Alex Hung <[email protected]> Tested-by: Daniel Wheeler <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 148144f) Cc: [email protected]
[WHY] Userspace currently is offered a range from 0-0xFF but the PWM is programmed from 0-0xFFFF. This can be limiting to some software that wants to apply greater granularity. [HOW] Convert internally to firmware values only when mapping custom brightness curves because these are in 0-0xFF range. Advertise full PWM range to userspace. Cc: Mario Limonciello <[email protected]> Cc: Alex Deucher <[email protected]> Reviewed-by: Roman Li <[email protected]> Signed-off-by: Mario Limonciello <[email protected]> Signed-off-by: Alex Hung <[email protected]> Tested-by: Daniel Wheeler <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 8dbd72c) Cc: [email protected]
1. add kicker device list 2. add kicker device checking helper function Signed-off-by: Frank Min <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 09aa2b4) Cc: [email protected]
1. Add kicker firmwares loading for gfx11/smu13/psp13 2. Register additional MODULE_FIRMWARE entries for kicker fws - gc_11_0_0_rlc_kicker.bin - gc_11_0_0_imu_kicker.bin - psp_13_0_0_sos_kicker.bin - psp_13_0_0_ta_kicker.bin - smu_13_0_0_kicker.bin Signed-off-by: Frank Min <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit fb5ec21) Cc: [email protected]
This commit makes the following improvements to SDMA engine reset handling: 1. Clarifies in the function documentation that instance_id refers to a logical ID 2. Adds conversion from logical to physical instance ID before performing reset using GET_INST(SDMA0, instance_id) macro 3. Improves error messaging to indicate when a logical instance reset fails 4. Adds better code organization with blank lines for readability The change ensures proper SDMA engine reset by using the correct physical instance ID while maintaining the logical ID interface for callers. V2: Remove harvest_config check and convert directly to physical instance (Lijo) Suggested-by: Jonathan Kim <[email protected]> Signed-off-by: Jesse Zhang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 5efa621) Cc: [email protected]
Simplify SDMA v4_4_2 queue reset and stop operations by: 1. Removing GET_INST(SDMA0) conversion for ring->me 2. Using the logical instance ID (ring->me) directly 3. Maintaining consistent behavior with other SDMA queue operations This change aligns with the existing queue handling logic where ring->me already represents the correct instance identifier. Signed-off-by: Lijo Lazar <[email protected]> Signed-off-by: Jesse Zhang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 3bab282) Cc: [email protected]
Add a protection to ensure programming are all complete prior VCPU starting. This is a WA for an unintended VCPU running. Signed-off-by: Sonny Jiang <[email protected]> Acked-by: Leo Liu <[email protected]> Reviewed-by: Ruijing Dong <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit c29521b) Cc: [email protected]
[WHAT] hws was checked for null earlier in dce110_blank_stream, indicating hws can be null, and should be checked whenever it is used. Cc: Mario Limonciello <[email protected]> Cc: Alex Deucher <[email protected]> Reviewed-by: Aurabindo Pillai <[email protected]> Signed-off-by: Alex Hung <[email protected]> Signed-off-by: Aurabindo Pillai <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 79db436) Cc: [email protected]
Make sure to release reset domain lock in case of failures. Signed-off-by: Lijo Lazar <[email protected]> Signed-off-by: Ce Sun <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Fixes: 11bb337 ("drm/amdgpu: refactor amdgpu_device_gpu_recover") Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 1ab11a8)
This commit makes two key fixes to SDMA v4.4.2 handling: 1. disable UTC_L1 in sdma_cntl register when stopping SDMA engines by reading the current value before modifying UTC_L1_ENABLE bit. 2. Ensure UTC_L1_ENABLE is consistently managed by: - Adding the missing register write when enabling UTC_L1 during start - Keeping UTC_L1 enabled by default as per hardware requirements v2: Correct SDMA_CNTL setting (Philip) Suggested-by: Jonathan Kim <[email protected]> Signed-off-by: Jesse Zhang <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 375bf56) Cc: [email protected]
Use the amdgpu fence container so we can store additional data in the fence. This also fixes the start_time handling for MCBP since we were casting the fence to an amdgpu_fence and it wasn't. Fixes: 3f4c175 ("drm/amdgpu: MCBP based on DRM scheduler (v9)") Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit bf1cd14) Cc: [email protected]
Missing the mutex init. Fixes: e56d4bf ("drm/amdgpu/: drm/amdgpu: Register the new sdma function pointers for sdma_v5_0") Reviewed-by: Jesse Zhang <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 3f4caf0)
q->gws is not updated atomically with qpd->mapped_gws_queue. If a runlist is created between pqm_set_gws and update_queue it will contain a queue which uses GWS in a process with no GWS allocated. This will result in a scheduler hang. Use q->properties.is_gws which is changed while holding the DQM lock. Signed-off-by: Jay Cornwall <[email protected]> Reviewed-by: Harish Kasiviswanathan <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit b983702) Cc: [email protected]
Missing the mutex init. Fixes: 47454f2 ("drm/amdgpu: Register the new sdma function pointers for sdma_v5_2") Reviewed-by: Jesse Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit ea685ff)
…org/drm/i915/kernel into drm-fixes - Fix MIPI vtotal programming off by one on Broxton - Fix PMU code for GCOV and AutoFDO enabled build Signed-off-by: Dave Airlie <[email protected]> From: Joonas Lahtinen <[email protected]> Link: https://lore.kernel.org/r/aFJfykDpUwtmpilE@jlahtine-mobl
…op.org/agd5f/linux into drm-fixes amd-drm-fixes-6.16-2025-06-18: amdgpu: - DP tunneling fix - LTTPR fix - DSC fix - DML2.x ABGR16161616 fix - RMCM fix - Backlight fixes - GFX11 kicker support - SDMA reset fixes - VCN 5.0.1 fix - Reset fix - Misc small fixes amdkfd: - SDMA reset fix - Fix race in GWS scheduling Signed-off-by: Dave Airlie <[email protected]> From: Alex Deucher <[email protected]> Link: https://lore.kernel.org/r/[email protected]
Sanity check the values for queue depth and number of queues we get from userspace when adding a device. Signed-off-by: Ronnie Sahlberg <[email protected]> Reviewed-by: Ming Lei <[email protected]> Fixes: 71f28f3 ("ublk_drv: add io_uring based userspace block driver") Fixes: 62fe99c ("ublk: add read()/write() support for ublk char device") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
This allows for additional L2 caching modes. Fixes: 01570b4 ("drm/xe/bmg: implement Wa_16023588340") Cc: Matthew Auld <[email protected]> Reviewed-by: Matthew Auld <[email protected]> Signed-off-by: Vinay Belgaumkar <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Lucas De Marchi <[email protected]> (cherry picked from commit 6ab42fa) Signed-off-by: Thomas Hellström <[email protected]>
It should rather use xe_map_memset() as the BO is created with XE_BO_FLAG_VRAM_IF_DGFX in xe_guc_pc_init(). Fixes: dd08ebf ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: [email protected] Reviewed-by: Matthew Brost <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Lucas De Marchi <[email protected]> (cherry picked from commit 21cf47d) Signed-off-by: Thomas Hellström <[email protected]>
When the GuC fails to load we declare the device wedged. However, the very first GuC load attempt on GT0 (from xe_gt_init_hwconfig) is done before the GT1 GuC objects are initialized, so things go bad when the wedge code attempts to cleanup GT1. To fix this, check the initialization status in the functions called during wedge. Fixes: 7dbe8af ("drm/xe: Wedge the entire device") Signed-off-by: Daniele Ceraolo Spurio <[email protected]> Cc: Rodrigo Vivi <[email protected]> Cc: Matthew Brost <[email protected]> Cc: Jonathan Cavitt <[email protected]> Cc: Lucas De Marchi <[email protected]> Cc: Zhanjun Dong <[email protected]> Cc: [email protected] # v6.12+: 1e1981b: drm/xe: Fix taking invalid lock on wedge Cc: [email protected] # v6.12+ Reviewed-by: Jonathan Cavitt <[email protected]> Reviewed-by: Lucas De Marchi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Lucas De Marchi <[email protected]> (cherry picked from commit 0b93b7d) Signed-off-by: Thomas Hellström <[email protected]>
…rg/drm/misc/kernel into drm-fixes drm-misc-fixes for v6.16-rc3: - vivante scheduler fix. - v3d null pointer crash fix. - fix backlight, booting GSP-RM, and potential integer shift overflow in nouveau. - fix compiler warnings about unused linux/export.h - fix malidp unknown modifier spam. - fix for ssd130x. Signed-off-by: Dave Airlie <[email protected]> From: Maarten Lankhorst <[email protected]> Link: https://lore.kernel.org/r/[email protected]
…/drm/xe/kernel into drm-fixes Driver Changes: - A workaround update (Vinay) - Fix memset on iomem (Lucas) - Fix early wedge on GuC Load failure (Daniele) Signed-off-by: Dave Airlie <[email protected]> From: Thomas Hellstrom <[email protected]> Link: https://lore.kernel.org/r/aFQ03kNzhbiNK7gW@fedora
…/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: "This fixes a regression in ahash (broken fallback finup) and reinstates a Kconfig option to control the extra self-tests" * tag 'v6.16-p5' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: ahash - Fix infinite recursion in ahash_def_finup crypto: testmgr - reinstate kconfig control over full self-tests
…m/kernel Pull drm fixes from Dave Airlie: "Bit of an uptick in fixes for rc3, msm and amdgpu leading the way, with i915/xe/nouveau with a few each and then some scattered misc bits, nothing looks too crazy: msm: - Display: - Fixed DP output on SDM845 - Fixed 10nm DSI PLL init - GPU: - SUBMIT ioctl error path leak fixes - drm half of stall-on-fault fixes - a7xx: Missing CP_RESET_CONTEXT_STATE - Skip GPU component bind if GPU is not in the device table i915: - Fix MIPI vtotal programming off by one on Broxton - Fix PMU code for GCOV and AutoFDO enabled build xe: - A workaround update - Fix memset on iomem - Fix early wedge on GuC Load failure amdgpu: - DP tunneling fix - LTTPR fix - DSC fix - DML2.x ABGR16161616 fix - RMCM fix - Backlight fixes - GFX11 kicker support - SDMA reset fixes - VCN 5.0.1 fix - Reset fix - Misc small fixes amdkfd: - SDMA reset fix - Fix race in GWS scheduling nouveau: - update docs reference - fix backlight name buffer size - fix UAF in r535 gsp rpc msg - fix undefined shift mgag200: - drop export header ast: - drop export header malidp: - drop informational error ssd130x: - fix clear columns etnaviv: - scheduler locking fix v3d: - null pointer crash fix" * tag 'drm-fixes-2025-06-20' of https://gitlab.freedesktop.org/drm/kernel: (50 commits) drm/xe: Fix early wedge on GuC load failure drm/xe: Fix memset on iomem drm/xe/bmg: Update Wa_16023588340 drm/amdgpu/sdma5.2: init engine reset mutex drm/amdkfd: Fix race in GWS queue scheduling drm/amdgpu/sdma5: init engine reset mutex drm/amdgpu: switch job hw_fence to amdgpu_fence drm/amdgpu: Fix SDMA UTC_L1 handling during start/stop sequences drm/amdgpu: Release reset locks during failures drm/amd/display: Check dce_hwseq before dereferencing it drm/amdgpu: VCN v5_0_1 to prevent FW checking RB during DPG pause drm/amdgpu: Use logical instance ID for SDMA v4_4_2 queue operations drm/amdgpu: Fix SDMA engine reset with logical instance ID drm/amdgpu: add kicker fws loading for gfx11/smu13/psp13 drm/amdgpu: Add kicker device detection drm/amd/display: Export full brightness range to userspace drm/amd/display: Only read ACPI backlight caps once drm/amd/display: Fix RMCM programming seq errors drm/amd/display: Fix mpv playback corruption on weston drm/amd/display: Add more checks for DSC / HUBP ONO guarantees ...
Pull io_uring fixes from Jens Axboe: - Two fixes for error injection failures. One fixes a task leak issue introduced in this merge window, the other an older issue with handling allocation of a mapped buffer. - Fix for a syzbot issue that triggers a kmalloc warning on attempting an allocation that's too large - Fix for an error injection failure causing a double put of a task, introduced in this merge window * tag 'io_uring-6.16-20250619' of git://git.kernel.dk/linux: io_uring: fix potential page leak in io_sqe_buffer_register() io_uring/sqpoll: don't put task_struct on tctx setup failure io_uring: remove duplicate io_uring_alloc_task_context() definition io_uring: fix task leak issue in io_wq_create() io_uring/rsrc: validate buffer count with offset for cloning
Pull block fixes from Jens Axboe: - Two fixes for aoe which fixes issues dating back to when this driver was converted to blk-mq - Fix for ublk, checking for valid queue depth and count values before setting up a device * tag 'block-6.16-20250619' of git://git.kernel.dk/linux: ublk: santizize the arguments from userspace when adding a device aoe: defer rexmit timer downdev work to workqueue aoe: clean device rq_list in aoedev_downdev()
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.1)
Can you help keep this open source service alive? 💖 Please sponsor : )