forked from torvalds/linux
-
Notifications
You must be signed in to change notification settings - Fork 4
[pull] master from torvalds:master #1958
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
LSM hooks such as security_path_mknod() and security_inode_rename() have access to newly allocated negative dentry, which has NULL d_inode. Therefore, it is necessary to do the NULL pointer check for d_inode. Also add selftests that checks the verifier enforces the NULL pointer check. Signed-off-by: Song Liu <[email protected]> Reviewed-by: Matt Bobrowski <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
BPF_MAP_TYPE_LRU_HASH can recycle most recent elements well before the map is full, due to percpu reservations and force shrink before neighbor stealing. Once a CPU is unable to borrow from the global map, it will once steal one elem from a neighbor and after that each time flush this one element to the global list and immediately recycle it. Batch value LOCAL_FREE_TARGET (128) will exhaust a 10K element map with 79 CPUs. CPU 79 will observe this behavior even while its neighbors hold 78 * 127 + 1 * 15 == 9921 free elements (99%). CPUs need not be active concurrently. The issue can appear with affinity migration, e.g., irqbalance. Each CPU can reserve and then hold onto its 128 elements indefinitely. Avoid global list exhaustion by limiting aggregate percpu caches to half of map size, by adjusting LOCAL_FREE_TARGET based on cpu count. This change has no effect on sufficiently large tables. Similar to LOCAL_NR_SCANS and lru->nr_scans, introduce a map variable lru->free_target. The extra field fits in a hole in struct bpf_lru. The cacheline is already warm where read in the hot path. The field is only accessed with the lru lock held. Tested-by: Anton Protopopov <[email protected]> Signed-off-by: Willem de Bruijn <[email protected]> Acked-by: Stanislav Fomichev <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
collect_mounts() has several problems - one can't iterate over the results directly, so it has to be done with callback passed to iterate_mounts(); it has an oopsable race with d_invalidate(); it creates temporary clones of mounts invisibly for sync umount (IOW, you can have non-lazy umount succeed leaving filesystem not mounted anywhere and yet still busy). A saner approach is to give caller an array of struct path that would pin every mount in a subtree, without cloning any mounts. * collect_mounts()/drop_collected_mounts()/iterate_mounts() is gone * collect_paths(where, preallocated, size) gives either ERR_PTR(-E...) or a pointer to array of struct path, one for each chunk of tree visible under 'where' (i.e. the first element is a copy of where, followed by (mount,root) for everything mounted under it - the same set collect_mounts() would give). Unlike collect_mounts(), the mounts are *not* cloned - we just get pinning references to the roots of subtrees in the caller's namespace. Array is terminated by {NULL, NULL} struct path. If it fits into preallocated array (on-stack, normally), that's where it goes; otherwise it's allocated by kmalloc_array(). Passing 0 as size means that 'preallocated' is ignored (and expected to be NULL). * drop_collected_paths(paths, preallocated) is given the array returned by an earlier call of collect_paths() and the preallocated array passed to that call. All mount/dentry references are dropped and array is kfree'd if it's not equal to 'preallocated'. * instead of iterate_mounts(), users should just iterate over array of struct path - nothing exotic is needed for that. Existing users (all in audit_tree.c) are converted. [folded a fix for braino reported by Venkat Rao Bagalkote <[email protected]>] Fixes: 80b5dce ("vfs: Add a function to lazily unmount all mounts from any dentry") Tested-by: Venkat Rao Bagalkote <[email protected]> Signed-off-by: Al Viro <[email protected]>
…mething under it If we are propagating across the userns boundary, we need to lock the mounts added there. However, in case when something has already been mounted there and we end up sliding a new tree under that, the stuff that had been there before should not get locked. IOW, lock_mnt_tree() should be called before we reparent the preexisting tree on top of what we are adding. Fixes: 3bd045c ("separate copying and locking mount tree on cross-userns copies") Signed-off-by: Al Viro <[email protected]>
…failure When btf_dump__new() fails to allocate memory for the internal hashmap (btf_dump->type_names), it returns an error code. However, the cleanup function btf_dump__free() does not check if btf_dump->type_names is NULL before attempting to free it. This leads to a null pointer dereference when btf_dump__free() is called on a btf_dump object. Fixes: 351131b ("libbpf: add btf_dump API for BTF-to-C conversion") Signed-off-by: Yuan Chen <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
The second argument of bpf_sysctl_get_name() helper is a pointer to a buffer that is being written to. However that isn't specify in the prototype. Until commit 37cce22 ("bpf: verifier: Refactor helper access type tracking"), all helper accesses were considered as a possible write access by the verifier, so no big harm was done. However, since then, the verifier might make wrong asssumption about the content of that address which might lead it to make faulty optimizations (such as removing code that was wrongly labeled dead). This is what happens in test_sysctl selftest to the tests related to sysctl_get_name. Add MEM_WRITE flag the second argument of bpf_sysctl_get_name(). Signed-off-by: Jerome Marchand <[email protected]> Acked-by: Yonghong Song <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
Convert test_sysctl test to prog_tests with minimal change to the tests themselves. Signed-off-by: Jerome Marchand <[email protected]> Acked-by: Yonghong Song <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
Jerome Marchand says: ==================== bpf: Specify access type of bpf_sysctl_get_name args The second argument of bpf_sysctl_get_name() helper is a pointer to a buffer that is being written to. However that isn't specify in the prototype. Until commit 37cce22 ("bpf: verifier: Refactor helper access type tracking") that mistake was hidden by the way the verifier treated helper accesses. Since then, the verifier, working on wrong infromation from the prototype, can make faulty optimization that would had been caught by the test_sysctl selftests if it was run by the CI. The first patch fixes bpf_sysctl_get_name prototype. The second patch converts the test_sysctl to prog_tests so that it will be run by the CI and catch similar issues in the future. Changes in v3: - Use ASSERT* macro instead of CHECK_FAIL. - Remove useless code. Changes in v2: - Replace ARG_PTR_TO_UNINIT_MEM by ARG_PTR_TO_MEM | MEM_WRITE. - Converts test_sysctl to prog_tests. ==================== Link: https://patch.msgid.link/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
Once want_mount_setattr() has returned a positive, it does require finish_mount_kattr() to release ->mnt_userns. Failing do_mount_setattr() does not change that. As the result, we can end up leaking userns and possibly mnt_idmap as well. Fixes: c4a1682 ("fs: add open_tree_attr()") Reviewed-by: Christian Brauner <[email protected]> Signed-off-by: Al Viro <[email protected]>
The `name` field in `obj->externs` points into the BTF data at initial open time. However, some functions may invalidate this after opening and before loading (e.g. `bpf_map__set_value_size`), which results in pointers into freed memory and undefined behavior. The simplest solution is to simply `strdup` these strings, similar to the `essent_name`, and free them at the same time. In order to test this path, the `global_map_resize` BPF selftest is modified slightly to ensure the presence of an extern, which causes this test to fail prior to the fix. Given there isn't an obvious API or error to test against, I opted to add this to the existing test as an aspect of the resizing feature rather than duplicate the test. Fixes: 9d0a233 ("libbpf: Add capability for resizing datasec maps") Signed-off-by: Adin Scannell <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
…free The below commit that updated BPF_MAP_TYPE_LRU_HASH free target, also updated tools/testing/selftests/bpf/test_lru_map to match. But that missed one case that passes with 4 cores, but fails at higher cpu counts. Update test_lru_sanity3 to also adjust its expectation of target_free. This time tested with 1, 4, 16, 64 and 384 cpu count. Fixes: d4adf1c ("bpf: Adjust free target to avoid global starvation of LRU map") Signed-off-by: Willem de Bruijn <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
…it/viro/vfs Pull mount fixes from Al Viro: "Several mount-related fixes" * tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: userns and mnt_idmap leak in open_tree_attr(2) attach_recursive_mnt(): do not lock the covering tree when sliding something under it replace collect_mounts()/drop_collected_mounts() with a safer variant
…t/bpf/bpf Pull bpf fixes from Alexei Starovoitov: - Fix use-after-free in libbpf when map is resized (Adin Scannell) - Fix verifier assumptions about 2nd argument of bpf_sysctl_get_name (Jerome Marchand) - Fix verifier assumption of nullness of d_inode in dentry (Song Liu) - Fix global starvation of LRU map (Willem de Bruijn) - Fix potential NULL dereference in btf_dump__free (Yuan Chen) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests/bpf: adapt one more case in test_lru_map to the new target_free libbpf: Fix possible use-after-free for externs selftests/bpf: Convert test_sysctl to prog_tests bpf: Specify access type of bpf_sysctl_get_name args libbpf: Fix null pointer dereference in btf_dump__free on allocation failure bpf: Adjust free target to avoid global starvation of LRU map bpf: Mark dentry->d_inode as trusted_or_null
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.1)
Can you help keep this open source service alive? 💖 Please sponsor : )