cmd/snap-update-ns/change.go: sort needed, desired and not reused mount entries #10676
Codecov Report
@@ Coverage Diff @@
## master #10676 +/- ##
==========================================
+ Coverage 78.26% 78.27% +0.01%
==========================================
Files 936 937 +1
Lines 108784 109160 +376
==========================================
+ Hits 85136 85446 +310
- Misses 18356 18399 +43
- Partials 5292 5315 +23
🎉 spread does seem to be happy here at least
A few suggestions. Keep in mind that I don't know what a "mimic" is, so take my review for what it is :-)
Signed-off-by: Ian Johnson <[email protected]>
…nt entries Sort new mount entries by their mimic creation directories, such that the mimic directories that end up being created are created in lexicographical order. Also update a single unit test where multiple mimic directories were being created, because now all mount entries that create mimic directories are performed first. Signed-off-by: Ian Johnson <[email protected]>
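To illustrate the ordering this commit message describes, here is a minimal sketch, not the actual change.go code. It assumes a simplified, hypothetical `mountEntry` type in place of `osutil.MountEntry`, and it uses the mount target directory as a stand-in for the mimic creation directory.

```go
package main

import (
	"fmt"
	"sort"
)

// mountEntry is a hypothetical, simplified stand-in for osutil.MountEntry.
type mountEntry struct {
	Name string // mount source
	Dir  string // mount target
}

// sortByDir returns a copy of entries ordered lexicographically by Dir.
// Assumption: Dir approximates the directory where a mimic would be created.
func sortByDir(entries []mountEntry) []mountEntry {
	sorted := make([]mountEntry, len(entries))
	copy(sorted, entries)
	// A stable sort keeps the original relative order of entries
	// that share the same target directory.
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Dir < sorted[j].Dir
	})
	return sorted
}

func main() {
	entries := []mountEntry{
		{Name: "/snap/foo/x", Dir: "/usr/share/b"},
		{Name: "/snap/foo/y", Dir: "/opt/foo-runtime"},
		{Name: "/snap/foo/z", Dir: "/usr/share/a"},
	}
	for _, e := range sortByDir(entries) {
		fmt.Println(e.Dir)
	}
}
```

Sorting a copy rather than the input keeps the caller's desired profile untouched, which makes the resulting order easy to compare in a unit test.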
0429a88
to
1e47eac
Compare
This experimental flag is no longer necessary, and is in fact actively harmful: it causes running snaps to crash when an update happens either to snapd or to their content snap dependencies. In that case we end up completely discarding the per-snap namespace, which has destructive effects inside the "sort of inheriting" per-user namespaces. Those effects are never undone afterwards, because the per-user namespaces aren't properly set up to inherit the constructive updates that would recreate the mounts. Signed-off-by: Ian Johnson <[email protected]>
1e47eac
to
df6bbd5
Compare
It's not used anymore, so we can just delete this code wholesale. Also undo a typo fix, "s" is the British spelling so this can be left as-is. Thanks to Alberto for spotting that this was leftover. Signed-off-by: Ian Johnson <[email protected]>
Signed-off-by: Ian Johnson <[email protected]>
Seems there are real regressions with this PR in tests/main/parallel-install-interfaces-content and tests/main/parallel-install-layout, but I haven't had time to look deeply at those yet.
I'm investigating the failure on
on the second iteration of the for loop, that is when
and indeed even for this snap the
The source file exists and has the right contents:
The directory
and the file has the expected contents:
Entering the snap namespace (
But for some reason, the SNAP_COMMON directory is empty (I'm still running this from the snap namespace):
So maybe there is something wrong with the
Update after more debugging
The problem persists even when calling snap-discard-ns on
because, on the host,
It might be an issue with the ordering: in our intentions the mount of
Yes, it is expected: inside a parallel snap instance, the snap does not get different directories inside the mount namespace for various reasons, mainly for compatibility with snaps that hard-code the value of SNAP_COMMON etc. using SNAP_NAME rather than SNAP_INSTANCE_NAME. This issue smells somewhat like the one that was fixed in #9751, and in fact the regression test from that PR is now failing. The unit tests for that situation are not failing, however, which is a bit puzzling... I'm not 100% certain and need to keep looking at this, but I think my changes need to be modified to handle the specific mount namespace setup that happens for parallel installs first (see AddOvername() in interfaces/apparmor/spec.go): we need to perform those changes first, then do everything else on top of them. However, I am not yet convinced that doing so couldn't still introduce the sort of un-undoable change that this PR is meant to fully eliminate.
Thanks Ian, I pushed one commit to this branch which does what you suggested, and those two spread tests are now passing. Now let's wait and see about the rest of the spread tests... :-)
3f33fdc
to
2225c1a
Compare
changes := update.NeededChanges(&osutil.MountProfile{}, desired)
c.Assert(changes, DeepEquals, []*update.Change{
	{Entry: osutil.MountEntry{Dir: "/foo/bar", Name: "/foo/bar_bar", Options: []string{osutil.XSnapdOriginOvername()}}, Action: update.Mount},
	{Entry: osutil.MountEntry{Dir: "/snap/foo", Name: "/snap/foo_bar", Options: []string{osutil.XSnapdOriginOvername()}}, Action: update.Mount},
ah so the reason the unit tests were passing was because I "fixed" them 🤦
+0:+1 / /proc/sys/fs/binfmt_misc rw,relatime shared:+1 - autofs systemd-1 rw,fd=0,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=0
+0:+1 / /run rw,nosuid,noexec,relatime shared:+1 - tmpfs tmpfs rw,size=VARIABLE,mode=755
+0:+1 / /run/lock rw,nosuid,nodev,noexec,relatime shared:+1 - tmpfs tmpfs rw,size=VARIABLE
+0:+1 / /run/qemu rw,nosuid,nodev,relatime shared:+1 - tmpfs none rw,mode=755
ideally we would make these changes separate from this PR, since they are large and not actually related to the other changes here and may distract/confuse about the changes in this PR
But then we'd break the master branch, unless I'm missing something. Or is this test currently failing in master too?
yes, the mount-ns:inherit (or maybe the reboot variant) test is currently failing on master a lot
// in umount(2).
err = sysMount("none", c.Entry.Dir, "", syscall.MS_REC|syscall.MS_PRIVATE, "")
logger.Debugf("mount --make-rprivate %q (error: %v)", c.Entry.Dir, err)
err = clearMissingMountError(err)
To be clear, this change makes sense to me and the unit tests seem happy now
When unmounting, we can get the EINVAL error if the given mount point does not exist. Previously, this code was handling this fine for the umount() syscall, but we do also need the same logic when attempting to remount a mount as private.
699445f
to
2bb982d
Compare
When supporting appstream-metadata interface, snap-update-ns will mount directories labeled as usr_t (eg. /usr/share/metainfo, /usr/share/appdata) and fwupd_cache_t (eg. /var/cache/app-info). Signed-off-by: Maciej Borzecki <[email protected]>
…ts too Test that with parallel installs and layouts which trigger mounts on top of $SNAP/... (which itself will be an overname mount in a parallel install snap) still work and we can still refresh such mount setups. This is successful because we always handle overname mounts first when creating the mounts and any such mounts underneath the overname are then ordered properly. Signed-off-by: Ian Johnson <[email protected]>
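The "overname mounts first" ordering mentioned in this commit message could look roughly like the sketch below. The `mountEntry` type and its `Overname` field are hypothetical simplifications: in snapd the overname origin is carried as a mount option (see `osutil.XSnapdOriginOvername()` in the unit test quoted earlier), and the real sorting code may use additional criteria.

```go
package main

import (
	"fmt"
	"sort"
)

// mountEntry is a hypothetical, simplified mount entry.
type mountEntry struct {
	Dir      string // mount target
	Overname bool   // assumed flag: entry originates from an overname mapping
}

// orderEntries returns a copy with all overname entries first, then the
// remaining entries, each group ordered lexicographically by Dir.
func orderEntries(entries []mountEntry) []mountEntry {
	ordered := make([]mountEntry, len(entries))
	copy(ordered, entries)
	sort.SliceStable(ordered, func(i, j int) bool {
		a, b := ordered[i], ordered[j]
		if a.Overname != b.Overname {
			// Overname entries sort before everything else.
			return a.Overname
		}
		return a.Dir < b.Dir
	})
	return ordered
}

func main() {
	entries := []mountEntry{
		{Dir: "/snap/foo/usr/share/fonts"},
		{Dir: "/snap/foo", Overname: true},
		{Dir: "/opt/foo-runtime"},
	}
	for _, e := range orderEntries(entries) {
		fmt.Println(e.Dir, e.Overname)
	}
}
```

With this ordering, a layout mounted on a path below `/snap/foo` is only set up after the overname mount that provides `/snap/foo` in a parallel install.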
I finally got around to testing the situation I was worried about with parallel instances and can't reproduce any issue, so I extended the spread test here to cover that situation (with the incompatible layouts both underneath an overname mount and "above" one). We could still use more unit tests here: I noticed that almost all of the unit tests for NeededChanges use non-existent files, so the checks in the new sorting code for whether we need to create the mimic aren't really being exercised. We need new unit tests which mock files so that mimic creation can be skipped when the mimics aren't needed, etc. I didn't get to that tonight, unfortunately.
With commit df6bbd5 (cmd/snap-update-ns/change.go: stop using experimental flag) a bunch of tests which were nearly identical save for the fact that they were exercising different implementations of the NeededChanges() function, have become exact duplicates, since now there's only one implementation. So, let's keep only one copy of them.
did a pass, thank you, some comments and questions
if kind == "" && (err == syscall.ENOTEMPTY || err == syscall.EEXIST) {
	return nil
}
if features.RobustMountNamespaceUpdates.IsEnabled() {
should we remove the feature fully in a follow up?
Yes I think we could, not sure what effects that has for systems where the feature is already set?
We need to double check, we didn't need to for any of them so far. Anyway as I said, follow up material.
Thanks to Samuele for pointing out the inconsistency in the comment here. Signed-off-by: Ian Johnson <[email protected]>
…x crash This ensures that files which are shared via mounts in the MountConnectedPlug method in an interface like the desktop interface remain shared in the per-user mount namespace when the content snap is refreshed (not the main snap itself even). We don't expect this situation to happen much when refresh app awareness is fully enabled by default, but it is still important to test that the snap-update-ns isn't horribly breaking apps when refreshes happen to take place when apps are still running (this could be the case for desktop systems which have a running app for more than 14 days for example). Signed-off-by: Ian Johnson <[email protected]>
Signed-off-by: Ian Johnson <[email protected]>
Signed-off-by: Ian Johnson <[email protected]>
Just a reminder that since this will go into 2.55 we should squash merge this.
couple of comments about the spread test
…on test To actually reproduce the crash, we need to use layouts with sources from the files that the content interface is sharing with the snap. Additionally, create the fonts dir and restart snapd before installing the snap, actually exit 1 if the process died and kill the parent process last with the other child processes in the restore section, and fix the shellcheck issue. Signed-off-by: Ian Johnson <[email protected]>
Alright, the spread test is fully working now; you can see it "properly failing" in the other PR: #11530
The rootfs is read-only and can't have the fonts directory created there. Signed-off-by: Ian Johnson <[email protected]>
…case It works much better to have the loop just exit itself and then kill the process too just in case. Finally, limit to 10 minutes in case we do get something wrong so we don't waste too much time waiting for processes to exit. Signed-off-by: Ian Johnson <[email protected]>
As far as Ian's and Maciek's changes are concerned, they LGTM!
The spread test needs to have the YAML reordered apparently:
Anybody mind doing this for me? Thanks
The `-p` option to `ps` was missing, and we can just use `wait` for checking process termination.
Done!
…tead On armhf and arm64 LP builders, /opt/runtime exists and is a symlink, so the new code added in canonical#10676 would detect this and thus generate a different mount operation order. Using a name like foo-runtime should shield us from such issues going forward. Signed-off-by: Ian Johnson <[email protected]>
…tests cmd/snap-update-ns/change_test.go: use non-exist name foo-runtime instead On armhf and arm64 LP builders, /opt/runtime exists and is a symlink, so the new code added in #10676 would detect this and thus generate a different mount operation order. Using a name like foo-runtime should shield us from such issues going forward.
Sort new mount entries by their mimic creation directories, such that the mimic
directories that end up being created are created in lexicographical order.
Also remove the experimental option which was actually causing crashes in snaps
like firefox when their content snaps were updated. See commit message for full
explanation.