cmd/snap-update-ns/change.go: sort needed, desired and not reused mount entries #10676

anonymouse64 · 2021-08-25T03:06:09Z

Sort new mount entries by their mimic creation directories, such that the mimic
directories that end up being created are done so in lexographical order.

Also remove the experimental option which was actually causing crashes in snaps
like firefox when their content snaps were updated. See commit message for full
explanation.

codecov-commenter · 2021-08-25T03:25:22Z

Codecov Report

Merging #10676 (4720ec0) into master (6514a34) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master   #10676      +/-   ##
==========================================
+ Coverage   78.26%   78.27%   +0.01%     
==========================================
  Files         936      937       +1     
  Lines      108784   109160     +376     
==========================================
+ Hits        85136    85446     +310     
- Misses      18356    18399      +43     
- Partials     5292     5315      +23

Flag	Coverage Δ
unittests	`78.27% <100.00%> (+0.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
cmd/snap-update-ns/bootstrap.go	`87.50% <ø> (ø)`
cmd/snap-update-ns/change.go	`94.63% <100.00%> (+3.08%)`	⬆️
boot/reboot.go	`65.21% <0.00%> (-4.56%)`	⬇️
osutil/synctree.go	`76.41% <0.00%> (-2.84%)`	⬇️
snap/quota/quota.go	`94.78% <0.00%> (-2.05%)`	⬇️
gadget/gadget.go	`88.10% <0.00%> (-0.85%)`	⬇️
cmd/snap/cmd_debug_state.go	`70.18% <0.00%> (-0.46%)`	⬇️
cmd/snap-bootstrap/cmd_initramfs_mounts.go	`75.12% <0.00%> (-0.22%)`	⬇️
overlord/state/change.go	`95.80% <0.00%> (-0.03%)`	⬇️
... and 8 more

📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

anonymouse64 · 2021-08-25T10:47:48Z

🎉 spread does seem to be happy here at least

mardy

A few suggestions. Keep in mind that I don't know what a "mimic" is, so take my review for what it is :-)

tests/regression/lp-1844496/task.yaml

cmd/snap-update-ns/change.go

Signed-off-by: Ian Johnson <[email protected]>

…nt entries Sort new mount entries by their mimic creation directories, such that the mimic directories that end up being created are done so in lexographical order. Also update a single unit test where there were multiple mimic directories being created because now all mount entries that create mimic directories are performed first. Signed-off-by: Ian Johnson <[email protected]>

This experimental flag is not necessary anymore, and in fact is actively harmful in that it is causing snaps to crash when they are running and an update happens either to snapd or to their content snap dependencies and we end up completely discarding the per-snap namespace, which leads to some destructive effects inside the "sort of inheriting" per-user namespaces, that then later do not get undone and thus recreated in the per-user namespace as those namespaces aren't properly setup to inherit the constructive updates. Signed-off-by: Ian Johnson <[email protected]>

cmd/snap-update-ns/bootstrap.go

cmd/snap-update-ns/change.go

It's not used anymore, so we can just delete this code wholesale. Also undo a typo fix, "s" is the British spelling so this can be left as-is. Thanks to Alberto for spotting that this was leftover. Signed-off-by: Ian Johnson <[email protected]>

Signed-off-by: Ian Johnson <[email protected]>

anonymouse64 · 2022-03-01T03:20:58Z

Seems there are real regressions with this PR in tests/main/parallel-install-interfaces-content and tests/main/parallel-install-layout but I haven't had time to look deeply at those yet

mardy · 2022-03-01T08:55:43Z

Seems there are real regressions with this PR in tests/main/parallel-install-interfaces-content and tests/main/parallel-install-layout but I haven't had time to look deeply at those yet

I'm investigating the failure on tests/main/parallel-install-layout. The failing line in the test is this:

$name.sh -c 'cat $SNAP_COMMON/etc/demo/writable' | MATCH "$name foo-1"

on the second iteration of the for loop, that is when $name is test-snapd-layout_foo. The /run/snapd/ns/snap.test-snapd-layout_foo.fstab file contains the line

/var/snap/test-snapd-layout/common/etc/demo /etc/demo none rbind,rw,x-snapd.origin=layout 0 0

and indeed even for this snap the SNAP_COMMON variable expands to the first snap's directory (is this correct??):

qemu:ubuntu-20.04-64 .../tests/main/parallel-install-layout# test-snapd-layout_foo.sh -c 'echo $SNAP_COMMON'
/var/snap/test-snapd-layout/common

The source file exists and has the right contents:

qemu:ubuntu-20.04-64 .../tests/main/parallel-install-layout# cat /var/snap/test-snapd-layout/common/etc/demo/writable 
test-snapd-layout_foo foo-1

The directory /etc/demo seems to be there:

qemu:ubuntu-20.04-64 .../tests/main/parallel-install-layout# test-snapd-layout_foo.sh -c 'ls -l /etc/demo'
total 4
-rw-r--r-- 1 root root 28 Mar  1 07:28 writable

and the file has the expected contents:

qemu:ubuntu-20.04-64 .../tests/main/parallel-install-layout# test-snapd-layout_foo.sh -c 'cat /etc/demo/writable'
test-snapd-layout_foo foo-1

Entering the snap namespace (nsenter --mount=/run/snapd/ns/test-snapd-layout_foo.mnt) shows that the mount exists:

qemu:ubuntu-20.04-64 /# cat /proc/self/mountinfo | grep etc.demo
1535 1539 8:1 /var/snap/test-snapd-layout/common/etc/demo.conf /etc/demo.conf rw,relatime master:1 - ext4 /dev/sda1 rw
1536 1539 8:1 /var/snap/test-snapd-layout/common/etc/demo /etc/demo rw,relatime master:1 - ext4 /dev/sda1 rw

But for some reason, the SNAP_COMMON directory is empty (I'm still running this from the snap namespace):

qemu:ubuntu-20.04-64 /# ls -l /var/snap/test-snapd-layout/common/
total 0

So maybe there is something wrong with the /var/snap mount? By the way, looking at the full /proc/self/mountinfo from inside the snap namespace we can see that the /var/snap /var/lib/snapd/hostfs/var/snap mount occurs twice. Could this be problematic?

Update after more debugging

The problem persists even when calling snap-discard-ns on test-snapd-layout_foo. I suspect that the error is due to the lst line of the mountinfo file linked above, which says:

1825 1437 8:1 /var/snap/test-snapd-layout_foo /var/snap/test-snapd-layout rw,relatime master:1 - ext4 /dev/sda1 rw

because, on the host, /var/snap/test-snapd-layout is the location where the common/etc/demo directory is, and not /var/snap/test-snapd-layout_foo. Indeed, this matches the "desired" fstab file:

qemu:ubuntu-20.04-64 .../tests/main/parallel-install-layout# cat /var/lib/snapd/mount/snap.test-snapd-layout_foo.fstab 
/snap/test-snapd-layout_foo /snap/test-snapd-layout none rbind,x-snapd.origin=overname 0 0
/var/snap/test-snapd-layout_foo /var/snap/test-snapd-layout none rbind,x-snapd.origin=overname 0 0
...

It might be an issue with the ordering: in our intentions the mount of /var/snap/test-snapd-layout_foo on /var/snap/test-snapd-layout should probably executed first, but as it turns out, it's the last one happening.

anonymouse64 · 2022-03-02T00:09:35Z

and indeed even for this snap the SNAP_COMMON variable expands to the first snap's directory (is this correct??):

Yes it is expected, inside a parallel snap instance, the snap does not get different directories inside the mount namespace for various reasons, mainly for compatibility to handle snaps hard-coding the value of SNAP_COMMON etc to use the SNAP_NAME rather than SNAP_INSTANCE_NAME.

This issue smells somewhat like the issue that was fixed in #9751, and in fact that specific regression test from that PR is now failing. The unit tests about that situation however are not failing which is a bit puzzling...

I'm not 100% certain and need to keep looking at this but I think the issue here is that I think my changes need to be modified to handle the specific mount namespace setup that happens for parallel installs first (see AddOvername() in interfaces/apparmor/spec.go), because actually I think we need to perform those changes first, then do everything else on top of those changes for them to work, however I am not yet convinced that if we do that we couldn't still introduce the sort of un-undoable change that this PR is meant to fully eliminate.

mardy · 2022-03-02T08:35:44Z

I'm not 100% certain and need to keep looking at this but I think the issue here is that I think my changes need to be modified to handle the specific mount namespace setup that happens for parallel installs first (see AddOvername() in interfaces/apparmor/spec.go), because actually I think we need to perform those changes first, then do everything else on top of those changes for them to work [...]

Thanks Ian, I pushed one commit to this branch which does what you suggested, and those two spread tests are now passing. Now let's wait and see about the rest of the spread tests... :-)

anonymouse64 · 2022-03-02T15:31:40Z

cmd/snap-update-ns/change_test.go

 	changes := update.NeededChanges(&osutil.MountProfile{}, desired)
 	c.Assert(changes, DeepEquals, []*update.Change{
 		{Entry: osutil.MountEntry{Dir: "/foo/bar", Name: "/foo/bar_bar", Options: []string{osutil.XSnapdOriginOvername()}}, Action: update.Mount},
+		{Entry: osutil.MountEntry{Dir: "/snap/foo", Name: "/snap/foo_bar", Options: []string{osutil.XSnapdOriginOvername()}}, Action: update.Mount},


ah so the reason the unit tests were passing was because I "fixed" them 🤦

anonymouse64 · 2022-03-02T15:37:50Z

tests/main/mount-ns/google.ubuntu-18.04-64/HOST.expected.txt

 +0:+1 / /proc/sys/fs/binfmt_misc rw,relatime shared:+1 - autofs systemd-1 rw,fd=0,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=0
 +0:+1 / /run rw,nosuid,noexec,relatime shared:+1 - tmpfs tmpfs rw,size=VARIABLE,mode=755
 +0:+1 / /run/lock rw,nosuid,nodev,noexec,relatime shared:+1 - tmpfs tmpfs rw,size=VARIABLE
+0:+1 / /run/qemu rw,nosuid,nodev,relatime shared:+1 - tmpfs none rw,mode=755


ideally we would make these changes separate from this PR, since they are large and not actually related to the other changes here and may distract/confuse about the changes in this PR

But then we'd break the master branch, unless I'm missing something. Or is this test currently failing in master too?

yes, the mount-ns:inherit (or maybe the reboot variant) test is currently failing on master a lot

anonymouse64 · 2022-03-04T02:54:49Z

cmd/snap-update-ns/change.go

 				// in umount(2).
 				err = sysMount("none", c.Entry.Dir, "", syscall.MS_REC|syscall.MS_PRIVATE, "")
 				logger.Debugf("mount --make-rprivate %q (error: %v)", c.Entry.Dir, err)
+				err = clearMissingMountError(err)


To be clear, this change makes sense to me and the unit tests seem happy now

When unmounting, we can get the EINVAL error if the given mount point does not exist. Previously, this code was handling this fine for the umount() syscall, but we do also need the same logic when attempting to remount a mount as private.

When supporting appstream-metadata interface, snap-update-ns will mount directories labeled as usr_t (eg. /usr/share/metainfo, /usr/share/appdata) and fwdupd_cache_t (eg. /var/cache/app-info). Signed-off-by: Maciej Borzecki <[email protected]>

…ts too Test that with parallel installs and layouts which trigger mounts on top of $SNAP/... (which itself will be an overname mount in a parallel install snap) still work and we can still refresh such mount setups. This is successful because we always handle overname mounts first when creating the mounts and any such mounts underneath the overname are then ordered properly. Signed-off-by: Ian Johnson <[email protected]>

anonymouse64 · 2022-03-10T03:24:18Z

I finally got around to testing the situation I was worried about with parallel instances and can't reproduce any issue, so I extended the spread test here to test that situation (both with the miscompatible layouts underneath and overname and "above" an overname mount).

We could still use more unit tests here, I noticed that almost all of the unit tests for NeededChanges use non-existent files so the checks to see if we need to create the mimic in the new sorting code here isn't really being exercised, so we need new unit tests which mock files so that they can be skipped if the mimics don't need to be created, etc. I didn't get to that tonight unfortunately

With commit df6bbd5 (cmd/snap-update-ns/change.go: stop using experimental flag) a bunch of tests which were nearly identical save for the fact that they were exercising different implementations of the NeededChanges() function, have become exact duplicates, since now there's only one implementation. So, let's keep only one copy of them.

pedronis

did a pass, thank you, some comments and questions

cmd/snap-update-ns/change.go

pedronis · 2022-03-14T13:20:18Z

cmd/snap-update-ns/change.go

 			if kind == "" && (err == syscall.ENOTEMPTY || err == syscall.EEXIST) {
 				return nil
 			}
-			if features.RobustMountNamespaceUpdates.IsEnabled() {


should we remove the feature fully in a follow up?

Yes I think we could, not sure what effects that has for systems where the feature is already set?

We need to double check, we didn't need to for any of them so far. Anyway as I said, follow up material.

tests/regression/lp-1844496/task.yaml

Thanks to Samuele for pointing out the inconsistency in the comment here. Signed-off-by: Ian Johnson <[email protected]>

…x crash This ensures that files which are shared via mounts in the MountConnectedPlug method in an interface like the desktop interface remain shared in the per-user mount namespace when the content snap is refreshed (not the main snap itself even). We don't expect this situation to happen much when refresh app awareness is fully enabled by default, but it is still important to test that the snap-update-ns isn't horribly breaking apps when refreshes happen to take place when apps are still running (this could be the case for desktop systems which have a running app for more than 14 days for example). Signed-off-by: Ian Johnson <[email protected]>

Signed-off-by: Ian Johnson <[email protected]>

anonymouse64 · 2022-03-18T14:25:43Z

Just a reminder that since this will go into 2.55 we should squash merge this

pedronis

couple of comments about the spread test

tests/main/mounts-persist-refresh-content-snap/task.yaml

…on test To actually reproduce the crash, we need to use layouts with sources from the files that the content interface is sharing with the snap. Additionally, create the fonts dir and restart snapd before installing the snap, actually exit 1 if the process died and kill the parent process last with the other child processes in the restore section, and fix the shellcheck issue. Signed-off-by: Ian Johnson <[email protected]>

anonymouse64 · 2022-03-18T22:51:55Z

Alright the spread test is fully working now, you can see it "properly failing" in the other PR: #11530

The rootfs is read-only and can't have the fonts directory created there. Signed-off-by: Ian Johnson <[email protected]>

…case It works much better to have the loop just exit itself and then kill the process too just in case. Finally, limit to 10 minutes in case we do get something wrong so we don't waste too much time waiting for processes to exit. Signed-off-by: Ian Johnson <[email protected]>

mardy

As far as Ian's and Maciek's changes are concerned, they LGTM!

tests/main/mounts-persist-refresh-content-snap/task.yaml

anonymouse64 · 2022-03-21T07:10:09Z

The spread test needs to have the YAML reordered apparently:

Checking tests formatting
Format checks failed for task ./tests/main/mounts-persist-refresh-content-snap/task.yaml
 - Keys 'kill-timeout' and 'systems' do not follow the desired order: ['summary', 'details', 'backends', 'systems', 'manual', 'priority', 'warn-timeout', 'kill-timeout', 'environment', 'prepare', 'restore', 'debug', 'execute']

Anybody mind doing this for me? Thanks

The `-p` option to `ps` was missing, and we can just use `wait` for checking process termination.

mardy · 2022-03-21T07:37:46Z

The spread test needs to have the YAML reordered apparently:
[...]
Anybody mind doing this for me? Thanks

Done!

…tead On armhf and arm64 LP builders, /opt/runtime exists and is a symlink, so the new code added in canonical#10676 would detect this and thus generate a different mount operation order. Using a name like foo-runtime should shield us from such issues going forward. Signed-off-by: Ian Johnson <[email protected]>

…tests cmd/snap-update-ns/change_test.go: use non-exist name foo-runtime instead On armhf and arm64 LP builders, /opt/runtime exists and is a symlink, so the new code added in #10676 would detect this and thus generate a different mount operation order. Using a name like foo-runtime should shield us from such issues going forward.

mardy reviewed Aug 25, 2021

View reviewed changes

anonymouse64 added 2 commits February 23, 2022 07:10

cmd/snap-update-ns/bootstrap.go: fix comment typos

1c7f2a1

Signed-off-by: Ian Johnson <[email protected]>

anonymouse64 force-pushed the feature/mount-ns-refactor-proto-wip branch from 0429a88 to 1e47eac Compare February 23, 2022 13:37

anonymouse64 force-pushed the feature/mount-ns-refactor-proto-wip branch from 1e47eac to df6bbd5 Compare February 24, 2022 01:24

mardy reviewed Feb 25, 2022

View reviewed changes

cmd/snap-update-ns/bootstrap.go Outdated Show resolved Hide resolved

cmd/snap-update-ns/change.go Outdated Show resolved Hide resolved

anonymouse64 added 2 commits February 28, 2022 16:23

Merge branch 'master' into feature/mount-ns-refactor-proto-wip

5dba309

Signed-off-by: Ian Johnson <[email protected]>

cmd/snap-update-ns: unconditionally perform the "overname" mounts first

2225c1a

mardy force-pushed the feature/mount-ns-refactor-proto-wip branch from 3f33fdc to 2225c1a Compare March 2, 2022 09:50

anonymouse64 commented Mar 2, 2022

View reviewed changes

anonymouse64 commented Mar 4, 2022

View reviewed changes

mardy force-pushed the feature/mount-ns-refactor-proto-wip branch from 699445f to 2bb982d Compare March 4, 2022 13:34

anonymouse64 marked this pull request as ready for review March 4, 2022 13:56

Merge branch 'master' into feature/mount-ns-refactor-proto-wip

9a518a1

mardy mentioned this pull request Mar 7, 2022

snap-confine: ground work for homedirs support #11367

Merged

bboozzoo self-requested a review March 9, 2022 13:20

mvo5 added this to the 2.55 milestone Mar 11, 2022

pedronis self-requested a review March 11, 2022 14:51

pedronis reviewed Mar 14, 2022

View reviewed changes

mardy and others added 5 commits March 15, 2022 15:46

cmd/snap-update-ns: address review comments

6e7ede5

cmd/snap-update-ns: update doc-comments, eliminate code duplication

c8fab16

Thanks to Samuele for pointing out the inconsistency in the comment here. Signed-off-by: Ian Johnson <[email protected]>

Merge branch 'master' into feature/mount-ns-refactor-proto-wip

7e22709

Signed-off-by: Ian Johnson <[email protected]>

tests/main/mounts-persist-refresh-content-snap: fix newline at EOF

e078cfd

Signed-off-by: Ian Johnson <[email protected]>

anonymouse64 mentioned this pull request Mar 18, 2022

tests/mounts-persist-refresh-content-snap: regression test for firefox crash #11530

Closed

anonymouse64 requested a review from pedronis March 18, 2022 03:32

anonymouse64 added the Squash-merge Please squash this PR when merging. label Mar 18, 2022

pedronis approved these changes Mar 18, 2022

View reviewed changes

tests/main/mounts-persist-refresh-content-snap/task.yaml Outdated Show resolved Hide resolved

tests/main/mounts-persist-refresh-content-snap/task.yaml Outdated Show resolved Hide resolved

anonymouse64 requested a review from mardy March 18, 2022 22:51

anonymouse64 added 2 commits March 20, 2022 17:00

tests/main/mounts-persist-refresh-content-snap: don't run on core

6f039c7

The rootfs is read-only and can't have the fonts directory created there. Signed-off-by: Ian Johnson <[email protected]>

mardy approved these changes Mar 21, 2022

View reviewed changes

tests/main/mounts-persist-refresh-content-snap/task.yaml Outdated Show resolved Hide resolved

mardy added 2 commits March 21, 2022 10:13

tests: improve process handling

040b60e

The `-p` option to `ps` was missing, and we can just use `wait` for checking process termination.

tests: sort yaml keys according to predefined order

4720ec0

mvo5 merged commit a6ffa57 into canonical:master Mar 21, 2022

anonymouse64 deleted the feature/mount-ns-refactor-proto-wip branch March 21, 2022 14:46

anonymouse64 mentioned this pull request Mar 22, 2022

cmd/snap-update-ns/change_test.go: use non-exist name foo-runtime instead #11552

Merged

ZeyadYasser mentioned this pull request Jun 24, 2024

many: remove robust-mount-namespace-updates experimental feature #14117

Merged

ernestl mentioned this pull request Feb 13, 2025

release: 2.68 #15069

Merged

cmd/snap-update-ns/change.go: sort needed, desired and not reused mount entries #10676

cmd/snap-update-ns/change.go: sort needed, desired and not reused mount entries #10676

Uh oh!

Conversation

anonymouse64 commented Aug 25, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov-commenter commented Aug 25, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

anonymouse64 commented Aug 25, 2021

Uh oh!

mardy left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

anonymouse64 commented Mar 1, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mardy commented Mar 1, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Update after more debugging

Uh oh!

anonymouse64 commented Mar 2, 2022

Uh oh!

mardy commented Mar 2, 2022

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

anonymouse64 commented Mar 10, 2022

Uh oh!

pedronis left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

anonymouse64 commented Mar 18, 2022

Uh oh!

pedronis left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

anonymouse64 commented Mar 18, 2022

Uh oh!

mardy left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

anonymouse64 commented Mar 21, 2022

Uh oh!

mardy commented Mar 21, 2022

Uh oh!

Uh oh!

anonymouse64 commented Aug 25, 2021 •

edited

Loading

codecov-commenter commented Aug 25, 2021 •

edited

Loading

anonymouse64 commented Mar 1, 2022 •

edited

Loading

mardy commented Mar 1, 2022 •

edited

Loading