Skip to content

Conversation

anonymouse64
Copy link
Contributor

@anonymouse64 anonymouse64 commented Aug 25, 2021

Sort new mount entries by their mimic creation directories, such that the mimic
directories that end up being created are done so in lexographical order.

Also remove the experimental option which was actually causing crashes in snaps
like firefox when their content snaps were updated. See commit message for full
explanation.

@codecov-commenter
Copy link

codecov-commenter commented Aug 25, 2021

Codecov Report

Merging #10676 (4720ec0) into master (6514a34) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master   #10676      +/-   ##
==========================================
+ Coverage   78.26%   78.27%   +0.01%     
==========================================
  Files         936      937       +1     
  Lines      108784   109160     +376     
==========================================
+ Hits        85136    85446     +310     
- Misses      18356    18399      +43     
- Partials     5292     5315      +23     
Flag Coverage Δ
unittests 78.27% <100.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
cmd/snap-update-ns/bootstrap.go 87.50% <ø> (ø)
cmd/snap-update-ns/change.go 94.63% <100.00%> (+3.08%) ⬆️
boot/reboot.go 65.21% <0.00%> (-4.56%) ⬇️
osutil/synctree.go 76.41% <0.00%> (-2.84%) ⬇️
snap/quota/quota.go 94.78% <0.00%> (-2.05%) ⬇️
gadget/gadget.go 88.10% <0.00%> (-0.85%) ⬇️
cmd/snap/cmd_debug_state.go 70.18% <0.00%> (-0.46%) ⬇️
cmd/snap-bootstrap/cmd_initramfs_mounts.go 75.12% <0.00%> (-0.22%) ⬇️
overlord/state/change.go 95.80% <0.00%> (-0.03%) ⬇️
... and 8 more

📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

@anonymouse64
Copy link
Contributor Author

🎉 spread does seem to be happy here at least

Copy link
Contributor

@mardy mardy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few suggestions. Keep in mind that I don't know what a "mimic" is, so take my review for what it is :-)

…nt entries

Sort new mount entries by their mimic creation directories, such that the mimic
directories that end up being created are done so in lexographical order.

Also update a single unit test where there were multiple mimic directories
being created because now all mount entries that create mimic directories are
performed first.

Signed-off-by: Ian Johnson <[email protected]>
@anonymouse64 anonymouse64 force-pushed the feature/mount-ns-refactor-proto-wip branch from 0429a88 to 1e47eac Compare February 23, 2022 13:37
This experimental flag is not necessary anymore, and in fact is actively
harmful in that it is causing snaps to crash when they are running and an
update happens either to snapd or to their content snap dependencies and we end
up completely discarding the per-snap namespace, which leads to some
destructive effects inside the "sort of inheriting" per-user namespaces, that
then later do not get undone and thus recreated in the per-user namespace as
those namespaces aren't properly setup to inherit the constructive updates.

Signed-off-by: Ian Johnson <[email protected]>
@anonymouse64 anonymouse64 force-pushed the feature/mount-ns-refactor-proto-wip branch from 1e47eac to df6bbd5 Compare February 24, 2022 01:24
It's not used anymore, so we can just delete this code wholesale.

Also undo a typo fix, "s" is the British spelling so this can be left as-is.

Thanks to Alberto for spotting that this was leftover.

Signed-off-by: Ian Johnson <[email protected]>
@anonymouse64
Copy link
Contributor Author

anonymouse64 commented Mar 1, 2022

Seems there are real regressions with this PR in tests/main/parallel-install-interfaces-content and tests/main/parallel-install-layout but I haven't had time to look deeply at those yet

@mardy
Copy link
Contributor

mardy commented Mar 1, 2022

Seems there are real regressions with this PR in tests/main/parallel-install-interfaces-content and tests/main/parallel-install-layout but I haven't had time to look deeply at those yet

I'm investigating the failure on tests/main/parallel-install-layout. The failing line in the test is this:

$name.sh -c 'cat $SNAP_COMMON/etc/demo/writable' | MATCH "$name foo-1"

on the second iteration of the for loop, that is when $name is test-snapd-layout_foo. The /run/snapd/ns/snap.test-snapd-layout_foo.fstab file contains the line

/var/snap/test-snapd-layout/common/etc/demo /etc/demo none rbind,rw,x-snapd.origin=layout 0 0

and indeed even for this snap the SNAP_COMMON variable expands to the first snap's directory (is this correct??):

qemu:ubuntu-20.04-64 .../tests/main/parallel-install-layout# test-snapd-layout_foo.sh -c 'echo $SNAP_COMMON'
/var/snap/test-snapd-layout/common

The source file exists and has the right contents:

qemu:ubuntu-20.04-64 .../tests/main/parallel-install-layout# cat /var/snap/test-snapd-layout/common/etc/demo/writable 
test-snapd-layout_foo foo-1

The directory /etc/demo seems to be there:

qemu:ubuntu-20.04-64 .../tests/main/parallel-install-layout# test-snapd-layout_foo.sh -c 'ls -l /etc/demo'
total 4
-rw-r--r-- 1 root root 28 Mar  1 07:28 writable

and the file has the expected contents:

qemu:ubuntu-20.04-64 .../tests/main/parallel-install-layout# test-snapd-layout_foo.sh -c 'cat /etc/demo/writable'
test-snapd-layout_foo foo-1

Entering the snap namespace (nsenter --mount=/run/snapd/ns/test-snapd-layout_foo.mnt) shows that the mount exists:

qemu:ubuntu-20.04-64 /# cat /proc/self/mountinfo | grep etc.demo
1535 1539 8:1 /var/snap/test-snapd-layout/common/etc/demo.conf /etc/demo.conf rw,relatime master:1 - ext4 /dev/sda1 rw
1536 1539 8:1 /var/snap/test-snapd-layout/common/etc/demo /etc/demo rw,relatime master:1 - ext4 /dev/sda1 rw

But for some reason, the SNAP_COMMON directory is empty (I'm still running this from the snap namespace):

qemu:ubuntu-20.04-64 /# ls -l /var/snap/test-snapd-layout/common/
total 0

So maybe there is something wrong with the /var/snap mount? By the way, looking at the full /proc/self/mountinfo from inside the snap namespace we can see that the /var/snap /var/lib/snapd/hostfs/var/snap mount occurs twice. Could this be problematic?

Update after more debugging

The problem persists even when calling snap-discard-ns on test-snapd-layout_foo. I suspect that the error is due to the lst line of the mountinfo file linked above, which says:

1825 1437 8:1 /var/snap/test-snapd-layout_foo /var/snap/test-snapd-layout rw,relatime master:1 - ext4 /dev/sda1 rw

because, on the host, /var/snap/test-snapd-layout is the location where the common/etc/demo directory is, and not /var/snap/test-snapd-layout_foo. Indeed, this matches the "desired" fstab file:

qemu:ubuntu-20.04-64 .../tests/main/parallel-install-layout# cat /var/lib/snapd/mount/snap.test-snapd-layout_foo.fstab 
/snap/test-snapd-layout_foo /snap/test-snapd-layout none rbind,x-snapd.origin=overname 0 0
/var/snap/test-snapd-layout_foo /var/snap/test-snapd-layout none rbind,x-snapd.origin=overname 0 0
...

It might be an issue with the ordering: in our intentions the mount of /var/snap/test-snapd-layout_foo on /var/snap/test-snapd-layout should probably executed first, but as it turns out, it's the last one happening.

@anonymouse64
Copy link
Contributor Author

and indeed even for this snap the SNAP_COMMON variable expands to the first snap's directory (is this correct??):

Yes it is expected, inside a parallel snap instance, the snap does not get different directories inside the mount namespace for various reasons, mainly for compatibility to handle snaps hard-coding the value of SNAP_COMMON etc to use the SNAP_NAME rather than SNAP_INSTANCE_NAME.

This issue smells somewhat like the issue that was fixed in #9751, and in fact that specific regression test from that PR is now failing. The unit tests about that situation however are not failing which is a bit puzzling...

I'm not 100% certain and need to keep looking at this but I think the issue here is that I think my changes need to be modified to handle the specific mount namespace setup that happens for parallel installs first (see AddOvername() in interfaces/apparmor/spec.go), because actually I think we need to perform those changes first, then do everything else on top of those changes for them to work, however I am not yet convinced that if we do that we couldn't still introduce the sort of un-undoable change that this PR is meant to fully eliminate.

@mardy
Copy link
Contributor

mardy commented Mar 2, 2022

I'm not 100% certain and need to keep looking at this but I think the issue here is that I think my changes need to be modified to handle the specific mount namespace setup that happens for parallel installs first (see AddOvername() in interfaces/apparmor/spec.go), because actually I think we need to perform those changes first, then do everything else on top of those changes for them to work [...]

Thanks Ian, I pushed one commit to this branch which does what you suggested, and those two spread tests are now passing. Now let's wait and see about the rest of the spread tests... :-)

@mardy mardy force-pushed the feature/mount-ns-refactor-proto-wip branch from 3f33fdc to 2225c1a Compare March 2, 2022 09:50
changes := update.NeededChanges(&osutil.MountProfile{}, desired)
c.Assert(changes, DeepEquals, []*update.Change{
{Entry: osutil.MountEntry{Dir: "/foo/bar", Name: "/foo/bar_bar", Options: []string{osutil.XSnapdOriginOvername()}}, Action: update.Mount},
{Entry: osutil.MountEntry{Dir: "/snap/foo", Name: "/snap/foo_bar", Options: []string{osutil.XSnapdOriginOvername()}}, Action: update.Mount},
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah so the reason the unit tests were passing was because I "fixed" them 🤦

+0:+1 / /proc/sys/fs/binfmt_misc rw,relatime shared:+1 - autofs systemd-1 rw,fd=0,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=0
+0:+1 / /run rw,nosuid,noexec,relatime shared:+1 - tmpfs tmpfs rw,size=VARIABLE,mode=755
+0:+1 / /run/lock rw,nosuid,nodev,noexec,relatime shared:+1 - tmpfs tmpfs rw,size=VARIABLE
+0:+1 / /run/qemu rw,nosuid,nodev,relatime shared:+1 - tmpfs none rw,mode=755
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ideally we would make these changes separate from this PR, since they are large and not actually related to the other changes here and may distract/confuse about the changes in this PR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then we'd break the master branch, unless I'm missing something. Or is this test currently failing in master too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, the mount-ns:inherit (or maybe the reboot variant) test is currently failing on master a lot

// in umount(2).
err = sysMount("none", c.Entry.Dir, "", syscall.MS_REC|syscall.MS_PRIVATE, "")
logger.Debugf("mount --make-rprivate %q (error: %v)", c.Entry.Dir, err)
err = clearMissingMountError(err)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, this change makes sense to me and the unit tests seem happy now

When unmounting, we can get the EINVAL error if the given mount point
does not exist. Previously, this code was handling this fine for the
umount() syscall, but we do also need the same logic when attempting to
remount a mount as private.
@mardy mardy force-pushed the feature/mount-ns-refactor-proto-wip branch from 699445f to 2bb982d Compare March 4, 2022 13:34
@anonymouse64 anonymouse64 marked this pull request as ready for review March 4, 2022 13:56
When supporting appstream-metadata interface, snap-update-ns will mount
directories labeled as usr_t (eg. /usr/share/metainfo, /usr/share/appdata) and
fwdupd_cache_t (eg. /var/cache/app-info).

Signed-off-by: Maciej Borzecki <[email protected]>
@bboozzoo bboozzoo self-requested a review March 9, 2022 13:20
…ts too

Test that with parallel installs and layouts which trigger mounts on top of
$SNAP/... (which itself will be an overname mount in a parallel install snap)
still work and we can still refresh such mount setups.

This is successful because we always handle overname mounts first when creating
the mounts and any such mounts underneath the overname are then ordered
properly.

Signed-off-by: Ian Johnson <[email protected]>
@anonymouse64
Copy link
Contributor Author

I finally got around to testing the situation I was worried about with parallel instances and can't reproduce any issue, so I extended the spread test here to test that situation (both with the miscompatible layouts underneath and overname and "above" an overname mount).

We could still use more unit tests here, I noticed that almost all of the unit tests for NeededChanges use non-existent files so the checks to see if we need to create the mimic in the new sorting code here isn't really being exercised, so we need new unit tests which mock files so that they can be skipped if the mimics don't need to be created, etc. I didn't get to that tonight unfortunately

With commit df6bbd5
(cmd/snap-update-ns/change.go: stop using experimental flag) a bunch of
tests which were nearly identical save for the fact that they were
exercising different implementations of the NeededChanges() function,
have become exact duplicates, since now there's only one implementation.

So, let's keep only one copy of them.
@mvo5 mvo5 added this to the 2.55 milestone Mar 11, 2022
@pedronis pedronis self-requested a review March 11, 2022 14:51
Copy link
Collaborator

@pedronis pedronis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did a pass, thank you, some comments and questions

if kind == "" && (err == syscall.ENOTEMPTY || err == syscall.EEXIST) {
return nil
}
if features.RobustMountNamespaceUpdates.IsEnabled() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we remove the feature fully in a follow up?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I think we could, not sure what effects that has for systems where the feature is already set?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to double check, we didn't need to for any of them so far. Anyway as I said, follow up material.

mardy and others added 5 commits March 15, 2022 15:46
Thanks to Samuele for pointing out the inconsistency in the comment here.

Signed-off-by: Ian Johnson <[email protected]>
…x crash

This ensures that files which are shared via mounts in the MountConnectedPlug
method in an interface like the desktop interface remain shared in the per-user
mount namespace when the content snap is refreshed (not the main snap itself
even). We don't expect this situation to happen much when refresh app awareness
is fully enabled by default, but it is still important to test that the
snap-update-ns isn't horribly breaking apps when refreshes happen to take place
when apps are still running (this could be the case for desktop systems which
have a running app for more than 14 days for example).

Signed-off-by: Ian Johnson <[email protected]>
@anonymouse64
Copy link
Contributor Author

Just a reminder that since this will go into 2.55 we should squash merge this

Copy link
Collaborator

@pedronis pedronis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

couple of comments about the spread test

…on test

To actually reproduce the crash, we need to use layouts with sources from the
files that the content interface is sharing with the snap.

Additionally, create the fonts dir and restart snapd before installing the
snap, actually exit 1 if the process died and kill the parent process last with
the other child processes in the restore section, and fix the shellcheck issue.

Signed-off-by: Ian Johnson <[email protected]>
@anonymouse64 anonymouse64 requested a review from mardy March 18, 2022 22:51
@anonymouse64
Copy link
Contributor Author

Alright the spread test is fully working now, you can see it "properly failing" in the other PR: #11530

The rootfs is read-only and can't have the fonts directory created there.

Signed-off-by: Ian Johnson <[email protected]>
…case

It works much better to have the loop just exit itself and then kill the
process too just in case.

Finally, limit to 10 minutes in case we do get something wrong so we don't 
waste too much time waiting for processes to exit.

Signed-off-by: Ian Johnson <[email protected]>
Copy link
Contributor

@mardy mardy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as Ian's and Maciek's changes are concerned, they LGTM!

@anonymouse64
Copy link
Contributor Author

The spread test needs to have the YAML reordered apparently:

Checking tests formatting
Format checks failed for task ./tests/main/mounts-persist-refresh-content-snap/task.yaml
 - Keys 'kill-timeout' and 'systems' do not follow the desired order: ['summary', 'details', 'backends', 'systems', 'manual', 'priority', 'warn-timeout', 'kill-timeout', 'environment', 'prepare', 'restore', 'debug', 'execute']

Anybody mind doing this for me? Thanks

mardy added 2 commits March 21, 2022 10:13
The `-p` option to `ps` was missing, and we can just use `wait` for
checking process termination.
@mardy
Copy link
Contributor

mardy commented Mar 21, 2022

The spread test needs to have the YAML reordered apparently:
[...]
Anybody mind doing this for me? Thanks

Done!

@mvo5 mvo5 merged commit a6ffa57 into canonical:master Mar 21, 2022
@anonymouse64 anonymouse64 deleted the feature/mount-ns-refactor-proto-wip branch March 21, 2022 14:46
anonymouse64 added a commit to anonymouse64/snapd that referenced this pull request Mar 22, 2022
…tead

On armhf and arm64 LP builders, /opt/runtime exists and is a symlink, so the
new code added in canonical#10676 would detect
this and thus generate a different mount operation order.

Using a name like foo-runtime should shield us from such issues going forward.

Signed-off-by: Ian Johnson <[email protected]>
anonymouse64 added a commit that referenced this pull request Mar 22, 2022
…tests

cmd/snap-update-ns/change_test.go: use non-exist name foo-runtime instead

On armhf and arm64 LP builders, /opt/runtime exists and is a symlink, so the
new code added in #10676 would detect
this and thus generate a different mount operation order.

Using a name like foo-runtime should shield us from such issues going forward.
@ernestl ernestl mentioned this pull request Feb 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs Samuele review Needs a review from Samuele before it can land Squash-merge Please squash this PR when merging.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants