-
Notifications
You must be signed in to change notification settings - Fork 25.3k
Ensure config reload on ..data symlink switch for CSI driver support #127628
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
5d4a284
to
26e5d5a
Compare
@@ -498,6 +503,57 @@ public void testSettingsAppliedOnMasterReElection() throws Exception { | |||
assertClusterStateSaveOK(savedClusterState.v1(), savedClusterState.v2(), "43mb"); | |||
} | |||
|
|||
public void testSymlinkUpdateTriggerReload() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test fails on linux without the changes to AbstractFileWatchingService
.
Pinging @elastic/es-core-infra (Team:Core/Infra) |
Hi @jfreden, I've created a changelog YAML for you. |
Just wanted to mention that the |
.collect(Collectors.toSet()); | ||
for (var changedPath : changedPaths) { | ||
// If a symlinked dir changed in the settings dir, it could be linked to other symlinks, so reprocess all files | ||
if (Files.isDirectory(changedPath) && Files.isSymbolicLink(changedPath)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe isDirectory and isSymbolic link need to be extracted to helper methods the way other Files methods are, so that these operations can be granted via entitlements for modules outside server.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've moved the file operations to their own methods. I'm trying to understand why this works with the MultiProjectFileSettingsService
and I think it might be because getDeclaringClass
is used to determine the caller. In this case the caller is MultiProjectFileSettingsService
but since we don't override the methods in FileSettingsService
the calling class becomes FileSettingsService
since that's where the method is declared. FileSettingsService
in turn is part of server, so that's why this works?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without overriding it, the calling class appears to be in the server module, so server policies will apply. Extracting methods and overriding them in the module is the current way to make the call appear as originated from the module, so the module's policy appears.
It could make no difference today e.g. all modules are granted read access to the config directory, but that might change, so better to structure the code so it resolves to the correct module/policy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(we are designing ways to make this easier and more explicit in the future, but atm this is the way with entitlements)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for explaining @ldematte!
For the serverless-multi-project
module, is the entitlement-policy inherited from the extended plugin (core in this case) and is that why it works without an explicit policy? I've updated the MultiProjectFileSettingsService
to override the files methods in https://github.com/elastic/elasticsearch-serverless/pull/3881
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It works because this class, AbstractFileWatchingService, is in server, so the calls made in this class have the entitlements of server.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a Files.newInputStream(file)
call in the MultiProjectFileSettingsService
and that's why my logging captured the MultiProjectFileSettingsService
as a the caller and I got confused. I've changed it to use the filesNewInputStream
instead. Thanks for explaining, it makes sense to me now! 👍
for (var changedPath : changedPaths) { | ||
// If a symlinked dir changed in the settings dir, it could be linked to other symlinks, so reprocess all files | ||
if (Files.isDirectory(changedPath) && Files.isSymbolicLink(changedPath)) { | ||
reprocessAllChangedFilesInSettingsDir(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to reprocess the entire settings dir, instead of just what the symbolic link points at?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that if we only want to reprocess what's in the ..data
directory symlink (or other directory symlinks are present in the settings dir) we need to iterate all files in the directory ..data
points to and then resolve the file names against the settings directory, to find all the symlinks that changed (and also filter any files that are not present).
This works under the assumption that the target file and symlink has the same name, which I think is acceptable in this case since it's a directory symlink. In practice (at least in serverless) all files in the settings directory (config/operator
) are managed through the ..data
symlink, so when it's switched we always need to reprocess all of them. For that reason adding additional logic (resolving the file names against the settings dir) would add complexity for little benefit. For the general case (ECK?) it might make sense to only try to resolve the files in ..data
against the settings directory though. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This PR fixes an issue that was surfaced when using the new CSI driver (not yet in production) used by the Elasticsearch controller to mount config files as secrets. Unlike the current kubelet-based approach, the CSI driver doesn't trigger an
ENTRY_MODIFY
event on theconfig
file watcher when there are symlink updates inconfig/operator
, causing file changes to go unnoticed by the file-watching service—so the fix ensures a full re-read of theconfig/operator
directory when directory symlinks are updated.The controller mounts
settings.json
,project-xyz.json
andproject-xyz.secrets.json
on the Elastcisearch pods in serverless as kubernetes secrets. This is currently done using the kublet that run on each node. The files are are represented by symlinks in theconfig/operator
directory that point to the files in a symlinked directory. For example:In Elasticsearch,
MultiProjectFileSettingsService
(andFileSettingsService
in non-multi-project environments) is listening for file updates in theconfig
andconfig/operator
through theAbstractFileWatchingService
(by registering a WatchService for the config and operator directories and listen forENTRY_MODIFY
,ENTRY_CREATE
andENTRY_DELETE
). Because of the symlink approach, the symlinks for the files that are monitored are not updated, only the..data
symlink is switched to a different timestamped directory, to allow for an atomic update of all files. When running with kublet the..data
symlink is updated (recreated) in theconfig/operator
directory, theAbstractFileWatchingService
gets an update for theconfig
directory (ENTRY_MODIFY
on theconfig
watch key), this in turn trigger a re-register of theconfig/operator
watch key and a re-read of all files in theconfig/operator
, because we don't know if the file name maps to the same native file system file id.With the new csi-driver, the
..data
symlink switch no longer triggersENTRY_MODIFY
on theconfig
directory aftersettings.json
is changed, so we don't re-register theconfig/operator
watch key and the file changes are never picked up. We're not entirely sure what the difference is in the standard kublet flow vs the csi-driver, one difference is that kublet uses atmpfs
file system and the csi-driver usesext4
. I any case, I think thatENTRY_MODIFY
on the config directory for the..data
symlink switch might have been accidental and that we should always re-read (check if a file has been updated) the fullconfig/operator
directory every time a directory symlink is updated in the settings directory.I've added a
FileSettingsServiceIT
test to reproduce the issue.This only happens on Linux. On Mac (my local) this doesn't happen because the
ENTRY_MODIFY
event is triggered for theconfig
watcher when updating theconfig/operator/..data
symlink. I'm guessing this is because mac doesn't use inotify so things work differently.