Skip to content

Ensure config reload on ..data symlink switch for CSI driver support #127628

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
May 23, 2025

Conversation

jfreden
Copy link
Contributor

@jfreden jfreden commented May 2, 2025

This PR fixes an issue that was surfaced when using the new CSI driver (not yet in production) used by the Elasticsearch controller to mount config files as secrets. Unlike the current kubelet-based approach, the CSI driver doesn't trigger an ENTRY_MODIFY event on the config file watcher when there are symlink updates in config/operator, causing file changes to go unnoticed by the file-watching service—so the fix ensures a full re-read of the config/operator directory when directory symlinks are updated.

The controller mounts settings.json, project-xyz.json and project-xyz.secrets.json on the Elastcisearch pods in serverless as kubernetes secrets. This is currently done using the kublet that run on each node. The files are are represented by symlinks in the config/operator directory that point to the files in a symlinked directory. For example:

drwxr-xr-x    2 root     root          4096 Apr 29 13:51 ..2025_04_29_13_51_31.1372434508
lrwxrwxrwx    1 root     root            32 Apr 29 13:51 ..data -> ..2025_04_29_13_51_31.1372434508
lrwxrwxrwx    1 root     root            52 Apr 29 13:22 project-ff08458fe99c4e37be0829600b4b5022.json -> ..data/project-ff08458fe99c4e37be0829600b4b5022.json
lrwxrwxrwx    1 root     root            57 Apr 29 13:22 project-ff08458fe99c4e37be0829600b4b5022.roles.yml -> ..data/project-ff08458fe99c4e37be0829600b4b5022.roles.yml
lrwxrwxrwx    1 root     root            60 Apr 29 13:22 project-ff08458fe99c4e37be0829600b4b5022.secrets.json -> ..data/project-ff08458fe99c4e37be0829600b4b5022.secrets.json
lrwxrwxrwx    1 root     root            20 Apr 29 13:22 settings.json -> ..data/settings.json

In Elasticsearch, MultiProjectFileSettingsService (and FileSettingsService in non-multi-project environments) is listening for file updates in the config and config/operator through the AbstractFileWatchingService (by registering a WatchService for the config and operator directories and listen for ENTRY_MODIFY, ENTRY_CREATE and ENTRY_DELETE). Because of the symlink approach, the symlinks for the files that are monitored are not updated, only the ..data symlink is switched to a different timestamped directory, to allow for an atomic update of all files. When running with kublet the ..data symlink is updated (recreated) in the config/operator directory, the AbstractFileWatchingService gets an update for the config directory (ENTRY_MODIFY on the config watch key), this in turn trigger a re-register of the config/operator watch key and a re-read of all files in the config/operator, because we don't know if the file name maps to the same native file system file id.

With the new csi-driver, the ..data symlink switch no longer triggers ENTRY_MODIFY on the config directory after settings.json is changed, so we don't re-register the config/operator watch key and the file changes are never picked up. We're not entirely sure what the difference is in the standard kublet flow vs the csi-driver, one difference is that kublet uses a tmpfs file system and the csi-driver uses ext4. I any case, I think that ENTRY_MODIFY on the config directory for the ..data symlink switch might have been accidental and that we should always re-read (check if a file has been updated) the full config/operator directory every time a directory symlink is updated in the settings directory.

I've added a FileSettingsServiceIT test to reproduce the issue.

This only happens on Linux. On Mac (my local) this doesn't happen because the ENTRY_MODIFY event is triggered for the config watcher when updating the config/operator/..data symlink. I'm guessing this is because mac doesn't use inotify so things work differently.

@jfreden jfreden force-pushed the file_watcher_fix branch 2 times, most recently from 5d4a284 to 26e5d5a Compare May 2, 2025 10:58
@jfreden jfreden changed the title Test Reprocess all changed files in settings dir if symlink dir is updated May 2, 2025
@@ -498,6 +503,57 @@ public void testSettingsAppliedOnMasterReElection() throws Exception {
assertClusterStateSaveOK(savedClusterState.v1(), savedClusterState.v2(), "43mb");
}

public void testSymlinkUpdateTriggerReload() throws Exception {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test fails on linux without the changes to AbstractFileWatchingService.

@jfreden jfreden added the :Core/Infra/Settings Settings infrastructure and APIs label May 2, 2025
@jfreden jfreden changed the title Reprocess all changed files in settings dir if symlink dir is updated Ensure config reload on ..data symlink switch for CSI driver support May 2, 2025
@jfreden jfreden marked this pull request as ready for review May 2, 2025 11:56
@jfreden jfreden requested a review from a team as a code owner May 2, 2025 11:56
@elasticsearchmachine elasticsearchmachine added the Team:Core/Infra Meta label for core/infra team label May 2, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@elasticsearchmachine
Copy link
Collaborator

Hi @jfreden, I've created a changelog YAML for you.

@jfreden jfreden marked this pull request as draft May 2, 2025 13:30
@barkbay
Copy link
Contributor

barkbay commented May 7, 2025

When running with kublet the ..data symlink is updated (recreated) in the config/operator directory

Just wanted to mention that the ..data directory is updated using os.Rename. This is done the same way by both the Kubelet and the Elastic CSI Driver:

@jfreden jfreden marked this pull request as ready for review May 8, 2025 08:37
.collect(Collectors.toSet());
for (var changedPath : changedPaths) {
// If a symlinked dir changed in the settings dir, it could be linked to other symlinks, so reprocess all files
if (Files.isDirectory(changedPath) && Files.isSymbolicLink(changedPath)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe isDirectory and isSymbolic link need to be extracted to helper methods the way other Files methods are, so that these operations can be granted via entitlements for modules outside server.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've moved the file operations to their own methods. I'm trying to understand why this works with the MultiProjectFileSettingsService and I think it might be because getDeclaringClass is used to determine the caller. In this case the caller is MultiProjectFileSettingsService but since we don't override the methods in FileSettingsService the calling class becomes FileSettingsService since that's where the method is declared. FileSettingsService in turn is part of server, so that's why this works?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without overriding it, the calling class appears to be in the server module, so server policies will apply. Extracting methods and overriding them in the module is the current way to make the call appear as originated from the module, so the module's policy appears.
It could make no difference today e.g. all modules are granted read access to the config directory, but that might change, so better to structure the code so it resolves to the correct module/policy

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(we are designing ways to make this easier and more explicit in the future, but atm this is the way with entitlements)

Copy link
Contributor Author

@jfreden jfreden May 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explaining @ldematte!

For the serverless-multi-project module, is the entitlement-policy inherited from the extended plugin (core in this case) and is that why it works without an explicit policy? I've updated the MultiProjectFileSettingsService to override the files methods in https://github.com/elastic/elasticsearch-serverless/pull/3881

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It works because this class, AbstractFileWatchingService, is in server, so the calls made in this class have the entitlements of server.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a Files.newInputStream(file) call in the MultiProjectFileSettingsService and that's why my logging captured the MultiProjectFileSettingsService as a the caller and I got confused. I've changed it to use the filesNewInputStream instead. Thanks for explaining, it makes sense to me now! 👍

for (var changedPath : changedPaths) {
// If a symlinked dir changed in the settings dir, it could be linked to other symlinks, so reprocess all files
if (Files.isDirectory(changedPath) && Files.isSymbolicLink(changedPath)) {
reprocessAllChangedFilesInSettingsDir();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to reprocess the entire settings dir, instead of just what the symbolic link points at?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that if we only want to reprocess what's in the ..data directory symlink (or other directory symlinks are present in the settings dir) we need to iterate all files in the directory ..data points to and then resolve the file names against the settings directory, to find all the symlinks that changed (and also filter any files that are not present).

This works under the assumption that the target file and symlink has the same name, which I think is acceptable in this case since it's a directory symlink. In practice (at least in serverless) all files in the settings directory (config/operator) are managed through the ..data symlink, so when it's switched we always need to reprocess all of them. For that reason adding additional logic (resolving the file names against the settings dir) would add complexity for little benefit. For the general case (ECK?) it might make sense to only try to resolve the files in ..data against the settings directory though. WDYT?

@jfreden jfreden requested a review from a team as a code owner May 12, 2025 11:42
@jfreden jfreden force-pushed the file_watcher_fix branch from 823a7dc to d68fbfb Compare May 12, 2025 11:49
@elasticsearchmachine elasticsearchmachine added the serverless-linked Added by automation, don't add manually label May 12, 2025
@jfreden jfreden removed the request for review from a team May 12, 2025 12:11
@jfreden jfreden requested a review from rjernst May 20, 2025 05:12
Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jfreden jfreden merged commit cc5aa91 into elastic:main May 23, 2025
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/Settings Settings infrastructure and APIs >enhancement serverless-linked Added by automation, don't add manually Team:Core/Infra Meta label for core/infra team v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants