Skip to content

Ensure config reload on ..data symlink switch for CSI driver support #127628

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

jfreden
Copy link
Contributor

@jfreden jfreden commented May 2, 2025

This PR fixes an issue that was surfaced when using the new CSI driver (not yet in production) used by the Elasticsearch controller to mount config files as secrets. Unlike the current kubelet-based approach, the CSI driver doesn't trigger an ENTRY_MODIFY event on the config file watcher when there are symlink updates in config/operator, causing file changes to go unnoticed by the file-watching service—so the fix ensures a full re-read of the config/operator directory when directory symlinks are updated.

The controller mounts settings.json, project-xyz.json and project-xyz.secrets.json on the Elastcisearch pods in serverless as kubernetes secrets. This is currently done using the kublet that run on each node. The files are are represented by symlinks in the config/operator directory that point to the files in a symlinked directory. For example:

drwxr-xr-x    2 root     root          4096 Apr 29 13:51 ..2025_04_29_13_51_31.1372434508
lrwxrwxrwx    1 root     root            32 Apr 29 13:51 ..data -> ..2025_04_29_13_51_31.1372434508
lrwxrwxrwx    1 root     root            52 Apr 29 13:22 project-ff08458fe99c4e37be0829600b4b5022.json -> ..data/project-ff08458fe99c4e37be0829600b4b5022.json
lrwxrwxrwx    1 root     root            57 Apr 29 13:22 project-ff08458fe99c4e37be0829600b4b5022.roles.yml -> ..data/project-ff08458fe99c4e37be0829600b4b5022.roles.yml
lrwxrwxrwx    1 root     root            60 Apr 29 13:22 project-ff08458fe99c4e37be0829600b4b5022.secrets.json -> ..data/project-ff08458fe99c4e37be0829600b4b5022.secrets.json
lrwxrwxrwx    1 root     root            20 Apr 29 13:22 settings.json -> ..data/settings.json

In Elasticsearch, MultiProjectFileSettingsService (and FileSettingsService in non-multi-project environments) is listening for file updates in the config and config/operator through the AbstractFileWatchingService (by registering a WatchService for the config and operator directories and listen for ENTRY_MODIFY, ENTRY_CREATE and ENTRY_DELETE). Because of the symlink approach, the symlinks for the files that are monitored are not updated, only the ..data symlink is switched to a different timestamped directory, to allow for an atomic update of all files. When running with kublet the ..data symlink is updated (recreated) in the config/operator directory, the AbstractFileWatchingService gets an update for the config directory (ENTRY_MODIFY on the config watch key), this in turn trigger a re-register of the config/operator watch key and a re-read of all files in the config/operator, because we don't know if the file name maps to the same native file system file id.

With the new csi-driver, the ..data symlink switch no longer triggers ENTRY_MODIFY on the config directory after settings.json is changed, so we don't re-register the config/operator watch key and the file changes are never picked up. We're not entirely sure what the difference is in the standard kublet flow vs the csi-driver but I think that ENTRY_MODIFY on the config directory for the ..data symlink switch might have been accidental and that we should always re-read (check if a file has been updated) the full config/operator directory every time a directory symlink is updated in the settings directory.

I've added a FileSettingsServiceIT test to reproduce the issue. This only happens on Linux. On Mac (my local) this doesn't happen because the ENTRY_MODIFY event is triggered for the config watcher when updating the config/operator/..data symlink.

@jfreden jfreden force-pushed the file_watcher_fix branch from 5d4a284 to 26e5d5a Compare May 2, 2025 10:58
@jfreden jfreden changed the title Test Reprocess all changed files in settings dir if symlink dir is updated May 2, 2025
@@ -498,6 +503,57 @@ public void testSettingsAppliedOnMasterReElection() throws Exception {
assertClusterStateSaveOK(savedClusterState.v1(), savedClusterState.v2(), "43mb");
}

public void testSymlinkUpdateTriggerReload() throws Exception {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test fails on linux without the changes to AbstractFileWatchingService.

@jfreden jfreden added the :Core/Infra/Settings Settings infrastructure and APIs label May 2, 2025
@jfreden jfreden changed the title Reprocess all changed files in settings dir if symlink dir is updated Ensure config reload on ..data symlink switch for CSI driver support May 2, 2025
@jfreden jfreden marked this pull request as ready for review May 2, 2025 11:56
@jfreden jfreden requested a review from a team as a code owner May 2, 2025 11:56
@elasticsearchmachine elasticsearchmachine added the Team:Core/Infra Meta label for core/infra team label May 2, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@elasticsearchmachine
Copy link
Collaborator

Hi @jfreden, I've created a changelog YAML for you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/Settings Settings infrastructure and APIs >enhancement Team:Core/Infra Meta label for core/infra team v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants