POC: Add support for per project repo client #126584
base: main
Conversation
/**
 * Delegates to {@link #client(RepositoryMetadata)} when
 * 1. per-project client is disabled
 * 2. or when the blobstore is cluster level (projectId = null)
 * Otherwise, attempts to retrieve a per-project client by the project-id and repository metadata from the
 * per-project client manager. Throws if the project-id or the client does not exist. The client may be initialized lazily.
 */
public AmazonS3Reference client(@Nullable ProjectId projectId, RepositoryMetadata repositoryMetadata) {
    if (perProjectClientManager == null) {
        // Multi-Project is disabled and we have a single default project
        assert ProjectId.DEFAULT.equals(projectId) : projectId;
        return client(repositoryMetadata);
    } else if (projectId == null) {
        // Multi-Project is enabled and we are retrieving a client for the cluster level blobstore
        return client(repositoryMetadata);
    } else {
        return perProjectClientManager.client(projectId, repositoryMetadata);
    }
}
This is the main interface for accessing the repo client at different scopes. Note the use of a null projectId for the cluster-level blobstore, similar to how cluster-level persistent tasks are handled.
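For illustration only, a hypothetical caller might resolve a client like this; the try-with-resources shape follows how AmazonS3Reference is typically used, but doSomethingWith and the surrounding context are placeholders, not code from this PR:

// Hypothetical usage sketch: resolve a ref-counted client for a project-scoped repository,
// or pass a null projectId to get the cluster-level blobstore's client.
try (AmazonS3Reference clientRef = s3Service.client(projectId, repositoryMetadata)) {
    // clientRef wraps the SDK client with ref counting; closing it releases this reference,
    // not necessarily the underlying SDK client.
    doSomethingWith(clientRef.client()); // doSomethingWith is a placeholder
}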
@DaveCTurner @henningandersen I updated the PR with changes to the GCS repository clients as well, and also tightened up clientReference management for S3. Overall the S3 changes have more complexity due to the refcounting of the clients. I ended up using synchronized blocks similar to the existing code. The synchronization is per project and is needed only on project creation/deletion and client-settings updates, so contention should be rather low.

GCS repos do not allow overriding client settings with repository settings, so it might be feasible to merge the per-project client manager with the existing clientsCache. For now I am keeping them separate since it feels easier to reason about the changes. It might also make sense to keep them separate long term, because the clients are refreshed through different mechanisms: reloadable plugin vs cluster state listener.

Since Azure does not have a clients cache, I think the changes there should be relatively straightforward. It also does not allow overriding client settings with repository settings.

Given the draft status of this PR, I am mostly seeking consensus on the overall approach, e.g. keeping per-project and existing client management separate, how the per-project manager is hooked in, and how the project-id is passed around and used. Thanks!
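As a rough sketch only of the shape described above (the map layout, clientNameFrom and buildClient are assumed names, not the PR's actual code), the per-project, synchronized, ref-counted lookup could look roughly like this:

// Illustrative sketch: clients are cached per project, created lazily under a per-project lock,
// and handed out as ref-counted references.
private final Map<ProjectId, Map<String, AmazonS3Reference>> perProjectClients = new ConcurrentHashMap<>();

public AmazonS3Reference client(ProjectId projectId, RepositoryMetadata metadata) {
    final Map<String, AmazonS3Reference> clients = perProjectClients.get(projectId);
    if (clients == null) {
        throw new IllegalArgumentException("no per-project S3 clients for project " + projectId);
    }
    synchronized (clients) { // contended only on project creation/deletion and settings updates
        final String clientName = clientNameFrom(metadata); // assumed helper: resolves the client name
        AmazonS3Reference ref = clients.get(clientName);
        if (ref == null || ref.tryIncRef() == false) {
            ref = buildClient(projectId, metadata); // lazy init; the initial reference belongs to the cache
            ref.incRef();                           // second reference handed to the caller
            clients.put(clientName, ref);
        }
        return ref;
    }
}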
I'm not a huge fan of creating/closing the clients while holding a mutex; I think we might need something more refined here, especially since with this change we might now block the cluster applier thread on this work. The CHM-based approach in GCS is better, but still blocking. Creating a client should be cheap enough that if we cannot find a cached client then we can optimistically create a fresh one before trying to replace it in the map, accepting that if multiple threads do this then we'd want to discard the clients created by the threads that lost the race.

Note that with SDKv2 we will own the lifecycle of the HTTP client independently of the S3 SDK client, so we could potentially share the HTTP client across multiple projects.

For my education, are we going to be accepting credentials via file-based settings and putting them in the cluster state, or do those still use the keystore?
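A minimal sketch of the optimistic, lock-free creation being suggested here; Client, createClient and closeQuietly are placeholders, not APIs from this PR or the SDKs:

// Look up a cached client without locking; on a miss, create a fresh client optimistically
// and race to install it. A losing thread discards the client it created and uses the winner's.
private final ConcurrentHashMap<String, Client> clientsCache = new ConcurrentHashMap<>();

Client getOrCreate(String clientName) {
    final Client cached = clientsCache.get(clientName);
    if (cached != null) {
        return cached;
    }
    final Client fresh = createClient(clientName);           // cheap; done without holding any lock
    final Client winner = clientsCache.putIfAbsent(clientName, fresh);
    if (winner != null) {
        closeQuietly(fresh);                                  // lost the race: discard our client
        return winner;
    }
    return fresh;
}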
Thanks for the feedback, David. For the CHM usages, there are two levels.

I agree that the synchronized usage is not desirable. I tried to avoid it initially, but it proved difficult while maintaining the current semantics. It might be possible if we are OK with creating the same client more than once, as you suggested. But I wonder whether you would consider it an acceptable solution to simply fork client closing to a different thread? Closing the clients is the only case where the applier thread runs into the synchronization. Since there is no requirement to close the clients synchronously, I feel this could be a simple solution while maintaining all the existing semantics. In fact, I think we want to move closing the clients to a different thread regardless of the synchronization, since it might itself be expensive (it closes the underlying S3 SDK client). What do you think?

Per-project credentials will be via per-project file-based secrets (see
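Just to illustrate the idea of forking client closing off the applier thread, a rough sketch assuming a ThreadPool is available and GENERIC is an acceptable executor for this work (none of this is code from the PR):

// Release obsolete per-project client references off the cluster applier thread.
// Dropping the last reference is what actually closes the underlying SDK client.
void closeClientsAsync(ThreadPool threadPool, Collection<AmazonS3Reference> obsoleteClients) {
    threadPool.generic().execute(() -> {
        for (AmazonS3Reference ref : obsoleteClients) {
            ref.decRef();
        }
    });
}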
Yes, I agree. We should probably block the
I assume we are OK with the overall approach, so I will go ahead and raise the formal PR for S3 first. Thanks for the discussion!
The intention of this draft PR is to get agreement on the approach for managing per-project repository clients. The current changes are for S3 only, but similar changes should be possible for Azure and GCS as well, and I plan to have a PoC for at least one of them too. Raising the initial PR for early feedback.
A few things worth noting about the per-project clients:
Relates: ES-11383