Skip to content

Support GKE Workload Identity for Searchable Snapshots #82974

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

arteam
Copy link
Contributor

@arteam arteam commented Jan 24, 2022

Searchable snapshots perform naked calls of GoogleCloudStorageBlobContainer#readBlob without the Security Manager. The
client fails to get Compute Engine credentials because of that. It does work for for normal snapshot/restore we
do a privileged call of GoogleCloudStorageBlobStore.writeBlob during the verification of the repository.

The simplest fix is just to make sure that both ServiceOptions.getDefaultProjectId and GoogleCredentials::getApplicationDefault get called under the Security Manager (which they should because they perform network calls).

Unfortunately, we can't write an integration test for the issue, because the test framework does the repo verification
automatically which kind works around the bug. Writing a unit test also seems not possible, because
ComputeEngineCredentials#getMetadataServerUrl relies on the GCE_METADATA_HOST environment variable.

See elastic/cloud-on-k8s#5230

Resolves #82702

Searchable snapshots perform naked calls of `GoogleCloudStorageBlobContainer#readBlob` without the Security Manager. The
client fails to get Compute Engine credentials because of that. It works for normal snapshot/restore because they
do a privileged call of `GoogleCloudStorageBlobStore.writeBlob` during the verification of the repo.

The simplest fix is just to make sure `ServiceOptions.getDefaultProjectId` and `GoogleCredentials::getApplicationDefault`
are get called under the SecurityManager (which they should because they perform network calls).

Unfortunately, we can't write an integration test for the issue, because the test framework does the repo verification
automatically, which works around the bug. Writing a unit test also seems not possible, because
`ComputeEngineCredentials#getMetadataServerUrl` relies on the `GCE_METADATA_HOST` environment variable.

See elastic/cloud-on-k8s#5230

Resolves elastic#82702
@arteam arteam added >bug :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs v8.0.0 auto-backport Automatically create backport pull requests when merged v8.1.0 v7.17.0 labels Jan 24, 2022
@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Jan 24, 2022
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@arteam arteam requested a review from tlrx January 24, 2022 18:33
@tlrx
Copy link
Member

tlrx commented Jan 25, 2022

Unfortunately, we can't write an integration test for the issue, because the test framework does the repo verification
automatically which kind works around the bug. Writing a unit test also seems not possible, because
ComputeEngineCredentials#getMetadataServerUrl relies on the GCE_METADATA_HOST environment variable.

I wonder if we can workaround this and add a QA test for searchable snapshots that would reuse the AbstractSearchableSnapshotsRestTestCase (without verifyng the repository) and the fixture used in discovery-gce ?

I think it is worth to have such a test so that we don't reintroduce such bugs in the future. Note that this can be done in a follow up PR. What do you think?

@elasticsearchmachine
Copy link
Collaborator

Hi @arteam, I've created a changelog YAML for you.

@arteam
Copy link
Contributor Author

arteam commented Jan 25, 2022

I wonder if we can workaround this and add a QA test for searchable snapshots that would reuse the AbstractSearchableSnapshotsRestTestCase (without verifyng the repository) and the fixture used in discovery-gce ?
I think it is worth to have such a test so that we don't reintroduce such bugs in the future. Note that this can be done in a follow up PR. What do you think?

I think it's a good idea! Unfortunately, I've been unsuccessful in trying to come up with a test reproducing the failure. Even i I disable the repository verification, the client still gets called correctly under the SecurityManager via

org.elasticsearch.repositories.gcs.SocketAccess.doPrivilegedVoidIOException(SocketAccess.java:42)
org.elasticsearch.repositories.gcs.GoogleCloudStorageBlobStore.listBlobsByPrefix(GoogleCloudStorageBlobStore.java:156)
org.elasticsearch.repositories.gcs.GoogleCloudStorageBlobContainer.listBlobsByPrefix(GoogleCloudStorageBlobContainer.java:58)
org.elasticsearch.repositories.blobstore.BlobStoreRepository.listBlobsToGetLatestIndexId(BlobStoreRepository.java:2591)
org.elasticsearch.repositories.blobstore.BlobStoreRepository.latestIndexBlobId(BlobStoreRepository.java:2563)
org.elasticsearch.repositories.blobstore.BlobStoreRepository.doGetRepositoryData(BlobStoreRepository.java:1885)

We probably need some ILM based test for recovery if want to reproduce the issue to trigger the following failing path

"at org.elasticsearch.repositories.gcs.GoogleCloudStorageBlobContainer.readBlob(GoogleCloudStorageBlobContainer.java:63) [repository-gcs-7.16.1.jar:7.16.1]",
"at org.elasticsearch.common.blobstore.support.FilterBlobContainer.readBlob(FilterBlobContainer.java:48) [elasticsearch-7.16.1.jar:7.16.1]",
"at org.elasticsearch.xpack.searchablesnapshots.store.SearchableSnapshotDirectory$RateLimitingBlobContainer.readBlob(SearchableSnapshotDirectory.java:763) [searchable-snapshots-7.16.1.jar:7.16.1]",
"at org.elasticsearch.repositories.blobstore.ChecksumBlobStoreFormat.read(ChecksumBlobStoreFormat.java:88) [elasticsearch-7.16.1.jar:7.16.1]",
"at org.elasticsearch.repositories.blobstore.BlobStoreRepository.loadShardSnapshot(BlobStoreRepository.java:3367) [elasticsearch-7.16.1.jar:7.16.1]",
"at org.elasticsearch.xpack.searchablesnapshots.store.SearchableSnapshotDirectory.lambda$create$15(SearchableSnapshotDirectory.java:667) [searchable-snapshots-7.16.1.jar:7.16.1]",
"at org.elasticsearch.common.util.LazyInitializable.maybeCompute(LazyInitializable.java:92) [elasticsearch-7.16.1.jar:7.16.1]",
"at org.elasticsearch.common.util.LazyInitializable.getOrCompute(LazyInitializable.java:70) [elasticsearch-7.16.1.jar:7.16.1]",
"at org.elasticsearch.xpack.searchablesnapshots.store.SearchableSnapshotDirectory.loadSnapshot(SearchableSnapshotDirectory.java:228) [searchable-snapshots-7.16.1.jar:7.16.1]",
"at org.elasticsearch.xpack.searchablesnapshots.allocation.SearchableSnapshotIndexEventListener.ensureSnapshotIsLoaded(SearchableSnapshotIndexEventListener.java:75) [searchable-snapshots-7.16.1.jar:7.16.1]",
"at org.elasticsearch.xpack.searchablesnapshots.allocation.SearchableSnapshotIndexEventListener.beforeIndexShardRecovery(SearchableSnapshotIndexEventListener.java:67) [searchable-snapshots-7.16.1.jar:7.16.1]",
"at org.elasticsearch.index.CompositeIndexEventListener.beforeIndexShardRecovery(CompositeIndexEventListener.java:276) [elasticsearch-7.16.1.jar:7.16.1]",
"at org.elasticsearch.index.shard.IndexShard.preRecovery(IndexShard.java:1733) [elasticsearch-7.16.1.jar:7.16.1]",
"at org.elasticsearch.index.shard.StoreRecovery.restore(StoreRecovery.java:505) [elasticsearch-7.16.1.jar:7.16.1]",
"at org.elasticsearch.index.shard.StoreRecovery.recoverFromRepository(StoreRecovery.java:301) [elasticsearch-7.16.1.jar:7.16.1]",
"at org.elasticsearch.index.shard.IndexShard.restoreFromRepository(IndexShard.java:2358) [elasticsearch-7.16.1.jar:7.16.1]",
"at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$17(IndexShard.java:3160) [elasticsearch-7.16.1.jar:7.16.1]",

Copy link
Member

@tlrx tlrx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, we can merge the fix and work on a more specialized test in a follow up

@arteam arteam merged commit 706281a into elastic:master Jan 26, 2022
@arteam
Copy link
Contributor Author

arteam commented Jan 26, 2022

Thanks, Tanguy!

arteam added a commit to arteam/elasticsearch that referenced this pull request Jan 26, 2022
* Support GKE Workload Identity for Searchable Snapshots

Searchable snapshots perform naked calls of `GoogleCloudStorageBlobContainer#readBlob` without the Security Manager. The
client fails to get Compute Engine credentials because of that. It works for normal snapshot/restore because they
do a privileged call of `GoogleCloudStorageBlobStore.writeBlob` during the verification of the repo.

The simplest fix is just to make sure `ServiceOptions.getDefaultProjectId` and `GoogleCredentials::getApplicationDefault`
are get called under the SecurityManager (which they should because they perform network calls).

Unfortunately, we can't write an integration test for the issue, because the test framework does the repo verification
automatically, which works around the bug. Writing a unit test also seems not possible, because
`ComputeEngineCredentials#getMetadataServerUrl` relies on the `GCE_METADATA_HOST` environment variable.

See elastic/cloud-on-k8s#5230

Resolves elastic#82702
arteam added a commit to arteam/elasticsearch that referenced this pull request Jan 26, 2022
* Support GKE Workload Identity for Searchable Snapshots

Searchable snapshots perform naked calls of `GoogleCloudStorageBlobContainer#readBlob` without the Security Manager. The
client fails to get Compute Engine credentials because of that. It works for normal snapshot/restore because they
do a privileged call of `GoogleCloudStorageBlobStore.writeBlob` during the verification of the repo.

The simplest fix is just to make sure `ServiceOptions.getDefaultProjectId` and `GoogleCredentials::getApplicationDefault`
are get called under the SecurityManager (which they should because they perform network calls).

Unfortunately, we can't write an integration test for the issue, because the test framework does the repo verification
automatically, which works around the bug. Writing a unit test also seems not possible, because
`ComputeEngineCredentials#getMetadataServerUrl` relies on the `GCE_METADATA_HOST` environment variable.

See elastic/cloud-on-k8s#5230

Resolves elastic#82702
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.0
7.17

@arteam arteam deleted the support-worload-idenity-tokens-searchable-snapshots branch January 26, 2022 09:07
arteam added a commit that referenced this pull request Jan 26, 2022
* Support GKE Workload Identity for Searchable Snapshots

Searchable snapshots perform naked calls of `GoogleCloudStorageBlobContainer#readBlob` without the Security Manager. The
client fails to get Compute Engine credentials because of that. It works for normal snapshot/restore because they
do a privileged call of `GoogleCloudStorageBlobStore.writeBlob` during the verification of the repo.

The simplest fix is just to make sure `ServiceOptions.getDefaultProjectId` and `GoogleCredentials::getApplicationDefault`
are get called under the SecurityManager (which they should because they perform network calls).

Unfortunately, we can't write an integration test for the issue, because the test framework does the repo verification
automatically, which works around the bug. Writing a unit test also seems not possible, because
`ComputeEngineCredentials#getMetadataServerUrl` relies on the `GCE_METADATA_HOST` environment variable.

See elastic/cloud-on-k8s#5230

Resolves #82702
arteam added a commit to arteam/elasticsearch that referenced this pull request Jan 26, 2022
… explicetly

After we fixed getting application credentials in a GCE environment in elastic#82974, we can actually
get credentials set automatically when creating a new GCS client

Fixes elastic#83131
arteam added a commit that referenced this pull request Jan 26, 2022
… explicetly (#83139)

After we fixed getting application credentials in a GCE environment in #82974, we can actually
get credentials set automatically when creating a new GCS client

Fixes #83131
arteam added a commit to arteam/elasticsearch that referenced this pull request Jan 26, 2022
… explicetly (elastic#83139)

After we fixed getting application credentials in a GCE environment in elastic#82974, we can actually
get credentials set automatically when creating a new GCS client

Fixes elastic#83131
arteam added a commit that referenced this pull request Jan 26, 2022
… explicetly (#83139) (#83147)

After we fixed getting application credentials in a GCE environment in #82974, we can actually
get credentials set automatically when creating a new GCS client

Fixes #83131
arteam added a commit that referenced this pull request Jan 26, 2022
Searchable snapshots perform naked calls of `GoogleCloudStorageBlobContainer#readBlob` without the Security Manager. The
client fails to get Compute Engine credentials because of that. It works for normal snapshot/restore because they
do a privileged call of `GoogleCloudStorageBlobStore.writeBlob` during the verification of the repo.

The simplest fix is just to make sure `ServiceOptions.getDefaultProjectId` and `GoogleCredentials::getApplicationDefault`
are get called under the SecurityManager (which they should because they perform network calls).

Unfortunately, we can't write an integration test for the issue, because the test framework does the repo verification
automatically, which works around the bug. Writing a unit test also seems not possible, because
`ComputeEngineCredentials#getMetadataServerUrl` relies on the `GCE_METADATA_HOST` environment variable.

See elastic/cloud-on-k8s#5230

Resolves #82702
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport Automatically create backport pull requests when merged >bug :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v7.17.0 v8.0.0-rc2 v8.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SearchableSnapshot: GKE Workload Identity and GCS repository plugin
6 participants