[DRAFT] Retry storage credential create/update to allow for IAM propagation #1454

neinkeinkaffee · 2022-07-12T21:06:22Z

Retry the create/update operation for storage credentials if the error indicates that IAM propagation may need more time. The 2-minute timeout could probably be lowered; the current value is taken from the corresponding IAM propagation timeouts in the AWS Terraform provider.
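For illustration, the retry-and-wait behavior described above can be sketched with the standard library alone. This is not the code from this PR, which uses `resource.RetryContext` from the Terraform SDK. The names `retryWhile`, `errIAMNotReady`, and `simulateCreate` are hypothetical, and the 1 ms polling interval is chosen only to keep the demo fast.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errIAMNotReady is a hypothetical sentinel standing in for the API error
// returned while a freshly created IAM role has not propagated yet.
var errIAMNotReady = errors.New("IAM role not yet usable")

// retryWhile calls f until it succeeds, fails with an error that retryable
// rejects, or the timeout elapses while waiting between attempts.
func retryWhile(ctx context.Context, timeout, interval time.Duration,
	retryable func(error) bool, f func() error) error {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()
	for {
		err := f()
		if err == nil || !retryable(err) {
			return err
		}
		select {
		case <-ctx.Done():
			// Give up and surface the last retryable error.
			return fmt.Errorf("timed out waiting for IAM propagation: %w", err)
		case <-time.After(interval):
		}
	}
}

// simulateCreate fakes a create call that fails `failures` times with the
// retryable error before succeeding, and reports how many attempts were made.
func simulateCreate(failures int, timeout time.Duration) (int, error) {
	attempts := 0
	err := retryWhile(context.Background(), timeout, time.Millisecond,
		func(err error) bool { return errors.Is(err, errIAMNotReady) },
		func() error {
			attempts++
			if attempts <= failures {
				return errIAMNotReady
			}
			return nil
		})
	return attempts, err
}

func main() {
	attempts, err := simulateCreate(2, 2*time.Second)
	fmt.Println(attempts, err)
}
```

Errors that the predicate rejects are returned immediately, so only the IAM-specific failure is waited on.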

codecov-commenter · 2022-07-12T21:10:41Z

Codecov Report

Merging #1454 (f08833a) into master (73d5b17) will decrease coverage by 0.40%.
The diff coverage is 98.07%.

@@            Coverage Diff             @@
##           master    #1454      +/-   ##
==========================================
- Coverage   90.12%   89.71%   -0.41%     
==========================================
  Files         126      129       +3     
  Lines       10218    10446     +228     
==========================================
+ Hits         9209     9372     +163     
- Misses        642      695      +53     
- Partials      367      379      +12

Impacted Files	Coverage Δ
common/http.go	`98.01% <ø> (-0.01%)`	⬇️
mws/resource_mws_credentials.go	`86.95% <96.29%> (+2.95%)`	⬆️
catalog/resource_storage_credential.go	`100.00% <100.00%> (ø)`
exporter/importables.go	`85.31% <0.00%> (-5.92%)`	⬇️
exporter/util.go	`79.55% <0.00%> (-1.13%)`	⬇️
pipelines/resource_pipeline.go	`92.17% <0.00%> (-0.97%)`	⬇️
sql/resource_sql_widget.go	`87.50% <0.00%> (-0.70%)`	⬇️
sql/resource_sql_endpoint.go	`97.43% <0.00%> (-0.13%)`	⬇️
mws/resource_mws_workspaces.go	`90.15% <0.00%> (-0.12%)`	⬇️
... and 14 more

nfx · 2022-07-12T21:11:50Z

catalog/resource_storage_credential.go

+	return resource.RetryContext(ctx, IAMPropagationTimeout,
+		func() *resource.RetryError {
+			cerr := f()
+			if e, ok := cerr.(common.APIError); ok && e.StatusCode == 403 && strings.Contains(cerr.Error(), "IAM role") {


Please make the error message matching a bit more specific.

I'm now matching on a larger portion of the error message. It may still not be ideal, but it should be a bit better than before.
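A narrower predicate along these lines might look like the sketch below. The `apiError` type is a hypothetical stand-in for `common.APIError`, and `iamNotReadyFragment` is a made-up placeholder: the real fragment would have to be copied from the actual API response, which is not shown in this thread.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// apiError is a hypothetical stand-in for common.APIError.
type apiError struct {
	StatusCode int
	Message    string
}

func (e apiError) Error() string { return e.Message }

// iamNotReadyFragment is an assumed placeholder, not the real API message.
const iamNotReadyFragment = "cannot assume IAM role"

// isIAMPropagationError reports whether err looks like the transient
// "role not propagated yet" failure: a 403 whose message contains the
// specific fragment, rather than any message mentioning "IAM role".
func isIAMPropagationError(err error) bool {
	var e apiError
	if !errors.As(err, &e) {
		return false
	}
	return e.StatusCode == 403 && strings.Contains(e.Message, iamNotReadyFragment)
}

func main() {
	transient := apiError{StatusCode: 403, Message: "request failed: " + iamNotReadyFragment}
	wrapped := fmt.Errorf("create storage credential: %w", transient)
	other := apiError{StatusCode: 403, Message: "IAM role policy denies access"}

	fmt.Println(isIAMPropagationError(wrapped))
	fmt.Println(isIAMPropagationError(other))
}
```

Using `errors.As` instead of a direct type assertion also matches errors that have been wrapped with `%w` further up the call chain.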

nfx · 2022-07-12T21:14:31Z

catalog/resource_storage_credential.go

 }
+
+func waitIAMPropagation(ctx context.Context, c *common.DatabricksClient, f func() error) error {
+	return resource.RetryContext(ctx, IAMPropagationTimeout,


There's a concept of "resource timeouts" that are configurable per resource instance. Take a look at the SQL endpoint, library, cluster, and workspace resources to see how to add them. You'll have to change the method signature to accept the timeout.

Now using the resource timeouts, which I've set to 2 minutes. That should be fine for the storage credential. I'm just wondering whether, in general, the IAM propagation timeout might need to be more conservative than the general resource timeouts. If the IAM role takes more than 2 minutes, then maybe something is really wrong: the role isn't there or isn't properly configured. Other resources, by contrast, may want longer timeouts for creation and update.
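One way to express that concern, sketched here purely as an assumption and not as what the PR does, is to cap the IAM retry window independently of the resource timeout. `iamPropagationCap` and `iamRetryTimeout` are hypothetical names; the 2-minute value is the one mentioned in this thread.

```go
package main

import (
	"fmt"
	"time"
)

// iamPropagationCap is a hypothetical upper bound on how long to keep
// retrying on an IAM propagation error, regardless of the resource timeout.
const iamPropagationCap = 2 * time.Minute

// iamRetryTimeout returns the smaller of the configured resource timeout
// and the IAM cap. A non-positive resource timeout falls back to the cap.
func iamRetryTimeout(resourceTimeout time.Duration) time.Duration {
	if resourceTimeout <= 0 || resourceTimeout > iamPropagationCap {
		return iamPropagationCap
	}
	return resourceTimeout
}

func main() {
	// A resource with a long create timeout still stops IAM retries at the cap.
	fmt.Println(iamRetryTimeout(30 * time.Minute))
	// A resource with a shorter timeout keeps its own limit.
	fmt.Println(iamRetryTimeout(45 * time.Second))
}
```

With this shape, a resource can keep a long create/update timeout while a missing or misconfigured role still fails within the shorter window.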

nfx · 2022-07-12T21:15:23Z

catalog/resource_storage_credential.go

 	}.ToResource()
 }
+
+func waitIAMPropagation(ctx context.Context, c *common.DatabricksClient, f func() error) error {


Rename the method to include "retry".

neinkeinkaffee · 2022-07-23T18:37:27Z

I have moved the retryOnError method into http.go in the common package so that it can be reused for the MWS credential. I'm not sure that's the best location. I'm also wondering whether it would be better to keep the IAM propagation timeout separate from the resource timeouts for creation and update, given that one might not want to keep retrying on an IAM error until the overall resource creation/update timeout is reached. It would have the additional benefit of not having to pass down the resource timeout.

nfx · 2022-07-23T19:05:27Z

@neinkeinkaffee I'm not sure the common package is the right place for it. The cost of refactoring and supporting it in the common package is higher than copy-pasting approximately 10 lines. So far I see it used only twice (for IAM roles). This is really retry-and-wait for an error that might still be a valid error after one or two minutes. The common package does another kind of error retry, which is immediate. On top of that, the common package is being refactored into a standalone SDK as we speak, so I don't want more things added there.

Bottom line: these retries are great, but I think they belong in the mws and catalog packages.

neinkeinkaffee · 2022-07-23T20:01:39Z

Makes sense! I've moved the retry code back into the catalog and mws packages.

oleksandr-gubchenko · 2023-04-19T14:14:55Z

Any updates on this?

Retry storage credential create/update to allow for IAM propagation

e0946a2

neinkeinkaffee mentioned this pull request Jul 12, 2022

[ISSUE] Cannot create databricks_mws_credentials because aws_iam not ready #1424

Open

nfx reviewed Jul 12, 2022


neinkeinkaffee added 5 commits July 13, 2022 08:29

Use resource timeout and match on larger portion of error message

97e7d08

Set resource timeout of 2 minutes for Create and Update

88fad80

Move retryOnError into common package and export

a28d834

Move IAM error retry on resource creation into API client command

23e2b50

Retry mws credential creation to allow for IAM propagation

e50cca0

Move retryOnError into catalog and mws and out of common

f08833a

nfx linked an issue Jul 25, 2022 that may be closed by this pull request

[ISSUE] Cannot create databricks_mws_credentials because aws_iam not ready #1424

Open

pietern assigned nfx Sep 2, 2022

pietern requested review from a team as code owners June 5, 2024 08:35

pietern requested review from hectorcast-db and removed request for a team June 5, 2024 08:35
