Description
I tried to ask this question in the Slack space, but I couldn't log in there. It seems that an invite is needed.
I've been using Failsafe's Retry, Timeout and Fallback for some time, and they work great for the majority of my team's use cases.
But I have a use case where they don't fit our needs.
We have some process chains that depend on another team's jobs, which send us (over SSH) some files that we need to use.
Sometimes those files do not arrive on time, so a failure occurs and the operation is retried for a limited time. But sometimes fixing the problem at the origin takes longer than the retry and timeout values we configured.
I'm thinking that one solution would be to increase the timeout and the delay between retries, or even cancel the retries (and go to the Fallback), but at runtime, while the execution is already in progress.
We could notify our monitoring tool through its API when an error has occurred, and also implement a REST endpoint to receive new parameters, but how could we have the Timeout and Retry policies modified and picked up after an execution has already started?
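To make the idea concrete, here is a rough sketch of the behaviour I have in mind. It is plain Java and does not use Failsafe at all: `RetrySettings` and `AdjustableRetry` are hypothetical names I made up for illustration. The point is only that the delay, the attempt limit and a cancel flag are re-read from a shared holder before every retry, so another thread (for example a REST handler) can change them in the middle of an execution.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical immutable settings snapshot; not a Failsafe class. */
final class RetrySettings {
    final Duration delay;
    final int maxAttempts;
    final boolean cancelled;

    RetrySettings(Duration delay, int maxAttempts, boolean cancelled) {
        this.delay = delay;
        this.maxAttempts = maxAttempts;
        this.cancelled = cancelled;
    }
}

/** Hypothetical hand-rolled retry loop whose settings can change mid-execution. */
final class AdjustableRetry {
    private final AtomicReference<RetrySettings> settings;

    AdjustableRetry(RetrySettings initial) {
        this.settings = new AtomicReference<>(initial);
    }

    /** Could be called from another thread, e.g. by a REST handler. */
    void update(RetrySettings next) {
        settings.set(next);
    }

    <T> T call(Callable<T> task, Supplier<T> fallback) {
        int attempt = 0;
        while (true) {
            attempt++;
            try {
                return task.call();
            } catch (Exception e) {
                // Re-read the settings after every failure so updates take effect.
                RetrySettings current = settings.get();
                if (current.cancelled || attempt >= current.maxAttempts) {
                    return fallback.get();
                }
                try {
                    Thread.sleep(current.delay.toMillis());
                } catch (InterruptedException ie) {
                    // Restore the interrupt flag and give up with the fallback.
                    Thread.currentThread().interrupt();
                    return fallback.get();
                }
            }
        }
    }
}
```

With this, calling `update(...)` with a longer delay, a higher attempt limit or `cancelled = true` would change how the loop behaves from the next failure onwards. What I'd like to know is whether something equivalent can be done with Failsafe's own policies.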
Ideas and alternatives are welcome too :)