[Fleet] Add `upgrade_attempts` to `.fleet-agents` index #123256

jillguyonnet · 2025-02-24T10:43:55Z

Relates https://github.com/elastic/ingest-dev/issues/4720

This PR adds a new `upgrade_attempts` field mapping to the `.fleet-agents` index. The field will be used to implement retries for automatic agent upgrades.

Intended format:

{
   ...
   "upgrade_attempts": [
      "2025-02-24T15:03:54Z",
      "2025-02-24T15:33:54Z",
      "2025-02-24T16:33:54Z"
   ]
   ...
}
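To make the intended format concrete, here is a minimal TypeScript sketch of how a client could append a timestamp to the array. The `AgentDocFragment` interface and the `recordUpgradeAttempt` helper are hypothetical names chosen for illustration; they are not taken from Kibana or fleet-server, and the second-precision formatting is only an assumption based on the example above.

```typescript
// Hypothetical shape of the relevant part of a `.fleet-agents` document.
interface AgentDocFragment {
  // ISO 8601 timestamps, one per upgrade attempt; null or absent when there are none.
  upgrade_attempts?: string[] | null;
}

// Hypothetical helper: returns a copy of the doc with `attemptedAt` appended
// to `upgrade_attempts`. The input document is not mutated.
function recordUpgradeAttempt(doc: AgentDocFragment, attemptedAt: Date): AgentDocFragment {
  // Assumption: drop milliseconds to match the second-precision timestamps
  // shown in the intended format.
  const timestamp = attemptedAt.toISOString().replace(/\.\d{3}Z$/, "Z");
  return { ...doc, upgrade_attempts: [...(doc.upgrade_attempts ?? []), timestamp] };
}

const updated = recordUpgradeAttempt(
  { upgrade_attempts: ["2025-02-24T15:03:54Z"] },
  new Date("2025-02-24T15:33:54Z")
);
console.log(JSON.stringify(updated));
```

Treating a `null` or missing field as an empty array keeps the helper usable both for a first attempt and after the field has been cleared.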

elasticsearchmachine · 2025-02-24T10:46:10Z

Pinging @elastic/es-core-infra (Team:Core/Infra)

jillguyonnet · 2025-02-25T14:03:22Z

@elasticmachine update branch

## Summary

Relates elastic/ingest-dev#4720

This PR adds retry logic to the task that handles automatic agent upgrades originally implemented in #211019.

Complementary fleet-server change which sets the agent's `upgrade_attempts` to `null` once the upgrade is complete: elastic/fleet-server#4528

### Approach

- A new `upgrade_attempts` property is added to agents and stored in the agent doc (ES mapping update in elastic/elasticsearch#123256).
- When a bulk upgrade action is sent from the automatic upgrade task, it pushes the timestamp of the upgrade to the affected agents' `upgrade_attempts`.
- The default retry delays are `['30m', '1h', '2h', '4h', '8h', '16h', '24h']` and can be overridden with the new `xpack.fleet.autoUpgrades.retryDelays` setting.
- On every run, the automatic upgrade task will first process retries and then query more agents if necessary (cf. elastic/ingest-dev#4720 (comment)).
- Once an agent has completed and failed the max retries defined by the retry delays array, it is no longer retried.

### Testing

The ES query for fetching agents with existing `upgrade_attempts` needs the updated mappings, so it might be necessary to pull the latest `main` in the `elasticsearch` repo and run `yarn es source` instead of `yarn es snapshot` (requires an up-to-date Java environment, currently 23).

In order to test that `upgrade_attempts` is set to `null` when the upgrade is complete, fleet-server should be run in dev using the change in elastic/fleet-server#4528.

### Checklist

- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
- [x] The PR description includes the appropriate Release Notes section, and the correct `release_note:*` label is applied per the [guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)

### Identify risks

Low probability risk of incorrectly triggering agent upgrades.
This feature is currently behind the `enableAutomaticAgentUpgrades` feature flag.

---------

Co-authored-by: kibanamachine <[email protected]>
Co-authored-by: Julia Bardi <[email protected]>
Co-authored-by: Elastic Machine <[email protected]>
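The retry schedule described in the approach can be sketched as follows. This is an illustrative reading, not the Kibana implementation: `parseDelayMs` and `nextRetryAt` are hypothetical helpers, the delay parser only understands the `m` and `h` suffixes that appear in the default list, and the assumption is that the delay at index `n - 1` applies after the `n`-th recorded attempt.

```typescript
// Default delays as listed in the PR description.
const DEFAULT_RETRY_DELAYS: string[] = ["30m", "1h", "2h", "4h", "8h", "16h", "24h"];

// Hypothetical parser: supports only "<number>m" and "<number>h".
function parseDelayMs(delay: string): number {
  const match = /^(\d+)([mh])$/.exec(delay);
  if (!match) {
    throw new Error(`Unsupported delay: ${delay}`);
  }
  const value = Number(match[1]);
  return match[2] === "h" ? value * 60 * 60 * 1000 : value * 60 * 1000;
}

// Hypothetical helper: returns when the next retry becomes due, or null if the
// agent has no recorded attempt yet or has used up every configured retry.
function nextRetryAt(
  upgradeAttempts: string[],
  retryDelays: string[] = DEFAULT_RETRY_DELAYS
): Date | null {
  const attempts = upgradeAttempts.length;
  if (attempts === 0 || attempts > retryDelays.length) {
    return null;
  }
  // Assumption: the delay is measured from the most recent attempt.
  const lastAttemptMs = Date.parse(upgradeAttempts[attempts - 1]);
  return new Date(lastAttemptMs + parseDelayMs(retryDelays[attempts - 1]));
}

const due = nextRetryAt([
  "2025-02-24T15:03:54Z",
  "2025-02-24T15:33:54Z",
  "2025-02-24T16:33:54Z",
]);
console.log(due ? due.toISOString() : "no retry scheduled");
```

Under these assumptions, an agent whose `upgrade_attempts` array is longer than the delays array is no longer retried, which matches the last bullet of the approach.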

[Fleet] Add upgrade_attemps to .fleet-agents index

128d2ec

jillguyonnet self-assigned this Feb 24, 2025

jillguyonnet added Team:Fleet >non-issue labels Feb 24, 2025

jillguyonnet requested a review from juliaElastic February 24, 2025 10:44

elasticsearchmachine added v9.1.0 external-contributor Pull request authored by a developer outside the Elasticsearch team labels Feb 24, 2025

jillguyonnet changed the title ~~[Fleet] Add upgrade_attemps to .fleet-agents index~~ [Fleet] Add upgrade_attempts to .fleet-agents index Feb 24, 2025

jillguyonnet added the :Core/Infra/Plugins Plugin API and infrastructure label Feb 24, 2025

elasticsearchmachine added the Team:Core/Infra Meta label for core/infra team label Feb 24, 2025

juliaElastic approved these changes Feb 24, 2025

Change type to date

0208590

jillguyonnet requested a review from juliaElastic February 25, 2025 10:52

juliaElastic approved these changes Feb 25, 2025

Merge branch 'main' into fleet/4720-Add-upgrade_attempts-to-fleet-agents

341c01f

jillguyonnet merged commit 28f4d87 into elastic:main Feb 25, 2025
17 checks passed

jillguyonnet deleted the fleet/4720-Add-upgrade_attempts-to-fleet-agents branch February 25, 2025 15:38

This was referenced Feb 28, 2025

[Fleet] Add retry logic to automatic agent upgrades elastic/kibana#212744

Merged

Clear agent.upgrade_attempts on upgrade complete elastic/fleet-server#4528

Merged

juliaElastic added v8.19.0 auto-backport Automatically create backport pull requests when merged labels Apr 8, 2025

juliaElastic mentioned this pull request Apr 8, 2025

[8.x] [Fleet] Add upgrade_attempts to .fleet-agents index #126450

Merged
