Skip to content

Conversation

jillguyonnet
Copy link
Contributor

@jillguyonnet jillguyonnet commented Feb 24, 2025

Relates https://github.com/elastic/ingest-dev/issues/4720

This PR adds a new upgrade_attempts field mapping to .fleet-agents index. It will be used to implement retries for automatic agent upgrades.

Intended format:

{
   ...
   "upgrade_attempts": [
      "2025-02-24T15:03:54Z",
      "2025-02-24T15:33:54Z",
      "2025-02-24T16:33:54Z"
   ]
   ...
}

@jillguyonnet jillguyonnet self-assigned this Feb 24, 2025
@elasticsearchmachine elasticsearchmachine added v9.1.0 external-contributor Pull request authored by a developer outside the Elasticsearch team labels Feb 24, 2025
@jillguyonnet jillguyonnet changed the title [Fleet] Add upgrade_attemps to .fleet-agents index [Fleet] Add upgrade_attempts to .fleet-agents index Feb 24, 2025
@jillguyonnet jillguyonnet added the :Core/Infra/Plugins Plugin API and infrastructure label Feb 24, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@elasticsearchmachine elasticsearchmachine added the Team:Core/Infra Meta label for core/infra team label Feb 24, 2025
@jillguyonnet
Copy link
Contributor Author

@elasticmachine update branch

@jillguyonnet jillguyonnet merged commit 28f4d87 into elastic:main Feb 25, 2025
17 checks passed
@jillguyonnet jillguyonnet deleted the fleet/4720-Add-upgrade_attempts-to-fleet-agents branch February 25, 2025 15:38
jillguyonnet added a commit to elastic/kibana that referenced this pull request Mar 6, 2025
## Summary

Relates elastic/ingest-dev#4720

This PR adds retry logic to the task that handles automatic agent
upgrades originally implemented in
#211019.

Complementary fleet-server change which sets the agent's
`upgrade_attempts` to `null` once the upgrade is complete.:
elastic/fleet-server#4528

### Approach

- A new `upgrade_attempts` property is added to agents and stored in the
agent doc (ES mapping update in
elastic/elasticsearch#123256).
- When a bulk upgrade action is sent from the automatic upgrade task, it
pushes the timestamp of the upgrade to the affected agents'
`upgrade_attempts`.
- The default retry delays are `['30m', '1h', '2h', '4h', '8h', '16h',
'24h']` and can be overridden with the new
`xpack.fleet.autoUpgrades.retryDelays` setting.
- On every run, the automatic upgrade task will first process retries
and then query more agents if necessary (cf.
elastic/ingest-dev#4720 (comment)).
- Once an agent has completed and failed the max retries defined by the
retry delays array, it is no longer retried.

### Testing

The ES query for fetching agents with existing `upgrade_attempts` needs
the updated mappings, so it might be necessary to pull the latest `main`
in the `elasticsearch` repo and run `yarn es source` instead of `yarn es
snapshot` (requires an up-to-date Java environment, currently 23).

In order to test that `upgrade_attempts` is set to `null` when the
upgrade is complete, fleet-server should be run in dev using the change
in elastic/fleet-server#4528.

### Checklist

- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [x] The PR description includes the appropriate Release Notes section,
and the correct `release_note:*` label is applied per the
[guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)

### Identify risks

Low probability risk of incorrectly triggering agent upgrades. This
feature is currently behind the `enableAutomaticAgentUpgrades` feature
flag.

---------

Co-authored-by: kibanamachine <[email protected]>
Co-authored-by: Julia Bardi <[email protected]>
Co-authored-by: Elastic Machine <[email protected]>
CAWilson94 pushed a commit to CAWilson94/kibana that referenced this pull request Mar 22, 2025
## Summary

Relates elastic/ingest-dev#4720

This PR adds retry logic to the task that handles automatic agent
upgrades originally implemented in
elastic#211019.

Complementary fleet-server change which sets the agent's
`upgrade_attempts` to `null` once the upgrade is complete.:
elastic/fleet-server#4528

### Approach

- A new `upgrade_attempts` property is added to agents and stored in the
agent doc (ES mapping update in
elastic/elasticsearch#123256).
- When a bulk upgrade action is sent from the automatic upgrade task, it
pushes the timestamp of the upgrade to the affected agents'
`upgrade_attempts`.
- The default retry delays are `['30m', '1h', '2h', '4h', '8h', '16h',
'24h']` and can be overridden with the new
`xpack.fleet.autoUpgrades.retryDelays` setting.
- On every run, the automatic upgrade task will first process retries
and then query more agents if necessary (cf.
elastic/ingest-dev#4720 (comment)).
- Once an agent has completed and failed the max retries defined by the
retry delays array, it is no longer retried.

### Testing

The ES query for fetching agents with existing `upgrade_attempts` needs
the updated mappings, so it might be necessary to pull the latest `main`
in the `elasticsearch` repo and run `yarn es source` instead of `yarn es
snapshot` (requires an up-to-date Java environment, currently 23).

In order to test that `upgrade_attempts` is set to `null` when the
upgrade is complete, fleet-server should be run in dev using the change
in elastic/fleet-server#4528.

### Checklist

- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [x] The PR description includes the appropriate Release Notes section,
and the correct `release_note:*` label is applied per the
[guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)

### Identify risks

Low probability risk of incorrectly triggering agent upgrades. This
feature is currently behind the `enableAutomaticAgentUpgrades` feature
flag.

---------

Co-authored-by: kibanamachine <[email protected]>
Co-authored-by: Julia Bardi <[email protected]>
Co-authored-by: Elastic Machine <[email protected]>
@juliaElastic juliaElastic added v8.19.0 auto-backport Automatically create backport pull requests when merged labels Apr 8, 2025
juliaElastic added a commit to juliaElastic/kibana that referenced this pull request Apr 8, 2025
Relates elastic/ingest-dev#4720

This PR adds retry logic to the task that handles automatic agent
upgrades originally implemented in
elastic#211019.

Complementary fleet-server change which sets the agent's
`upgrade_attempts` to `null` once the upgrade is complete.:
elastic/fleet-server#4528

- A new `upgrade_attempts` property is added to agents and stored in the
agent doc (ES mapping update in
elastic/elasticsearch#123256).
- When a bulk upgrade action is sent from the automatic upgrade task, it
pushes the timestamp of the upgrade to the affected agents'
`upgrade_attempts`.
- The default retry delays are `['30m', '1h', '2h', '4h', '8h', '16h',
'24h']` and can be overridden with the new
`xpack.fleet.autoUpgrades.retryDelays` setting.
- On every run, the automatic upgrade task will first process retries
and then query more agents if necessary (cf.
elastic/ingest-dev#4720 (comment)).
- Once an agent has completed and failed the max retries defined by the
retry delays array, it is no longer retried.

The ES query for fetching agents with existing `upgrade_attempts` needs
the updated mappings, so it might be necessary to pull the latest `main`
in the `elasticsearch` repo and run `yarn es source` instead of `yarn es
snapshot` (requires an up-to-date Java environment, currently 23).

In order to test that `upgrade_attempts` is set to `null` when the
upgrade is complete, fleet-server should be run in dev using the change
in elastic/fleet-server#4528.

- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [x] The PR description includes the appropriate Release Notes section,
and the correct `release_note:*` label is applied per the
[guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)

Low probability risk of incorrectly triggering agent upgrades. This
feature is currently behind the `enableAutomaticAgentUpgrades` feature
flag.

---------

Co-authored-by: kibanamachine <[email protected]>
Co-authored-by: Julia Bardi <[email protected]>
Co-authored-by: Elastic Machine <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport Automatically create backport pull requests when merged :Core/Infra/Plugins Plugin API and infrastructure external-contributor Pull request authored by a developer outside the Elasticsearch team >non-issue Team:Core/Infra Meta label for core/infra team Team:Fleet v8.19.0 v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants