Skip to content

[Fleet] Add upgrade_attempts to .fleet-agents index #123256

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

jillguyonnet
Copy link
Contributor

@jillguyonnet jillguyonnet commented Feb 24, 2025

Relates https://github.com/elastic/ingest-dev/issues/4720

This PR adds a new upgrade_attempts field mapping to .fleet-agents index. It will be used to implement retries for automatic agent upgrades.

Intended format:

{
   ...
   "upgrade_attempts": [
      "2025-02-24T15:03:54Z",
      "2025-02-24T15:33:54Z",
      "2025-02-24T16:33:54Z"
   ]
   ...
}

@jillguyonnet jillguyonnet self-assigned this Feb 24, 2025
@elasticsearchmachine elasticsearchmachine added v9.1.0 external-contributor Pull request authored by a developer outside the Elasticsearch team labels Feb 24, 2025
@jillguyonnet jillguyonnet changed the title [Fleet] Add upgrade_attemps to .fleet-agents index [Fleet] Add upgrade_attempts to .fleet-agents index Feb 24, 2025
@jillguyonnet jillguyonnet added the :Core/Infra/Plugins Plugin API and infrastructure label Feb 24, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@elasticsearchmachine elasticsearchmachine added the Team:Core/Infra Meta label for core/infra team label Feb 24, 2025
@jillguyonnet
Copy link
Contributor Author

@elasticmachine update branch

@jillguyonnet jillguyonnet merged commit 28f4d87 into elastic:main Feb 25, 2025
17 checks passed
@jillguyonnet jillguyonnet deleted the fleet/4720-Add-upgrade_attempts-to-fleet-agents branch February 25, 2025 15:38
jillguyonnet added a commit to elastic/kibana that referenced this pull request Mar 6, 2025
## Summary

Relates elastic/ingest-dev#4720

This PR adds retry logic to the task that handles automatic agent
upgrades originally implemented in
#211019.

Complementary fleet-server change which sets the agent's
`upgrade_attempts` to `null` once the upgrade is complete.:
elastic/fleet-server#4528

### Approach

- A new `upgrade_attempts` property is added to agents and stored in the
agent doc (ES mapping update in
elastic/elasticsearch#123256).
- When a bulk upgrade action is sent from the automatic upgrade task, it
pushes the timestamp of the upgrade to the affected agents'
`upgrade_attempts`.
- The default retry delays are `['30m', '1h', '2h', '4h', '8h', '16h',
'24h']` and can be overridden with the new
`xpack.fleet.autoUpgrades.retryDelays` setting.
- On every run, the automatic upgrade task will first process retries
and then query more agents if necessary (cf.
elastic/ingest-dev#4720 (comment)).
- Once an agent has completed and failed the max retries defined by the
retry delays array, it is no longer retried.

### Testing

The ES query for fetching agents with existing `upgrade_attempts` needs
the updated mappings, so it might be necessary to pull the latest `main`
in the `elasticsearch` repo and run `yarn es source` instead of `yarn es
snapshot` (requires an up-to-date Java environment, currently 23).

In order to test that `upgrade_attempts` is set to `null` when the
upgrade is complete, fleet-server should be run in dev using the change
in elastic/fleet-server#4528.

### Checklist

- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [x] The PR description includes the appropriate Release Notes section,
and the correct `release_note:*` label is applied per the
[guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)

### Identify risks

Low probability risk of incorrectly triggering agent upgrades. This
feature is currently behind the `enableAutomaticAgentUpgrades` feature
flag.

---------

Co-authored-by: kibanamachine <[email protected]>
Co-authored-by: Julia Bardi <[email protected]>
Co-authored-by: Elastic Machine <[email protected]>
CAWilson94 pushed a commit to CAWilson94/kibana that referenced this pull request Mar 22, 2025
## Summary

Relates elastic/ingest-dev#4720

This PR adds retry logic to the task that handles automatic agent
upgrades originally implemented in
elastic#211019.

Complementary fleet-server change which sets the agent's
`upgrade_attempts` to `null` once the upgrade is complete.:
elastic/fleet-server#4528

### Approach

- A new `upgrade_attempts` property is added to agents and stored in the
agent doc (ES mapping update in
elastic/elasticsearch#123256).
- When a bulk upgrade action is sent from the automatic upgrade task, it
pushes the timestamp of the upgrade to the affected agents'
`upgrade_attempts`.
- The default retry delays are `['30m', '1h', '2h', '4h', '8h', '16h',
'24h']` and can be overridden with the new
`xpack.fleet.autoUpgrades.retryDelays` setting.
- On every run, the automatic upgrade task will first process retries
and then query more agents if necessary (cf.
elastic/ingest-dev#4720 (comment)).
- Once an agent has completed and failed the max retries defined by the
retry delays array, it is no longer retried.

### Testing

The ES query for fetching agents with existing `upgrade_attempts` needs
the updated mappings, so it might be necessary to pull the latest `main`
in the `elasticsearch` repo and run `yarn es source` instead of `yarn es
snapshot` (requires an up-to-date Java environment, currently 23).

In order to test that `upgrade_attempts` is set to `null` when the
upgrade is complete, fleet-server should be run in dev using the change
in elastic/fleet-server#4528.

### Checklist

- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [x] The PR description includes the appropriate Release Notes section,
and the correct `release_note:*` label is applied per the
[guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)

### Identify risks

Low probability risk of incorrectly triggering agent upgrades. This
feature is currently behind the `enableAutomaticAgentUpgrades` feature
flag.

---------

Co-authored-by: kibanamachine <[email protected]>
Co-authored-by: Julia Bardi <[email protected]>
Co-authored-by: Elastic Machine <[email protected]>
@juliaElastic juliaElastic added v8.19.0 auto-backport Automatically create backport pull requests when merged labels Apr 8, 2025
juliaElastic added a commit to juliaElastic/kibana that referenced this pull request Apr 8, 2025
Relates elastic/ingest-dev#4720

This PR adds retry logic to the task that handles automatic agent
upgrades originally implemented in
elastic#211019.

Complementary fleet-server change which sets the agent's
`upgrade_attempts` to `null` once the upgrade is complete.:
elastic/fleet-server#4528

- A new `upgrade_attempts` property is added to agents and stored in the
agent doc (ES mapping update in
elastic/elasticsearch#123256).
- When a bulk upgrade action is sent from the automatic upgrade task, it
pushes the timestamp of the upgrade to the affected agents'
`upgrade_attempts`.
- The default retry delays are `['30m', '1h', '2h', '4h', '8h', '16h',
'24h']` and can be overridden with the new
`xpack.fleet.autoUpgrades.retryDelays` setting.
- On every run, the automatic upgrade task will first process retries
and then query more agents if necessary (cf.
elastic/ingest-dev#4720 (comment)).
- Once an agent has completed and failed the max retries defined by the
retry delays array, it is no longer retried.

The ES query for fetching agents with existing `upgrade_attempts` needs
the updated mappings, so it might be necessary to pull the latest `main`
in the `elasticsearch` repo and run `yarn es source` instead of `yarn es
snapshot` (requires an up-to-date Java environment, currently 23).

In order to test that `upgrade_attempts` is set to `null` when the
upgrade is complete, fleet-server should be run in dev using the change
in elastic/fleet-server#4528.

- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [x] The PR description includes the appropriate Release Notes section,
and the correct `release_note:*` label is applied per the
[guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)

Low probability risk of incorrectly triggering agent upgrades. This
feature is currently behind the `enableAutomaticAgentUpgrades` feature
flag.

---------

Co-authored-by: kibanamachine <[email protected]>
Co-authored-by: Julia Bardi <[email protected]>
Co-authored-by: Elastic Machine <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport Automatically create backport pull requests when merged :Core/Infra/Plugins Plugin API and infrastructure external-contributor Pull request authored by a developer outside the Elasticsearch team >non-issue Team:Core/Infra Meta label for core/infra team Team:Fleet v8.19.0 v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants