@isaacaflores2
Contributor

@isaacaflores2 isaacaflores2 commented May 20, 2025

Proposed commit message

apm: Add policy variable tail_discard_on_write_failure to configure apm-server.sampling.tail.discard_on_write_failure.

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.
    • Version constraints will be updated to match the PR below for the APM ttl config
  • I have verified that any added dashboard complies with Kibana's Dashboard good practices

Author's Checklist

How to test this PR locally

  1. Validated the config by using elastic-package to install the package. Used the Kibana UI to add the APM integration and verified that the agent policy contained the correct config.
  2. Installed Elastic Agent locally. Validated computed-config.yaml and the apm-server behavior with discard_on_write_failure enabled and disabled.
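
The behavior checked in step 2 can be summarized in a small sketch. This is an illustrative Python model, not apm-server's Go code: the function name and the "discard"/"index" labels are hypothetical, and the only details taken from this PR are the setting name and the two log messages shown in the Elastic Agent logs.

```python
def on_trace_processing_failure(discard_on_write_failure: bool) -> tuple[str, str]:
    """Hypothetical model of the fallback taken when tail-based sampling
    fails to process a trace event.

    Returns (action, log_message).
    """
    if discard_on_write_failure:
        # Setting enabled: the event is dropped.
        return "discard", "processing trace failed, discarding by default"
    # Setting disabled: the event is indexed anyway.
    return "index", "processing trace failed, indexing by default"


for setting in (True, False):
    action, message = on_trace_processing_failure(setting)
    print(f"discard_on_write_failure={setting}: {action} ({message})")
```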

Related issues

Screenshots

Agent Policy

Screenshot 2025-05-20 at 11 45 41 AM

Elastic Agent Logs

discard on write enabled

  • computed-config.yaml:
    sampling:
      tail:
        discard_on_write_failure: true
        enabled: true
        interval: 1m
        policies:
          - sample_rate: 0.1
        storage_limit: 1B
        ttl: 30m
  • Traces should be discarded. Verified that the log shows discarding by default:
{
  "log.level": "info",
  "@timestamp": "2025-05-19T19:29:34.603Z",
  "message": "processing trace failed, discarding by default",
  "component": {
    "binary": "apm-server",
    "dataset": "elastic_agent.apm_server",
    "id": "apm-default",
    "type": "apm"
  },
  "log": {
    "source": "apm-default"
  },
  "log.logger": "sampling",
  "log.origin": {
    "file.line": 151,
    "file.name": "sampling/processor.go",
    "function": "github.com/elastic/apm-server/x-pack/apm-server/sampling.(*Processor).ProcessBatch"
  },
  "service.name": "apm-server",
  "ecs.version": "1.6.0"
}

discard on write disabled

  • computed-config.yaml:
    sampling:
      tail:
        discard_on_write_failure: false
        enabled: true
        interval: 1m
        policies:
          - sample_rate: 0.1
        storage_limit: 1B
        ttl: 30m
  • Traces should be sampled (indexed). Verified that the log shows indexing by default:
{
  "log.level": "info",
  "@timestamp": "2025-05-19T19:54:16.723Z",
  "message": "processing trace failed, indexing by default",
  "component": {
    "binary": "apm-server",
    "dataset": "elastic_agent.apm_server",
    "id": "apm-default",
    "type": "apm"
  },
  "log": {
    "source": "apm-default"
  },
  "log.logger": "sampling",
  "log.origin": {
    "file.line": 154,
    "file.name": "sampling/processor.go",
    "function": "github.com/elastic/apm-server/x-pack/apm-server/sampling.(*Processor).ProcessBatch"
  },
  "service.name": "apm-server",
  "ecs.version": "1.6.0"
}
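
A check like the one above could be automated with a short script. The following is a minimal sketch, assuming the agent logs are available as one JSON object per line (the entries above are pretty-printed for readability); the function name is hypothetical, while the logger name and message strings are copied from the two log entries in this description.

```python
import json
from collections import Counter

DISCARD_MSG = "processing trace failed, discarding by default"
INDEX_MSG = "processing trace failed, indexing by default"


def count_sampling_decisions(lines):
    """Count discard/index messages emitted by the "sampling" logger.

    `lines` is any iterable of strings, one JSON log entry per line
    (an assumption about the log file layout). Lines that are not
    JSON objects are skipped.
    """
    counts = Counter()
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(entry, dict) or entry.get("log.logger") != "sampling":
            continue
        message = entry.get("message")
        if message == DISCARD_MSG:
            counts["discard"] += 1
        elif message == INDEX_MSG:
            counts["index"] += 1
    return counts


sample = [
    '{"log.level": "info", "log.logger": "sampling", "message": "processing trace failed, discarding by default"}',
    '{"log.level": "info", "log.logger": "sampling", "message": "processing trace failed, indexing by default"}',
    "not json",
]
print(count_sampling_decisions(sample))
```

With discard_on_write_failure enabled, only the "discard" count would be expected to grow; with it disabled, only the "index" count.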

@isaacaflores2 isaacaflores2 requested a review from a team as a code owner May 20, 2025 18:54
@isaacaflores2 isaacaflores2 added the Integration:apm Elastic APM label May 20, 2025
@isaacaflores2 isaacaflores2 force-pushed the apm-tbs-discard-on-write-fail branch from c728162 to 0d19d3e Compare May 21, 2025 19:09
@elasticmachine

💚 Build Succeeded

History

Member

@carsonip carsonip left a comment

lgtm, thanks

@isaacaflores2 isaacaflores2 merged commit 9f744b6 into elastic:main May 23, 2025
8 checks passed
v1v added a commit to v1v/integrations that referenced this pull request May 26, 2025
* main: (42 commits)
  [jamf_pro] Fix `flattened` field types for non-object values (elastic#13985)
  [Netskope Alerts] Add text multi-field to netskope.alerts.breach.description field (elastic#13977)
  zscaler_zia: add strict field template mode for tcp and http_endpoint input data streams (elastic#13904)
  apm: Add config for tail-based sampling discard on write (elastic#13950)
  [CI] Add dev/coverage into backport script (elastic#13987)
  Update configuration updatecli for 8.x snapshot (elastic#13981)
  [Prometheus] Add username, password, and SSL related fields for query dataset (elastic#13969)
  o365: Ignore failures in rename processors for organization fields (elastic#13983)
  aws.firewall: Document ingested log types of AWS Network Firewall (elastic#13978)
  mimecast: resolve field data type conflicts between data streams (elastic#13825)
  [Infoblox NIOS] Handle the parsing of IPv6 address (elastic#13947)
  [Cribl] Fix handling of metric event type (elastic#13930)
  zscaler_zpa: fix handling of multiple remote IPs, and event categorisation (elastic#13755)
  Adding agentless deployment to the sublime security integration (elastic#13963)
  [integration/system] add use_performance_counters in system integration (elastic#13150)
  crowdstrike,m365_defender,microsoft_defender_{cloud,endpoint},sentinel_one: normalise severity handling (elastic#13955)
  [forgerock] Map `forgerock.response.elapsedTime` as a long not a date (elastic#13959)
  github: squelch errors from pagination ends (elastic#13965)
  cisco_secure_endpoint: squelch errors from pagination ends (elastic#13964)
  [Cloud Security] Cloud Asset Inventory:  fixed cloud formation URL (elastic#13971)
  ...
v1v added a commit that referenced this pull request May 26, 2025
* feature/use-google-secrets: (43 commits)
  use -ci account
  [jamf_pro] Fix `flattened` field types for non-object values (#13985)
  [Netskope Alerts] Add text multi-field to netskope.alerts.breach.description field (#13977)
  zscaler_zia: add strict field template mode for tcp and http_endpoint input data streams (#13904)
  apm: Add config for tail-based sampling discard on write (#13950)
  [CI] Add dev/coverage into backport script (#13987)
  Update configuration updatecli for 8.x snapshot (#13981)
  [Prometheus] Add username, password, and SSL related fields for query dataset (#13969)
  o365: Ignore failures in rename processors for organization fields (#13983)
  aws.firewall: Document ingested log types of AWS Network Firewall (#13978)
  mimecast: resolve field data type conflicts between data streams (#13825)
  [Infoblox NIOS] Handle the parsing of IPv6 address (#13947)
  [Cribl] Fix handling of metric event type (#13930)
  zscaler_zpa: fix handling of multiple remote IPs, and event categorisation (#13755)
  Adding agentless deployment to the sublime security integration (#13963)
  [integration/system] add use_performance_counters in system integration (#13150)
  crowdstrike,m365_defender,microsoft_defender_{cloud,endpoint},sentinel_one: normalise severity handling (#13955)
  [forgerock] Map `forgerock.response.elapsedTime` as a long not a date (#13959)
  github: squelch errors from pagination ends (#13965)
  cisco_secure_endpoint: squelch errors from pagination ends (#13964)
  ...
carsonip added a commit that referenced this pull request May 28, 2025
Backport 2 PRs from 9.1 to 8.19:
- TBS ttl config #13348
- TBS discard_on_write_failure config #13950
anupratharamachandran pushed a commit to anupratharamachandran/integrations that referenced this pull request Jun 2, 2025
* apm: Add config for tail-based sampling discard on write

* Add changelog and link PR
@rubvs

rubvs commented Jun 20, 2025

Tested manually:

  • Spin up a deployment in ECH with 9.1.0-SNAPSHOT

  • Install a local Elastic Agent (EA) in Docker as outlined in the docs

  • Send data to EA endpoint: https://localhost:8200/intake/v2/events

  • Observe output in Discover with discard_on_write_failure: false

Screenshot 2025-06-20 at 10 09 34 AM
  • Update policy manually via Dev Tools to set discard_on_write_failure: true
PUT kbn:/api/fleet/package_policies/226a181e-c1e9-43ef-a9f1-ba8cfd2a191c
{
  "inputs": [
    {
      "type": "apm",
      "policy_template": "apmserver",
      "enabled": true,
      "config": {
        "apm-server": {
          "value": {
            "rum": {
              "source_mapping": {
                "metadata": [],
                "elasticsearch": {
                  "api_key": "7S8SjpcBBuYedCQP-s1L:8LkMGRCOGI65djTUV7hDOg"
                }
              }
            },
            "agent_config": [],
            "agent": {
              "config": {
                "elasticsearch": {
                  "api_key": "EaISjpcBF4fUUaI19v1i:CQlR906yj6YUccm4gddYpA"
                }
              }
            }
          }
        }
      },
      "streams": [],
      "vars": {
        "host": {
          "value": "localhost:8200",
          "type": "text"
        },
        "url": {
          "value": "http://localhost:8200",
          "type": "text"
        },
        "secret_token": {
          "type": "text"
        },
        "api_key_enabled": {
          "value": false,
          "type": "bool"
        },
        "enable_rum": {
          "value": true,
          "type": "bool"
        },
        "anonymous_enabled": {
          "value": true,
          "type": "bool"
        },
        "anonymous_allow_agent": {
          "value": [
            "rum-js",
            "js-base",
            "iOS/swift"
          ],
          "type": "text"
        },
        "anonymous_allow_service": {
          "value": [],
          "type": "text"
        },
        "anonymous_rate_limit_event_limit": {
          "value": 300,
          "type": "integer"
        },
        "anonymous_rate_limit_ip_limit": {
          "value": 1000,
          "type": "integer"
        },
        "default_service_environment": {
          "type": "text"
        },
        "rum_allow_origins": {
          "value": [
            "\"*\""
          ],
          "type": "text"
        },
        "rum_allow_headers": {
          "value": [],
          "type": "text"
        },
        "rum_response_headers": {
          "type": "yaml"
        },
        "rum_library_pattern": {
          "value": "\"node_modules|bower_components|~\"",
          "type": "text"
        },
        "rum_exclude_from_grouping": {
          "value": "\"^/webpack\"",
          "type": "text"
        },
        "api_key_limit": {
          "value": 100,
          "type": "integer"
        },
        "max_event_bytes": {
          "value": 307200,
          "type": "integer"
        },
        "capture_personal_data": {
          "value": true,
          "type": "bool"
        },
        "max_header_bytes": {
          "value": 1048576,
          "type": "integer"
        },
        "idle_timeout": {
          "value": "45s",
          "type": "text"
        },
        "read_timeout": {
          "value": "3600s",
          "type": "text"
        },
        "shutdown_timeout": {
          "value": "30s",
          "type": "text"
        },
        "write_timeout": {
          "value": "30s",
          "type": "text"
        },
        "max_connections": {
          "value": 0,
          "type": "integer"
        },
        "response_headers": {
          "type": "yaml"
        },
        "expvar_enabled": {
          "value": false,
          "type": "bool"
        },
        "pprof_enabled": {
          "value": false,
          "type": "bool"
        },
        "java_attacher_discovery_rules": {
          "type": "yaml"
        },
        "java_attacher_agent_version": {
          "type": "text"
        },
        "java_attacher_enabled": {
          "value": false,
          "type": "bool"
        },
        "tls_enabled": {
          "value": false,
          "type": "bool"
        },
        "tls_certificate": {
          "type": "text"
        },
        "tls_key": {
          "type": "text"
        },
        "tls_supported_protocols": {
          "value": [
            "TLSv1.2",
            "TLSv1.3"
          ],
          "type": "text"
        },
        "tls_cipher_suites": {
          "value": [],
          "type": "text"
        },
        "tls_curve_types": {
          "value": [],
          "type": "text"
        },
        "tail_sampling_policies": {
          "value": "- sample_rate: 0.1\n",
          "type": "yaml"
        },
        "tail_sampling_interval": {
          "value": "1m",
          "type": "text"
        },
        "tail_sampling_ttl": {
          "value": "30m",
          "type": "text"
        },
        "tail_sampling_enabled": {
          "value": true,
          "type": "bool"
        },
        "tail_sampling_storage_limit": {
          "value": "2B",
          "type": "text"
        },
        "tail_sampling_discard_on_write_failure": {
          "value": true,
          "type": "bool"
        }
      }
    }
  ]
}
  • Observe output in Discover
Screenshot 2025-06-20 at 3 16 50 PM
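
Editing the full request body by hand is error-prone, so the toggle in the manual update step could be scripted. The sketch below is an assumption-laden illustration: the helper name is hypothetical, it only rewrites a policy dictionary shaped like the request body above, and it does not call Kibana or say anything about which fields Fleet requires in the body.

```python
import copy
import json

VAR_NAME = "tail_sampling_discard_on_write_failure"


def set_discard_on_write_failure(policy: dict, enabled: bool) -> dict:
    """Return a copy of `policy` with the variable set on every "apm" input.

    The input dictionary is left unchanged.
    """
    updated = copy.deepcopy(policy)
    for item in updated.get("inputs", []):
        if item.get("type") != "apm":
            continue
        item.setdefault("vars", {})[VAR_NAME] = {"value": enabled, "type": "bool"}
    return updated


# Trimmed-down stand-in for the request body above; a real body carries all vars.
policy = {
    "inputs": [
        {
            "type": "apm",
            "policy_template": "apmserver",
            "enabled": True,
            "vars": {VAR_NAME: {"value": False, "type": "bool"}},
        }
    ]
}
print(json.dumps(set_discard_on_write_failure(policy, True), indent=2))
```

The printed JSON could then be pasted into Dev Tools as the body of the PUT request.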

@rubvs rubvs mentioned this pull request Jun 20, 2025
17 tasks
@rubvs

rubvs commented Jun 20, 2025

The UI change has just propagated, so I also tested via the UI in elastic/kibana#224479 (comment)

Labels

Integration:apm Elastic APM

5 participants