Private AKS cluster, enabled outbound, metrics pod keeps failing #265

@hardik-id

Hello,
I have a private AKS cluster with outbound connectivity open and a user-assigned managed identity (UAMI). I tried to enable managed Prometheus for metrics collection, but the ama-metrics pod keeps failing. Can someone help? I followed this guide: https://learn.microsoft.com/en-us/azure/azure-monitor/containers/kubernetes-monitoring-enable?tabs=cli

prometheus-collector Error: configmap section not mounted, using defaults
addon-token-adapter 2025/02/09 00:21:37 handlers.go:57: received token request, handling...
addon-token-adapter 2025/02/09 00:23:22 utils.go:35: received event type MODIFIED
addon-token-adapter 2025/02/09 00:24:51 handlers.go:57: received token request, handling...
addon-token-adapter 2025/02/09 00:25:22 utils.go:35: received event type MODIFIED
addon-token-adapter 2025/02/09 00:27:22 utils.go:35: received event type MODIFIED
addon-token-adapter 2025/02/09 00:29:22 utils.go:35: received event type MODIFIED
prometheus-collector Health check failed: 503, Message: Metrics Extension is not running (configuration exists)
prometheus-collector Metrics Extension is not running (configuration exists)
prometheus-collector Health check failed: 503, Message: Metrics Extension is not running (configuration exists)
stream closed EOF for kube-system/ama-metrics-5bff7d784d-mccqf (prometheus-collector)
prometheus-collector TokenConfig.json does not exist
prometheus-collector azmon-container-start-time file exists, reading start time
prometheus-collector Container has been running for 0 minutes
prometheus-collector 2025-02-09T00:33:05 No configuration present for the AKS resource
prometheus-collector TokenConfig.json does not exist
prometheus-collector azmon-container-start-time file exists, reading start time
prometheus-collector Container has been running for 0 minutes
prometheus-collector 2025-02-09T00:33:20 No configuration present for the AKS resource
addon-token-adapter 2025/02/09 00:33:22 utils.go:35: received event type MODIFIED
prometheus-collector TokenConfig.json does not exist
prometheus-collector azmon-container-start-time file exists, reading start time
prometheus-collector Container has been running for 1 minutes
prometheus-collector TokenConfig.json does not exist
prometheus-collector azmon-container-start-time file exists, reading start time
prometheus-collector Container has been running for 1 minutes
prometheus-collector Container has been running for 1 minutes
prometheus-collector {"time":1739061266.590147,"filepath":"/opt/microsoft/linuxmonagent/mdsd.err","log":"2025-02-09T00:34:26.5900480Z: [/__w/1/s/external/WindowsAgent/src/shared/mcsmanager/lib/src/RefreshConfigurations.cpp:318,GetAgentCon
prometheus-collector Metrics Extension is not running (configuration exists)
prometheus-collector Health check failed: 503, Message: Metrics Extension is not running (configuration exists)
prometheus-collector Metrics Extension is not running (configuration exists)
prometheus-collector Health check failed: 503, Message: Metrics Extension is not running (configuration exists)
prometheus-collector Metrics Extension is not running (configuration exists)
prometheus-collector Health check failed: 503, Message: Metrics Extension is not running (configuration exists)
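
Output like this is easier to triage once repeated messages are grouped per container. The following is a minimal sketch, not part of the agent or its tooling: it assumes a raw terminal capture in which entries are separated by `││` border characters and padded with spaces, and that a physical line without a leading border continues the previous entry. The function names `split_entries` and `tally` are hypothetical, and the sample text reuses lines from the output above.

```python
from __future__ import annotations

import re
from collections import Counter

# Container names seen in the pasted output; anything else is grouped as "other".
CONTAINERS = ("prometheus-collector", "addon-token-adapter")

# Leading timestamps such as "2025/02/09 00:21:37 " or "2025-02-09T00:33:05 ".
TIMESTAMP = re.compile(r"^\d{4}[-/]\d{2}[-/]\d{2}[ T]\d{2}:\d{2}:\d{2}\s+")


def split_entries(raw: str) -> list[str]:
    """Split a pane capture into individual log entries."""
    # Join wrapped physical lines first, then split on the border characters.
    flat = " ".join(line.strip() for line in raw.splitlines())
    parts = (re.sub(r"\s+", " ", part).strip() for part in flat.split("││"))
    return [part for part in parts if part]


def tally(entries: list[str]) -> Counter:
    """Count identical messages per container, ignoring leading timestamps."""
    counts: Counter = Counter()
    for entry in entries:
        container, _, message = entry.partition(" ")
        if container not in CONTAINERS:
            container, message = "other", entry
        counts[(container, TIMESTAMP.sub("", message))] += 1
    return counts


SAMPLE = (
    "││ prometheus-collector Health check failed: 503, Message: \n"
    "Metrics Extension is not running (configuration exists)      "
    "││ addon-token-adapter 2025/02/09 00:23:22 utils.go:35: received event type MODIFIED      "
    "││ addon-token-adapter 2025/02/09 00:25:22 utils.go:35: received event type MODIFIED\n"
)

if __name__ == "__main__":
    for (container, message), count in tally(split_entries(SAMPLE)).most_common():
        print(f"{count:3d}  {container}: {message}")
```

Stripping the timestamp is a deliberate choice so that periodic messages collapse into one row; drop that step if the timing matters.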

I looked through https://learn.microsoft.com/en-us/azure/azure-monitor/containers/prometheus-metrics-troubleshoot for a solution, but it did not help. For example:

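  • Checked that the UAMI has the metrics publisher role (Monitoring Metrics Publisher)
  • The data collection endpoint (DCE) and data collection rule (DCR) are created, though I am not sure how they work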
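
The role check in the first item can be scripted rather than read off the portal. This is a minimal sketch under stated assumptions: it reads "metric publish role" as the built-in `Monitoring Metrics Publisher` role, it relies on the Azure CLI (`az role assignment list --scope ... --output json`) being installed and logged in, and the scope and principal ID passed in are placeholders that the issue does not specify. The helper names `has_role` and `list_assignments` are hypothetical.

```python
import json
import subprocess

# Assumption: the role meant in the issue is this built-in Azure role.
ROLE_NAME = "Monitoring Metrics Publisher"


def has_role(assignments, principal_id, role_name=ROLE_NAME):
    """Return True if the principal holds role_name in the given assignment list."""
    return any(
        a.get("principalId") == principal_id
        and a.get("roleDefinitionName") == role_name
        for a in assignments
    )


def list_assignments(scope):
    """Fetch role assignments at a scope via the Azure CLI (must be logged in)."""
    result = subprocess.run(
        ["az", "role", "assignment", "list", "--scope", scope, "--output", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    # Offline example with made-up IDs, shaped like the CLI's JSON output.
    sample = [
        {"principalId": "1111", "roleDefinitionName": "Reader"},
        {"principalId": "2222", "roleDefinitionName": ROLE_NAME},
    ]
    print(has_role(sample, "2222"))
```

To run it against a real resource, pass the result of `list_assignments(scope)` to `has_role` with the UAMI's principal (object) ID; which scope the assignment belongs on is not stated in this issue.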
