
engine: expose internal logging call counts as internal metrics #10326

Open
wants to merge 1 commit into
base: master

alecholmes

This PR adds a new v2 runtime metric that exposes the number of logger calls by message type (severity). A fluent-bit process that consistently logs errors can indicate significant configuration or infrastructure problems. A common pattern for observing failures across many instances of software is to expose them as metric counters that can then be monitored and alerted on.

The implementation piggybacks on the fact that the logging library in src/flb_log.c already extracts a worker context from the current thread.
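To illustrate the idea of counting through a per-thread worker context, here is a minimal sketch. It is not the code from this PR: `worker_ctx`, `log_metrics`, `log_count`, and the `SEV_*` names are hypothetical stand-ins, and the real change uses cmetrics rather than raw atomics.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical severity levels, named after the labels in the PR output. */
enum log_severity {
    SEV_ERROR = 0,
    SEV_WARN,
    SEV_INFO,
    SEV_DEBUG,
    SEV_TRACE,
    SEV_HELP,
    SEV_COUNT
};

/* Hypothetical metrics block: one monotonically increasing counter per severity. */
struct log_metrics {
    atomic_uint_fast64_t logs_total[SEV_COUNT];
};

/* Hypothetical worker context; several worker threads may share one metrics block. */
struct worker_ctx {
    struct log_metrics *metrics;
};

/* Each thread remembers its own worker context (NULL until one is set). */
static _Thread_local struct worker_ctx *current_worker = NULL;

static void log_metrics_init(struct log_metrics *m)
{
    int i;

    assert(m != NULL);
    for (i = 0; i < SEV_COUNT; i++) {
        atomic_init(&m->logs_total[i], 0);
    }
}

static void worker_set(struct worker_ctx *w)
{
    current_worker = w;
}

/* Called from the logging path: bump the counter for this severity, if possible. */
static void log_count(enum log_severity sev)
{
    struct worker_ctx *w = current_worker;

    if (w == NULL || w->metrics == NULL) {
        return; /* no context on this thread yet: skip counting, never fail the log call */
    }
    if ((int) sev < 0 || (int) sev >= SEV_COUNT) {
        return; /* unknown severity: ignore */
    }
    atomic_fetch_add(&w->metrics->logs_total[sev], 1);
}

static uint64_t log_count_get(struct log_metrics *m, enum log_severity sev)
{
    return (uint64_t) atomic_load(&m->logs_total[sev]);
}
```

A caller would initialize a `log_metrics` block with `log_metrics_init`, attach it to the thread with `worker_set`, and then call `log_count(SEV_ERROR)` wherever an error is logged. Counting is skipped silently when no context is available, so the metric can never break the logging path itself.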

Here is example output from curling a fluent-bit instance with the service http_server enabled:

> curl localhost:5432/api/v2/metrics/prometheus 2>&1 | grep logger

fluentbit_logger_logs_total{severity="error"} 2
fluentbit_logger_logs_total{severity="warn"} 0
fluentbit_logger_logs_total{severity="info"} 10
fluentbit_logger_logs_total{severity="debug"} 0
fluentbit_logger_logs_total{severity="trace"} 0
fluentbit_logger_logs_total{severity="help"} 0
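The lines above are sample lines in the Prometheus text format, one per severity label. As a rough sketch of how such lines can be produced from an array of counters (the function `render_logger_metrics` and the label table are hypothetical, and this is not how cmetrics encodes its output):

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Label values in the same order as the output shown in this PR. */
static const char *const severity_labels[] = {
    "error", "warn", "info", "debug", "trace", "help"
};

#define SEVERITY_COUNT (sizeof(severity_labels) / sizeof(severity_labels[0]))

/*
 * Hypothetical helper: writes one sample line per severity into buf.
 * counts must hold SEVERITY_COUNT entries.
 * Returns the number of bytes written (excluding the terminating NUL),
 * or -1 if buf is too small to hold all lines.
 */
static int render_logger_metrics(const uint64_t *counts, char *buf, size_t size)
{
    size_t off = 0;
    size_t i;

    assert(counts != NULL && buf != NULL);

    for (i = 0; i < SEVERITY_COUNT; i++) {
        int n = snprintf(buf + off, size - off,
                         "fluentbit_logger_logs_total{severity=\"%s\"} %" PRIu64 "\n",
                         severity_labels[i], counts[i]);
        if (n < 0 || (size_t) n >= size - off) {
            return -1; /* encoding error or output truncated */
        }
        off += (size_t) n;
    }
    return (int) off;
}
```

The sketch emits only the sample lines that the `grep logger` filter in the curl command would match on the metric name. A full exposition would normally also carry `# HELP` and `# TYPE` comment lines for the metric.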

Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

  • [N/A] Example configuration file for the change
  • Debug log output from testing the change (see example from curling above)
  • Attached Valgrind output that shows no leaks or memory corruption was found
> valgrind -s bin/flb-rt-core_internal_logger

SUCCESS: All unit tests have passed.
==118424==
==118424== HEAP SUMMARY:
==118424==     in use at exit: 0 bytes in 0 blocks
==118424==   total heap usage: 2,098 allocs, 2,098 frees, 720,751 bytes allocated
==118424==
==118424== All heap blocks were freed -- no leaks are possible
==118424==
==118424== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • [N/A] Run local packaging test showing all targets (including any new ones) build.
  • [N/A] Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • Documentation required for this feature (I will create a docs PR to update the metric name table once this PR is approved)

Backporting

  • [N/A] Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

@@ -33,7 +33,7 @@
#include <stdarg.h>
#include <ctype.h>

static flb_sds_t sds_alloc(size_t size)
Author

This collides with the definition in cfl/cfl.h. This caused problems once I added the cmetrics import to flb_log.c.

return NULL;
}

ret_ctx->u = flb_upstream_create(ret_ctx->config, "127.0.0.1", 2020, 0, NULL);
Author

Is there any pattern or prior art for picking random free ports in tests?
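For context on the question, one general POSIX technique (not an established pattern in the Fluent Bit test suite, as far as this thread shows) is to bind a socket to port 0 and ask the kernel which port it assigned. The helper name `pick_free_port` is hypothetical:

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical test helper: asks the kernel for a currently unused TCP port
 * on 127.0.0.1 by binding to port 0 and reading the assigned port back.
 * Returns the port in host byte order, or -1 on failure.
 */
static int pick_free_port(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int port;
    int fd;

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        return -1;
    }

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0); /* 0 lets the kernel choose an ephemeral port */

    if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) != 0 ||
        getsockname(fd, (struct sockaddr *) &addr, &len) != 0) {
        close(fd);
        return -1;
    }

    assert(addr.sin_family == AF_INET);
    port = (int) ntohs(addr.sin_port);

    /* Closing releases the port; another process could claim it before reuse. */
    close(fd);
    return port;
}
```

The returned port could then replace the hard-coded 2020 in both the HTTP server configuration and the `flb_upstream_create` call. The approach is inherently racy, because the port is released before the server binds it, so it reduces collisions between parallel test runs rather than eliminating them.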

Internal logger calls increment a new v2 metric exposed by the HTTP server
Prometheus scrape endpoint. There is one time series per log message type.

Signed-off-by: Alec Holmes <[email protected]>
@alecholmes
Author

I'm not convinced this PR introduced the fuzzer failures since I'm able to repro them on master.

The signv4_fuzzer failure, for example, seems to have been introduced at some point between 352bb31 and 6899dc1 -- it's hard to bisect because the commits in the middle of that range do not compile.
