add multiline log support for s3 logs #111

ericmustin · 2019-05-02T15:07:57Z

What does this PR do?

Adds optional multiline log support for S3 logs, driven by a regex pattern (DD_MULTILINE_LOG_REGEX_PATTERN).

Motivation

S3 log buckets that contain multiline logs can now be supported, similar to how log_processing_rules work in other parts of the Datadog ecosystem. I have seen a few gists/hacks floating around that modify the Lambda to handle multiline logs. However, given the KMS support this Lambda offers and that it is regularly updated and maintained, I figured it would make more sense to add an optional configuration to this Lambda rather than support an entirely different multiline log Lambda.

Additional Notes

I don't believe it would be possible to handle multiline logs from CloudWatch with this approach, but if I am mistaken, I would be happy to try to generalize this to other sources (such as CloudWatch) if there is consensus that it is achievable.

NBParis

Looks good to me. Thanks for putting this together @ericmustin.

I'll let @DataDog/logs-intake do the final review before merging.

ajacquemot · 2019-05-07T13:19:08Z

aws/logs_monitoring/lambda_function.py

@@ -277,8 +281,14 @@ def s3_handler(event, context, metadata):
            )
            yield structured_line
    else:
+        # Check if using multiline log regex pattern
+        if DD_MULTILINE_LOG_REGEX_PATTERN:
+            split_data = re.compile("(?<!^)\s+(?=%s)(?!.\s)" % (DD_MULTILINE_LOG_REGEX_PATTERN,)).split(data)


What happens when there is no match? I.e., when there are two different kinds of input, a pattern-separated one and a line-separated one?

ericmustin · 2019-05-08

I see your point: in the case where there are both types of S3 logs and the format varies, the line-separated kind would not get split out into separate log lines. It assumes each log entry, single or multiline, begins with the same pattern at the start of the line, i.e. (?<!^)\s+(?=<insert_pattern>)

Open to suggestions on how to approach this. I pushed up a somewhat hacky attempt to solve this (I think it is a solution, at least), which checks, when a multiline_regex env var is supplied, whether there are any matches for that pattern; if there aren't, it assumes the data is line-separated and calls .splitlines() as usual. My understanding is that every event passed into s3_handler() would contain only one type, either pattern-separated or line-separated, but if that is not the case please let me know.

Also open to scrapping this if we feel like it opens up a huge rabbit hole.
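For readers following along, the fallback described in this reply can be sketched roughly as below. This is a minimal illustration, not the code merged in the PR: the function name split_s3_logs and the sample date pattern are hypothetical, and only the splitting regex is copied from the diff above.

```python
import re


def split_s3_logs(data, multiline_pattern=None):
    """Split raw S3 object text into log entries (hypothetical helper)."""
    if not multiline_pattern:
        # No pattern configured: keep the usual one-log-per-line behaviour.
        return data.splitlines()

    # Splitting regex from the diff: break on whitespace that precedes the
    # pattern, but never at the very start of the data.
    splitter = re.compile(r"(?<!^)\s+(?={})(?!.\s)".format(multiline_pattern))
    starts_with_pattern = re.compile(multiline_pattern).match(data)

    entries = splitter.split(data)
    # Nothing was split and the data does not begin with the pattern:
    # assume line-separated logs and fall back to splitlines().
    if len(entries) <= 1 and not starts_with_pattern:
        return data.splitlines()
    return entries


# Example pattern (an assumption): each entry starts with an ISO date.
date_pattern = r"\d{4}-\d{2}-\d{2}"
sample = "2019-05-02 ERROR boom\n  Traceback line 1\n  line 2\n2019-05-02 INFO ok"
for entry in split_s3_logs(sample, date_pattern):
    print(repr(entry))
```

With the sample above, the stack trace lines stay attached to the first entry, while input that never matches the pattern is split line by line. Note that a pattern containing capturing groups would add the captured text to the result of split(), so a non-capturing pattern is the safer choice in a sketch like this.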

ajacquemot

Looks good, but I just have one concern, see https://github.com/DataDog/datadog-serverless-functions/pull/111/files#r281624640

ajacquemot · 2019-05-13T12:12:21Z

aws/logs_monitoring/lambda_function.py

+            # handle case where lambda is passed both line and pattern separated logs
+            # if there is only a single log and that log does not start with multiline regex pattern
+            # assume that these are line separated logs, not pattern separated
+            if len(split_data) <= 1 and not multiline_regex_start_pattern.match(data):


I would probably remove this check and change this check https://github.com/DataDog/datadog-serverless-functions/pull/111/files#diff-618731ddf1c446ef45d09613cb4b189bR287 to:

if DD_MULTILINE_LOG_REGEX_PATTERN and multiline_regex_start_pattern.match(data):

Also, I think we should change multiline_regex_start_pattern a bit to make sure it matches from the beginning, i.e.:

multiline_regex = re.compile("^{}".format(DD_MULTILINE_LOG_REGEX_PATTERN))

ericmustin · 2019-05-13

Thanks, updated this to clean up the nested if statements. FWIW, I think .match implicitly ensures the pattern matches from the beginning of the string, but I agree it's good to make this explicit in the regular expression, so I updated it to reflect that.
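The point about .match can be checked with a short standalone snippet. The date pattern and sample strings here are made up for illustration; only the "^{}".format(...) construction comes from the review comment.

```python
import re

# Example pattern and data (assumptions, not taken from the PR).
pattern = r"\d{4}-\d{2}-\d{2}"
line_with_late_date = "INFO started 2019-05-02"
line_with_leading_date = "2019-05-02 INFO started"

implicit = re.compile(pattern)                # no anchor in the regex
explicit = re.compile("^{}".format(pattern))  # anchor spelled out

# match() only tries position 0, so both forms reject a date mid-string.
print(implicit.match(line_with_late_date))
print(explicit.match(line_with_late_date))

# search() scans the whole string, so only the explicit "^" keeps it anchored.
print(implicit.search(line_with_late_date))
print(explicit.search(line_with_late_date))

# Both forms accept data that begins with the pattern.
print(implicit.match(line_with_leading_date))
print(explicit.match(line_with_leading_date))
```

So the explicit anchor changes nothing for .match, but it protects against a later switch to .search. One caveat when prepending "^" this way: if the supplied pattern contains a top-level alternation such as a|b, the anchor binds only to the first alternative unless the pattern is wrapped in a non-capturing group.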

ajacquemot

Did not test this functionally but if this is working fine, let's 🚢 it.

add multiline log support for s3 logs

005a8e3

NBParis requested a review from a team May 7, 2019 12:04

NBParis approved these changes May 7, 2019

ajacquemot reviewed May 7, 2019

aws/logs_monitoring/lambda_function.py Outdated

ajacquemot reviewed May 7, 2019

ericmustin added 2 commits May 8, 2019 14:27

string formatting, only compile regex once

86d3bb4

handle case where there is both line and pattern separated logs

babd040

ajacquemot reviewed May 13, 2019

clean up nested if statements

cd3467f

ajacquemot approved these changes May 13, 2019

NBParis merged commit 35448ec into DataDog:master May 13, 2019

ericmustin deleted the add_multiline_log_support branch May 13, 2019 17:29

ericmustin mentioned this pull request May 16, 2019

default assign None to DD_MULTILINE_LOG_REGEX_PATTERN to avoid exception #115

Merged
