-
-
Notifications
You must be signed in to change notification settings - Fork 4.1k
Store a copy of the input message if any stripping or truncation occurs #9346
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
should it be ORIGINAL_MESSAGE instead of _ORIGINAL_MESSAGE ? it's clearly a user-provided information... |
src/journal/journald-syslog.c
Outdated
| assert(s); | ||
| assert(buf); | ||
| assert(buf_len > 0); | ||
| /* We know that buf is non-empty. It is also always NUL-terminated and can be used a string */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, these are weird calling conventions... passing a size AND requiring a NUL byte at the end is pretty uncommon. If we do this, then I think this deserves a bigger comment really, and it should clarify that the trailing NUL byte is there, but not counted in buf_len.
It might also be worth making these assumptions explicit, by adding assert(buf[buf_len] == 0 or so.
Also, I think the name "buf_len" is a bit misleading now, given that the buffer is one byte larger than buf_len now. Maybe rename this to "raw" and "raw_len" or so?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed buf_len to raw_len, and added a comment. This terminating byte allows strspn to be used, so I think it's worth keeping.
| forward_syslog_raw(s, priority, buf, buf_len, ucred, tv); | ||
|
|
||
| syslog_skip_date(&msg); | ||
| syslog_parse_identifier((const char**)&msg, &identifier, &pid); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moving this up is unnecessary, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previous order was weird, but it had to be because the buffer was modified. Now it's not, so I just grouped all the related calls together.
src/journal/journald-syslog.c
Outdated
| iovec[n++] = IOVEC_MAKE_STRING(message); | ||
|
|
||
| if (store_original) { | ||
| const size_t hlen = strlen("_ORIGINAL_MESSAGE="); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
strlen() → STRLEN()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
STRLEN is only useful to avoid VLAs, I don't think we need to use it otherwise.
|
Hmm, it might be worth addressing #2398 at the same time, i.e. extend the condition under which ORIGINAL_MESSAGE is added to the case where the timestamp passed as part of the message doesn't match the kernel timestamp. |
That's true, the underscore should not be there. Also, I think this should contain SYSLOG or so in some form, matching SYSLOG_FACILITY=, SYSLOG_IDENTIFIER_= and SYSLOG_PID=. Maybe rename it SYSLOG_RAW= or so? Also, this should be documented in systemd.journal-fields(7) with a brief mention why and when it is added, underlining that it is not always there. |
doesn't seem related, though it's suspicious that they all started failing now. |
…eeded If we can just use the original data, let's do that.
This allows the original stream to be recreated and/or verified. The new field
is written if any stripping was done or if the input message contained embeded
NULs.
$ printf '<13>Sep 15 15:07:58 HOST: x\0y' | nc -w1 -u -U /run/systemd/journal/dev-log
$ journalctl -o json-pretty ...
{
...
"MESSAGE" : "x",
"SYSLOG_RAW" : [ 60, 49, 51, 62, 83, 101, 112, 32, 49, 53, 32, 49, 53, 58, 48, 55, 58, 53, 56, 32, 72, 79, 83, 84, 58, 32, 120, 0, 121 ]
}
$ journalctl -o export ... | cat -v
...
MESSAGE=x
SYSLOG_RAW
^]^@^@^@^@^@^@^@<13>Sep 15 15:07:58 HOST: x^@y
This mostly fixes systemd#4863.
mktime returns -1 on error, so checking for != 0 is not useful.
This is useful if someone wants to recreate the original syslog datagram. We already include timestamp information as _SOURCE_REALTIME_TIMESTAMP=, and in normal use that timestamp, converted back to the form used by syslog (Mth dd HH:MM:SS) would usually give the value. But there are various circumstances where this might not be true. Most obviously, if the datagram is sent a bit later after being prepared, the time is rounded to the nearest second, and it might be off. This is especially bad around New Year when the syslog timestamp wraps around. Then the same timezone and locale need to be used to recreate the original timestamp. In the end doing this reliably is complicated, and it seems much easier to just unconditionally include the original timestamp. If the original timestamp cannot be located, we store the full log line. This way, it should be always possible to recreate the original input. Example: MESSAGE=x SYSLOG_TIMESTAMP=Sep 15 15:07:58 SYSLOG_RAW ^]^@^@^@^@^@^@^@<13>Sep 15 15:07:58 HOST: x^@y _PID=3318 _SOURCE_REALTIME_TIMESTAMP=1530743976393553 Fixes systemd#2398.
|
Updated. The new field is now called SYSLOG_RAW=, and is documented. I also added SYSLOG_TIMESTAMP= handle #2398. I first tried parsing the timestamp and including the SYSLOG_RAW= field if the parsed time is different than the (truncated) _SOURCE_REALTIME_TIMESTAMP= time, but this was ugly and complicated. In the end unconditionally including the timestamp in a separate field seems like a much nicer option. |
This allows the input stream to be recreated including any whitespace or parts that were after a zero byte and are not included in MESSAGE=. This replaces #9239, but is much smaller: no escaping is performed, only the _ORIGINAL_MESSAGE= field is added.