Skip to content

Conversation

@keszybz
Copy link
Member

@keszybz keszybz commented Jun 20, 2018

This allows the input stream to be recreated including any whitespace or parts that were after a zero byte and are not included in MESSAGE=. This replaces #9239, but is much smaller: no escaping is performed, only the _ORIGINAL_MESSAGE= field is added.

@boucman
Copy link
Contributor

boucman commented Jun 20, 2018

should it be ORIGINAL_MESSAGE instead of _ORIGINAL_MESSAGE ?

it's clearly a user-provided information...

assert(s);
assert(buf);
assert(buf_len > 0);
/* We know that buf is non-empty. It is also always NUL-terminated and can be used a string */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, these are weird calling conventions... passing a size AND requiring a NUL byte at the end is pretty uncommon. If we do this, then I think this deserves a bigger comment really, and it should clarify that the trailing NUL byte is there, but not counted in buf_len.

It might also be worth making these assumptions explicit, by adding assert(buf[buf_len] == 0 or so.

Also, I think the name "buf_len" is a bit misleading now, given that the buffer is one byte larger than buf_len now. Maybe rename this to "raw" and "raw_len" or so?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed buf_len to raw_len, and added a comment. This terminating byte allows strspn to be used, so I think it's worth keeping.

forward_syslog_raw(s, priority, buf, buf_len, ucred, tv);

syslog_skip_date(&msg);
syslog_parse_identifier((const char**)&msg, &identifier, &pid);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moving this up is unnecessary, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previous order was weird, but it had to be because the buffer was modified. Now it's not, so I just grouped all the related calls together.

iovec[n++] = IOVEC_MAKE_STRING(message);

if (store_original) {
const size_t hlen = strlen("_ORIGINAL_MESSAGE=");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strlen() → STRLEN()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

STRLEN is only useful to avoid VLAs, I don't think we need to use it otherwise.

@poettering
Copy link
Member

Hmm, it might be worth addressing #2398 at the same time, i.e. extend the condition under which ORIGINAL_MESSAGE is added to the case where the timestamp passed as part of the message doesn't match the kernel timestamp.

@poettering
Copy link
Member

should it be ORIGINAL_MESSAGE instead of _ORIGINAL_MESSAGE ?

it's clearly a user-provided information...

That's true, the underscore should not be there. Also, I think this should contain SYSLOG or so in some form, matching SYSLOG_FACILITY=, SYSLOG_IDENTIFIER_= and SYSLOG_PID=. Maybe rename it SYSLOG_RAW= or so?

Also, this should be documented in systemd.journal-fields(7) with a brief mention why and when it is added, underlining that it is not always there.

@poettering poettering added the reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks label Jun 20, 2018
@keszybz
Copy link
Member Author

keszybz commented Jun 20, 2018

Jun 20 09:24:21 autopkgtest systemd-udevd[1165]: Error resolving user 'usbmux': Connection timed out
Jun 20 09:24:46 autopkgtest systemd-udevd[1165]: Error resolving group 'render': Connection timed out

doesn't seem related, though it's suspicious that they all started failing now.

keszybz added 4 commits July 4, 2018 18:02
…eeded

If we can just use the original data, let's do that.
This allows the original stream to be recreated and/or verified. The new field
is written if any stripping was done or if the input message contained embeded
NULs.

$ printf '<13>Sep 15 15:07:58 HOST: x\0y' | nc -w1 -u -U /run/systemd/journal/dev-log

$ journalctl -o json-pretty ...
{
  ...
  "MESSAGE" : "x",
  "SYSLOG_RAW" : [ 60, 49, 51, 62, 83, 101, 112, 32, 49, 53, 32, 49, 53, 58, 48, 55, 58, 53, 56, 32, 72, 79, 83, 84, 58, 32, 120, 0, 121 ]
}

$ journalctl -o export ... | cat -v
...
MESSAGE=x
SYSLOG_RAW
^]^@^@^@^@^@^@^@<13>Sep 15 15:07:58 HOST: x^@y

This mostly fixes systemd#4863.
mktime returns -1 on error, so checking for != 0 is not useful.
This is useful if someone wants to recreate the original syslog datagram. We
already include timestamp information as _SOURCE_REALTIME_TIMESTAMP=, and in
normal use that timestamp, converted back to the form used by syslog
(Mth dd HH:MM:SS) would usually give the value. But there are various
circumstances where this might not be true. Most obviously, if the datagram is
sent a bit later after being prepared, the time is rounded to the nearest
second, and it might be off. This is especially bad around New Year when the
syslog timestamp wraps around. Then the same timezone and locale need to be
used to recreate the original timestamp. In the end doing this reliably is
complicated, and it seems much easier to just unconditionally include the
original timestamp.

If the original timestamp cannot be located, we store the full log line.
This way, it should be always possible to recreate the original input.

Example:
MESSAGE=x
SYSLOG_TIMESTAMP=Sep 15 15:07:58
SYSLOG_RAW
^]^@^@^@^@^@^@^@<13>Sep 15 15:07:58 HOST: x^@y
_PID=3318
_SOURCE_REALTIME_TIMESTAMP=1530743976393553

Fixes systemd#2398.
@keszybz keszybz force-pushed the journald-exact2 branch from f6e53b4 to 7c30c3c Compare July 4, 2018 22:43
@keszybz
Copy link
Member Author

keszybz commented Jul 4, 2018

Updated. The new field is now called SYSLOG_RAW=, and is documented. I also added SYSLOG_TIMESTAMP= handle #2398. I first tried parsing the timestamp and including the SYSLOG_RAW= field if the parsed time is different than the (truncated) _SOURCE_REALTIME_TIMESTAMP= time, but this was ugly and complicated. In the end unconditionally including the timestamp in a separate field seems like a much nicer option.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Development

Successfully merging this pull request may close these issues.

4 participants