Avoid creating archive status ".ready" files too early
author Alvaro Herrera <[email protected]>
Mon, 23 Aug 2021 19:50:35 +0000 (15:50 -0400)
committer Alvaro Herrera <[email protected]>
Mon, 23 Aug 2021 19:50:35 +0000 (15:50 -0400)
commit 515e3d84a0b58b58eb30194209d2bc47ed349f5b
tree 5d4fdc0356ca527de988db1fc440393fdec23e2e
parent f7bda63a487c542949c8150de8e63bc728e5e31e

WAL records may span multiple segments, but XLogWrite() does not
wait for the entire record to be written out to disk before
creating archive status files.  Instead, as soon as the last WAL page of
the segment is written, the archive status file is created, and the
archiver may process it.  If PostgreSQL crashes before it is able to
write and flush the rest of the record (in the next WAL segment), the
wrong version of the first segment file lingers in the archive, which
causes operations such as point-in-time restores to fail.

To fix this, keep track of records that span across segments and ensure
that segments are only marked ready-for-archival once such records have
been completely written to disk.
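
The rule described above can be illustrated with a minimal, self-contained
sketch. This is not the code from this commit: every name below (wal_pos,
seg_no, SEG_SIZE, boundary_tracker, register_record, segment_ready) is
hypothetical, the 16 MB segment size is an assumption, and the tracker
remembers only a single pending cross-segment record for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not PostgreSQL's definitions. */
typedef uint64_t wal_pos;   /* byte position in the WAL stream */
typedef uint64_t seg_no;    /* WAL segment number */

#define SEG_SIZE ((wal_pos) 16 * 1024 * 1024)   /* assumed segment size */

/* Segment that contains byte position pos. */
static seg_no seg_of(wal_pos pos)
{
    return pos / SEG_SIZE;
}

/* A record occupying [start, end) spans segments when its last byte
 * lies in a later segment than its first byte. */
static bool record_spans_segments(wal_pos start, wal_pos end)
{
    assert(end > start);
    return seg_of(start) != seg_of(end - 1);
}

/* Remembers one record that crosses a segment boundary. */
typedef struct
{
    bool    pending;     /* is a cross-segment record registered? */
    seg_no  first_seg;   /* segment in which that record starts */
    wal_pos record_end;  /* position just past the record's last byte */
} boundary_tracker;

/* Called when a record is inserted; keeps the first spanning record seen. */
static void register_record(boundary_tracker *t, wal_pos start, wal_pos end)
{
    if (!t->pending && record_spans_segments(start, end))
    {
        t->pending = true;
        t->first_seg = seg_of(start);
        t->record_end = end;
    }
}

/* May segment seg be marked ready-for-archival, given that WAL has been
 * flushed to disk up to (but not including) position flushed? */
static bool segment_ready(const boundary_tracker *t, seg_no seg, wal_pos flushed)
{
    /* The segment itself must be completely flushed. */
    if (flushed < (seg + 1) * SEG_SIZE)
        return false;

    /* A record that starts in or before seg and continues past its end
     * must also be completely flushed. */
    if (t->pending &&
        t->first_seg <= seg &&
        seg_of(t->record_end - 1) > seg &&
        flushed < t->record_end)
        return false;

    return true;
}
```

In this sketch, a record that starts 100 bytes before the end of segment 0
and ends 50 bytes into segment 1 keeps segment 0 from being reported ready
until the flush position reaches the end of the record, even though the last
page of segment 0 is already on disk.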

This has always been wrong, so backpatch all the way back.

Author: Nathan Bossart <[email protected]>
Reviewed-by: Kyotaro Horiguchi <[email protected]>
Reviewed-by: Ryo Matsumura <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Discussion: https://postgr.es/m/CBDDFA01-6E40-46BB-9F98-9340F4379505@amazon.com

src/backend/access/transam/timeline.c
src/backend/access/transam/xlog.c
src/backend/access/transam/xlogarchive.c
src/backend/postmaster/walwriter.c
src/backend/replication/walreceiver.c
src/include/access/xlog.h
src/include/access/xlogarchive.h
src/include/access/xlogdefs.h