git.postgresql.org Git - postgresql-pgindent.git/log

Short-circuit slice requests that are for more than the object's size.

substring(), and perhaps other callers, isn't careful to pass a
slice length that is no more than the datum's true size.  Since
toast_decompress_datum_slice's children will palloc the requested
slice length, this can waste memory.  Also, close study of the liblz4
documentation suggests that it is dependent on the caller to not ask
for more than the correct amount of decompressed data; this squares
with observed misbehavior with liblz4 1.8.3.  Avoid these problems
by switching to the normal full-decompression code path if the
slice request is >= datum's decompressed size.

Tom Lane and Dilip Kumar

Discussion: https://postgr.es/m/507597.1616370729@sss.pgh.pa.us

Mostly-cosmetic adjustments of TOAST-related macros.

The authors of bbe0a81db hadn't quite got the idea that macros named
like SOMETHING_4B_C were only meant for internal endianness-related
details in postgres.h.  Choose more legible names for macros that are
intended to be used elsewhere.  Rearrange postgres.h a bit to clarify
the separation between those internal macros and ones intended for
wider use.

Also, avoid using the term "rawsize" for true decompressed size;
we've used "extsize" for that, because "rawsize" generally denotes
total Datum size including header.  This choice seemed particularly
unfortunate in tests that were comparing one of these meanings to
the other.

This patch includes a couple of not-purely-cosmetic changes: be
sure that the shifts aligning compression methods are unsigned
(not critical today, but will be when compression method 2 exists),
and fix broken definition of VARATT_EXTERNAL_GET_COMPRESSION (now
VARATT_EXTERNAL_GET_COMPRESS_METHOD), whose callers worked only
accidentally.

Discussion: https://postgr.es/m/574197.1616428079@sss.pgh.pa.us

Remove useless configure probe for <lz4/lz4.h>.

This seems to have been just copied-and-pasted from some other
header checks. But our C code is entirely unprepared to support
such a header name, so it's only wasting cycles to look for it.
If we did need to support it, some #ifdefs would be required.

(A quick trawl at codesearch.debian.net finds some packages that
reference lz4/lz4.h; but they use *only* that spelling, and
appear to be intending to reference their own copy rather than
a system-level installation of liblz4. There's no evidence of
freestanding installations that require this spelling.)

Discussion: https://postgr.es/m/457962.1616362509@sss.pgh.pa.us

Error on invalid TOAST compression in CREATE or ALTER TABLE.

The previous coding treated an invalid compression method name as
equivalent to the default, which is certainly not right.

Justin Pryzby

Discussion: http://postgr.es/m/20210321235544 [email protected]

docs: Fix omissions related to configurable TOAST compression.

Previously, the default_toast_compression GUC was not documented,
and neither was pg_dump's new --no-toast-compression option.

Justin Pryzby and Robert Haas

Discussion: http://postgr.es/m/20210321235544 [email protected]

More code cleanup for configurable TOAST compression.

Remove unused macro. Fix confusion about whether a TOAST compression
method is identified by an OID or a char.

Justin Pryzby

Discussion: http://postgr.es/m/20210321235544 [email protected]

Fix concurrency issues with WAL segment recycling on Windows

This commit is mostly a revert of aaa3aed, that switched the routine
doing the internal renaming of recycled WAL segments to use on Windows a
combination of CreateHardLinkA() plus unlink() instead of rename().  As
reported by several users of Postgres 13, this is causing concurrency
issues when manipulating WAL segments, mostly in the shape of the
following error:
LOG:  could not rename file "pg_wal/000000XX000000YY000000ZZ":
Permission denied

This moves back to a logic where a single rename() (well, pgrename() for
Windows) is used.  This issue has proved to be hard to hit when I tested
it, facing it only once with an archive_command that was not able to do
its work, so it is environment-sensitive.  The reporters of this issue
have been able to confirm that the situation improved once we switched
back to a single rename().  In order to check things, I have provided to
the reporters a patched build based on 13.2 with aaa3aed reverted, to
test if the error goes away, and an unpatched build of 13.2 to test if
the error still showed up (just to make sure that I did not mess up my
build process).

Extra thanks to Fujii Masao for pointing out what looked like the
culprit commit, and to all the reporters for taking the time to test
what I have sent them.

Reported-by: Andrus, Guy Burgess, Yaroslav Pashinsky, Thomas Trenz
Reviewed-by: Tom Lane, Andres Freund
Discussion: https://postgr.es/m/3861ff1e-0923-7838-e826-094cc9bef737@hot.ee
Discussion: https://postgr.es/m/16874-c3eecd319e36a2bf@postgresql.org
Discussion: https://postgr.es/m/095ccf8d-7f58-d928-427c-b17ace23cae6@burgess.co.nz
Discussion: https://postgr.es/m/16927-67c570d968c99567%40postgresql.org
Discussion: https://postgr.es/m/[email protected]
Backpatch-through: 13

pgbench: Improve error-handling in \sleep command.

This commit improves pgbench \sleep command so that it handles
the following three cases more properly.

(1) When only one argument was specified in \sleep command and
    it's not a number, previously pgbench reported a confusing error
    message like "unrecognized time unit, must be us, ms or s".
    This commit fixes this so that more proper error message like
    "invalid sleep time, must be an integer" is reported.

(2) When two arguments were specified in \sleep command and
    the first argument was not a number, previously pgbench treated
    that argument as the sleep time 0. No error was reported in this
    case. This commit fixes this so that an error is thrown in this
    case.

(3) When a variable was specified as the first argument in \sleep
    command and the variable stored non-digit value, previously
    pgbench treated that argument as the sleep time 0. No error
    was reported in this case. This commit fixes this so that
    an error is thrown in this case.

Author: Kota Miyake
Reviewed-by: Hayato Kuroda, Alvaro Herrera, Fujii Masao
Discussion: https://postgr.es/m/23b254daf20cec4332a2d9168505dbc9@oss.nttdata.com

Make a test endure log_error_verbosity=verbose.

Back-patch to v13, which introduced the test code in question.

Fix new TAP test for 2PC transactions and PITRs on Windows

The test added by 595b9cb forgot that on Windows it is necessary to set
up pg_hba.conf (see PostgresNode::set_replication_conf) with a specific
entry or base backups fail.  Any node that requires to support
replication just needs to pass down allows_streaming at initialization.
This updates the test to do so.  Simplify things a bit while on it.

Per buildfarm member fairywren.  Any Windows hosts running this test
would have failed, and I have reproduced the problem as well.

Backpatch-through: 10

Simplify TAP tests of kerberos with expected log file contents

The TAP tests of kerberos rely on the logs generated by the backend to
check various connection scenarios.  In order to make sure that a given
test does not overlap with the log contents generated by a previous
test, the test suite relied on a logic with the logging collector and a
rotation of the log files to ensure the uniqueness of the log generated
with a wait phase.

Parsing the log contents for expected patterns is a problem that has
been solved in a simpler way by PostgresNode::issues_sql_like() where
the log file is truncated before checking for the contents generated,
with the backend sending its output to a log file given by pg_ctl
instead.  This commit switches the kerberos test suite to use such a
method, removing any wait phase and simplifying the whole logic,
resulting in less code.  If a failure happens in the tests, the contents
of the logs are still showed to the user at the moment of the failure
thanks to like(), so this has no impact on debugging capabilities.

I have bumped into this issue while reviewing a different patch set
aiming at extending the kerberos test suite to check for multiple
log patterns instead of one now.

Author: Michael Paquier
Reviewed-by: Stephen Frost, Bharath Rupireddy
Discussion: https://postgr.es/m/[email protected]

Fix timeline assignment in checkpoints with 2PC transactions

Any transactions found as still prepared by a checkpoint have their
state data read from the WAL records generated by PREPARE TRANSACTION
before being moved into their new location within pg_twophase/.  While
reading such records, the WAL reader uses the callback
read_local_xlog_page() to read a page, that is shared across various
parts of the system.  This callback, since 1148e22a, has introduced an
update of ThisTimeLineID when reading a record while in recovery, which
is potentially helpful in the context of cascading WAL senders.

This update of ThisTimeLineID interacts badly with the checkpointer if a
promotion happens while some 2PC data is read from its record, as, by
changing ThisTimeLineID, any follow-up WAL records would be written to
an timeline older than the promoted one.  This results in consistency
issues.  For instance, a subsequent server restart would cause a failure
in finding a valid checkpoint record, resulting in a PANIC, for
instance.

This commit changes the code reading the 2PC data to reset the timeline
once the 2PC record has been read, to prevent messing up with the static
state of the checkpointer.  It would be tempting to do the same thing
directly in read_local_xlog_page().  However, based on the discussion
that has led to 1148e22a, users may rely on the updates of
ThisTimeLineID when a WAL record page is read in recovery, so changing
this callback could break some cases that are working currently.

A TAP test reproducing the issue is added, relying on a PITR to
precisely trigger a promotion with a prepared transaction still
tracked.

Per discussion with Heikki Linnakangas, Kyotaro Horiguchi, Fujii Masao
and myself.

Author: Soumyadeep Chakraborty, Jimmy Yih, Kevin Yeap
Discussion: https://postgr.es/m/CAE-ML+_EjH_fzfq1F3RJ1=XaaNG=-Jz-i3JqkNhXiLAsM3z-Ew@mail.gmail.com
Backpatch-through: 10

Fix assorted silliness in ATExecSetCompression().

It's not okay to scribble directly on a syscache entry.
Nor to continue accessing said entry after releasing it.

Also get rid of not-used local variables.

Per valgrind testing.

Recycle nbtree pages deleted during same VACUUM.

Maintain a simple array of metadata about pages that were deleted during
nbtree VACUUM's current btvacuumscan() call.  Use this metadata at the
end of btvacuumscan() to attempt to place newly deleted pages in the FSM
without further delay.  It might not yet be safe to place any of the
pages in the FSM by then (they may not be deemed recyclable), but we
have little to lose and plenty to gain by trying.  In practice there is
a very good chance that this will work out when vacuuming larger
indexes, where scanning the index naturally takes quite a while.

This commit doesn't change the page recycling invariants; it merely
improves the efficiency of page recycling within the confines of the
existing design.  Recycle safety is a part of nbtree's implementation of
what Lanin & Shasha call "the drain technique".  The design happens to
use transaction IDs (they're stored in deleted pages), but that in
itself doesn't align the cutoff for recycle safety to any of the
XID-based cutoffs used by VACUUM (e.g., OldestXmin).  All that matters
is whether or not _other_ backends might be able to observe various
inconsistencies in the tree structure (that they cannot just detect and
recover from by moving right).  Recycle safety is purely a question of
maintaining the consistency (or the apparent consistency) of a physical
data structure.

Note that running a simple serial test case involving a large range
DELETE followed by a VACUUM VERBOSE will probably show that any newly
deleted nbtree pages are not yet reusable/recyclable.  This is expected
in the absence of even one concurrent XID assignment.  It is an old
implementation restriction.  In practice it's unlikely to be the thing
that makes recycling remain unsafe, at least with larger indexes, where
recycling newly deleted pages during the same VACUUM actually matters.

An important high-level goal of this commit (as well as related recent
commits e5d8a999 and 9f3665fb) is to make expensive deferred cleanup
operations in index AMs rare in general.  If index vacuuming frequently
depends on the next VACUUM operation finishing off work that the current
operation started, then the general behavior of index vacuuming is hard
to predict.  This is relevant to ongoing work that adds a vacuumlazy.c
mechanism to skip index vacuuming in certain cases.  Anything that makes
the real world behavior of index vacuuming simpler and more linear will
also make top-down modeling in vacuumlazy.c more robust.

Author: Peter Geoghegan <[email protected]>
Reviewed-By: Masahiko Sawada <[email protected]>
Discussion: https://postgr.es/m/CAH2-Wzk76_P=67iUscb1UN44-gyZL-KgpsXbSxq_bdcMa7Q+wQ@mail.gmail.com

Bring configure support for LZ4 up to snuff.

It's not okay to just shove the pkg_config results right into our
build flags, for a couple different reasons:

* This fails to maintain the separation between CPPFLAGS and CFLAGS,
as well as that between LDFLAGS and LIBS.  (The CPPFLAGS angle is,
I believe, the reason for warning messages reported when building
with MacPorts' liblz4.)

* If pkg_config emits anything other than -I/-D/-L/-l switches,
it's highly unlikely that we want to absorb those.  That'd be more
likely to break the build than do anything helpful.  (Even the -D
case is questionable; but we're doing that for libxml2, so I kept it.)

Also, it's not okay to skip doing an AC_CHECK_LIB probe, as
evidenced by recent build failure on topminnow; that should
have been caught at configure time.

Model fixes for this on configure's libxml2 support.

It appears that somebody overlooked an autoheader run, too.

Discussion: https://postgr.es/m/20210119190720 [email protected]

Make compression.sql regression test independent of default.

This test will fail in "make installcheck" if the installation's
default_toast_compression setting is not 'pglz'. Make it robust
against that situation.

Dilip Kumar

Discussion: https://postgr.es/m/CAFiTN-t0w+Rc2U3S+y=7KWcLuOYNB5MfWeGdNa7+pg0UovVdcQ@mail.gmail.com

Don't run recover crash_temp_files test in Windows perl

This reverts commit 677271a3a125e294b33b891669f594a2c8cb36ce.
"Unbreak recovery test on Windows"

The test hangs on Windows, and attempts to remedy the problem have
proved fragile at best. So we simply disable the test on Windows perl.
(Msys perl seems perfectly happy).

Discussion: https://postgr.es/m/5b748470-7335-5439-e876-6a88c951e1c5@dunslane.net

Fix new memory leaks in libpq

My oversight in commit 9aa491abbf07.

Per coverity.

Unbreak recovery test on Windows

On Windows we need to send explicit quit messages to psql or the TAP tests
can hang.

Suppress various new compiler warnings.

Compilers that don't understand that elog(ERROR) doesn't return
issued warnings here. In the cases in libpq_pipeline.c, we were
not exactly helping things by failing to mark pg_fatal() as noreturn.

Per buildfarm.

Move lwlock-release probe back where it belongs

The documentation specifically states that lwlock-release fires before
any released waiters have been awakened. It worked that way until
ab5194e6f617a9a9e7aadb3dd1cee948a42d0755, where is seems to have been
misplaced accidentally. Move it back where it belongs.

Author: Craig Ringer <[email protected]>
Discussion: https://www.postgresql.org/message-id/CAGRY4nwxKUS_RvXFW-ugrZBYxPFFM5kjwKT5O+0+Stuga5b4+Q@mail.gmail.com

Use valid compression method in brin_form_tuple

When compressing the BRIN summary, we can't simply use the compression
method from the indexed attribute. The summary may use a different data
type, e.g. fixed-length attribute may have varlena summary, leading to
compression failures. For the built-in BRIN opclasses this happens to
work, because the summary uses the same data type as the attribute.

When the data types match, we can inherit use the compression method
specified for the attribute (it's copied into the index descriptor).
Otherwise we don't have much choice and have to use the default one.

Author: Tomas Vondra
Reviewed-by: Justin Pryzby <[email protected]>
Discussion: https://postgr.es/m/e0367f27-392c-321a-7411-a58e1a7e4817%40enterprisedb.com

Fix up pg_dump's handling of per-attribute compression options.

The approach used in commit bbe0a81db would've been disastrous for
portability of dumps. Instead handle non-default compression options
in separate ALTER TABLE commands. This reduces chatter for the
common case where most columns are compressed the same way, and it
makes it possible to restore the dump to a server that lacks any
knowledge of per-attribute compression options (so long as you're
willing to ignore syntax errors from the ALTER TABLE commands).

There's a whole lot left to do to mop up after bbe0a81db, but
I'm fast-tracking this part because we need to see if it's
enough to make the buildfarm's cross-version-upgrade tests happy.

Justin Pryzby and Tom Lane

Discussion: https://postgr.es/m/20210119190720 [email protected]

Fix memory leak when rejecting bogus DH parameters.

While back-patching e0e569e1d, I noted that there were some other
places where we ought to be applying DH_free(); namely, where we
load some DH parameters from a file and then reject them as not
being sufficiently secure. While it seems really unlikely that
anybody would hit these code paths in production, let alone do
so repeatedly, let's fix it for consistency.

Back-patch to v10 where this code was introduced.

Discussion: https://postgr.es/m/16160-18367e56e9a28264@postgresql.org

Avoid leaking memory in RestoreGUCState(), and improve comments.

RestoreGUCState applied InitializeOneGUCOption to already-live
GUC entries, causing any malloc'd subsidiary data to be forgotten.
We do want the effect of resetting the GUC to its compiled-in
default, and InitializeOneGUCOption seems like the best way to do
that, so add code to free any existing subsidiary data beforehand.

The interaction between can_skip_gucvar, SerializeGUCState, and
RestoreGUCState is way more subtle than their opaque comments
would suggest to an unwary reader. Rewrite and enlarge the
comments to try to make it clearer what's happening.

Remove a long-obsolete assertion in read_nondefault_variables: the
behavior of set_config_option hasn't depended on IsInitProcessingMode
since f5d9698a8 installed a better way of controlling it.

Although this is fixing a clear memory leak, the leak is quite unlikely
to involve any large amount of data, and it can only happen once in the
lifetime of a worker process. So it seems unnecessary to take any
risk of back-patching.

Discussion: https://postgr.es/m/4105247.1616174862@sss.pgh.pa.us

Provide recovery_init_sync_method=syncfs.

Since commit 2ce439f3 we have opened every file in the data directory
and called fsync() at the start of crash recovery. This can be very
slow if there are many files, leading to field complaints of systems
taking minutes or even hours to begin crash recovery.

Provide an alternative method, for Linux only, where we call syncfs() on
every possibly different filesystem under the data directory. This is
equivalent, but avoids faulting in potentially many inodes from
potentially slow storage.

The new mode comes with some caveats, described in the documentation, so
the default value for the new setting is "fsync", preserving the older
behavior.

Reported-by: Michael Brown <[email protected]>
Reviewed-by: Fujii Masao <[email protected]>
Reviewed-by: Paul Guo <[email protected]>
Reviewed-by: Bruce Momjian <[email protected]>
Reviewed-by: Justin Pryzby <[email protected]>
Reviewed-by: David Steele <[email protected]>
Discussion: https://postgr.es/m/11bc2bb7-ecb5-3ad0-b39f-df632734cd81%40discourse.org
Discussion: https://postgr.es/m/CAEET0ZHGnbXmi8yF3ywsDZvb3m9CbdsGZgfTXscQ6agcbzcZAw%40mail.gmail.com

Use lfirst_int in cmp_list_len_contents_asc

The function added in be45be9c33 is comparing integer lists (IntList) by
length and contents, but there were two bugs.  Firstly, it used intVal()
to extract the value, but that's for Value nodes, not for extracting int
values from IntList.  Secondly, it called it directly on the ListCell,
without doing lfirst().  So just do lfirst_int() instead.

Interestingly enough, this did not cause any crashes on the buildfarm,
but valgrind rightfully complained about it.

Discussion: https://postgr.es/m/bf3805a8-d7d1-ae61-fece-761b7ff41ecc@postgresfriends.org

Fix use-after-ReleaseSysCache problem in ATExecAlterColumnType.

Introduced by commit bbe0a81db69bd10bd166907c3701492a29aca294.

Per buildfarm member prion.

Allow configurable LZ4 TOAST compression.

There is now a per-column COMPRESSION option which can be set to pglz
(the default, and the only option in up until now) or lz4. Or, if you
like, you can set the new default_toast_compression GUC to lz4, and
then that will be the default for new table columns for which no value
is specified. We don't have lz4 support in the PostgreSQL code, so
to use lz4 compression, PostgreSQL must be built --with-lz4.

In general, TOAST compression means compression of individual column
values, not the whole tuple, and those values can either be compressed
inline within the tuple or compressed and then stored externally in
the TOAST table, so those properties also apply to this feature.

Prior to this commit, a TOAST pointer has two unused bits as part of
the va_extsize field, and a compessed datum has two unused bits as
part of the va_rawsize field. These bits are unused because the length
of a varlena is limited to 1GB; we now use them to indicate the
compression type that was used. This means we only have bit space for
2 more built-in compresison types, but we could work around that
problem, if necessary, by introducing a new vartag_external value for
any further types we end up wanting to add. Hopefully, it won't be
too important to offer a wide selection of algorithms here, since
each one we add not only takes more coding but also adds a build
dependency for every packager. Nevertheless, it seems worth doing
at least this much, because LZ4 gets better compression than PGLZ
with less CPU usage.

It's possible for LZ4-compressed datums to leak into composite type
values stored on disk, just as it is for PGLZ. It's also possible for
LZ4-compressed attributes to be copied into a different table via SQL
commands such as CREATE TABLE AS or INSERT .. SELECT.  It would be
expensive to force such values to be decompressed, so PostgreSQL has
never done so. For the same reasons, we also don't force recompression
of already-compressed values even if the target table prefers a
different compression method than was used for the source data.  These
architectural decisions are perhaps arguable but revisiting them is
well beyond the scope of what seemed possible to do as part of this
project.  However, it's relatively cheap to recompress as part of
VACUUM FULL or CLUSTER, so this commit adjusts those commands to do
so, if the configured compression method of the table happens not to
match what was used for some column value stored therein.

Dilip Kumar. The original patches on which this work was based were
written by Ildus Kurbangaliev, and those were patches were based on
even earlier work by Nikita Glukhov, but the design has since changed
very substantially, since allow a potentially large number of
compression methods that could be added and dropped on a running
system proved too problematic given some of the architectural issues
mentioned above; the choice of which specific compression method to
add first is now different; and a lot of the code has been heavily
refactored.  More recently, Justin Przyby helped quite a bit with
testing and reviewing and this version also includes some code
contributions from him. Other design input and review from Tomas
Vondra, Álvaro Herrera, Andres Freund, Oleg Bartunov, Alexander
Korotkov, and me.

Discussion: http://postgr.es/m/20170907194236.4cefce96%40wp.localdomain
Discussion: http://postgr.es/m/CAFiTN-uUpX3ck%3DK0mLEk-G_kUQY%3DSNOTeqdaNRR9FMdQrHKebw%40mail.gmail.com

Fix race condition in remove_temp_files_after_crash TAP test

The TAP test was written so that it was not waiting for the correct SQL
command, but for output from the preceding one. This resulted in race
conditions, allowing the commands to run in a different order, not block
as expected and so on. This fixes it by inverting the order of commands
where possible, so that observing the output guarantees the data was
inserted properly, and waiting for a lock to appear in pg_locks.

Discussion: https://postgr.es/m/CAH503wDKdYzyq7U-QJqGn%3DGm6XmoK%2B6_6xTJ-Yn5WSvoHLY1Ww%40mail.gmail.com

Blindly try to fix test script's tar invocation for MSYS.

Buildfarm member fairywren doesn't like the test case I added
in commit 081876d75. I'm guessing the reason is that I shouldn't
be using a perl2host-ified path in the tar command line.

Fix comments in postmaster.c.

Commit 86c23a6eb2 changed the option to specify that postgres will
stop all other server processes by sending the signal SIGSTOP,
from -s to -T. But previously there were comments incorrectly
explaining that SIGSTOP behavior is set by -s option. This commit
fixes them.

Author: Kyotaro Horiguchi
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/20210316.165141.1400441966284654043 [email protected]

Don't leak malloc'd error string in libpqrcv_check_conninfo().

We leaked the error report from PQconninfoParse, when there was
one. It seems unlikely that real usage patterns would repeat
the failure often enough to create serious bloat, but let's
back-patch anyway to keep the code similar in all branches.

Found via valgrind testing.
Back-patch to v10 where this code was added.

Discussion: https://postgr.es/m/3816764.1616104288@sss.pgh.pa.us

Don't leak malloc'd strings when a GUC setting is rejected.

Because guc.c prefers to keep all its string values in malloc'd
not palloc'd storage, it has to be more careful than usual to
avoid leaks. Error exits out of string GUC hook checks failed
to clear the proposed value string, and error exits out of
ProcessGUCArray() failed to clear the malloc'd results of
ParseLongOption().

Found via valgrind testing.
This problem is ancient, so back-patch to all supported branches.

Discussion: https://postgr.es/m/3816764.1616104288@sss.pgh.pa.us

Don't leak compiled regex(es) when an ispell cache entry is dropped.

The text search cache mechanisms assume that we can clean up
an invalidated dictionary cache entry simply by resetting the
associated long-lived memory context.  However, that does not work
for ispell affixes that make use of regular expressions, because
the regex library deals in plain old malloc.  Hence, we leaked
compiled regex(es) any time we dropped such a cache entry.  That
could quickly add up, since even a fairly trivial regex can use up
tens of kB, and a large one can eat megabytes.  Add a memory context
callback to ensure that a regex gets freed when its owning cache
entry is cleared.

Found via valgrind testing.
This problem is ancient, so back-patch to all supported branches.

Discussion: https://postgr.es/m/3816764.1616104288@sss.pgh.pa.us

Don't run RelationInitTableAccessMethod in a long-lived context.

Some code paths in this function perform syscache lookups, which
can lead to table accesses and possibly leakage of cruft into
the caller's context.  If said context is CacheMemoryContext,
we eventually will have visible bloat.  But fixing this is no
harder than moving one memory context switch step.  (The other
callers don't have a problem.)

Andres Freund and I independently found this via valgrind testing.
Back-patch to v12 where this code was added.

Discussion: https://postgr.es/m/20210317023101 [email protected]
Discussion: https://postgr.es/m/3816764.1616104288@sss.pgh.pa.us

Don't leak rd_statlist when a relcache entry is dropped.

Although these lists are usually NIL, and even when not empty
are unlikely to be large, constant relcache update traffic could
eventually result in visible bloat of CacheMemoryContext.

Found via valgrind testing.
Back-patch to v10 where this field was added.

Discussion: https://postgr.es/m/3816764.1616104288@sss.pgh.pa.us

Fix TAP test for remove_temp_files_after_crash

The test included in cd91de0d17 had two simple flaws.

Firstly, the number of rows was low and on some platforms (e.g. 32-bit)
the sort did not require on-disk sort, so on those machines it was not
testing the automatic removal.  The test was however failing, because
without any temporary files the base/pgsql_tmp directory was not even
created.  Fixed by increasing the rowcount to 5000, which should be high
engough on any platform.

Secondly, the test used a simple sleep to wait for the temporary file to
be created.  This is obviously problematic, because on slow machines (or
with valgrind, CLOBBER_CACHE_ALWAYS etc.) it may take a while to create
the temporary file.  But we also want the tests run reasonably fast.
Fixed by instead relying on a UNIQUE constraint, blocking the query that
created the temporary file.

Author: Euler Taveira
Reviewed-by: Tomas Vondra
Discussion: https://postgr.es/m/CAH503wDKdYzyq7U-QJqGn%3DGm6XmoK%2B6_6xTJ-Yn5WSvoHLY1Ww%40mail.gmail.com

Improve tab completion of IMPORT FOREIGN SCHEMA with \h in psql

Only "IMPORT" was showing as result of the completion, while IMPORT
FOREIGN SCHEMA is the only command using this keyword in first
position. This changes the completion to show the full command name
instead of just "IMPORT".

Reviewed-by: Georgios Kokolatos, Julien Rouhaud
Discussion: https://postgr.es/m/[email protected]

Fix misuse of foreach_delete_current().

Our coding convention requires this macro's result to be assigned
back to the original List variable. In this usage, since the
List could not become empty, there was no actual bug --- but
some compilers warned about it. Oversight in be45be9c3.

Discussion: https://postgr.es/m/35077b31-2d62-1e31-0e2e-ddb52d590b73@enterprisedb.com

Implement GROUP BY DISTINCT

With grouping sets, it's possible that some of the grouping sets are
duplicate.  This is especially common with CUBE and ROLLUP clauses. For
example GROUP BY CUBE (a,b), CUBE (b,c) is equivalent to

  GROUP BY GROUPING SETS (
    (a, b, c),
    (a, b, c),
    (a, b, c),
    (a, b),
    (a, b),
    (a, b),
    (a),
    (a),
    (a),
    (c, a),
    (c, a),
    (c, a),
    (c),
    (b, c),
    (b),
    ()
  )

Some of the grouping sets are calculated multiple times, which is mostly
unnecessary.  This commit implements a new GROUP BY DISTINCT feature, as
defined in the SQL standard, which eliminates the duplicate sets.

Author: Vik Fearing
Reviewed-by: Erik Rijkers, Georgios Kokolatos, Tomas Vondra
Discussion: https://postgr.es/m/bf3805a8-d7d1-ae61-fece-761b7ff41ecc@postgresfriends.org

Remove temporary files after backend crash

After a crash of a backend using temporary files, the files used to be
left behind, on the basis that it might be useful for debugging. But we
don't have any reports of anyone actually doing that, and it means the
disk usage may grow over time due to repeated backend failures (possibly
even hitting ENOSPC). So this behavior is a bit unfortunate, and fixing
it required either manual cleanup (deleting files, which is error-prone)
or restart of the instance (i.e. service disruption).

This implements automatic cleanup of temporary files, controled by a new
GUC remove_temp_files_after_crash. By default the files are removed, but
it can be disabled to restore the old behavior if needed.

Author: Euler Taveira
Reviewed-by: Tomas Vondra, Michael Paquier, Anastasia Lubennikova, Thomas Munro
Discussion: https://postgr.es/m/CAH503wDKdYzyq7U-QJqGn%3DGm6XmoK%2B6_6xTJ-Yn5WSvoHLY1Ww%40mail.gmail.com

Fix function name in error hint

pg_read_file() is the function that's in core, pg_file_read() is in
adminpack. But when using pg_file_read() in adminpack it calls the *C*
level function pg_read_file() in core, which probably threw the original
author off. But the error hint should be about the SQL function.

Reported-By: Sergei Kornilov
Backpatch-through: 11
Discussion: https://postgr.es/m/373021616060475@mail.yandex.ru

Doc: Update description for parallel insert reloption.

Commit c8f78b6161 added a new reloption to enable inserts in parallel-mode
but forgot to update at one of the places about the same in docs. In
passing, fix a typo in the same commit.

Reported-by: Justin Pryzby
Author: Justin Pryzby
Reviewed-by: "Hou, Zhijie", Amit Kapila
Discussion: https://postgr.es/m/20210318025228 [email protected]

Add a new GUC and a reloption to enable inserts in parallel-mode.

Commit 05c8482f7f added the implementation of parallel SELECT for
"INSERT INTO ... SELECT ..." which may incur non-negligible overhead in
the additional parallel-safety checks that it performs, even when, in the
end, those checks determine that parallelism can't be used. This is
normally only ever a problem in the case of when the target table has a
large number of partitions.

A new GUC option "enable_parallel_insert" is added, to allow insert in
parallel-mode. The default is on.

In addition to the GUC option, the user may want a mechanism to allow
inserts in parallel-mode with finer granularity at table level. The new
table option "parallel_insert_enabled" allows this. The default is true.

Author: "Hou, Zhijie"
Reviewed-by: Greg Nancarrow, Amit Langote, Takayuki Tsunakawa, Amit Kapila
Discussion: https://postgr.es/m/CAA4eK1K-cW7svLC2D7DHoGHxdAdg3P37BLgebqBOC2ZLc9a6QQ%40mail.gmail.com
Discussion: https://postgr.es/m/CAJcOf-cXnB5cnMKqWEp2E2z7Mvcd04iLVmV=qpFJrR3AcrTS3g@mail.gmail.com

Fix memory lifetime issues of replication slot stats.

When accessing replication slot stats, introduced in 98681675002d,
pgstat_read_statsfiles() reads the data into newly allocated
memory. Unfortunately the current memory context at that point is the
callers, leading to leaks and use-after-free dangers.

The fix is trivial, explicitly use pgStatLocalContext. There's some
potential for further improvements, but that's outside of the scope of
this bugfix.

No backpatch necessary, feature is only in HEAD.

Author: Andres Freund <[email protected]>
Discussion: https://postgr.es/m/20210317230447 [email protected]

Doc: remove duplicated step in RLS example.

Seems to have been a copy-and-paste mistake in 093129c9d.
Per report from [email protected].

Discussion: https://postgr.es/m/161591740692.24273.4202054598867879464@wrigleys.postgresql.org

Code review for server's handling of "tablespace map" files.

While looking at Robert Foggia's report, I noticed a passel of
other issues in the same area:

* The scheme for backslash-quoting newlines in pathnames is just
wrong; it will misbehave if the last ordinary character in a pathname
is a backslash.  I'm not sure why we're bothering to allow newlines
in tablespace paths, but if we're going to do it we should do it
without introducing other problems.  Hence, backslashes themselves
have to be backslashed too.

* The author hadn't read the sscanf man page very carefully, because
this code would drop any leading whitespace from the path.  (I doubt
that a tablespace path with leading whitespace could happen in
practice; but if we're bothering to allow newlines in the path, it
sure seems like leading whitespace is little less implausible.)  Using
sscanf for the task of finding the first space is overkill anyway.

* While I'm not 100% sure what the rationale for escaping both \r and
\n is, if the idea is to allow Windows newlines in the file then this
code failed, because it'd throw an error if it saw \r followed by \n.

* There's no cross-check for an incomplete final line in the map file,
which would be a likely apparent symptom of the improper-escaping
bug.

On the generation end, aside from the escaping issue we have:

* If needtblspcmapfile is true then do_pg_start_backup will pass back
escaped strings in tablespaceinfo->path values, which no caller wants
or is prepared to deal with.  I'm not sure if there's a live bug from
that, but it looks like there might be (given the dubious assumption
that anyone actually has newlines in their tablespace paths).

* It's not being very paranoid about the possibility of random stuff
in the pg_tblspc directory.  IMO we should ignore anything without an
OID-like name.

The escaping rule change doesn't seem back-patchable: it'll require
doubling of backslashes in the tablespace_map file, which is basically
a basebackup format change.  The odds of that causing trouble are
considerably more than the odds of the existing bug causing trouble.
The rest of this seems somewhat unlikely to cause problems too,
so no back-patch.

Prevent buffer overrun in read_tablespace_map().

Robert Foggia of Trustwave reported that read_tablespace_map()
fails to prevent an overrun of its on-stack input buffer.
Since the tablespace map file is presumed trustworthy, this does
not seem like an interesting security vulnerability, but still
we should fix it just in the name of robustness.

While here, document that pg_basebackup's --tablespace-mapping option
doesn't work with tar-format output, because it doesn't. To make it
work, we'd have to modify the tablespace_map file within the tarball
sent by the server, which might be possible but I'm not volunteering.
(Less-painful solutions would require changing the basebackup protocol
so that the source server could adjust the map. That's not very
appetizing either.)

Add end-to-end testing of pg_basebackup's tar-format output.

The existing test script does run pg_basebackup with the -Ft option,
but it makes no real attempt to verify the sanity of the results.
We wouldn't know if the output is incompatible with standard "tar"
programs, nor if the server fails to start from the restored output.
Notably, this means that xlog.c's read_tablespace_map() is not being
meaningfully tested, since that code is used only in the tar-format
case. (We do have reasonable coverage of restoring from plain-format
output, though it's over in src/test/recovery not here.)

Hence, attempt to untar the output and start a server from it,
rather just hoping it's OK.

This test assumes that the local "tar" has the "-C directory"
switch. Although that's not promised by POSIX, my research
suggests that all non-extinct tar implementations have it.
Should the buildfarm's opinion differ, we can complicate the
test a bit to avoid requiring that.

Possibly this should be back-patched, but I'm unsure about
whether it could work on Windows before d66b23b03.

Doc: improve discussion of variable substitution in PL/pgSQL.

This was a bit disjointed, partly because of a not-well-considered
decision to document SQL commands that don't return result rows as
though they had nothing in common with commands that do.  Rearrange
so that we have one discussion of variable substitution that clearly
applies to all types of SQL commands, and then handle the question
of processing command output separately.  Clarify that EXPLAIN,
CREATE TABLE AS SELECT, and similar commands that incorporate an
optimizable statement will act like optimizable statements for the
purposes of variable substitution.  Do a bunch of minor wordsmithing
in the same area.

David Johnston and Tom Lane, reviewed by Pavel Stehule and David
Steele

Discussion: https://postgr.es/m/CAKFQuwYvMKucM5fnZvHSo-ah4S=_n9gmKeu6EAo=_fTrohunqQ@mail.gmail.com

Revert "Fix race in Parallel Hash Join batch cleanup."

This reverts commit 378802e3713c6c0fce31d2390c134cd5d7c30157.
This reverts commit 3b8981b6e1a2aea0f18384c803e21e9391de669a.

Discussion: https://postgr.es/m/CA%2BhUKGJmcqAE3MZeDCLLXa62cWM0AJbKmp2JrJYaJ86bz36LFA%40mail.gmail.com

Fix comment in indexing.c

578b229, that removed support for WITH OIDS, has changed
CatalogTupleInsert() to not return an Oid, but one comment was still
mentioning that.

Author: Vik Fearing
Discussion: https://postgr.es/m/fef01975-ed10-3601-7b9e-80ecef72d00b@postgresfriends.org

Small error message improvement

Update the names of Parallel Hash Join phases.

Commit 3048898e dropped -ING from some wait event names that correspond
to barrier phases.  Update the phases' names to match.

While we're here making cosmetic changes, also rename "DONE" to "FREE".
That pairs better with "ALLOCATE", and describes the activity that
actually happens in that phase (as we do for the other phases) rather
than describing a state.  The distinction is clearer after bugfix commit
3b8981b6 split the phase into two.  As for the growth barriers, rename
their "ALLOCATE" phase to "REALLOCATE", which is probably a better
description of what happens then.  Also improve the comments about
the phases a bit.

Discussion: https://postgr.es/m/CA%2BhUKG%2BMDpwF2Eo2LAvzd%3DpOh81wUTsrwU1uAwR-v6OGBB6%2B7g%40mail.gmail.com

Fix race in Parallel Hash Join batch cleanup.

With very unlucky timing and parallel_leader_participation off, PHJ
could attempt to access per-batch state just as it was being freed.
There was code intended to prevent that by checking for a cleared
pointer, but it was buggy.

Fix, by introducing an extra barrier phase. The new phase
PHJ_BUILD_RUNNING means that it's safe to access the per-batch state to
find a batch to help with, and PHJ_BUILD_DONE means that it is too late.
The last to detach will free the array of per-batch state as before, but
now it will also atomically advance the phase at the same time, so that
late attachers can avoid the hazard, without the data race. This
mirrors the way per-batch hash tables are freed (see phases
PHJ_BATCH_PROBING and PHJ_BATCH_DONE).

Revealed by a one-off build farm failure, where BarrierAttach() failed a
sanity check assertion, because the memory had been clobbered by
dsa_free().

Back-patch to 11, where the code arrived.

Reported-by: Michael Paquier <[email protected]>
Discussion: https://postgr.es/m/20200929061142.GA29096%40paquier.xyz

Fix transaction.sql tests in higher isolation levels.

It seems like a useful sanity check to be able to run "installcheck"
against a cluster running with default_transaction_level set to
serializable or repeatable read. Only one thing currently fails in
those configurations, so let's fix that.

No back-patch for now, because it fails in many other places in some of
the stable branches. We'd have to go back and fix those too if we
included this configuration in automated testing.

Reviewed-by: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/CA%2BhUKGJUaHeK%3DHLATxF1JOKDjKJVrBKA-zmbPAebOM0Se2FQRg%40mail.gmail.com

Fix race condition in drop subscription's handling of tablesync slots.

Commit ce0fdbfe97 made tablesync slots permanent and allow Drop
Subscription to drop such slots. However, it is possible that before
tablesync worker could get the acknowledgment of slot creation, drop
subscription stops it and that can lead to a dangling slot on the
publisher. Prevent cancel/die interrupts while creating a slot in the
tablesync worker.

Reported-by: Thomas Munro as per buildfarm
Author: Amit Kapila
Reviewed-by: Vignesh C, Takamichi Osumi
Discussion: https://postgr.es/m/CA+hUKGJG9dWpw1cOQ2nzWU8PHjm=PTraB+KgE5648K9nTfwvxg@mail.gmail.com

Doc: Add a description of substream in pg_subscription.

Commit 464824323e added a new column substream in pg_subscription but
forgot to update the docs.

Reported-by: Peter Smith
Author: Amit Kapila
Reviewed-by: Peter Smith
Discussion: https://postgr.es/m/CAHut+PuPGGASnh2Dy37VYODKULVQo-5oE=Shc6gwtRizDt==cA@mail.gmail.com

Enable parallelism in REFRESH MATERIALIZED VIEW.

Pass CURSOR_OPT_PARALLEL_OK to pg_plan_query() so that parallel plans
are considered when running the underlying SELECT query. This wasn't
done in commit e9baa5e9, which did this for CREATE MATERIALIZED VIEW,
because it wasn't yet known to be safe.

Since REFRESH always inserts into a freshly created table before later
merging or swapping the data into place with separate operations, we can
enable such plans here too.

Author: Bharath Rupireddy <[email protected]>
Reviewed-by: Hou, Zhijie <[email protected]>
Reviewed-by: Luc Vlaming <[email protected]>
Reviewed-by: Thomas Munro <[email protected]>
Discussion: https://postgr.es/m/CALj2ACXg-4hNKJC6nFnepRHYT4t5jJVstYvri%2BtKQHy7ydcr8A%40mail.gmail.com

Fix comment about promising tuples.

Oversight in commit d168b666823, which added bottom-up index deletion.

amcheck: Reduce debug message verbosity.

Empty sibling pages can occasionally be much more common than any other
event that we report on at elevel DEBUG1. Increase the elevel for
relevant cases to DEBUG2 to avoid overwhelming the user with relatively
insignificant details.

Avoid corner-case memory leak in SSL parameter processing.

After reading the root cert list from the ssl_ca_file, immediately
install it as client CA list of the new SSL context. That gives the
SSL context ownership of the list, so that SSL_CTX_free will free it.
This avoids a permanent memory leak if we fail further down in
be_tls_init(), which could happen if bogus CRL data is offered.

The leak could only amount to something if the CRL parameters get
broken after server start (else we'd just quit) and then the server
is SIGHUP'd many times without fixing the CRL data. That's rather
unlikely perhaps, but it seems worth fixing, if only because the
code is clearer this way.

While we're here, add some comments about the memory management
aspects of this logic.

Noted by Jelte Fennema and independently by Andres Freund.
Back-patch to v10; before commit de41869b6 it doesn't matter,
since we'd not re-execute this code during SIGHUP.

Discussion: https://postgr.es/m/16160-18367e56e9a28264@postgresql.org

Fix a confusing amcheck corruption message.

Don't complain about the last TOAST chunk number being different
from what we expected if there are no TOAST chunks at all.
In such a case, saying that the final chunk number is 0 is not
really accurate, and the fact the value is missing from the
TOAST table is reported separately anyway.

Mark Dilger

Discussion: http://postgr.es/m/AA5506CE-7D2A-42E4-A51D-358635E3722D@enterprisedb.com

Use pre-fetching for ANALYZE

When we have posix_fadvise() available, we can improve the performance
of an ANALYZE by quite a bit by using it to inform the kernel of the
blocks that we're going to be asking for. Similar to bitmap index
scans, the number of buffers pre-fetched is based off of the
maintenance_io_concurrency setting (for the particular tablespace or,
if not set, globally, via get_tablespace_maintenance_io_concurrency()).

Reviewed-By: Heikki Linnakangas, Tomas Vondra
Discussion: https://www.postgresql.org/message-id/VI1PR0701MB69603A433348EDCF783C6ECBF6EF0%40VI1PR0701MB6960.eurprd07.prod.outlook.com

Improve logging of auto-vacuum and auto-analyze

When logging auto-vacuum and auto-analyze activity, include the I/O
timing if track_io_timing is enabled. Also, for auto-analyze, add the
read rate and the dirty rate, similar to how that information has
historically been logged for auto-vacuum.

Stephen Frost and Jakub Wartak

Reviewed-By: Heikki Linnakangas, Tomas Vondra
Discussion: https://www.postgresql.org/message-id/VI1PR0701MB69603A433348EDCF783C6ECBF6EF0%40VI1PR0701MB6960.eurprd07.prod.outlook.com

Improve logging of bad parameter values in BIND messages.

Since commit ba79cb5dc, values of bind parameters have been logged
during errors in extended query mode. However, we only did that after
we'd collected and converted all the parameter values, thus failing to
offer any useful localization of invalid-parameter problems. Add a
separate callback that's used during parameter collection, and have it
print the parameter number, along with the input string if text input
format is used.

Justin Pryzby and Tom Lane

Discussion: https://postgr.es/m/20210104170939 [email protected]
Discussion: https://postgr.es/m/CANfkH5k-6nNt-4cSv1vPB80nq2BZCzhFVR5O4VznYbsX0wZmow@mail.gmail.com

(Blind) fix Perl splitting of strings at newlines

I forgot that Windows represents newlines as \r\n, so splitting a string
at /\s/ creates additional empty strings. Let's rewrite that as /\s+/
to see if that avoids those. (There's precedent for using that pattern
on Windows in other scripts.)

Previously: 91bdf499b37b, 8ed428dc977f, 650b96707672.

Per buildfarm, via Tom Lane.
Discussion: https://postgr.es/m/3144460.1615860259@sss.pgh.pa.us

Add some basic tests for progress reporting of COPY

This tests some basic features for progress reporting of COPY, relying
on an INSERT trigger that gets fired when doing COPY FROM with a file or
stdin, checking for sizes, number of tuples processed, and number of
tuples excluded by a WHERE clause.

Author: Josef Šimánek, Matthias van de Meent
Reviewed-by: Michael Paquier, Justin Pryzby, Bharath Rupireddy, Tomas
Vondra
Discussion: https://postgr.es/m/CAEze2WiOcgdH4aQA8NtZq-4dgvnJzp8PohdeKchPkhMY-jWZXA@mail.gmail.com

Add libpq pipeline mode support to pgbench

New metacommands \startpipeline and \endpipeline allow the user to run
queries in libpq pipeline mode.

Author: Daniel Vérité <[email protected]>
Reviewed-by: Álvaro Herrera <[email protected]>
Discussion: https://postgr.es/m/b4e34135-2bd9-4b8a-94ca-27d760da26d7@manitou-mail.org

Implement pipeline mode in libpq

Pipeline mode in libpq lets an application avoid the Sync messages in
the FE/BE protocol that are implicit in the old libpq API after each
query. The application can then insert Sync at its leisure with a new
libpq function PQpipelineSync. This can lead to substantial reductions
in query latency.

Co-authored-by: Craig Ringer <[email protected]>
Co-authored-by: Matthieu Garrigues <[email protected]>
Co-authored-by: Álvaro Herrera <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Aya Iwata <[email protected]>
Reviewed-by: Daniel Vérité <[email protected]>
Reviewed-by: David G. Johnston <[email protected]>
Reviewed-by: Justin Pryzby <[email protected]>
Reviewed-by: Kirk Jamison <[email protected]>
Reviewed-by: Michael Paquier <[email protected]>
Reviewed-by: Nikhil Sontakke <[email protected]>
Reviewed-by: Vaishnavi Prabakaran <[email protected]>
Reviewed-by: Zhihong Yu <[email protected]>
Discussion: https://postgr.es/m/CAMsr+YFUjJytRyV4J-16bEoiZyH=4nj+sQ7JP9ajwz=B4dMMZw@mail.gmail.com
Discussion: https://postgr.es/m/CAJkzx4T5E-2cQe3dtv2R78dYFvz+in8PY7A8MArvLhs_pg75gg@mail.gmail.com

Work around issues in MinGW-64's setjmp/longjmp support.

It's hard to avoid the conclusion that there is something wrong with
setjmp/longjmp on MinGW-64, as we have seen failures come and go after
entirely-unrelated-looking changes in our own code. Other projects
such as Ruby have given up and started using gcc's setjmp/longjmp
builtins on that platform; this patch just follows that lead.

Note that this is a pretty fundamental ABI break for functions
containining either setjmp or longjmp, so we can't really consider
a back-patch.

Per reports from Regina Obe and Heath Lord, as well as recent failures
on buildfarm member walleye, and less-recent failures on fairywren.

Juan José Santamaría Flecha

Discussion: https://postgr.es/m/000401d716a0$1ed0fc70$5c72f550 [email protected]
Discussion: https://postgr.es/m/CA+BEBhvHhM-Bn628pf-LsjqRh3Ang7qCSBG0Ga+7KwhGqrNUPw@mail.gmail.com
Discussion: https://postgr.es/m/f1caef93-9640-022e-9211-bbe8755a56b0@2ndQuadrant.com

Drop SERIALIZABLE workaround from parallel query tests.

SERIALIZABLE no longer inhibits parallelism, so we can drop some
outdated workarounds and comments from regression tests. The change
came in release 12, commit bb16aba5, but it's not really worth
back-patching.

Also fix a typo.

Reviewed-by: Bharath Rupireddy <[email protected]>
Discussion: https://postgr.es/m/CA%2BhUKGJUaHeK%3DHLATxF1JOKDjKJVrBKA-zmbPAebOM0Se2FQRg%40mail.gmail.com

Make archiver process an auxiliary process.

This commit changes WAL archiver process so that it's treated as
an auxiliary process and can use shared memory. This is an infrastructure
patch required for upcoming shared-memory based stats collector patch
series. These patch series basically need any processes including archiver
that can report the statistics to access to shared memory. Since this patch
itself is useful to simplify the code and when users monitor the status of
archiver, it's committed separately in advance.

This commit simplifies the code for WAL archiving. For example, previously
backends need to signal to archiver via postmaster when they notify
archiver that there are some WAL files to archive. On the other hand,
this commit removes that signal to postmaster and enables backends to
notify archier directly using shared latch.

Also, as the side of this change, the information about archiver process
becomes viewable at pg_stat_activity view.

Author: Kyotaro Horiguchi
Reviewed-by: Andres Freund, Álvaro Herrera, Julien Rouhaud, Tomas Vondra, Arthur Zakirov, Fujii Masao
Discussion: https://postgr.es/m/20180629.173418.190173462 [email protected]

Notice that heap page has dead items during VACUUM.

Consistently set a flag variable that tracks whether the current heap
page has a dead item during lazy vacuum's heap scan.  We missed the
common case where there is an preexisting (or even a new) LP_DEAD heap
line pointer.

Also make it clear that the variable might be affected by an existing
line pointer, say from an earlier opportunistic pruning operation.  This
distinction is important because it's the main reason why we can't just
use the nearby tups_vacuumed variable instead.

No backpatch.  In theory failing to set the page level flag variable had
no consequences.  Currently it is only used to defensively check if a
page marked all visible has dead items, which should never happen anyway
(if it does then the table must be corrupt).

Author: Masahiko Sawada <[email protected]>
Diagnosed-By: Masahiko Sawada <[email protected]>
Discussion: https://postgr.es/m/CAD21AoAtZb4+HJT_8RoOXvu4HM-Zd4HKS3YSMCH6+-W=bDyh-w@mail.gmail.com

Doc: add note about how to run the pg_amcheck regression tests.

It's not immediately obvious what you have to do to get "make
installcheck" to work here, so document that along the same lines
as we've used elsewhere.

In pg_amcheck tests, don't depend on perl's Q/q pack code.

It does not work on all versions of perl across all platforms.

To avoid endian-ness issues, pick a new value for column a
that has the same upper 4 bytes as lower 4 bytes. Try to
make it something that isn't likely to occur anywhere nearby
in the page.

Discussion: http://postgr.es/m/29DA079B-0658-4E66-BDAA-0EFD7B64D9C6@enterprisedb.com

pg_amcheck: Keep trying to fix the tests.

Fix another example of non-portable option ordering in the tests.
Oversight in 24189277f.

Mark Dilger

Discussion: https://postgr.es/m/C37D28BA-3BA3-4776-B812-17F05F3472D8@enterprisedb.com

Fix new pthread code to respect --disable-thread-safety.

Don't try to compile src/port/pthread_barrier_wait.c if we opted out of
threads at configure time. Revealed by build farm member gaur, which
can't compile this code because of problems with its pthread
implementation. It shouldn't be trying to, because it's using
--disable-thread-safety.

Defect in commit 44bf3d50.

Reported-by: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/2568537.1615603606%40sss.pgh.pa.us

Improve FK trigger parallel-safety check added by 05c8482f7f.

Commit 05c8482f7f added special logic related to parallel-safety of FK
triggers. This is a bit of a hack and should have instead been done by
simply setting appropriate proparallel values on those trigger functions
themselves.

Suggested-by: Tom Lane
Author: Greg Nancarrow
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/2309260.1615485644@sss.pgh.pa.us

pg_amcheck: Keep trying to fix the tests.

Commit 24189277f6ff3169b15c7bc82926a372ca7f2dbf managed to remove
one of the two places where we were checking for a "no such user"
error while leaving the other one right next to it. So remove that
too. In fact, remove the entire test, because the whole point of
this test was to see which message we got on a failure.

pg_amcheck: Try to fix still more test failures.

Avoid use of non-portable option ordering in command_checks_all().
The use of bare command line arguments before switches doesn't work
everywhere. Per buildfarm members drongo and hoverfly.

Avoid testing for the message "role \"%s\" does not exist", because
some buildfarm machines report a different error. fairywren complains
about "SSPI authentication failed for user \"%s\"", for example.

Mark Dilger

Discussion: http://postgr.es/m/9E76E46A-48B2-4869-BD0C-422204C1F767@enterprisedb.com
Discussion: http://postgr.es/m/F0A1FD70-A2F4-4528-8A03-8650CAEC0554%40enterprisedb.com

Try to avoid apparent platform-dependency in IPC::Run

It's hard to believe, but buildfarm results from the new pg_amcheck
suggest that command_checks_all() perform shell expansion on some
machines but not others, apparently due to an underlying behavior
difference in IPC::Run. Let's see if we can work around that - and
confirm that it is the real problem - by passing '-S*' as a single
argument rather than '-S' and '*' as two separate ones.

Failures were observed on jacana and hoverfly.

Mark Dilger

Discussion: http://postgr.es/m/9E76E46A-48B2-4869-BD0C-422204C1F767@enterprisedb.com

Fix portability issues in pg_amcheck's 004_verify_heapam.pl.

Test #12 overwrote a 1-byte varlena header to make it look like the
initial byte of a 4-byte varlena header, but the results were
endian-dependent. Also, the byte "abc" that followed the overwritten
byte would be interpreted differently depending on endian-ness.
Overwrite 4 bytes instead, in an endian-aware manner.

Test #13 accidentally managed to depend on TOAST_MAX_CHUNK_SIZE,
which varies slightly depending on MAXIMUM_ALIGNOF. That's not
the point anyway, so make the regexp insensitive to the expected
number of chunks.

Mark Dilger

Discussion: http://postgr.es/m/A80D68F6-E38F-482D-9522-E2FB6AAFE8A1@enterprisedb.com

Consolidate nbtree VACUUM metapage routines.

Simplify _bt_vacuum_needs_cleanup() functions's signature (it only needs
a single 'rel' argument now), and move it next to its sibling function
in nbtpage.c.

I believe that _bt_vacuum_needs_cleanup() was originally located in
nbtree.c due to an include dependency issue. That's no longer an issue.

Follow-up to commit 9f3665fb.

Move PG_USED_FOR_ASSERTS_ONLY before initializer.

Erik Rijkers reported a compile failure, and I think this is probably
the reason.

Adjust perl style.

Per buildfarm member crake.

Try to fix compiler warnings.

Per report from Peter Geoghegan.

Discussion: http://postgr.es/m/CAH2-WznpwULZ3uJ1_6WXvNMXYbOy8k8tYs3r=qSdGmZeRd6tDw@mail.gmail.com

Add pg_amcheck, a CLI for contrib/amcheck.

This makes it a lot easier to run the corruption checks that are
implemented by contrib/amcheck against lots of relations and get
the result in an easily understandable format. It has a wide variety
of options for choosing which relations to check and which checks
to perform, and it can run checks in parallel if you want.

Mark Dilger, reviewed by Peter Geoghegan and by me.

Discussion: http://postgr.es/m/12ED3DA8-25F0-4B68-937D-D907CFBF08E7@enterprisedb.com
Discussion: http://postgr.es/m/BA592F2D-F928-46FF-9516-2B827F067F57@enterprisedb.com

Fix race condition in psql \e's detection of file modification.

psql's editing commands decide whether the user has edited the file
by checking for change of modification timestamp. This is probably
fine for a pre-existing file, but with a temporary file that is
created within the command, it's possible for a fast typist to
save-and-exit in less than the one-second granularity of stat(2)
timestamps. On Windows FAT filesystems the granularity is even
worse, 2 seconds, making the race a bit easier to hit.

To fix, try to set the temp file's mod time to be two seconds ago.
It's unlikely this would fail, but then again the race condition
itself is unlikely, so just ignore any error.

Also, we might as well check the file size as well as its mod time.

While this is a difficult bug to hit, it still seems worth
back-patching, to ensure that users' edits aren't lost.

Laurenz Albe, per gripe from Jacob Champion; based on fix suggestions
from Jacob and myself

Discussion: https://postgr.es/m/0ba3f2a658bac6546d9934ab6ba63a805d46a49b [email protected]

Forbid marking an identity column as nullable.

GENERATED ALWAYS AS IDENTITY implies NOT NULL, but the code failed
to complain if you overrode that with "GENERATED ALWAYS AS IDENTITY
NULL". One might think the old behavior was a feature, but it was
inconsistent because the outcome varied depending on the order of
the clauses, so it seems to have been just an oversight.

Per bug #16913 from Pavel Boev. Back-patch to v10 where identity
columns were introduced.

Vik Fearing (minor tweaks by me)

Discussion: https://postgr.es/m/16913-3b5198410f67d8c6@postgresql.org

Specialize checkpointer sort functions.

When sorting a potentially large number of dirty buffers, the
checkpointer can benefit from a faster sort routine. One reported
improvement on a large buffer pool system was 1.4s -> 0.6s.

Reviewed-by: Andres Freund <[email protected]>
Discussion: https://postgr.es/m/CA%2BhUKGJ2-eaDqAum5bxhpMNhvuJmRDZxB_Tow0n-gse%2BHG0Yig%40mail.gmail.com

Fix size overflow in calculation introduced by commits d6ad34f3 and bea449c6.

Reported-by: Thomas Munro
Author: Takayuki Tsunakawa
Reviewed-by: Kyotaro Horiguchi
Discussion: https://postgr.es/m/CA+hUKG+oPoFizjABt=GXZWTEHx3oev5rAe2scjW2r6F1rguo5w@mail.gmail.com

Fix use of relcache TriggerDesc field introduced by commit 05c8482f7f.

The commit added code which used a relcache TriggerDesc field across
another cache access, which it shouldn't because the relcache doesn't
guarantee it won't get moved.

Diagnosed-by: Tom Lane
Author: Greg Nancarrow
Reviewed-by: Hou Zhijie, Amit Kapila
Discussion: https://postgr.es/m/2309260.1615485644@sss.pgh.pa.us

Poll postmaster less frequently in recovery.

Since commits 9f095299 and f98b8476 we don't poll the postmaster
pipe at all during crash recovery on Linux and FreeBSD, but on other
operating systems we were still doing it for every WAL record. Do it
less frequently on operating systems where system calls are required, at
the cost of delaying exit a bit after postmaster death. This avoids
expensive system calls reported to slow down CPU-bound recovery by as
much as 10-30%.

Reviewed-by: Heikki Linnakangas <[email protected]>
Reviewed-by: Fujii Masao <[email protected]>
Reviewed-by: Michael Paquier <[email protected]>
Discussion: https://postgr.es/m/CA%2BhUKGK1607VmtrDUHQXrsooU%3Dap4g4R2yaoByWOOA3m8xevUQ%40mail.gmail.com
Discussion: https://postgr.es/m/7261eb39-0369-f2f4-1bb5-62f3b6083b5e@iki.fi

Add condition variable for walreceiver shutdown.

Use this new CV to wait for walreceiver shutdown without a sleep/poll
loop, while also benefiting from standard postmaster death handling.

Discussion: https://postgr.es/m/CA%2BhUKGK1607VmtrDUHQXrsooU%3Dap4g4R2yaoByWOOA3m8xevUQ%40mail.gmail.com

Add condition variable for recovery resume.

Replace a sleep loop with a CV, to get a fast reaction time when
recovery is resumed or the postmaster exits via standard infrastructure.
Unfortunately we still need to wake up every second to perform extra
polling during the recovery pause loop.

Discussion: https://postgr.es/m/CA%2BhUKGK1607VmtrDUHQXrsooU%3Dap4g4R2yaoByWOOA3m8xevUQ%40mail.gmail.com

Send statistics collected during shutdown checkpoint to the stats collector.

When shutdown is requested, checkpointer performs checkpoint or
restartpoint, and updates the statistics, before it exits. But previously
checkpointer didn't send those statistics to the stats collector.

Shutdown checkpoint and restartpoint are treated as requested ones
instead of scheduled ones, so the number of them are counted in
pg_stat_bgwriter.checkpoints_req column.

Author: Masahiro Ikeda
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/0509ad67b585a5b86a83d445dfa75392@oss.nttdata.com

Force to send remaining WAL stats to the stats collector at walwriter exit.

In walwriter's main loop, WAL stats message is only sent if enough time
has passed since last one was sent to reach PGSTAT_STAT_INTERVAL msecs.
This is necessary to avoid overloading to the stats collector. But this
can cause recent WAL stats to be unsent when walwriter exits.

To ensure that all the WAL stats are sent, this commit makes walwriter
force to send remaining WAL stats to the collector when it exits because
of shutdown request. Note that those remaining WAL stats can still be
unsent when walwriter exits with non-zero exit code (e.g., FATAL error).
This is OK because that walwriter exit leads to server crash and
subsequent recovery discards all the stats. So there is no need to send
remaining stats in that case.

Author: Masahiro Ikeda
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/0509ad67b585a5b86a83d445dfa75392@oss.nttdata.com

Minor modernization for README.barrier.

Itanium is very uncommon and being discontinued. ARM is everywhere.
Prefer ARM as an example of an architecture with weak memory ordering.