git.postgresql.org Git - postgresql-pgindent.git/log

De-Revert "Add support for Kerberos credential delegation"

This reverts commit 3d03b24c3 (Revert Add support for Kerberos
credential delegation) which was committed on the grounds of concern
about portability, but on further review and discussion, it's clear that
we are better off explicitly requiring MIT Kerberos as that appears to
be the only GSSAPI library currently that's under proper maintenance
and ongoing development. The API used for storing credentials was added
to MIT Kerberos over a decade ago while for the other libraries which
appear to be mainly based on Heimdal, which exists explicitly to be a
re-implementation of MIT Kerberos, the API never made it to a released
version (even though it was added to the Heimdal git repo over 5 years
ago..).

This post-feature-freeze change was approved by the RMT.

Discussion: https://postgr.es/m/ZDDO6jaESKaBgej0%40tamriel.snowman.net

doc: Make HTML ids discoverable

In the HTML output, this decorates section headers and variable list
terms with a marker ("#") that is a link to the same section/term.
That way, links inside a page can be discovered for easier sharing.
The marker only appears when hovering.

This now requires that all elements that are candidates for such a
link have an id attribute. Otherwise, an error will be generated.
All previously missing ids have been added prior to this patch.

Author: Brar Piening <[email protected]>
Reviewed-by: Karl O. Pinc <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/CAB8KJ=jpuQU9QJe4+RgWENrK5g9jhoysMw2nvTN_esoOU0=a_w@mail.gmail.com

Add missing XML ID attribute

Discussion: https://www.postgresql.org/message-id/dc813a6f-60d9-991f-eecd-675a0921de11@gmx.de

Skip the 004_io_direct.pl test if a pre-flight check fails.

The test previously had a list of OSes that direct I/O was expected to
work on.  That worked well enough for the systems in our build farm, but
didn't survive contact with the Debian build bots running on tmpfs via
overlayfs.  tmpfs does not support O_DIRECT, but we don't want to
exclude Linux generally.

The new approach is to try to create an empty file with O_DIRECT from
Perl first.  If that fails, we'll skip the test and report what the
error was.

Reported-by: Christoph Berg <[email protected]>
Reviewed-by: Dagfinn Ilmari Mannsåker <[email protected]>
Reviewed-by: Andrew Dunstan <[email protected]>
Discussion: https://postgr.es/m/ZDYd4A78cT2ULxZZ%40msg.df7cb.de

Remove overzealous assertion from PHJ.

We can't assert that we're the only process attached to a barrier after
BarrierArriveAndDetachExceptLast(). Although that'll be true almost
always, a late-starting parallel worker can attach very briefly (that
is, immediately detach after checking the phase) right at that moment.
BarrierArriveAndDetachExceptLast() already contains an assertion like
that, but it holds a spinlock preventing the race. This thinko caused a
one-off failure on build farm animal chimaera.

Diagnosed-by: Melanie Plageman <[email protected]>
Reported-by: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/3590249.1680971629@sss.pgh.pa.us

Revert "Adjust contrib/sepgsql regression test expected outputs."

This reverts commit 76c111a7f166; should have been included in
9ce04b50e120.

Noted by Joe Conway

Improve error messages introduced in be87200efd9 and 0fdab27ad68

Author: Kyotaro Horiguchi <[email protected]>
Discussion: https://postgr.es/m/20230411.120301.93333867350615278 [email protected]
Discussion: https://postgr.es/m/20230412174244 [email protected]

Revert "Catalog NOT NULL constraints" and fallout

This reverts commit e056c557aef4 and minor later fixes thereof.

There's a few problems in this new feature -- most notably regarding
pg_upgrade behavior, but others as well. This new feature is not in any
way critical on its own, so instead of scrambling to fix it we revert it
and try again in early 17 with these issues in mind.

Discussion: https://postgr.es/m/3801207.1681057430@sss.pgh.pa.us

basebackup_to_shell: Check for a NULL return from OpenPipeStream.

Per complaint from Peter Eisentraut.

Discussion: http://postgr.es/m/4f1707cc-2432-da35-64a2-5c2a8d92a388@enterprisedb.com

Document BaseBackupSync and BaseBackupWrite wait events.

Commit 3500ccc39b0dadd1068a03938e4b8ff562587ccc should have done
this, but I overlooked it.

Per complaint from Thomas Munro.

Discussion: http://postgr.es/m/CA+hUKGJixAHc860Ej9Qzd_z96Z6aoajAgJ18bYfV3Lfn6t9=+Q@mail.gmail.com

Fix parallel-safety marking when moving initplans to another node.

Our policy since commit ab77a5a45 has been that a plan node having
any initplans is automatically not parallel-safe.  (This could be
relaxed, but not today.)  clean_up_removed_plan_level neglected
this, and could attach initplans to a parallel-safe child plan
node without clearing the plan's parallel-safe flag.  That could
lead to "subplan was not initialized" errors at runtime, in case
an initplan referenced another one and only the referencing one
got transmitted to parallel workers.

The fix in clean_up_removed_plan_level is trivial enough.
materialize_finished_plan also moves initplans from one node
to another, but it's okay because it already copies the source
node's parallel_safe flag.  The other place that does this kind
of thing is standard_planner's hack to inject a top-level Gather
when debug_parallel_query is active.  But that's actually dead
code given that we're correctly enforcing the "initplans aren't
parallel safe" rule, so just replace it with an Assert that
there are no initplans.

Also improve some related comments.

Normally we'd add a regression test case for this sort of bug.
The mistake itself is already reached by existing tests, but there
is accidentally no visible problem.  The only known test case that
creates an actual failure seems too indirect and fragile to justify
keeping it as a regression test (not least because it fails to fail
in v11, though the bug is clearly present there too).

Per report from Justin Pryzby.  Back-patch to all supported branches.

Discussion: https://postgr.es/m/[email protected]

doc: Reword unexplained abbreviation

The previous wording used MVF to indicate the Most Common Values'
Frequencies, but the abbreviation was never explained or defined.
Reword to mcv_freqs to make the use clearer.

Also add MCF and MCV as acronyms as they were using <acronym>
markup but were missing from the acronyms page.

Reported-by: Eric Mutta <[email protected]>
Reviewed-by: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/166112292492.654.5377188452604176150@wrigleys.postgresql.org

Fix incorrect format placeholders

Update config.guess and config.sub

doc: Fix some typos and grammar

This is a first batch of the fixes, for the most obvious fixes. A
little bit more is under discussion.

Author: Thom Brown, Justin Pryzby
Discussion: https://postgr.es/m/CAA-aLv7xCZ0nBJa-NWe0rxBB28TjFjS2JtjiZMoQ+0wsugG+hQ@mail.gmail.com

Fix detection of unseekable files for fseek() and ftello() with MSVC

Calling fseek() or ftello() on a handle to a non-seeking device such as
a pipe or a communications device is not supported.  Unfortunately,
MSVC's flavor of these routines, _fseeki64() and _ftelli64(), do not
return an error when given a pipe as handle.  Some of the logic of
pg_dump and restore relies on these routines to check if a handle is
seekable, causing failures when passing the contents of pg_dump to
pg_restore through a pipe, for example.

This commit introduces wrappers for fseeko() and ftello() on MSVC so as
any callers are able to properly detect the cases of non-seekable
handles.  This relies mainly on GetFileType(), sharing a bit of code
with the MSVC port for fstat().  The code in charge of getting a file
type is refactored into a new file called win32common.c, shared by
win32stat.c and the new win32fseek.c.  It includes the MSVC ports for
fseeko() and ftello().

Like 765f5df, this is backpatched down to 14, where the fstat()
implementation for MSVC is able to understand about files larger than
4GB in size.  Using a TAP test for that is proving to be tricky as
IPC::Run handles the pipes by itself, still I have been able to check
the fix manually.

Reported-by: Daniel Watzinger
Author: Juan José Santamaría Flecha, Michael Paquier
Discussion: https://postgr.es/m/CAC+AXB26a4EmxM2suXxPpJaGrqAdxracd7hskLg-zxtPB50h7A@mail.gmail.com
Backpatch-through: 14

Refine the guidelines for rmgrdesc authors.

Clarify the goals of the recently added guidelines for rmgrdesc authors:
to avoid gratuitous inconsistencies across resource managers, and to
make it reasonably easy to write a reusable custom parser.

Beyond that, the guidelines leave rmgrdesc authors with a significant
amount of leeway. This even includes the leeway to invent custom
conventions (in cases where it's warranted).

Follow-up to commit 7d8219a4.

Author: Peter Geoghegan <[email protected]>
Reviewed-By: Melanie Plageman <[email protected]>
Discussion: https://postgr.es/m/CAH2-WzkbYuvwYKm-Y-72QEh6SPMQcAo9uONv+mR3bMGcu9E_Cg@mail.gmail.com

Fix Heap rmgr's desc output for infobits arrays.

Make heap desc routines that output status bit as arrays of constants
avoid outputting array literals that contain superfluous punctuation
characters that complicate parsing the output.  Also make sure that no
heap desc routine repeats the same key name (at the same nesting level),
for the same reason.  Arguably, these were both oversights in commit
7d8219a4.

In passing, make the desc output code (which covers Heap's DELETE,
UPDATE, HOT_UPDATE, LOCK, and LOCK_UPDATED record types) consistent in
terms of the output order of each field.  This order also matches WAL
record struct order.  Heap's DELETE desc output now shows the record's
xmax field for the first time (just like UPDATE/HOT_UPDATE records).

Author: Peter Geoghegan <[email protected]>
Reviewed-By: Melanie Plageman <[email protected]>
Discussion: https://postgr.es/m/CAH2-Wz=pNYtxiJ2Jx5Lj=fKo1OEZ4GE0p_kct+ugAUTqBwU46g@mail.gmail.com

Fix xl_heap_lock WAL record field's data type.

Make xl_heap_lock's infobits_set field of type uint8, not int8. Using
int8 isn't appropriate given that the field just holds status bits.
This fixes an oversight in commit 0ac5ad5134.

In passing rename the nearby TransactionId field to "xmax" to make
things consistency with related records, such as xl_heap_lock_updated.

Deliberately avoid a bump in XLOG_PAGE_MAGIC. No backpatch, either.

Author: Peter Geoghegan <[email protected]>
Discussion: https://postgr.es/m/CAH2-WzkCd3kOS8b7Rfxw7Mh1_6jvX=Nzo-CWR1VBTiOtVZkWHA@mail.gmail.com

035_standby_logical_decoding: Add missing waits for replication

At least one slow buildfarm system (hoverfly) showed that the database
creation was not replicated before we try to create logical replication slots
on the standby, in that database.

Reported-by: Noah Misch <[email protected]>
Author: "Drouvot, Bertrand" <[email protected]>
Discussion: https://postgr.es/m/20230411053657 [email protected]

Document new pg_subscription columns.

Commit 482675987bcdffb390ae735cfd5f34b485ae97c6 and commit
c3afe8cf5a1e465bd71e48e4bc717f5bfdc7a7d6 forgot to take care
of this.

Noriyoshi Shinoda

Discussion: http://postgr.es/m/DM4PR84MB17345D8760165F14A199B81CEE9A9@DM4PR84MB1734.NAMPRD84.PROD.OUTLOOK.COM

Fix uninitialized variable in transformTableLikeClause()

process_notnull_constraints should be set to false until we discover a NOT
NULL column.

Discovered while running Valgrind.

Discussion: https://postgr.es/m/CAApHDvoMyiZVi1KW5WVdqMRzWsWkD3F7n6QD+BbAO6WTeAWsUQ@mail.gmail.com

Improve ereports for VACUUM's BUFFER_USAGE_LIMIT option

There's no need to check if opt->arg is NULL since defGetString() already
does that and raises an ERROR if it is.  Let's just remove that check.

Also, combine the two remaining ERRORs into a single check.  It seems
better to give an indication about what sort of values we're looking for
rather than just to state that the value given isn't valid.  Make
BUFFER_USAGE_LIMIT uppercase in this ERROR message too.  It's already
upper case in one other error message, so make that consistent.

Reported-by: Kyotaro Horiguchi
Discussion: https://postgr.es/m/20230411.102335.1643720544536884844 [email protected]

Doc: use "an SQL" consistently rather than "a SQL"

Similarly to what was done in 04539e73f and 7bdd489d3, we standardized on
SQL being pronounced "es-que-ell" rather than "sequel" in our
documentation.

This fixes the instances of "a SQL" that have crept in during the v16
cycle.

Discussion: https://postgr.es/m/CAApHDvpML27UqFXnrYO1MJddsKVMQoiZisPvsAGhKE_tsKXquw%40mail.gmail.com

Clarify nbtree posting list update desc issue.

Per complaint from Melanie Plageman.

Follow-up to commit 5d6728e5.

Reported-By: Melanie Plageman <[email protected]>
Discussion: https://postgr.es/m/20230411002315.oyaicmcqrq2hb3ek@liskov

Doc: add missed entries in BRIN extensibility tables.

The tables in "71.3. Extensibility" listing the support functions
for bloom and minmax-multi opclasses should include the associated
options function. While this isn't quite as required as the rest,
you need it for full functionality of the opclass.

Back-patch to v14 where these functions were added.

Fix nbtree posting list update desc output.

We cannot use the generic array_desc approach with per-tuple nbtree
posting list update metadata because array_desc can only deal with fixed
width elements (e.g., page offset numbers).  Using array_desc led to
incorrect rmgr descriptions for updates from nbtree DELETE/VACUUM WAL
records.

To fix, add specialized code to describe the update metadata as array
elements in desc output.  We now iterate over the update metadata using
an approach that matches related REDO routines.

Also stop showing the updates offset number array separately in nbtree
DELETE/VACUUM desc output.  It's redundant information, since the same
page offset numbers appear in the description of each individual update
element.  Also make some small tweaks to the way that we format arrays
in all desc routines (not just nbtree desc routines) to make arrays a
little less verbose.

Oversight in commit 1c453cfd, which enhanced the nbtree rmgr desc
routines.

Author: Peter Geoghegan <[email protected]>
Discussion: https://postgr.es/m/CAH2-WzkbYuvwYKm-Y-72QEh6SPMQcAo9uONv+mR3bMGcu9E_Cg@mail.gmail.com

Doc: adjust examples of EXTRACT() output to match current reality.

EXTRACT(EPOCH), EXTRACT(SECOND), and some related cases print more
trailing zeroes than they used to.  This behavior change happened
with commit a2da77cdb (Change return type of EXTRACT to numeric),
and it was intentional according to the commit log:

    - Return values when extracting fields with possibly fractional
      values, such as second and epoch, now have the full scale that the
      value has internally (so, for example, '1.000000' instead of just
      '1').

It's been like that for two releases now, so while I suggested
changing this back, it's probably better to adjust the documentation
examples.

Per bug #17866 from Евгений Жужнев.  Back-patch to v14 where the
change came in.

Discussion: https://postgr.es/m/17866-18eb70095b1594e2@postgresql.org

Doc: avoid using pg_get_publication_tables() in an example.

pg_get_publication_tables() is undocumented because it's only meant
as infrastructure for the pg_publication_tables system view.
That being the case, we should use the view not the bare function
in this sample query.

Shi Yu

Discussion: https://postgr.es/m/OSZPR01MB63107E83D07FEDEEABD83A23FD949@OSZPR01MB6310.jpnprd01.prod.outlook.com

Simplify version check for SKIP clause

Checking for the required versions of IO::Pty as well as IPC::Run
can be achieved with a single eval call, and by using the VERSION
function the comparison is guaranteed to follow the same rules as
calling 'use' on the module with a version.

Reported-by: Andrew Dunstan <[email protected]>
Discussion: https://postgr.es/m/6d880ea2-f8ca-f458-4dcd-a7a3e6d6cd7c@dunslane.net

Use higher wal_level for 004_io_direct.pl.

The new direct I/O test deliberately uses a very small shared_buffers to
force some disk transfers without making the data set large and slow,
but ran into a problem with wal_level = minimal: log_newpage_range()
pins many buffers, leading to a few intermittent "no unpinned buffers
available" errors.

We could presumably fix that by adjusting shared_buffers, but crake
seems to be trying to tell us something interesting with these settings,
so let's just avoid wal_level = minimal in this test for now.

Reported-by: Andres Freund <[email protected]>
Discussion: https://postgr.es/m/20230408060408.n7xdwk3mxj5oykt6%40awork3.anarazel.de

Improve indentation of multiline initialization expressions.

If a variable has an initialization expression that wraps onto the
next line(s), pg_bsd_indent will now indent the continuation lines
one stop, instead of aligning them flush with the variable declaration.

We've been holding off applying this until the last v16 CF finished,
but now it's time.

Thomas Munro and Tom Lane

Discussion: https://postgr.es/m/20230120013137 [email protected]

Try to unbreak MSVC builds for pg_waldump

remedy an omission in commit 7d8219a444

Suppress bogus printout during new 035_standby_logical_decoding.pl test.

Our convention for some time has been that successful tests shouldn't
print anything on stderr. A stray "diag" call violated that, and
for that matter messed up the normal TAP progress display.

Skip \password TAP test on old IPC::Run versions

IPC::Run versions prior to 0.98 cause the interactive session to time out,
so SKIP the test in case these versions are detected (they are within the
base requirement for our TAP tests in general). Error reported by the BF
and investigation by Tom Lane.

Discussion: https://postgr.es/m/414A86BD-986B-48A7-A1E4-EEBCE5AF08CB@yesql.se

Try to unbreak MSVC builds for fuzzystrmatch

Commit a290378a37 neglrected to add a recipe for MSVC to build the
daitch_motokoff.h file.

Per buildfarm animal bowerbird.

Revert "Add support for Kerberos credential delegation"

This reverts commit 3d4fa227bce4294ce1cc214b4a9d3b7caa3f0454.

Per discussion and buildfarm, this depends on APIs that seem to not
be available on at least one platform (NetBSD). Should be certainly
possible to rework to be optional on that platform if necessary but bit
late for that at this point.

Discussion: https://postgr.es/m/3286097.1680922218@sss.pgh.pa.us

Redesign interrupt/cancel API for regex engine.

Previously, a PostgreSQL-specific callback checked by the regex engine
had a way to trigger a special error code REG_CANCEL if it detected that
the next call to CHECK_FOR_INTERRUPTS() would certainly throw via
ereport().

A later proposed bugfix aims to move some complex logic out of signal
handlers, so that it won't run until the next CHECK_FOR_INTERRUPTS(),
which makes the above design impossible unless we split
CHECK_FOR_INTERRUPTS() into two phases, one to run logic and another to
ereport(). We may develop such a system in the future, but for the
regex code it is no longer necessary.

An earlier commit moved regex memory management over to our
MemoryContext system. Given that the purpose of the two-phase interrupt
checking was to free memory before throwing, something we don't need to
worry about anymore, it seems simpler to inject CHECK_FOR_INTERRUPTS()
directly into cancelation points, and just let it throw.

Since the plan is to keep PostgreSQL-specific concerns separate from the
main regex engine code (with a view to bein able to stay in sync with
other projects), do this with a new macro INTERRUPT(), customizable in
regcustom.h and defaulting to nothing.

Reviewed-by: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/CA%2BhUKGK3PGKwcKqzoosamn36YW-fsuTdOPPF1i_rtEO%3DnEYKSg%40mail.gmail.com

Update contrib/trgm_regexp's memory management.

While no code change was necessary for this code to keep working, we
don't need to use PG_TRY()/PG_FINALLY() with explicit clean-up while
working with regexes anymore.

Reviewed-by: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/CA%2BhUKGK3PGKwcKqzoosamn36YW-fsuTdOPPF1i_rtEO%3DnEYKSg%40mail.gmail.com

Update tsearch regex memory management.

Now that our regex engine uses palloc(), it's not necessary to set up a
special memory context callback to free compiled regexes. The regex has
no resources other than the memory that is already going to be freed in
bulk.

Reviewed-by: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/CA%2BhUKGK3PGKwcKqzoosamn36YW-fsuTdOPPF1i_rtEO%3DnEYKSg%40mail.gmail.com

Use MemoryContext API for regex memory management.

Previously, regex_t objects' memory was managed with malloc() and free()
directly.  Switch to palloc()-based memory management instead.
Advantages:

* memory used by cached regexes is now visible with MemoryContext
   observability tools

* cleanup can be done automatically in certain failure modes
   (something that later commits will take advantage of)

* cleanup can be done in bulk

On the downside, there may be more fragmentation (wasted memory) due to
per-regex MemoryContext objects.  This is a problem shared with other
cached objects in PostgreSQL and can probably be improved with later
tuning.

Thanks to Noah Misch for suggesting this general approach, which
unblocks later work on interrupts.

Suggested-by: Noah Misch <[email protected]>
Reviewed-by: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/CA%2BhUKGK3PGKwcKqzoosamn36YW-fsuTdOPPF1i_rtEO%3DnEYKSg%40mail.gmail.com

TAP test for logical decoding on standby

Author: "Drouvot, Bertrand" <[email protected]>
Author: Amit Khandekar <[email protected]>
Author: Craig Ringer <[email protected]> (in an older version)
Author: Andres Freund <[email protected]>
Reviewed-by: "Drouvot, Bertrand" <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Amit Kapila <[email protected]>
Reviewed-by: Fabrízio de Royes Mello <[email protected]>

Allow logical decoding on standbys

Unsurprisingly, this requires wal_level = logical to be set on the primary and
standby. The infrastructure added in 26669757b6a ensures that slots are
invalidated if the primary's wal_level is lowered.

Creating a slot on a standby waits for a xl_running_xact record to be
processed. If the primary is idle (and thus not emitting xl_running_xact
records), that can take a while. To make that faster, this commit also
introduces the pg_log_standby_snapshot() function. By executing it on the
primary, completion of slot creation on the standby can be accelerated.

Note that logical decoding on a standby does not itself enforce that required
catalog rows are not removed. The user has to use physical replication slots +
hot_standby_feedback or other measures to prevent that. If catalog rows
required for a slot are removed, the slot is invalidated.

See 6af1793954e for an overall design of logical decoding on a standby.

Bumps catversion, for the addition of the pg_log_standby_snapshot() function.

Author: "Drouvot, Bertrand" <[email protected]>
Author: Andres Freund <[email protected]> (in an older version)
Author: Amit Khandekar <[email protected]> (in an older version)
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: FabrÌzio de Royes Mello <[email protected]>
Reviewed-by: Amit Kapila <[email protected]>
Reviewed-By: Robert Haas <[email protected]>

For cascading replication, wake physical and logical walsenders separately

Physical walsenders can't send data until it's been flushed; logical
walsenders can't decode and send data until it's been applied. On the
standby, the WAL is flushed first, which will only wake up physical
walsenders; and then applied, which will only wake up logical
walsenders.

Previously, all walsenders were awakened when the WAL was flushed. That
was fine for logical walsenders on the primary; but on the standby the
flushed WAL would have been not applied yet, so logical walsenders were
awakened too early.

Per idea from Jeff Davis and Amit Kapila.

Author: "Drouvot, Bertrand" <[email protected]>
Reviewed-By: Jeff Davis <[email protected]>
Reviewed-By: Robert Haas <[email protected]>
Reviewed-by: Amit Kapila <[email protected]>
Reviewed-by: Masahiko Sawada <[email protected]>
Discussion: https://postgr.es/m/CAA4eK1+zO5LUeisabX10c81LU-fWMKO4M9Wyg1cdkbW7Hqh6vQ@mail.gmail.com

Handle logical slot conflicts on standby

During WAL replay on the standby, when a conflict with a logical slot is
identified, invalidate such slots. There are two sources of conflicts:
1) Using the information added in 6af1793954e, logical slots are invalidated if
required rows are removed
2) wal_level on the primary server is reduced to below logical

Uses the infrastructure introduced in the prior commit. FIXME: add commit
reference.

Change InvalidatePossiblyObsoleteSlot() to use a recovery conflict to
interrupt use of a slot, if called in the startup process. The new recovery
conflict is added to pg_stat_database_conflicts, as confl_active_logicalslot.

See 6af1793954e for an overall design of logical decoding on a standby.

Bumps catversion for the addition of the pg_stat_database_conflicts column.
Bumps PGSTAT_FILE_FORMAT_ID for the same reason.

Author: "Drouvot, Bertrand" <[email protected]>
Author: Andres Freund <[email protected]>
Author: Amit Khandekar <[email protected]> (in an older version)
Reviewed-by: "Drouvot, Bertrand" <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Fabrízio de Royes Mello <[email protected]>
Reviewed-by: Bharath Rupireddy <[email protected]>
Reviewed-by: Amit Kapila <[email protected]>
Reviewed-by: Alvaro Herrera <[email protected]>
Discussion: https://postgr.es/m/20230407075009 [email protected]

Support invalidating replication slots due to horizon and wal_level

Needed for logical decoding on a standby. Slots need to be invalidated because
of the horizon if rows required for logical decoding are removed. If the
primary's wal_level is lowered from 'logical', logical slots on the standby
need to be invalidated.

The new invalidation methods will be used in a subsequent commit.

Logical slots that have been invalidated can be identified via the new
pg_replication_slots.conflicting column.

See 6af1793954e for an overall design of logical decoding on a standby.

Bumps catversion for the addition of the new pg_replication_slots column.

Author: "Drouvot, Bertrand" <[email protected]>
Author: Andres Freund <[email protected]>
Author: Amit Khandekar <[email protected]> (in an older version)
Reviewed-by: "Drouvot, Bertrand" <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Fabrízio de Royes Mello <[email protected]>
Reviewed-by: Bharath Rupireddy <[email protected]>
Reviewed-by: Amit Kapila <[email protected]>
Reviewed-by: Melanie Plageman <[email protected]>
Reviewed-by: Alvaro Herrera <[email protected]>
Discussion: https://postgr.es/m/20230407075009 [email protected]

Fix underspecified sort order in inherit.sql

Introduced in e056c557aef4.

Per buildfarm member prion.

Prevent use of invalidated logical slot in CreateDecodingContext()

Previously we had checks for this in multiple places. Support for logical
decoding on standbys will add other forms of invalidation, making it worth
while to centralize the checks.

This slightly changes the error message for both the walsender and SQL
interface. Particularly the SQL interface error was inaccurate, as the "This
slot has never previously reserved WAL" portion was unreachable.

Reviewed-by: "Drouvot, Bertrand" <[email protected]>
Reviewed-by: Melanie Plageman <[email protected]>
Discussion: https://postgr.es/m/20230407075009 [email protected]

Replace replication slot's invalidated_at LSN with an enum

This is mainly useful because the upcoming logical-decoding-on-standby feature
adds further reasons for invalidating slots, and we don't want to end up with
multiple invalidated_* fields, or check different attributes.

Eventually we should consider not resetting restart_lsn when invalidating a
slot due to max_slot_wal_keep_size. But that's a user visible change, so left
for later.

Increases SLOT_VERSION, due to the changed field (with a different alignment,
no less).

Reviewed-by: "Drouvot, Bertrand" <[email protected]>
Reviewed-by: Alvaro Herrera <[email protected]>
Reviewed-by: Melanie Plageman <[email protected]>
Discussion: https://postgr.es/m/20230407075009 [email protected]

Add io_direct setting (developer-only).

Provide a way to ask the kernel to use O_DIRECT (or local equivalent)
where available for data and WAL files, to avoid or minimize kernel
caching.  This hurts performance currently and is not intended for end
users yet.  Later proposed work would introduce our own I/O clustering,
read-ahead, etc to replace the facilities the kernel disables with this
option.

The only user-visible change, if the developer-only GUC is not used, is
that this commit also removes the obscure logic that would activate
O_DIRECT for the WAL when wal_sync_method=open_[data]sync and
wal_level=minimal (which also requires max_wal_senders=0).  Those are
non-default and unlikely settings, and this behavior wasn't (correctly)
documented.  The same effect can be achieved with io_direct=wal.

Author: Thomas Munro <[email protected]>
Author: Andres Freund <[email protected]>
Author: Bharath Rupireddy <[email protected]>
Reviewed-by: Justin Pryzby <[email protected]>
Reviewed-by: Bharath Rupireddy <[email protected]>
Discussion: https://postgr.es/m/CA%2BhUKGK1X532hYqJ_MzFWt0n1zt8trz980D79WbjwnT-yYLZpg%40mail.gmail.com

Introduce PG_IO_ALIGN_SIZE and align all I/O buffers.

In order to have the option to use O_DIRECT/FILE_FLAG_NO_BUFFERING in a
later commit, we need the addresses of user space buffers to be well
aligned.  The exact requirements vary by OS and file system (typically
sectors and/or memory pages).  The address alignment size is set to
4096, which is enough for currently known systems: it matches modern
sectors and common memory page size.  There is no standard governing
O_DIRECT's requirements so we might eventually have to reconsider this
with more information from the field or future systems.

Aligning I/O buffers on memory pages is also known to improve regular
buffered I/O performance.

Three classes of I/O buffers for regular data pages are adjusted:
(1) Heap buffers are now allocated with the new palloc_aligned() or
MemoryContextAllocAligned() functions introduced by commit 439f6175.
(2) Stack buffers now use a new struct PGIOAlignedBlock to respect
PG_IO_ALIGN_SIZE, if possible with this compiler.  (3) The buffer
pool is also aligned in shared memory.

WAL buffers were already aligned on XLOG_BLCKSZ.  It's possible for
XLOG_BLCKSZ to be configured smaller than PG_IO_ALIGNED_SIZE and thus
for O_DIRECT WAL writes to fail to be well aligned, but that's a
pre-existing condition and will be addressed by a later commit.

BufFiles are not yet addressed (there's no current plan to use O_DIRECT
for those, but they could potentially get some incidental speedup even
in plain buffered I/O operations through better alignment).

If we can't align stack objects suitably using the compiler extensions
we know about, we disable the use of O_DIRECT by setting PG_O_DIRECT to
0.  This avoids the need to consider systems that have O_DIRECT but
can't align stack objects the way we want; such systems could in theory
be supported with more work but we don't currently know of any such
machines, so it's easier to pretend there is no O_DIRECT support
instead.  That's an existing and tested class of system.

Add assertions that all buffers passed into smgrread(), smgrwrite() and
smgrextend() are correctly aligned, unless PG_O_DIRECT is 0 (= stack
alignment tricks may be unavailable) or the block size has been set too
small to allow arrays of buffers to be all aligned.

Author: Thomas Munro <[email protected]>
Author: Andres Freund <[email protected]>
Reviewed-by: Justin Pryzby <[email protected]>
Discussion: https://postgr.es/m/CA+hUKGK1X532hYqJ_MzFWt0n1zt8trz980D79WbjwnT-yYLZpg@mail.gmail.com

Doc: Fix the datatype of the newly added SUBSCRIPTION options.

In docs, the datatype of "password_required" and "run_as_owner" was
incorrectly specified as a string.

Author: Amit Kapila
Reviewed-by: Sawada Masahiko
Discussion: https://postgr.es/m/CAHut+Pu=pnJf=SS1583pknSQ3CbOqLCkWcJCQYt6zxTagHEdmw@mail.gmail.com

Add missing .gitignore entry.

Seems an oversight in 7d8219a44. Fix before somebody commits
a generated file.

Remove useless dependencies in daitch_mokotoff_header.pl.

Actually, the correct fix for this is "we don't need this at all",
because this program isn't dealing in any non-ASCII data. The
dependency on Data::Dumper seems to be a leftover too.

Discussion: https://postgr.es/m/3287943.1680922997@sss.pgh.pa.us

Add support for Kerberos credential delegation

Support GSSAPI/Kerberos credentials being delegated to the server by a
client. With this, a user authenticating to PostgreSQL using Kerberos
(GSSAPI) credentials can choose to delegate their credentials to the
PostgreSQL server (which can choose to accept them, or not), allowing
the server to then use those delegated credentials to connect to
another service, such as with postgres_fdw or dblink or theoretically
any other service which is able to be authenticated using Kerberos.

Both postgres_fdw and dblink are changed to allow non-superuser
password-less connections but only when GSSAPI credentials have been
delegated to the server by the client and GSSAPI is used to
authenticate to the remote system.

Authors: Stephen Frost, Peifeng Qiu
Reviewed-By: David Christensen
Discussion: https://postgr.es/m/CO1PR05MB8023CC2CB575E0FAAD7DF4F8A8E29@CO1PR05MB8023.namprd05.prod.outlook.com

Pacify perlcritic.

Discussion: https://postgr.es/m/3271512.1680916423@sss.pgh.pa.us

Track IO times in pg_stat_io

a9c70b46dbe and 8aaa04b32S added counting of IO operations to a new view,
pg_stat_io. Now, add IO timing for reads, writes, extends, and fsyncs to
pg_stat_io as well.

This combines the tracking for pgBufferUsage with the tracking for pg_stat_io
into a new function pgstat_count_io_op_time(). This should make it a bit
easier to avoid the somewhat costly instr_time conversion done for
pgBufferUsage.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Bertrand Drouvot <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ay5iKmnbXZ3DsauViF3eMxu4m1oNnJXqV_HyqYeg55Ww%40mail.gmail.com

Show more detail in nbtree rmgr descriptions.

Show a detailed description of the page offset number arrays that appear
in certain nbtree WAL records.

Also brings nbtree desc routines in line with the guidelines established
by recent commit 7d8219a4.

Author: Melanie Plageman <[email protected]>
Reviewed-By: Peter Geoghegan <[email protected]>
Discussion: https://postgr.es/m/flat/20230109215842.fktuhesvayno6o4g%40awork3.anarazel.de

For Kerberos testing, disable DNS lookups

Similar to 8dff2f224, this disables DNS lookups by the Kerberos library
to look up the KDC and the realm while the Kerberos tests are running.
In some environments, these lookups can take a long time and end up
timing out and causing tests to fail. Further, since this isn't really
our domain, we shouldn't be sending out these DNS requests during our
tests.

Show more detail in heapam rmgr descriptions.

Add helper functions that output arrays in a standard format, and use
the functions inside heapdesc routines. This allows tools like
pg_walinspect to show a detailed description of the page offset number
arrays for records like PRUNE and VACUUM (unless there was an FPI).

Also document the conventions that desc routines should follow. Only
the heapdesc routines follow the conventions for now, so they're just
guidelines for the time being.

Based on a suggestion from Andres Freund.

Author: Melanie Plageman <[email protected]>
Reviewed-By: Peter Geoghegan <[email protected]>
Discussion: https://postgr.es/m/flat/20230109215842.fktuhesvayno6o4g%40awork3.anarazel.de

Adjust contrib/sepgsql regression test expected outputs.

Per buildfarm, the log output has changed as a consequence of
commit e056c557a changing the catalog accesses performed in
some commands.

Discussion: https://postgr.es/m/20230407211950 [email protected]

Add support for Daitch-Mokotoff Soundex in contrib/fuzzystrmatch.

This modernized version of Soundex works significantly better than
the original, particularly for non-English names.

Dag Lem, reviewed by quite a few people along the way

Discussion: https://postgr.es/m/[email protected]

Fix table name clash in recently introduced test

A few buildfarm animals recently started complaining about the "child"
relation already existing. e056c557aef added a new child table to inherit.sql,
but triggers.sql, running in the same parallel group, also uses a child table.

Rename the new table to inh_child. It maybe worth renaming child, parent in
other tests as well, but that's work for another day.

Discussion: https://postgr.es/m/20230407204530 [email protected]

Improve IO accounting for temp relation writes

Both pgstat_database and pgBufferUsage count IO timing for reads of temporary
relation blocks into local buffers. However, both failed to count write IO
timing for flushes of dirty local buffers. Fix.

Additionally, FlushRelationBuffers() seems to have omitted counting write
IO (both count and timing) stats for both pgstat_database and
pgBufferUsage. Fix.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Discussion: https://postgr.es/m/20230321023451.7rzy4kjj2iktrg2r%40awork3.anarazel.de

Test SCRAM iteration changes with psql \password

A version of this test was included in the original patch for altering
SCRAM iteration count, but was omitted due to how interactive psql TAP
sessions worked before being refactored.

Discussion: https://postgr.es/m/20230130194350 [email protected]
Discussion: https://postgr.es/m/F72E7BC7-189F-4B17-BF47-9735EB72C364@yesql.se

Refactor background psql TAP functions

This breaks out the background and interactive psql functionality into a
new class, PostgreSQL::Test::BackgroundPsql.  Sessions are still initiated
via PostgreSQL::Test::Cluster, but once started they can be manipulated by
the new helper functions which intend to make querying easier.  A sample
session for a command which can be expected to finish at a later time can
be seen below.

  my $session = $node->background_psql('postgres');
  $bsession->query_until(qr/start/, q(
    \echo start
CREATE INDEX CONCURRENTLY idx ON t(a);
  ));
  $bsession->quit;

Patch by Andres Freund with some additional hacking by me.

Author: Andres Freund <[email protected]>
Reviewed-by: Andrew Dunstan <[email protected]>
Discussion: https://postgr.es/m/20230130194350 [email protected]

Fix underspecified sort order in test query

Fail in e056c557aef4.

Add pg_buffercache_usage_counts() to contrib/pg_buffercache.

It was pointed out that pg_buffercache_summary()'s report of
the overall average usage count isn't that useful, and what
would be more helpful in many cases is to report totals for
each possible usage count. Add a new function to do it like
that. Since pg_buffercache 1.4 is already new for v16,
we don't need to create a new extension version; we'll just
define this as part of 1.4.

Nathan Bossart

Discussion: https://postgr.es/m/20230130233040.GA2800702@nathanxps13

Catalog NOT NULL constraints

We now create pg_constaint rows for NOT NULL constraints with
contype='n'.

We propagate these constraints during operations such as adding
inheritance relationships, creating and attaching partitions, creating
tables LIKE other tables.  We mostly follow the well-known rules of
conislocal and coninhcount that we have for CHECK constraints, with some
adaptations; for example, as opposed to CHECK constraints, we don't
match NOT NULL ones by name when descending a hierarchy to alter it;
instead we match by column number.  This means we don't require the
constraint names to be identical across a hierarchy.

For now, we omit them from system catalogs.  Maybe this is worth
reconsidering.  We don't support NOT VALID nor DEFERRABLE clauses
either; these can be added as separate features later (this patch is
already large and complicated enough.)

This has been very long in the making.  The first patch was written by
Bernd Helmle in 2010 to add a new pg_constraint.contype value ('n'),
which I (Álvaro) then hijacked in 2011 and 2012, until that one was
killed by the realization that we ought to use contype='c' instead:
manufactured CHECK constraints.  However, later SQL standard
development, as well as nonobvious emergent properties of that design
(mostly, failure to distinguish them from "normal" CHECK constraints as
well as the performance implication of having to test the CHECK
expression) led us to reconsider this choice, so now the current
implementation uses contype='n' again.

In 2016 Vitaly Burovoy also worked on this feature[1] but found no
consensus for his proposed approach, which was claimed to be closer to
the letter of the standard, requiring additional pg_attribute columns to
track the OID of the NOT NULL constraint for that column.
[1] https://postgr.es/m/CAKOSWNkN6HSyatuys8xZxzRCR-KL1OkHS5-b9qd9bf1Rad3PLA@mail.gmail.com

Author: Álvaro Herrera <[email protected]>
Author: Bernd Helmle <[email protected]>
Reviewed-by: Justin Pryzby <[email protected]>
Reviewed-by: Peter Eisentraut <[email protected]>
Discussion: https://postgr.es/m/CACA0E642A0267EDA387AF2B%40%5B172.26.14.62%5D
Discussion: https://postgr.es/m/[email protected]
Discussion: https://postgr.es/m/20110707213401 [email protected]
Discussion: https://postgr.es/m/1343682669 [email protected]
Discussion: https://postgr.es/m/CAKOSWNkN6HSyatuys8xZxzRCR-KL1OkHS5-b9qd9bf1Rad3PLA@mail.gmail.com
Discussion: https://postgr.es/m/20220817181249 [email protected]

Doc: improve descriptions of max_[pred_]locks_per_transaction GUCs.

The old wording described these as being multiplied by max_connections
plus max_prepared_transactions, which hasn't been exactly right for
some time thanks to the addition of various auxiliary processes.
Moreover, exactness here is a bit pointless given that the lock tables
can expand into the initially-unallocated "slop" space in shared
memory. Rather than trying to track exactly what the code is doing,
let's just use the term "server processes".

Likewise adjust these GUCs' description strings in guc_tables.c.

Wang Wei, reviewed by Nathan Bossart and myself

Discussion: https://postgr.es/m/OS3PR01MB6275BDD09C9B875C65FCC5AB9EA39@OS3PR01MB6275.jpnprd01.prod.outlook.com

Add array_sample() and array_shuffle() functions.

These are useful in Monte Carlo applications.

Martin Kalcher, reviewed/adjusted by Daniel Gustafsson and myself

Discussion: https://postgr.es/m/9d160a44-7675-51e8-60cf-6d64b76db831@aboutsource.net

Fix locale-dependent test case.

psql parses the interval argument of \watch with locale-dependent
strtod(). In commit 00beecfe8 I added a test case that exercises
a fractional interval, but I hard-coded 0.01 which doesn't work
in locales where the radix point isn't ".". We don't want to
change this longstanding parsing behavior, so fix the test case
to generate a suitably locale-aware spelling.

Report and patch by Alexander Korotkov.

Discussion: https://postgr.es/m/CAPpHfdv+10Uk6FWjsh3+ju7kHYr76LaRXbYayXmrM7FBU-=Hgg@mail.gmail.com

Fix copy-paste bug in 12f3867f553 triggering an assert after a write error

The same condition accidentally was copied to both branches. Manual testing
confirms that otherwise the error recovery path works fine.

Found while reviewing the logical-decoding-on-standby patch.

Add tab-completion for newly added SUBSCRIPTION options.

Commits c3afe8cf5a and 482675987b added new subscription options
"password_required" and "run_as_owner". This patch adds tab-completion
for these newly added options.

Author: Peter Smith
Discussion: https://postgr.es/m/CAHut+Pu=pnJf=SS1583pknSQ3CbOqLCkWcJCQYt6zxTagHEdmw@mail.gmail.com

Add more protections in WAL record APIs against overflows

This commit adds a limit to the size of an XLogRecord at 1020MB, based
on a suggestion by Heikki Linnakangas.  This counts for the overhead
needed by the XLogReader when allocating the memory it needs to read a
record in DecodeXLogRecordRequiredSpace(), based on the record size.  An
assertion based on that is added to detect that any additions in the
XLogReader facilities would not cause any overflows.  If that's ever the
case, the upper bound allowed would need to be adjusted.

Before this, it was possible for an external module to create WAL
records large enough to be assembled but not replayable, causing
failures when replaying such WAL records on standbys.  One case
mentioned where this is possible is the in-core function
pg_logical_emit_message() (wrapper for LogLogicalMessage), that allows
to emit WAL records with an arbitrary amount of data potentially higher
than the replay limit of approximately 1GB (limit of a palloc, minus the
overhead needed by a XLogReader).

This commit is a follow-up of ffd1b6b that has added similar protections
for the block-level data.  Here, the checks are extended to the whole
record length, mainrdata_len being extended from uint32 to uint64 with
the routines registering buffer and record data still limited to uint32
to minimize the checks when assembling a record.  All the error messages
related to overflow checks are improved to provide more context about
the error happening.

Author: Matthias van de Meent
Reviewed-by: Andres Freund, Heikki Linnakangas, Michael Paquier
Discussion: https://postgr.es/m/CAEze2WgGiw+LZt+vHf8tWqB_6VxeLsMeoAuod0N=ij1q17n5pw@mail.gmail.com

Use ExtendBufferedRelTo() in XLogReadBufferExtended()

Instead of extending the relation block-by-block, use ExtendBufferedRelTo(),
introduced in 31966b151e6. This is faster and simpler.

This also somewhat reduces the danger that disconnected segments pose (which
can be "discovered" once the previous segment reaches SEGSIZE), as
ExtendBufferedRelTo() won't extend past the block it has been asked. However,
the risk of the content of such a disconnected segment being invalid
remains.

Discussion: https://postgr.es/m/20221029025420 [email protected]
Discussion: https://postgr.es/m/20230223010147 [email protected]

Add --buffer-usage-limit option to vacuumdb

1cbbee033 added BUFFER_USAGE_LIMIT to the VACUUM and ANALYZE commands, so
here we permit that option to be specified in vacuumdb.

In passing, adjust the documents for vacuum_buffer_usage_limit and the
BUFFER_USAGE_LIMIT VACUUM option to mention "kB" rather than "KB". Do the
same for the ERROR message in ExecVacuum() and
check_vacuum_buffer_usage_limit(). Without that we might tell a user that
the valid minimum value is 128 KB only to reject that because we accept
only "kB" and not "KB".

Also, add a small reminder comment in vacuum.h to try to trigger the
memory of anyone adding new fields to VacuumParams that they might want to
consider if vacuumdb needs to grow a new option too.

Author: Melanie Plageman
Reviewed-by: Justin Pryzby
Reviewed-by: David Rowley
Discussion: https://postgr.es/m/[email protected]

hio: Use ExtendBufferedRelBy() to extend tables more efficiently

While we already had some form of bulk extension for relations, it was fairly
limited. It only amortized the cost of acquiring the extension lock, the
relation itself was still extended one-by-one. Bulk extension was also solely
triggered by contention, not by the amount of data inserted.

To address this, use ExtendBufferedRelBy(), introduced in 31966b151e6, to
extend the relation. We try to extend the relation by multiple blocks in two
situations:

1) The caller tells RelationGetBufferForTuple() that it will need multiple
   pages. For now that's only used by heap_multi_insert(), see commit FIXME.

2) If there is contention on the extension lock, use the number of waiters for
   the lock as a multiplier for the number of blocks to extend by. This is
   similar to what we already did. Previously we additionally multiplied the
   numbers of waiters by 20, but with the new relation extension
   infrastructure I could not see a benefit in doing so.

Using the freespacemap to provide empty pages can cause significant
contention, and adds measurable overhead, even if there is no contention. To
reduce that, remember the blocks the relation was extended by in the
BulkInsertState, in the extending backend. In case 1) from above, the blocks
the extending backend needs are not entered into the FSM, as we know that we
will need those blocks.

One complication with using the FSM to record empty pages, is that we need to
insert blocks into the FSM, when we already hold a buffer content lock. To
avoid doing IO while holding a content lock, release the content lock before
recording free space. Currently that opens a small window in which another
backend could fill the block, if a concurrent VACUUM records the free
space. If that happens, we retry, similar to the already existing case when
otherBuffer is provided. In the future it might be worth closing the race by
preventing VACUUM from recording the space in newly extended pages.

This change provides very significant wins (3x at 16 clients, on my
workstation) for concurrent COPY into a single relation. Even single threaded
COPY is measurably faster, primarily due to not dirtying pages while
extending, if supported by the operating system (see commit 4d330a61bb1). Even
single-row INSERTs benefit, although to a much smaller degree, as the relation
extension lock rarely is the primary bottleneck.

Reviewed-by: Melanie Plageman <[email protected]>
Discussion: https://postgr.es/m/20221029025420 [email protected]

Add VACUUM/ANALYZE BUFFER_USAGE_LIMIT option

Add new options to the VACUUM and ANALYZE commands called
BUFFER_USAGE_LIMIT to allow users more control over how large to make the
buffer access strategy that is used to limit the usage of buffers in
shared buffers.  Larger rings can allow VACUUM to run more quickly but
have the drawback of VACUUM possibly evicting more buffers from shared
buffers that might be useful for other queries running on the database.

Here we also add a new GUC named vacuum_buffer_usage_limit which controls
how large to make the access strategy when it's not specified in the
VACUUM/ANALYZE command.  This defaults to 256KB, which is the same size as
the access strategy was prior to this change.  This setting also
controls how large to make the buffer access strategy for autovacuum.

Per idea by Andres Freund.

Author: Melanie Plageman
Reviewed-by: David Rowley
Reviewed-by: Andres Freund
Reviewed-by: Justin Pryzby
Reviewed-by: Bharath Rupireddy
Discussion: https://postgr.es/m/20230111182720 [email protected]

heapam: Pass number of required pages to RelationGetBufferForTuple()

A future commit will use this information to determine how aggressively to
extend the relation by. In heap_multi_insert() we know accurately how many
pages we need once we need to extend the relation, providing an accurate lower
bound for how much to extend.

Reviewed-by: Melanie Plageman <[email protected]>
Discussion: https://postgr.es/m/20221029025420 [email protected]

Refresh cost-based delay params more frequently in autovacuum

Allow autovacuum to reload the config file more often so that cost-based
delay parameters can take effect while VACUUMing a relation. Previously,
autovacuum workers only reloaded the config file once per relation
vacuumed, so config changes could not take effect until beginning to
vacuum the next table.

Now, check if a reload is pending roughly once per block, when checking
if we need to delay.

In order for autovacuum workers to safely update their own cost delay
and cost limit parameters without impacting performance, we had to
rethink when and how these values were accessed.

Previously, an autovacuum worker's wi_cost_limit was set only at the
beginning of vacuuming a table, after reloading the config file.
Therefore, at the time that autovac_balance_cost() was called, workers
vacuuming tables with no cost-related storage parameters could still
have different values for their wi_cost_limit_base and wi_cost_delay.

Now that the cost parameters can be updated while vacuuming a table,
workers will (within some margin of error) have no reason to have
different values for cost limit and cost delay (in the absence of
cost-related storage parameters). This removes the rationale for keeping
cost limit and cost delay in shared memory. Balancing the cost limit
requires only the number of active autovacuum workers vacuuming a table
with no cost-based storage parameters.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Masahiko Sawada <[email protected]>
Reviewed-by: Daniel Gustafsson <[email protected]>
Reviewed-by: Kyotaro Horiguchi <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/CAAKRu_ZngzqnEODc7LmS1NH04Kt6Y9huSjz5pp7%2BDXhrjDA0gw%40mail.gmail.com

Separate vacuum cost variables from GUCs

Vacuum code run both by autovacuum workers and a backend doing
VACUUM/ANALYZE previously inspected VacuumCostLimit and VacuumCostDelay,
which are the global variables backing the GUCs vacuum_cost_limit and
vacuum_cost_delay.

Autovacuum workers needed to override these variables with their
own values, derived from autovacuum_vacuum_cost_limit and
autovacuum_vacuum_cost_delay and worker cost limit balancing logic.
This led to confusing code which, in some cases, both derived and
set a new value of VacuumCostLimit from VacuumCostLimit.

In preparation for refreshing these GUC values more often, introduce
new, independent global variables and add a function to update them
using the GUCs and existing logic.

Per suggestion by Kyotaro Horiguchi

Author: Melanie Plageman <[email protected]>
Reviewed-by: Masahiko Sawada <[email protected]>
Reviewed-by: Daniel Gustafsson <[email protected]>
Reviewed-by: Kyotaro Horiguchi <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/CAAKRu_ZngzqnEODc7LmS1NH04Kt6Y9huSjz5pp7%2BDXhrjDA0gw%40mail.gmail.com

Make vacuum failsafe_active globally visible

While vacuuming a table in failsafe mode, VacuumCostActive should
not be re-enabled. This currently isn't a problem because vacuum
cost parameters are only refreshed in between vacuuming tables and
failsafe status is reset for every table.

In preparation for allowing vacuum cost parameters to be updated
more frequently, elevate LVRelState->failsafe_active to a global,
VacuumFailsafeActive, which will be checked when determining whether
or not to re-enable vacuum cost-related delays.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Masahiko Sawada <[email protected]>
Reviewed-by: Daniel Gustafsson <[email protected]>
Reviewed-by: Kyotaro Horiguchi <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/CAAKRu_ZngzqnEODc7LmS1NH04Kt6Y9huSjz5pp7%2BDXhrjDA0gw%40mail.gmail.com

Stabilize just-added regression test cases.

The tests added by commits 029dea882 et al turn out to produce
different output under -DRANDOMIZE_ALLOCATED_MEMORY.  This is
not a bug exactly: that flag causes coerce_type() to invoke
the input function twice when coercing an unknown-type literal
to a specific type.  So you get tsqueryin's bleat about an empty
tsquery twice.  Revise the test query to avoid that.

Discussion: https://postgr.es/m/20230406213813 [email protected]

psql: set SHELL_ERROR and SHELL_EXIT_CODE in more places.

Make the \g, \o, \w, and \copy commands set these variables
when closing a pipe. We missed doing this in commit b0d8f2d98,
but it seems like a good idea.

There are some remaining places in psql that intentionally don't
update these variables after running a child program:
* pager invocations
* backtick evaluation within a prompt
* \e (edit query buffer)

Corey Huinker and Tom Lane

Discussion: https://postgr.es/m/CADkLM=eSKwRGF-rnRqhtBORRtL49QsjcVUCa-kLxKTqxypsakw@mail.gmail.com

Fix ts_headline() edge cases for empty query and empty search text.

tsquery's GETQUERY() macro is only safe to apply to a tsquery
that is known non-empty; otherwise it gives a pointer to garbage.
Before commit 5a617d75d, ts_headline() avoided this pitfall, but
only in a very indirect, nonobvious way.  (hlCover could not reach
its TS_execute call, because if the query contains no lexemes
then hlFirstIndex would surely return -1.)  After that commit,
it fell into the trap, resulting in weird errors such as
"unrecognized operator" and/or valgrind complaints.  In HEAD,
fix this by not calling TS_execute_locations() at all for an
empty query.  In the back branches, add a defensive check to
hlCover() --- that's not fixing any live bug, but I judge the
code a bit too fragile as-is.

Also, both mark_hl_fragments() and mark_hl_words() were careless
about the possibility of empty search text: in the cases where
no match has been found, they'd end up telling mark_fragment() to
mark from word indexes 0 to 0 inclusive, even when there is no
word 0.  This is harmless since we over-allocated the prs->words
array, but it does annoy valgrind.  Fix so that the end index is -1
and thus mark_fragment() will do nothing in such cases.

Bottom line is that this fixes a live bug in HEAD, but in the
back branches it's only getting rid of a valgrind nitpick.
Back-patch anyway.

Per report from Alexander Lakhin.

Discussion: https://postgr.es/m/c27f642d-020b-01ff-ae61-086af287c4fd@gmail.com

hio: Don't pin the VM while holding buffer lock while extending

Starting with commit 7db0cd2145f, RelationGetBufferForTuple() did a
visibilitymap_pin() while holding an exclusive buffer content lock on a newly
extended page, when using COPY FREEZE. We elsewhere try hard to avoid to doing
IO while holding a content lock. And until 14f98e0af99, that happened while
holding the relation extension lock.

Practically, this isn't a huge issue, because COPY FREEZE is restricted to
relations created or truncated in the current session, so it's unlikely
there's a lot of contention.

We can't avoid doing IO while holding the content lock by pinning the VM
earlier, because we don't know which page it will be on.

While we could just ignore the issue in this case, a future commit will add
bulk relation extension, which needs to enter pages into the FSM while also
trying to hold onto a buffer lock.

To address this issue, use visibilitymap_pin_ok() to see if the relevant
buffer is already pinned. If not, release the buffer, pin the VM buffer, and
acquire the lock again. This opens up a small window for other backends to
insert data onto the page - as the page is not entered into the freespacemap,
other backends won't see it normally, but a concurrent vacuum could enter the
page, if it started just after the relation is extended. In case the page is
used by another backend, retry. This is very similar to how locking
"otherBuffer" is already dealt with.

Reviewed-by: Tomas Vondra <[email protected]>
Discussion: http://postgr.es/m/20230325025740 [email protected]

hio: Relax rules for calling GetVisibilityMapPins()

GetVisibilityMapPins() insisted on the buffer1/buffer2 being in a specific
order. This required checks at the callsite. As a subsequent patch will add
another callsite, move related logic into GetVisibilityMapPins().

Discussion: https://postgr.es/m/20230403190030 [email protected]

psql: add an optional execution-count limit to \watch.

\watch can now be told to stop after N executions of the query.

With the idea that we might want to add more options to \watch
in future, this patch generalizes the command's syntax to a list
of name=value options, with the interval allowed to omit the name
for backwards compatibility.

Andrey Borodin, reviewed by Kyotaro Horiguchi, Nathan Bossart,
Michael Paquier, Yugo Nagata, and myself

Discussion: https://postgr.es/m/CAAhFRxiZ2-n_L1ErMm9AZjgmUK=qS6VHb+0SaMn8sqqbhF7How@mail.gmail.com

Support long distance matching for zstd compression

zstd compression supports a special mode for finding matched in distant
past, which may result in better compression ratio, at the expense of
using more memory (the window size is 128MB).

To enable this optional mode, use the "long" keyword when specifying the
compression method (--compress=zstd:long).

Author: Justin Pryzby
Reviewed-by: Tomas Vondra, Jacob Champion
Discussion: https://postgr.es/m/20230224191840 [email protected]
Discussion: https://postgr.es/m/20220327205020 [email protected]

postgres_fdw: Add support for parallel abort.

postgres_fdw aborts remote (sub)transactions opened on remote server(s)
in a local (sub)transaction one by one when the local (sub)transaction
aborts.  This patch allows it to abort the remote (sub)transactions in
parallel to improve performance.  This is enabled by the server option
"parallel_abort".  The default is false.

Etsuro Fujita, reviewed by David Zhang.

Discussion: http://postgr.es/m/CAPmGK15FuPVGx3TGHKShsbPKKtF1y58-ZLcKoxfN-nqLj1dZ%3Dg%40mail.gmail.com

Move various prechecks from vacuum() into ExecVacuum()

vacuum() is used for both the VACUUM command and for autovacuum. There
were many prechecks being done inside vacuum() that were just not relevant
to autovacuum.  Let's move the bulk of these into ExecVacuum() so that
they're only executed when running the VACUUM command.  This removes a
small amount of overhead when autovacuum vacuums a table.

While we are at it, allocate VACUUM's BufferAccessStrategy in ExecVacuum()
and pass it into vacuum() instead of expecting vacuum() to make it if it's
not already made by the calling function.  To make this work, we need to
create the vacuum memory context slightly earlier, so we now need to pass
that down to vacuum() so that it's available for use in other memory
allocations.

Author: Melanie Plageman
Reviewed-by: David Rowley
Discussion: https://postgr.es/m/20230405211534 [email protected]

Convert many uses of ReadBuffer[Extended](P_NEW) to ExtendBufferedRel()

A few places are not converted. Some because they are tackled in later
commits (e.g. hio.c, xlogutils.c), some because they are more
complicated (e.g. brin_pageops.c). Having a few users of ReadBuffer(P_NEW) is
good anyway, to ensure the backward compat path stays working.

Discussion: https://postgr.es/m/20221029025420 [email protected]

Use ExtendBufferedRelTo() in {vm,fsm}_extend()

This uses ExtendBufferedRelTo(), introduced in 31966b151e6, to extend the
visibilitymap and freespacemap to the size needed.

It also happens to fix a warning introduced in 3d6a98457d8, reported by Tom
Lane.

Discussion: https://postgr.es/m/20221029025420 [email protected]
Discussion: https://postgr.es/m/2194723.1680736788@sss.pgh.pa.us

Always make a BufferAccessStrategy for ANALYZE

32fbe0239 changed things so we didn't bother allocating the
BufferAccessStrategy during VACUUM (ONLY_DATABASE_STATS); and VACUUM
(FULL), however, it forgot to consider that VACUUM (FULL, ANALYZE) is a
possible combination. That change would have resulted in such a command
allowing ANALYZE to make full use of shared buffers, which wasn't
intended, so fix that.

Reported-by: Melanie Plageman
Discussion: https://postgr.es/m/CAAKRu_bJRKe+v_=OqwC+5sA3j5qv8rqdAwy3+yHaO3wmtfrCRg@mail.gmail.com

Fix row tracking in pg_stat_statements with extended query protocol

pg_stat_statements relies on EState->es_processed to count the number of
rows processed by ExecutorRun().  This proves to be a problem under the
extended query protocol when the result of a query is fetched through
more than one call of ExecutorRun(), as es_processed is reset each time
ExecutorRun() is called.  This causes pg_stat_statements to report the
number of rows calculated in the last execute fetch, rather than the
global sum of all the rows processed.

As pquery.c tells, this is a problem when a portal does not use
holdStore.  For example, DMLs with RETURNING would report a correct
tuple count as these do one execution cycle when the query is first
executed to fill in the portal's store with one ExecutorRun(), feeding
on the portal's store for each follow-up execute fetch depending on the
fetch size requested by the client.

The fix proposed for this issue is simple with the addition of an extra
counter in EState that's preserved across multiple ExecutorRun() calls,
incremented with the value calculated in es_processed.  This approach is
not back-patchable, unfortunately.

Note that libpq does not currently give any way to control the fetch
size when using the extended v3 protocol, meaning that in-core testing
is not possible yet.  This issue can be easily verified with the JDBC
driver, though, with *autocommit disabled*.  Hence, having in-core tests
requires more features, left for future discussion:
- At least two new libpq routines splitting PQsendQueryGuts(), one for
the bind/describe and a second for a series of execute fetches with a
custom fetch size, likely in a fashion similar to what JDBC does.
- A psql meta-command for the execute phase.  This part is not strictly
mandatory, still it could be handy.

Reported-by: Andrew Dunstan (original discovery by Simon Siggs)
Author: Sami Imseih
Reviewed-by: Tom Lane, Michael Paquier
Discussion: https://postgr.es/m/EBE6C507-9EB6-4142-9E4D-38B1673363A7@amazon.com
Discussion: https://postgr.es/m/c90890e7-9c89-c34f-d3c5-d5c763a34bd8@dunslane.net

bufmgr: Introduce infrastructure for faster relation extension

The primary bottlenecks for relation extension are:

1) The extension lock is held while acquiring a victim buffer for the new
   page. Acquiring a victim buffer can require writing out the old page
   contents including possibly needing to flush WAL.

2) When extending via ReadBuffer() et al, we write a zero page during the
   extension, and then later write out the actual page contents. This can
   nearly double the write rate.

3) The existing bulk relation extension infrastructure in hio.c just amortized
   the cost of acquiring the relation extension lock, but none of the other
   costs.

Unfortunately 1) cannot currently be addressed in a central manner as the
callers to ReadBuffer() need to acquire the extension lock. To address that,
this this commit moves the responsibility for acquiring the extension lock
into bufmgr.c functions. That allows to acquire the relation extension lock
for just the required time. This will also allow us to improve relation
extension further, without changing callers.

The reason we write all-zeroes pages during relation extension is that we hope
to get ENOSPC errors earlier that way (largely works, except for CoW
filesystems). It is easier to handle out-of-space errors gracefully if the
page doesn't yet contain actual tuples. This commit addresses 2), by using the
recently introduced smgrzeroextend(), which extends the relation, without
dirtying the kernel page cache for all the extended pages.

To address 3), this commit introduces a function to extend a relation by
multiple blocks at a time.

There are three new exposed functions: ExtendBufferedRel() for extending the
relation by a single block, ExtendBufferedRelBy() to extend a relation by
multiple blocks at once, and ExtendBufferedRelTo() for extending a relation up
to a certain size.

To avoid duplicating code between ReadBuffer(P_NEW) and the new functions,
ReadBuffer(P_NEW) now implements relation extension with
ExtendBufferedRel(), using a flag to tell ExtendBufferedRel() that the
relation lock is already held.

Note that this commit does not yet lead to a meaningful performance or
scalability improvement - for that uses of ReadBuffer(P_NEW) will need to be
converted to ExtendBuffered*(), which will be done in subsequent commits.

Reviewed-by: Heikki Linnakangas <[email protected]>
Reviewed-by: Melanie Plageman <[email protected]>
Discussion: https://postgr.es/m/20221029025420 [email protected]

Allow to use system CA pool for certificate verification

This adds a new option to libpq's sslrootcert, "system", which will load
the system trusted CA roots for certificate verification. This is a more
convenient way to achieve this than pointing to the system CA roots
manually since the location can differ by installation and be locally
adjusted by env vars in OpenSSL.

When sslrootcert is set to system, sslmode is forced to be verify-full
as weaker modes aren't providing much security for public CAs.

Changing the location of the system roots by setting environment vars is
not supported by LibreSSL so the tests will use a heuristic to determine
if the system being tested is LibreSSL or OpenSSL.

The workaround in .cirrus.yml is required to handle a strange interaction
between homebrew and the openssl@3 formula; hopefully this can be removed
in the near future.

The original patch was written by Thomas Habets, which was later revived
by Jacob Champion.

Author: Jacob Champion <[email protected]>
Author: Thomas Habets <[email protected]>
Reviewed-by: Jelte Fennema <[email protected]>
Reviewed-by: Andrew Dunstan <[email protected]>
Reviewed-by: Magnus Hagander <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/CA%2BkHd%2BcJwCUxVb-Gj_0ptr3_KZPwi3%2B67vK6HnLFBK9MzuYrLA%40mail.gmail.com

bufmgr: Support multiple in-progress IOs by using resowner

A future patch will add support for extending relations by multiple blocks at
once. To be concurrency safe, the buffers for those blocks need to be marked
as BM_IO_IN_PROGRESS. Until now we only had infrastructure for recovering from
an IO error for a single buffer. This commit extends that infrastructure to
multiple buffers by using the resource owner infrastructure.

This commit increases the size of the ResourceOwnerData struct, which appears
to have a just about measurable overhead in very extreme workloads. Medium
term we are planning to substantially shrink the size of
ResourceOwnerData. Short term the increase is small enough to not worry about
it for now.

Reviewed-by: Melanie Plageman <[email protected]>
Discussion: https://postgr.es/m/20221029025420 [email protected]
Discussion: https://postgr.es/m/20221029200025 [email protected]

Support "Right Anti Join" plan shapes.

Merge and hash joins can support antijoin with the non-nullable input
on the right, using very simple combinations of their existing logic
for right join and anti join. This gives the planner more freedom
about how to order the join. It's particularly useful for hash join,
since we may now have the option to hash the smaller table instead
of the larger.

Richard Guo, reviewed by Ronan Dunklau and myself

Discussion: https://postgr.es/m/CAMbWs48xh9hMzXzSy3VaPzGAz+fkxXXTUbCLohX1_L8THFRm2Q@mail.gmail.com