git.postgresql.org Git - postgresql.git/log

Inline CRC computation for small fixed-length input on x86

pg_crc32c.h now has a simplified copy of the loop in pg_crc32c_sse42.c
suitable for inlining where possible.

This may slightly reduce contention for the WAL insertion lock,
but that hasn't been tested. The motivation for this change is avoid
regressing for a future commit that will use a function pointer for
non-constant input in all x86 builds.

While it's technically possible to make a similar change for Arm and
LoongArch, there are some questions about how inlining should work
since those platforms prefer stricter alignment. There are also no
immediate plans to add additional implementations for them.

Reviewed-by: Nathan Bossart <[email protected]>
Reviewed-by: Raghuveer Devulapalli <[email protected]>
Discussion: https://postgr.es/m/CANWCAZZEiTzhZcuwTiJ2=opiNpAUn1vuDRu1N02z61AthwRZLA@mail.gmail.com
Discussion: https://postgr.es/m/CANWCAZYRhLHArpyfV4uRK-Rw9N5oV5HMkkKtBehcuTjNOMwCZg@mail.gmail.com

Add relallfrozen to pg_dump statistics.

Author: Corey Huinker <[email protected]>
Discussion: https://postgr.es/m/CADkLM=desCuf3dVHasADvdUVRmb-5gO0mhMO5u9nzgv6i7U86Q@mail.gmail.com

Enable IO concurrency on all systems

Previously effective_io_concurrency and maintenance_io_concurrency could not
be set above 0 on machines without fadvise support. AIO enables IO concurrency
without such support, via io_method=worker.

Currently only subsystems using the read stream API will take advantage of
this. Other users of maintenance_io_concurrency (like recovery prefetching)
which leverage OS advice directly will not benefit from this change. In those
cases, maintenance_io_concurrency will have no effect on I/O behavior.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Noah Misch <[email protected]>
Discussion: https://postgr.es/m/CAAKRu_atGgZePo=_g6T3cNtfMf0QxpvoUh5OUqa_cnPdhLd=gw@mail.gmail.com

read_stream: Introduce and use optional batchmode support

Submitting IO in larger batches can be more efficient than doing so
one-by-one, particularly for many small reads. It does, however, require
the ReadStreamBlockNumberCB callback to abide by the restrictions of AIO
batching (c.f. pgaio_enter_batchmode()). Basically, the callback may not:
a) block without first calling pgaio_submit_staged(), unless a
   to-be-waited-on lock cannot be part of a deadlock, e.g. because it is
   never held while waiting for IO.

b) directly or indirectly start another batch pgaio_enter_batchmode()

As this requires care and is nontrivial in some cases, batching is only
used with explicit opt-in.

This patch adds an explicit flag (READ_STREAM_USE_BATCHING) to read_stream and
uses it where appropriate.

There are two cases where batching would likely be beneficial, but where we
aren't using it yet:

1) bitmap heap scans, because the callback reads the VM

   This should soon be solved, because we are planning to remove the use of
   the VM, due to that not being sound.

2) The first phase of heap vacuum

   This could be made to support batchmode, but would require some care.

Reviewed-by: Noah Misch <[email protected]>
Reviewed-by: Thomas Munro <[email protected]>
Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt

aio: Basic read_stream adjustments for real AIO

Adapt the read stream logic for real AIO:
- If AIO is enabled, we shouldn't issue advice, but if it isn't, we should
  continue issuing advice
- AIO benefits from reading ahead with direct IO
- If effective_io_concurrency=0, pass READ_BUFFERS_SYNCHRONOUSLY to
  StartReadBuffers() to ensure synchronous IO execution

There are further improvements we should consider:

- While in read_stream_look_ahead(), we can use AIO batch submission mode for
  increased efficiency. That however requires care to avoid deadlocks and thus
  done separately.
- It can be beneficial to defer starting new IOs until we can issue multiple
  IOs at once. That however requires non-trivial heuristics to decide when to
  do so.

Reviewed-by: Noah Misch <[email protected]>
Co-authored-by: Andres Freund <[email protected]>
Co-authored-by: Thomas Munro <[email protected]>

docs: Reframe track_io_timing related docs as wait time

With AIO it does not make sense anymore to track the time for each individual
IO, as multiple IOs can be in-flight at the same time. Instead we now track
the time spent *waiting* for IOs.

This should be reflected in the docs. While, so far, we only do a subset of
reads, and no other operations, via AIO, describing the GUC and view columns
as measuring IO waits is accurate for synchronous and asynchronous IO.

Reviewed-by: Noah Misch <[email protected]>
Discussion: https://postgr.es/m/5dzyoduxlvfg55oqtjyjehez5uoq6hnwgzor4kkybkfdgkj7ag@rbi4gsmzaczk

bufmgr: Use AIO in StartReadBuffers()

This finally introduces the first actual use of AIO. StartReadBuffers() now
uses the AIO routines to issue IO.

As the implementation of StartReadBuffers() is also used by the functions for
reading individual blocks (StartReadBuffer() and through that
ReadBufferExtended()) this means all buffered read IO passes through the AIO
paths. However, as those are synchronous reads, actually performing the IO
asynchronously would be rarely beneficial. Instead such IOs are flagged to
always be executed synchronously. This way we don't have to duplicate a fair
bit of code.

When io_method=sync is used, the IO patterns generated after this change are
the same as before, i.e. actual reads are only issued in WaitReadBuffers() and
StartReadBuffers() may issue prefetch requests. This allows to bypass most of
the actual asynchronicity, which is important to make a change as big as this
less risky.

One thing worth calling out is that, if IO is actually executed
asynchronously, the precise meaning of what track_io_timing is measuring has
changed. Previously it tracked the time for each IO, but that does not make
sense when multiple IOs are executed concurrently. Now it only measures the
time actually spent waiting for IO. A subsequent commit will adjust the docs
for this.

While AIO is now actually used, the logic in read_stream.c will often prevent
using sufficiently many concurrent IOs. That will be addressed in the next
commit.

Reviewed-by: Noah Misch <[email protected]>
Reviewed-by: Nazir Bilal Yavuz <[email protected]>
Co-authored-by: Andres Freund <[email protected]>
Co-authored-by: Thomas Munro <[email protected]>
Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt
Discussion: https://postgr.es/m/20210223100344 [email protected]
Discussion: https://postgr.es/m/stj36ea6yyhoxtqkhpieia2z4krnam7qyetc57rfezgk4zgapf@gcnactj4z56m

bufmgr: Implement AIO read support

This commit implements the infrastructure to perform asynchronous reads into
the buffer pool.

To do so, it:

- Adds readv AIO callbacks for shared and local buffers

  It may be worth calling out that shared buffer completions may be run in a
  different backend than where the IO started.

- Adds an AIO wait reference to BufferDesc, to allow backends to wait for
  in-progress asynchronous IOs

- Adapts StartBufferIO(), WaitIO(), TerminateBufferIO(), and their localbuf.c
  equivalents, to be able to deal with AIO

- Moves the code to handle BM_PIN_COUNT_WAITER into a helper function, as it
  now also needs to be called on IO completion

As of this commit, nothing issues AIO on shared/local buffers. A future commit
will update StartReadBuffers() to do so.

Buffer reads executed through this infrastructure will report invalid page /
checksum errors / warnings differently than before:

In the error case the error message will cover all the blocks that were
included in the read, rather than just the reporting the first invalid
block. If more than one block is invalid, the error will include information
about the range of the read, the first invalid block and the number of invalid
pages, with a HINT towards the server log for per-block details.

For the warning case (i.e. zero_damaged_buffers) we would previously emit one
warning message for each buffer in a multi-block read. Now there is only a
single warning message for the entire read, again referring to the server log
for more details in case of multiple checksum failures within a single larger
read.

Reviewed-by: Noah Misch <[email protected]>
Reviewed-by: Melanie Plageman <[email protected]>
Reviewed-by: Nazir Bilal Yavuz <[email protected]>
Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt
Discussion: https://postgr.es/m/20210223100344 [email protected]
Discussion: https://postgr.es/m/stj36ea6yyhoxtqkhpieia2z4krnam7qyetc57rfezgk4zgapf@gcnactj4z56m

aio: Add WARNING result status

If an IO succeeds, but issues a warning, e.g. due to a page verification
failure with zero_damaged_pages, we want to issue that warning in the context
of the issuer of the IO, not the process that executes the completion (always
the case for worker).

It's already possible for a completion callback to report a custom error
message, we just didn't have a result status that allowed a user of AIO to
know that a warning should be emitted even though the IO request succeeded.

All that's needed for that is a dedicated PGAIO_RS_ value.

Previously there were not enough bits in PgAioResult.id for the new
value. Increase. While at that, add defines for the amount of bits and static
asserts to check that the widths are appropriate.

Reviewed-by: Noah Misch <[email protected]>
Discussion: https://postgr.es/m/20250329212929 [email protected]

Let caller of PageIsVerified() control ignore_checksum_failure

For AIO the completion of a read into shared buffers (i.e. verifying the page
including the checksum, updating the BufferDesc to reflect the IO) can happen
in a different backend than the backend that started the IO. As
ignore_checksum_failure can differ between backends, we need to allow the
caller of PageIsVerified() control whether to ignore checksum failures.

The commit leaves a gap in the PIV_* values, as an upcoming commit, which
depends on this commit, will add PIV_LOG_LOG, which better fits just after
PIV_LOG_WARNING.

Reviewed-by: Noah Misch <[email protected]>
Discussion: https://postgr.es/m/20250329212929 [email protected]

pgstat: Allow checksum errors to be reported in critical sections

For AIO we execute completion callbacks in critical sections (to ensure that
AIO can in the future be used for WAL, which in turn requires that we can call
completion callbacks in critical sections, to get the resources for WAL
io). To report checksum errors a backend now has to call
pgstat_prepare_report_checksum_failure(), before entering a critical section,
which guarantees the relevant pgstats entry is in shared memory, the relevant
DSM segment is mapped into the backend's memory and the address is known via a
PgStat_EntryRef.

Reviewed-by: Noah Misch <[email protected]>
Discussion: https://postgr.es/m/wkjj4p2rmkevutkwc6tewoovdqznj6c6nvjmvii4oo5wmbh5sr@retq7d6uqs4j

Add errhint_internal()

We have errmsg_internal(), errdetail_internal(), but not errhint_internal().

Sometimes it is useful to output a hint with already translated format
string (e.g. because there different messages depending on the condition). For
message/detail we do that with the _internal() variants, but we can't do that
with hint today.  It's possible to work around that that by using something
like
  str = psprintf(translated_format, args);
  ereport(...
          errhint("%s", str);
but that's not exactly pretty and makes it harder to avoid memory leaks.

Reviewed-by: Noah Misch <[email protected]>
Discussion: https://postgr.es/m/ym3dqpa4xcvoeknewcw63x77vnqdosbqcetjinb2zfoh65k55m@m4ozmwhr6lk6

Remove incidental md5() function use from test

Replace md5() with sha256() in tests introduced in 14ffaece0fb5, to
allow test to pass in OpenSSL FIPS mode.

Reported-by: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/3518736.1743307492@sss.pgh.pa.us

localbuf: Track pincount in BufferDesc as well

For AIO on temporary table buffers the AIO subsystem needs to be able to
ensure a pin on a buffer while AIO is going on, even if the IO issuing query
errors out. Tracking the buffer in LocalRefCount does not work, as it would
cause CheckForLocalBufferLeaks() to assert out.

Instead, also track the refcount in BufferDesc.state, not just
LocalRefCount. This also makes local buffers behave a bit more akin to shared
buffers.

Note that we still don't need locking, AIO completion callbacks for local
buffers are executed in the issuing session (i.e. nobody else has access to
the BufferDesc).

Reviewed-by: Noah Misch <[email protected]>
Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt

aio, bufmgr: Comment fixes/improvements

Some of these comments have been wrong for a while (12f3867f5534), some I
recently introduced (da7226993fd, 55b454d0e14). This includes an update to a
comment in FlushBuffer(), which will be copied in a future commit.

These changes seem big enough to be worth doing in separate commits.

Suggested-by: Noah Misch <[email protected]>
Discussion: https://postgr.es/m/20250319212530 [email protected]

aio: Implement support for reads in smgr/md/fd

This implements the following:

1) An smgr AIO target, for AIO on smgr files. This should be usable not just
for md.c but also other SMGR implementation if we ever get them.
2) readv support in fd.c, which requires a small bit of infrastructure work in
fd.c
3) smgr.c and md.c support for readv

There still is nothing performing AIO, but as of this commit it would be
possible.

As part of this change FileGetRawDesc() actually ensures that the file is
opened - previously it was basically not usable. It's used to reopen a file in
IO workers.

Reviewed-by: Noah Misch <[email protected]>
Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt
Discussion: https://postgr.es/m/20210223100344 [email protected]
Discussion: https://postgr.es/m/stj36ea6yyhoxtqkhpieia2z4krnam7qyetc57rfezgk4zgapf@gcnactj4z56m

Fix mis-attribution of checksum failure stats to the wrong database

Checksum failure stats could be attributed to the wrong database in two cases:

- when a read of a shared relation encountered a checksum error , it would be
  attributed to the current database, instead of the "database" representing
  shared relations

- when using CREATE DATABASE ... STRATEGY WAL_LOG checksum errors in the
  source database would be attributed to the current database

The checksum stats reporting via PageIsVerifiedExtended(PIV_REPORT_STAT) does
not have access to the information about what database a page belongs to.

This fixes the issue by removing PIV_REPORT_STAT and delegating the
responsibility to report stats to the caller, which now can learn about the
number of stats via a new optional argument.

As this changes the signature of PageIsVerifiedExtended() and all callers
should adapt to the new signature, use the occasion to rename the function to
PageIsVerified() and remove the compatibility macro.

We could instead have fixed this by adding information about the database to
the args of PageIsVerified(), but there are soon-to-be-applied patches that
need to separate the stats reporting from the PageIsVerified() call
anyway. Those patches also include testing for the failure paths, something we
inexplicably have not had.

As there is no caller of pgstat_report_checksum_failure() left, remove it.

It'd be possible, but awkward to fix this in the back branches. We considered
doing the work not quite worth it, as mis-attributed stats should still elicit
concern. The emitted error messages do allow to attribute the errors
correctly.

Discussion: https://postgr.es/m/5tyic6epvdlmd6eddgelv47syg2b5cpwffjam54axp25xyq2ga@ptwkinxqo3az
Discussion: https://postgr.es/m/mglpvvbhighzuwudjxzu4br65qqcxsnyvio3nl4fbog3qknwhg@e4gt7npsohuz

amcheck: Add a GIN index to the CREATE INDEX CONCURRENTLY tests

The existing CREATE INDEX CONCURRENTLY tests checking only B-Tree, but
can be cheaply extended to also check GIN. This helps increasing test
coverage for GIN amcheck, especially related to handling concurrent page
splits and posting list trees.

This already helped to identify several issues during development of the
GIN amcheck support.

Author: Mark Dilger <[email protected]>
Reviewed-By: Tomas Vondra <[email protected]>
Reviewed-By: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/BC221A56-977C-418E-A1B8-9EFC881D80C5%40enterprisedb.com

amcheck: Add a test with GIN index on JSONB data

Extend the existing test of GIN checks to also include an index on JSONB
data, using the jsonb_path_ops opclass. This is a common enough usage of
GIN that it makes sense to have better test coverage for it.

Author: Mark Dilger <[email protected]>
Reviewed-By: Tomas Vondra <[email protected]>
Reviewed-By: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/BC221A56-977C-418E-A1B8-9EFC881D80C5%40enterprisedb.com

amcheck: Fix indentation in verify_gin.c

I forgot to reindent the code after a couple last-minute adjustments
just before committing 14ffaece0fb53fed8ddbc46d2b353e1c4834863a.

Discussion: https://postgr.es/m/45AC9B0A-2B45-40EE-B08F-BDCF5739D1E1%40yandex-team.ru

Fix "‘static’ is not at beginning of declaration" warning

b98be8a2a2a used "const static" instead of "static const". We normally use the
latter form.

Discussion: https://postgr.es/m/z4mc2hzecahyq3paupfsouhuupmzmgum45md3k5my6bmo7gvn7@z5j26doqamqy

amcheck: Add gin_index_check() to verify GIN index

Adds a new function, validating two kinds of invariants on a GIN index:

- parent-child consistency: Paths in a GIN graph have to contain
  consistent keys. Tuples on parent pages consistently include tuples
  from child pages; parent tuples do not require any adjustments.

- balanced-tree / graph: Each internal page has at least one downlink,
  and can reference either only leaf pages or only internal pages.

The GIN verification is based on work by Grigory Kryachko, reworked by
Heikki Linnakangas and with various improvements by Andrey Borodin.
Investigation and fixes for multiple bugs by Kirill Reshke.

Author: Grigory Kryachko <[email protected]>
Author: Heikki Linnakangas <[email protected]>
Author: Andrey Borodin <[email protected]>
Reviewed-By: José Villanova <[email protected]>
Reviewed-By: Aleksander Alekseev <[email protected]>
Reviewed-By: Nikolay Samokhvalov <[email protected]>
Reviewed-By: Andres Freund <[email protected]>
Reviewed-By: Tomas Vondra <[email protected]>
Reviewed-By: Kirill Reshke <[email protected]>
Reviewed-By: Mark Dilger <[email protected]>
Reviewed-By: Peter Geoghegan <[email protected]>
Discussion: https://postgr.es/m/45AC9B0A-2B45-40EE-B08F-BDCF5739D1E1%40yandex-team.ru

pgbench: Make set_random_seed() 64-bit everywhere.

Delete an intermediate variable, a redundant cast, a use of long and a
use of long long.  scanf() the seed directly into a uint64, now that we
can do that with SCNu64 from <inttypes.h>.

The previous coding was from pre-C99 times when %lld might not have been
there, so it read into an unsigned long.  Therefore behavior varied
by OS, and --random-seed would accept either 32 or 64 bit seeds.  Now
it's the same everywhere.

Author: Thomas Munro <[email protected]>
Discussion: https://postgr.es/m/b936d2fb-590d-49c3-a615-92c3a88c6c19%40eisentraut.org

amcheck: Move common routines into a separate module

Before performing checks on an index, we need to take some safety
measures that apply to all index AMs. This includes:

* verifying that the index can be checked - Only selected AMs are
supported by amcheck (right now only B-Tree). The index has to be
valid and not a temporary index from another session.

* changing (and then restoring) user's security context

* obtaining proper locks on the index (and table, if needed)

* discarding GUC changes from the index functions

Until now this was implemented in the B-Tree amcheck module, but it's
something every AM will have to do. So relocate the code into a new
module verify_common for reuse.

The shared steps are implemented by amcheck_lock_relation_and_check(),
receiving the AM-specific verification as a callback. Custom parameters
may be supplied using a pointer.

Author: Andrey Borodin <[email protected]>
Reviewed-By: José Villanova <[email protected]>
Reviewed-By: Aleksander Alekseev <[email protected]>
Reviewed-By: Nikolay Samokhvalov <[email protected]>
Reviewed-By: Andres Freund <[email protected]>
Reviewed-By: Tomas Vondra <[email protected]>
Reviewed-By: Mark Dilger <[email protected]>
Reviewed-By: Peter Geoghegan <[email protected]>
Reviewed-By: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/45AC9B0A-2B45-40EE-B08F-BDCF5739D1E1%40yandex-team.ru

Fix grammar in GIN README

Author: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/CALdSSPgu9uAhVYojQ0yjG%3Dq5MaqmiSLUJPhz%2B-u7cA6K6Mc9UA%40mail.gmail.com

Fix MERGE with DO NOTHING actions into a partitioned table.

ExecInitPartitionInfo() duplicates much of the logic in
ExecInitMerge(), except that it failed to handle DO NOTHING
actions. This would cause an "unknown action in MERGE WHEN clause"
error if a MERGE with any DO NOTHING actions attempted to insert into
a partition not already initialised by ExecInitModifyTable().

Bug: #18871
Reported-by: Alexander Lakhin <[email protected]>
Author: Tender Wang <[email protected]>
Reviewed-by: Gurjeet Singh <[email protected]>
Discussion: https://postgr.es/m/18871-b44e3c96de3bd2e8%40postgresql.org
Backpatch-through: 15

Use PRI?64 instead of "ll?" in format strings (continued).

Continuation of work started in commit 15a79c73, after initial trial.

Author: Thomas Munro <[email protected]>
Discussion: https://postgr.es/m/b936d2fb-590d-49c3-a615-92c3a88c6c19%40eisentraut.org

Matview statistics depend on matview data.

REFRESH MATERIALIZED VIEW replaces the storage, which resets
statistics, so statistics must be restored afterward.

If both statistics and data are being dumped for a materialized view,
add a dependency from the former to the latter. Defer the statistics
to SECTION_POST_DATA, and use RESTORE_PASS_POST_ACL.

Reported-by: Ashutosh Bapat <[email protected]>
Reviewed-by: Ashutosh Bapat <[email protected]>
Discussion: https://postgr.es/m/CAExHW5s47kmubpbbRJzSM-Zfe0Tj2O3GBagB7YAyE8rQ-V24Uw@mail.gmail.com

Make group_similar_or_args() reorder clause list as little as possible

Currently, group_similar_or_args() permutes original positions of clauses
independently on whether it manages to find any groups of similar clauses.
While we are not providing any strict warranties on saving the original order
of OR-clauses, it is preferred that the original order be modified as little
as possible.

This commit changes the reordering algorithm of group_similar_or_args() in
the following way.  We reorder each group of similar clauses so that the
first item of the group stays in place, but all the other items are moved
after it.  So, if there are no similar clauses, the order of clauses stays
the same.  When there are some groups, only required reordering happens while
the rest of the clauses remain in their places.

Reported-by: Andrei Lepikhov <[email protected]>
Discussion: https://postgr.es/m/3ac7c436-81e1-4191-9caf-b0dd70b51511%40gmail.com
Reviewed-by: Pavel Borisov <[email protected]>
Reviewed-by: Andrei Lepikhov <[email protected]>
Reviewed-by: Alena Rybakina <[email protected]>

Optimize popcount functions with ARM SVE intrinsics.

This commit introduces SVE implementations of pg_popcount{32,64}.
Unlike the Neon versions, we need an additional configure-time
check to determine if the compiler supports SVE intrinsics, and we
need a runtime check to determine if the current CPU supports SVE
instructions. Our testing showed that the SVE implementations are
much faster for larger inputs and are comparable to the status
quo for smaller inputs.

Author: "[email protected]" <[email protected]>
Co-authored-by: "[email protected]" <[email protected]>
Co-authored-by: "Malladi, Rama" <[email protected]>
Reviewed-by: John Naylor <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/010101936e4aaa70-b474ab9e-b9ce-474d-a3ba-a3dc223d295c-000000%40us-west-2.amazonses.com
Discussion: https://postgr.es/m/OSZPR01MB84990A9A02A3515C6E85A65B8B2A2%40OSZPR01MB8499.jpnprd01.prod.outlook.com

Revert "Tidy up locale thread safety in ECPG library."

This reverts commit 8e993bff5326b00ced137c837fce7cd1e0ecae14.

It causes various build failures on the buildfarm, to be investigated.

Discussion: https://postgr.es/m/CWZBBRR6YA8D.8EHMDRGLCKCD%40neon.tech

Optimize popcount functions with ARM Neon intrinsics.

This commit introduces Neon implementations of pg_popcount{32,64},
pg_popcount(), and pg_popcount_masked(). As in simd.h, we assume
that all available AArch64 hardware supports Neon, so we don't need
any new configure-time or runtime checks. Some compilers already
emit Neon instructions for these functions, but our hand-rolled
implementations for pg_popcount() and pg_popcount_masked()
performed better in testing, likely due to better instruction-level
parallelism.

Author: "[email protected]" <[email protected]>
Reviewed-by: John Naylor <[email protected]>
Discussion: https://postgr.es/m/010101936e4aaa70-b474ab9e-b9ce-474d-a3ba-a3dc223d295c-000000%40us-west-2.amazonses.com

Fix crash if LockErrorCleanup() is called twice

The refactoring in commit 3c0fd64fec removed the clearing of
awaitedLock from LockErrorCleanup(). It's still needed, otherwise
LockErrorCleanup() during abort processing will try to update the
LOCALLOCK struct even after the lock has already been released. Put it
back.

Reported-by: Richard Guo <[email protected]>
Reported-by: Robins Tharakan <[email protected]>
Reported-by: Alexander Lakhin <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/CAMbWs4_dNX1SzBmvFdoY-LxJh_4W_BjtVd5i008ihfU-wFF=eg@mail.gmail.com
Discussion: https://www.postgresql.org/message-id/18832-38e5575b1bbd7277@postgresql.org
Discussion: https://www.postgresql.org/message-id/e11a30e5-c0d8-491d-8546-3a1b50c10ad4@gmail.com

Rename TRY_POPCNT_FAST to TRY_POPCNT_X86_64.

This macro protects x86_64-specific code, and a subsequent commit
will introduce AArch64-specific versions of that code. To prevent
confusion, let's rename it to clearly indicate that it's for
x86_64. We should likely move this code to its own file (perhaps
merging it with the AVX-512 popcount code), but that is left as a
future exercise.

Reviewed-by: "[email protected]" <[email protected]>
Reviewed-by: John Naylor <[email protected]>
Discussion: https://postgr.es/m/010101936e4aaa70-b474ab9e-b9ce-474d-a3ba-a3dc223d295c-000000%40us-west-2.amazonses.com

Fix timestamp overflow in UUIDv7 implementation.

The uuidv7_interval() function previously converted a shifted
microsecond-precision timestamp (64-bit integer) to another 64-bit
integer representing a timestamp with nanosecond precision. This
conversion caused overflow for dates beyond the year 2262. The
millisecond and sub-millisecond parts were then extracted from this
nanosecond-precision timestamp and stored in UUIDv7 values.

With this commit, the millisecond and sub-millisecond parts are stored
directly into the UUIDv7 value without being converted back to a
nanosecond precision timestamp. Following RFC 9562, the timestamp is
stored as an unsigned integer, enabling support for dates up to the
year 10889.

Reported and fixed by Andrey Borodin, with cosmetic changes and
regression tests by me.

Reported-by: Andrey Borodin <[email protected]>
Author: Andrey Borodin <[email protected]>
Discussion: https://postgr.es/m/96DEC2D9-659A-40E8-B7BA-AF5D162A9E21@yandex-team.ru

Tidy up locale thread safety in ECPG library.

Remove setlocale() and _configthreadlocal() as fallback strategy on
systems that don't have uselocale(), where ECPG tries to control
LC_NUMERIC formatting on input and output of floating point numbers.  It
was probably broken on some systems (NetBSD), and the code was also
quite messy and complicated, with obsolete configure tests (Windows).
It was also arguably broken, or at least had unstated environmental
requirements, if pgtypeslib code was called directly.

Instead, introduce PG_C_LOCALE to refer to the "C" locale as a locale_t
value.  It maps to the special constant LC_C_LOCALE when defined by libc
(macOS, NetBSD), or otherwise uses a process-lifetime locale_t that is
allocated on first use, just as ECPG previously did itself.  The new
replacement might be more widely useful.  Then change the float parsing
and printing code to pass that to _l() functions where appropriate.

Unfortunately the portability of those functions is a bit complicated.
First, many obvious and useful _l() functions are missing from POSIX,
though most standard libraries define some of them anyway.  Second,
although the thread-safe save/restore technique can be used to replace
the missing ones, Windows and NetBSD refused to implement standard
uselocale().  They might have a point: "wide scope" uselocale() is hard
to combine with other code and error-prone, especially in library code.
Luckily they have the  _l() functions we want so far anyway.  So we have
to be prepared for both ways of doing things:

1.  In ECPG, use strtod_l() for parsing, and supply a port.h replacement
using uselocale() over a limited scope if missing.

2.  Inside our own snprintf.c, use three different approaches to format
floats.  For frontend code, call libc's snprintf_l(), or wrap libc's
snprintf() in uselocale() if it's missing.  For backend code, snprintf.c
can keep assuming that the global locale's LC_NUMERIC is "C" and call
libc's snprintf() without change, for now.

(It might eventually be possible to call our in-tree Ryū routines to
display floats in snprintf.c, given the C-locale-always remit of our
in-tree snprintf(), but this patch doesn't risk changing anything that
complicated.)

Author: Thomas Munro <[email protected]>
Reviewed-by: Peter Eisentraut <[email protected]>
Reviewed-by: Tristan Partin <[email protected]>
Reviewed-by: Heikki Linnakangas <[email protected]>
Discussion: https://postgr.es/m/CWZBBRR6YA8D.8EHMDRGLCKCD%40neon.tech

Cast result of i64abs() back to int64

Without the cast, the return type could be long or long long,
depending on what int64 is underneath. This doesn't affect code
correctness, but it could result in format-mismatch warnings when
attempting to printf such values using PRId64.

Reported-by: Thomas Munro <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/CA+hUKGJc4s+Wyb3EFOQNN9VVK+Qv40r2LK41o9PkS9ThxviTvQ@mail.gmail.com

pg_overexplain: Use PG_MODULE_MAGIC_EXT.

I committed this contrib module just after Tom committed
55527368bd07248e91e3d37a782bf66b76f06865; adjust it to match.

Author: Man Zeng <[email protected]>
Discussion: http://postgr.es/m/174313513707.60295.16516085012903412705 [email protected]

pg_overexplain: Call previous hooks as appropriate.

It makes no sense to remember the previous values of the hook variables
and then never bother calling those functions. Thanks to Andrei for
spotting my goof.

Author: Andrei Lepikhov <[email protected]>
Discussion: http://postgr.es/m/41a344e3-ffb1-4296-8ba7-801f1e9642e5@gmail.com

Add support for not-null constraints on virtual generated columns

This was left out of the original patch for virtual generated columns
(commit 83ea6c54025).

This just involves a bit of extra work in the executor to expand the
generation expressions and run a "IS NOT NULL" test against them.

There is also a bit of work to make sure that not-null constraints are
checked during a table rewrite.

Author: jian he <[email protected]>
Reviewed-by: Xuneng Zhou <[email protected]>
Reviewed-by: Navneet Kumar <[email protected]>
Reviewed-by: Álvaro Herrera <[email protected]>
Discussion: https://postgr.es/m/CACJufxHArQysbDkWFmvK+D1TPHQWWTxWN15cMuUaTYX3xhQXgg@mail.gmail.com

Modernize some code a bit

Modernize code in ExecRelCheck() and ExecConstraints() a bit,
preparing the way for some new code.

Co-authored-by: jian he <[email protected]>
Reviewed-by: Xuneng Zhou <[email protected]>
Reviewed-by: Navneet Kumar <[email protected]>
Reviewed-by: Álvaro Herrera <[email protected]>
Discussion: https://postgr.es/m/CACJufxHArQysbDkWFmvK+D1TPHQWWTxWN15cMuUaTYX3xhQXgg@mail.gmail.com

Rename a node field for clarity

Rename ResultRelInfo.ri_ConstraintExprs to ri_CheckConstraintExprs.
This reflects its specific purpose better and avoids confusion with
adjacent fields with similar but distinct purposes.

Discussion: https://postgr.es/m/CACJufxHArQysbDkWFmvK+D1TPHQWWTxWN15cMuUaTYX3xhQXgg@mail.gmail.com

pg_createsubscriber: Add '--all' option.

The '--all' option indicates that the tool queries the source server
(publisher) for all databases and creates subscriptions on the target
server (subscriber) for databases with matching names. Without this user
needs to explicitly specify all databases by using -d option for each
database.

This simplifies converting a physical standby to a logical subscriber,
particularly during upgrades.

The options '--database', '--publication', '--subscription', and
'--replication-slot' cannot be used when '--all' is specified.

Author: Shubham Khanna <[email protected]>
Reviewed-by: vignesh C <[email protected]>
Reviewed-by: Ashutosh Bapat <[email protected]>
Reviewed-by: Euler Taveira <[email protected]>
Reviewed-by: Hayato Kuroda <[email protected]>
Reviewed-by: Amit Kapila <[email protected]>
Reviewed-by: Peter Smith <[email protected]>
Reviewed-by: Shlok Kyal <[email protected]>
Discussion: https://postgr.es/m/CAHv8RjKhA=_h5vAbozzJ1Opnv=KXYQHQ-fJyaMfqfRqPpnC2bA@mail.gmail.com

Use thread-safe strftime_l() instead of strftime().

This removes some setlocale() calls and a lot of commentary about how
dangerous that is. strftime_l() is from POSIX 2008, and on Windows we
use _wcsftime_l().

While here, adjust error message for strftime_l() failure: it does not
in practice set errno (even though POSIX says it could), so no %m.

Author: Thomas Munro <[email protected]>
Reviewed-by: Heikki Linnakangas <[email protected]>
Reviewed-by: Peter Eisentraut <[email protected]>
Discussion: https://postgr.es/m/CA%2BhUKGJqVe0%2BPv9dvC9dSums_PXxGo9SWcxYAMBguWJUGbWz-A%40mail.gmail.com

Stablize tests added in 3abe9dc188.

The problem is that after the ALTER SUBSCRIPTION tap_sub SET PUBLICATION
command, we didn't wait for the new walsender to start on the publisher.
Immediately after ALTER, we performed Insert and expected it to replicate.
However, the replication could start from a point after the INSERT location,
and as the subscription isn't copying initial data, we could miss such an
Insert.

The fix is to wait for connection to be established between publisher and
subscriber before starting DML operations that are expected to replicate.

As per CI.

Reported-by: Andres Freund <[email protected]>
Author: Hayato Kuroda <[email protected]>
Discussion: https://postgr.es/m/CALDaNm2ms1deM5EYNLFEfESv_Kw=Y4AiTB0LP=qGS-UpFwGbPg@mail.gmail.com

Fix guc_malloc calls for consistency and OOM checks

check_createrole_self_grant and check_synchronized_standby_slots
were allocating memory on a LOG elevel without checking if the
allocation succeeded or not, which would have led to a segfault
on allocation failure.

On top of that, a number of callsites were using the ERROR level,
relying on erroring out rather than returning false to allow the
GUC machinery handle it gracefully.  Other callsites used WARNING
instead of LOG.  While neither being not wrong, this changes all
check_ functions do it consistently with LOG.

init_custom_variable gets a promoted elevel to FATAL to keep
the guc_malloc error handling in line with the rest of the
error handling in that function which already call FATAL.  If
we encounter an OOM in this callsite there is no graceful
handling to be had, better to error out hard.

Backpatch the fix to check_createrole_self_grant down to v16
and the fix to check_synchronized_standby_slots down to v17
where they were introduced.

Author: Daniel Gustafsson <[email protected]>
Reported-by: Nikita <[email protected]>
Reviewed-by: Tom Lane <[email protected]>
Bug: #18845
Discussion: https://postgr.es/m/18845-582c6e10247377ec@postgresql.org
Backpatch-through: 16

Use streaming read I/O in heap amcheck

Instead of directly invoking ReadBuffer() for each unskippable block in
the heap relation, verify_heapam() now uses the read stream API to
acquire the next buffer to check for corruption.

Author: Matheus Alcantara <[email protected]>
Co-authored-by: Melanie Plageman <[email protected]>
Reviewed-by: Nazir Bilal Yavuz <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: jian he <[email protected]>
Discussion: https://postgr.es/m/flat/CAFY6G8eLyz7%2BsccegZYFj%3D5tAUR-GZ9uEq4Ch5gvwKqUwb_hCA%40mail.gmail.com

Prevent assertion failure in contrib/pg_freespacemap.

Applying pg_freespacemap() to a relation lacking storage (such as a
view) caused an assertion failure, although there was no ill effect
in non-assert builds. Add an error check for that case.

Bug: #18866
Reported-by: Robins Tharakan <[email protected]>
Author: Tender Wang <[email protected]>
Reviewed-by: Euler Taveira <[email protected]>
Discussion: https://postgr.es/m/18866-d68926d0f1c72d44@postgresql.org
Backpatch-through: 13

Avoid mixing designated and non-designated field initializers.

As revised by commit 9324c8c58, PG_MODULE_MAGIC constructed a
struct initializer containing both designated fields and a
non-designated "0". That's okay in C, but not in C++, with
the result that extensions written in C++ failed to compile.
Change it to use only designated field initializers.

Author: Yurii Rashkovskii <[email protected]>
Discussion: https://postgr.es/m/CAG=VW14mctsR543gpzLCuJ9JgJqwa=ptmBfGvxEjs+k8Jf7-Bg@mail.gmail.com

psql: Fix incorrect equality comparison

Commit 1a759c83278 contained an incorrect equality comparison
which was discovered by Coverity.

Reported-by: Ranier Vilela <[email protected]>
Discussion: https://postgr.es/m/CAEudQApfAWzLo+oSuy2byXktdr7R8KJC_ACT5VV8fontrL35Pw@mail.gmail.com

pg_overexplain: Filter out actual row count from test result.

Per buildfarm, these are not stable. In particular, 1/8 is sometimes
0.12 and sometimes 0.13.

Remove the query_id_squash_values GUC

Commit 62d712ecfd94 introduced the capability to calculate the same
queryId for queries with different lengths of constants in a list for an
IN clause. This behavior was originally enabled with a GUC
query_id_squash_values. After a discussion about the value of such a
GUC, it was decided to back out of the use of a GUC and make the
squashing behavior the only available option.

Author: Sami Imseih <[email protected]>
Discussion: https://postgr.es/m/[email protected]
Discussion: https://postgr.es/m/CA+q6zcVTK-3C-8NWV1oY2NZrvtnMCDqnyYYyk1T7WMUG65MeOQ@mail.gmail.com

Expand test a bit

Make pg_constraint output in inherit test show the convalidated column
as well. This shows the interaction between convalidated and
conenforced.

This is extracted from a larger patch so that this reformatting isn't
distracting there.

Author: Amul Sul <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/CAAJ_b962c5AcYW9KUt_R_ER5qs3fUGbe4az-SP-vuwPS-w-AGA@mail.gmail.com

Provide thread-safe pg_localeconv_r().

This involves four different implementation strategies:

1.  For Windows, we now require _configthreadlocale() to be available
and work (commit f1da075d9a0), and the documentation says that the
object returned by localeconv() is in thread-local memory.

2.  For glibc, we translate to nl_langinfo_l() calls, because it
offers the same information that way as an extension, and that API is
thread-safe.

3.  For macOS/*BSD, use localeconv_l(), which is thread-safe.

4.  For everything else, use uselocale() to set the locale for the
thread, and use a big ugly lock to defend against the returned object
being concurrently clobbered.  In practice this currently means only
Solaris.

The new call is used in pg_locale.c, replacing calls to setlocale() and
localeconv().

Author: Thomas Munro <[email protected]>
Reviewed-by: Heikki Linnakangas <[email protected]>
Reviewed-by: Peter Eisentraut <[email protected]>
Discussion: https://postgr.es/m/CA%2BhUKGJqVe0%2BPv9dvC9dSums_PXxGo9SWcxYAMBguWJUGbWz-A%40mail.gmail.com

Simplify syntax for ALTER TABLE ALTER CONSTRAINT NO INHERIT

Commit d45597f72fe5 introduced the ability to change a not-null
constraint from NO INHERIT to INHERIT and vice versa, but we included
the SET noise word in the syntax for it. The SET turns out not to be
necessary and goes against what the SQL standard says for other ALTER
TABLE subcommands, so remove it.

This changes the way this command is processed for constraint types
other than not-null, so there are some error message changes.

Reviewed-by: Peter Eisentraut <[email protected]>
Reviewed-by: Suraj Kharage <[email protected]>
Discussion: https://postgr.es/m/202503251602 [email protected]

libpq: Add TAP tests for service files and names

This commit adds a set of regression tests that checks various patterns
with service names and service files, with:
- Service file with no contents, used as default for PGSERVICEFILE to
prevent any lookups at the HOME directory of an environment where the
test is run.
- Service file with valid service name and its section.
- Service file at the root of PGSYSCONFDIR, named pg_service.conf.
- Missing service file.
- Service name defined as a connection parameter or as PGSERVICE.

Note that PGSYSCONFDIR is set to always point at a temporary directory
created by the test, so as we never try to look at SYSCONFDIR.

This set of tests has come up as a useful independent addition while
discussing a patch that adds an equivalent of PGSERVICEFILE as a
connection parameter as there have never been any tests for service
files and service names. Torsten Foertsch and Ryo Kanbayashi have
provided a basic implementation, that I have expanded to what is
introduced in this commit.

Author: Torsten Foertsch <[email protected]>
Author: Ryo Kanbayashi <[email protected]>
Author: Michael Paquier <[email protected]>
Discussion: https://postgr.es/m/CAKkG4_nCjx3a_F3gyXHSPWxD8Sd8URaM89wey7fG_9g7KBkOCQ@mail.gmail.com

Optimize Query jumble

f31aad9b0 adjusted query jumbling so it no longer ignores NULL nodes
during the jumble. This added some overhead. Here we tune a few
things to make jumbling faster again. This makes jumbling perform
similar or even slightly faster than prior to that change.

Author: David Rowley <[email protected]>
Reviewed-by: Michael Paquier <[email protected]>
Discussion: https://postgr.es/m/CAApHDvreP04nhTKuYsPw0F-YN+4nr4f=L72SPeFb81jfv+2c7w@mail.gmail.com

Fix query jumbling to account for NULL nodes

Previously NULL nodes were ignored.  This could cause issues where the
computed query ID could match for queries where fields that are next to
each other in their Node struct where one field was NULL and the other
non-NULL.  For example, the Query struct had distinctClause and sortClause
next to each other.  If someone wrote;

SELECT DISTINCT c1 FROM t;

and then;

SELECT c1 FROM t ORDER BY c1;

these would produce the same query ID since, in the first query, we
ignored the NULL sortClause and appended the jumble bytes for the
distictClause.  In the latter query, since we did nothing for the NULL
distinctClause then jumble the non-NULL sortClause, and since the node
representation stored is the same in both cases, the query IDs were
identical.

Here we fix this by always accounting for NULL nodes by recording that
we saw a NULL in the jumble buffer.  This fixes the issue as the order that
the NULL is recorded isn't the same in the above two queries.

Author: Bykov Ivan <[email protected]>
Author: Michael Paquier <[email protected]>
Author: David Rowley <[email protected]>
Discussion: https://postgr.es/m/aafce7966e234372b2ba876c0193f1e9%40localhost.localdomain

doc: Correct description of values used in FSM for indexes

The implementation of FSM for indexes is simpler than heap, where 0 is
used to track if a page is in-use and (BLCKSZ - 1) if a page is free.
One comment in indexfsm.c and one description in the documentation of
pg_freespacemap were incorrect about that.

Author: Alex Friedman <[email protected]>
Discussion: https://postgr.es/m/71eef655-c192-453f-ac45-2772fec2cb04@gmail.com
Backpatch-through: 13

aio: Add io_method=io_uring

Performing AIO using io_uring can be considerably faster than
io_method=worker, particularly when lots of small IOs are issued, as
a) the context-switch overhead for worker based AIO becomes more significant
b) the number of IO workers can become limiting

io_uring, however, is linux specific and requires an additional compile-time
dependency (liburing).

This implementation is fairly simple and there are substantial optimization
opportunities.

The description of the existing AIO_IO_COMPLETION wait event is updated to
make the difference between it and the new AIO_IO_URING_EXECUTION clearer.

Reviewed-by: Noah Misch <[email protected]>
Reviewed-by: Jakub Wartak <[email protected]>
Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt
Discussion: https://postgr.es/m/20210223100344 [email protected]
Discussion: https://postgr.es/m/stj36ea6yyhoxtqkhpieia2z4krnam7qyetc57rfezgk4zgapf@gcnactj4z56m

aio: Add liburing dependency

Will be used in a subsequent commit, to implement io_method=io_uring. Kept
separate for easier review.

Reviewed-by: Noah Misch <[email protected]>
Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt

doc: Mention possible ephemeral discrepancies in pg_stat_activity

Ephemeral inconsistencies across multiple attributes of pg_stat_activity
can exist as the system is designed to be efficient with a low overhead.
This question is raised by users from time to time based on the data
read in the view, so let's add a note in the docs about this
possibility.

Author: Alex Friedman <[email protected]>
Reviewed-by: Sami Imseih <[email protected]>
Discussion: https://postgr.es/m/8a275154-a654-44b0-ab37-197802f04c7b@gmail.com

aio: Rename pgaio_io_prep_* to pgaio_io_start_*

The old naming pattern (mirroring liburing's naming) was inconsistent with
the (not yet introduced) callers. It seems better to get rid of the
inconsistency now than to grow more users of the odd naming.

Reported-by: Noah Misch <[email protected]>
Discussion: https://postgr.es/m/20250326001915 [email protected]

aio: Pass result of local callbacks to ->report_return

Otherwise the results of e.g. temp table buffer verification errors will not
reach bufmgr.c. Obviously that's not right. Found while expanding the tests
for invalid buffer contents.

Reviewed-by: Noah Misch <[email protected]>
Discussion: https://postgr.es/m/20250326001915 [email protected]

aio: Be more paranoid about interrupts

As reported by Noah, it's possible, although practically very unlikely, that
interrupts could be processed in between pgaio_io_reopen() and
pgaio_io_perform_synchronously(). Prevent that by explicitly holding
interrupts.

It also seems good to add an assertion to pgaio_io_before_prep() to ensure
that interrupts are held, as otherwise FDs referenced by the IO could be
closed during interrupt processing. All code in the aio series currently runs
the code with interrupts held, but it seems better to be paranoid.

Reviewed-by: Noah Misch <[email protected]>
Reported-by: Noah Misch <[email protected]>
Discussion: https://postgr.es/m/20250324002939 [email protected]

pg_overexplain: SET jit=off when running tests.

Per buildfarm.

Fix oversights in commit 8d5ceb113e3f7ddb627bd40b26438a9d2fa05512

It added bogus whitespace at the end of a line in the documentation.
It should not have done that.

The pg_overexplain tests must SET debug_parallel_query = false,
not just RESET debug_parallel_query, or we get failures on test
machines that make debug_parallel_query = true the defualt.

pg_overexplain: Additional EXPLAIN options for debugging.

There's a fair amount of information in the Plan and PlanState trees
that isn't printed by any existing EXPLAIN option. This means that,
when working on the planner, it's often necessary to rely on facilities
such as debug_print_plan, which produce excessively voluminous
output. Hence, use the new EXPLAIN extension facilities to implement
EXPLAIN (DEBUG) and EXPLAIN (RANGE_TABLE) as extensions to the core
EXPLAIN facility.

A great deal more could be done here, and the specific choices about
what to print and how are definitely arguable, but this is at least
a starting point for discussion and a jumping-off point for possible
future improvements.

Reviewed-by: Sami Imseih <[email protected]>
Reviewed-by: Tom Lane <[email protected]>
Reviweed-by: Andrei Lepikhov <[email protected]> (who didn't like it)
Discussion: http://postgr.es/m/CA+TgmoZfvQUBWQ2P8iO30jywhfEAKyNzMZSR+uc2xr9PZBw6eQ@mail.gmail.com

Keep the decompressed filter in brin_bloom_union

The brin_bloom_union() function combines two BRIN summaries, by merging
one filter into the other. With bloom, we have to decompress the filters
first, but the function failed to update the summary to store the merged
filter. As a consequence, the index may be missing some of the data, and
return false negatives.

This issue exists since BRIN bloom indexes were introduced in Postgres
14, but at that point the union function was called only when two
sessions happened to summarize a range concurrently, which is rare. It
got much easier to hit in 17, as parallel builds use the union function
to merge summaries built by workers.

Fixed by storing a pointer to the decompressed filter, and freeing the
original one. Free the second filter too, if it was decompressed. The
freeing is not strictly necessary, because the union is called in
short-lived contexts, but it's tidy.

Backpatch to 14, where BRIN bloom indexes were introduced.

Reported by Arseniy Mukhin, investigation and fix by me.

Reported-by: Arseniy Mukhin
Discussion: https://postgr.es/m/18855-1cf1c8bcc22150e6%40postgresql.org
Backpatch-through: 14

Use PG_MODULE_MAGIC_EXT in our installable shared libraries.

It seems potentially useful to label our shared libraries with version
information, now that a facility exists for retrieving that.  This
patch labels them with the PG_VERSION string.  There was some
discussion about using semantic versioning conventions, but that
doesn't seem terribly helpful for modules with no SQL-level presence;
and for those that do have SQL objects, we typically expect them
to support multiple revisions of the SQL definitions, so it'd still
not be very helpful.

I did not label any of src/test/modules/.  It seems unnecessary since
we don't install those, and besides there ought to be someplace that
still provides test coverage for the original PG_MODULE_MAGIC macro.

Author: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/dd4d1b59-d0fe-49d5-b28f-1e463b68fa32@gmail.com

Introduce PG_MODULE_MAGIC_EXT macro.

This macro allows dynamically loaded shared libraries (modules) to
provide a wired-in module name and version, and possibly other
compile-time-constant fields in future. This information can be
retrieved with the new pg_get_loaded_modules() function.

This feature is expected to be particularly useful for modules
that do not have any exposed SQL functionality and thus are
not associated with a SQL-level extension object. But even for
modules that do belong to extensions, being able to verify the
actual code version can be useful.

Author: Andrei Lepikhov <[email protected]>
Reviewed-by: Yurii Rashkovskii <[email protected]>
Reviewed-by: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/dd4d1b59-d0fe-49d5-b28f-1e463b68fa32@gmail.com

Move GSSAPI includes into its own header

Due to a conflict in macro names on Windows between <wincrypt.h>
and <openssl/ssl.h> these headers need to be included using a
predictable pattern with an undef to handle that. The GSSAPI
header <gssapi.h> does include <wincrypt.h> which cause problems
with compiling PostgreSQL using MSVC when OpenSSL and GSSAPI are
both enabled in the tree. Rather than fixing piecemeal for each
file including gssapi headers, move the the includes and undef
to a new file which should be used to centralize the logic.

This patch is a reworked version of a patch by Imran Zaheer
proposed earlier in the thread. Once this has proven effective
in master we should look at backporting this as the problem
exist at least since v16.

Author: Daniel Gustafsson <[email protected]>
Co-authored-by: Imran Zaheer <[email protected]>
Reported-by: Dave Page <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: vignesh C <[email protected]>
Discussion: https://postgr.es/m/20240708173204 [email protected]

psql: Make test robust against locale variations

The test committed in 1a759c83278 was prone to failing when using
locales with a different decimal separator. Since the test value
isn't the important part, change to using an integer instead.

Author: Daniel Gustafsson <[email protected]>
Reported-by: Pavel Stehule <[email protected]>
Reviewed-by: Pavel Stehule <[email protected]>
Discussion: https://postgr.es/m/CAFj8pRDE=7uW7QP4rg-OQLE2i-puYsUUt+eHE-L6_b_J9w=eWg@mail.gmail.com

dblink: SCRAM authentication pass-through

This enables SCRAM authentication for dblink (using dblink_fdw) when
connecting to a foreign server without having to store a plain-text
password on user mapping options

This uses the same approach as it was implemented for postgres_fdw in
commit 761c79508e7. (It also contains the equivalent of the
subsequent fixes 76563f88cfb and d2028e9bbc1.)

Author: Matheus Alcantara <[email protected]>
Reviewed-by: Jacob Champion <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/CAFY6G8ercA1KES%3DE_0__R9QCTR805TTyYr1No8qF8ZxmMg8z2Q%40mail.gmail.com

Add support for gamma() and lgamma() functions.

These are useful general-purpose math functions which are included in
POSIX and C99, and are commonly included in other math libraries, so
expose them as SQL-callable functions.

Author: Dean Rasheed <[email protected]>
Reviewed-by: Stepan Neretin <[email protected]>
Reviewed-by: Peter Eisentraut <[email protected]>
Reviewed-by: Tom Lane <[email protected]>
Reviewed-by: Dmitry Koval <[email protected]>
Reviewed-by: Alexandra Wang <[email protected]>
Discussion: https://postgr.es/m/CAEZATCXpGyfjXCirFk9au+FvM0y2Ah+2-0WSJx7MO368ysNUPA@mail.gmail.com

Fix integer-overflow problem in scram_SaltedPassword()

Setting the iteration count for SCRAM secret generation to INT_MAX
will cause an infinite loop in scram_SaltedPassword() due to integer
overflow, as the loop uses the "i <= iterations" comparison. To fix,
use "i < iterations" instead.

Back-patch to v16 where the user-settable GUC scram_iterations has
been added.

Author: Kevin K Biju <[email protected]>
Reviewed-by: Richard Guo <[email protected]>
Reviewed-by: Michael Paquier <[email protected]>
Discussion: https://postgr.es/m/CAM45KeEMm8hnxdTOxA98qhfZ9CzGDdgy3mxgJmy0c+2WwjA6Zg@mail.gmail.com

Use relation name instead of OID in query jumbling for RangeTblEntry

custom_query_jumble (introduced in 5ac462e2b7ac as a node field
attribute) is now assigned to the expanded reference name "eref" of
RangeTblEntry, adding in the query jumble computation the non-qualified
aliased relation name, without the list of column names.  The relation
OID is removed from the query jumbling.

The effects of this change can be seen in the tests added by
3430215fe35f, where pg_stat_statements (PGSS) entries are now grouped
using the relation name, ignoring the relation search_path may point at.
For example, these two relations are different, but are now grouped in a
single PGSS entry as they are assigned the same query ID:
CREATE TABLE foo1.tab (a int);
CREATE TABLE foo2.tab (b int);
SET search_path = 'foo1';
SELECT count(*) FROM tab;
SET search_path = 'foo2';
SELECT count(*) FROM tab;
SELECT count(*) FROM foo1.tab;
SELECT count(*) FROM foo2.tab;
SELECT query, calls FROM pg_stat_statements WHERE query ~ 'FROM tab';
          query           | calls
--------------------------+-------
SELECT count(*) FROM tab |     4
(1 row)

It is still possible to use an alias in the FROM clause to split these.
This behavior is useful for relations re-created with the same name,
where queries based on such relations would be grouped in the same
PGSS entry.  For permanent schemas, it should not really matter in
practice.  The main benefit is for workloads that use a lot of temporary
relations, which are usually re-created with the same name continuously.
These can be a heavy source of bloat in PGSS depending on the workload.
Such entries can now be grouped together, improving the user experience.

The original idea from Christoph Berg used catalog lookups to find
temporary relations, something that the query jumble has never done, and
it could cause some performance regressions.  The idea to use
RangeTblEntry.eref and the relation name, applying the same rules for
all relations, temporary and not temporary, has been proposed by Tom
Lane.  The documentation additions have been suggested by Sami Imseih.

Author: Michael Paquier <[email protected]>
Co-authored-by: Sami Imseih <[email protected]>
Reviewed-by: Christoph Berg <[email protected]>
Reviewed-by: Lukas Fittl <[email protected]>
Reviewed-by: Sami Imseih <[email protected]>
Discussion: https://postgr.es/m/[email protected]

postgres_fdw: Fix tests on some Windows variants

The tests introduced by commit 76563f88cfb only work when Unix-domain
sockets are available. This is optional on Windows, and buildfarm
member drongo runs without them. To fix, skip the test if Unix-domain
sockets are not enabled.

Add pg_dump --with-{schema|data|statistics} options.

By adding the positive variants of options, in addition to the
negative variants that already exist, users can be explicit about what
pg_dump should produce.

Discussion: https://postgr.es/m/bd0513e4b1ea2b2f2d06f02720c6579711cb62a6 [email protected]
Reviewed-by: Corey Huinker <[email protected]>
Reviewed-by: Andres Freund <[email protected]>

Fix two issues with custom_query_jumble in gen_node_support.pl

A node field marked with custom_query_jumble and query_jumble_ignore
would generate some code of a custom routine. The script is changed so
as custom_query_jumble behaves like the other options in this case,
query_jumble_ignore taking priority, with no code generated.

A comment related to the code generated for node types was misplaced.

Thinkos introduced in 5ac462e2b7ac.

Reported-by: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/1324036.1742945060@sss.pgh.pa.us

Fix order of -I switches for building pg_regress.o.

We need the -I switch for libpq_srcdir to come before any -I switches
injected by configure.  Otherwise there is a risk of pulling in a
mismatched version of libpq_fe.h from someplace like
/usr/local/include, if the platform has another Postgres version
installed there.  This evidently accounts for today's buildfarm
failures on "anaconda".

In principle the -I switch for src/port/ is at similar hazard, and has
been for a very long time.  But the only .h files we keep there are
pg_config_paths.h and pthread-win32.h, neither of which get installed
on Unix-ish systems, so the odds of picking up a conflicting header
seem pretty small.  That doubtless accounts for the lack of prior
reports.

Back-patch to v17 where pg_regress acquired a build dependency on
libpq_fe.h.  We could go back further to fix the hazard for src/port/
in older branches, but it seems unlikely to be worth troubling over.

Reported-by: Nathan Bossart <[email protected]>
Author: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/Z-MhRzoc7t-nPUQG@nathan
Backpatch-through: 17

pg_stat_statements: Add more tests with temp tables and namespaces

These tests provide coverage for RangeTblEntry and how query jumbling
works with search_path, as well as the case where relations are
re-created, generating a different query ID as the relation OID is used
in the computation.

A patch is under discussion to switch to a different approach based on
the relation name, and there was no test coverage for this area,
including how queries are currently grouped with search_path. This is
useful to track how the situation changes between HEAD and any patches
proposed.

Christoph has proposed the test with ON COMMIT DROP temporary tables,
and I have written the second part.

Author: Christoph Berg <[email protected]>
Author: Michael Paquier <[email protected]>
Discussion: https://postgr.es/m/[email protected]

pg_upgrade: Add --swap for faster file transfer.

This new option instructs pg_upgrade to move the data directories
from the old cluster to the new cluster and then to replace the
catalog files with those generated for the new cluster.  This mode
can outperform --link, --clone, --copy, and --copy-file-range,
especially on clusters with many relations.

However, this mode creates many garbage files in the old cluster,
which can prolong the file synchronization step if
--sync-method=syncfs is used.  To handle that, we recommend using
--sync-method=fsync with this mode, and pg_upgrade internally uses
"initdb --sync-only --no-sync-data-files" for file synchronization.
pg_upgrade will synchronize the catalog files as they are
transferred.  We assume that the database files transferred from
the old cluster were synchronized prior to upgrade.

This mode also complicates reverting to the old cluster, so we
recommend restoring from backup upon failure during or after file
transfer.  We did consider teaching pg_upgrade how to generate a
revert script for such failures, but we decided against it due to
the rarity of failing during file transfer, the complexity of
generating the script, and the potential for misusing the script.

The new mode is limited to clusters located in the same file
system.  With some effort, we could probably support upgrades
between different file systems, but this mode is unlikely to offer
much benefit if we have to copy the files across file system
boundaries.

It is also limited to upgrades from version 10 or newer.  There are
a few known obstacles for using swap mode to upgrade from older
versions.  For example, the visibility map format changed in v9.6,
and the sequence tuple format changed in v10.  In fact, swap mode
omits the --sequence-data option in its uses of pg_dump and instead
reuses the old cluster's sequence data files.  While teaching swap
mode to deal with these kinds of changes is surely possible (and we
may have to deal with similar problems in the future, anyway), it
doesn't seem worth the effort to support upgrades from
long-unsupported versions.

Reviewed-by: Greg Sabino Mullane <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Discussion: https://postgr.es/m/Zyvop-LxLXBLrZil%40nathan

pg_dump: Add --sequence-data.

This new option instructs pg_dump to dump sequence data when the
--no-data, --schema-only, or --statistics-only option is specified.
This was originally considered for commit a7e5457db8, but it was
left out at that time because there was no known use-case. A
follow-up commit will use this to optimize pg_upgrade's file
transfer step.

Reviewed-by: Robert Haas <[email protected]>
Discussion: https://postgr.es/m/Zyvop-LxLXBLrZil%40nathan

initdb: Add --no-sync-data-files.

This new option instructs initdb to skip synchronizing any files
in database directories, the database directories themselves, and
the tablespace directories, i.e., everything in the base/
subdirectory and any other tablespace directories.  Other files,
such as those in pg_wal/ and pg_xact/, will still be synchronized
unless --no-sync is also specified.  --no-sync-data-files is
primarily intended for internal use by tools that separately ensure
the skipped files are synchronized to disk.  A follow-up commit
will use this to help optimize pg_upgrade's file transfer step.

The --sync-method=fsync implementation of this option makes use of
a new exclude_dir parameter for walkdir().  When not NULL,
exclude_dir specifies a directory to skip processing.  The
--sync-method=syncfs implementation of this option just skips
synchronizing the non-default tablespace directories.  This means
that initdb will still synchronize some or all of the database
files, but there's not much we can do about that.

Discussion: https://postgr.es/m/Zyvop-LxLXBLrZil%40nathan

Stats: use schemaname/relname instead of regclass.

For import and export, use schemaname/relname rather than
regclass.

This is more natural during export, fits with the other arguments
better, and it gives better control over error handling in case we
need to downgrade more errors to warnings.

Also, use text for the argument types for schemaname, relname, and
attname so that casts to "name" are not required.

Author: Corey Huinker <[email protected]>
Discussion: https://postgr.es/m/CADkLM=ceOSsx_=oe73QQ-BxUFR2Cwqum7-UP_fPe22DBY0NerA@mail.gmail.com

Minor doc update for commit 99f8f3fbbc.

Author: Corey Huinker <[email protected]>

psql: Make default \watch interval configurable

The default interval for \watch to wait between executing queries,
when executed without a specified interval, was hardcoded to two
seconds. This adds the new variable WATCH_INTERVAL which is used
to set the default interval, making it configurable for the user.
This makes \watch the first command which has a user configurable
default setting.

Author: Daniel Gustafsson <[email protected]>
Reviewed-by: Heikki Linnakangas <[email protected]>
Reviewed-by: Michael Paquier <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Masahiro Ikeda <[email protected]>
Reviewed-by: Laurenz Albe <[email protected]>
Reviewed-by: Greg Sabino Mullane <[email protected]>
Reviewed-by: Ashutosh Bapat <[email protected]>
Discussion: https://postgr.es/m/B2FD26B4-8F64-4552-A603-5CC3DF1C7103@yesql.se

pg_basebackup: Add missing PQclear in error path

This adds a missing PQclear in the error path of StreamLogicalLog, a
fix in the same vein as e889422d98e with an equivalent low impact.

Author: Steven Niu <[email protected]>
Reviewed-by: Daniel Gustafsson <[email protected]>
Discussion: https://postgr.es/m/c4b1c627-a3e4-4347-a670-1e28a43ce0eb@gmail.com

refactor: Pass relation OID instead of Relation to createForeignKeyCheckTriggers()

Currently, createForeignKeyCheckTriggers() takes a Relation type as
its first argument, but it doesn't use that argument directly.
Instead, it fetches the relation OID by calling RelationGetRelid().
Therefore, it would be more consistent with other functions (e.g.,
createForeignKeyCheckTriggers()) to pass the relation OID directly
instead of the whole Relation.

Author: Amul Sul <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/CAAJ_b962c5AcYW9KUt_R_ER5qs3fUGbe4az-SP-vuwPS-w-AGA@mail.gmail.com

refactor: Split ATExecAlterConstraintInternal()

Split ATExecAlterConstraintInternal() into two functions:
ATExecAlterConstrDeferrability() and
ATExecAlterConstrInheritability(). This simplifies the code and
avoids unnecessary confusion caused by recursive code, which isn't
needed for ATExecAlterConstrInheritability().

(This also takes over the changes in commit 64224a834ce, as the new
AlterConstrDeferrabilityRecurse() is essentially the old
ATExecAlterChildConstr().)

Author: Amul Sul <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/CAAJ_b962c5AcYW9KUt_R_ER5qs3fUGbe4az-SP-vuwPS-w-AGA@mail.gmail.com

refactor: Move some code that updates pg_constraint to a separate function

This extracts common/duplicate code for different ALTER CONSTRAINT
variants into a common function. We plan to add more variants that
would use the same code.

Author: Amul Sul <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/CAAJ_b962c5AcYW9KUt_R_ER5qs3fUGbe4az-SP-vuwPS-w-AGA@mail.gmail.com

Small fixes for Add ALTER TABLE ... ALTER CONSTRAINT ... SET [NO] INHERIT

Small fixes for commit f4e53e10b6c: Add missing calls to
InvokeObjectPostAlterHook() and also CacheInvalidateRelcache().  The
former change could have a user-visible effect.  The latter omission
might have caused other bugs, but it is not clear whether one actually
existed.  With these changes, the code is now more consistent with
similar ALTER CONSTRAINT variants, especially the ones that set the
deferrability.

Reviewed-by: Álvaro Herrera <[email protected]>
Discussion: https://postgr.es/m/CAF1DzPVfOW6Kk=7SSh7LbneQDJWh=PbJrEC_Wkzc24tHOyQWGg@mail.gmail.com

postgres_fdw: Remove redundant check in semijoin_target_ok()

If a var belongs to the innerrel of the joinrel, it's not possible that
it belongs to the outerrel. This commit removes the redundant check from
the if-clause but keeps it as an assertion.

Discussion: https://postgr.es/m/flat/CAHewXN=8aW4hd_W71F7Ua4+_w0=bppuvvTEBFBF6G0NuSXLwUw@mail.gmail.com
Author: Tender Wang <[email protected]>
Reviewed-by: Alexander Pyhalov <[email protected]>
Backpatch-through: 17

libpq: Deprecate pg_int64.

Previously we used pg_int64 in three function prototypes in libpq.  It
was added by commit 461ef73f to expose the platform-dependent type used
for int64 in the C89 era.  As of commit 962da900 it is defined as
standard int64_t, and the dust seems to have settled.

Let's just use int64_t directly in these three client-facing functions
instead of (yet) another name.  We've required C99 and thus <stdint.h>
since PostgreSQL 12, C89 and C++98 compilers are long gone, and client
applications very likely use standard types for their own 64-bit needs.
This also cleans up the obscure placement of a new #include <stdint.h>
directive in postgres_ext.h, required for the new definition.  The
typedef was hiding in there for historical reasons, but it doesn't fit
postgres_ext.h's own description of its purpose and there is no evidence
of client applications including postgres_ext.h directly to see it.

Keep a typedef marked deprecated for backward compatibility, but move it
into libpq-fe.h where it was used.

Reviewed-by: Peter Eisentraut <[email protected]>
Discussion: https://postgr.es/m/CA%2BhUKGKn_EkNNGMY5RzMcKP%2Ba6urT4JF%3DCPhw_zHtQwjvX6P2g%40mail.gmail.com

Generalize index support in network support function

The network (inet) support functions currently only supported a
hardcoded btree operator family. With the generalized compare type
facility, we can generalize this to support any operator family from
any index type that supports the required operators.

Author: Mark Dilger <[email protected]>
Co-authored-by: Peter Eisentraut <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com

Add support for custom_query_jumble as a node field attribute

This option gives the possibility for query jumble to define a custom
routine for the field of a Node, extending support for
custom_query_jumble as a node field attribute.  When dealing with
complex node structures, this can be simpler than having to enforce a
custom function across a full node.

Custom functions need to be defined in queryjumblefuncs.c, named as
_jumble${node}_${field}(), and use in input the JumbleState, the node
and its field.  The field is not really required if we have the Node,
but it makes custom implementations somewhat easier to think about.  The
code generated by gen_node_support.pl uses a macro called
JUMBLE_CUSTOM(), hiding the internals of the logic inside
queryjumblefuncs.c.

This will be used by an upcoming patch manipulating adding a custom
routine into a field of RangeTblEntry, but this facility can become
useful in more cases.

Reviewed-by: Christoph Berg <[email protected]>
Discussion: https://postgr.es/m/[email protected]

Remove 'additional' pointer from TupleHashEntryData.

Reduces memory required for hash aggregation by avoiding an allocation
and a pointer in the TupleHashEntryData structure. That structure is
used for all buckets, whether occupied or not, so the savings is
substantial.

Discussion: https://postgr.es/m/AApHDvpN4v3t_sdz4dvrv1Fx_ZPw=twSnxuTEytRYP7LFz5K9A@mail.gmail.com
Reviewed-by: David Rowley <[email protected]>

Add ExecCopySlotMinimalTupleExtra().

Allows an "extra" argument that allocates extra memory at the end of
the MinimalTuple. This is important for callers that need to store
additional data, but do not want to perform an additional allocation.

Suggested-by: David Rowley <[email protected]>
Discussion: https://postgr.es/m/CAApHDvppeqw2pNM-+ahBOJwq2QmC0hOAGsmCpC89QVmEoOvsdg@mail.gmail.com

Create accessor functions for TupleHashEntry.

Refactor for upcoming optimizations.

Reviewed-by: David Rowley <[email protected]>
Discussion: https://postgr.es/m/1cc3b400a0e8eead18ff967436fa9e42c0c14cfb [email protected]