Fix race in SSI interaction with empty btrees.
authorThomas Munro <[email protected]>
Mon, 3 Jul 2023 04:16:27 +0000 (16:16 +1200)
committerThomas Munro <[email protected]>
Mon, 3 Jul 2023 21:07:31 +0000 (09:07 +1200)
When predicate-locking btrees, we have a special case for completely
empty btrees, since there is no page to lock.  This was racy, because,
without buffer lock held, a matching key could be inserted between the
_bt_search() and the PredicateLockRelation() calls.

Fix, by rechecking _bt_search() after taking the relation-level SIREAD
lock, if using SERIALIZABLE isolation and an empty btree is discovered.

Back-patch to all supported releases.  Fixes one aspect of bug #17949.

Reported-by: Artem Anisimov <[email protected]>
Reviewed-by: Dmitry Dolgov <[email protected]>
Reviewed-by: Heikki Linnakangas <[email protected]>
Discussion: https://postgr.es/m/17949-a0f17035294a55e2%40postgresql.org

src/backend/access/nbtree/nbtsearch.c

index 7e05e58676895baee69c0fc31b46fb671f756d1c..3230b3b894066703467b6f20f3259706e48c631a 100644 (file)
@@ -17,6 +17,7 @@
 
 #include "access/nbtree.h"
 #include "access/relscan.h"
+#include "access/xact.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "storage/predicate.h"
@@ -1382,22 +1383,34 @@ _bt_first(IndexScanDesc scan, ScanDirection dir)
        {
                /*
                 * We only get here if the index is completely empty. Lock relation
-                * because nothing finer to lock exists.
+                * because nothing finer to lock exists.  Without a buffer lock, it's
+                * possible for another transaction to insert data between
+                * _bt_search() and PredicateLockRelation().  We have to try again
+                * after taking the relation-level predicate lock, to close a narrow
+                * window where we wouldn't scan concurrently inserted tuples, but the
+                * writer wouldn't see our predicate lock.
                 */
-               PredicateLockRelation(rel, scan->xs_snapshot);
-
-               /*
-                * mark parallel scan as done, so that all the workers can finish
-                * their scan
-                */
-               _bt_parallel_done(scan);
-               BTScanPosInvalidate(so->currPos);
+               if (IsolationIsSerializable())
+               {
+                       PredicateLockRelation(rel, scan->xs_snapshot);
+                       stack = _bt_search(rel, NULL, &inskey, &buf, BT_READ,
+                                                          scan->xs_snapshot);
+                       _bt_freestack(stack);
+               }
 
-               return false;
+               if (!BufferIsValid(buf))
+               {
+                       /*
+                        * Mark parallel scan as done, so that all the workers can finish
+                        * their scan.
+                        */
+                       _bt_parallel_done(scan);
+                       BTScanPosInvalidate(so->currPos);
+                       return false;
+               }
        }
-       else
-               PredicateLockPage(rel, BufferGetBlockNumber(buf),
-                                                 scan->xs_snapshot);
+
+       PredicateLockPage(rel, BufferGetBlockNumber(buf), scan->xs_snapshot);
 
        _bt_initialize_more_data(so, dir);