From 62f54780fdcfafc1ccf694a32bc6ab24f2508222 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas
Date: Mon, 12 May 2014 15:48:44 +0300
Subject: [PATCH] Update readme. Fix locking in GetOldestSnapshotLSN

---
 src/backend/access/transam/README   | 143 +++++++++++----------------
 src/backend/storage/ipc/procarray.c |   6 +-
 2 files changed, 62 insertions(+), 87 deletions(-)

diff --git a/src/backend/access/transam/README b/src/backend/access/transam/README
index 3a32471e95..77cab9fcc2 100644
--- a/src/backend/access/transam/README
+++ b/src/backend/access/transam/README
@@ -244,41 +244,37 @@ transaction Y as committed, then snapshot A must consider
 transaction Y as committed".
 
 What we actually enforce is strict serialization of commits and rollbacks
-with snapshot-taking: we do not allow any transaction to exit the set of
-running transactions while a snapshot is being taken. (This rule is
-stronger than necessary for consistency, but is relatively simple to
-enforce, and it assists with some other issues as explained below.) The
-implementation of this is that GetSnapshotData takes the ProcArrayLock in
-shared mode (so that multiple backends can take snapshots in parallel),
-but ProcArrayEndTransaction must take the ProcArrayLock in exclusive mode
-while clearing MyPgXact->xid at transaction end (either commit or abort).
-
-ProcArrayEndTransaction also holds the lock while advancing the shared
-latestCompletedXid variable. This allows GetSnapshotData to use
-latestCompletedXid + 1 as xmax for its snapshot: there can be no
-transaction >= this xid value that the snapshot needs to consider as
-completed.
-
-In short, then, the rule is that no transaction may exit the set of
-currently-running transactions between the time we fetch latestCompletedXid
-and the time we finish building our snapshot. However, this restriction
-only applies to transactions that have an XID --- read-only transactions
-can end without acquiring ProcArrayLock, since they don't affect anyone
-else's snapshot nor latestCompletedXid.
-
-Transaction start, per se, doesn't have any interlocking with these
-considerations, since we no longer assign an XID immediately at transaction
-start. But when we do decide to allocate an XID, GetNewTransactionId must
-store the new XID into the shared ProcArray before releasing XidGenLock.
-This ensures that all top-level XIDs <= latestCompletedXid are either
-present in the ProcArray, or not running anymore. (This guarantee doesn't
-apply to subtransaction XIDs, because of the possibility that there's not
-room for them in the subxid array; instead we guarantee that they are
-present or the overflow flag is set.) If a backend released XidGenLock
-before storing its XID into MyPgXact, then it would be possible for another
-backend to allocate and commit a later XID, causing latestCompletedXid to
-pass the first backend's XID, before that value became visible in the
-ProcArray. That would break GetOldestXmin, as discussed below.
+with snapshot-taking. We use the LSNs generated by Write-Ahead-Logging as
+a convenient monotonically-increasing counter, to serialize commits with
+snapshots. Each commit is naturally assigned an LSN; it's the LSN of the
+commit WAL record. Snapshots are also represented by an LSN; all commits
+with a commit record's LSN <= the snapshot's LSN are considered as visible
+to the snapshot. Therefore acquiring a snapshot is a matter of reading the
+current WAL insert location.
+
+That means that we need to be able to look up the commit LSN of each
+transaction, by XID. For that purpose, we store the commit LSN of each
+transaction in the commit log (clog). However, storing the LSN in the
+clog is not atomic with writing the WAL record, hence it's possible that
+another backend takes a snapshot right after the commit, but sees the
+transaction as in-progress in the clog, even though it wrote the commit
+record before the snapshot was taken. To close that race condition, just
+before writing the commit WAL record, the committing backend sets the
+clog entry to a special value, COMMITLSN_COMMITTING. It is replaced with
+the commit record's LSN after the WAL record has been written. When a
+backend looks up a transaction's commit LSN in the clog and sees
+COMMITLSN_COMMITTING, it must wait for the commit to finish, by calling
+XactLockTableWait(). That's quite heavy-weight, but the race should
+happen rarely.
+
+So, a snapshot is simply an LSN, such that all transactions that committed
+before that LSN are visible, and everything later is still considered
+as in-progress. However, to avoid consulting the clog every time the
+visibility of a tuple is checked, we also record a lower and upper bound of
+the XIDs considered visible by the snapshot, in SnapshotData. When a snapshot
+is taken, xmin is set to the current nextXid value; any transaction that
+begins after the snapshot is surely still running. The xmin is tracked
+lazily in shared memory, by AdvanceGlobalXmin().
 
 We allow GetNewTransactionId to store the XID into MyPgXact->xid (or the
 subxid array) without taking ProcArrayLock. This was once necessary to
@@ -290,43 +286,29 @@ once, rather than assume they can read it multiple times and get the same
 answer each time. (Use volatile-qualified pointers when doing this, to
 ensure that the C compiler does exactly what you tell it to.)
 
-Another important activity that uses the shared ProcArray is GetOldestXmin,
-which must determine a lower bound for the oldest xmin of any active MVCC
-snapshot, system-wide. Each individual backend advertises the smallest
-xmin of its own snapshots in MyPgXact->xmin, or zero if it currently has no
+Another important activity that uses the shared ProcArray is GetOldestSnapshot
+which must determine a lower bound for the oldest of any active MVCC
+snapshots, system-wide. Each individual backend advertises the earliest
+of its own snapshots in MyPgXact->snapshotlsn, or zero if it currently has no
 live snapshots (eg, if it's between transactions or hasn't yet set a
-snapshot for a new transaction). GetOldestXmin takes the MIN() of the
-valid xmin fields. It does this with only shared lock on ProcArrayLock,
-which means there is a potential race condition against other backends
-doing GetSnapshotData concurrently: we must be certain that a concurrent
-backend that is about to set its xmin does not compute an xmin less than
-what GetOldestXmin returns. We ensure that by including all the active
-XIDs into the MIN() calculation, along with the valid xmins. The rule that
-transactions can't exit without taking exclusive ProcArrayLock ensures that
-concurrent holders of shared ProcArrayLock will compute the same minimum of
-currently-active XIDs: no xact, in particular not the oldest, can exit
-while we hold shared ProcArrayLock. So GetOldestXmin's view of the minimum
-active XID will be the same as that of any concurrent GetSnapshotData, and
-so it can't produce an overestimate. If there is no active transaction at
-all, GetOldestXmin returns latestCompletedXid + 1, which is a lower bound
-for the xmin that might be computed by concurrent or later GetSnapshotData
-calls. (We know that no XID less than this could be about to appear in
-the ProcArray, because of the XidGenLock interlock discussed above.)
-
-GetSnapshotData also performs an oldest-xmin calculation (which had better
-match GetOldestXmin's) and stores that into RecentGlobalXmin, which is used
-for some tuple age cutoff checks where a fresh call of GetOldestXmin seems
-too expensive. Note that while it is certain that two concurrent
-executions of GetSnapshotData will compute the same xmin for their own
-snapshots, as argued above, it is not certain that they will arrive at the
-same estimate of RecentGlobalXmin. This is because we allow XID-less
-transactions to clear their MyPgXact->xmin asynchronously (without taking
-ProcArrayLock), so one execution might see what had been the oldest xmin,
-and another not. This is OK since RecentGlobalXmin need only be a valid
-lower bound. As noted above, we are already assuming that fetch/store
-of the xid fields is atomic, so assuming it for xmin as well is no extra
-risk.
-
+snapshot for a new transaction). GetOldestSnapshot takes the MIN() of the
+snapshots.
+
+For freezing tuples, vacuum needs to know the oldest XID that is still
+considered running by any active transaction. That is, the oldest XID still
+considered running by the oldest active snapshot, as returned by
+GetOldestSnapshotLSN(). This value is somewhat expensive to calculate, so
+the most recently calculated value is kept in shared memory
+(SharedVariableCache->recentXmin), and is recalculated lazily by
+AdvanceRecentGlobalXmin() function. AdvanceRecentGlobalXmin() first scans
+the proc array, and makes note of the oldest active XID. That XID - 1 will
+become the new xmin. It then waits until all currently active snapshots have
+finished. Any snapshot that begins later will see the xmin as finished, so
+after all the active snapshots have finished, xmin will be visible to
+everyone. However, AdvanceRecentGlobalXmin() does not actually block waiting
+for anything; instead it contains a state machine that advances if possible,
+when AdvanceRecentGlobalXmin() is called. AdvanceRecentGlobalXmin() is
+called periodically by the WAL writer, so that it doesn't get very stale.
 
 pg_clog and pg_subtrans
 -----------------------
@@ -340,21 +322,10 @@ from disk. They also allow information to be permanent across server restarts.
 
 pg_clog records the commit status for each transaction that has been assigned
 an XID. A transaction can be in progress, committed, aborted, or
-"sub-committed". This last state means that it's a subtransaction that's no
-longer running, but its parent has not updated its state yet. It is not
-necessary to update a subtransaction's transaction status to subcommit, so we
-can just defer it until main transaction commit. The main role of marking
-transactions as sub-committed is to provide an atomic commit protocol when
-transaction status is spread across multiple clog pages. As a result, whenever
-transaction status spreads across multiple pages we must use a two-phase commit
-protocol: the first phase is to mark the subtransactions as sub-committed, then
-we mark the top level transaction and all its subtransactions committed (in
-that order). Thus, subtransactions that have not aborted appear as in-progress
-even when they have already finished, and the subcommit status appears as a
-very short transitory state during main transaction commit. Subtransaction
-abort is always marked in clog as soon as it occurs. When the transaction
-status all fit in a single CLOG page, we atomically mark them all as committed
-without bothering with the intermediate sub-commit state.
+"committing". For committed transactions, the clog stores the commit WAL
+record's LSN. This last state means that the transaction is just about to
+write its commit WAL record, or just did so, but it hasn't yet updated the
+clog with the record's LSN.
 
 Savepoints are implemented using subtransactions. A subtransaction is a
 transaction inside a transaction; its commit or abort status is not only
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index f254095f21..15a433be0d 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -662,7 +662,11 @@ GetOldestSnapshotLSN(Relation rel, bool ignoreVacuum)
 
 	result = GetXLogInsertRecPtr();
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	/*
+	 * Take an exclusive lock to ensure that no-one is in the process of
+	 * taking a snapshot while we scan the array.
+	 */
+	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
 	for (index = 0; index < arrayP->numProcs; index++)
 	{
 		int			pgprocno = arrayP->pgprocnos[index];
-- 
2.39.5