From 58565d78db67cfb4752ef3d3e315aa9ce3c9d946 Mon Sep 17 00:00:00 2001
From: Simon Riggs <simon@2ndQuadrant.com>
Date: Thu, 21 Jan 2010 00:53:58 +0000
Subject: [PATCH] Better internal documentation of locking for Hot Standby
 conflict resolution. Discuss the reasons for the lock type we hold on
 ProcArrayLock while deriving the conflict list. Cover the idea of false
 positive conflicts and seemingly strange effects on snapshot derivation.

---
 src/backend/storage/ipc/procarray.c | 48 ++++++++++++++++++++++++++---
 1 file changed, 44 insertions(+), 4 deletions(-)

diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 41b794196b..1793783cab 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -37,7 +37,7 @@
  *
  *
  * IDENTIFICATION
- *	  $PostgreSQL: pgsql/src/backend/storage/ipc/procarray.c,v 1.57 2010/01/16 17:17:26 tgl Exp $
+ *	  $PostgreSQL: pgsql/src/backend/storage/ipc/procarray.c,v 1.58 2010/01/21 00:53:58 sriggs Exp $
  *
  *-------------------------------------------------------------------------
  */
@@ -1623,17 +1623,57 @@ GetCurrentVirtualXIDs(TransactionId limitXmin, bool excludeXmin0,
 /*
  * GetConflictingVirtualXIDs -- returns an array of currently active VXIDs.
  *
- * The array is palloc'd and is terminated with an invalid VXID.
- *
  * Usage is limited to conflict resolution during recovery on standby servers.
  * limitXmin is supplied as either latestRemovedXid, or InvalidTransactionId
  * in cases where we cannot accurately determine a value for latestRemovedXid.
- * If limitXmin is InvalidTransactionId then we know that the very
+ *
+ * If limitXmin is InvalidTransactionId then we are forced to assume that
  * latest xid that might have caused a cleanup record will be
  * latestCompletedXid, so we set limitXmin to be latestCompletedXid instead.
  * We then skip any backends with xmin > limitXmin. This means that
  * cleanup records don't conflict with some recent snapshots.
  *
+ * The reason for using latestCompletedxid is that we aren't certain which
+ * of the xids in KnownAssignedXids are actually FATAL errors that did
+ * not write abort records. In almost every case they won't be, but we
+ * don't know that for certain. So we need to conflict with all current
+ * snapshots whose xmin is less than latestCompletedXid to be safe. This
+ * causes false positives in our assessment of which vxids conflict.
+ *
+ * By using exclusive lock we prevent new snapshots from being taken while
+ * we work out which snapshots to conflict with. This protects those new
+ * snapshots from also being included in our conflict list. 
+ *
+ * After the lock is released, we allow snapshots again. It is possible
+ * that we arrive at a snapshot that is identical to one that we just
+ * decided we should conflict with. This a case of false positives, not an
+ * actual problem.
+ * 
+ * There are two cases: (1) if we were correct in using latestCompletedXid
+ * then that means that all xids in the snapshot lower than that are FATAL
+ * errors, so not xids that ever commit. We can make no visibility errors
+ * if we allow such xids into the snapshot. (2) if we erred on the side of
+ * caution and in fact the latestRemovedXid should have been earlier than
+ * latestCompletedXid then we conflicted with a snapshot needlessly. Taking
+ * another identical snapshot is OK, because the earlier conflicted
+ * snapshot was a false positive.
+ * 
+ * In either case, a snapshot taken after conflict assessment will still be
+ * valid and non-conflicting even if an identical snapshot that existed
+ * before conflict assessment was assessed as conflicting.
+ * 
+ * If we allowed concurrent snapshots while we were deciding who to
+ * conflict with we would need to include all concurrent snapshotters in
+ * the conflict list as well. We'd have difficulty in working out exactly
+ * who that was, so it is happier for all concerned if we take an exclusive
+ * lock. Notice that we only hold that lock for as long as it takes to
+ * make the conflict list, not for the whole duration of the conflict
+ * resolution.
+ * 
+ * It also means that users waiting for a snapshot is a good thing, since
+ * it is more likely that they will live longer after having waited. So it
+ * is a benefit, not an oversight that we use exclusive lock here.
+ *
  * We replace InvalidTransactionId with latestCompletedXid here because
  * this is the most convenient place to do that, while we hold ProcArrayLock.
  * The originator of the cleanup record wanted to avoid checking the value of