Increase the number of fast-path lock slots

Replace the fixed-size array of fast-path locks with arrays, sized on
startup based on max_locks_per_transaction. This allows using fast-path
locking for workloads that need more locks.

The fast-path locking introduced in 9.2 allowed each backend to acquire
a small number (16) of weak relation locks cheaply. If a backend needs
to hold more locks, it has to insert them into the shared lock table.
This is considerably more expensive, and may be subject to contention
(especially on many-core systems).

The limit of 16 fast-path locks was always rather low, because we have
to lock all relations - not just tables, but also indexes, views, etc.
For planning we need to lock all relations that might be used in the
plan, not just those that actually get used in the final plan. So even
with rather simple queries and schemas, we often need significantly more
than 16 locks.

As partitioning gets used more widely, and the number of partitions
increases, this limit is trivial to hit. Complex queries may easily use
hundreds or even thousands of locks. For workloads doing a lot of I/O
this is not noticeable, but for workloads accessing only data in RAM,
the access to the shared lock table may be a serious issue.

This commit removes the hard-coded limit on the number of fast-path
locks. Instead, the size of the fast-path arrays is calculated at
startup, and can be set much higher than the original 16-lock limit.
The overall fast-path locking protocol remains unchanged.

The variable-sized fast-path arrays can no longer be part of PGPROC, but
are allocated as a separate chunk of shared memory and then referenced
from the PGPROC entries.

The fast-path slots are organized as a 16-way set associative cache. You
can imagine it as a hash table of 16-slot "groups". Each relation is
mapped to exactly one group using hash(relid), and the group is then
processed using linear search, just like the original fast-path cache.
With only 16 entries this is cheap, with good locality.
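
To make this concrete, here is a minimal C sketch of the lookup, with
made-up helper names (fp_rel_group, fp_find_slot); the real versions are
the FAST_PATH_REL_GROUP and FAST_PATH_SLOT macros in the lock.c hunk
below:

    #include <stdint.h>

    typedef uint32_t Oid;

    #define FP_LOCK_SLOTS_PER_GROUP 16

    extern int  FastPathLockGroupsPerBackend;  /* sized at startup */
    extern Oid *fpRelId;               /* per-backend slot array */

    /* Map a relation OID to its (single) group, as in the patch. */
    static inline uint32_t
    fp_rel_group(Oid relid)
    {
        return (uint32_t) (((uint64_t) relid * 49157) %
                           FastPathLockGroupsPerBackend);
    }

    /* Linear search within one 16-slot group - cheap, good locality. */
    static int
    fp_find_slot(Oid relid)
    {
        uint32_t group = fp_rel_group(relid);

        for (int i = 0; i < FP_LOCK_SLOTS_PER_GROUP; i++)
        {
            uint32_t slot = group * FP_LOCK_SLOTS_PER_GROUP + i;

            if (fpRelId[slot] == relid)
                return (int) slot;
        }
        return -1;      /* miss - fall back to the shared lock table */
    }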

Treating this as a simple hash table with open addressing would not be
efficient, especially once the hash table gets almost full. The usual
remedy is to grow the table, but we can't do that here easily. The
access would also be more random, with worse locality.

The fast-path arrays are sized using the max_locks_per_transaction GUC.
We try to have enough capacity for the number of locks specified in the
GUC, using the traditional 2^n formula, with an upper limit of 1024 lock
groups (i.e. 16k locks). The default value of max_locks_per_transaction
is 64, which means those instances will have 64 fast-path slots.
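
For example, the default max_locks_per_transaction = 64 yields 4 groups
(4 x 16 = 64 slots), 65 would round up to 8 groups (128 slots), and
16384 or more hits the 1024-group cap. A standalone sketch of the
calculation (the real version is InitializeFastPathLocks(), shown
below):

    /* Mirrors the power-of-2 sizing loop in InitializeFastPathLocks();
     * the function name is made up, the 1024-group cap is hard-coded. */
    static int
    fast_path_groups(int max_locks_per_xact)
    {
        int groups = 1;             /* at least one 16-slot group */

        while (groups < 1024 && groups * 16 < max_locks_per_xact)
            groups *= 2;

        return groups;              /* 64 -> 4, 65 -> 8, 16384 -> 1024 */
    }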

The main purpose of the max_locks_per_transaction GUC is to size the
shared lock table. It is often set to the "average" number of locks
needed by backends, with some backends using significantly more locks.
This should not be a major issue, however. Some backends may have to
insert locks into the shared lock table, but there can't be too many of
them, limiting the contention.

If many backends need a lot of locks, this won't work, and the only
solution is to increase the GUC, even if the shared lock table already
has sufficient capacity. That is not free, especially in terms
of memory usage (the shared lock table entries are fairly large). It
should only happen on machines with plenty of memory, though.

In the future we may consider a separate GUC for the number of fast-path
slots, but let's try without one first.

Reviewed-by: Robert Haas, Jakub Wartak
Discussion: https://postgr.es/m/510b887e-c0ce-4a0c-a17a-2c6abb8d9a5c@enterprisedb.com
Tomas Vondra 2024-09-21 20:06:49 +02:00
parent b524974106
commit c4d5cb71d2
9 changed files with 212 additions and 27 deletions

src/backend/bootstrap/bootstrap.c

@@ -309,6 +309,8 @@ BootstrapModeMain(int argc, char *argv[], bool check_only)
InitializeMaxBackends();
+InitializeFastPathLocks();
CreateSharedMemoryAndSemaphores();
/*

src/backend/postmaster/postmaster.c

@@ -903,6 +903,11 @@ PostmasterMain(int argc, char *argv[])
*/
InitializeMaxBackends();
+/*
+ * Calculate the size of the PGPROC fast-path lock arrays.
+ */
+InitializeFastPathLocks();
/*
* Give preloaded libraries a chance to request additional shared memory.
*/

src/backend/storage/ipc/ipci.c

@@ -178,6 +178,12 @@ AttachSharedMemoryStructs(void)
Assert(MyProc != NULL);
Assert(IsUnderPostmaster);
+/*
+ * In EXEC_BACKEND mode, backends don't inherit the number of fast-path
+ * groups we calculated before setting the shmem up, so recalculate it.
+ */
+InitializeFastPathLocks();
CreateOrAttachShmemStructs();
/*

src/backend/storage/lmgr/lock.c

@@ -166,8 +166,13 @@ typedef struct TwoPhaseLockRecord
* might be higher than the real number if another backend has transferred
* our locks to the primary lock table, but it can never be lower than the
* real value, since only we can acquire locks on our own behalf.
+ *
+ * XXX Allocate a static array of the maximum size. We could use a pointer
+ * and then allocate just the right size to save a couple kB, but then we
+ * would have to initialize that, while for the static array that happens
+ * automatically. Doesn't seem worth the extra complexity.
*/
-static int FastPathLocalUseCount = 0;
+static int FastPathLocalUseCounts[FP_LOCK_GROUPS_PER_BACKEND_MAX];
/*
* Flag to indicate if the relation extension lock is held by this backend.
@@ -184,23 +189,68 @@ static int FastPathLocalUseCount = 0;
*/
static bool IsRelationExtensionLockHeld PG_USED_FOR_ASSERTS_ONLY = false;
+/*
+ * Number of fast-path locks per backend - size of the arrays in PGPROC.
+ * This is set only once during start, before initializing shared memory,
+ * and remains constant after that.
+ *
+ * We set the limit based on max_locks_per_transaction GUC, because that's
+ * the best information about expected number of locks per backend we have.
+ * See InitializeFastPathLocks() for details.
+ */
+int FastPathLockGroupsPerBackend = 0;
+/*
+ * Macros to calculate the fast-path group and index for a relation.
+ *
+ * The formula is a simple hash function, designed to spread the OIDs a bit,
+ * so that even contiguous values end up in different groups. In most cases
+ * there will be gaps anyway, but the multiplication should help a bit.
+ *
+ * The selected constant (49157) is a prime not too close to 2^k, and it's
+ * small enough to not cause overflows (in 64-bit).
+ */
+#define FAST_PATH_REL_GROUP(rel) \
+(((uint64) (rel) * 49157) % FastPathLockGroupsPerBackend)
+/*
+ * Given the group/slot indexes, calculate the slot index in the whole array
+ * of fast-path lock slots.
+ */
+#define FAST_PATH_SLOT(group, index) \
+(AssertMacro(((group) >= 0) && ((group) < FastPathLockGroupsPerBackend)), \
+AssertMacro(((index) >= 0) && ((index) < FP_LOCK_SLOTS_PER_GROUP)), \
+((group) * FP_LOCK_SLOTS_PER_GROUP + (index)))
+/*
+ * Given a slot index (into the whole per-backend array), calculated using
+ * the FAST_PATH_SLOT macro, split it into group and index (in the group).
+ */
+#define FAST_PATH_GROUP(index) \
+(AssertMacro(((index) >= 0) && ((index) < FP_LOCK_SLOTS_PER_BACKEND)), \
+((index) / FP_LOCK_SLOTS_PER_GROUP))
+#define FAST_PATH_INDEX(index) \
+(AssertMacro(((index) >= 0) && ((index) < FP_LOCK_SLOTS_PER_BACKEND)), \
+((index) % FP_LOCK_SLOTS_PER_GROUP))
/* Macros for manipulating proc->fpLockBits */
#define FAST_PATH_BITS_PER_SLOT 3
#define FAST_PATH_LOCKNUMBER_OFFSET 1
#define FAST_PATH_MASK ((1 << FAST_PATH_BITS_PER_SLOT) - 1)
+#define FAST_PATH_BITS(proc, n) (proc)->fpLockBits[FAST_PATH_GROUP(n)]
#define FAST_PATH_GET_BITS(proc, n) \
-(((proc)->fpLockBits >> (FAST_PATH_BITS_PER_SLOT * n)) & FAST_PATH_MASK)
+((FAST_PATH_BITS(proc, n) >> (FAST_PATH_BITS_PER_SLOT * FAST_PATH_INDEX(n))) & FAST_PATH_MASK)
#define FAST_PATH_BIT_POSITION(n, l) \
(AssertMacro((l) >= FAST_PATH_LOCKNUMBER_OFFSET), \
AssertMacro((l) < FAST_PATH_BITS_PER_SLOT+FAST_PATH_LOCKNUMBER_OFFSET), \
AssertMacro((n) < FP_LOCK_SLOTS_PER_BACKEND), \
-((l) - FAST_PATH_LOCKNUMBER_OFFSET + FAST_PATH_BITS_PER_SLOT * (n)))
+((l) - FAST_PATH_LOCKNUMBER_OFFSET + FAST_PATH_BITS_PER_SLOT * (FAST_PATH_INDEX(n))))
#define FAST_PATH_SET_LOCKMODE(proc, n, l) \
-(proc)->fpLockBits |= UINT64CONST(1) << FAST_PATH_BIT_POSITION(n, l)
+FAST_PATH_BITS(proc, n) |= UINT64CONST(1) << FAST_PATH_BIT_POSITION(n, l)
#define FAST_PATH_CLEAR_LOCKMODE(proc, n, l) \
-(proc)->fpLockBits &= ~(UINT64CONST(1) << FAST_PATH_BIT_POSITION(n, l))
+FAST_PATH_BITS(proc, n) &= ~(UINT64CONST(1) << FAST_PATH_BIT_POSITION(n, l))
#define FAST_PATH_CHECK_LOCKMODE(proc, n, l) \
-((proc)->fpLockBits & (UINT64CONST(1) << FAST_PATH_BIT_POSITION(n, l)))
+(FAST_PATH_BITS(proc, n) & (UINT64CONST(1) << FAST_PATH_BIT_POSITION(n, l)))
/*
* The fast-path lock mechanism is concerned only with relation locks on
@@ -926,7 +976,7 @@ LockAcquireExtended(const LOCKTAG *locktag,
* for now we don't worry about that case either.
*/
if (EligibleForRelationFastPath(locktag, lockmode) &&
-FastPathLocalUseCount < FP_LOCK_SLOTS_PER_BACKEND)
+FastPathLocalUseCounts[FAST_PATH_REL_GROUP(locktag->locktag_field2)] < FP_LOCK_SLOTS_PER_GROUP)
{
uint32 fasthashcode = FastPathStrongLockHashPartition(hashcode);
bool acquired;
@@ -2065,7 +2115,7 @@ LockRelease(const LOCKTAG *locktag, LOCKMODE lockmode, bool sessionLock)
/* Attempt fast release of any lock eligible for the fast path. */
if (EligibleForRelationFastPath(locktag, lockmode) &&
-FastPathLocalUseCount > 0)
+FastPathLocalUseCounts[FAST_PATH_REL_GROUP(locktag->locktag_field2)] > 0)
{
bool released;
@@ -2633,12 +2683,18 @@ LockReassignOwner(LOCALLOCK *locallock, ResourceOwner parent)
static bool
FastPathGrantRelationLock(Oid relid, LOCKMODE lockmode)
{
-uint32 f;
+uint32 i;
uint32 unused_slot = FP_LOCK_SLOTS_PER_BACKEND;
+/* fast-path group the lock belongs to */
+uint32 group = FAST_PATH_REL_GROUP(relid);
/* Scan for existing entry for this relid, remembering empty slot. */
-for (f = 0; f < FP_LOCK_SLOTS_PER_BACKEND; f++)
+for (i = 0; i < FP_LOCK_SLOTS_PER_GROUP; i++)
{
+/* index into the whole per-backend array */
+uint32 f = FAST_PATH_SLOT(group, i);
if (FAST_PATH_GET_BITS(MyProc, f) == 0)
unused_slot = f;
else if (MyProc->fpRelId[f] == relid)
@@ -2654,7 +2710,7 @@ FastPathGrantRelationLock(Oid relid, LOCKMODE lockmode)
{
MyProc->fpRelId[unused_slot] = relid;
FAST_PATH_SET_LOCKMODE(MyProc, unused_slot, lockmode);
-++FastPathLocalUseCount;
+++FastPathLocalUseCounts[group];
return true;
}
@@ -2670,12 +2726,18 @@ FastPathGrantRelationLock(Oid relid, LOCKMODE lockmode)
static bool
FastPathUnGrantRelationLock(Oid relid, LOCKMODE lockmode)
{
-uint32 f;
+uint32 i;
bool result = false;
-FastPathLocalUseCount = 0;
-for (f = 0; f < FP_LOCK_SLOTS_PER_BACKEND; f++)
+/* fast-path group the lock belongs to */
+uint32 group = FAST_PATH_REL_GROUP(relid);
+FastPathLocalUseCounts[group] = 0;
+for (i = 0; i < FP_LOCK_SLOTS_PER_GROUP; i++)
{
+/* index into the whole per-backend array */
+uint32 f = FAST_PATH_SLOT(group, i);
if (MyProc->fpRelId[f] == relid
&& FAST_PATH_CHECK_LOCKMODE(MyProc, f, lockmode))
{
@@ -2685,7 +2747,7 @@ FastPathUnGrantRelationLock(Oid relid, LOCKMODE lockmode)
/* we continue iterating so as to update FastPathLocalUseCount */
}
if (FAST_PATH_GET_BITS(MyProc, f) != 0)
-++FastPathLocalUseCount;
+++FastPathLocalUseCounts[group];
}
return result;
}
@@ -2714,7 +2776,8 @@ FastPathTransferRelationLocks(LockMethod lockMethodTable, const LOCKTAG *locktag
for (i = 0; i < ProcGlobal->allProcCount; i++)
{
PGPROC *proc = &ProcGlobal->allProcs[i];
-uint32 f;
+uint32 j,
+group;
LWLockAcquire(&proc->fpInfoLock, LW_EXCLUSIVE);
@@ -2739,10 +2802,16 @@ FastPathTransferRelationLocks(LockMethod lockMethodTable, const LOCKTAG *locktag
continue;
}
-for (f = 0; f < FP_LOCK_SLOTS_PER_BACKEND; f++)
+/* fast-path group the lock belongs to */
+group = FAST_PATH_REL_GROUP(relid);
+for (j = 0; j < FP_LOCK_SLOTS_PER_GROUP; j++)
{
uint32 lockmode;
+/* index into the whole per-backend array */
+uint32 f = FAST_PATH_SLOT(group, j);
/* Look for an allocated slot matching the given relid. */
if (relid != proc->fpRelId[f] || FAST_PATH_GET_BITS(proc, f) == 0)
continue;
@@ -2793,14 +2862,21 @@ FastPathGetRelationLockEntry(LOCALLOCK *locallock)
PROCLOCK *proclock = NULL;
LWLock *partitionLock = LockHashPartitionLock(locallock->hashcode);
Oid relid = locktag->locktag_field2;
-uint32 f;
+uint32 i,
+group;
+/* fast-path group the lock belongs to */
+group = FAST_PATH_REL_GROUP(relid);
LWLockAcquire(&MyProc->fpInfoLock, LW_EXCLUSIVE);
-for (f = 0; f < FP_LOCK_SLOTS_PER_BACKEND; f++)
+for (i = 0; i < FP_LOCK_SLOTS_PER_GROUP; i++)
{
uint32 lockmode;
+/* index into the whole per-backend array */
+uint32 f = FAST_PATH_SLOT(group, i);
/* Look for an allocated slot matching the given relid. */
if (relid != MyProc->fpRelId[f] || FAST_PATH_GET_BITS(MyProc, f) == 0)
continue;
@@ -2957,7 +3033,8 @@ GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *countp)
for (i = 0; i < ProcGlobal->allProcCount; i++)
{
PGPROC *proc = &ProcGlobal->allProcs[i];
-uint32 f;
+uint32 j,
+group;
/* A backend never blocks itself */
if (proc == MyProc)
@@ -2979,10 +3056,16 @@ GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *countp)
continue;
}
-for (f = 0; f < FP_LOCK_SLOTS_PER_BACKEND; f++)
+/* fast-path group the lock belongs to */
+group = FAST_PATH_REL_GROUP(relid);
+for (j = 0; j < FP_LOCK_SLOTS_PER_GROUP; j++)
{
uint32 lockmask;
+/* index into the whole per-backend array */
+uint32 f = FAST_PATH_SLOT(group, j);
/* Look for an allocated slot matching the given relid. */
if (relid != proc->fpRelId[f])
continue;

src/backend/storage/lmgr/proc.c

@@ -103,6 +103,8 @@ ProcGlobalShmemSize(void)
Size size = 0;
Size TotalProcs =
add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts));
+Size fpLockBitsSize,
+fpRelIdSize;
/* ProcGlobal */
size = add_size(size, sizeof(PROC_HDR));
@@ -113,6 +115,15 @@ ProcGlobalShmemSize(void)
size = add_size(size, mul_size(TotalProcs, sizeof(*ProcGlobal->subxidStates)));
size = add_size(size, mul_size(TotalProcs, sizeof(*ProcGlobal->statusFlags)));
+/*
+ * Memory needed for PGPROC fast-path lock arrays. Make sure the sizes are
+ * nicely aligned in each backend.
+ */
+fpLockBitsSize = MAXALIGN(FastPathLockGroupsPerBackend * sizeof(uint64));
+fpRelIdSize = MAXALIGN(FastPathLockGroupsPerBackend * sizeof(Oid) * FP_LOCK_SLOTS_PER_GROUP);
+size = add_size(size, mul_size(TotalProcs, (fpLockBitsSize + fpRelIdSize)));
return size;
}
@@ -163,6 +174,12 @@ InitProcGlobal(void)
bool found;
uint32 TotalProcs = MaxBackends + NUM_AUXILIARY_PROCS + max_prepared_xacts;
+/* Used for setup of per-backend fast-path slots. */
+char *fpPtr,
+*fpEndPtr PG_USED_FOR_ASSERTS_ONLY;
+Size fpLockBitsSize,
+fpRelIdSize;
/* Create the ProcGlobal shared structure */
ProcGlobal = (PROC_HDR *)
ShmemInitStruct("Proc Header", sizeof(PROC_HDR), &found);
@@ -211,12 +228,38 @@ InitProcGlobal(void)
ProcGlobal->statusFlags = (uint8 *) ShmemAlloc(TotalProcs * sizeof(*ProcGlobal->statusFlags));
MemSet(ProcGlobal->statusFlags, 0, TotalProcs * sizeof(*ProcGlobal->statusFlags));
+/*
+ * Allocate arrays for fast-path locks. Those are variable-length, so
+ * can't be included in PGPROC directly. We allocate a separate piece of
+ * shared memory and then divide that between backends.
+ */
+fpLockBitsSize = MAXALIGN(FastPathLockGroupsPerBackend * sizeof(uint64));
+fpRelIdSize = MAXALIGN(FastPathLockGroupsPerBackend * sizeof(Oid) * FP_LOCK_SLOTS_PER_GROUP);
+fpPtr = ShmemAlloc(TotalProcs * (fpLockBitsSize + fpRelIdSize));
+MemSet(fpPtr, 0, TotalProcs * (fpLockBitsSize + fpRelIdSize));
+/* For asserts checking we did not overflow. */
+fpEndPtr = fpPtr + (TotalProcs * (fpLockBitsSize + fpRelIdSize));
for (i = 0; i < TotalProcs; i++)
{
PGPROC *proc = &procs[i];
/* Common initialization for all PGPROCs, regardless of type. */
+/*
+ * Set the fast-path lock arrays, and move the pointer. We interleave
+ * the two arrays, to (hopefully) get some locality for each backend.
+ */
+proc->fpLockBits = (uint64 *) fpPtr;
+fpPtr += fpLockBitsSize;
+proc->fpRelId = (Oid *) fpPtr;
+fpPtr += fpRelIdSize;
+Assert(fpPtr <= fpEndPtr);
/*
* Set up per-PGPROC semaphore, latch, and fpInfoLock. Prepared xact
* dummy PGPROCs don't need these though - they're never associated
@@ -278,6 +321,9 @@ InitProcGlobal(void)
pg_atomic_init_u64(&(proc->waitStart), 0);
}
+/* Should have consumed exactly the expected amount of fast-path memory. */
+Assert(fpPtr == fpEndPtr);
/*
* Save pointers to the blocks of PGPROC structures reserved for auxiliary
* processes and prepared transactions.

src/backend/tcop/postgres.c

@@ -4190,6 +4190,9 @@ PostgresSingleUserMain(int argc, char *argv[],
/* Initialize MaxBackends */
InitializeMaxBackends();
+/* Initialize size of fast-path lock cache. */
+InitializeFastPathLocks();
/*
* Give preloaded libraries a chance to request additional shared memory.
*/

src/backend/utils/init/postinit.c

@@ -557,6 +557,40 @@ InitializeMaxBackends(void)
MAX_BACKENDS)));
}
+/*
+ * Initialize the number of fast-path lock slots in PGPROC.
+ *
+ * This must be called after modules have had the chance to alter GUCs in
+ * shared_preload_libraries and before shared memory size is determined.
+ *
+ * The default max_locks_per_xact=64 means 4 groups by default.
+ *
+ * We allow anything between 1 and 1024 groups, with the usual power-of-2
+ * logic. The 1 is the "old" size with only 16 slots, 1024 is an arbitrary
+ * limit (matching max_locks_per_xact = 16k). Values over 1024 are unlikely
+ * to be beneficial - there are bottlenecks we'll hit way before that.
+ */
+void
+InitializeFastPathLocks(void)
+{
+/* Should be initialized only once. */
+Assert(FastPathLockGroupsPerBackend == 0);
+/* we need at least one group */
+FastPathLockGroupsPerBackend = 1;
+while (FastPathLockGroupsPerBackend < FP_LOCK_GROUPS_PER_BACKEND_MAX)
+{
+/* stop once we exceed max_locks_per_xact */
+if (FastPathLockGroupsPerBackend * FP_LOCK_SLOTS_PER_GROUP >= max_locks_per_xact)
+break;
+FastPathLockGroupsPerBackend *= 2;
+}
+Assert(FastPathLockGroupsPerBackend <= FP_LOCK_GROUPS_PER_BACKEND_MAX);
+}
/*
* Early initialization of a backend (either standalone or under postmaster).
* This happens even before InitPostgres.

src/include/miscadmin.h

@@ -475,6 +475,7 @@ extern PGDLLIMPORT ProcessingMode Mode;
#define INIT_PG_OVERRIDE_ROLE_LOGIN 0x0004
extern void pg_split_opts(char **argv, int *argcp, const char *optstr);
extern void InitializeMaxBackends(void);
+extern void InitializeFastPathLocks(void);
extern void InitPostgres(const char *in_dbname, Oid dboid,
const char *username, Oid useroid,
bits32 flags,

src/include/storage/proc.h

@@ -78,12 +78,17 @@ struct XidCache
#define PROC_XMIN_FLAGS (PROC_IN_VACUUM | PROC_IN_SAFE_IC)
/*
- * We allow a small number of "weak" relation locks (AccessShareLock,
+ * We allow a limited number of "weak" relation locks (AccessShareLock,
* RowShareLock, RowExclusiveLock) to be recorded in the PGPROC structure
- * rather than the main lock table. This eases contention on the lock
- * manager LWLocks. See storage/lmgr/README for additional details.
+ * (or rather in shared memory referenced from PGPROC) rather than the main
+ * lock table. This eases contention on the lock manager LWLocks. See
+ * storage/lmgr/README for additional details.
*/
-#define FP_LOCK_SLOTS_PER_BACKEND 16
+extern PGDLLIMPORT int FastPathLockGroupsPerBackend;
+#define FP_LOCK_GROUPS_PER_BACKEND_MAX 1024
+#define FP_LOCK_SLOTS_PER_GROUP 16 /* don't change */
+#define FP_LOCK_SLOTS_PER_BACKEND (FP_LOCK_SLOTS_PER_GROUP * FastPathLockGroupsPerBackend)
/*
* Flags for PGPROC.delayChkptFlags
@@ -292,8 +297,8 @@ struct PGPROC
/* Lock manager data, recording fast-path locks taken by this backend. */
LWLock fpInfoLock; /* protects per-backend fast-path state */
-uint64 fpLockBits; /* lock modes held for each fast-path slot */
-Oid fpRelId[FP_LOCK_SLOTS_PER_BACKEND]; /* slots for rel oids */
+uint64 *fpLockBits; /* lock modes held for each fast-path slot */
+Oid *fpRelId; /* slots for rel oids */
bool fpVXIDLock; /* are we holding a fast-path VXID lock? */
LocalTransactionId fpLocalTransactionId; /* lxid for fast-path VXID
* lock */