Improve fix for not entering parallel mode when holding interrupts.

Commit ac04aa84a put the shutoff for this into the planner, which is not ideal because it doesn't prevent us from re-using a previously made parallel plan. Revert the planner change and instead put the shutoff into InitializeParallelDSM, modeling it on the existing code there for recovering from failure to allocate a DSM segment. However, that code path is mostly untested, and testing a bit harder showed there's at least one bug: ExecHashJoinReInitializeDSM is not prepared for us to have skipped doing parallel DSM setup. I also thought the Assert in ReinitializeParallelWorkers is pretty ill-advised, and replaced it with a silent Min() operation. The existing test case added by ac04aa84a serves fine to test this version of the fix, so no change needed there. Patch by me, but thanks to Noah Misch for the core idea that we could shut off worker creation when !INTERRUPTS_CAN_BE_PROCESSED. Back-patch to v12, as ac04aa84a was. Discussion: https://postgr.es/m/CAC-SaSzHUKT=vZJ8MPxYdC_URPfax+yoA1hKTcF4ROz_Q6z0_Q@mail.gmail.com
2025-03-07 19:47:50 +08:00 · 2024-11-08 13:42:01 -05:00 · 2024-11-08 13:42:01 -05:00 · 6e39ca6e7e
commit 6e39ca6e7e
parent 9c47574916
3 changed files with 16 additions and 8 deletions
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@ -219,6 +219,15 @@ InitializeParallelDSM(ParallelContext *pcxt)
 	shm_toc_estimate_chunk(&pcxt->estimator, sizeof(FixedParallelState));
 	shm_toc_estimate_keys(&pcxt->estimator, 1);

+	/*
+	 * If we manage to reach here while non-interruptible, it's unsafe to
+	 * launch any workers: we would fail to process interrupts sent by them.
+	 * We can deal with that edge case by pretending no workers were
+	 * requested.
+	 */
+	if (!INTERRUPTS_CAN_BE_PROCESSED())
+		pcxt->nworkers = 0;
+
 	/*
 	 * Normally, the user will have requested at least one worker process, but
 	 * if by chance they have not, we can skip a bunch of things here.
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@ -1516,8 +1516,13 @@ void
 ExecHashJoinReInitializeDSM(HashJoinState *state, ParallelContext *cxt)
 {
 	int			plan_node_id = state->js.ps.plan->plan_node_id;
-	ParallelHashJoinState *pstate =
-	shm_toc_lookup(cxt->toc, plan_node_id, false);
+	ParallelHashJoinState *pstate;
+
+	/* Nothing to do if we failed to create a DSM segment. */
+	if (cxt->seg == NULL)
+		return;
+
+	pstate = shm_toc_lookup(cxt->toc, plan_node_id, false);

 	/*
 	 * It would be possible to reuse the shared hash table in single-batch
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@ -331,11 +331,6 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
 	 * general; updates and deletes have additional problems especially around
 	 * combo CIDs.)
 	 *
-	 * We don't try to use parallel mode unless interruptible.  The leader
-	 * expects ProcessInterrupts() calls to reach HandleParallelMessages().
-	 * Even if we called HandleParallelMessages() another way, starting a
-	 * parallel worker is too delay-prone to be prudent when uncancellable.
-	 *
 	 * For now, we don't try to use parallel mode if we're running inside a
 	 * parallel worker.  We might eventually be able to relax this
 	 * restriction, but for now it seems best not to have parallel workers
@ -346,7 +341,6 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
 		parse->commandType == CMD_SELECT &&
 		!parse->hasModifyingCTE &&
 		max_parallel_workers_per_gather > 0 &&
-		INTERRUPTS_CAN_BE_PROCESSED() &&
 		!IsParallelWorker())
 	{
 		/* all the cheap tests pass, so scan the query tree */