Mirror of https://git.postgresql.org/git/postgresql.git (synced 2024-12-21 08:29:39 +08:00)
Update to describe new set of globally-known contexts planned for support of extended query features in new FE/BE protocol. TransactionCommandContext is gone (PortalContext replaces it for some purposes), and QueryContext has taken on a new meaning (MessageContext plays its old role).
parent aa282d4446
commit 0c57d69dd7
@@ -1,4 +1,4 @@
-$Header: /cvsroot/pgsql/src/backend/utils/mmgr/README,v 1.3 2001/02/15 21:38:26 tgl Exp $
+$Header: /cvsroot/pgsql/src/backend/utils/mmgr/README,v 1.4 2003/04/30 19:04:12 tgl Exp $
 
 Notes about memory allocation redesign
 --------------------------------------
@@ -110,109 +110,121 @@ children of a given context, but don't reset or delete that context
 itself".
 
 
-Top-level contexts
-------------------
+Globally known contexts
+-----------------------
 
-There will be several top-level contexts --- these contexts have no parent
-and will be referenced by global variables. At any instant the system may
+There will be several widely-known contexts that will typically be
+referenced through global variables. At any instant the system may
 contain many additional contexts, but all other contexts should be direct
-or indirect children of one of the top-level contexts to ensure they are
-not leaked in event of an error. I presently envision these top-level
-contexts:
+or indirect children of one of these contexts to ensure they are not
+leaked in event of an error.
 
-TopMemoryContext --- allocating here is essentially the same as "malloc",
-because this context will never be reset or deleted. This is for stuff
-that should live forever, or for stuff that you know you will delete
-at the appropriate time. An example is fd.c's tables of open files,
-as well as the context management nodes for memory contexts themselves.
-Avoid allocating stuff here unless really necessary, and especially
-avoid running with CurrentMemoryContext pointing here.
+TopMemoryContext --- this is the actual top level of the context tree;
+every other context is a direct or indirect child of this one. Allocating
+here is essentially the same as "malloc", because this context will never
+be reset or deleted. This is for stuff that should live forever, or for
+stuff that the controlling module will take care of deleting at the
+appropriate time. An example is fd.c's tables of open files, as well as
+the context management nodes for memory contexts themselves. Avoid
+allocating stuff here unless really necessary, and especially avoid
+running with CurrentMemoryContext pointing here.
 
 PostmasterContext --- this is the postmaster's normal working context.
 After a backend is spawned, it can delete PostmasterContext to free its
 copy of memory the postmaster was using that it doesn't need. (Anything
 that has to be passed from postmaster to backends will be passed in
-TopMemoryContext. The postmaster will probably have only TopMemoryContext,
-PostmasterContext, and possibly ErrorContext --- the remaining top-level
-contexts will be set up in each backend during startup.)
+TopMemoryContext. The postmaster will have only TopMemoryContext,
+PostmasterContext, and ErrorContext --- the remaining top-level contexts
+will be set up in each backend during startup.)
 
 CacheMemoryContext --- permanent storage for relcache, catcache, and
 related modules. This will never be reset or deleted, either, so it's
 not truly necessary to distinguish it from TopMemoryContext. But it
 seems worthwhile to maintain the distinction for debugging purposes.
-(Note: CacheMemoryContext may well have child-contexts with shorter
-lifespans. For example, a child context seems like the best place to
-keep the subsidiary storage associated with a relcache entry; that way
-we can free rule parsetrees and so forth easily, without having to depend
-on constructing a reliable version of freeObject().)
+(Note: CacheMemoryContext will have child-contexts with shorter lifespans.
+For example, a child context is the best place to keep the subsidiary
+storage associated with a relcache entry; that way we can free rule
+parsetrees and so forth easily, without having to depend on constructing
+a reliable version of freeObject().)
 
-QueryContext --- this is where the storage holding a received query string
-is kept, as well as storage that should live as long as the query string,
-notably the parsetree constructed from it. This context will be reset at
-the top of each cycle of the outer loop of PostgresMain, thereby freeing
-the old query and parsetree. We must keep this separate from
-TopTransactionContext because a query string might need to live either a
-longer or shorter time than a transaction, depending on whether it
-contains begin/end commands or not. (This'll also fix the nasty bug that
-"vacuum; anything else" crashes if submitted as a single query string,
-because vacuum's xact commit frees the memory holding the parsetree...)
+MessageContext --- this context holds the current command message from the
+frontend, as well as any derived storage that need only live as long as
+the current message (for example, in simple-Query mode the parse and plan
+trees can live here). This context will be reset, and any children
+deleted, at the top of each cycle of the outer loop of PostgresMain. This
+is kept separate from per-transaction and per-portal contexts because a
+query string might need to live either a longer or shorter time than any
+single transaction or portal.
 
 TopTransactionContext --- this holds everything that lives until end of
 transaction (longer than one statement within a transaction!). An example
 of what has to be here is the list of pending NOTIFY messages to be sent
 at xact commit. This context will be reset, and all its children deleted,
-at conclusion of each transaction cycle. Note: presently I envision that
-this context will NOT be cleared immediately upon error; its contents
-will survive anyway until the transaction block is exited by
-COMMIT/ROLLBACK. This seems appropriate since we want to move in the
-direction of allowing a transaction to continue processing after an error.
+at conclusion of each transaction cycle. Note: this context is NOT
+cleared immediately upon error; its contents will survive until the
+transaction block is exited by COMMIT/ROLLBACK.
+(If we ever implement nested transactions, TopTransactionContext may need
+to be split into a true "top" pointer and a "current transaction" pointer.)
 
-TransactionCommandContext --- this is really a child of
-TopTransactionContext, not a top-level context, but we'll probably store a
-link to it in a global variable anyway for convenience. All the memory
-allocated during planning and execution lives here or in a child context.
-This context is deleted at statement completion, whether normal completion
-or error abort.
+QueryContext --- this is not actually a separate context, but a global
+variable pointing to the context that holds the current command's parse
+and plan trees. (In simple-Query mode this points to MessageContext;
+when executing a prepared statement it will point at the prepared
+statement's private context.) Generally it is not appropriate for any
+code to use QueryContext as an allocation target --- from the point of
+view of any code that would be referencing the QueryContext variable,
+it's a read-only context.
 
-ErrorContext --- this permanent context will be switched into
-for error recovery processing, and then reset on completion of recovery.
-We'll arrange to have, say, 8K of memory available in it at all times.
-In this way, we can ensure that some memory is available for error
-recovery even if the backend has run out of memory otherwise. This should
-allow out-of-memory to be treated as a normal ERROR condition, not a FATAL
-error.
+PortalContext --- this is not actually a separate context either, but a
+global variable pointing to the per-portal context of the currently active
+execution portal. This can be used if it's necessary to allocate storage
+that will live just as long as the execution of the current portal requires.
 
-If we ever implement nested transactions, there may need to be some
-additional levels of transaction-local contexts between
-TopTransactionContext and TransactionCommandContext, but that's beyond
-the scope of this proposal.
+ErrorContext --- this permanent context will be switched into for error
+recovery processing, and then reset on completion of recovery. We'll
+arrange to have, say, 8K of memory available in it at all times. In this
+way, we can ensure that some memory is available for error recovery even
+if the backend has run out of memory otherwise. This allows out-of-memory
+to be treated as a normal ERROR condition, not a FATAL error.
 
 
+Contexts for prepared statements and portals
+--------------------------------------------
+
+A prepared-statement object has an associated private context, in which
+the parse and plan trees for its query are stored. Because these trees
+are read-only to the executor, the prepared statement can be re-used many
+times without further copying of these trees. QueryContext points at this
+private context while executing any portal built from the prepared
+statement.
+
+An execution-portal object has a private context that is referenced by
+PortalContext when the portal is active. In the case of a portal created
+by DECLARE CURSOR, this private context contains the query parse and plan
+trees (there being no other object that can hold them). Portals created
+from prepared statements simply reference the prepared statements' trees,
+and won't actually need any storage allocated in their private contexts.
+
+
 Transient contexts during execution
 -----------------------------------
 
-The planner will probably have a transient context in which it stores
-pathnodes; this will allow it to release the bulk of its temporary space
-usage (which can be a lot, for large joins) at completion of planning.
-The completed plan tree will be in TransactionCommandContext.
+When creating a prepared statement, the parse and plan trees will be built
+in a temporary context that's a child of MessageContext (so that it will
+go away automatically upon error). On success, the finished plan is
+copied to the prepared statement's private context, and the temp context
+is released; this allows planner temporary space to be recovered before
+execution begins. (In simple-Query mode we'll not bother with the extra
+copy step, so the planner temp space stays around till end of query.)
 
 The top-level executor routines, as well as most of the "plan node"
-execution code, will normally run in a context with command lifetime.
-(This will be TransactionCommandContext for normal queries, but when
-executing a cursor, it will be a context associated with the cursor.)
-Most of the memory allocated in these routines is intended to live until
-end of query, so this is appropriate for those purposes. We already have
-a mechanism --- "tuple table slots" --- for avoiding leakage of tuples,
-which is the major kind of short-lived data handled by these routines.
-This still leaves a certain amount of explicit pfree'ing needed by plan
-node code, but that code largely exists already and is probably not worth
-trying to remove. I looked at the possibility of running in a shorter-
-lived context (such as a context that gets reset per-tuple), but this
-seems fairly impractical. The biggest problem with it is that code in
-the index access routines, as well as some other complex algorithms like
-tuplesort.c, assumes that palloc'd storage will live across tuples.
-For example, rtree uses a palloc'd state stack to keep track of an index
-scan.
+execution code, will normally run in a context that is created by
+ExecutorStart and destroyed by ExecutorEnd; this context also holds the
+"plan state" tree built during ExecutorStart. Most of the memory
+allocated in these routines is intended to live until end of query,
+so this is appropriate for those purposes. The executor's top context
+is a child of PortalContext, that is, the per-portal context of the
+portal that represents the query's execution.
 
 The main improvement needed in the executor is that expression evaluation
 --- both for qual testing and for computation of targetlist entries ---
@@ -277,7 +289,7 @@ be released on error. Currently it does that through a "portal",
 which is essentially a child context of TopMemoryContext. While that
 way still works, it's ugly since xact abort needs special processing
 to delete the portal. Better would be to use a context that's a child
-of QueryContext and hence is certain to go away as part of normal
+of PortalContext and hence is certain to go away as part of normal
 processing. (Eventually we might have an even better solution from
 nested transactions, but this'll do fine for now.)
 
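
The change above relies on the rule that deleting a context also deletes all of its direct and indirect children, so working storage hung under a per-portal context disappears with the portal. A minimal toy model of that rule in C follows. It is only a sketch: ToyContext, toy_create, toy_delete_children, toy_delete and toy_demo are hypothetical names invented for illustration, not PostgreSQL's mmgr API, and no actual allocation inside a context is modelled.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for a memory context: only the tree links are modelled. */
typedef struct ToyContext {
    struct ToyContext *parent;
    struct ToyContext *firstchild;
    struct ToyContext *nextsibling;
} ToyContext;

int live_contexts = 0;          /* number of contexts not yet deleted */

/* Create a context and link it under parent (NULL for the tree root). */
ToyContext *toy_create(ToyContext *parent)
{
    ToyContext *cxt = calloc(1, sizeof(ToyContext));

    if (cxt == NULL)
        abort();
    cxt->parent = parent;
    if (parent != NULL) {
        cxt->nextsibling = parent->firstchild;
        parent->firstchild = cxt;
    }
    live_contexts++;
    return cxt;
}

/* Delete every child of cxt, recursively, but keep cxt itself. */
void toy_delete_children(ToyContext *cxt)
{
    while (cxt->firstchild != NULL) {
        ToyContext *child = cxt->firstchild;

        cxt->firstchild = child->nextsibling;
        toy_delete_children(child);
        free(child);
        live_contexts--;
    }
}

/* Delete cxt and its whole subtree, unlinking it from its parent. */
void toy_delete(ToyContext *cxt)
{
    toy_delete_children(cxt);
    if (cxt->parent != NULL) {
        ToyContext **link = &cxt->parent->firstchild;

        while (*link != cxt)
            link = &(*link)->nextsibling;
        *link = cxt->nextsibling;
    }
    free(cxt);
    live_contexts--;
}

/* Demo: working storage under a per-portal context vanishes with the portal. */
int toy_demo(void)
{
    ToyContext *top = toy_create(NULL);
    ToyContext *portal = toy_create(top);
    int remaining;

    toy_create(portal);         /* working context owned by the portal */
    toy_delete(portal);         /* deleting the portal takes the child along */
    remaining = live_contexts;  /* only "top" is still alive here */
    toy_delete(top);
    return remaining;
}
```

In this model nothing has to remember the working context separately: the single toy_delete(portal) call is what releases it, which is the property the README text is after.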
@@ -371,12 +383,14 @@ the relcache's per-relation contexts).
 Also, it will be possible to specify a minimum context size. If this
 value is greater than zero then a block of that size will be grabbed
 immediately upon context creation, and cleared but not released during
-context resets. This feature is needed for ErrorContext (see above).
-It is also useful for per-tuple contexts, which will be reset frequently
-and typically will not allocate very much space per tuple cycle. We can
-save a lot of unnecessary malloc traffic if these contexts hang onto one
-allocation block rather than releasing and reacquiring the block on
-each tuple cycle.
+context resets. This feature is needed for ErrorContext (see above),
+but will most likely not be used for other contexts.
+
+We expect that per-tuple contexts will be reset frequently and typically
+will not allocate very much space per tuple cycle. To make this usage
+pattern cheap, the first block allocated in a context is not given
+back to malloc() during reset, but just cleared. This avoids malloc
+thrashing.
 
 
 Other notes
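
The last hunk says that the first block allocated in a context is cleared on reset rather than handed back to malloc(). The following C sketch shows why that makes a reset-per-tuple pattern cheap. It is a toy under stated assumptions: ToyArena, ToyBlock, TOY_BLOCK_SIZE and the toy_* functions are hypothetical names, the block size is arbitrary, alignment and oversized requests are ignored, and none of this is PostgreSQL's actual allocator code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_BLOCK_SIZE 1024     /* arbitrary size chosen for the sketch */

typedef struct ToyBlock {
    struct ToyBlock *next;
    size_t used;                /* bytes handed out from data[] so far */
    char data[TOY_BLOCK_SIZE];
} ToyBlock;

typedef struct ToyArena {
    ToyBlock *first;            /* first block: kept across resets */
    ToyBlock *current;          /* block that new allocations come from */
    int mallocs;                /* blocks requested from malloc() so far */
} ToyArena;

ToyBlock *toy_new_block(ToyArena *a)
{
    ToyBlock *b = malloc(sizeof(ToyBlock));

    if (b == NULL)
        abort();
    b->next = NULL;
    b->used = 0;
    a->mallocs++;
    return b;
}

/* Hand out size bytes; returns NULL for requests larger than one block. */
void *toy_alloc(ToyArena *a, size_t size)
{
    void *p;

    if (size > TOY_BLOCK_SIZE)
        return NULL;
    if (a->current == NULL) {
        a->first = a->current = toy_new_block(a);
    } else if (a->current->used + size > TOY_BLOCK_SIZE) {
        ToyBlock *b = toy_new_block(a);

        a->current->next = b;
        a->current = b;
    }
    p = a->current->data + a->current->used;
    a->current->used += size;
    return p;
}

/* Reset: free every block except the first, which is only cleared. */
void toy_reset(ToyArena *a)
{
    ToyBlock *b;

    if (a->first == NULL)
        return;
    b = a->first->next;
    while (b != NULL) {
        ToyBlock *next = b->next;

        free(b);
        b = next;
    }
    a->first->next = NULL;
    a->first->used = 0;
    memset(a->first->data, 0, TOY_BLOCK_SIZE);
    a->current = a->first;
}

/* Destroy: unlike reset, this also gives the first block back. */
void toy_destroy(ToyArena *a)
{
    toy_reset(a);
    free(a->first);
    a->first = a->current = NULL;
}

/* Allocate a little and reset once per "tuple"; report malloc() calls made. */
int toy_per_tuple_demo(int ntuples)
{
    ToyArena a = {NULL, NULL, 0};
    int i, mallocs;

    for (i = 0; i < ntuples; i++) {
        char *p = toy_alloc(&a, 64);

        memset(p, 'x', 64);
        toy_reset(&a);
    }
    mallocs = a.mallocs;
    toy_destroy(&a);
    return mallocs;
}
```

With this layout toy_per_tuple_demo(1000) asks malloc() for a block only once, because every reset leaves the first block in place for the next cycle.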