<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
<head>
<title>Memory Management in HDF5</title>
</head>

<body>
<h1>Memory Management in HDF5</h1>

<!-- ---------------------------------------------------------------- -->
|
|
<h2>Is a Memory Manager Necessary?</h2>
|
|
|
|
<p>Some form of memory management may be necessary in HDF5 when
|
|
the various deletion operators are implemented so that the
|
|
file memory is not permanently orphaned. However, since an
|
|
HDF5 file was designed with persistent data in mind, the
|
|
importance of a memory manager is questionable.
|
|
|
|
<p>On the other hand, when certain meta data containers (file glue)
|
|
grow, they may need to be relocated in order to keep the
|
|
container contiguous.
|
|
|
|
<blockquote>
|
|
<b>Example:</b> An object header consists of up to two
|
|
chunks of contiguous memory. The first chunk is a fixed
|
|
size at a fixed location when the header link count is
|
|
greater than one. Thus, inserting additional items into an
|
|
object header may require the second chunk to expand. When
|
|
this occurs, the second chunk may need to move to another
|
|
location in the file, freeing the file memory which that
|
|
chunk originally occupied.
|
|
</blockquote>
|
|
|
|
<p>The relocation of meta data containers could potentially
|
|
orphan a significant amount of file memory if the application
|
|
has made poor estimates for preallocation sizes.
|
|
|
|
<!-- ---------------------------------------------------------------- -->
|
|
<h2>Levels of Memory Management</h2>
|
|
|
|
<p>Memory management by the library can be independent of memory
management support by the file format. The file format can
support no memory management, some memory management, or full
memory management, and the same is true of the library.
|
|
|
|
<h3>Support in the Library</h3>
|
|
|
|
<dl>
|
|
<dt><b>No Support: I</b>
|
|
<dd>When memory is deallocated it simply becomes unreferenced
|
|
(orphaned) in the file. Memory allocation requests are
|
|
satisfied by extending the file.
|
|
|
|
<dd>A separate off-line utility can be used to detect the
|
|
unreferenced bytes of a file and "bubble" them up to the end
|
|
of the file and then truncate the file.
|
|
|
|
<dt><b>Some Support: II</b>
|
|
<dd>The library could support partial memory management all
|
|
the time, or full memory management some of the time.
|
|
Orphaning free blocks instead of adding them to a free list
|
|
should not affect the file integrity, nor should fulfilling
|
|
new requests by extending the file instead of using the free
|
|
list.
|
|
|
|
<dt><b>Full Support: III</b>
|
|
<dd>The library supports space-efficient memory management by
|
|
always fulfilling allocation requests from the free list when
|
|
possible, and by coalescing adjacent free blocks into a
|
|
single larger free block.
|
|
</dl>
|
|
|
|
<h3>Support in the File Format</h3>
|
|
|
|
<dl>
|
|
<dt><b>No Support: A</b>
|
|
<dd>The file format does not support memory management; any
|
|
unreferenced block in the file is assumed to be free. If
|
|
the library supports full memory management then it will
|
|
have to traverse the entire file to determine which blocks
|
|
are unreferenced.
|
|
|
|
<dt><b>Some Support: B</b>
|
|
<dd>Assuming that unreferenced blocks are free can be
|
|
dangerous in a situation where the file is not consistent.
|
|
For instance, if a directory tree becomes detached from the
|
|
main directory hierarchy, then the detached directory and
|
|
everything that is referenced only through the detached
|
|
directory become unreferenced. File repair utilities will
|
|
be unable to determine which unreferenced blocks need to be
|
|
linked back into the file hierarchy.
|
|
|
|
<dd>Therefore, it might be useful to keep an unsorted,
|
|
doubly-linked list of free blocks in the file. The library
|
|
can add and remove blocks from the list in constant time,
|
|
and can generate its own internal free-block data structure
|
|
in time proportional to the number of free blocks instead of
|
|
the size of the file. Additionally, a library can use a
|
|
subset of the free blocks, an alternative which is not
|
|
feasible if the file format doesn't support any form of
|
|
memory management.
|
|
|
|
<dt><b>Full Support: C</b>
|
|
<dd>The file format can mirror library data structures for
|
|
space-efficient memory management. The free blocks are
|
|
linked in unsorted, doubly-linked lists with one list per
|
|
free block size. The heads of the lists are pointed to by a
|
|
B-tree whose nodes are sorted by free block size. At the
|
|
same time, all free blocks are the leaf nodes of another
|
|
B-tree sorted by starting and ending address. When the
|
|
trees are used in combination we can deallocate and allocate
|
|
memory in O(log <em>N</em>) time where <em>N</em> is the
|
|
number of free blocks.
|
|
</dl>
|
|
|
|
<h3>Combinations of Library and File Format Support</h3>
|
|
|
|
<p>We now evaluate each combination of library support with file
|
|
support:
|
|
|
|
<dl>
|
|
<dt><b>I-A</b>
|
|
<dd>If neither the library nor the file format supports memory
management, then each allocation request will come from the
end of the file and each deallocation request is a no-op
that simply leaves the free block unreferenced.
|
|
|
|
<ul>
|
|
<li>Advantages
|
|
<ul>
|
|
<li>No file overhead for allocation or deallocation.
|
|
<li>No library overhead for allocation or
|
|
deallocation.
|
|
<li>No file traversal required at time of open.
|
|
<li>No data needs to be written back to the file when
|
|
it's closed.
|
|
<li>Trivial to implement (already implemented).
|
|
</ul>
|
|
|
|
<li>Disadvantages
|
|
<ul>
|
|
<li>Inefficient use of file space.
|
|
<li>A file repair utility must reclaim lost file space.
|
|
<li>Difficulties for file repair utilities. (Is an
|
|
unreferenced block a free block or orphaned data?)
|
|
</ul>
|
|
</ul>
|
|
|
|
<dt><b>II-A</b>
|
|
<dd>In order for the library to support memory management, it
|
|
will be required to build the internal free block
|
|
representation by traversing the entire file looking for
|
|
unreferenced blocks.
|
|
|
|
<ul>
|
|
<li>Advantages
|
|
<ul>
|
|
<li>No file overhead for allocation or deallocation.
|
|
<li>Variable amount of library overhead for allocation
|
|
and deallocation depending on how much work the
|
|
library wants to do.
|
|
<li>No data needs to be written back to the file when
|
|
it's closed.
|
|
<li>Might use file space efficiently.
|
|
</ul>
|
|
<li>Disadvantages
|
|
<ul>
|
|
<li>Might use file space inefficiently.
|
|
<li>File traversal required at time of open.
|
|
<li>A file repair utility must reclaim lost file space.
|
|
<li>Difficulties for file repair utilities.
|
|
<li>Sharing of the free list between processes falls
|
|
outside the HDF5 file format documentation.
|
|
</ul>
|
|
</ul>
|
|
|
|
<dt><b>III-A</b>
|
|
<dd>In order for the library to support full memory
|
|
management, it will be required to build the internal free
|
|
block representation by traversing the entire file looking
|
|
for unreferenced blocks.
|
|
|
|
<ul>
|
|
<li>Advantages
|
|
<ul>
|
|
<li>No file overhead for allocation or deallocation.
|
|
<li>Efficient use of file space.
|
|
<li>No data needs to be written back to the file when
|
|
it's closed.
|
|
</ul>
|
|
<li>Disadvantages
|
|
<ul>
|
|
<li>Moderate amount of library overhead for allocation
|
|
and deallocation.
|
|
<li>File traversal required at time of open.
|
|
<li>A file repair utility must reclaim lost file space.
|
|
<li>Difficulties for file repair utilities.
|
|
<li>Sharing of the free list between processes falls
|
|
outside the HDF5 file format documentation.
|
|
</ul>
|
|
</ul>
|
|
|
|
<dt><b>I-B</b>
|
|
<dd>If the library doesn't support memory management but the
|
|
file format supports some level of management, then a file
|
|
repair utility will have to be run occasionally to reclaim
|
|
unreferenced blocks.
|
|
|
|
<ul>
|
|
<li>Advantages
|
|
<ul>
|
|
<li>No file overhead for allocation or deallocation.
|
|
<li>No library overhead for allocation or
|
|
deallocation.
|
|
<li>No file traversal required at time of open.
|
|
<li>No data needs to be written back to the file when
|
|
it's closed.
|
|
</ul>
|
|
<li>Disadvantages
|
|
<ul>
|
|
<li>A file repair utility must reclaim lost file space.
|
|
<li>Difficulties for file repair utilities.
|
|
</ul>
|
|
</ul>
|
|
|
|
<dt><b>II-B</b>
|
|
<dd>Both the library and the file format support some level
|
|
of memory management.
|
|
|
|
<ul>
|
|
<li>Advantages
|
|
<ul>
|
|
<li>Constant file overhead per allocation or
|
|
deallocation.
|
|
<li>Variable library overhead per allocation or
|
|
deallocation depending on how much work the library
|
|
wants to do.
|
|
<li>Traversal at file open time is on the order of the
|
|
free list size instead of the file size.
|
|
<li>The library has the option of reading only part of
|
|
the free list.
|
|
<li>No data needs to be written at file close time if
|
|
it has been amortized into the cost of allocation
|
|
and deallocation.
|
|
<li>File repair utilities don't have to be run to
reclaim memory.
|
|
<li>File repair utilities can detect whether an
|
|
unreferenced block is a free block or orphaned data.
|
|
<li>Sharing of the free list between processes might
|
|
be easier.
|
|
<li>Possible efficient use of file space.
|
|
</ul>
|
|
<li>Disadvantages
|
|
<ul>
|
|
<li>Possible inefficient use of file space.
|
|
</ul>
|
|
</ul>
|
|
|
|
<dt><b>III-B</b>
|
|
<dd>The library provides space-efficient memory management but
|
|
the file format only supports an unsorted list of free
|
|
blocks.
|
|
|
|
<ul>
|
|
<li>Advantages
|
|
<ul>
|
|
<li>Constant time file overhead per allocation or
|
|
deallocation.
|
|
<li>No data needs to be written at file close time if
|
|
it has been amortized into the cost of allocation
|
|
and deallocation.
|
|
<li>File repair utilities don't have to be run to
|
|
reclaim memory.
|
|
<li>File repair utilities can detect whether an
|
|
unreferenced block is a free block or orphaned data.
|
|
<li>Sharing of the free list between processes might
|
|
be easier.
|
|
<li>Efficient use of file space.
|
|
</ul>
|
|
<li>Disadvantages
|
|
<ul>
|
|
<li>O(log <em>N</em>) library overhead per allocation or
|
|
deallocation where <em>N</em> is the total number of
|
|
free blocks.
|
|
<li>O(<em>N</em>) time to open a file since the entire
|
|
free list must be read to construct the in-core
|
|
trees used by the library.
|
|
<li>Library is more complicated.
|
|
</ul>
|
|
</ul>
|
|
|
|
<dt><b>I-C</b>
|
|
<dd>This has the same advantages and disadvantages as I-A with
the added disadvantage that the file format is much more
complicated.
|
|
|
|
<dt><b>II-C</b>
|
|
<dd>If the library only provides partial memory management but
|
|
the file requires full memory management, then this method
|
|
degenerates to the same as II-A with the added disadvantage
|
|
that the file format is much more complicated.
|
|
|
|
<dt><b>III-C</b>
|
|
<dd>The library and file format both provide complete data
|
|
structures for space-efficient memory management.
|
|
|
|
<ul>
|
|
<li>Advantages
|
|
<ul>
|
|
<li>Files can be opened in constant time since the
free list is read on demand and amortized into the
allocation and deallocation requests.
|
|
<li>No data needs to be written back to the file when
|
|
it's closed.
|
|
<li>File repair utilities don't have to be run to
|
|
reclaim memory.
|
|
<li>File repair utilities can detect whether an
|
|
unreferenced block is a free block or orphaned data.
|
|
<li>Sharing the free list between processes is easy.
|
|
<li>Efficient use of file space.
|
|
</ul>
|
|
<li>Disadvantages
|
|
<ul>
|
|
<li>O(log <em>N</em>) file allocation and deallocation
|
|
cost where <em>N</em> is the total number of free
|
|
blocks.
|
|
<li>O(log <em>N</em>) library allocation and
|
|
deallocation cost.
|
|
<li>Much more complicated file format.
|
|
<li>More complicated library.
|
|
</ul>
|
|
</ul>
|
|
|
|
</dl>
|
|
|
|
<!-- ---------------------------------------------------------------- -->
|
|
<h2>The Algorithm for II-B</h2>
|
|
|
|
<p>The file contains an unsorted, doubly-linked list of free
|
|
blocks. The address of the head of the list appears in the
|
|
super block. Each free block contains the following fields:
|
|
|
|
<center>
|
|
<table border cellpadding=4 width="60%">
|
|
<tr align=center>
|
|
<th width="25%">byte</th>
|
|
<th width="25%">byte</th>
|
|
<th width="25%">byte</th>
|
|
<th width="25%">byte</th>
|
|
|
|
<tr align=center>
|
|
<th colspan=4>Free Block Signature</th>
|
|
|
|
<tr align=center>
|
|
<th colspan=4>Total Free Block Size</th>
|
|
|
|
<tr align=center>
|
|
<th colspan=4>Address of Left Sibling</th>
|
|
|
|
<tr align=center>
|
|
<th colspan=4>Address of Right Sibling</th>
|
|
|
|
<tr align=center>
|
|
<th colspan=4><br><br>Remainder of Free Block<br><br><br></th>
|
|
</table>
|
|
</center>
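
<p>As a rough illustration of the layout in the table above, the
following C sketch packs and unpacks the four header fields. The
four-byte field width follows the table. The signature bytes
(<code>"FREE"</code>), the little-endian byte order, and all type
and function names are assumptions made for this example and are
not taken from the HDF5 format specification.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical signature: the source does not give the actual bytes. */
#define FREE_BLOCK_SIG      "FREE"
#define FREE_BLOCK_HDR_SIZE 16u   /* four 4-byte fields, as in the table */

typedef struct free_block_hdr {
    uint32_t size;   /* total free block size, header included */
    uint32_t left;   /* file address of the left sibling       */
    uint32_t right;  /* file address of the right sibling      */
} free_block_hdr;

/* Little-endian byte order is an assumption of this sketch. */
static void put_u32le(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)((v >> 8) & 0xff);
    p[2] = (uint8_t)((v >> 16) & 0xff);
    p[3] = (uint8_t)((v >> 24) & 0xff);
}

static uint32_t get_u32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Write the signature and the three fields into a 16-byte buffer. */
void free_block_encode(const free_block_hdr *h, uint8_t *buf)
{
    assert(h != NULL && buf != NULL);
    memcpy(buf, FREE_BLOCK_SIG, 4);
    put_u32le(buf + 4, h->size);
    put_u32le(buf + 8, h->left);
    put_u32le(buf + 12, h->right);
}

/* Returns 0 on success, -1 if the free block signature is missing. */
int free_block_decode(const uint8_t *buf, free_block_hdr *h)
{
    assert(h != NULL && buf != NULL);
    if (memcmp(buf, FREE_BLOCK_SIG, 4) != 0)
        return -1;
    h->size  = get_u32le(buf + 4);
    h->left  = get_u32le(buf + 8);
    h->right = get_u32le(buf + 12);
    return 0;
}
```

<p>A decoder that rejects a buffer without the signature gives a
reader a cheap way to tell a free block from other file data.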
|
|
|
|
<p>The library reads as much of the free list as is convenient,
whenever it is convenient, and pushes those entries onto stacks.
This can occur when a file is opened or at any time during the
life of the file. There is one stack for each free block size
and the stacks are sorted by size in a balanced tree in memory.
|
|
|
|
<p>Deallocation involves finding the correct stack or creating
|
|
a new one (an O(log <em>K</em>) operation where <em>K</em> is
|
|
the number of stacks), pushing the free block info onto the
|
|
stack (a constant-time operation), and inserting the free
|
|
block into the file free block list (a constant-time operation
|
|
which doesn't necessarily involve any I/O since the free blocks
|
|
can be cached like other objects). No attempt is made to
|
|
coalesce adjacent free blocks into larger blocks.
|
|
|
|
<p>Allocation involves finding the correct stack (an O(log
|
|
<em>K</em>) operation), removing the top item from the stack
|
|
(a constant-time operation), and removing the block from the
|
|
file free block list (a constant-time operation). If there is
|
|
no free block of the requested size or larger, then the file
|
|
is extended.
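
<p>The in-memory side of the deallocation and allocation steps
described above can be sketched in C as follows. This is a
simplified illustration, not library code: a sorted array with
binary search stands in for the balanced tree (so creating or
deleting a stack costs O(<em>K</em>) here rather than
O(log <em>K</em>)), the on-disk free list updates are omitted, and
a block larger than the request is handed out whole because the
text does not say how a remainder would be handled. All names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* All names here are hypothetical; this is not HDF5 library code. */
typedef struct size_stack {
    uint64_t  size;   /* size shared by every block on this stack */
    uint64_t *addrs;  /* file addresses of the free blocks        */
    size_t    n, cap;
} size_stack;

typedef struct free_pool {
    size_stack *stacks; /* sorted by ascending size; none is empty */
    size_t      n, cap;
    uint64_t    eof;    /* current end-of-file address             */
} free_pool;

/* Index of the first stack whose size is >= want (binary search). */
static size_t pool_lower_bound(const free_pool *p, uint64_t want)
{
    size_t lo = 0, hi = p->n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (p->stacks[mid].size < want) lo = mid + 1; else hi = mid;
    }
    return lo;
}

/* Deallocation: find or create the stack for this size, then push.
 * Returns 0 on success, -1 if memory for bookkeeping runs out. */
int pool_free(free_pool *p, uint64_t addr, uint64_t size)
{
    size_t i = pool_lower_bound(p, size);
    size_stack *s;
    assert(size > 0);
    if (i == p->n || p->stacks[i].size != size) {
        uint64_t *a = malloc(4 * sizeof *a);
        if (a == NULL) return -1;
        if (p->n == p->cap) {
            size_t cap = p->cap ? 2 * p->cap : 4;
            size_stack *grown = realloc(p->stacks, cap * sizeof *grown);
            if (grown == NULL) { free(a); return -1; }
            p->stacks = grown;
            p->cap = cap;
        }
        memmove(&p->stacks[i + 1], &p->stacks[i],
                (p->n - i) * sizeof *p->stacks);
        p->stacks[i] = (size_stack){ size, a, 0, 4 };
        p->n++;
    }
    s = &p->stacks[i];
    if (s->n == s->cap) {
        uint64_t *a = realloc(s->addrs, 2 * s->cap * sizeof *a);
        if (a == NULL) return -1;
        s->addrs = a;
        s->cap *= 2;
    }
    s->addrs[s->n++] = addr;
    return 0;
}

/* Allocation: pop from the smallest stack whose size is >= want;
 * if there is none, extend the file. */
uint64_t pool_alloc(free_pool *p, uint64_t want)
{
    size_t i = pool_lower_bound(p, want);
    uint64_t addr;
    if (i < p->n) {
        size_stack *s = &p->stacks[i];
        addr = s->addrs[--s->n];
        if (s->n == 0) {            /* drop the now-empty stack */
            free(s->addrs);
            memmove(s, s + 1, (p->n - i - 1) * sizeof *s);
            p->n--;
        }
        return addr;
    }
    addr = p->eof;
    p->eof += want;
    return addr;
}
```

<p>A pool starts out as <code>{ NULL, 0, 0, eof }</code>, so a
library that has read none of the file free list still works: every
request simply extends the file until blocks are freed.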
|
|
|
|
<p>To allow the free list to be shared between processes,
the last step of an allocation checks for the free block
signature and, if it does not find one, repeats the process.
Alternatively, a process can temporarily remove free blocks
from the file and hold them in its own private pool.
|
|
|
|
<p>To summarize...
|
|
<dl>
|
|
<dt>File opening
|
|
<dd>O(<em>N</em>) amortized over the time the file is open,
|
|
where <em>N</em> is the number of free blocks. The library
|
|
can still function without reading any of the file free
|
|
block list.
|
|
|
|
<dt>Deallocation
|
|
<dd>O(log <em>K</em>) where <em>K</em> is the number of unique
|
|
sizes of free blocks. File access is constant.
|
|
|
|
<dt>Allocation
|
|
<dd>O(log <em>K</em>). File access is constant.
|
|
|
|
<dt>File closing
|
|
<dd>O(1) even if the library temporarily removes free
|
|
blocks from the file to hold them in a private pool since
|
|
the pool can still be a linked list on disk.
|
|
</dl>
|
|
|
|
<!-- ---------------------------------------------------------------- -->
|
|
<h2>The Algorithm for III-C</h2>
|
|
|
|
<p>The HDF5 file format supports a general B-tree mechanism
|
|
for storing data with keys. If we use a B-tree to represent
|
|
all parts of the file that are free and the B-tree is indexed
|
|
so that a free file chunk can be found if we know the starting
|
|
or ending address, then we can efficiently determine whether a
|
|
free chunk begins or ends at the specified address. Call this
|
|
the <em>Address B-Tree</em>.
|
|
|
|
<p>If a second B-tree points to a set of stacks where the
|
|
members of a particular stack are all free chunks of the same
|
|
size, and the tree is indexed by chunk size, then we can
|
|
efficiently find the best-fit chunk size for a memory request.
|
|
Call this the <em>Size B-Tree</em>.
|
|
|
|
<p>All free blocks of a particular size can be linked together
|
|
with an unsorted, doubly-linked, circular list and the left
|
|
and right sibling addresses can be stored within the free
|
|
chunk, allowing us to remove or insert items from the list in
|
|
constant time.
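
<p>The constant-time insert and remove operations on such a list
can be sketched in C as follows. In-memory pointers stand in for
the sibling file addresses stored in each free chunk, and the
<code>head</code> pointer plays the role of the list head that the
size B-tree would reference. The names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Pointers stand in for the left/right sibling file addresses. */
typedef struct fnode {
    struct fnode *left, *right;
} fnode;

/* Append n at the tail of the circular list; *head may be NULL. */
void ring_push_tail(fnode **head, fnode *n)
{
    assert(head != NULL && n != NULL);
    if (*head == NULL) {
        n->left = n->right = n;   /* a single node links to itself */
        *head = n;
        return;
    }
    fnode *tail = (*head)->left;  /* tail is the head's left sibling */
    n->left = tail;
    n->right = *head;
    tail->right = n;
    (*head)->left = n;
}

/* Unlink n. If n was first on the list the head changes, which is
 * the case where the size index would need to be updated. */
void ring_remove(fnode **head, fnode *n)
{
    assert(head != NULL && n != NULL);
    if (n->right == n) {
        *head = NULL;             /* n was the only node */
    } else {
        n->left->right = n->right;
        n->right->left = n->left;
        if (*head == n)
            *head = n->right;
    }
    n->left = n->right = NULL;
}
```

<p>Because the list is circular, the tail is always reachable from
the head in one step, so neither operation needs to walk the list.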
|
|
|
|
<p>Deallocation of a block of file memory consists of:
|
|
|
|
<ol type="I">
|
|
<li>Add the new free block whose address is <em>ADDR</em> to the
|
|
address B-tree.
|
|
|
|
<ol type="A">
|
|
<li>If the address B-tree contains an entry for a free
|
|
block that ends at <em>ADDR</em>-1 then remove that
|
|
block from the B-tree and from the linked list (if the
|
|
block was the first on the list then the size B-tree
|
|
must be updated). Adjust the size and address of the
|
|
block being freed to include the block just removed from
|
|
the free list. The time required to search for and
|
|
possibly remove the left block is O(log <em>N</em>)
|
|
where <em>N</em> is the number of free blocks.
|
|
|
|
<li>If the address B-tree contains an entry for the free
|
|
block that begins at <em>ADDR</em>+<em>LENGTH</em> then
|
|
remove that block from the B-tree and from the linked
|
|
list (if the block was the first on the list then the
|
|
size B-tree must be updated). Adjust the size of the
|
|
block being freed to include the block just removed from
|
|
the free list. The time required to search for and
|
|
possibly remove the right block is O(log <em>N</em>).
|
|
|
|
<li>Add the new (adjusted) block to the address B-tree.
|
|
The time for this operation is O(log <em>N</em>).
|
|
</ol>
|
|
|
|
<li>Add the new block to the size B-tree and linked list.
|
|
|
|
<ol type="A">
|
|
<li>If the size B-tree has an entry for this particular
|
|
size, then add the chunk to the tail of the list. This
|
|
is an O(log <em>K</em>) operation where <em>K</em> is
|
|
the number of unique free block sizes.
|
|
|
|
<li>Otherwise make a new entry in the B-tree for chunks of
|
|
this size. This is also O(log <em>K</em>).
|
|
</ol>
|
|
</ol>
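
<p>Step I of the procedure above (merging with the left and right
neighbours, then inserting the adjusted block) can be sketched in C
as follows. To keep the example short, a fixed-capacity array
sorted by address stands in for the address B-tree, so lookups are
O(log <em>N</em>) but inserts and removals shift entries. Step II
(updating the size B-tree and the per-size list) is left out. All
names and the capacity limit are assumptions of this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_FREE 64   /* arbitrary capacity chosen for the sketch */

typedef struct extent { uint64_t addr, len; } extent;

/* Sorted-by-address array standing in for the address B-tree. */
typedef struct addr_index {
    extent e[MAX_FREE];
    size_t n;
} addr_index;

/* Index of the first entry whose address is >= addr. */
static size_t ai_lower_bound(const addr_index *ix, uint64_t addr)
{
    size_t lo = 0, hi = ix->n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (ix->e[mid].addr < addr) lo = mid + 1; else hi = mid;
    }
    return lo;
}

static void ai_remove(addr_index *ix, size_t i)
{
    memmove(&ix->e[i], &ix->e[i + 1], (ix->n - i - 1) * sizeof ix->e[0]);
    ix->n--;
}

/* Free [addr, addr+len). Returns 0 on success, -1 if the index is full. */
int ai_free(addr_index *ix, uint64_t addr, uint64_t len)
{
    size_t i = ai_lower_bound(ix, addr);
    assert(len > 0);

    /* I.A: a free block ending at addr-1 is absorbed on the left. */
    if (i > 0 && ix->e[i - 1].addr + ix->e[i - 1].len == addr) {
        addr = ix->e[i - 1].addr;
        len += ix->e[i - 1].len;
        ai_remove(ix, i - 1);
        i--;
    }
    /* I.B: a free block beginning at addr+len is absorbed on the right. */
    if (i < ix->n && ix->e[i].addr == addr + len) {
        len += ix->e[i].len;
        ai_remove(ix, i);
    }
    /* I.C: insert the adjusted block. */
    if (ix->n == MAX_FREE)
        return -1;
    memmove(&ix->e[i + 1], &ix->e[i], (ix->n - i) * sizeof ix->e[0]);
    ix->e[i].addr = addr;
    ix->e[i].len = len;
    ix->n++;
    return 0;
}
```

<p>Freeing a block that sits exactly between two free neighbours
leaves a single entry covering all three, which is the coalescing
behaviour that distinguishes III-C from II-B.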
|
|
|
|
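<p>Allocation is similar to deallocation: the size B-tree is
searched for the best-fit chunk size, and the chosen block is
removed from its linked list and from the address B-tree.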
|
|
|
|
<p>To summarize...
|
|
|
|
<dl>
|
|
<dt>File opening
|
|
<dd>O(1)
|
|
|
|
<dt>Deallocation
|
|
<dd>O(log <em>N</em>) where <em>N</em> is the total number of
|
|
free blocks. File access time is O(log <em>N</em>).
|
|
|
|
<dt>Allocation
|
|
<dd>O(log <em>N</em>). File access time is O(log <em>N</em>).
|
|
|
|
<dt>File closing
|
|
<dd>O(1).
|
|
</dl>
|
|
|
|
|
|
<hr>
<address><a href="mailto:matzke@llnl.gov">Robb Matzke</a></address>
<!-- Created: Thu Jul 24 15:16:40 PDT 1997 -->
<!-- hhmts start -->
Last modified: Thu Jul 31 14:41:01 EST
<!-- hhmts end -->
</body>
</html>