mirror of
https://github.com/HDFGroup/hdf5.git
synced 2025-01-12 15:04:59 +08:00
314 lines
14 KiB
Plaintext
314 lines
14 KiB
Plaintext
A number of issues involving caching of object header messages in
|
|
symbol table entries must be resolved.
|
|
|
|
What is the motivation for these changes?
|
|
|
|
If we make objects completely independent of object name it allows
|
|
us to refer to one object by multiple names (a concept called hard
|
|
links in Unix file systems), which in turn provides an easy way to
|
|
share data between datasets.
|
|
|
|
Every object in an HDF5 file has a unique, constant object header
|
|
address which serves as a handle (or OID) for the object. The
|
|
object header contains messages which describe the object.
|
|
|
|
HDF5 allows some of the object header messages to be cached in
|
|
symbol table entries so that the object header doesn't have to be
|
|
read from disk. For instance, an entry for a directory caches the
|
|
directory disk addresses required to access that directory, so the
|
|
object header for that directory is seldom read.
|
|
|
|
If an object has multiple names (that is, a link count greater than
|
|
one), then it has multiple symbol table entries which point to it.
|
|
All symbol table entries must agree on header messages. The
|
|
current mechanism is to turn off the caching of header messages in
|
|
symbol table entries when the header link count is more than one,
|
|
and to allow caching once the link count returns to one.
|
|
|
|
However, in the current implementation, a package is allowed to
|
|
copy a symbol table entry and use it as a private cache for the
|
|
object header. This doesn't work for a number of reasons (all but
|
|
one require a `delete symbol entry' operation).
|
|
|
|
1. If two packages hold copies of the same symbol table entry,
|
|
they don't notify each other of changes to the symbol table
|
|
entry. Eventually, one package reads a cached message and
|
|
gets the wrong value because the other package changed the
|
|
message in the object header.
|
|
|
|
2. If one package holds a copy of the symbol table entry and
|
|
some other part of HDF5 removes the object and replaces it
|
|
with some other object, then the original package will
|
|
continue to access the non-existent object using the new
|
|
object header.
|
|
|
|
3. If one package holds a copy of the symbol table entry and
|
|
some other part of HDF5 (re)moves the directory which
|
|
contains the object, then the package will be unable to
|
|
update the symbol table entry with the new cached
|
|
data. Packages that refer to the object by the new name will
|
|
use old cached data.
|
|
|
|
|
|
The basic problem is that there may be multiple copies of the object
|
|
symbol table entry floating around in the code when there should
|
|
really be at most one per hard link.
|
|
|
|
Level 0: A copy may exist on disk as part of a symbol table node, which
|
|
is a small 1d array of symbol table entries.
|
|
|
|
Level 1: A copy may be cached in memory as part of a symbol table node
|
|
in the H5Gnode.c file by the H5AC layer.
|
|
|
|
Level 2a: Another package may be holding a copy so it can perform
|
|
fast lookup of any header messages that might be cached in
|
|
the symbol table entry. It can't point directly to the
|
|
cached symbol table node because that node can dissappear
|
|
at any time.
|
|
|
|
Level 2b: Packages may hold more than one copy of a symbol table
|
|
entry. For instance, if H5D_open() is called twice for
|
|
the same name, then two copies of the symbol table entry
|
|
for the dataset exist in the H5D package.
|
|
|
|
How can level 2a and 2b be combined?
|
|
|
|
If package data structures contained pointers to symbol table
|
|
entries instead of copies of symbol table entries and if H5G
|
|
allocated one symbol table entry per hard link, then it's trivial
|
|
for Level 2a and 2b to benefit from one another's actions since
|
|
they share the same cache.
|
|
|
|
How does this work conceptually?
|
|
|
|
Level 2a and 2b must notify Level 1 of their intent to use (or stop
|
|
using) a symbol table entry to access an object header. The
|
|
notification of the intent to access an object header is called
|
|
`opening' the object and releasing the access is `closing' the
|
|
object.
|
|
|
|
Opening an object requires an object name which is used to locate
|
|
the symbol table entry to use for caching of object header
|
|
messages. The return value is a handle for the object. Figure 1
|
|
shows the state after Dataset1 opens Object with a name that maps
|
|
through Entry1. The open request created a copy of Entry1 called
|
|
Shadow1 which exists even if SymNode1 is preempted from the H5AC
|
|
layer.
|
|
|
|
______
|
|
Object / \
|
|
SymNode1 +--------+ |
|
|
+--------+ _____\ | Header | |
|
|
| | / / +--------+ |
|
|
+--------+ +---------+ \______/
|
|
| Entry1 | | Shadow1 | /____
|
|
+--------+ +---------+ \ \
|
|
: : \
|
|
+--------+ +----------+
|
|
| Dataset1 |
|
|
+----------+
|
|
FIGURE 1
|
|
|
|
|
|
|
|
The SymNode1 can appear and disappear from the H5AC layer at any
|
|
time without affecting the Object Header data cached in the Shadow.
|
|
The rules are:
|
|
|
|
* If the SymNode1 is present and is about to disappear and the
|
|
Shadow1 dirty bit is set, then Shadow1 is copied over Entry1, the
|
|
Entry1 dirty bit is set, and the Shadow1 dirty bit is cleared.
|
|
|
|
* If something requests a copy of Entry1 (for a read-only peek
|
|
request), and Shadow1 exists, then a copy (not pointer) of Shadow1
|
|
is returned instead.
|
|
|
|
* Entry1 cannot be deleted while Shadow1 exists.
|
|
|
|
* Entry1 cannot change directly if Shadow1 exists since this means
|
|
that some other package has opened the object and may be modifying
|
|
it. I haven't decided if it's useful to ever change Entry1
|
|
directly (except of course within the H5G layer itself).
|
|
|
|
* Shadow1 is created when Dataset1 `opens' the object through
|
|
Entry1. Dataset1 is given a pointer to Shadow1 and Shadow1's
|
|
reference count is incremented.
|
|
|
|
* When Dataset1 `closes' the Object the Shadow1 reference count is
|
|
decremented. When the reference count reaches zero, if the
|
|
Shadow1 dirty bit is set, then Shadow1's contents are copied to
|
|
Entry1, and the Entry1 dirty bit is set. Shadow1 is then deleted
|
|
if its reference count is zero. This may require reading SymNode1
|
|
back into the H5AC layer.
|
|
|
|
What happens when another Dataset opens the Object through Entry1?
|
|
|
|
If the current state is represented by the top part of Figure 2,
|
|
then Dataset2 will be given a pointer to Shadow1 and the Shadow1
|
|
reference count will be incremented to two. The Object header link
|
|
count remains at one so Object Header messages continue to be cached
|
|
by Shadow1. Dataset1 and Dataset2 benefit from one another
|
|
actions. The resulting state is represented by Figure 2.
|
|
|
|
_____
|
|
SymNode1 Object / \
|
|
+--------+ _____\ +--------+ |
|
|
| | / / | Header | |
|
|
+--------+ +---------+ +--------+ |
|
|
| Entry1 | | Shadow1 | /____ \_____/
|
|
+--------+ +---------+ \ \
|
|
: : _ \
|
|
+--------+ |\ +----------+
|
|
\ | Dataset1 |
|
|
\________ +----------+
|
|
\ \
|
|
+----------+ |
|
|
| Dataset2 | |- New Dataset
|
|
+----------+ |
|
|
/
|
|
FIGURE 2
|
|
|
|
|
|
What happens when the link count for Object increases while Dataset
|
|
has the Object open?
|
|
|
|
SymNode2
|
|
+--------+
|
|
SymNode1 Object | |
|
|
+--------+ ____\ +--------+ /______ +--------+
|
|
| | / / | header | \ `| Entry2 |
|
|
+--------+ +---------+ +--------+ +--------+
|
|
| Entry1 | | Shadow1 | /____ : :
|
|
+--------+ +---------+ \ \ +--------+
|
|
: : \
|
|
+--------+ +----------+ \________________/
|
|
| Dataset1 | |
|
|
+----------+ New Link
|
|
|
|
FIGURE 3
|
|
|
|
The current state is represented by the left part of Figure 3. To
|
|
create a new link the Object Header had to be located by traversing
|
|
through Entry1/Shadow1. On the way through, the Entry1/Shadow1
|
|
cache is invalidated and the Object Header link count is
|
|
incremented. Entry2 is then added to SymNode2.
|
|
|
|
Since the Object Header link count is greater than one, Object
|
|
header data will not be cached in Entry1/Shadow1.
|
|
|
|
If the initial state had been all of Figure 3 and a third link is
|
|
being added and Object is open by Entry1 and Entry2, then creation
|
|
of the third link will invalidate the cache in Entry1 or Entry2. It
|
|
doesn't matter which since both caches are already invalidated
|
|
anyway.
|
|
|
|
What happens if another Dataset opens the same object by another name?
|
|
|
|
If the current state is represented by Figure 3, then a Shadow2 is
|
|
created and associated with Entry2. However, since the Object
|
|
Header link count is more than one, nothing gets cached in Shadow2
|
|
(or Shadow1).
|
|
|
|
What happens if the link count decreases?
|
|
|
|
If the current state is represented by all of Figure 3 then it isn't
|
|
possible to delete Entry1 because the object is currently open
|
|
through that entry. Therefore, the link count must have
|
|
decreased because Entry2 was removed.
|
|
|
|
As Dataset1 reads/writes messages in the Object header they will
|
|
begin to be cached in Shadow1 again because the Object header link
|
|
count is one.
|
|
|
|
What happens if the object is removed while it's open?
|
|
|
|
That operation is not allowed.
|
|
|
|
What happens if the directory containing the object is deleted?
|
|
|
|
That operation is not allowed since deleting the directory requires
|
|
that the directory be empty. The directory cannot be emptied
|
|
because the open object cannot be removed from the directory.
|
|
|
|
What happens if the object is moved?
|
|
|
|
Moving an object is a process consisting of creating a new
|
|
hard-link with the new name and then deleting the old name.
|
|
This will fail if the object is open.
|
|
|
|
What happens if the directory containing the entry is moved?
|
|
|
|
The entry and the shadow still exist and are associated with one
|
|
another.
|
|
|
|
What if a file is flushed or closed when objects are open?
|
|
|
|
Flushing a symbol table with open objects writes correct information
|
|
to the file since Shadow is copied to Entry before the table is
|
|
flushed.
|
|
|
|
Closing a file with open objects will create a valid file but will
|
|
return failure.
|
|
|
|
How is the Shadow associated with the Entry?
|
|
|
|
A symbol table is composed of one or more symbol nodes. A node is a
|
|
small 1-d array of symbol table entries. The entries can move
|
|
around within a node and from node-to-node as entries are added or
|
|
removed from the symbol table and nodes can move around within a
|
|
symbol table, being created and destroyed as necessary.
|
|
|
|
Since a symbol table has an object header with a unique and constant
|
|
file offset, and since H5G contains code to efficiently locate a
|
|
symbol table entry given it's name, we use these two values as a key
|
|
within a shadow to associate the shadow with the symbol table
|
|
entry.
|
|
|
|
struct H5G_shadow_t {
|
|
haddr_t stab_addr; /*symbol table header address*/
|
|
char *name; /*entry name wrt symbol table*/
|
|
hbool_t dirty; /*out-of-date wrt stab entry?*/
|
|
H5G_entry_t ent; /*my copy of stab entry */
|
|
H5G_entry_t *main; /*the level 1 entry or null */
|
|
H5G_shadow_t *next, *prev; /*other shadows for this stab*/
|
|
};
|
|
|
|
The set of shadows will be organized in a hash table of linked
|
|
lists. Each linked list will contain the shadows associated with a
|
|
particular symbol table header address and the list will be sorted
|
|
lexicographically.
|
|
|
|
Also, each Entry will have a pointer to the corresponding Shadow or
|
|
null if there is no shadow.
|
|
|
|
When a symbol table node is loaded into the main cache, we look up
|
|
the linked list of shadows in the shadow hash table based on the
|
|
address of the symbol table object header. We then traverse that
|
|
list matching shadows with symbol table entries.
|
|
|
|
We assume that opening/closing objects will be a relatively
|
|
infrequent event compared with loading/flushing symbol table
|
|
nodes. Therefore, if we keep the linked list of shadows sorted it
|
|
costs O(N) to open and close objects where N is the number of open
|
|
objects in that symbol table (instead of O(1)) but it costs only
|
|
O(N) to load a symbol table node (instead of O(N^2)).
|
|
|
|
What about the root symbol entry?
|
|
|
|
Level 1 storage for the root symbol entry is always available since
|
|
it's stored in the hdf5_file_t struct instead of a symbol table
|
|
node. However, the contents of that entry can move from the file
|
|
handle to a symbol table node by H5G_mkroot(). Therefore, if the
|
|
root object is opened, we keep a shadow entry for it whose
|
|
`stab_addr' field is zero and whose `name' is null.
|
|
|
|
For this reason, the root object should always be read through the
|
|
H5G interface.
|
|
|
|
One more key invariant: The H5O_STAB message in a symbol table header
|
|
never changes. This allows symbol table entries to cache the H5O_STAB
|
|
message for the symbol table to which it points without worrying about
|
|
whether the cache will ever be invalidated.
|
|
|
|
|