hdf5/doc/html/review1a.html
1998-07-08 09:54:54 -05:00

253 lines
9.8 KiB
HTML

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
<head>
<title>Group Examples</title>
</head>
<body>
<center><h1>Group Examples</h1></center>
<hr>
<h1>Background</h1>
<p>Directories (or now <i>Groups</i>) are currently implemented as
a directed graph with a single entry point into the graph which
is the <i>Root Object</i>. The root object is usually a
group. All objects have at least one predecessor (the <i>Root
Object</i> always has the HDF5 file boot block as a
predecessor). The number of predecessors of a group is also
known as the <i>hard link count</i> or just <i>link count</i>.
Unlike Unix directories, HDF5 groups have no ".." entry since
any group can have multiple predecessors. Given the handle or
id of some object and returning a full name for that object
would be an expensive graph traversal.
<p>A special optimization is that a file may contain a single
non-group object and no group(s). The object has one
predecessor which is the file boot block. However, once a root
group is created it never dissappears (although I suppose it
could if we wanted).
<p>A special object called a <i>Symbolic Link</i> is simply a
name. Usually the name refers to some (other) object, but that
object need not exist. Symbolic links in HDF5 will have the
same semantics as symbolic links in Unix.
<p>The symbol table graph contains "entries" for each name. An
entry contains the file address for the object header and
possibly certain messages cached from the object header.
<p>The H5G package understands the notion of <i>opening</i> and object
which means that given the name of the object, a handle to the
object is returned (this isn't an API function). Objects can be
opened multiple times simultaneously through the same name or,
if the object has hard links, through other names. The name of
an object cannot be removed from a group if the object is opened
through that group (although the name can change within the
group).
<p>Below the API, object attributes can be read without opening
the object; object attributes cannot change without first
opening that object. The one exception is that the contents of a
group can change without opening the group.
<hr>
<h1>Building a hierarchy from a flat namespace</h1>
<p>Assuming we have a flat name space (that is, the root object is
a group which contains names for all other objects in the file
and none of those objects are groups), then we can build a
hierarchy of groups that also refer to the objects.
<p>The file initially contains `foo' `bar' `baz' in the root
group. We wish to add groups `grp1' and `grp2' so that `grp1'
contains objects `foo' and `baz' and `grp2' contains objects
`bar' and `baz' (so `baz' appears in both groups).
<p>In either case below, one might want to move the flat objects
into some other group (like `flat') so their names don't
interfere with the rest of the hierarchy (or move the hierarchy
into a directory called `/hierarchy').
<h2>with symbolic links</h2>
<p>Create group `grp1' and add symbolic links called `foo' whose
value is `/foo' and `baz' whose value is `/baz'. Similarly for
`grp2'.
<p>Accessing `grp1/foo' involves searching the root group for
the name `grp1', then searching that group for `foo', then
searching the root directory for `foo'. Alternatively, one
could change working groups to the grp1 group and then ask for
`foo' which searches `grp1' for the name `foo', then searches
the root group for the name `foo'.
<p>Deleting `/grp1/foo' deletes the symbolic link without
affecting the `/foo' object. Deleting `/foo' leaves the
`/grp1/foo' link dangling.
<h2>with hard links</h2>
<p>Creating the hierarchy is the same as with symbolic links.
<p>Accessing `/grp1/foo' searches the root group for the name
`grp1', then searches that group for the name `foo'. If the
current working group is `/grp1' then we just search for the
name `foo'.
<p>Deleting `/grp1/foo' leaves `/foo' and vice versa.
<h2>the code</h2>
<p>Depending on the eventual API...
<code><pre>
H5Gcreate (file_id, "/grp1");
H5Glink (file_id, H5G_HARD, "/foo", "/grp1/foo");
</pre></code>
or
<code><pre>
group_id = H5Gcreate (root_id, "grp1");
H5Glink (file_id, H5G_HARD, root_id, "foo", group_id, "foo");
H5Gclose (group_id);
</pre></code>
<hr>
<h1>Building a flat namespace from a hierarchy</h1>
<p>Similar to abvoe, but in this case we have to watch out that
we don't get two names which are the same: what happens to
`/grp1/baz' and `/grp2/baz'? If they really refer to the same
object then we just have `/baz', but if they point to two
different objects what happens?
<p>The other thing to watch out for cycles in the graph when we
traverse it to build the flat namespace.
<hr>
<h1>Listing the Group Contents</h1>
<p>Two things to watch out for are that the group contents don't
appear to change in a manner which would confuse the
application, and that listing everything in a group is as
efficient as possible.
<h2>Method A</h2>
<p>Query the number of things in a group and then query each item
by index. A trivial implementation would be O(n*n) and wouldn't
protect the caller from changes to the directory which move
entries around and therefore change their indices.
<code><pre>
n = H5GgetNumContents (group_id);
for (i=0; i&lt;n; i++) {
H5GgetNameByIndex (group_id, i, ...); /*don't worry about args yet*/
}
</pre></code>
<h2>Method B</h2>
<p>The API contains a single function that reads all information
from the specified group and returns that info through an array.
The caller is responsible for freeing the array allocated by the
query and the things to which it points. This also makes it
clear the the returned value is a snapshot of the group which
doesn't change if the group is modified.
<code><pre>
n = H5Glist (file_id, "/grp1", info, ...);
for (i=0; i&lt;n; i++) {
printf ("name = %s\n", info[i].name);
free (info[i].name); /*and maybe other fields too?*/
}
free (info);
</pre></code>
Notice that it would be difficult to expand the info struct since
its definition is part of the API.
<h2>Method C</h2>
<p>The caller asks for a snapshot of the group and then accesses
items in the snapshot through various query-by-index API
functions. When finished, the caller notifies the library that
it's done with the snapshot. The word "snapshot" makes it clear
that subsequent changes to the directory will not be reflected in
the shapshot_id.
<code><pre>
snapshot_id = H5Gsnapshot (group_id); /*or perhaps group_name */
n = H5GgetNumContents (snapshot_id);
for (i=0; i&lt;n; i++) {
H5GgetNameByIndex (shapshot_id, i, ...);
}
H5Grelease (shapshot_id);
</pre></code>
In fact, we could allow the user to leave off the H5Gsnapshot and
H5Grelease and use group_id in the H5GgetNumContents and
H5GgetNameByIndex so they can choose between Method A and Method
C.
<hr>
<h1>An implementation of Method C</h1>
<dl>
<dt><code>hid_t H5Gshapshot (hid_t group_id)</code>
<dd>Opens every object in the specified group and stores the
handles in an array managed by the library (linear-time
operation). Open object handles are essentialy symbol table
entries with a little extra info (symbol table entries cache
certain things about the object which are also found in the
object header). Because the objects are open (A) they cannot be
removed from the group, (B) querying the object returns the
latest info even if something else has that object open, (C)
if the object is renamed within the group then its name with
<code>H5GgetNameByIndex</code> is changed. Adding new entries
to a group doesn't affect the snapshot.
<dt><code>char *H5GgetNameByIndex (hid_t shapshot_id, int
index)</code>
<dd>Uses the open object handle from entry <code>index</code> of
the snapshot array to get the object name. This is a
constant-time operation. The name is updated automatically if
the object is renamed within the group.
<dt><code>H5Gget&lt;whatever&gt;ByIndex...()</code>
<dd>Uses the open object handle from entry <code>index</code>,
which is just a symbol table entry, and reads the appropriate
object header message(s) which might be cached in the symbol
table entry. This is a constant-time operation if cached,
linear in the number of messages if not cached.
<dt><code>H5Grelease (hid_t snapshot_id)</code>
<dd>Closes each object refered to by the snapshot and then frees
the snapshot array. This is a linear-time operation.
</dl>
<hr>
<h1>To return <code>char*</code> or some HDF5 string type.</h1>
<p>In either case, the caller has to release resources associated
with the return value, calling free() or some HDF5 function.
<p>Names in the current implementation of the H5G package don't
contain embedded null characters and are always null terminated.
<p>Eventually the caller probably wants a <code>char*</code> so it
can pass it to some non-HDF5 function, does that require
strdup'ing the string again? Then the caller has to free() the
the char* <i>and</i> release the DHF5 string.
<hr>
<address><a href="mailto:matzke@llnl.gov">Robb Matzke</a></address>
<!-- Created: Fri Sep 26 12:03:20 EST 1997 -->
<!-- hhmts start -->
Last modified: Fri Oct 3 09:32:10 EST 1997
<!-- hhmts end -->
</body>
</html>