mirror of
https://github.com/HDFGroup/hdf5.git
synced 2024-12-03 02:32:04 +08:00
253 lines
9.8 KiB
HTML
253 lines
9.8 KiB
HTML
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
|
|
<html>
|
|
<head>
|
|
<title>Group Examples</title>
|
|
</head>
|
|
<body>
|
|
<center><h1>Group Examples</h1></center>
|
|
|
|
<hr>
|
|
<h1>Background</h1>
|
|
|
|
<p>Directories (or now <i>Groups</i>) are currently implemented as
|
|
a directed graph with a single entry point into the graph which
|
|
is the <i>Root Object</i>. The root object is usually a
|
|
group. All objects have at least one predecessor (the <i>Root
|
|
Object</i> always has the HDF5 file super block as a
|
|
predecessor). The number of predecessors of a group is also
|
|
known as the <i>hard link count</i> or just <i>link count</i>.
|
|
Unlike Unix directories, HDF5 groups have no ".." entry since
|
|
any group can have multiple predecessors. Given the handle or
|
|
id of some object and returning a full name for that object
|
|
would be an expensive graph traversal.
|
|
|
|
<p>A special optimization is that a file may contain a single
|
|
non-group object and no group(s). The object has one
|
|
predecessor which is the file super block. However, once a root
|
|
group is created it never dissappears (although I suppose it
|
|
could if we wanted).
|
|
|
|
<p>A special object called a <i>Symbolic Link</i> is simply a
|
|
name. Usually the name refers to some (other) object, but that
|
|
object need not exist. Symbolic links in HDF5 will have the
|
|
same semantics as symbolic links in Unix.
|
|
|
|
<p>The symbol table graph contains "entries" for each name. An
|
|
entry contains the file address for the object header and
|
|
possibly certain messages cached from the object header.
|
|
|
|
<p>The H5G package understands the notion of <i>opening</i> and object
|
|
which means that given the name of the object, a handle to the
|
|
object is returned (this isn't an API function). Objects can be
|
|
opened multiple times simultaneously through the same name or,
|
|
if the object has hard links, through other names. The name of
|
|
an object cannot be removed from a group if the object is opened
|
|
through that group (although the name can change within the
|
|
group).
|
|
|
|
<p>Below the API, object attributes can be read without opening
|
|
the object; object attributes cannot change without first
|
|
opening that object. The one exception is that the contents of a
|
|
group can change without opening the group.
|
|
|
|
<hr>
|
|
<h1>Building a hierarchy from a flat namespace</h1>
|
|
|
|
<p>Assuming we have a flat name space (that is, the root object is
|
|
a group which contains names for all other objects in the file
|
|
and none of those objects are groups), then we can build a
|
|
hierarchy of groups that also refer to the objects.
|
|
|
|
<p>The file initially contains `foo' `bar' `baz' in the root
|
|
group. We wish to add groups `grp1' and `grp2' so that `grp1'
|
|
contains objects `foo' and `baz' and `grp2' contains objects
|
|
`bar' and `baz' (so `baz' appears in both groups).
|
|
|
|
<p>In either case below, one might want to move the flat objects
|
|
into some other group (like `flat') so their names don't
|
|
interfere with the rest of the hierarchy (or move the hierarchy
|
|
into a directory called `/hierarchy').
|
|
|
|
<h2>with symbolic links</h2>
|
|
|
|
<p>Create group `grp1' and add symbolic links called `foo' whose
|
|
value is `/foo' and `baz' whose value is `/baz'. Similarly for
|
|
`grp2'.
|
|
|
|
<p>Accessing `grp1/foo' involves searching the root group for
|
|
the name `grp1', then searching that group for `foo', then
|
|
searching the root directory for `foo'. Alternatively, one
|
|
could change working groups to the grp1 group and then ask for
|
|
`foo' which searches `grp1' for the name `foo', then searches
|
|
the root group for the name `foo'.
|
|
|
|
<p>Deleting `/grp1/foo' deletes the symbolic link without
|
|
affecting the `/foo' object. Deleting `/foo' leaves the
|
|
`/grp1/foo' link dangling.
|
|
|
|
<h2>with hard links</h2>
|
|
|
|
<p>Creating the hierarchy is the same as with symbolic links.
|
|
|
|
<p>Accessing `/grp1/foo' searches the root group for the name
|
|
`grp1', then searches that group for the name `foo'. If the
|
|
current working group is `/grp1' then we just search for the
|
|
name `foo'.
|
|
|
|
<p>Deleting `/grp1/foo' leaves `/foo' and vice versa.
|
|
|
|
<h2>the code</h2>
|
|
|
|
<p>Depending on the eventual API...
|
|
|
|
<code><pre>
|
|
H5Gcreate (file_id, "/grp1");
|
|
H5Glink (file_id, H5G_HARD, "/foo", "/grp1/foo");
|
|
</pre></code>
|
|
|
|
or
|
|
|
|
<code><pre>
|
|
group_id = H5Gcreate (root_id, "grp1");
|
|
H5Glink (file_id, H5G_HARD, root_id, "foo", group_id, "foo");
|
|
H5Gclose (group_id);
|
|
</pre></code>
|
|
|
|
|
|
<hr>
|
|
<h1>Building a flat namespace from a hierarchy</h1>
|
|
|
|
<p>Similar to abvoe, but in this case we have to watch out that
|
|
we don't get two names which are the same: what happens to
|
|
`/grp1/baz' and `/grp2/baz'? If they really refer to the same
|
|
object then we just have `/baz', but if they point to two
|
|
different objects what happens?
|
|
|
|
<p>The other thing to watch out for cycles in the graph when we
|
|
traverse it to build the flat namespace.
|
|
|
|
<hr>
|
|
<h1>Listing the Group Contents</h1>
|
|
|
|
<p>Two things to watch out for are that the group contents don't
|
|
appear to change in a manner which would confuse the
|
|
application, and that listing everything in a group is as
|
|
efficient as possible.
|
|
|
|
<h2>Method A</h2>
|
|
|
|
<p>Query the number of things in a group and then query each item
|
|
by index. A trivial implementation would be O(n*n) and wouldn't
|
|
protect the caller from changes to the directory which move
|
|
entries around and therefore change their indices.
|
|
|
|
<code><pre>
|
|
n = H5GgetNumContents (group_id);
|
|
for (i=0; i<n; i++) {
|
|
H5GgetNameByIndex (group_id, i, ...); /*don't worry about args yet*/
|
|
}
|
|
</pre></code>
|
|
|
|
<h2>Method B</h2>
|
|
|
|
<p>The API contains a single function that reads all information
|
|
from the specified group and returns that info through an array.
|
|
The caller is responsible for freeing the array allocated by the
|
|
query and the things to which it points. This also makes it
|
|
clear the the returned value is a snapshot of the group which
|
|
doesn't change if the group is modified.
|
|
|
|
<code><pre>
|
|
n = H5Glist (file_id, "/grp1", info, ...);
|
|
for (i=0; i<n; i++) {
|
|
printf ("name = %s\n", info[i].name);
|
|
free (info[i].name); /*and maybe other fields too?*/
|
|
}
|
|
free (info);
|
|
</pre></code>
|
|
|
|
Notice that it would be difficult to expand the info struct since
|
|
its definition is part of the API.
|
|
|
|
<h2>Method C</h2>
|
|
|
|
<p>The caller asks for a snapshot of the group and then accesses
|
|
items in the snapshot through various query-by-index API
|
|
functions. When finished, the caller notifies the library that
|
|
it's done with the snapshot. The word "snapshot" makes it clear
|
|
that subsequent changes to the directory will not be reflected in
|
|
the shapshot_id.
|
|
|
|
<code><pre>
|
|
snapshot_id = H5Gsnapshot (group_id); /*or perhaps group_name */
|
|
n = H5GgetNumContents (snapshot_id);
|
|
for (i=0; i<n; i++) {
|
|
H5GgetNameByIndex (shapshot_id, i, ...);
|
|
}
|
|
H5Grelease (shapshot_id);
|
|
</pre></code>
|
|
|
|
In fact, we could allow the user to leave off the H5Gsnapshot and
|
|
H5Grelease and use group_id in the H5GgetNumContents and
|
|
H5GgetNameByIndex so they can choose between Method A and Method
|
|
C.
|
|
|
|
<hr>
|
|
<h1>An implementation of Method C</h1>
|
|
|
|
<dl>
|
|
<dt><code>hid_t H5Gshapshot (hid_t group_id)</code>
|
|
<dd>Opens every object in the specified group and stores the
|
|
handles in an array managed by the library (linear-time
|
|
operation). Open object handles are essentialy symbol table
|
|
entries with a little extra info (symbol table entries cache
|
|
certain things about the object which are also found in the
|
|
object header). Because the objects are open (A) they cannot be
|
|
removed from the group, (B) querying the object returns the
|
|
latest info even if something else has that object open, (C)
|
|
if the object is renamed within the group then its name with
|
|
<code>H5GgetNameByIndex</code> is changed. Adding new entries
|
|
to a group doesn't affect the snapshot.
|
|
|
|
<dt><code>char *H5GgetNameByIndex (hid_t shapshot_id, int
|
|
index)</code>
|
|
<dd>Uses the open object handle from entry <code>index</code> of
|
|
the snapshot array to get the object name. This is a
|
|
constant-time operation. The name is updated automatically if
|
|
the object is renamed within the group.
|
|
|
|
<dt><code>H5Gget<whatever>ByIndex...()</code>
|
|
<dd>Uses the open object handle from entry <code>index</code>,
|
|
which is just a symbol table entry, and reads the appropriate
|
|
object header message(s) which might be cached in the symbol
|
|
table entry. This is a constant-time operation if cached,
|
|
linear in the number of messages if not cached.
|
|
|
|
<dt><code>H5Grelease (hid_t snapshot_id)</code>
|
|
<dd>Closes each object refered to by the snapshot and then frees
|
|
the snapshot array. This is a linear-time operation.
|
|
</dl>
|
|
|
|
<hr>
|
|
<h1>To return <code>char*</code> or some HDF5 string type.</h1>
|
|
|
|
<p>In either case, the caller has to release resources associated
|
|
with the return value, calling free() or some HDF5 function.
|
|
|
|
<p>Names in the current implementation of the H5G package don't
|
|
contain embedded null characters and are always null terminated.
|
|
|
|
<p>Eventually the caller probably wants a <code>char*</code> so it
|
|
can pass it to some non-HDF5 function, does that require
|
|
strdup'ing the string again? Then the caller has to free() the
|
|
the char* <i>and</i> release the DHF5 string.
|
|
|
|
<hr>
|
|
<address><a href="mailto:matzke@llnl.gov">Robb Matzke</a></address>
|
|
<!-- Created: Fri Sep 26 12:03:20 EST 1997 -->
|
|
<!-- hhmts start -->
|
|
Last modified: Fri Oct 3 09:32:10 EST 1997
|
|
<!-- hhmts end -->
|
|
</body>
|
|
</html>
|