A number of issues involving caching of object header messages in
symbol table entries must be resolved.

What is the motivation for these changes?

   If we make objects completely independent of object name it allows
   us to refer to one object by multiple names (a concept called hard
   links in Unix file systems), which in turn provides an easy way to
   share data between datasets.

   Every object in an HDF5 file has a unique, constant object header
   address which serves as a handle (or OID) for the object.  The
   object header contains messages which describe the object.

   HDF5 allows some of the object header messages to be cached in
   symbol table entries so that the object header doesn't have to be
   read from disk.  For instance, an entry for a directory caches the
   directory disk addresses required to access that directory, so the
   object header for that directory is seldom read.

   If an object has multiple names (that is, a link count greater than
   one), then it has multiple symbol table entries which point to it.
   All symbol table entries must agree on header messages.  The
   current mechanism is to turn off the caching of header messages in
   symbol table entries when the header link count is more than one,
   and to allow caching once the link count returns to one.

   However, in the current implementation, a package is allowed to
   copy a symbol table entry and use it as a private cache for the
   object header.  This doesn't work for a number of reasons (all but
   one require a `delete symbol entry' operation).

      1. If two packages hold copies of the same symbol table entry,
         they don't notify each other of changes to the symbol table
         entry. Eventually, one package reads a cached message and
         gets the wrong value because the other package changed the
         message in the object header.

      2. If one package holds a copy of the symbol table entry and
         some other part of HDF5 removes the object and replaces it
         with some other object, then the original package will
         continue to access the non-existent object using the new
         object header.

      3. If one package holds a copy of the symbol table entry and
         some other part of HDF5 (re)moves the directory which
         contains the object, then the package will be unable to
         update the symbol table entry with the new cached
         data. Packages that refer to the object by the new name will
         use old cached data.


The basic problem is that there may be multiple copies of the object
symbol table entry floating around in the code when there should
really be at most one per hard link.

   Level 0: A copy may exist on disk as part of a symbol table node, which
            is a small 1d array of symbol table entries.

   Level 1: A copy may be cached in memory by the H5AC layer as part of
            a symbol table node (see H5Gnode.c).

   Level 2a: Another package may be holding a copy so it can perform
             fast lookup of any header messages that might be cached in
             the symbol table entry.  It can't point directly to the
             cached symbol table node because that node can disappear
             at any time.

   Level 2b: Packages may hold more than one copy of a symbol table
             entry.  For instance, if H5D_open() is called twice for
             the same name, then two copies of the symbol table entry
             for the dataset exist in the H5D package.

How can level 2a and 2b be combined?

   If package data structures contained pointers to symbol table
   entries instead of copies of symbol table entries and if H5G
   allocated one symbol table entry per hard link, then it's trivial
   for Level 2a and 2b to benefit from one another's actions since
   they share the same cache.

How does this work conceptually?

   Level 2a and 2b must notify Level 1 of their intent to use (or stop
   using) a symbol table entry to access an object header.  The
   notification of the intent to access an object header is called
   `opening' the object and releasing the access is `closing' the
   object.

   Opening an object requires an object name which is used to locate
   the symbol table entry to use for caching of object header
   messages.  The return value is a handle for the object.  Figure 1
   shows the state after Dataset1 opens Object with a name that maps
   through Entry1.  The open request created a copy of Entry1 called
   Shadow1 which exists even if SymNode1 is preempted from the H5AC
   layer.

                                                     ______
                                            Object  /      \
	     SymNode1                     +--------+        |
	    +--------+            _____\  | Header |        |
	    |        |           /     /  +--------+        |
	    +--------+ +---------+                  \______/
	    | Entry1 | | Shadow1 | /____
	    +--------+ +---------+ \    \
	    :        :                   \
	    +--------+                    +----------+
					  | Dataset1 |
					  +----------+
			     FIGURE 1


  The SymNode1 can appear and disappear from the H5AC layer at any
  time without affecting the Object Header data cached in the Shadow.
  The rules are:

  * If the SymNode1 is present and is about to disappear and the
    Shadow1 dirty bit is set, then Shadow1 is copied over Entry1, the
    Entry1 dirty bit is set, and the Shadow1 dirty bit is cleared.

  * If something requests a copy of Entry1 (for a read-only peek
    request), and Shadow1 exists, then a copy (not pointer) of Shadow1
    is returned instead.

  * Entry1 cannot be deleted while Shadow1 exists.

  * Entry1 cannot change directly if Shadow1 exists since this means
    that some other package has opened the object and may be modifying
    it.  I haven't decided if it's useful to ever change Entry1
    directly (except of course within the H5G layer itself).

  * Shadow1 is created when Dataset1 `opens' the object through
    Entry1. Dataset1 is given a pointer to Shadow1 and Shadow1's
    reference count is incremented.

  * When Dataset1 `closes' the Object the Shadow1 reference count is
    decremented.  When the reference count reaches zero, Shadow1 is
    deleted; if its dirty bit is set, its contents are first copied
    to Entry1 and the Entry1 dirty bit is set.  This may require
    reading SymNode1 back into the H5AC layer.
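
  The open and close rules above can be sketched in C roughly as
  follows.  This is only an illustration and not the H5G code: the
  types `entry_t' and `shadow_t', the `nrefs' field, and the functions
  shadow_open() and shadow_close() are hypothetical names, the cached
  header messages are reduced to a single integer, and the level 1
  entry is assumed to be resident in memory when the last close occurs.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a symbol table entry (Entry1). */
typedef struct entry_t {
    int cache;                 /* stand-in for cached header messages */
    int dirty;                 /* entry differs from the copy on disk */
    struct shadow_t *shadow;   /* shadow for this entry, or NULL      */
} entry_t;

/* Hypothetical stand-in for a shadow (Shadow1). */
typedef struct shadow_t {
    entry_t ent;               /* private copy of the entry           */
    int dirty;                 /* shadow differs from the entry       */
    int nrefs;                 /* number of openers (assumed field)   */
    entry_t *main;             /* level 1 entry, assumed resident     */
} shadow_t;

/* Open: create the shadow on first use, then count the opener. */
shadow_t *shadow_open(entry_t *e)
{
    assert(e != NULL);
    if (e->shadow == NULL) {
        shadow_t *s = calloc(1, sizeof *s);
        if (s == NULL) return NULL;
        s->ent = *e;           /* the shadow starts as a copy of the entry */
        s->main = e;
        e->shadow = s;
    }
    e->shadow->nrefs++;
    return e->shadow;
}

/* Close: on the last close, write back a dirty shadow and delete it. */
void shadow_close(shadow_t *s)
{
    assert(s != NULL && s->nrefs > 0);
    if (--s->nrefs > 0) return;        /* other openers remain */
    if (s->dirty) {
        s->main->cache = s->ent.cache; /* Shadow1 is copied to Entry1 */
        s->main->dirty = 1;
    }
    s->main->shadow = NULL;
    free(s);
}
```

  Two opens through the same entry return the same shadow pointer, so
  a change made by one opener is visible to the other without touching
  the entry until the last close.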

What happens when another Dataset opens the Object through Entry1?

  If the current state is represented by the top part of Figure 2,
  then Dataset2 will be given a pointer to Shadow1 and the Shadow1
  reference count will be incremented to two.  The Object header link
  count remains at one so Object Header messages continue to be cached
  by Shadow1. Dataset1 and Dataset2 benefit from one another's
  actions. The resulting state is represented by all of Figure 2.

                                                     _____
             SymNode1                       Object  /     \
            +--------+            _____\  +--------+       |
            |        |           /     /  | Header |       |
            +--------+ +---------+        +--------+       |
            | Entry1 | | Shadow1 | /____            \_____/
            +--------+ +---------+ \    \
            :        :        _          \
            +--------+       |\           +----------+
                               \          | Dataset1 |
                                \________ +----------+
                                         \              \
                                          +----------+   |
                                          | Dataset2 |   |- New Dataset
                                          +----------+   |
                                                        /
			     FIGURE 2


What happens when the link count for Object increases while Dataset1
has the Object open?

                                                     SymNode2
                                                    +--------+
    SymNode1                       Object           |        |
   +--------+             ____\  +--------+ /______ +--------+
   |        |            /    /  | header | \      `| Entry2 |
   +--------+ +---------+        +--------+         +--------+
   | Entry1 | | Shadow1 | /____                     :        :
   +--------+ +---------+ \    \                    +--------+
   :        :                   \
   +--------+                    +----------+   \________________/
                                 | Dataset1 |            |
                                 +----------+         New Link

			     FIGURE 3

  The current state is represented by the left part of Figure 3.  To
  create a new link the Object Header had to be located by traversing
  through Entry1/Shadow1.  On the way through, the Entry1/Shadow1
  cache is invalidated and the Object Header link count is
  incremented. Entry2 is then added to SymNode2.

  Since the Object Header link count is greater than one, Object
  header data will not be cached in Entry1/Shadow1.

  If the initial state had been all of Figure 3, with Object open
  through both Entry1 and Entry2, then creating a third link would
  invalidate the cache in Entry1 or Entry2.  It doesn't matter which
  since both caches are already invalidated anyway.

What happens if another Dataset opens the same object by another name?

  If the current state is represented by Figure 3, then a Shadow2 is
  created and associated with Entry2.  However, since the Object
  Header link count is more than one, nothing gets cached in Shadow2
  (or Shadow1).

What happens if the link count decreases?

  If the current state is represented by all of Figure 3 then it isn't
  possible to delete Entry1 because the object is currently open
  through that entry.  Therefore, the link count must have
  decreased because Entry2 was removed.

  As Dataset1 reads/writes messages in the Object header they will
  begin to be cached in Shadow1 again because the Object header link
  count is one.

What happens if the object is removed while it's open?

  That operation is not allowed.

What happens if the directory containing the object is deleted?

  That operation is not allowed since deleting the directory requires
  that the directory be empty.  The directory cannot be emptied
  because the open object cannot be removed from the directory.

What happens if the object is moved?

  Moving an object is a process consisting of creating a new
  hard-link with the new name and then deleting the old name.
  This will fail if the object is open.

What happens if the directory containing the entry is moved?

  The entry and the shadow still exist and are associated with one
  another.

What if a file is flushed or closed when objects are open?

  Flushing a symbol table with open objects writes correct information
  to the file since Shadow is copied to Entry before the table is
  flushed.

  Closing a file with open objects will create a valid file but will
  return failure.
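
  The copy from Shadow to Entry that precedes a flush might look like
  the sketch below.  The types and the function node_sync() are
  hypothetical and the cached data is reduced to one integer; the same
  step applies when a symbol table node is about to be preempted from
  the H5AC layer.

```c
#include <assert.h>
#include <stddef.h>

typedef struct shadow_t shadow_t;

/* Hypothetical stand-in for a symbol table entry. */
typedef struct entry_t {
    int cache;          /* stand-in for cached header messages */
    int dirty;          /* entry differs from the copy on disk */
    shadow_t *shadow;   /* shadow for this entry, or NULL      */
} entry_t;

/* Hypothetical stand-in for a shadow. */
struct shadow_t {
    entry_t ent;        /* private copy of the entry           */
    int dirty;          /* shadow differs from the entry       */
};

/* Copy every dirty shadow over its entry before the node is written.
 * Returns the number of entries that were updated. */
int node_sync(entry_t ents[], int n)
{
    int nsynced = 0;
    for (int i = 0; i < n; i++) {
        shadow_t *s = ents[i].shadow;
        if (s == NULL || !s->dirty) continue;
        ents[i] = s->ent;       /* Shadow is copied over Entry   */
        ents[i].shadow = s;     /* keep the association          */
        ents[i].dirty = 1;      /* Entry dirty bit is set        */
        s->dirty = 0;           /* Shadow dirty bit is cleared   */
        nsynced++;
    }
    return nsynced;
}
```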

How is the Shadow associated with the Entry?

  A symbol table is composed of one or more symbol nodes.  A node is a
  small 1-d array of symbol table entries.  The entries can move
  around within a node and from node-to-node as entries are added or
  removed from the symbol table and nodes can move around within a
  symbol table, being created and destroyed as necessary.

  Since a symbol table has an object header with a unique and constant
  file offset, and since H5G contains code to efficiently locate a
  symbol table entry given its name, we use these two values as a key
  within a shadow to associate the shadow with the symbol table
  entry.

        typedef struct H5G_shadow_t H5G_shadow_t;

        struct H5G_shadow_t {
           haddr_t      stab_addr;    /*symbol table header address*/
           char         *name;        /*entry name wrt symbol table*/
           hbool_t      dirty;        /*out-of-date wrt stab entry?*/
           H5G_entry_t  ent;          /*my copy of stab entry      */
           H5G_entry_t  *main;        /*the level 1 entry or null  */
           H5G_shadow_t *next, *prev; /*other shadows for this stab*/
        };

  The set of shadows will be organized in a hash table of linked
  lists.  Each linked list will contain the shadows associated with a
  particular symbol table header address and the list will be sorted
  lexicographically.
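
  One possible layout for that hash table is sketched here.  Everything
  in it is an assumption made for illustration: the bucket count, the
  `list_t' record that heads each per-symbol-table list, the use of
  uint64_t in place of haddr_t, and the functions list_find() and
  list_add().  Lists whose addresses hash to the same bucket are
  chained together.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 67   /* arbitrary size chosen for this sketch */

typedef struct shadow_t shadow_t;   /* opaque in this sketch */

/* Head record for the shadows of one symbol table (hypothetical). */
typedef struct list_t {
    uint64_t stab_addr;     /* symbol table header address        */
    shadow_t *head;         /* first shadow, sorted by name       */
    struct list_t *chain;   /* next list in the same hash bucket  */
} list_t;

static list_t *table[NBUCKETS];

/* Find the shadow list for a symbol table, or NULL if none exists. */
list_t *list_find(uint64_t stab_addr)
{
    list_t *l = table[stab_addr % NBUCKETS];
    while (l != NULL && l->stab_addr != stab_addr)
        l = l->chain;
    return l;
}

/* Register a new shadow list at the front of its bucket. */
void list_add(list_t *l)
{
    list_t **slot;
    assert(l != NULL);
    slot = &table[l->stab_addr % NBUCKETS];
    l->chain = *slot;
    *slot = l;
}
```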

  Also, each Entry will have a pointer to the corresponding Shadow or
  null if there is no shadow.

  When a symbol table node is loaded into the main cache, we look up
  the linked list of shadows in the shadow hash table based on the
  address of the symbol table object header.  We then traverse that
  list matching shadows with symbol table entries.

  We assume that opening/closing objects will be a relatively
  infrequent event compared with loading/flushing symbol table
  nodes. Therefore, if we keep the linked list of shadows sorted it
  costs O(N) to open and close objects where N is the number of open
  objects in that symbol table (instead of O(1)) but it costs only
  O(N) to load a symbol table node (instead of O(N^2)).
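
  The trade-off can be illustrated with the sketch below, which keeps
  one list of shadows sorted by name and matches it against the
  entries of a node in a single merge pass.  The names shadow_insert()
  and shadow_match() and the reduced `entry_t' and `shadow_t' types
  are hypothetical, and the sketch assumes that the entries within a
  node are themselves sorted by name in strcmp() order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct shadow_t shadow_t;

/* Reduced, hypothetical versions of the entry and shadow structs. */
typedef struct entry_t { const char *name; shadow_t *shadow; } entry_t;
struct shadow_t {
    const char *name;        /* entry name wrt symbol table   */
    entry_t *main;           /* the level 1 entry or NULL     */
    shadow_t *next, *prev;   /* other shadows for this stab   */
};

/* Insert s into the list at *head, keeping names sorted: O(N). */
void shadow_insert(shadow_t **head, shadow_t *s)
{
    shadow_t *cur = *head, *last = NULL;
    while (cur != NULL && strcmp(cur->name, s->name) < 0) {
        last = cur;
        cur = cur->next;
    }
    s->prev = last;
    s->next = cur;
    if (cur != NULL) cur->prev = s;
    if (last != NULL) last->next = s; else *head = s;
}

/* Match shadows with the entries of a freshly loaded node in one
 * pass over both sorted sequences.  Returns the number matched. */
int shadow_match(shadow_t *head, entry_t ents[], int n)
{
    int i = 0, nmatched = 0;
    shadow_t *s = head;
    while (s != NULL && i < n) {
        int cmp = strcmp(s->name, ents[i].name);
        if (cmp < 0) {
            s = s->next;             /* shadow belongs to another node */
        } else if (cmp > 0) {
            i++;                     /* entry has no shadow */
        } else {
            s->main = &ents[i];
            ents[i].shadow = s;
            nmatched++;
            s = s->next;
            i++;
        }
    }
    return nmatched;
}
```

  With an unsorted list each entry of the node would need its own scan
  of the list, which is where the quadratic cost would come from.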

What about the root symbol entry?

  Level 1 storage for the root symbol entry is always available since
  it's stored in the hdf5_file_t struct instead of a symbol table
  node.  However, the contents of that entry can move from the file
  handle to a symbol table node by H5G_mkroot().  Therefore, if the
  root object is opened, we keep a shadow entry for it whose
  `stab_addr' field is zero and whose `name' is null.

  For this reason, the root object should always be read through the
  H5G interface.
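
  A key comparison that tolerates the root shadow's null name could be
  written along these lines.  The `shadow_key_t' type and both
  functions are hypothetical, and uint64_t stands in for haddr_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical key: the two values that tie a shadow to its entry. */
typedef struct shadow_key_t {
    uint64_t stab_addr;   /* symbol table header address, 0 for root */
    const char *name;     /* entry name, NULL for root               */
} shadow_key_t;

/* The root object's shadow has a zero address and a null name. */
int shadow_key_is_root(const shadow_key_t *k)
{
    assert(k != NULL);
    return k->stab_addr == 0 && k->name == NULL;
}

/* Compare two keys without passing a null name to strcmp(). */
int shadow_key_equal(const shadow_key_t *a, const shadow_key_t *b)
{
    assert(a != NULL && b != NULL);
    if (a->stab_addr != b->stab_addr) return 0;
    if (a->name == NULL || b->name == NULL) return a->name == b->name;
    return strcmp(a->name, b->name) == 0;
}
```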

One more key invariant:  The H5O_STAB message in a symbol table header
never changes.  This allows a symbol table entry to cache the H5O_STAB
message for the symbol table to which it points without worrying about
whether the cache will ever be invalidated.