Straighten out some convoluted code when links were being deleted, which
could cause the "delete" callback for user-defined links to not get called when
the group they were in was deleted.
Had to compromise on the "delete" callback though - only calls the callback
with the ID for the file the link is in, instead of the group, since the group
is being held open upstream in the calling sequence during a group deletion and
this prevents a group and its ID from being created. (This could possibly be
worked around, but would cause a fair bit of havoc in the code and I'm not
entirely certain it's worth it...)
Tested on:
Linux/32 2.6 (chicago)
Move compact storage routines into separate file, to make room for internal
routines that operate on links within group operations.
Tested on:
Linux/32 2.6 (chicago)
Finished implementation of H5Lget_info_by_idx for all cases: old vs. new
group formats, compact vs. dense new link storage, increasing vs. decreasing
vs. native iteration order.
Also, refactor symbol table "foo by index" routines to be more generic
and share more code by using a single B-tree iteration callback which makes
callbacks to a specific "get <foo>" callback.
Tested on:
Mac OS X/32 10.4.8 (amazon)
FreeBSD/32 4.11 (sleipnir)
Linux/32 2.4 (heping)
Linux/64 2.4 (mir)
AIX/32 5.? (copper)
Implement basic framework for H5Lget_info_by_idx and get it working for
creation order indices on compact groups.
Clean up code a bit.
Close resource link in user-defined link traversal.
Tested on:
Linux/32 2.6 (chicago)
Finish internal work necessary to track creation order in v2 B-tree when
group is in "dense" storage form.
Tested on:
Linux/32 2.6 (chicago)
Linux/64 2.6 (chicago2)
File format is not stable, don't keep files produced!
Description:
First stage of checkins modifying the format of groups to support creation
order. Implement "dense" storage for links in groups.
Try to clarify some of the symbols for the H5L API.
Add the H5Pset_latest_format() flag for FAPLs, to choose to use the newest
file format options (including "dense" link storage in groups)
Add the H5Pset_track_creation_order() flag for GCPLs, to enable creation
order tracking in groups (although no index on creation order yet).
Remove --enable-group-revision configure flag, as file format issues are
now handled in a backwardly/forwardly compatible way.
Clean up lots of compiler warnings and other minor formatting issues.
Tested on:
FreeBSD/32 4.11 (sleipnir) w/threadsafe
Linux/32 2.4 (heping) w/FORTRAN & C++
Linux/64 2.4 (mir) w/enable-v1.6 compa
Mac OSX/32 10.4.8 (amazon)
AIX 5.3 (copper) w/parallel & FORTRAN
Review, revise & checkin in Peter's latest round of object copy changes,
which add basic support for datasets & attributes with reference datatypes.
Tested on:
Linux/32 2.6 (chicago)
Linux/64 2.6 (chicago2)
Object header locations (H5O_loc_t's) can now "hold open" a file and
decrement its open object count when they close. This means that
locations (H5O_loc_t's and H5G_loc_t's) should always be freed.
Added more thorough tests to ensure that external files are closed.
Users can create external links using H5L_create_external(). These links
point to an object in another HDF5 file. Users can alter the behavior of
external links or create new kinds of links by registering callbacks
using the H5L interface.
Added tests, tools support, etc.
Also a number of other, minor changes have been made (some restructuring of
the H5L interface, for instance).
Additional documentation and examples are forthcoming.
Feature
Description:
Revised Link APIs.
Solution:
New link APIs use H5L*
H5*create_expand do not create links to the objects created; this must
be done manually with H5Llink.
Added APIs to link an object given its ID (H5Llink), to copy links (H5Lcopy),
and changed creation APIs (H5Lcreate_hard and H5Lcreate_soft) and query
API (H5Lget_linkinfo instead of H5Gget_objinfo).
All old APIs are still supported in H5Gdeprec.c .
Platforms tested:
sol, mir, copper
Misc. update:
Forgot to update MANIFEST and release docs. Will do after checkin.
New feature
Description:
Check in Peter's code to add support for "shallow copy", "create
intermediate groups", "no attributes" and "expand soft links" support.
Platforms tested:
FreeBSD 4.11 (sleipnir)
Linux 2.4 (chicago) w/ & w/o group-revision enabled
h5committest
"Hide" file format changes (for now)
Description:
Add ifdef's (controlled by the --enable-group-revision configure flag)
to disable group revision changes to the file format, in order to allow alpha
release to go ahead without releasing an unsupported version into the wild.
Platforms tested:
FreeBSD 4.11 (sleipnir)
Linux 2.4 32-bit (heping)
Linux 2.4 64-bit (mir)
Solaris 2.9 (shanti)
New/expanded features
Description:
Check in Peter's changed for the object copy code:
- Allow/fix copying datasets using named variable-length datatypes
- Start adding framework for property list to control how object
copying occurs.
Platforms tested:
FreeBSD 4.11 (sleipnir)
Too minor to require h5committest
Bug fix
Description:
Retrieving an object's name could fail (in various ways) under certain
circumstances (mostly having to do with mounted files).
Solution:
Re-write & simplify "get object name" code to fix error in a better way
than adding yet another hack to the previous pile of hacks... :-)
Platforms tested:
FreeBSD 4.11 (sleipnir)
h5committest
More tests
Description:
Add more tests for proper behavior of groups with different group
creation property settings.
Platforms tested:
FreeBSD 4.11 (sleipnir)
Linux 2.4
Solaris 2.8
New feature
Description:
Check in baseline for compact group revisions, which radically revises the
source code for managing groups and object headers.
WARNING!!!! WARNING!!!! WARNING!!!! WARNING!!!! WARNING!!!! WARNING!!!!
WARNING!!!! WARNING!!!! WARNING!!!! WARNING!!!! WARNING!!!! WARNING!!!!
This initiates the "unstable" phase of the 1.7.x branch, leading up
to the 1.8.0 release. Please test this code, but do _NOT_ keep files created
with it - the format will change again before the release and you will not
be able to read your old files!!!
WARNING!!!! WARNING!!!! WARNING!!!! WARNING!!!! WARNING!!!! WARNING!!!!
WARNING!!!! WARNING!!!! WARNING!!!! WARNING!!!! WARNING!!!! WARNING!!!!
Solution:
There's too many changes to really describe them all, but some of them
include:
- Stop abusing the H5G_entry_t structure and split it into two separate
structures for non-symbol table node use within the library: H5O_loc_t
for object locations in a file and H5G_name_t to store the path to
an opened object. H5G_entry_t is now only used for storing symbol
table entries on disk.
- Retire H5G_namei() in favor of a more general mechanism for traversing
group paths and issuing callbacks on objects located. This gets us out
of the business of hacking H5G_namei() for new features, generally.
- Revised H5O* routines to take a H5O_loc_t instead of H5G_entry_t
- Lots more...
Platforms tested:
h5committested and maybe another dozen configurations.... :-)
New feature
Description:
Add in baseline "object copy" code from Peter [in the form of a new API
routine: H5Gcopy()]. There's still some work to do (like handling variable-
length datatypes and possibly support for references) and it hasn't been tested
on mounted files yet, but the core functionality is there and working
correctly.
I've also got a set of patches to update the 1.6 branch with tweaks to
keep the branches mostly in sync, but Elena will kill me if I import them
before the 1.6.5 release is out... :-)
Platforms tested:
FreeBSD 4.11 (sleipnir)
h5committested
Code cleanup/reorganization
Description:
Merge back some more changes extracted from the "compact group" set.
This bunch cleans up and prepares the H5G_* routines for eventual import of
new features.
Platforms tested:
FreeBSD 4.11 (sleipnir)
Linux 2.4
Mac OS X.4
Code cleanup
Description:
Clean up internals of group creation & iteration code.
Platforms tested:
FreeBSD 4.11 (sleipnir)
Mac OS X (nile)
Too minor to require h5committest
Code cleanup
Description:
Trim trailing whitespace, which is making 'diff'ing the two branches
difficult.
Solution:
Ran this script in each directory:
foreach f (*.[ch] *.cpp)
sed 's/[[:blank:]]*$//' $f > sed.out && mv sed.out $f
end
Platforms tested:
FreeBSD 4.11 (sleipnir)
Too minor to require h5committest
Big fix
Description:
A group opened by dereferencing a object reference would not work for
H5Giterate(), due to the local heap & B-tree information not being cached.
Solution:
Get the local heap & B-tree info & point to that structure instead of
the group entry for the group.
Platforms tested:
FreeBSD 4.11 (sleipnir)
Too minor to require h5committest
Bug fix
Description:
Correct problems when querying information about a group that was opened
by dereferencing an object reference.
Solution:
Read in symbol table information instead of rely on it being cached.
Platforms tested:
FreeBSD 4.11 (sleipnir)
Too minor to require h5committest
Bug fix
Description:
Rewrite code for mounting files to clean up layers of kludges and implement
a much cleaner and more maintainable design.
Platforms tested:
FreeBSD 4.11 (sleipnir)
Linux 2.4
New feature
Description:
Add group creation & access property lists, dataset access property lists
and named datatype creation & access property lists. Currently have
<foo>_extend() API names, which will need to be changed for the final release.
Platforms tested:
FreeBSD 4.11 (sleipnir)
Linux 2.4 (heping)
Bug Fix/Code Cleanup/Doc Cleanup/Optimization/Branch Sync :-)
Description:
Generally speaking, this is the "signed->unsigned" change to selections.
However, in the process of merging code back, things got stickier and stickier
until I ended up doing a big "sync the two branches up" operation. So... I
brought back all the "infrastructure" fixes from the development branch to the
release branch (which I think were actually making some improvement in
performance) as well as fixed several bugs which had been fixed in one branch,
but not the other.
I've also tagged the repository before making this checkin with the label
"before_signed_unsigned_changes".
Platforms tested:
FreeBSD 4.10 (sleipnir) w/parallel & fphdf5
FreeBSD 4.10 (sleipnir) w/threadsafe
FreeBSD 4.10 (sleipnir) w/backward compatibility
Solaris 2.7 (arabica) w/"purify options"
Solaris 2.8 (sol) w/FORTRAN & C++
AIX 5.x (copper) w/parallel & FORTRAN
IRIX64 6.5 (modi4) w/FORTRAN
Linux 2.4 (heping) w/FORTRAN & C++
Misc. update:
Purpose:
Feature
Description:
Datatypes and groups now use H5FO "file object" code that was previously
only used by datasets. These objects will hold a file open if the file
is closed but they have not yet been closed. If these objects are unlinked
then relinked, they will not be destroyed. If they are opened twice (even
by two different names), both IDs will "see" changes made to the object
using the other ID.
When an object is opened using two different names (e.g., if a dataset was
opened under one name, then mounted and opened under its new name), calling
H5Iget_name() on a given hid_t will return the name used to open that hid_t,
not the current name of the object (this is a feature, and a change from the
previous behavior of datasets).
Solution:
Used H5FO code that was already in place for datasets. Broke H5D_t's, H5T_t's,
and H5G_t's into a "shared" struct and a private struct. The shared structs
(H5D_shared_t, etc.) hold the object's information and are used by all IDs
that point to a given object in the file. The private structs are pointed
to by the hid_t and contain the object's group entry information (including its
name) and a pointer to the shared struct for that object.
This changed the naming of structs throughout the library (e.g., datatype->size
is now datatype->shared->size). I added an updated H5Tinit.c to windows.zip.
Platforms tested:
Visual Studio 7, sleipnir, arabica, verbena
Misc. update:
Bug fix
Description:
H5Gget_num_objs, H5Gget_objname_by_idx and H5Gget_objtype_by_idx were
only accepting a group ID, instead of a location ID, as our documentation for
them stated.
Solution:
Allow them to accept a location ID.
Platforms tested:
FreeBSD 4.8 (sleipnir)
h5committest
Code cleanup
Description:
Remove "UNUSED" keyword from function prototypes in header files.
Platforms tested:
IA64 Linux cluster (titan)
too small to need commit test
Bug fix (backward compatibility)
Description:
Changes we've made during development of the 1.5.x branch had broken the
feature of allowing user's callbacks to H5Giterate to return a value
through the library back to the application which called H5Giterate.
Solution:
Correctly pass along iterator callback return value and adjust internal
library code to conform to this design.
Platforms tested:
FreeBSD 4.8 (sleipnir)
h5committest
Code cleanup
Description:
Limit the scope on more function prototypes/macros/typedefs.
Platforms tested:
FreeBSD 4.8 (sleipnir)
h5committest not necessary.
Purpose: A little code rewriting
Description: object types were defined as macros in H5Gpublic.h
Solution: changed them to enumerate type
Platforms tested: h5committtest
Code cleanup
Description:
Stop using hard-coded constant for the initial size of the root group's
symbol table local heap and use macro.
Also, changed existing (unused) macro for the initial heap size to be the
same as the hard-coded constant.
Platforms tested:
FreeBSD 4.8 (sleipnir) w/C++
Linux 2.4 (burrwhite) w/FORTRAN
Solaris 2.7 (arabica) w/FORTRAN
IRIX64 6.5 (modi4) w/parallel & FORTRAN
(h5committest not run due to my ongoing difficulties with C++ on burrwhite).
Bug Fix
Description:
Metadata cache in parallel I/O can cause hangs in applications which
perform independent I/O on chunked datasets, because the metadata cache
can attempt to flush out dirty metadata from only a single process, instead
of collectively from all processes.
Solution:
Pass a dataset transfer property list down from every API function which
could possibly trigger metadata I/O.
Then, split the metadata cache into two sets of entries to allow dirty
metadata to be set aside when a hash table collision occurs during
independent I/O.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
FreeBSD 4.7 (sleipnir) serial & parallel
Misc. update:
Updated release_docs/RELEASE
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
Purpose:
__DLL__ is a keyword in some platforms and __DLL__ is also defined as a macro for windows DLL applications.
That causes problems.
Description:
Solution:
Use H5_DLL*** to replace __DLL***__ at all header files.
Change the macro defination at H5api_adpt.h.
Platforms tested:
linux2.2.18smp, irix64, solaris 2.7 and windows 2000
Code cleanup
Description:
Clean up the H5B_iterate code to not have a single hard-wired iterator for
each interface which uses a B-tree, instead accept a function pointer which
determines the callback function. This allows additional iterator
callbacks to be defined without requiring additional H5B functions to be
created.
In that spirit, remove the H5B_prune_by_extent call and convert the
H5F_istore callback routine for it into a callback routine for H5B_iterate.
Platforms tested:
FreeBSD 4.5 (sleipnir)
Code cleanup.
Description:
Fix a few compiler warnings from the file creation property list -> generic
property list conversion. Also change a hard-wired value (8) for the
number of B-tree key values to a value that uses the enum's generated by
the compiler.
Platforms tested:
FreeBSD 4.4 (hawkwind)
Code cleanup (sorta)
Description:
When the first versions of the HDF5 library were designed, I remembered
vividly the difficulties of porting code from a 32-bit platform to a 16-bit
platform and asked that people use intn & uintn instead of int & unsigned
int, respectively. However, in hindsight, this was overkill and
unnecessary since we weren't going to be porting the HDF5 library to
16-bit architectures.
Currently, the extra uintn & intn typedefs are causing problems for users
who'd like to include both the HDF5 and HDF4 header files in one source
module (like Kent's h4toh5 library).
Solution:
Changed the uintn & intn's to unsigned and int's respectively.
Platforms tested:
FreeBSD 4.4 (hawkwind)
Update
Description:
Changed
#include <hdf_file.h>
construct to
#include "hdf_file.h"
so that the GNU compiler can more easily pick up the dependencies
which it places in the .depend and Dependencies files. Also
regenerated the Dependencies to go along with this.
Platforms tested:
Linux
cached instead of in separate structures. This reduces the amount of memory
the hash table uses by about half. This is the initial step along the path of
speeding up the metadata caching.