New feature/enhancement
Description:
Chunked datasets are handled poorly in several circumstances involving
certain selections and chunks that are too large for the chunk cache and/or
chunks with filters, causing the chunk to be read from disk multiple times.
Solution:
Rearrange raw data I/O infrastructure to handle chunked datasets in a much
more friendly way by creating a selection in memory and on disk for each chunk
in a chunked dataset and performing all of the I/O on that chunk at one time.
There are still some scalability (the current code attempts to
create a selection for all the chunks in the dataset, instead of just the
chunks that are accessed, requiring portions of the istore.c and fillval.c
tests to be commented out) and performance issues, but checking this in will
allow the changes to be tested by a much wider audience while I address the
remaining issues.
Platforms tested:
h5committested, FreeBSD 4.8 (sleipnir) serial & parallel, Linux 2.4 (eirene)
Purpose: Cray T3E maintenance with Raymond's help
Description: fillval test failed for compact dataset since the
size of the dataset was bigger than 64K.
Solution: Reduced the dataspace of the compact dataset to 1024 elements.
Platforms tested: T3E; it was also tested with semi-manual h5committest.
(I had to built and test manually on modi4 parallel because
of some weird failure of h5committest on modi4)
Misc. update:
Update
Description:
Updated the Copyright statement
Platforms tested:
Linux (This change is only in the comments, so I just check that the
modules still compile)
Misc. update:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
Improvement
Description:
fillval will fail to find the pre-exist data file if it is run
in --srcdir mode without setting $srcdir properly. This is
setup properly in the Makefile but unsuspecting users trying just
./fillval were puzzled by the failure.
Solution:
put in a more descriptive error message with a possible remedy.
Platforms tested:
Modi4 only since this is just adding a printf statment.
API name change
Description:
Change all "space time" references to "alloc time", including API functions
and macro definitions, etc.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/C++
Solaris 2.7 (arabica) w/FORTRAN
IRIX64 6.5 (modi4) w/parallel & FORTRAN
Code cleanup/More tests
Description:
Cleaned up some compiler warnings and wrote additional tests for space
allocation and storage size routines.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/serial & parallel. Will be testing on IRIX64
6.5 (modi4) in serial & parallel shortly.
Purpose:
Design for compact dataset
Description:
Compact dataset is stored in the header message for dataset layout.
Platforms tested:
arabica, eirene.
Purpose:
Bug fix.
Description:
This test fails on TRUE64 system because a compound variable(fill_ctype
in test_rdwr) wasn't initialized.
Solution:
Initialize to zero.
Platforms tested:
Pittsburg's True64(lemieux) system.
Purpose:
Bug Fix
Description:
Reading fill_old.h5 from fillval.c has problem to find from building
directory.
Solution:
prepend source directory into file name.
Platforms tested:
Linux 2.2
Purpose:
New feature
Description:
Fill-value's behaviors for contiguous dataset have been redefined.
Basicly, dataset won't allocate space until it's necessary. Full details
are available at http://hdf.ncsa.uiuc.edu/RFC/Fill_Value, at this moment.
These two file test backward compatibility with 1.4.
Platforms tested:
Linux 2.2.
Code cleanup
Description:
Windows is generating hundreds of warnings from some of the practices in
the library. Mostly, they are because size_t is 32-bit and hsize_t is
64-bit on Windows and we were carelessly casting the larger values down to
the smaller ones without checking for overflow.
Also, some other small code cleanups,etc.
Solution:
Re-worked some algorithms to eliminate the casts and also added more
overflow checking for assignments and function parameters which needed
casts.
Kent did most of the work, I just went over his changes and fit them into
the the library code a bit better.
Platforms tested:
FreeBSD 4.4 (hawkwind)
Update
Description:
Changed includes of the form:
#include <hdf5_file.h>
to
#include "hdf5_file.h"
so that gcc can pick them up easier without including the system
header files since we don't care about them.
Platforms tested:
Linux
Clean up warnings
Description:
The "FAILED" macro is defined by Windows and is causing warnings and
potential errors when compiled on that platform.
Solution:
Change our macro from FAILED to H5_FAILED.
Platforms tested:
FreeBSD 4.2 (hawkwind)
Code cleanup.
Description:
Fixed _lots_ (I mean _tons_) of warnings spit out by the gcc with the
extra warnings. Including a few show-stoppers for compression on IRIX
machines.
Solution:
Changed lots of variables' types to more sensible and consistent types,
more range-checking, more variable typecasts, etc.
Platforms tested:
FreeBSD 4.2 (hawkwind), IRIX64-64 (modi4)
The "FILENAME" declared extern in h5test.h is not always used.
It was used in h5_cleanup to remove temporary files created
during tests. Not all tests codes have used this routine.
Indeed, quite a few of test programs do "#define FILENAME ".
Also, h5_cleanup needs to work in tandem with h5_fixname.
h5_fixname accepts an explicite base_name argument instead
of using the global variable FILENAME. That is cleaner.
Solution:
Added char *base_name[] as a new argument to h5_cleanup, in
the same style as h5_fixname. Removed "extern char *FILENAME..."
from use. Also, undo some unnecessary declaration of "char *FILENAME"
from some tests which don't use it at all (yet).
Platforms tested:
modi4-64(irix64), arabica(solari2.7), eirene(linux)
(arabica could not launch tests automatically. I had to hack
in LD_LIBRARY_PATH to make them run.)
Changes since 19990614
----------------------
./src/H5D.c
Changed the way the plist_id argument of H5Dvlen_reclaim() is
checked so that it's more specific and works when debugging is
turned off.
./src/H5TB.c
Removed an unused local variable.
./test/fillval.c
./test/h5test.c
./test/h5test.h
Changed `basename' variables to `base_name' to prevent a
warning about a global with the same name.
./tools/h5ls.c
Changed `indent' variables to `ind' to prevent a warning about
a global with the same name.
./tools/h5toh4.c
Commented out declarations for things that normally appear in
system header files since our definitions might be
incompatible with the system and prevent h5toh4 from
compiling. If all looks good on other systems then we can
permanently remove these declarations...
----------------------
./test/dtypes.c
./test/enum.c [NEW]
Added support for enumeration data types.
./test/fillval.c
./test/istore.c
Fixed memory leaks during error handling.
----------------------
./src/H5T.c
Fixed a typo in the registration of the `unsigned char' to
`unsigned long long' type conversion that caused it to not be
registered, falling back to software whenever that conversion
path was taken.
./MANIFEST
./test/Makefile.in
./test/testhdf5.c
./test/testhdf5.h
./test/theap.c [REMOVED]
./test/lheap.c [NEW]
./test/tohdr.c [REMOVED]
./test/ohdr.c [NEW]
./test/tstab.c [REMOVED]
./test/stab.c [NEW]
Removed the `t' from the front of these names and made each
test a stand-alone program following the format of most of the
other tests.
./test/big.c
Uses libh5test.a but always sets the low-level driver to 1GB
file family.
The `#if' near the top to set the data space to 8GB has been
simplified now that `long_long' is always defined and the
error message is improved when `long_long' isn't wide enough.
Cleanup code was added to the error handling.
./test/gheap.c
./test/istore.c
Uses libh5test.a. Added error cleanup code.
./test/dtypes.c
./test/h5test.c
Added 68 new tests that check hardware and software
conversions between `long long' and `unsigned long long' and
the other integer types. The tests only run on machines where
sizeof(long_long)!=sizeof(long). We test a total of 180
different integer conversions, half in hardware and half in
software.
Cut down the number of times each test is run from 5 to 1 so
it doesn't take so long. If you want to run more times
there's a constant that can be changed at the top of the file.
./test/extend.c
Removed unused variable.
./test/h5test.c
./test/h5test.h
./test/external.c
./test/fillval.c
The h5_cleanup() returns true/false so it can be used in an `if'
statement to clean up additional files.
./doc/html/Environment.html
Indented. Added HDF5_PREFIX and HDF5_DRIVER descriptions.
./src/H5P.c
Changed the trace type for the second argument from `Iu' to
`x' since it's an output parameter.
./INSTALL
Added a warning that the GNU zlib that comes with the latest
version of HDF4 is too old to use with HDF5 and must be
renamed so configure doesn't see it when `--enable-hdf4' is
used.
----------------------
./MANIFEST
./test/Makefile.in
./test/shtype.c [REMOVED]
Removed shtype.c because it was all commented out. Besides,
these tests are done in dtypes.c now anyway.
./test/external.c
./test/fillval.c
./test/flush1.c
./test/flush2.c
./test/links.c
./test/mount.c
./test/mtime.c
./test/unlink.c
The tests that check the HDF5 API use the h5test support
functions. For one thing, that means that you can specify the
file driver that thay use by the HDF5_DRIVER environment
variable. Possible values are:
HDF5_DRIVER='sec2' Use read() and write()
HDF5_DRIVER='stdio' Use fread() and fwrite()
HDF5_DRIVER='core' Use malloc() and free()
HDF5_DRIVER='split' Split meta and raw data
HDF5_DRIVER='family N' Use file families with each
member being N megabytes (N
can be fractional, defaults to
one).
Some tests might fail for certain drivers: for instance, the
mount and link tests fail for the `core' driver because
they must be able to close and then reopen a file.
----------------------
./test/flush2.c
./test/overhead.c
Removed carriage-returns inserted by a broken operating system.
./test/big.c
./test/mtime.c
./test/ragged.c
./tools/h5ls.c
Removed inclusion of <H5config.h>, system header files, and definition
of __unused__ since this all happens in <H5private.h>.
./test/chunk.c
./test/cmpd_dset.c
./test/dsets.c
./test/dtypes.c
./test/extend.c
./test/external.c
./test/fillval.c
./test/flush1.c
./test/flush2.c
./test/iopipe.c
./test/links.c
./test/mount.c
./test/overhead.c
./test/shtype.c
./test/unlink.c
./tools/h5import.c
./tools/h5repart.c
Removed inclusion of <H5config.h> since <hdf5.h> includes it.
./test/flush1.c
Includes <stdlib.h>, protects inclusion of <unistd.h> by using
HAVE_UNISTD_H instead of STDC_HEADERS.
----------------------
./MANIFEST
Added new Pablo files HDF5record_RT.h and ProcIDs.h
./acconfig.h
./configure [REGENERATED]
./configure.in
./src/H5.c
./src/H5Vprivate.h
./src/H5config.h.in [REGENERATED]
./src/H5private.h
./src/H5public.h
./test/big.c
Added more configuration stuff for the Win32 environment. Removed all
the #ifdef WIN32 from the source and replaced them with OS-independent
stuff. Specifics follow:
Check for non-Posix.1 `st_blocks' field in `struct stat' which is used
by the big file test to decide if the file system supports holes. If
the st_blocks field isn't present then we just skip the test.
Configure checks for <io.h> <sys/resource.h> <sys/time.h> and
<winsock.h> and defines HAVE_IO_H, HAVE_SYS_RESOURCE_H,
HAVE_SYS_TIME_H and HAVE_WINSOCK_H when they're found.
Configure checks whether both <sys/time.h> and <time.h> can be
included and defines SYS_TIME_WITH_TIME if so. Otherwise include only
<sys/time.h> or <time.h> even if both exist.
Configure checks sizeof(__int64) and defines SIZEOF___INT64 to the
result or to zero if __int64 isn't defined. The source uses `long
long' in preference to `__int64'.
Removed null WIN32 definition for `inline' since such a definition
already exists in H5config.h
Protected gettimeofday() calls in debugging code with
HAVE_GETTIMEOFDAY instead of WIN32.
./src/H5F.c
./src/H5Flow.c
./src/H5Fmpio.c
./src/H5Fsec2.c
./src/H5Fstdio.h
./src/H5P.c
./src/H5Tconv.c
./src/H5private.h
Removed #include of system files from library source files and
consolodated them into H5private.h where they're protected by various
configuration macros (most of them were duplicated there already
anyway).
./test/big.c
./test/chunk.c
./test/cmpd_dset.c
./test/dsets.c
./test/dtypes.c
./test/extend.c
./test/external.c
./test/fillval.c
./test/flush1.c
./test/flush2.c
./test/iopipe.c
./test/links.c
./test/mount.c
./test/mtime.c
./test/overhead.c
./test/ragged.c
./test/shtype.c
./test/unlink.c
Protected system #include's with #ifdef's from H5config.h.
Undefined NDEBUG since some of the tests rely on assert() to check
return values.
Removed WIN32 definitions for __unused__ since this can be controlled
by the definition of HAVE_ATTRIBUTE in H5config.h
./test/testhdf5.h
Removed the CLEAN_CMD definition because we no longer use it.
Albert's cleanup() functions replaced it.
./test/fillval.c
Initialized auto hid_t variables to fix warnings in error recovery
code when data flow analysis is turned on in compilers.
./test/h5tools.c
Initialized an auto variable to fix a compiler warning.
./test/chunk.c
./test/ragged.c
The WIN32 had some unsigned variables changed to signed because the
compiler generates warnings when coercing unsigned to double(?). I
changed them back to unsigned because they really are unsigned
quantities. If this the change was just to shut up extraneous warnings
then perhaps a compiler flag can do the same; otherwise if the
compiler generates bad code then we should supply a patch file instead
messing up source code with bug work-arounds.
./src/H5detect.c
Protected system #include's with #ifdef's from H5config.h thereby
removing a WIN32.
If getpwuid() doesn't exist (HAVE_GETPWUID) then we assume that
`struct passwd' doesn't exist either (we don't really need it in that
case).
The H5T_NATIVE_LLONG and H5T_NATIVE_ULLONG are defined in terms of
`long long' or else `__int64' or else `long' depending on what's
available.
./src/H5Flow.c
./src/H5Ofill.c
Added __unused__ to some function arguments that aren't used when
assertions are turned off.
./src/H5V.c
Changed an auto variable name in some hand-inlined code to get rid of
a warning about the variable shadowing a previous auto.
----------------------
./doc/html/H5.format.html
./src/H5HG.c
Fixed a bug in the global heap that caused H5HG_read() to
write past the end of the buffer in certain cases.
./test/big.c
The test is skipped if hdf5 was configured with
`--disable-hsizet'.
./src/H5Ofill.c
Data type conversions are implemented for the fill value.
./src/H5.c
Tracing prints one of H5P_FILE_CREATE, H5P_FILE_ACCESS,
H5P_DATASET_CREATE, H5P_DATASET_XFER, or H5P_MOUNT instead of
the more cryptic H5I_TEMPLATE_* constants.
./src/H5D.c
Removed prototype for H5D_find_name().
./src/H5I.c
The GROUP_MASK and ID_MASK are both calculated from GROUP_BITS
instead of being set by hand.
We don't use the sign bit of hid_t; all valid hid_t values are
positive so we can say things like `if ((file=H5Fopen(...))<0)'.
Changed `(int)pow(2.0,x)' to `1<<x' so we don't have to worry
about rounding.
Fixed H5I_get_type() so it doesn't always fail an assertion.
./src/H5E.c
./src/H5Epublic.h
Added minor error H5E_MOUNT
./src/H5F.c
./src/H5Fprivate.h
Added H5Fmount() and H5Funmount(). Mounting and unmounting
works as documented but some of the other things aren't
implemented yet, the biggest being current working groups
always acting on the root of the mount tree, and H5Fclose()
closing the entire tree. The rest of the stuff will be added
shortly...
./src/H5P.c
./src/H5Ppublic.h
Added the H5P_MOUNT property list but haven't implemented any
particular properties for it yet.
./src/H5Gstab.c
Hard links across files return an error instead of failing an
assertion.
----------------------
./src/H5D.c
Fill values are working for contiguous datasets now except
there are two things that need more support from the data
space layer, specifically the ability to form a selection from
the difference of two selections. They are (1) extending an
external contiguous dataset, (2) optimization by delaying the
fill until after the first H5Dwrite().
Renamed H5D_allocate() to H5D_init_storage() since allocation
is only part of the story. Added a data space argument so it
doesn't have to query the space from the object header -- the
space is always available in the caller anyway.
Removed `#ifdef HAVE_PARALLEL' from a few places where it
wasn't necessary. We don't need it around code that doesn't
compile anything from mpi.h or mpio.h.
./src/H5Fistore.c
Uncommented H5F_istore_alloc() for non-parallel and moved the
`#ifdef HAVE_PARALLEL' just around Kim's barrier.
./src/H5Fmpio.c
Wrapped a couple long lines.
Got rid of two signed vs. unsigned comparison warnings.
./MANIFEST
./test/Makefile.in
./test/fillval.c [NEW]
Added tests for fill values. The contiguous dataset extend
test is disabled until H5S_SELECT_DIFF is implemented.
./tools/Makefile.in
Fixed a bug where `make test' didn't build the executables
first. This should cause the snapshots to start up again.
./Makefile.in
Changed to build in `test' directory before `tools'
directory. We want the library tests to pass before we even
start considering the tools. You can still build and/or test
the tools independent of the library tests passing.